Remove Subversion metadata

These files were committed accidentally.
Florian Weimer 2013-04-02 14:25:44 +02:00
parent c2251f3022
commit c861995515
164 changed files with 0 additions and 21568 deletions

View file

@ -1,23 +0,0 @@
K 25
svn:wc:ra_dav:version-url
V 65
/repos/product-security/!svn/ver/302/defensive-coding/trunk/en-US
END
Defensive_Coding.ent
K 25
svn:wc:ra_dav:version-url
V 85
/repos/product-security/!svn/ver/64/defensive-coding/trunk/en-US/Defensive_Coding.ent
END
Book_Info.xml
K 25
svn:wc:ra_dav:version-url
V 79
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Book_Info.xml
END
Defensive_Coding.xml
K 25
svn:wc:ra_dav:version-url
V 86
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Defensive_Coding.xml
END

View file

@ -1,145 +0,0 @@
10
dir
305
https://svn.devel.redhat.com/repos/product-security/defensive-coding/trunk/en-US
https://svn.devel.redhat.com/repos/product-security
2013-01-16T14:32:22.318444Z
302
fweimer@REDHAT.COM
9bd5cf0f-f2b3-0410-b1a9-d5c590f50bf1
Defensive_Coding.ent
file
2013-01-10T17:17:49.038814Z
240837ebc2948c0404c903c2b25ee90a
2012-07-16T12:32:39.042163Z
64
fweimer@REDHAT.COM
54
Python
dir
C
dir
CXX
dir
Book_Info.xml
file
2013-01-10T17:17:49.038814Z
0a9c514a2db8c6783b91a20eea2918c2
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
999
Tasks
dir
Defensive_Coding.xml
file
2013-01-10T17:17:49.038814Z
81327d12a4be4bc9189fbd3eea5c5215
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
1522
Features
dir

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,29 +0,0 @@
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE bookinfo PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<bookinfo id="book-Defensive_Coding">
<title>Defensive Coding</title>
<subtitle>A Guide to Improving Software Security</subtitle>
<edition>1.0</edition>
<issuenum>1.0</issuenum>
<pubsnumber>20</pubsnumber>
<productname>Internal</productname>
<productnumber>6.4</productnumber>
<abstract>
<para>
This document provides guidelines for improving software
security through secure coding. It covers common
programming languages and libraries, and focuses on
concrete recommendations.
</para>
</abstract>
<corpauthor>
<inlinemediaobject>
<imageobject>
<imagedata fileref="Common_Content/images/redhat-logo.svg" format="SVG" />
</imageobject>
</inlinemediaobject>
</corpauthor>
<xi:include href="Common_Content/Legal_Notice.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</bookinfo>

View file

@ -1,2 +0,0 @@
<!ENTITY YEAR "2012">
<!ENTITY HOLDER "Red Hat, Inc">

View file

@ -1,26 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book>
<xi:include href="Book_Info.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<part>
<title>Programming Languages</title>
<xi:include href="C/C.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="CXX/CXX.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Python/Language.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</part>
<part>
<title>Specific Programming Tasks</title>
<xi:include href="Tasks/Library_Design.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Tasks/Descriptors.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Tasks/File_System.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Tasks/Temporary_Files.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Tasks/Processes.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Tasks/Serialization.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Tasks/Cryptography.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</part>
<part>
<title>Implementing Security Features</title>
<xi:include href="Features/Authentication.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Features/TLS.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</part>
</book>

View file

@ -1,35 +0,0 @@
K 25
svn:wc:ra_dav:version-url
V 67
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/C
END
C.xml
K 25
svn:wc:ra_dav:version-url
V 73
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/C/C.xml
END
Allocators.xml
K 25
svn:wc:ra_dav:version-url
V 82
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/C/Allocators.xml
END
Language.xml
K 25
svn:wc:ra_dav:version-url
V 80
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/C/Language.xml
END
schemas.xml
K 25
svn:wc:ra_dav:version-url
V 79
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/C/schemas.xml
END
Libc.xml
K 25
svn:wc:ra_dav:version-url
V 76
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/C/Libc.xml
END

View file

@ -1,6 +0,0 @@
K 10
svn:ignore
V 9
snippets
END

View file

@ -1,198 +0,0 @@
10
dir
305
https://svn.devel.redhat.com/repos/product-security/defensive-coding/trunk/en-US/C
https://svn.devel.redhat.com/repos/product-security
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
9bd5cf0f-f2b3-0410-b1a9-d5c590f50bf1
C.xml
file
2013-01-10T17:17:40.330763Z
152059b0949055c27918169fb0406ee5
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
503
Allocators.xml
file
2013-01-10T17:17:40.330763Z
483d91643e7a6a8d6545649b2fa0b144
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
8762
Language.xml
file
2013-01-10T17:17:40.330763Z
11fb84b7e9a7c76cfc95dbe918118998
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
6528
schemas.xml
file
2013-01-10T17:17:40.331763Z
769bc2635d36b318161574a1adf2f6e7
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
150
Libc.xml
file
2013-01-10T17:17:40.331763Z
6e4999f743167fd393cbd521fd4d662c
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
8733

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,207 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-C-Allocators">
<title>Memory allocators</title>
<section>
<title><function>malloc</function> and related functions</title>
<para>
The C library interfaces for memory allocation are
<function>malloc</function>, <function>free</function>,
<function>realloc</function>, and
<function>calloc</function>. In addition to these
generic functions, there are derived functions such as
<function>strdup</function> which perform allocation using
<function>malloc</function> internally, but do not return
untyped heap memory (which could be used for any object).
</para>
<para>
The C compiler knows about these functions and can use their
expected behavior for optimizations. For instance, the compiler
assumes that an existing pointer (or a pointer derived from an
existing pointer by arithmetic) will not point into the memory
area returned by <function>malloc</function>.
</para>
<para>
If the allocation fails, <function>realloc</function> does not
free the old pointer. Therefore, the idiom <literal>ptr =
realloc(ptr, size);</literal> is wrong because the memory
pointed to by <literal>ptr</literal> leaks in case of an error.
</para>
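<para>
A minimal sketch of a safer pattern follows. It assumes a
hypothetical helper named <function>resize_buffer</function>, which
is not part of any library. The result of
<function>realloc</function> is stored in a temporary pointer, and
the original pointer is overwritten only on success.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stdlib.h>

/* Hypothetical helper: resizes *pbuf to new_size bytes.
   Returns 0 on success and -1 on failure.  On failure, *pbuf is
   left untouched, so the caller still owns the old block and can
   free it or keep using it. */
int
resize_buffer(char **pbuf, size_t new_size)
{
  /* A zero-size realloc has implementation-defined behavior, so
     this sketch simply rejects it. */
  if (new_size == 0)
    return -1;

  char *tmp = realloc(*pbuf, new_size);
  if (tmp == NULL)
    return -1;   /* *pbuf is still valid here */

  *pbuf = tmp;
  return 0;
}
```
]]></programlisting>
</informalexample>
<para>
A caller that receives <literal>-1</literal> can release the old
buffer with <function>free</function> as part of its normal error
handling, so nothing leaks.
</para>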
<section>
<title>Use-after-free errors</title>
<para>
After <function>free</function>, the pointer is invalid.
Further pointer dereferences are not allowed (and are usually
detected by <application>valgrind</application>). Less obvious
is that any <emphasis>use</emphasis> of the old pointer value is
not allowed, either. In particular, comparisons with any other
pointer (or the null pointer) are undefined according to the C
standard.
</para>
<para>
The same rules apply to <function>realloc</function> if the
memory area cannot be enlarged in-place. For instance, the
compiler may assume that a comparison between the old and new
pointer will always return false, so it is impossible to detect
movement this way.
</para>
</section>
<section>
<title>Handling memory allocation errors</title>
<para>
Recovering from out-of-memory errors is often difficult or even
impossible. When memory is exhausted, <function>malloc</function> and
other allocation functions return a null pointer. Dereferencing
this pointer leads to a crash. Such dereferences can even be
exploitable for code execution if the dereference is combined
with an array subscript.
</para>
<para>
In general, if you cannot check all allocation calls and
handle failure, you should abort the program on allocation
failure, and not rely on the null pointer dereference to
terminate the process. See
<xref
linkend="sect-Defensive_Coding-Tasks-Serialization-Decoders"/>
for related memory allocation concerns.
</para>
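<para>
One way to abort on allocation failure is a small wrapper around
<function>malloc</function>. The sketch below uses the name
<function>xmalloc</function>, which is a common convention but an
assumption here, not an interface provided by the C library.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: behaves like malloc, but terminates the
   process instead of returning a null pointer. */
void *
xmalloc(size_t size)
{
  /* malloc(0) may legitimately return NULL, so request at least
     one byte to keep the failure check unambiguous. */
  void *p = malloc(size != 0 ? size : 1);
  if (p == NULL) {
    fputs("out of memory\n", stderr);
    abort();
  }
  return p;
}
```
]]></programlisting>
</informalexample>
<para>
Callers of such a wrapper do not need a null check, and the process
terminates at a well-defined point instead of at some later
dereference.
</para>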
</section>
</section>
<section id="sect-Defensive_Coding-C-Allocators-alloca">
<title><function>alloca</function> and other forms of stack-based
allocation</title>
<para>
Allocation on the stack is risky because stack overflow checking
is implicit. There is a guard page at the end of the memory
area reserved for the stack. If the program attempts to read
from or write to this guard page, a <literal>SIGSEGV</literal>
signal is generated and the program typically terminates.
</para>
<para>
This is sufficient for detecting typical stack overflow
situations such as unbounded recursion, but it fails when the
stack grows in increments larger than the size of the guard
page. In this case, it is possible that the stack pointer ends
up pointing into a memory area which has been allocated for a
different purpose. Such misbehavior can be exploitable.
</para>
<para>
Common sources of large stack growth are calls to
<function>alloca</function> and related functions such as
<function>strdupa</function>. These functions should be avoided
because of the lack of error checking. (They can be used safely
if the allocated size is less than the page size (typically,
4096 bytes), but this case is relatively rare.) Additionally,
relying on <function>alloca</function> makes it more difficult
to reorganize the code because it is not allowed to use the
pointer after the function calling <function>alloca</function>
has returned, even if this function has been inlined into its
caller.
</para>
<para>
Similar concerns apply to <emphasis>variable-length
arrays</emphasis> (VLAs), a feature of the C99 standard which
started as a GNU extension. For large objects exceeding the
page size, there is no error checking, either.
</para>
<para>
In both cases, negative or very large sizes can trigger a
stack-pointer wraparound, and the stack pointer can end up
pointing into caller stack frames, which is fatal and can be
exploitable.
</para>
<para>
If you want to use <function>alloca</function> or VLAs for
performance reasons, consider using a small on-stack array (less
than the page size, large enough to fulfill most requests). If
the requested size is small enough, use the on-stack array.
Otherwise, call <function>malloc</function>. When exiting the
function, check if <function>malloc</function> was called,
and free the buffer as needed.
</para>
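<para>
The approach described above can be sketched as follows. The
function <function>is_palindrome</function> and the constant
<literal>SMALL_BUFFER_SIZE</literal> are hypothetical and only serve
to show the buffer handling; the reversed scratch copy is not needed
for the comparison itself.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stdlib.h>
#include <string.h>

/* Assumed bound: well below the typical page size of 4096 bytes. */
enum { SMALL_BUFFER_SIZE = 256 };

/* Hypothetical example: returns 1 if the len bytes at data read the
   same forwards and backwards, 0 if they do not, and -1 if memory
   allocation failed. */
int
is_palindrome(const char *data, size_t len)
{
  char small[SMALL_BUFFER_SIZE];
  char *scratch = small;
  char *heap = NULL;

  /* Fall back to the heap only for requests that do not fit. */
  if (len > sizeof(small)) {
    heap = malloc(len);
    if (heap == NULL)
      return -1;
    scratch = heap;
  }

  for (size_t i = 0; i < len; ++i)
    scratch[i] = data[len - 1 - i];
  int result = memcmp(scratch, data, len) == 0;

  /* free(NULL) does nothing, so no separate flag is needed. */
  free(heap);
  return result;
}
```
]]></programlisting>
</informalexample>
<para>
Keeping the heap pointer in its own variable, initialized to
<literal>NULL</literal>, makes the cleanup unconditional.
</para>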
</section>
<section id="sect-Defensive_Coding-C-Allocators-Arrays">
<title>Array allocation</title>
<para>
When allocating arrays, it is important to check for overflows.
The <function>calloc</function> function performs such checks.
</para>
<para>
If <function>malloc</function> or <function>realloc</function>
is used, the size check must be written manually. For instance,
to allocate an array of <literal>n</literal> elements of type
<literal>T</literal>, check that <literal>n</literal> is not
greater than <literal>((size_t) -1) / sizeof(T)</literal>.
</para>
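<para>
A minimal sketch of such a manual check, assuming a hypothetical
helper named <function>alloc_array</function>. It uses
<literal>SIZE_MAX</literal> from <literal>&lt;stdint.h&gt;</literal>,
which has the same value as <literal>((size_t) -1)</literal>.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an uninitialized array of n
   elements of elem_size bytes each.  Returns NULL if the product
   n * elem_size does not fit into size_t, or if malloc fails. */
void *
alloc_array(size_t n, size_t elem_size)
{
  /* Division cannot overflow, so the check itself is safe. */
  if (elem_size != 0 && n > SIZE_MAX / elem_size)
    return NULL;

  size_t total = n * elem_size;
  /* Request at least one byte so that NULL always means failure. */
  return malloc(total != 0 ? total : 1);
}
```
]]></programlisting>
</informalexample>
<para>
The check is performed before the multiplication, so the product is
computed only when it is known to be representable.
</para>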
</section>
<section>
<title>Custom memory allocators</title>
<para>
Custom memory allocators come in two forms: replacements for
<function>malloc</function>, and completely different interfaces
for memory management. Both approaches can reduce the
effectiveness of <application>valgrind</application> and similar
tools, and the heap corruption detection provided by GNU libc, so
they should be avoided.
</para>
<para>
Memory allocators are difficult to write and contain many
performance and security pitfalls.
</para>
<itemizedlist>
<listitem>
<para>
When computing array sizes or rounding up allocation
requests (to the next allocation granularity, or for
alignment purposes), checks for arithmetic overflow are
required.
</para>
</listitem>
<listitem>
<para>
Size computations for array allocations need overflow
checking. See <xref
linkend="sect-Defensive_Coding-C-Allocators-Arrays"/>.
</para>
</listitem>
<listitem>
<para>
It can be difficult to beat well-tuned general-purpose
allocators. In micro-benchmarks, pool allocators can show
huge wins, and size-specific pools can reduce internal
fragmentation. But often, utilization of individual pools
is poor, and
</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Conservative garbage collection</title>
<para>
Garbage collection can be an alternative to explicit memory
management using <function>malloc</function> and
<function>free</function>. The Boehm-Demers-Weiser allocator
can be used from C programs, with minimal type annotations.
Performance is competitive with <function>malloc</function> on
64-bit architectures, especially for multi-threaded programs.
The stop-the-world pauses may be problematic for some real-time
applications, though.
</para>
<para>
However, using a conservative garbage collector may reduce
opportunities for code reuse because once one library in a
program uses garbage collection, the whole process memory needs
to be subject to it, so that no pointers are missed. The
Boehm-Demers-Weiser collector also reserves certain signals for
internal use, so it is not fully transparent to the rest of the
program.
</para>
</section>
</section>

View file

@ -1,11 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-C">
<title>The C Programming Language</title>
<xi:include href="Language.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Libc.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Allocators.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</chapter>

View file

@ -1,150 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-C-Language">
<title>The core language</title>
<para>
C provides no memory safety. Most recommendations in this section
deal with this aspect of the language.
</para>
<section id="sect-Defensive_Coding-C-Undefined">
<title>Undefined behavior</title>
<para>
Some C constructs are defined to be undefined by the C standard.
This does not only mean that the standard does not describe
what happens when the construct is executed. It also allows
optimizing compilers such as GCC to assume that this particular
construct is never reached. In some cases, this has caused
GCC to optimize security checks away. (This is not a flaw in GCC
or the C language. But C certainly has some areas which are more
difficult to use than others.)
</para>
<para>
Common sources of undefined behavior are:
</para>
<itemizedlist>
<listitem><para>out-of-bounds array accesses</para></listitem>
<listitem><para>null pointer dereferences</para></listitem>
<listitem><para>overflow in signed integer arithmetic</para></listitem>
</itemizedlist>
</section>
<section id="sect-Defensive_Coding-C-Pointers">
<title>Recommendations for pointers and array handling</title>
<para>
Always keep track of the size of the array you are working with.
Often, code is more obviously correct when you keep a pointer
past the last element of the array, and calculate the number of
remaining elements by subtracting the current position from
that pointer. The alternative, updating a separate variable
every time the position is advanced, is usually less
obviously correct.
</para>
<para>
<xref linkend="ex-Defensive_Coding-C-Pointers-remaining"/>
shows how to extract Pascal-style strings from a character
buffer. The two pointers kept for length checks are
<varname>inend</varname> and <varname>outend</varname>.
<varname>inp</varname> and <varname>outp</varname> are the
respective positions.
The number of input bytes is checked using the expression
<literal>len > (size_t)(inend - inp)</literal>.
The cast silences a compiler warning;
<varname>inend</varname> is never smaller than
<varname>inp</varname>.
</para>
<example id="ex-Defensive_Coding-C-Pointers-remaining">
<title>Array processing in C</title>
<xi:include href="snippets/Pointers-remaining.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
It is important that the length checks always have the form
<literal>len > (size_t)(inend - inp)</literal>, where
<varname>len</varname> is a variable of type
<type>size_t</type> which denotes the <emphasis>total</emphasis>
number of bytes which are about to be read or written next. In
general, it is not safe to fold multiple such checks into one,
as in <literal>len1 + len2 > (size_t)(inend - inp)</literal>,
because the expression on the left can overflow or wrap around
(see <xref linkend="sect-Defensive_Coding-C-Arithmetic"/>), and it
no longer reflects the number of bytes to be processed.
</para>
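<para>
The required form of the length check can be sketched with a small
helper. The name <function>read_bytes</function> and its signature
are hypothetical; the variables follow the naming used above.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies len bytes from *inp to out and
   advances *inp.  Returns 0 on success, or -1 if fewer than len
   bytes remain before inend (in which case nothing is changed).
   The caller must provide room for len bytes at out. */
int
read_bytes(const unsigned char **inp, const unsigned char *inend,
           void *out, size_t len)
{
  /* inend is never smaller than *inp, so the difference is
     non-negative and the cast does not change its value. */
  if (len > (size_t)(inend - *inp))
    return -1;
  memcpy(out, *inp, len);
  *inp += len;
  return 0;
}
```
]]></programlisting>
</informalexample>
<para>
To read two fields of lengths <literal>len1</literal> and
<literal>len2</literal>, call the helper twice so that each length
is compared separately against the remaining input, instead of
adding the lengths first.
</para>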
</section>
<section id="sect-Defensive_Coding-C-Arithmetic">
<title>Recommendations for integer arithmetic</title>
<para>
Overflow in signed integer arithmetic is undefined. This means
that it is not possible to check for overflow after it has happened
(see <xref linkend="ex-Defensive_Coding-C-Arithmetic-bad"/>).
</para>
<example id="ex-Defensive_Coding-C-Arithmetic-bad">
<title>Incorrect overflow detection in C</title>
<xi:include href="snippets/Arithmetic-add.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
The following approaches can be used to check for overflow,
without actually causing it.
</para>
<itemizedlist>
<listitem>
<para>
Use a wider type to perform the calculation, check that the
result is within bounds, and convert the result to the
original type. All intermediate results must be checked in
this way.
</para>
</listitem>
<listitem>
<para>
Perform the calculation in the corresponding unsigned type
and use bit fiddling to detect the overflow.
</para>
</listitem>
<listitem>
<para>
Compute bounds for acceptable input values which are known
to avoid overflow, and reject other values. This is the
preferred way for overflow checking on multiplications,
see <xref linkend="ex-Defensive_Coding-C-Arithmetic-mult"/>.
<!-- This approach can result in bogus compiler warnings
with signed types:
http://gcc.gnu.org/bugzilla/post_bug.cgi -->
</para>
</listitem>
</itemizedlist>
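<para>
As an illustration of the third approach applied to signed
addition, the following sketch rejects operands whose sum would not
fit into an <type>int</type>. The helper name
<function>checked_add</function> is hypothetical.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *result and returns 0, or
   returns -1 without touching *result if the sum cannot be
   represented as an int. */
int
checked_add(int a, int b, int *result)
{
  /* Both bounds are computed without overflow: INT_MAX - b is
     evaluated only for positive b, INT_MIN - b only for negative b. */
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return -1;
  *result = a + b;
  return 0;
}
```
]]></programlisting>
</informalexample>
<para>
The addition itself is performed only after the operands are known
to be in range, so the undefined case is never executed.
</para>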
<example id="ex-Defensive_Coding-C-Arithmetic-mult">
<title>Overflow checking for unsigned multiplication</title>
<xi:include href="snippets/Arithmetic-mult.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
Basic arithmetic operations are commutative, so for bounds checks,
there are two different but mathematically equivalent
expressions. Sometimes, one of the expressions results in
better code because parts of it can be reduced to a constant.
This applies to overflow checks for multiplication <literal>a *
b</literal> involving a constant <literal>a</literal>, where the
expression is reduced to <literal>b &gt; C</literal> for some
constant <literal>C</literal> determined at compile time. The
other expression, <literal>b &amp;&amp; a > ((unsigned)-1) /
b</literal>, is more difficult to optimize at compile time.
</para>
<para>
When a value is converted to a signed integer, GCC always
chooses the result based on 2's complement arithmetic. This GCC
extension (which is also implemented by other compilers) helps a
lot when implementing overflow checks.
</para>
<para>
Legacy code should be compiled with the <option>-fwrapv</option>
GCC option. As a result, GCC will provide 2's complement
semantics for integer arithmetic, including defined behavior on
integer overflow.
</para>
</section>
</section>

View file

@ -1,227 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-C-Libc">
<title>The C standard library</title>
<para>
Parts of the C standard library (and the UNIX and GNU extensions)
are difficult to use, so you should avoid them.
</para>
<para>
Please check the applicable documentation before using the
recommended replacements. Many of these functions allocate
buffers using <function>malloc</function> which your code must
deallocate explicitly using <function>free</function>.
</para>
<section id="sect-Defensive_Coding-C-Absolutely-Banned">
<title>Absolutely banned interfaces</title>
<para>
The functions listed below must not be used because they are
almost always unsafe. Use the indicated replacements instead.
</para>
<itemizedlist>
<listitem><para><function>gets</function>
⟶ <function>fgets</function></para></listitem>
<listitem><para><function>getwd</function>
⟶ <function>getcwd</function>
or <function>get_current_dir_name</function></para></listitem>
<listitem>
<para>
<function>readdir_r</function> ⟶ <function>readdir</function>
<!-- It is quite complicated to allocate a properly-sized
buffer for use with readdir_r, and readdir provides
sufficient thread safety guarantees. -->
<!-- ??? Add File_System cross-reference -->
</para>
</listitem>
<listitem>
<para>
<function>realpath</function> (with a non-NULL second parameter)
⟶ <function>realpath</function> with NULL as the second parameter,
or <function>canonicalize_file_name</function>
<!-- It is complicated to allocate a properly-sized buffer
for use with realpath. -->
<!-- ??? Add File_System cross-reference -->
</para>
</listitem>
</itemizedlist>
<para>
The constants listed below must not be used, either. Instead,
code must allocate memory dynamically and use interfaces with
length checking.
</para>
<itemizedlist>
<listitem>
<para>
<literal>NAME_MAX</literal> (limit not actually enforced by
the kernel)
</para>
</listitem>
<listitem>
<para>
<literal>PATH_MAX</literal> (limit not actually enforced by
the kernel)
</para>
</listitem>
<listitem>
<para>
<literal>_PC_NAME_MAX</literal> (This limit, returned by the
<function>pathconf</function> function, is not enforced by
the kernel.)
</para>
</listitem>
<listitem>
<para>
<literal>_PC_PATH_MAX</literal> (This limit, returned by the
<function>pathconf</function> function, is not enforced by
the kernel.)
</para>
</listitem>
</itemizedlist>
<para>
The following structure members must not be used.
</para>
<itemizedlist>
<listitem>
<para>
<literal>f_namemax</literal> in <literal>struct
statvfs</literal> (limit not actually enforced by the kernel,
see <literal>_PC_NAME_MAX</literal> above)
</para>
</listitem>
</itemizedlist>
</section>
<section id="sect-Defensive_Coding-C-Avoid">
<title>Functions to avoid</title>
<para>
The following string manipulation functions can be used securely
in principle, but their use should be avoided because they are
difficult to use correctly. Calls to these functions can be
replaced with <function>asprintf</function> or
<function>vasprintf</function>. (For non-GNU targets, these
functions are available from Gnulib.) In some cases, the
<function>snprintf</function> function might be a suitable
replacement, see <xref
linkend="sect-Defensive_Coding-C-String-Functions-Length"/>.
</para>
<itemizedlist>
<listitem><para><function>sprintf</function></para></listitem>
<listitem><para><function>strcat</function></para></listitem>
<listitem><para><function>strcpy</function></para></listitem>
<listitem><para><function>vsprintf</function></para></listitem>
</itemizedlist>
<para>
Use the indicated replacements for the functions below.
</para>
<itemizedlist>
<listitem>
<para>
<function>alloca</function> ⟶
<function>malloc</function> and <function>free</function>
(see <xref linkend="sect-Defensive_Coding-C-Allocators-alloca"/>)
</para>
</listitem>
<listitem>
<para>
<function>putenv</function> ⟶
explicit <varname>envp</varname> argument in process creation
(see <xref linkend="sect-Defensive_Coding-Tasks-Processes-environ"/>)
</para>
</listitem>
<listitem>
<para>
<function>setenv</function> ⟶
explicit <varname>envp</varname> argument in process creation
(see <xref linkend="sect-Defensive_Coding-Tasks-Processes-environ"/>)
</para>
</listitem>
<listitem>
<para>
<function>strdupa</function> ⟶
<function>strdup</function> and <function>free</function>
(see <xref linkend="sect-Defensive_Coding-C-Allocators-alloca"/>)
</para>
</listitem>
<listitem>
<para>
<function>strndupa</function> ⟶
<function>strndup</function> and <function>free</function>
(see <xref linkend="sect-Defensive_Coding-C-Allocators-alloca"/>)
</para>
</listitem>
<listitem>
<para>
<function>system</function> ⟶
<function>posix_spawn</function>
or <function>fork</function>/<function>execve</function>
(see <xref linkend="sect-Defensive_Coding-Tasks-Processes-execve"/>)
</para>
</listitem>
<listitem>
<para>
<function>unsetenv</function> ⟶
explicit <varname>envp</varname> argument in process creation
(see <xref linkend="sect-Defensive_Coding-Tasks-Processes-environ"/>)
</para>
</listitem>
</itemizedlist>
</section>
<section id="sect-Defensive_Coding-C-String-Functions-Length">
<title>String Functions With Explicit Length Arguments</title>
<para>
The <function>snprintf</function> function provides a way to
construct a string in a statically-sized buffer. (If the buffer
size is dynamic, use <function>asprintf</function> instead.)
</para>
<informalexample>
<xi:include href="snippets/String-Functions-snprintf.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
The second argument to <function>snprintf</function> should
always be the size of the buffer in the first argument (which
should be a character array). Complex pointer and length
arithmetic can introduce errors and nullify the security
benefits of <function>snprintf</function>. If you need to
construct a string iteratively, by repeatedly appending
fragments, consider constructing the string on the heap,
increasing the buffer with <function>realloc</function> as
needed. (<function>snprintf</function> does not support
overlapping the result buffer with argument strings.)
</para>
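<para>
A short sketch of this rule, using a hypothetical
<type>struct label</type> that contains the character array. The
size argument is written as <literal>sizeof</literal> of the array
itself, and the return value of <function>snprintf</function> (the
length the complete output would have had) is used to detect
truncation.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type with a statically-sized text buffer. */
struct label {
  char text[16];
};

/* Hypothetical helper: formats "name=value" into l->text.
   Returns 0 on success and -1 if an output error occurred or the
   result did not fit.  After truncation, l->text still holds a
   NUL-terminated prefix of the full output. */
int
label_set(struct label *l, const char *name, int value)
{
  int ret = snprintf(l->text, sizeof(l->text), "%s=%d", name, value);
  if (ret < 0 || (size_t)ret >= sizeof(l->text))
    return -1;
  return 0;
}
```
]]></programlisting>
</informalexample>
<para>
Because the size is derived from the array with
<literal>sizeof</literal>, it stays correct if the array is resized
later.
</para>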
<para>
If you use <function>vsnprintf</function> (or
<function>snprintf</function>) with a format string which is not
a constant, but a function argument, it is important to annotate
the function with a <literal>format</literal> function
attribute, so that GCC can warn about misuse of your function
(see <xref
linkend="ex-Defensive_Coding-C-String-Functions-format-Attribute"/>).
</para>
<example id="ex-Defensive_Coding-C-String-Functions-format-Attribute">
<title>The <literal>format</literal> function attribute</title>
<xi:include href="snippets/String-Functions-format.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
There are other functions which operate on NUL-terminated
strings and take a length argument which affects the number of
bytes written to the destination: <function>strncpy</function>,
<function>strncat</function>, and <function>stpncpy</function>.
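<function>strncpy</function> and <function>stpncpy</function> do
not ensure that the result string is NUL-terminated, and the
length argument of <function>strncat</function> limits the number
of bytes appended, not the size of the destination buffer.
For <function>strncpy</function>,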
NUL termination can be added this way:
</para>
<informalexample>
<xi:include href="snippets/String-Functions-strncpy.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
Some systems support <function>strlcpy</function> and
<function>strlcat</function> functions which behave this way,
but these functions are not part of GNU libc. Using
<function>snprintf</function> with a suitable format string is a
simple (albeit slightly slower) replacement.
</para>
</section>
</section>

View file

@ -1,4 +0,0 @@
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
<include rules="../../schemas.xml"/>
</locatingRules>

View file

@ -1,29 +0,0 @@
K 25
svn:wc:ra_dav:version-url
V 69
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/CXX
END
CXX.xml
K 25
svn:wc:ra_dav:version-url
V 77
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/CXX/CXX.xml
END
Language.xml
K 25
svn:wc:ra_dav:version-url
V 82
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/CXX/Language.xml
END
schemas.xml
K 25
svn:wc:ra_dav:version-url
V 81
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/CXX/schemas.xml
END
Std.xml
K 25
svn:wc:ra_dav:version-url
V 77
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/CXX/Std.xml
END

View file

@ -1,6 +0,0 @@
K 10
svn:ignore
V 9
snippets
END

View file

@ -1,164 +0,0 @@
10
dir
305
https://svn.devel.redhat.com/repos/product-security/defensive-coding/trunk/en-US/CXX
https://svn.devel.redhat.com/repos/product-security
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
9bd5cf0f-f2b3-0410-b1a9-d5c590f50bf1
CXX.xml
file
2013-01-10T17:17:40.360763Z
b0f0bf8b20378408157b933ace95025b
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
419
Language.xml
file
2013-01-10T17:17:40.361763Z
0c223f5c8e653b24ad9ee512a9347ff6
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
7232
schemas.xml
file
2013-01-10T17:17:40.361763Z
769bc2635d36b318161574a1adf2f6e7
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
150
Std.xml
file
2013-01-10T17:17:40.362763Z
43d4998b7a340602a1cfb058cac483c9
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
1392

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,10 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-CXX">
<title>The C++ Programming Language</title>
<xi:include href="Language.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Std.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</chapter>

View file

@ -1,186 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-CXX-Language">
<title>The core language</title>
<para>
C++ includes a large subset of the C language. As far as the C
subset is used, the recommendations in <xref
linkend="chap-Defensive_Coding-C"/> apply.
</para>
<section>
<title>Array allocation with <literal>operator new[]</literal></title>
<para>
For very large values of <literal>n</literal>, an expression
like <literal>new T[n]</literal> can return a pointer to a heap
region which is too small. In other words, not all array
elements are actually backed with heap memory reserved to the
array. Current GCC versions generate code that performs a
computation of the form <literal>sizeof(T) * size_t(n) +
cookie_size</literal>, where <literal>cookie_size</literal> is
currently at most 8. This computation can overflow, and
GCC-generated code does not detect this.
</para>
<para>
The <literal>std::vector</literal> template can be used instead
of an explicit array allocation. (The GCC implementation detects
overflow internally.)
</para>
<para>
If there is no alternative to <literal>operator new[]</literal>,
code which allocates arrays with a variable length must check
for overflow manually. For the <literal>new T[n]</literal>
example, the allocation must be rejected if <literal>n &lt; 0 || (n &gt; 0 &amp;&amp;
n &gt; (size_t(-1) - 8) / sizeof(T))</literal> holds. (See <xref
linkend="sect-Defensive_Coding-C-Arithmetic"/>.) If there are
additional dimensions (which must be constants according to the
C++ standard), these should be included as factors in the
divisor.
</para>
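<para>
The manual check can be wrapped in a small helper. The following is
a minimal sketch, not part of any library: the name
<function>checked_new_array</function> is hypothetical, the element
count is assumed to arrive as a signed <type>long</type>, and the
cookie bound of 8 bytes is the value quoted above for GCC.
</para>

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: allocate n objects of type T with operator new[],
// refusing counts for which sizeof(T) * n + cookie would overflow size_t.
// The cookie bound of 8 bytes is an assumption taken from the text above.
template <typename T>
T *checked_new_array(long n)
{
  const std::size_t max_cookie = 8;
  if (n < 0
      || (n > 0
          && static_cast<std::size_t>(n)
               > (static_cast<std::size_t>(-1) - max_cookie) / sizeof(T))) {
    // Report the problem in the same way as a failed allocation.
    throw std::bad_alloc();
  }
  return new T[static_cast<std::size_t>(n)];
}
```

<para>
The caller still releases the array with <literal>delete[]</literal>.
Throwing <literal>std::bad_alloc</literal> is one possible design
choice; it lets existing out-of-memory handling cover the overflow
case as well.
</para>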
<para>
These countermeasures prevent out-of-bounds writes and potential
code execution. Very large memory allocations can still lead to
a denial of service. <xref
linkend="sect-Defensive_Coding-Tasks-Serialization-Decoders"/>
contains suggestions for mitigating this problem when processing
untrusted data.
</para>
<para>
See <xref linkend="sect-Defensive_Coding-C-Allocators-Arrays"/>
for array allocation advice for C-style memory allocation.
</para>
</section>
<section>
<title>Overloading</title>
<para>
Do not overload functions with versions that have different
security characteristics. For instance, do not implement a
function <function>strcat</function> which works on
<type>std::string</type> arguments. Similarly, do not name
methods after such functions.
</para>
</section>
<section>
<title>ABI compatibility and preparing for security updates</title>
<para>
A stable binary interface (ABI) is vastly preferred for security
updates. Without a stable ABI, all reverse dependencies need
recompiling, which can be a lot of work and could even be
impossible in some cases. Ideally, a security update only
updates a single dynamic shared object, and is picked up
automatically after restarting affected processes.
</para>
<para>
Outside of extremely performance-critical code, you should
ensure that a wide range of changes is possible without breaking
ABI. Some very basic guidelines are:
</para>
<itemizedlist>
<listitem>
<para>
Avoid inline functions.
</para>
</listitem>
<listitem>
<para>
Use the pointer-to-implementation idiom.
</para>
</listitem>
<listitem>
<para>
Try to avoid templates. Use them if the increased type
safety provides a benefit to the programmer.
</para>
</listitem>
<listitem>
<para>
Move security-critical code out of templated code, so that
it can be patched in a central place if necessary.
</para>
</listitem>
</itemizedlist>
<para>
The KDE project publishes a document with more extensive
guidelines on ABI-preserving changes to C++ code, <ulink
url="http://techbase.kde.org/Policies/Binary_Compatibility_Issues_With_C++">Policies/Binary
Compatibility Issues With C++</ulink>
(<emphasis>d-pointer</emphasis> refers to the
pointer-to-implementation idiom).
</para>
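<para>
As an illustration of the pointer-to-implementation idiom, the
following sketch keeps all data members in a structure that is only
defined in the library source file. The class name
<literal>Session</literal> and its members are hypothetical. The code
deliberately avoids C++11 features.
</para>

```cpp
#include <string>

// --- This part would live in the installed header file. ---
class Session {
public:
  explicit Session(const std::string &peer);
  ~Session();
  const std::string &peer() const;

private:
  Session(const Session &);             // copying disabled (not defined)
  Session &operator=(const Session &);  // assignment disabled (not defined)
  struct Impl;                          // defined only in the library source
  Impl *impl_;
};

// --- This part would live in the library's source file. ---
struct Session::Impl {
  std::string peer;
  // Further members can be added here in an update without changing
  // sizeof(Session) or the layout seen by code compiled against the header.
  explicit Impl(const std::string &p) : peer(p) {}
};

Session::Session(const std::string &peer) : impl_(new Impl(peer)) {}
Session::~Session() { delete impl_; }
const std::string &Session::peer() const { return impl_->peer; }
```

<para>
Because no member function is inline, a fix to any of them only
requires replacing the shared object that contains the definitions.
</para>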
</section>
<section id="sect-Defensive_Coding-CXX-Language-CXX11">
<title>C++0X and C++11 support</title>
<para>
GCC offers different language compatibility modes:
</para>
<itemizedlist>
<listitem>
<para>
<option>-std=c++98</option> for the original 1998 C++
standard
</para>
</listitem>
<listitem>
<para>
<option>-std=c++03</option> for the 1998 standard with the
changes from the 2003 technical corrigendum (TC1)
</para>
</listitem>
<listitem>
<para>
<option>-std=c++11</option> for the 2011 C++ standard. This
option should not be used.
</para>
</listitem>
<listitem>
<para>
<option>-std=c++0x</option> for several different versions
of C++11 support in development, depending on the GCC
version. This option should not be used.
<!-- There were two incompatibilities before GCC 4.7.2
(std::list and std::pair), but linking C++98 and C++11
code is still unsupported, although it currently has
some chance of working by accident. -->
</para>
</listitem>
</itemizedlist>
<para>
For each of these flags, there are variants which also enable
GNU extensions (mostly language features also found in C99 or
C11): <option>-std=gnu++98</option>,
<option>-std=gnu++03</option>, <option>-std=gnu++11</option>.
Again, <option>-std=gnu++11</option> should not be used.
</para>
<para>
If you enable C++11 support, the ABI of the standard C++ library
<literal>libstdc++</literal> will change in subtle ways.
Currently, no C++ libraries are compiled in C++11 mode, so if
you compile your code in C++11 mode, it will be incompatible
with the rest of the system. Unfortunately, this is also the
case if you do not use any C++11 features. Currently, there is
no safe way to enable C++11 mode (except for freestanding
applications).
</para>
<para>
The meaning of C++0X mode changed from GCC release to GCC
release. Earlier versions were still ABI-compatible with C++98
mode, but in the most recent versions, switching to C++0X mode
activates C++11 support, with its compatibility problems.
</para>
<para>
Some C++11 features (or approximations thereof) are available
through the TR1 library extensions in the
<literal>&lt;tr1/*&gt;</literal> header files, which can be used with
<option>-std=c++03</option> or <option>-std=gnu++03</option>. This includes
<literal>std::tr1::shared_ptr</literal> (from
<literal>&lt;tr1/memory&gt;</literal>) and
<literal>std::tr1::function</literal> (from
<literal>&lt;tr1/functional&gt;</literal>). For other C++11
features, the Boost C++ library contains replacements.
</para>
</section>
</section>

View file

@ -1,32 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-CXX-Std">
<title>The C++ standard library</title>
<para>
The C++ standard library includes most of its C counterpart
by reference (see <xref linkend="sect-Defensive_Coding-C-Libc"/>).
</para>
<section>
<title>Containers and <literal>operator[]</literal></title>
<para>
Many containers similar to <literal>std::vector</literal>
provide both <literal>operator[](size_type)</literal> and a
member function <literal>at(size_type)</literal>. This applies
to <literal>std::vector</literal> itself,
<literal>std::array</literal>, <literal>std::string</literal>
and other instances of <literal>std::basic_string</literal>.
</para>
<para>
<literal>operator[](size_type)</literal> is not required by the
standard to perform bounds checking (and the implementation in
GCC does not). In contrast, <literal>at(size_type)</literal>
must perform such a check. Therefore, in code which is not
performance-critical, you should prefer
<literal>at(size_type)</literal> over
<literal>operator[](size_type)</literal>, even though it is
slightly more verbose.
</para>
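<para>
The difference can be seen in a short sketch. The helper name
<function>element_or</function> is hypothetical; it relies only on
the standard guarantee that <literal>at</literal> throws
<literal>std::out_of_range</literal> for an invalid index.
</para>

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Return the element at position index, or fallback if index is out of
// range. at() reports an invalid index with std::out_of_range, whereas
// operator[] would have undefined behavior in that case.
int element_or(const std::vector<int> &values, std::size_t index, int fallback)
{
  try {
    return values.at(index);
  } catch (const std::out_of_range &) {
    return fallback;
  }
}
```

<para>
In many programs it is preferable to let the exception propagate
instead of substituting a default value; the fallback is used here
only to keep the example short.
</para>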
</section>
</section>

View file

@ -1,4 +0,0 @@
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
<include rules="../../schemas.xml"/>
</locatingRules>

View file

@ -1,23 +0,0 @@
K 25
svn:wc:ra_dav:version-url
V 74
/repos/product-security/!svn/ver/302/defensive-coding/trunk/en-US/Features
END
TLS.xml
K 25
svn:wc:ra_dav:version-url
V 82
/repos/product-security/!svn/ver/302/defensive-coding/trunk/en-US/Features/TLS.xml
END
schemas.xml
K 25
svn:wc:ra_dav:version-url
V 86
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/Features/schemas.xml
END
Authentication.xml
K 25
svn:wc:ra_dav:version-url
V 93
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Features/Authentication.xml
END

View file

@ -1,6 +0,0 @@
K 10
svn:ignore
V 9
snippets
END

View file

@ -1,130 +0,0 @@
10
dir
305
https://svn.devel.redhat.com/repos/product-security/defensive-coding/trunk/en-US/Features
https://svn.devel.redhat.com/repos/product-security
2013-01-16T14:32:22.318444Z
302
fweimer@REDHAT.COM
has-props
9bd5cf0f-f2b3-0410-b1a9-d5c590f50bf1
TLS.xml
file
2013-01-16T22:05:55.369436Z
d466f82b291f65cf802244af678d52dd
2013-01-16T14:32:22.318444Z
302
fweimer@REDHAT.COM
has-props
41635
schemas.xml
file
2013-01-10T17:17:49.036814Z
769bc2635d36b318161574a1adf2f6e7
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
150
Authentication.xml
file
2013-01-10T17:17:49.036814Z
6430a1389eb187d0fbcc79bea6c1a21e
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
8257

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,189 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Authentication">
<title>Authentication and Authorization</title>
<section id="sect-Defensive_Coding-Authentication-Server">
<title>Authenticating servers</title>
<para>
When connecting to a server, a client has to make sure that it
is actually talking to the server it expects. There are two
different aspects: securing the network path, and making sure
that the expected user runs the process on the target host.
There are several ways to ensure that:
</para>
<itemizedlist>
<listitem>
<para>
The server uses a TLS certificate which is valid according
to the web browser public key infrastructure, and the client
verifies the certificate and the host name.
</para>
</listitem>
<listitem>
<para>
The server uses a TLS certificate which is expected by the
client (perhaps it is stored in a configuration file read by
the client). In this case, no host name checking is
required.
</para>
</listitem>
<listitem>
<para>
On Linux, UNIX domain sockets (of the
<literal>PF_UNIX</literal> protocol family, sometimes called
<literal>PF_LOCAL</literal>) are restricted by file system
permissions. If the server socket path is not
world-writable, the server identity cannot be spoofed by
local users.
</para>
</listitem>
<listitem>
<para>
Port numbers less than 1024 (<emphasis>trusted
ports</emphasis>) can only be used by
<literal>root</literal>, so if a UDP or TCP server is
running on the local host and it uses a trusted port, its
identity is assured. (Not all operating systems enforce the
trusted ports concept, and the network might not be trusted,
so it is only useful on the local system.)
</para>
</listitem>
</itemizedlist>
<para>
TLS (<xref linkend="chap-Defensive_Coding-TLS"/>) is the
recommended way for securing connections over untrusted
networks.
</para>
<para>
If the server port number is 1024 or higher, a local user can
impersonate the process by binding to this socket, perhaps after
crashing the real server by exploiting a denial-of-service
vulnerability.
</para>
</section>
<section id="sect-Defensive_Coding-Authentication-Host_based">
<title>Host-based authentication</title>
<para>
Host-based authentication uses access control lists (ACLs) to
accept or deny requests from clients. This authentication
method comes in two flavors: IP-based (or, more generally,
address-based) and name-based (with the name coming from DNS or
<filename>/etc/hosts</filename>). IP-based ACLs often use
prefix notation to extend access to entire subnets. Name-based
ACLs sometimes use wildcards for adding groups of hosts (from
entire DNS subtrees). (In the SSH context, host-based
authentication means something completely different and is not
covered in this section.)
</para>
<para>
Host-based authentication trusts the network and may not offer
sufficient granularity, so it has to be considered a weak form
of authentication. On the other hand, IP-based authentication
can be made extremely robust and can be applied very early in
input processing, so it offers an opportunity for significantly
reducing the number of potential attackers for many services.
</para>
<para>
The names returned by the <function>gethostbyaddr</function> and
<function>getnameinfo</function> functions cannot be trusted.
(DNS PTR records can be set to arbitrary values, not just names
belonging to the address owner.) If these names are used for ACL
matching, a forward lookup using
<function>gethostbyname</function> or
<function>getaddrinfo</function> has to be performed. The name
is only valid if the original address is found among the results
of the forward lookup (<emphasis>double-reverse
lookup</emphasis>).
</para>
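<para>
The double-reverse lookup described above could be sketched as
follows, using <function>getnameinfo</function> for the reverse
lookup and <function>getaddrinfo</function> for the forward lookup.
The helper names <function>same_address</function> and
<function>verified_peer_name</function> are hypothetical, and only
IPv4 and IPv6 addresses are handled.
</para>

```cpp
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

// Compare the address parts of two socket addresses (ports are ignored).
static bool same_address(const sockaddr *a, const sockaddr *b)
{
  if (a->sa_family != b->sa_family)
    return false;
  if (a->sa_family == AF_INET) {
    const sockaddr_in *a4 = reinterpret_cast<const sockaddr_in *>(a);
    const sockaddr_in *b4 = reinterpret_cast<const sockaddr_in *>(b);
    return std::memcmp(&a4->sin_addr, &b4->sin_addr,
                       sizeof(a4->sin_addr)) == 0;
  }
  if (a->sa_family == AF_INET6) {
    const sockaddr_in6 *a6 = reinterpret_cast<const sockaddr_in6 *>(a);
    const sockaddr_in6 *b6 = reinterpret_cast<const sockaddr_in6 *>(b);
    return std::memcmp(&a6->sin6_addr, &b6->sin6_addr,
                       sizeof(a6->sin6_addr)) == 0;
  }
  return false;
}

// Store the verified host name for addr in name and return true. Return
// false if there is no PTR name, or if the forward lookup of that name
// does not contain the original address.
bool verified_peer_name(const sockaddr *addr, socklen_t addrlen,
                        std::string &name)
{
  char host[NI_MAXHOST];
  // NI_NAMEREQD: fail instead of returning a numeric address string.
  if (getnameinfo(addr, addrlen, host, sizeof(host), 0, 0, NI_NAMEREQD) != 0)
    return false;

  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = addr->sa_family;
  hints.ai_socktype = SOCK_STREAM;
  addrinfo *result = 0;
  if (getaddrinfo(host, 0, &hints, &result) != 0)
    return false;

  bool found = false;
  for (const addrinfo *p = result; p != 0 && !found; p = p->ai_next)
    found = same_address(addr, p->ai_addr);
  freeaddrinfo(result);
  if (found)
    name = host;
  return found;
}
```

<para>
Only a name for which <function>verified_peer_name</function>
returns <literal>true</literal> should be matched against a
name-based ACL. Both lookups can block, so a server may need to
perform them outside its main event loop.
</para>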
<para>
An empty ACL should deny all access (deny-by-default). If an empty
ACL permits all access, configuring any access list must switch
to deny-by-default for all unconfigured protocols, in both
name-based and address-based variants.
</para>
<para>
Similarly, if an address or name is not matched by the list, it
should be denied. However, many implementations behave
differently, so the actual behavior must be documented properly.
</para>
<para>
IPv6 addresses can embed IPv4 addresses. There is no
universally correct way to deal with this ambiguity. The
behavior of the ACL implementation should be documented.
</para>
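<para>
One possible policy, shown here only as a sketch, is to detect
IPv4-mapped IPv6 addresses (<literal>::ffff:a.b.c.d</literal>) and
match the embedded IPv4 address against the IPv4 rules. The helper
name <function>extract_v4_mapped</function> is hypothetical, and
other embedding schemes are not covered.
</para>

```cpp
#include <cstring>
#include <netinet/in.h>

// If addr is an IPv4-mapped IPv6 address, store the embedded IPv4 address
// (in network byte order) in out and return true. Otherwise return false
// and leave out unchanged.
bool extract_v4_mapped(const in6_addr &addr, in_addr &out)
{
  if (!IN6_IS_ADDR_V4MAPPED(&addr))
    return false;
  // The IPv4 address occupies the last four bytes of the IPv6 address.
  std::memcpy(&out, &addr.s6_addr[12], sizeof(out));
  return true;
}
```

<para>
Whichever policy is chosen, it should be applied consistently before
ACL matching and described in the documentation.
</para>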
</section>
<section id="sect-Defensive_Coding-Authentication-UNIX_Domain">
<title>UNIX domain socket authentication</title>
<para>
UNIX domain sockets (with address family
<literal>AF_UNIX</literal> or <literal>AF_LOCAL</literal>) are
restricted to the local host and offer a special authentication
mechanism: credentials passing.
</para>
<para>
Nowadays, most systems support the
<literal>SO_PEERCRED</literal> (Linux) or
<literal>LOCAL_PEERCRED</literal> (FreeBSD) socket options, or
the <function>getpeereid</function> function (other BSDs, Mac OS X).
These interfaces provide direct access to the (effective) user
ID on the other end of a domain socket connection, without
cooperation from the other end.
</para>
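<para>
On Linux, the <literal>SO_PEERCRED</literal> query could look like
the following sketch. The helper name <function>peer_uid</function>
is hypothetical, and the code does not apply to the BSD interfaces
mentioned above.
</para>

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // struct ucred is a GNU/Linux extension
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

// Linux-only sketch: store the effective user ID of the peer of the
// connected UNIX domain socket fd in uid. Return false if the query fails.
bool peer_uid(int fd, uid_t &uid)
{
  ucred cred;
  socklen_t len = sizeof(cred);
  if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0
      || len != sizeof(cred))
    return false;
  uid = cred.uid;
  return true;
}
```

<para>
A server would typically call such a helper right after
<function>accept</function> and compare the result against the list
of permitted user IDs.
</para>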
<para>
Historically, credentials passing was implemented using
ancillary data in the <function>sendmsg</function> and
<function>recvmsg</function> functions. On some systems, only
credentials data that the peer has explicitly sent can be
received, and the kernel checks the data for correctness on the
sending side. This means that both peers need to deal with
ancillary data. Compared to that, the modern interfaces are
easier to use. Both sets of interfaces vary considerably among
UNIX-like systems, unfortunately.
</para>
<para>
If you want to authenticate based on supplementary groups, you
should obtain the user ID using one of these methods, and look
up the list of supplementary groups using
<function>getpwuid</function> (or
<function>getpwuid_r</function>) and
<function>getgrouplist</function>. Using the PID and
information from <filename>/proc/PID/status</filename> is prone
to race conditions and insecure.
</para>
</section>
<section id="sect-Defensive_Coding-Authentication-Netlink">
<title><literal>AF_NETLINK</literal> authentication of origin</title>
<!-- ??? kernel change may make this obsolete:
https://bugzilla.redhat.com/show_bug.cgi?id=851968 -->
<para>
Netlink messages are used as a high-performance data transfer
mechanism between the kernel and the userspace. Traditionally,
they are used to exchange information related to the network
stack, such as routing table entries.
</para>
<para>
When processing Netlink messages from the kernel, it is
important to check that these messages actually originate from
the kernel, by checking that the port ID (or PID) field
<literal>nl_pid</literal> in the <literal>sockaddr_nl</literal>
structure is <literal>0</literal>. (This structure can be
obtained using <function>recvfrom</function> or
<function>recvmsg</function>; it is different from the
<literal>nlmsghdr</literal> structure.) The kernel does not
prevent other processes from sending unicast Netlink messages,
but the <literal>nl_pid</literal> field in the sender's socket
address will be non-zero in such cases.
</para>
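<para>
The origin check could be sketched as follows, using
<function>recvfrom</function> to obtain the sender address. The
helper names <function>is_kernel_sender</function> and
<function>recv_from_kernel</function> are hypothetical.
</para>

```cpp
#include <cstring>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

// Return true if the sender address of a Netlink message identifies the
// kernel, that is, if its port ID nl_pid is zero.
bool is_kernel_sender(const sockaddr_nl &sender, socklen_t sender_len)
{
  return sender_len == sizeof(sender)
    && sender.nl_family == AF_NETLINK
    && sender.nl_pid == 0;
}

// Receive one datagram from the Netlink socket fd into buffer. Return the
// number of bytes received, or -1 on error or if the datagram was not
// sent by the kernel (in which case its contents must be ignored).
ssize_t recv_from_kernel(int fd, void *buffer, size_t length)
{
  sockaddr_nl sender;
  std::memset(&sender, 0, sizeof(sender));
  socklen_t sender_len = sizeof(sender);
  ssize_t ret = recvfrom(fd, buffer, length, 0,
                         reinterpret_cast<sockaddr *>(&sender), &sender_len);
  if (ret < 0)
    return -1;
  if (!is_kernel_sender(sender, sender_len))
    return -1;
  return ret;
}
```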
<para>
Applications should not use <literal>AF_NETLINK</literal>
sockets as an IPC mechanism among processes, but prefer UNIX
domain sockets for this task.
</para>
</section>
</chapter>

View file

@ -1,988 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-TLS">
<title>Transport Layer Security</title>
<para>
Transport Layer Security (TLS, formerly Secure Sockets
Layer/SSL) is the recommended way to protect integrity and
confidentiality while data is transferred over an untrusted
network connection, and to identify the endpoint.
</para>
<section id="sect-Defensive_Coding-TLS-Pitfalls">
<title>Common Pitfalls</title>
<para>
TLS implementations are difficult to use, and most of them lack
a clean API design. The following sections contain
implementation-specific advice, and some generic pitfalls are
mentioned below.
</para>
<itemizedlist>
<listitem>
<para>
Most TLS implementations have questionable default TLS
cipher suites. Most of them enable anonymous Diffie-Hellman
key exchange (but we generally want servers to authenticate
themselves). Many do not disable ciphers which are subject
to brute-force attacks because of restricted key lengths.
Some even disable all variants of AES in the default
configuration.
</para>
<para>
When overriding the cipher suite defaults, it is recommended
to disable all cipher suites which are not present on a
whitelist, instead of simply enabling a list of cipher
suites. This way, if an algorithm is disabled by default in
the TLS implementation in a future security update, the
application will not re-enable it.
</para>
</listitem>
<listitem>
<para>
The name which is used in certificate validation must match
the name provided by the user or configuration file. Host
name canonicalization and IP address lookups must not be performed.
</para>
</listitem>
<listitem>
<para>
The TLS handshake has very poor performance if the TCP Nagle
algorithm is active. You should switch on the
<literal>TCP_NODELAY</literal> socket option (at least for the
duration of the handshake), or use the Linux-specific
<literal>TCP_CORK</literal> option.
</para>
<example id="ex-Defensive_Coding-TLS-Nagle">
<title>Deactivating the TCP Nagle algorithm</title>
<xi:include href="snippets/TLS-Nagle.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
</listitem>
<listitem>
<para>
Implementing proper session resumption decreases handshake
overhead considerably. This is important if the upper-layer
protocol uses short-lived connections (like most applications
of HTTPS).
</para>
</listitem>
<listitem>
<para>
Both client and server should work towards an orderly
connection shutdown, that is, send
<literal>close_notify</literal> alerts and respond to them.
This is especially important if the upper-layer protocol
does not provide means to detect connection truncation (like
some uses of HTTP).
</para>
</listitem>
<listitem>
<para>
When implementing a server using event-driven programming,
it is important to handle the TLS handshake properly because
it includes multiple network round-trips which can block
when an ordinary TCP <function>accept</function> would not.
Otherwise, a client which fails to complete the TLS
handshake for some reason will prevent the server from
handling input from other clients.
</para>
</listitem>
<listitem>
<para>
Unlike regular file descriptors, TLS connections cannot be
passed between processes. Some TLS implementations add
additional restrictions, and TLS connections generally
cannot be used across <function>fork</function> function
calls (see <xref
linkend="sect-Defensive_Coding-Tasks-Processes-Fork-Parallel"/>).
</para>
</listitem>
</itemizedlist>
<section id="sect-Defensive_Coding-TLS-OpenSSL">
<title>OpenSSL Pitfalls</title>
<para>
Some OpenSSL functions use <emphasis>tri-state return
values</emphasis>. Correct error checking is extremely
important. Several functions return <literal>int</literal>
values with the following meaning:
</para>
<itemizedlist>
<listitem>
<para>
The value <literal>1</literal> indicates success (for
example, a successful signature verification).
</para>
</listitem>
<listitem>
<para>
The value <literal>0</literal> indicates semantic
failure (for example, a signature verification which was
unsuccessful because the signing certificate was
self-signed).
</para>
</listitem>
<listitem>
<para>
The value <literal>-1</literal> indicates a low-level
error in the system, such as failure to allocate memory
using <function>malloc</function>.
</para>
</listitem>
</itemizedlist>
<para>
Treating such tri-state return values as booleans can lead
to security vulnerabilities. Note that some OpenSSL
functions return boolean results or yet another set of
status indicators. Each function needs to be checked
individually.
</para>
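<para>
The following sketch illustrates the difference between a boolean
test and an explicit comparison. It does not call OpenSSL; the names
<literal>TriState</literal>, <function>classify_tristate</function>
and <function>tristate_succeeded</function> are hypothetical, and the
value <literal>rc</literal> stands for the result of a function that
follows the tri-state convention described above.
</para>

```cpp
// Possible outcomes of a call that follows the tri-state convention.
enum TriState { tri_success, tri_failure, tri_error };

// Map the raw return value onto the three cases. Only the value 1 counts
// as success; everything that is neither 1 nor 0 is treated as an error.
TriState classify_tristate(int rc)
{
  if (rc == 1)
    return tri_success;
  if (rc == 0)
    return tri_failure;
  return tri_error;
}

// Fail-closed predicate for security decisions: both 0 and -1 are rejected.
// By contrast, a boolean test such as "if (rc)" would accept -1 as success.
bool tristate_succeeded(int rc)
{
  return rc == 1;
}
```

<para>
Whether a particular OpenSSL function follows this convention has to
be looked up in its documentation before such a wrapper is applied.
</para>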
<para>
Recovering precise error information is difficult.
<xref linkend="ex-Defensive_Coding-TLS-OpenSSL-Errors"/>
shows how to obtain a more precise error code after a function
call on an <literal>SSL</literal> object has failed. However,
there are still cases where no detailed error information is
available (e.g., if <function>SSL_shutdown</function> fails
due to a connection teardown by the other end).
</para>
<example id="ex-Defensive_Coding-TLS-OpenSSL-Errors">
<title>Obtaining OpenSSL error codes</title>
<xi:include href="snippets/TLS-OpenSSL-Errors.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
The <function>OPENSSL_config</function> function is
documented to never fail. In reality, it can terminate the
entire process if there is a failure accessing the
configuration file. An error message is written to standard
error, but it might not be visible if the function is
called from a daemon process.
</para>
<para>
OpenSSL contains two separate ASN.1 DER decoders. One set
of decoders operates on BIO handles (the input/output stream
abstraction provided by OpenSSL); their decoder function
names start with <literal>d2i_</literal> and end in
<literal>_fp</literal> or <literal>_bio</literal> (e.g.,
<function>d2i_X509_fp</function> or
<function>d2i_X509_bio</function>). These decoders must not
be used for parsing data from untrusted sources; instead,
the variants without the <literal>_fp</literal> and
<literal>_bio</literal> suffixes (e.g.,
<function>d2i_X509</function>) shall be used. The BIO
variants have received considerably less testing and are not
very robust.
</para>
<para>
For the same reason, the OpenSSL command line tools (such as
<command>openssl x509</command>) are generally less
robust than the actual library code. They use the BIO
functions internally, and not the more robust variants.
</para>
<para>
The command line tools do not always indicate failure in the
exit status of the <application>openssl</application> process.
For instance, a verification failure in <command>openssl
verify</command> results in an exit status of zero.
</para>
<para>
The OpenSSL server and client applications (<command>openssl
s_client</command> and <command>openssl s_server</command>)
are debugging tools and should <emphasis>never</emphasis> be
used as generic clients. For instance, the
<application>s_client</application> tool reacts in a
surprising way to lines starting with <literal>R</literal> and
<literal>Q</literal>.
</para>
<para>
OpenSSL allows application code to access private key
material over documented interfaces. This can significantly
increase the part of the code base which has to undergo
security certification.
</para>
</section>
<section id="sect-Defensive_Coding-TLS-Pitfalls-GNUTLS">
<title>GNUTLS Pitfalls</title>
<para>
<filename>libgnutls.so.26</filename> links to
<filename>libpthread.so.0</filename>. Loading the threading
library too late causes problems, so the main program should
be linked with <literal>-lpthread</literal> as well. As a
result, it can be difficult to use GNUTLS in a plugin which is
loaded with the <function>dlopen</function> function. Another
side effect is that applications which merely link against
GNUTLS (even without actually using it) may incur a
substantial overhead because other libraries automatically
switch to thread-safe algorithms.
</para>
<para>
The <function>gnutls_global_init</function> function must be
called before using any functionality provided by the library.
This function is not thread-safe, so external locking is
required, but it is not clear which lock should be used.
Omitting the synchronization does not just lead to a memory
leak, as it is suggested in the GNUTLS documentation, but to
undefined behavior because there is no barrier that would
enforce memory ordering.
</para>
<para>
The <function>gnutls_global_deinit</function> function does
not actually deallocate all resources allocated by
<function>gnutls_global_init</function>. It is currently not
thread-safe. Therefore, it is best to avoid calling it
altogether.
</para>
<para>
The X.509 implementation in GNUTLS is rather lenient. For
example, it is possible to create and process X.509
version&nbsp;1 certificates which carry extensions. These
certificates are (correctly) rejected by other
implementations.
</para>
</section>
<section id="sect-Defensive_Coding-TLS-Pitfalls-OpenJDK">
<title>OpenJDK Pitfalls</title>
<para>
The Java cryptographic framework is highly modular. As a
result, when you request an object implementing some
cryptographic functionality, you cannot be completely sure
that you end up with the well-tested, reviewed implementation
in OpenJDK.
</para>
<para>
OpenJDK (in the source code as published by Oracle) and other
implementations of the Java platform require that the system
administrator has installed so-called <emphasis>unlimited
strength jurisdiction policy files</emphasis>. Without this
step, it is not possible to use the secure algorithms which
offer sufficient cryptographic strength. Most downstream
redistributors of OpenJDK remove this requirement.
</para>
<para>
Some versions of OpenJDK use <filename>/dev/random</filename>
as the randomness source for nonces and other random data
which is needed for TLS operation, but does not actually
require physical randomness. As a result, TLS applications
can block, waiting for more bits to become available in
<filename>/dev/random</filename>.
</para>
</section>
<section id="sect-Defensive_Coding-TLS-Pitfalls-NSS">
<title>NSS Pitfalls</title>
<para>
NSS was not designed to be used by other libraries which can
be linked into applications without modifying them. There is
a lot of global state. There does not seem to be a way to
perform required NSS initialization without race conditions.
</para>
<para>
If the NSPR descriptor is in an unexpected state, the
<function>SSL_ForceHandshake</function> function can succeed,
but no TLS handshake takes place, the peer is not
authenticated, and subsequent data is exchanged in the clear.
</para>
<para>
NSS disables itself if it detects that the process underwent a
<function>fork</function> after the library has been
initialized. This behavior is required by the PKCS#11 API
specification.
</para>
</section>
</section>
<section id="sect-Defensive_Coding-TLS-Client">
<title>TLS Clients</title>
<para>
Secure use of TLS in a client generally involves all of the
following steps. (Individual instructions for specific TLS
implementations follow in the next sections.)
</para>
<itemizedlist>
<listitem>
<para>
The client must configure the TLS library to use a set of
trusted root certificates. These certificates are provided
by the system in <filename
class="directory">/etc/ssl/certs</filename> or files derived
from it.
</para>
</listitem>
<listitem>
<para>
The client selects sufficiently strong cryptographic
primitives and disables insecure ones (such as no-op
encryption). Compression and SSL version 2 support must be
disabled (including the SSLv2-compatible handshake).
</para>
</listitem>
<listitem>
<para>
The client initiates the TLS connection. The Server Name
Indication extension should be used if supported by the
TLS implementation. Before switching to the encrypted
connection state, the contents of all input and output
buffers must be discarded.
</para>
</listitem>
<listitem>
<para>
The client needs to validate the peer certificate provided
by the server, that is, the client must check that there
is a cryptographically protected chain from a trusted root
certificate to the peer certificate. (Depending on the
TLS implementation, a TLS handshake can succeed even if
the certificate cannot be validated.)
</para>
</listitem>
<listitem>
<para>
The client must check that the configured or user-provided
server name matches the peer certificate provided by the
server.
</para>
</listitem>
</itemizedlist>
<para>
It is safe to provide users detailed diagnostics on
certificate validation failures. Other causes of handshake
failures and, generally speaking, any details on other errors
reported by the TLS implementation (particularly exception
tracebacks), must not be divulged in ways that make them
accessible to potential attackers. Otherwise, it is possible
to create decryption oracles.
</para>
<important>
<para>
Depending on the application, revocation checking (against
certificate revocation lists or via OCSP) and session
resumption are important aspects of a production-quality
client. These aspects are not yet covered.
</para>
</important>
<section>
<title>Implementing TLS Clients With OpenSSL</title>
<para>
In the following code, the error handling is only exploratory.
Proper error handling is required for production use,
especially in libraries.
<!-- FIXME: Cross-reference event-driven I/O section when it
exists and mention that this is really quite complex to
implement. -->
</para>
<para>
The OpenSSL library needs explicit initialization (see <xref
linkend="ex-Defensive_Coding-TLS-OpenSSL-Init"/>).
</para>
<example id="ex-Defensive_Coding-TLS-OpenSSL-Init">
<title>OpenSSL library initialization</title>
<xi:include href="snippets/TLS-Client-OpenSSL-Init.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
After that, a context object has to be created, which acts as
a factory for connection objects (<xref
linkend="ex-Defensive_Coding-TLS-Client-OpenSSL-CTX"/>). We
use an explicit cipher list so that we do not pick up any
strange ciphers when OpenSSL is upgraded. The actual version
requested in the client hello depends on additional
restrictions in the OpenSSL library. If possible, you should
follow the example code and use the default list of trusted
root certificate authorities provided by the system because
you would have to maintain your own set otherwise, which can
be cumbersome.
</para>
<example id="ex-Defensive_Coding-TLS-Client-OpenSSL-CTX">
<title>OpenSSL client context creation</title>
<xi:include href="snippets/TLS-Client-OpenSSL-CTX.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
A single context object can be used to create multiple
connection objects. It is safe to use the same
<literal>SSL_CTX</literal> object for creating connections
concurrently from multiple threads, provided that the
<literal>SSL_CTX</literal> object is not modified (e.g.,
callbacks must not be changed).
</para>
<para>
After creating the TCP socket and disabling the Nagle
algorithm (per <xref
linkend="ex-Defensive_Coding-TLS-Nagle"/>), the actual
connection object needs to be created, as shown in <xref
linkend="ex-Defensive_Coding-TLS-Client-OpenSSL-Connect"/>. If
the handshake started by <function>SSL_connect</function>
fails, the <function>ssl_print_error_and_exit</function>
function from <xref
linkend="ex-Defensive_Coding-TLS-OpenSSL-Errors"/> is called.
</para>
<para>
The <function>certificate_validity_override</function>
function provides an opportunity to override the validity of
the certificate in case the OpenSSL check fails. If such
functionality is not required, the call can be removed,
otherwise, the application developer has to implement it.
</para>
<para>
The host name passed to the functions
<function>SSL_set_tlsext_host_name</function> and
<function>X509_check_host</function> must be the name that was
passed to <function>getaddrinfo</function> or a similar name
resolution function. Host name canonicalization must not be
performed. The <function>X509_check_host</function> function
used in the final step for host name matching is currently
only implemented in OpenSSL 1.1, which is not released yet.
In case host name matching fails, the function
<function>certificate_host_name_override</function> is called.
This function should check a user-specific certificate store, to
allow a connection even if the host name does not match the
certificate. This function has to be provided by the
application developer. Note that the override must be keyed
by both the certificate <emphasis>and</emphasis> the host
name.
</para>
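<para>
The keying requirement can be sketched independently of the TLS
library. The following is a minimal illustration in Python, not
part of the OpenSSL example itself. It assumes that overrides are
kept in an in-memory set and that the certificate is available in
DER encoding; the names <literal>override_key</literal> and
<literal>OverrideStore</literal> are hypothetical.
</para>

```python
import hashlib


def override_key(host_name, cert_der):
    """Return a key that binds a host name to one exact certificate."""
    # Hypothetical helper: SHA-256 over the DER-encoded certificate.
    fingerprint = hashlib.sha256(cert_der).hexdigest()
    return (host_name, fingerprint)


class OverrideStore:
    """Hypothetical in-memory store of user-approved exceptions."""

    def __init__(self):
        self._keys = set()

    def add_override(self, host_name, cert_der):
        self._keys.add(override_key(host_name, cert_der))

    def is_overridden(self, host_name, cert_der):
        # Both the host name and the certificate have to match.
        return override_key(host_name, cert_der) in self._keys
```

<para>
Because the key contains both components, an exception granted
for one server does not apply to a different host name presenting
the same certificate, nor to a different certificate presented
under the same host name.
</para>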
<example id="ex-Defensive_Coding-TLS-Client-OpenSSL-Connect">
<title>Creating a client connection using OpenSSL</title>
<xi:include href="snippets/TLS-Client-OpenSSL-Connect.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
The connection object can be used for sending and receiving
data, as in <xref
linkend="ex-Defensive_Coding-TLS-OpenSSL-Connection-Use"/>.
It is also possible to create a <literal>BIO</literal> object
and use the <literal>SSL</literal> object as the underlying
transport, using <function>BIO_set_ssl</function>.
</para>
<example id="ex-Defensive_Coding-TLS-OpenSSL-Connection-Use">
<title>Using an OpenSSL connection to send and receive data</title>
<xi:include href="snippets/TLS-Client-OpenSSL-Connection-Use.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
When it is time to close the connection, the
<function>SSL_shutdown</function> function needs to be called
twice for an orderly, synchronous connection termination
(<xref
linkend="ex-Defensive_Coding-TLS-OpenSSL-Connection-Close"/>).
This exchanges <literal>close_notify</literal> alerts with the
server. The additional logic is required to deal with an
unexpected <literal>close_notify</literal> from the server.
Note that it is necessary to explicitly close the underlying
socket after the connection object has been freed.
</para>
<example id="ex-Defensive_Coding-TLS-OpenSSL-Connection-Close">
<title>Closing an OpenSSL connection in an orderly fashion</title>
<xi:include href="snippets/TLS-OpenSSL-Connection-Close.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
<xref linkend="ex-Defensive_Coding-TLS-OpenSSL-Context-Close"/> shows how
to deallocate the context object when it is no longer needed
because no further TLS connections will be established.
</para>
<example id="ex-Defensive_Coding-TLS-OpenSSL-Context-Close">
<title>Deallocating an OpenSSL context object</title>
<xi:include href="snippets/TLS-OpenSSL-Context-Close.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
</section>
<section id="sect-Defensive_Coding-TLS-Client-GNUTLS">
<title>Implementing TLS Clients With GNUTLS</title>
<para>
This section describes how to implement a TLS client with full
certificate validation (but without certificate revocation
checking). Note that the error handling is only
exploratory and needs to be replaced before production use.
</para>
<para>
The GNUTLS library needs explicit initialization:
</para>
<informalexample id="ex-Defensive_Coding-TLS-GNUTLS-Init">
<xi:include href="snippets/TLS-GNUTLS-Init.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
Failing to do so can result in obscure failures in Base64
decoding. See <xref
linkend="sect-Defensive_Coding-TLS-Pitfalls-GNUTLS"/> for
additional aspects of initialization.
</para>
<para>
Before setting up TLS connections, a credentials object has
to be allocated and initialized with the set of trusted root
CAs (<xref
linkend="ex-Defensive_Coding-TLS-Client-GNUTLS-Credentials"/>).
</para>
<example id="ex-Defensive_Coding-TLS-Client-GNUTLS-Credentials">
<title>Initializing a GNUTLS credentials structure</title>
<xi:include href="snippets/TLS-Client-GNUTLS-Credentials.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
After the last TLS connection has been closed, this credentials
object should be freed:
</para>
<informalexample>
<xi:include href="snippets/TLS-GNUTLS-Credentials-Close.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
During its lifetime, the credentials object can be used to
initialize TLS session objects from multiple threads, provided
that it is not changed.
</para>
<para>
Once the TCP connection has been established, the Nagle
algorithm should be disabled (see <xref
linkend="ex-Defensive_Coding-TLS-Nagle"/>). After that, the
socket can be associated with a new GNUTLS session object.
The previously allocated credentials object provides the set
of root CAs. The <literal>NORMAL</literal> set of cipher
suites and protocols provides a reasonable default. Then the
TLS handshake must be initiated. This is shown in <xref
linkend="ex-Defensive_Coding-TLS-Client-GNUTLS-Connect"/>.
</para>
<example id="ex-Defensive_Coding-TLS-Client-GNUTLS-Connect">
<title>Establishing a TLS client connection using GNUTLS</title>
<xi:include href="snippets/TLS-Client-GNUTLS-Connect.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
After the handshake has been completed, the server certificate
needs to be verified (<xref
linkend="ex-Defensive_Coding-TLS-Client-GNUTLS-Verify"/>). In
the example, the user-defined
<function>certificate_validity_override</function> function is
called if the verification fails, so that a separate,
user-specific trust store can be checked. This function call
can be omitted if the functionality is not needed.
</para>
<example id="ex-Defensive_Coding-TLS-Client-GNUTLS-Verify">
<title>Verifying a server certificate using GNUTLS</title>
<xi:include href="snippets/TLS-Client-GNUTLS-Verify.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
In the next step (<xref
linkend="ex-Defensive_Coding-TLS-Client-GNUTLS-Match"/>), the
certificate must be matched against the host name (note the
unusual return value from
<function>gnutls_x509_crt_check_hostname</function>). Again,
an override function
<function>certificate_host_name_override</function> is called.
Note that the override must be keyed to the certificate
<emphasis>and</emphasis> the host name. The function call can
be omitted if the override is not needed.
</para>
<example id="ex-Defensive_Coding-TLS-Client-GNUTLS-Match">
<title>Matching the server host name and certificate in a
GNUTLS client</title>
<xi:include href="snippets/TLS-Client-GNUTLS-Match.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
In newer GNUTLS versions, certificate checking and host name
validation can be combined using the
<function>gnutls_certificate_verify_peers3</function> function.
</para>
<para>
An established TLS session can be used for sending and
receiving data, as in <xref
linkend="ex-Defensive_Coding-TLS-GNUTLS-Use"/>.
</para>
<example id="ex-Defensive_Coding-TLS-GNUTLS-Use">
<title>Using a GNUTLS session</title>
<xi:include href="snippets/TLS-GNUTLS-Use.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
In order to shut down a connection in an orderly manner, you
should call the <function>gnutls_bye</function> function.
Finally, the session object can be deallocated using
<function>gnutls_deinit</function> (see <xref
linkend="ex-Defensive_Coding-TLS-GNUTLS-Disconnect"/>).
</para>
<example id="ex-Defensive_Coding-TLS-GNUTLS-Disconnect">
<title>Closing a GNUTLS session in an orderly fashion</title>
<xi:include href="snippets/TLS-GNUTLS-Disconnect.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
</section>
<section id="sect-Defensive_Coding-TLS-Client-OpenJDK">
<title>Implementing TLS Clients With OpenJDK</title>
<para>
The examples below use the following cryptography-related
classes:
</para>
<informalexample>
<xi:include href="snippets/TLS-Client-OpenJDK-Import.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
If compatibility with OpenJDK 6 is required, it is necessary
to use the internal class
<literal>sun.security.util.HostnameChecker</literal>. (The
public OpenJDK API does not provide any support for dissecting
the subject distinguished name of an X.509 certificate, so a
custom-written DER parser is needed—or we have to use an
internal class, which we do below.) In OpenJDK 7, the
<function>setEndpointIdentificationAlgorithm</function> method
was added to the
<literal>javax.net.ssl.SSLParameters</literal> class,
providing an official way to implement host name checking.
</para>
<para>
TLS connections are established using an
<literal>SSLContext</literal> instance. With a properly
configured OpenJDK installation, the
<literal>SunJSSE</literal> provider uses the system-wide set
of trusted root certificate authorities, so no further
configuration is necessary. For backwards compatibility with
OpenJDK&nbsp;6, the <literal>TLSv1</literal> provider has to
be supported as a fall-back option. This is shown in <xref
linkend="ex-Defensive_Coding-TLS-Client-OpenJDK-Context"/>.
</para>
<example id="ex-Defensive_Coding-TLS-Client-OpenJDK-Context">
<title>Setting up an <literal>SSLContext</literal> for OpenJDK TLS
clients</title>
<xi:include href="snippets/TLS-Client-OpenJDK-Context.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
In addition to the context, a TLS parameter object will be
needed which adjusts the cipher suites and protocols (<xref
linkend="ex-Defensive_Coding-TLS-OpenJDK-Parameters"/>). Like
the context, these parameters can be reused for multiple TLS
connections.
</para>
<example id="ex-Defensive_Coding-TLS-OpenJDK-Parameters">
<title>Setting up <literal>SSLParameters</literal> for TLS use
with OpenJDK</title>
<xi:include href="snippets/TLS-OpenJDK-Parameters.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
As initialized above, the parameter object does not yet
require host name checking. This has to be enabled
separately, and this is only supported by OpenJDK 7 and later:
</para>
<informalexample>
<xi:include href="snippets/TLS-Client-OpenJDK-Hostname.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
All application protocols can use the
<literal>"HTTPS"</literal> algorithm. (The algorithms have
minor differences with regard to wildcard handling, which
should not matter in practice.)
</para>
<para>
<xref linkend="ex-Defensive_Coding-TLS-Client-OpenJDK-Connect"/>
shows how to establish the connection. Before the handshake
is initialized, the protocol and cipher configuration has to
be performed, by applying the parameter object
<literal>params</literal>. (After this point, changes to
<literal>params</literal> will not affect this TLS socket.)
As mentioned initially, host name checking requires using an
internal API on OpenJDK 6.
</para>
<example id="ex-Defensive_Coding-TLS-Client-OpenJDK-Connect">
<title>Establishing a TLS connection with OpenJDK</title>
<xi:include href="snippets/TLS-Client-OpenJDK-Connect.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
Starting with OpenJDK 7, the last lines can be omitted,
provided that host name verification has been enabled by
calling the
<function>setEndpointIdentificationAlgorithm</function> method
on the <literal>params</literal> object (before it was applied
to the socket).
</para>
<para>
The TLS socket can be used as a regular socket, as shown in
<xref linkend="ex-Defensive_Coding-TLS-Client-OpenJDK-Use"/>.
</para>
<example id="ex-Defensive_Coding-TLS-Client-OpenJDK-Use">
<title>Using a TLS client socket in OpenJDK</title>
<xi:include href="snippets/TLS-Client-OpenJDK-Use.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<section>
<title>Overriding server certificate validation with OpenJDK 6</title>
<para>
Overriding certificate validation requires a custom trust
manager. With OpenJDK 6, the trust manager lacks
information about the TLS session, and to which server the
connection is made. Certificate overrides have to be tied
to specific servers (host names). Consequently, different
<literal>TrustManager</literal> and
<literal>SSLContext</literal> objects have to be used for
different servers.
</para>
<para>
In the trust manager shown in <xref
linkend="ex-Defensive_Coding-TLS-Client-MyTrustManager"/>,
the server certificate is identified by its SHA-256 hash.
</para>
<example id="ex-Defensive_Coding-TLS-Client-MyTrustManager">
<title>A custom trust manager for OpenJDK TLS clients</title>
<xi:include href="snippets/TLS-Client-OpenJDK-MyTrustManager.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
This trust manager has to be passed to the
<literal>init</literal> method of the
<literal>SSLContext</literal> object, as shown in <xref
linkend="ex-Defensive_Coding-TLS-Client-Context_For_Cert"/>.
</para>
<example id="ex-Defensive_Coding-TLS-Client-Context_For_Cert">
<title>Using a custom TLS trust manager with OpenJDK</title>
<xi:include href="snippets/TLS-Client-OpenJDK-Context_For_Cert.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
When certificate overrides are in place, host name
verification should not be performed because there is no
security requirement that the host name in the certificate
matches the host name used to establish the connection (and
it often will not). However, without host name
verification, it is not possible to perform transparent
fallback to certificate validation using the system
certificate store.
</para>
<para>
The approach described above works with OpenJDK 6 and later
versions. Starting with OpenJDK 7, it is possible to use a
custom subclass of the
<literal>javax.net.ssl.X509ExtendedTrustManager</literal>
class. The OpenJDK TLS implementation will call the new
methods, passing along TLS session information. This can be
used to implement certificate overrides as a fallback (if
certificate or host name verification fails), and a trust
manager object can be used for multiple servers because the
server address is available to the trust manager.
</para>
</section>
</section>
<section id="sect-Defensive_Coding-TLS-Client-NSS">
<title>Implementing TLS Clients With NSS</title>
<para>
The following code shows how to implement a simple TLS client
using NSS. Note that the error handling needs replacing
before production use.
</para>
<para>
Using NSS needs several header files, as shown in
<xref linkend="ex-Defensive_Coding-TLS-NSS-Includes"/>.
</para>
<example id="ex-Defensive_Coding-TLS-NSS-Includes">
<title>Include files for NSS</title>
<xi:include href="snippets/TLS-NSS-Includes.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
Initializing the NSS library is a complex task (<xref
linkend="ex-Defensive_Coding-TLS-NSS-Init"/>). It is not
thread-safe. By default, the library is in export mode, and
all strong ciphers are disabled. Therefore, after creating
the <literal>NSSInitContext</literal> object, we probe all
the strong ciphers we want to use, and check if at least one
of them is available. If not, we call
<function>NSS_SetDomesticPolicy</function> to switch to
unrestricted policy mode. This function replaces the existing
global cipher suite policy, which is why we avoid calling it
unless absolutely necessary.
</para>
<para>
The simplest way to configure the trusted root certificates
involves loading the <filename>libnssckbi.so</filename> NSS
module with a call to the
<function>SECMOD_LoadUserModule</function> function. The root
certificates are compiled into this module. (The PEM module
for NSS, <filename>libnsspem.so</filename>, offers a way to
load trusted CA certificates from a file.)
</para>
<example id="ex-Defensive_Coding-TLS-NSS-Init">
<title>Initializing the NSS library</title>
<xi:include href="snippets/TLS-NSS-Init.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
Some of the effects of the initialization can be reverted with
the following function calls:
</para>
<informalexample id="ex-Defensive_Coding-TLS-NSS-Close">
<xi:include href="snippets/TLS-NSS-Close.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
After NSS has been initialized, the TLS connection can be
created (<xref
linkend="ex-Defensive_Coding-TLS-Client-NSS-Connect"/>). The
internal <function>PR_ImportTCPSocket</function> function is
used to turn the POSIX file descriptor
<literal>sockfd</literal> into an NSPR file descriptor. (This
function is de-facto part of the NSS public ABI, so it will
not go away.) Creating the TLS-capable file descriptor
requires a <emphasis>model</emphasis> descriptor, which is
configured with the desired set of protocols and ciphers.
(The <literal>good_ciphers</literal> variable is part of <xref
linkend="ex-Defensive_Coding-TLS-NSS-Init"/>.) We cannot
resort to disabling ciphers not on a whitelist because by
default, the AES cipher suites are disabled. The model
descriptor is not needed anymore after TLS support has been
activated for the existing connection descriptor.
</para>
<para>
The call to <function>SSL_BadCertHook</function> can be
omitted if no mechanism to override certificate verification
is needed. The <literal>bad_certificate</literal> function
must check both the host name specified for the connection and
the certificate before granting the override.
</para>
<para>
Triggering the actual handshake requires three function calls,
<function>SSL_ResetHandshake</function>,
<function>SSL_SetURL</function>, and
<function>SSL_ForceHandshake</function>. (If
<function>SSL_ResetHandshake</function> is omitted,
<function>SSL_ForceHandshake</function> will succeed, but the
data will not be encrypted.) During the handshake, the
certificate is verified and matched against the host name.
</para>
<example id="ex-Defensive_Coding-TLS-Client-NSS-Connect">
<title>Creating a TLS connection with NSS</title>
<xi:include href="snippets/TLS-Client-NSS-Connect.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
After the connection has been established, <xref
linkend="ex-Defensive_Coding-TLS-NSS-Use"/> shows how to use
the NSPR descriptor to communicate with the server.
</para>
<example id="ex-Defensive_Coding-TLS-NSS-Use">
<title>Using NSS for sending and receiving data</title>
<xi:include href="snippets/TLS-NSS-Use.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
<xref linkend="ex-Defensive_Coding-TLS-Client-NSS-Close"/>
shows how to close the connection.
</para>
<example id="ex-Defensive_Coding-TLS-Client-NSS-Close">
<title>Closing NSS client connections</title>
<xi:include href="snippets/TLS-Client-NSS-Close.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
</section>
<section id="sect-Defensive_Coding-TLS-Client-Python">
<title>Implementing TLS Clients With Python</title>
<para>
The Python distribution provides a TLS implementation in the
<literal>ssl</literal> module (actually a wrapper around
OpenSSL). The exported interface is somewhat restricted, so
that the client code shown below does not fully implement the
recommendations in <xref
linkend="sect-Defensive_Coding-TLS-OpenSSL"/>.
</para>
<important>
<para>
Currently, most Python functions which accept
<literal>https://</literal> URLs or otherwise implement
HTTPS support do not perform certificate validation at all.
(For example, this is true for the <literal>httplib</literal>
and <literal>xmlrpclib</literal> modules.) If you use
HTTPS, you should not use the built-in HTTP clients. The
<literal>Curl</literal> class in the <literal>curl</literal>
module, as provided by the <literal>python-pycurl</literal>
package, implements proper certificate validation.
</para>
</important>
<para>
The <literal>ssl</literal> module currently does not perform
host name checking on the server certificate. <xref
linkend="ex-Defensive_Coding-TLS-Client-Python-check_host_name"/>
shows how to implement certificate matching, using the parsed
certificate returned by <function>getpeercert</function>.
</para>
<example id="ex-Defensive_Coding-TLS-Client-Python-check_host_name">
<title>Implementing TLS host name checking in Python (without
wildcard support)</title>
<xi:include href="snippets/TLS-Client-Python-check_host_name.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
To turn a regular, connected TCP socket into a TLS-enabled
socket, use the <function>ssl.wrap_socket</function> function.
The function call in <xref
linkend="ex-Defensive_Coding-TLS-Client-Python-Connect"/>
provides additional arguments to override questionable
defaults in OpenSSL and in the Python module.
</para>
<itemizedlist>
<listitem>
<para>
<literal>ciphers="HIGH:-aNULL:-eNULL:-PSK:RC4-SHA:RC4-MD5"</literal>
selects relatively strong cipher suites with
certificate-based authentication. (The call to the
<function>check_host_name</function> function provides
additional protection against anonymous cipher suites.)
</para>
</listitem>
<listitem>
<para>
<literal>ssl_version=ssl.PROTOCOL_TLSv1</literal> disables
SSL 2.0 support. By default, the <literal>ssl</literal>
module sends an SSL 2.0 client hello, which is rejected by
some servers. Ideally, we would request OpenSSL to
negotiate the most recent TLS version supported by the
server and the client, but the Python module does not
allow this.
</para>
</listitem>
<listitem>
<para>
<literal>cert_reqs=ssl.CERT_REQUIRED</literal> turns on
certificate validation.
</para>
</listitem>
<listitem>
<para>
<literal>ca_certs='/etc/ssl/certs/ca-bundle.crt'</literal>
initializes the certificate store with a set of trusted
root CAs. Unfortunately, it is necessary to hard-code
this path into applications because the default path in
OpenSSL is not available through the Python
<literal>ssl</literal> module.
</para>
</listitem>
</itemizedlist>
<para>
The <literal>ssl</literal> module (and OpenSSL) perform
certificate validation, but the certificate must be compared
manually against the host name, by calling the
<function>check_host_name</function> function defined above.
</para>
<example id="ex-Defensive_Coding-TLS-Client-Python-Connect">
<title>Establishing a TLS client connection with Python</title>
<xi:include href="snippets/TLS-Client-Python-Connect.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
After the connection has been established, the TLS socket can
be used like a regular socket:
</para>
<informalexample>
<xi:include href="snippets/TLS-Python-Use.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
<para>
Closing the TLS socket is straightforward as well:
</para>
<informalexample>
<xi:include href="snippets/TLS-Python-Close.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</informalexample>
</section>
</section>
</chapter>

View file

@ -1,4 +0,0 @@
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
<include rules="../../schemas.xml"/>
</locatingRules>

View file

@ -1,17 +0,0 @@
K 25
svn:wc:ra_dav:version-url
V 72
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/Python
END
Language.xml
K 25
svn:wc:ra_dav:version-url
V 85
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Python/Language.xml
END
schemas.xml
K 25
svn:wc:ra_dav:version-url
V 84
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/Python/schemas.xml
END

View file

@ -1,6 +0,0 @@
K 10
svn:ignore
V 9
snippets
END

View file

@ -1,96 +0,0 @@
10
dir
305
https://svn.devel.redhat.com/repos/product-security/defensive-coding/trunk/en-US/Python
https://svn.devel.redhat.com/repos/product-security
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
9bd5cf0f-f2b3-0410-b1a9-d5c590f50bf1
Language.xml
file
2013-01-10T17:17:40.317763Z
00327c6f05b6d4d52a043fe8caff08b9
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
2620
schemas.xml
file
2013-01-10T17:17:40.317763Z
769bc2635d36b318161574a1adf2f6e7
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
150

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,74 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Python">
<title>The Python Programming Language</title>
<para>
Python provides memory safety by default, so low-level security
vulnerabilities are rare and typically require fixes to the Python
interpreter or standard library itself.
</para>
<para>
Other sections with Python-specific advice include:
</para>
<itemizedlist>
<listitem>
<para>
<xref linkend="chap-Defensive_Coding-Tasks-Temporary_Files"/>
</para>
</listitem>
<listitem>
<para>
<xref linkend="sect-Defensive_Coding-Tasks-Processes-Creation"/>
</para>
</listitem>
<listitem>
<para>
<xref linkend="chap-Defensive_Coding-Tasks-Serialization"/>, in
particular <xref linkend="sect-Defensive_Coding-Tasks-Serialization-Library"/>
</para>
</listitem>
<listitem>
<para>
<xref linkend="sect-Defensive_Coding-Tasks-Cryptography-Randomness"/>
</para>
</listitem>
</itemizedlist>
<section>
<title>Dangerous standard library features</title>
<para>
Some areas of the standard library, notably the
<literal>ctypes</literal> module, do not provide memory safety
guarantees comparable to the rest of Python. If such
functionality is used, the advice in <xref
linkend="sect-Defensive_Coding-C-Language"/> should be followed.
</para>
</section>
<section>
<title>Run-time compilation and code generation</title>
<para>
The following Python functions and statements related to code
execution should be avoided:
</para>
<itemizedlist>
<listitem><para><function>compile</function></para></listitem>
<listitem><para><function>eval</function></para></listitem>
<listitem><para><literal>exec</literal></para></listitem>
<listitem><para><function>execfile</function></para></listitem>
</itemizedlist>
<para>
If you need to parse integers or floating point values, use the
<function>int</function> and <function>float</function>
functions instead of <function>eval</function>. Sandboxing
untrusted Python code does not work reliably.
</para>
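<para>
The replacement of <function>eval</function> for numeric input
can be sketched as follows. This is a minimal illustration; the
helper name <literal>parse_number</literal> is hypothetical, and
the sketch assumes that only decimal integers and floating point
values are acceptable input.
</para>

```python
def parse_number(text):
    """Parse a decimal integer or floating point value without eval."""
    text = text.strip()
    try:
        return int(text, 10)
    except ValueError:
        # Not an integer.  float() raises ValueError for anything that
        # is not a number, so the input is never executed as code.
        return float(text)
```

<para>
Input that is not a number, such as a Python expression, results
in a <literal>ValueError</literal> exception instead of being
evaluated.
</para>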
</section>
<section>
<title>Sandboxing</title>
<para>
The <literal>rexec</literal> Python module cannot safely sandbox
untrusted code and should not be used. The standard CPython
implementation is not suitable for sandboxing.
</para>
</section>
</chapter>

View file

@ -1,4 +0,0 @@
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
<include rules="../../schemas.xml"/>
</locatingRules>

View file

@ -1,59 +0,0 @@
K 25
svn:wc:ra_dav:version-url
V 71
/repos/product-security/!svn/ver/294/defensive-coding/trunk/en-US/Tasks
END
Descriptors.xml
K 25
svn:wc:ra_dav:version-url
V 87
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Tasks/Descriptors.xml
END
File_System.xml
K 25
svn:wc:ra_dav:version-url
V 87
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Tasks/File_System.xml
END
schemas.xml
K 25
svn:wc:ra_dav:version-url
V 83
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/Tasks/schemas.xml
END
Temporary_Files.xml
K 25
svn:wc:ra_dav:version-url
V 91
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Tasks/Temporary_Files.xml
END
Locking.xml
K 25
svn:wc:ra_dav:version-url
V 83
/repos/product-security/!svn/ver/292/defensive-coding/trunk/en-US/Tasks/Locking.xml
END
Processes.xml
K 25
svn:wc:ra_dav:version-url
V 85
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Tasks/Processes.xml
END
Cryptography.xml
K 25
svn:wc:ra_dav:version-url
V 88
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Tasks/Cryptography.xml
END
Library_Design.xml
K 25
svn:wc:ra_dav:version-url
V 90
/repos/product-security/!svn/ver/281/defensive-coding/trunk/en-US/Tasks/Library_Design.xml
END
Serialization.xml
K 25
svn:wc:ra_dav:version-url
V 89
/repos/product-security/!svn/ver/294/defensive-coding/trunk/en-US/Tasks/Serialization.xml
END

View file

@ -1,6 +0,0 @@
K 10
svn:ignore
V 9
snippets
END

View file

@ -1,334 +0,0 @@
10
dir
305
https://svn.devel.redhat.com/repos/product-security/defensive-coding/trunk/en-US/Tasks
https://svn.devel.redhat.com/repos/product-security
2012-12-19T14:04:47.671665Z
294
fweimer@REDHAT.COM
has-props
9bd5cf0f-f2b3-0410-b1a9-d5c590f50bf1
Descriptors.xml
file
2013-01-10T17:17:40.559764Z
a351aa6cb2ff552031644c821a1562d7
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
10998
File_System.xml
file
2013-01-10T17:17:40.559764Z
bf703da532d93a853979e09b04a2f21f
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
14682
schemas.xml
file
2013-01-10T17:17:40.559764Z
769bc2635d36b318161574a1adf2f6e7
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
150
Temporary_Files.xml
file
2013-01-10T17:17:40.559764Z
c3db39345e4baab59ab738e3912a73ca
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
10131
Locking.xml
file
2013-01-10T17:17:40.560764Z
f44d72a773df72e1e5f5101a3c9a66af
2012-12-14T10:18:44.472257Z
292
fweimer@REDHAT.COM
has-props
226
Processes.xml
file
2013-01-10T17:17:40.560764Z
46f3a354235a27a94fd915ebe73f3db5
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
18957
Cryptography.xml
file
2013-01-10T17:17:40.560764Z
dfd01ca248a464c524b4badbdce2679c
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
4180
Library_Design.xml
file
2013-01-10T17:17:40.560764Z
db4969b9abc8c5d9272ea395488a8896
2012-12-13T13:25:23.103424Z
281
fweimer@REDHAT.COM
has-props
7787
Serialization.xml
file
2013-01-10T17:17:40.560764Z
bc8c4dc03264854d83747d8f2cd1ab6f
2012-12-19T14:04:47.671665Z
294
fweimer@REDHAT.COM
has-props
16361

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,5 +0,0 @@
K 13
svn:mime-type
V 8
text/xml
END

View file

@ -1,111 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-Cryptography">
<title>Cryptography</title>
<section>
<title>Primitives</title>
<para>
Choosing from the following cryptographic primitives is
recommended:
</para>
<itemizedlist>
<listitem><para>RSA with 2048 bit keys and OAEP</para></listitem>
<listitem><para>AES-128 in CBC mode</para></listitem>
<listitem><para>SHA-256</para></listitem>
<listitem><para>HMAC-SHA-256</para></listitem>
<listitem><para>HMAC-SHA-1</para></listitem>
</itemizedlist>
<para>
Other cryptographic algorithms can be used if they are required
for interoperability with existing software:
</para>
<itemizedlist>
<listitem><para>RSA with key sizes larger than 1024
and legacy padding</para></listitem>
<listitem><para>AES-192</para></listitem>
<listitem><para>AES-256</para></listitem>
<listitem><para>3DES (triple DES, with two or three 56 bit keys)</para></listitem>
<listitem><para>RC4 (but very, very strongly discouraged)</para></listitem>
<listitem><para>SHA-1</para></listitem>
<listitem><para>HMAC-MD5</para></listitem>
</itemizedlist>
<important>
<title>Important</title>
<para>
These primitives are difficult to use in a secure way. Custom
implementation of security protocols should be avoided. For
protecting confidentiality and integrity of network
transmissions, TLS should be used (<xref
linkend="chap-Defensive_Coding-TLS"/>).
</para>
</important>
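<para>
For illustration only, and with the caveat above in mind, the
following sketch shows how one of the recommended primitives,
HMAC-SHA-256, might be invoked through the Python standard
library. The function names <literal>compute_tag</literal> and
<literal>verify_tag</literal> are hypothetical, and the key is
generated on the spot purely for demonstration; key management is
outside the scope of this sketch.
</para>

```python
import hashlib
import hmac
import os


def compute_tag(key, message):
    """Return the HMAC-SHA-256 tag of message under key (both bytes)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key, message, tag):
    """Check a tag using a constant-time comparison."""
    expected = compute_tag(key, message)
    return hmac.compare_digest(expected, tag)


# Assumption: a fresh 256-bit key, used only for this demonstration.
key = os.urandom(32)
tag = compute_tag(key, b"example message")
```

<para>
The comparison uses <function>hmac.compare_digest</function>
rather than the <literal>==</literal> operator so that the time
taken does not depend on how many leading bytes of the tag match.
</para>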
<!-- TODO: More algorithms are available in the NIST documents
linked from: http://wiki.brq.redhat.com/SecurityTechnologies/FIPS -->
</section>
<section>
<title id="sect-Defensive_Coding-Tasks-Cryptography-Randomness">Randomness</title>
<para>
The following facilities can be used to generate unpredictable
and non-repeating values. When these functions are used without
special safeguards, each individual random value should be at
least 12 bytes long.
</para>
<itemizedlist>
<listitem>
<para><function>PK11_GenerateRandom</function> in the NSS library
(usable for high data rates)</para>
</listitem>
<listitem>
<para><function>RAND_bytes</function> in the OpenSSL library
(usable for high data rates)</para>
</listitem>
<listitem>
<para><function>gnutls_rnd</function> in GnuTLS, with
<literal>GNUTLS_RND_RANDOM</literal> as the first argument
(usable for high data rates)</para>
</listitem>
<listitem>
<para><type>java.security.SecureRandom</type> in Java
(usable for high data rates)</para>
</listitem>
<listitem>
<para><function>os.urandom</function> in Python</para>
</listitem>
<listitem>
<para>Reading from the <filename>/dev/urandom</filename>
character device</para>
</listitem>
</itemizedlist>
<para>
All these functions should be non-blocking, and they should not
wait until physical randomness becomes available. (Some
cryptography providers for Java can cause
<type>java.security.SecureRandom</type> to block, however.)
Those functions which do not obtain all bits directly from
<filename>/dev/urandom</filename> are suitable for high data
rates because they do not deplete the system-wide entropy pool.
</para>
<important>
<title>Difficult to use API</title>
<para>
Both <function>RAND_bytes</function> and
<function>PK11_GenerateRandom</function> have three-state
return values (with conflicting meanings). Careful error
checking is required. Please review the documentation when
using these functions.
</para>
</important>
<para>
Other sources of randomness should be considered predictable.
</para>
<para>
Generating randomness for cryptographic keys in long-term use
may need different steps and is best left to cryptographic
libraries.
</para>
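<para>
As an illustration of the last item in the list above, the
following minimal sketch reads random bytes directly from
<filename>/dev/urandom</filename>. The helper name
<function>fill_random</function> is hypothetical. The sketch
retries interrupted reads and treats a short read at end of
file as an error.
</para>
<programlisting language="C"><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Fill buf with len bytes read from /dev/urandom.  Returns 0 on
   success and -1 on failure.  fill_random is a hypothetical
   helper name, not part of any library. */
int
fill_random(void *buf, size_t len)
{
  unsigned char *p = buf;
  int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    return -1;
  while (len > 0) {
    ssize_t ret = read(fd, p, len);
    if (ret < 0 && errno == EINTR)
      continue;                 /* interrupted by a signal, retry */
    if (ret <= 0) {             /* error or unexpected end of file */
      close(fd);
      return -1;
    }
    p += ret;
    len -= (size_t) ret;
  }
  close(fd);
  return 0;
}
```
]]></programlisting>
<para>
A caller would typically request at least 12 bytes per value,
for example <literal>fill_random(buf, 16)</literal>, and must
not use the buffer if the function reports failure.
</para>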
</section>
</chapter>

View file

@ -1,266 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="sect-Defensive_Coding-Tasks-Descriptors">
<title>File Descriptor Management</title>
<para>
File descriptors underlie all input/output mechanisms offered by
the system. They are used to implement the <literal>FILE
*</literal>-based functions found in
<literal>&lt;stdio.h&gt;</literal>, and all the file and network
communication facilities provided by the Python and Java
environments are eventually implemented in terms of them.
</para>
<para>
File descriptors are small, non-negative integers in userspace,
and are backed on the kernel side with complicated data structures
which can sometimes grow very large.
</para>
<section>
<title>Closing descriptors</title>
<para>
If a descriptor is no longer used by a program and is not closed
explicitly, its number cannot be reused (which is problematic in
itself, see <xref
linkend="sect-Defensive_Coding-Tasks-Descriptors-Limit"/>), and
the kernel resources are not freed. Therefore, it is important
to close all descriptors at the earliest point in time
possible, but not earlier.
</para>
<section>
<title>Error handling during descriptor close</title>
<para>
The <function>close</function> system call is always
successful in the sense that the passed file descriptor is
never valid after the function has been called. However,
<function>close</function> still can return an error, for
example if there was a file system failure. But this error is
not very useful because the absence of an error does not mean
that all caches have been emptied and previous writes have
been made durable. Programs which need such guarantees must
open files with <literal>O_SYNC</literal> or use
<literal>fsync</literal> or <literal>fdatasync</literal>, and
may also have to <literal>fsync</literal> the directory
containing the file.
</para>
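<para>
The following sketch shows one way to obtain such a durability
guarantee when creating a file. The helper name
<function>write_durably</function> and its interface are
hypothetical. The sketch assumes that the file system supports
<function>fsync</function> on a directory descriptor, which is
the case for common Linux file systems.
</para>
<programlisting language="C"><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes to a new file "name" in the directory "dir".
   Returns 0 only after both the file contents and the directory
   entry have been flushed, and -1 otherwise.  write_durably is a
   hypothetical helper name. */
int
write_durably(const char *dir, const char *name,
              const void *data, size_t len)
{
  int dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (dirfd < 0)
    return -1;
  int fd = openat(dirfd, name,
                  O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
  if (fd < 0) {
    close(dirfd);
    return -1;
  }
  const char *p = data;
  int ok = 1;
  while (ok && len > 0) {
    ssize_t ret = write(fd, p, len);
    if (ret < 0 && errno == EINTR)
      continue;                 /* interrupted by a signal, retry */
    if (ret < 0) {
      ok = 0;
    } else {
      p += ret;
      len -= (size_t) ret;
    }
  }
  /* fsync reports write-back errors which close may not report. */
  if (ok && fsync(fd) != 0)
    ok = 0;
  if (close(fd) != 0)
    ok = 0;
  /* Flush the directory so that the new name is durable, too. */
  if (ok && fsync(dirfd) != 0)
    ok = 0;
  close(dirfd);
  return ok ? 0 : -1;
}
```
]]></programlisting>
<para>
The return value of <function>close</function> is still
checked, but the durability guarantee comes from the two
<function>fsync</function> calls.
</para>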
</section>
<section>
<title>Closing descriptors and race conditions</title>
<para>
Unlike process IDs, which are recycled only gradually, the
kernel always allocates the lowest unused file descriptor when
a new descriptor is created. This means that in a
multi-threaded program which constantly opens and closes file
descriptors, descriptors are reused very quickly. Unless
descriptor closing and other operations on the same file
descriptor are synchronized (typically, using a mutex), there
will be race conditions and I/O operations will be applied to
the wrong file descriptor.
</para>
<para>
Sometimes, it is necessary to close a file descriptor
concurrently, while another thread might be about to use it in
a system call. In order to support this, a program needs to
create a single special file descriptor, one on which all I/O
operations fail. One way to achieve this is to use
<function>socketpair</function>, close one of the descriptors,
and call <literal>shutdown(fd, SHUT_RDWR)</literal> on the
other.
</para>
<para>
When a descriptor is closed concurrently, the program does not
call <function>close</function> on the descriptor. Instead, the
program uses <function>dup2</function> to replace the
descriptor to be closed with the dummy descriptor created
earlier. This way, the kernel will not reuse the descriptor,
but it will carry out all other steps associated with closing
a descriptor (for instance, if the descriptor refers to a
stream socket, the peer will be notified).
</para>
<para>
This is just a sketch, and many details are missing.
Additional data structures are needed to determine when it is
safe to really close the descriptor, and proper locking is
required for that.
</para>
</section>
<section>
<title>Lingering state after close</title>
<para>
By default, closing a stream socket returns immediately, and
the kernel will try to send the data in the background. This
means that it is impossible to implement accurate accounting
of network-related resource utilization from userspace.
</para>
<para>
The <literal>SO_LINGER</literal> socket option alters the
behavior of <function>close</function>, so that it will return
only after the lingering data has been processed, either by
sending it to the peer successfully, or by discarding it after
the configured timeout. However, there is no interface which
could perform this operation in the background, so a separate
userspace thread is needed for each <function>close</function>
call, causing scalability issues.
</para>
<para>
Currently, there is no application-level countermeasure which
applies universally. Mitigation is possible with
<application>iptables</application> (the
<literal>connlimit</literal> match type in particular) and
specialized filtering devices for denial-of-service network
traffic.
</para>
<para>
These problems are not related to the
<literal>TIME_WAIT</literal> state commonly seen in
<application>netstat</application> output. The kernel
automatically expires such sockets if necessary.
</para>
</section>
</section>
<section id="sect-Defensive_Coding-Tasks-Descriptors-Child_Processes">
<title>Preventing file descriptor leaks to child processes</title>
<para>
Child processes created with <function>fork</function> share
the initial set of file descriptors with their parent
process. By default, file descriptors are also preserved if
a new process image is created with <function>execve</function>
(or any of the other functions such as <function>system</function>
or <function>posix_spawn</function>).
</para>
<para>
Usually, this behavior is not desirable. There are two ways to
turn it off, that is, to prevent new process images from
inheriting the file descriptors in the parent process:
</para>
<itemizedlist>
<listitem>
<para>
Set the close-on-exec flag on all newly created file
descriptors. Traditionally, this flag is controlled by the
<literal>FD_CLOEXEC</literal> flag, using
<literal>F_GETFD</literal> and <literal>F_SETFD</literal>
operations of the <function>fcntl</function> function.
</para>
<para>
However, in a multi-threaded process, there is a race
condition: a subprocess could have been created between the
time the descriptor was created and the time the
<literal>FD_CLOEXEC</literal> flag was set. Therefore, many
system calls which create descriptors (such as
<function>open</function> and <function>openat</function>)
now accept the <literal>O_CLOEXEC</literal> flag
(<literal>SOCK_CLOEXEC</literal> for
<function>socket</function> and
<function>socketpair</function>), which causes the
<literal>FD_CLOEXEC</literal> flag to be set for the file
descriptor in an atomic fashion. In addition, a few new
system calls were introduced, such as
<function>pipe2</function> and <function>dup3</function>.
</para>
<para>
The downside of this approach is that every descriptor needs
to receive special treatment at the time of creation,
otherwise it is not completely effective.
</para>
</listitem>
<listitem>
<para>
After calling <function>fork</function>, but before creating
a new process image with <function>execve</function>, all
file descriptors which the child process will not need are
closed.
</para>
<para>
Traditionally, this was implemented as a loop over file
descriptors ranging from <literal>3</literal> to
<literal>255</literal> and later <literal>1023</literal>.
But this is only an approximation because it is possible to
create file descriptors outside this range easily (see <xref
linkend="sect-Defensive_Coding-Tasks-Descriptors-Limit"/>).
Another approach reads <filename>/proc/self/fd</filename>
and closes the unexpected descriptors listed there, but this
approach is much slower.
</para>
</listitem>
</itemizedlist>
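<para>
The first approach can be sketched as follows. The helper
names <function>open_cloexec</function> and
<function>set_cloexec</function> are hypothetical. The second
function shows the traditional
<function>fcntl</function>-based method, which remains subject
to the race condition described above.
</para>
<programlisting language="C"><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Preferred: the close-on-exec flag is set atomically when the
   descriptor is created.  open_cloexec is a hypothetical helper
   name. */
int
open_cloexec(const char *path)
{
  return open(path, O_RDONLY | O_CLOEXEC);
}

/* Traditional method for descriptors created without the flag.
   Another thread may fork and execute a new program before this
   function runs.  set_cloexec is a hypothetical helper name. */
int
set_cloexec(int fd)
{
  int flags = fcntl(fd, F_GETFD);
  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```
]]></programlisting>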
<para>
At present, environments which care about file descriptor
leakage implement the second approach. OpenJDK 6 and 7
are among them.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Descriptors-Limit">
<title>Dealing with the <function>select</function> limit</title>
<para>
By default, a user is allowed to open only 1024 files in a
single process, but the system administrator can easily change
this limit (which is necessary for busy network servers).
However, there is another restriction which is more difficult to
overcome.
</para>
<para>
The <function>select</function> function only supports a
maximum of <literal>FD_SETSIZE</literal> file descriptors
(that is, the maximum permitted value for a file descriptor
is <literal>FD_SETSIZE - 1</literal>, usually 1023). If a
process opens many files, descriptors may exceed this
limit. It is impossible to query such descriptors using
<function>select</function>.
</para>
<para>
If a library which creates many file descriptors is used in
the same process as a library which uses
<function>select</function>, at least one of them needs to
be changed. <!-- ??? refer to event-driven programming -->
Calls to <function>select</function> can be replaced with
calls to <function>poll</function> or another event handling
mechanism.
</para>
<para>
Alternatively, the library with high descriptor usage can
relocate descriptors above the <literal>FD_SETSIZE</literal>
limit using the following procedure.
</para>
<itemizedlist>
<listitem>
<para>
Create the file descriptor <literal>fd</literal> as
usual, preferably with the <literal>O_CLOEXEC</literal>
flag.
</para>
</listitem>
<listitem>
<para>
Before doing anything else with the descriptor
<literal>fd</literal>, invoke:
</para>
<programlisting language="C">
int newfd = fcntl(fd, F_DUPFD_CLOEXEC, (long)FD_SETSIZE);
</programlisting>
</listitem>
<listitem>
<para>
Check that the result <literal>newfd</literal> is
non-negative; otherwise, close <literal>fd</literal>,
report an error, and return.
</para>
</listitem>
<listitem>
<para>
Close <literal>fd</literal> and continue to use
<literal>newfd</literal>.
</para>
</listitem>
</itemizedlist>
<para>
The new descriptor has been allocated at or above
<literal>FD_SETSIZE</literal>. Even though this algorithm
is racy in the sense that the first <literal>FD_SETSIZE</literal>
descriptors could fill up, a very high degree of
physical parallelism is required before this becomes a problem.
</para>
</section>
</chapter>

View file

@ -1,339 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-File_System">
<title>File system manipulation</title>
<para>
In this chapter, we discuss general file system manipulation, with
a focus on accessing files and directories to which another,
potentially untrusted user has write access.
</para>
<para>
Temporary files are covered in their own chapter, <xref
linkend="chap-Defensive_Coding-Tasks-Temporary_Files"/>.
</para>
<section id="sect-Defensive_Coding-Tasks-File_System-Unowned">
<title>Working with files and directories owned by other users</title>
<para>
Sometimes, it is necessary to operate on files and directories
owned by other (potentially untrusted) users. For example, a
system administrator could remove the home directory of a user,
or a package manager could update a file in a directory which is
owned by an application-specific user. This differs from
accessing the file system as a specific user; see
<xref linkend="sect-Defensive_Coding-Tasks-File_System-Foreign"/>.
</para>
<para>
Accessing files across trust boundaries faces several
challenges, particularly if an entire directory tree is being
traversed:
</para>
<orderedlist>
<listitem>
<para>
Another user might add file names to a writable directory at
any time. This can interfere with file creation and the
order of names returned by <function>readdir</function>.
</para>
</listitem>
<listitem>
<para>
Merely opening and closing a file can have side effects.
For instance, an automounter can be triggered, or a tape
device rewound. Opening a file on a local file system can
block indefinitely, due to mandatory file locking, unless
the <literal>O_NONBLOCK</literal> flag is specified.
</para>
</listitem>
<listitem>
<para>
Hard links and symbolic links can redirect the effect of
file system operations in unexpected ways. The
<literal>O_NOFOLLOW</literal> and
<literal>AT_SYMLINK_NOFOLLOW</literal> variants of system
calls only affect the final path name component.
</para>
</listitem>
<listitem>
<para>
The structure of a directory tree can change. For example,
the parent directory of what used to be a subdirectory
within the directory tree being processed could suddenly
point outside that directory tree.
</para>
</listitem>
</orderedlist>
<para>
Files should always be created with the
<literal>O_CREAT</literal> and <literal>O_EXCL</literal> flags,
so that creating the file will fail if it already exists. This
guards against the unexpected appearance of file names, either
due to creation of a new file, or hard-linking of an existing
file. In multi-threaded programs, rather than manipulating the
umask, create the files with mode <literal>000</literal> if
possible, and adjust it afterwards with
<function>fchmod</function>.
</para>
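<para>
A minimal sketch of this file creation pattern follows. The
helper name <function>create_new_file</function> is
hypothetical. It uses <function>openat</function> relative to
an already open directory descriptor. If
<function>fchmod</function> fails, the sketch leaves the
inaccessible file in place and the caller decides how to clean
up.
</para>
<programlisting language="C"><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create the file "name" in the directory dirfd.  Fails with
   EEXIST if the name already exists, even as a symbolic link.
   The file starts out with mode 000, independent of the umask,
   and receives its final mode through the descriptor.
   create_new_file is a hypothetical helper name. */
int
create_new_file(int dirfd, const char *name, mode_t mode)
{
  int fd = openat(dirfd, name,
                  O_WRONLY | O_CREAT | O_EXCL | O_NOFOLLOW | O_CLOEXEC,
                  0);
  if (fd < 0)
    return -1;
  if (fchmod(fd, mode) != 0) {
    int err = errno;            /* close may overwrite errno */
    close(fd);
    errno = err;
    return -1;
  }
  return fd;
}
```
]]></programlisting>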
<para>
To avoid issues related to symbolic links and directory tree
restructuring, the “<literal>at</literal>” variants of system
calls have to be used (that is, functions like
<function>openat</function>, <function>fchownat</function>,
<function>fchmodat</function>, and
<function>unlinkat</function>, together with
<literal>O_NOFOLLOW</literal> or
<literal>AT_SYMLINK_NOFOLLOW</literal>). Path names passed to
these functions must have just a single component (that is,
without a slash). When descending, the descriptors of parent
directories must be kept open. The missing
<literal>opendirat</literal> function can be emulated with
<literal>openat</literal> (with an
<literal>O_DIRECTORY</literal> flag, to avoid opening special
files with side effects), followed by
<literal>fdopendir</literal>.
</para>
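<para>
The <literal>opendirat</literal> emulation described above
might look like the following sketch. The function name
<function>opendir_at</function> is hypothetical. Rejecting
<literal>.</literal> and <literal>..</literal> in addition to
names containing a slash is an extra precaution of this
sketch.
</para>
<programlisting language="C"><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open the subdirectory "name" of the directory parentfd
   without following a symbolic link.  Returns NULL with errno
   set on failure.  opendir_at is a hypothetical helper name. */
DIR *
opendir_at(int parentfd, const char *name)
{
  /* Accept a single path name component only. */
  if (name[0] == '\0' || strchr(name, '/') != NULL
      || strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
    errno = EINVAL;
    return NULL;
  }
  /* O_DIRECTORY avoids opening special files with side effects. */
  int fd = openat(parentfd, name,
                  O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
  if (fd < 0)
    return NULL;
  DIR *dir = fdopendir(fd);
  if (dir == NULL) {
    int err = errno;            /* close may overwrite errno */
    close(fd);
    errno = err;
  }
  return dir;
}
```
]]></programlisting>
<para>
When descending further, <function>dirfd</function> yields
the descriptor to pass as <literal>parentfd</literal> for the
next level. The directory stream has to stay open for as long
as that descriptor is in use.
</para>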
<para>
If the “<literal>at</literal>” functions are not available, it
is possible to emulate them by changing the current directory.
(Obviously, this only works if the process is not multi-threaded.)
<function>fchdir</function> has to be used to change the current
directory, and the descriptors of the parent directories have to
be kept open, just as with the “<literal>at</literal>”-based
approach. <literal>chdir("..")</literal> is unsafe because it
might ascend outside the intended directory tree.
</para>
<para>
This “<literal>at</literal>” function emulation is currently
required when manipulating extended attributes. In this case,
the <function>lsetxattr</function> function can be used, with a
relative path name consisting of a single component. This also
applies to SELinux contexts and the
<function>lsetfilecon</function> function.
</para>
<para>
Currently, it is not possible to avoid both opening special files
<emphasis>and</emphasis> changing files with hard links if the
directory containing them is owned by an untrusted user.
(Device nodes can be hard-linked, just like regular files.)
<function>fchmodat</function> and <function>fchownat</function>
affect files whose link count is greater than one. But opening
the files, checking that the link count is one with
<function>fstat</function>, and using
<function>fchmod</function> and <function>fchown</function> on
the file descriptor may have unwanted side effects, due to item
2 above. When creating directories, it is therefore important
to change the ownership and permissions only after the directory
has been fully created. Until that point, file names are stable,
and no files with unexpected hard links can be introduced.
</para>
<para>
Similarly, when just reading a directory owned by an untrusted
user, it is currently impossible to reliably avoid opening
special files.
</para>
<para>
There is no workaround against the instability of the file list
returned by <function>readdir</function>. Concurrent
modification of the directory can result in a list of files
being returned which never actually existed on disk.
</para>
<para>
Hard links and symbolic links can be safely deleted using
<function>unlinkat</function> without further checks because
deletion only affects the name within the directory tree being
processed.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-File_System-Foreign">
<title>Accessing the file system as a different user</title>
<para>
This section deals with access to the file system as a specific
user. This is different from accessing files and directories owned by a
different, potentially untrusted user; see <xref
linkend="sect-Defensive_Coding-Tasks-File_System-Unowned"/>.
</para>
<para>
One approach is to spawn a child process which runs under the
target user and group IDs (both effective and real IDs). Note
that this child process can block indefinitely, even when
processing regular files only. For example, a special FUSE file
system could cause the process to hang in uninterruptible sleep
inside a <function>stat</function> system call.
</para>
<para>
An existing process could change its user and group ID using
<function>setfsuid</function> and <function>setfsgid</function>.
(These functions are preferred over <function>seteuid</function>
and <function>setegid</function> because they do not allow the
impersonated user to send signals to the process.) These
functions are not thread-safe. In multi-threaded processes,
these operations need to be performed in a single-threaded child
process. Unexpected blocking may occur as well.
</para>
<para>
It is not recommended to try to reimplement the kernel
permission checks in user space because the required checks are
complex. It is also very difficult to avoid race conditions
during path name resolution.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-File_System-Limits">
<title>File system limits</title>
<para>
For historical reasons, there are preprocessor constants such as
<literal>PATH_MAX</literal> and <literal>NAME_MAX</literal>.
However, on most systems, the length of canonical path names
(absolute path names with all symbolic links resolved, as
returned by <function>realpath</function> or
<function>canonicalize_file_name</function>) can exceed
<literal>PATH_MAX</literal> bytes, and individual file name
components can be longer than <literal>NAME_MAX</literal>. This
is also true of the <literal>_PC_PATH_MAX</literal> and
<literal>_PC_NAME_MAX</literal> values returned by
<function>pathconf</function>, and the
<literal>f_namemax</literal> member of <literal>struct
statvfs</literal>. Therefore, these constants should not be
used. This is also the reason why
<function>readdir_r</function> should never be used (instead,
use <function>readdir</function>).
</para>
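<para>
For example, instead of passing a buffer of
<literal>PATH_MAX</literal> bytes to
<function>realpath</function>, a program can pass a null
pointer and let the library allocate a buffer of the required
size, as specified by POSIX.1-2008. The wrapper name
<function>canonical_path</function> in this sketch is
hypothetical.
</para>
<programlisting language="C"><![CDATA[
```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

/* Return the canonical form of path in a buffer allocated with
   malloc and sized by the library, or NULL with errno set on
   failure.  The caller has to free the result.  canonical_path
   is a hypothetical helper name. */
char *
canonical_path(const char *path)
{
  return realpath(path, NULL);
}
```
]]></programlisting>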
<para>
You should not write code in a way that assumes that there is an
upper limit on the number of subdirectories of a directory, the
number of regular files in a directory, or the link count of an
inode.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-File_System-Features">
<title>File system features</title>
<para>
Not all file systems support all features. This makes it very
difficult to write general-purpose tools for copying files. For
example, a copy operation intending to preserve file permissions
will generally fail when copying to a FAT file system.
</para>
<itemizedlist>
<listitem>
<para>
Some file systems are case-insensitive. Most should be
case-preserving, though.
</para>
</listitem>
<listitem>
<para>
Name length limits vary greatly, from eight to thousands of
bytes. Path length limits differ as well. Most systems
impose an upper bound on path names passed to the kernel,
but using relative path names, it is possible to create and
access files whose absolute path name is essentially of
unbounded length.
</para>
</listitem>
<listitem>
<para>
Some file systems do not store names as fairly unrestricted
byte sequences, as has traditionally been the case on GNU
systems. This means that some byte sequences (outside the
POSIX safe character set) are not valid names. Conversely,
names of existing files may not be representable as byte
sequences, and the files are thus inaccessible on GNU
systems. Some file systems perform Unicode canonicalization
on file names. These file systems preserve case, but
reading the name of a just-created file using
<function>readdir</function> might still result in a
different byte sequence.
</para>
</listitem>
<listitem>
<para>
Permissions and owners are not universally supported (and
SUID/SGID bits may not be available). For example, FAT file
systems assign ownership based on a mount option, and
generally mark all files as executable. Any attempt to
change permissions would result in an error.
</para>
</listitem>
<listitem>
<para>
Non-regular files (device nodes, FIFOs) are not generally
available.
</para>
</listitem>
<listitem>
<para>
Files can have holes (that is, parts of their contents which
are not backed by disk storage) only on some file systems.
</para>
</listitem>
<listitem>
<para>
<function>ioctl</function> support (even fairly generic
functionality such as <literal>FIEMAP</literal> for
discovering physical file layout and holes) is
file-system-specific.
</para>
</listitem>
<listitem>
<para>
Not all file systems support extended attributes, ACLs and
SELinux metadata. Size and naming restrictions on extended
attributes vary.
</para>
</listitem>
<listitem>
<para>
Hard links may not be supported at all (FAT) or only within
the same directory (AFS). Symbolic links may not be
available, either. Reflinks (hard links with copy-on-write
semantics) are still very rare. Recent systems restrict
creation of hard links to users who own the target file or
have read/write access to it, but older systems do not.
</para>
</listitem>
<listitem>
<para>
Renaming (or moving) files using <function>rename</function>
can fail (even when <function>stat</function> indicates that
the source and target directories are located on the same
file system). This system call should work if the old and
new paths are located in the same directory, though.
</para>
</listitem>
<listitem>
<para>
Locking semantics vary among file systems. This affects
advisory and mandatory locks. For example, some network
file systems do not allow deleting files which are opened by
any process.
</para>
</listitem>
<listitem>
<para>
Resolution of time stamps varies from two seconds to
nanoseconds. Not all time stamps are available on all file
systems. File creation time (<emphasis>birth
time</emphasis>) is not exposed over the
<function>stat</function>/<function>fstat</function>
interface, even if stored by the file system.
</para>
</listitem>
</itemizedlist>
</section>
<section id="sect-Defensive_Coding-Tasks-File_System-Free_Space">
<title>Checking free space</title>
<para>
The <function>statvfs</function> and
<function>fstatvfs</function> functions allow programs to
examine the number of available blocks and inodes, through the
members <literal>f_bfree</literal>, <literal>f_bavail</literal>,
<literal>f_ffree</literal>, and <literal>f_favail</literal> of
<literal>struct statvfs</literal>. Some file systems return
fictional values in the <literal>f_ffree</literal> and
<literal>f_favail</literal> fields, so the only reliable way to
discover if the file system still has space for a file is to try
to create it. The <literal>f_bfree</literal> field should be
reasonably accurate, though.
</para>
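<para>
A minimal sketch of such a query follows. The helper name
<function>available_bytes</function> is hypothetical. It
reports the space available to unprivileged users, based on
<literal>f_bavail</literal>. The result is only advisory
because other processes can consume the space before it is
used.
</para>
<programlisting language="C"><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/statvfs.h>

/* Store an estimate of the number of bytes available to
   unprivileged users on the file system containing path.
   Returns 0 on success and -1 on failure.  available_bytes is a
   hypothetical helper name. */
int
available_bytes(const char *path, uintmax_t *result)
{
  struct statvfs st;
  if (statvfs(path, &st) != 0)
    return -1;
  /* f_bavail counts blocks of f_frsize bytes each. */
  *result = (uintmax_t) st.f_bavail * st.f_frsize;
  return 0;
}
```
]]></programlisting>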
</section>
</chapter>

View file

@ -1,195 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-Library_Design">
<title>Library Design</title>
<para>
Throughout this chapter, the term <emphasis>client code</emphasis>
refers to applications and other libraries using the library.
</para>
<section>
<title>State management</title>
<para>
</para>
<section>
<title>Global state</title>
<para>
Global state should be avoided.
</para>
<para>
If this is impossible, the global state must be protected with
a lock. For C/C++, you can use the
<function>pthread_mutex_lock</function>
and <function>pthread_mutex_unlock</function>
functions without linking against <literal>-lpthread</literal>
because the system provides stubs for non-threaded processes.
</para>
<para>
For compatibility with <function>fork</function>, these locks
should be acquired and released in helpers registered with
<function>pthread_atfork</function>. This function is not
available without <literal>-lpthread</literal>, so you need to
use <function>dlsym</function> or a weak symbol to obtain its
address.
</para>
<para>
If you need <function>fork</function> protection for other
reasons, you should store the process ID and compare it to the
value returned by <function>getpid</function> each time you
access the global state. (<function>getpid</function> is not
implemented as a system call and is fast.) If the value
changes, you know that you have to re-create the state object.
(This needs to be combined with locking, of course.)
</para>
</section>
<section>
<title>Handles</title>
<para>
Library state should be kept behind a curtain. Client code
should receive only a handle. In C, the handle can be a
pointer to an incomplete <literal>struct</literal>. In C++,
the handle can be a pointer to an abstract base class, or it
can be hidden using the pointer-to-implementation idiom.
</para>
<para>
The library should provide functions for creating and
destroying handles. (In C++, it is possible to use virtual
destructors for the latter.) Consistency between creation and
destruction of handles is strongly recommended: If the client
code created a handle, it is the responsibility of the client
code to destroy it. (This is not always possible or
convenient, so sometimes, a transfer of ownership has to
happen.)
</para>
<para>
Using handles ensures that it is possible to change the way
the library represents state in a way that is transparent to
client code. This is important to facilitate security updates
and many other code changes.
</para>
<para>
It is not always necessary to protect state behind a handle
with a lock. This depends on the level of thread safety
the library provides.
</para>
</section>
</section>
<section>
<title>Object orientation</title>
<para>
Classes should be either designed as base classes, or it should
be impossible to use them as base classes (like
<literal>final</literal> classes in Java). Classes which are
not designed for inheritance and are used as base classes
nevertheless create potential maintenance hazards because it is
difficult to predict how client code will react when calls to
virtual methods are added, reordered or removed.
</para>
<para>
Virtual member functions can be used as callbacks. See
<xref linkend="sect-Defensive_Coding-Tasks-Library_Design-Callbacks"/>
for some of the challenges involved.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Library_Design-Callbacks">
<title>Callbacks</title>
<para>
Higher-order code is difficult to analyze for humans and
computers alike, so it should be avoided. Often, an
iterator-based interface (a library function which is called
repeatedly by client code and returns a stream of events) leads
to a better design which is easier to document and use.
</para>
<para>
If callbacks are unavoidable, some guidelines for them follow.
</para>
<para>
In modern C++ code, <literal>std::function</literal> objects
should be used for callbacks.
</para>
<para>
In older C++ code and in C code, all callbacks must have an
additional closure parameter of type <literal>void *</literal>,
the value of which can be specified by client code. If
possible, the value of the closure parameter should be provided
by client code at the same time a specific callback is
registered (or specified as a function argument). If a single
closure parameter is shared by multiple callbacks, flexibility
is greatly reduced, and conflicts between different pieces of
client code using the same library object could be unresolvable.
In some cases, it makes sense to provide a de-registration
callback which can be used to destroy the closure parameter when
the callback is no longer used.
</para>
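<para>
In C, a callback interface with a per-callback closure
parameter could look like the following sketch. The
<literal>event_source</literal> type and all function names
are hypothetical. The closure value is supplied together with
the callback when it is registered.
</para>
<programlisting language="C"><![CDATA[
```c
#include <stddef.h>

/* Callback type with a closure parameter chosen by client code. */
typedef void (*event_callback)(int event, void *closure);

/* Hypothetical library object.  A real library would hide this
   structure behind a handle. */
struct event_source {
  event_callback callback;
  void *closure;
};

/* Register a callback together with its own closure value. */
void
event_source_set_callback(struct event_source *src,
                          event_callback callback, void *closure)
{
  src->callback = callback;
  src->closure = closure;
}

/* Invoke the registered callback, if any, passing the closure
   back to client code unchanged. */
void
event_source_emit(struct event_source *src, int event)
{
  if (src->callback != NULL)
    src->callback(event, src->closure);
}

/* Example client callback: the closure points to a counter
   owned by the client. */
void
count_event(int event, void *closure)
{
  int *counter = closure;
  (void) event;
  ++*counter;
}
```
]]></programlisting>
<para>
Client code registers the callback with
<literal>event_source_set_callback(&amp;src, count_event,
&amp;count)</literal>, so no global variable is needed to
carry the counter.
</para>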
<para>
Callbacks can throw exceptions or call
<function>longjmp</function>. If possible, all library objects
should remain in a valid state. (All further operations on them
can fail, but it should be possible to deallocate them without
causing resource leaks.)
</para>
<para>
The presence of callbacks raises the question whether functions
provided by the library are <emphasis>reentrant</emphasis>.
Unless a library was designed for such use, bad things will
happen if a callback function uses functions in the same library
(particularly if they are invoked on the same objects and
manipulate the same state). When the callback is invoked, the
library can be in an inconsistent state. Reentrant functions
are more difficult to write than thread-safe functions (by
definition, simple locking would immediately lead to deadlocks).
It is also difficult to decide what to do when destruction of an
object which is currently processing a callback is requested.
</para>
</section>
<section>
<title>Process attributes</title>
<para>
Several attributes are global and affect all code in the
process, not just the library that manipulates them.
</para>
<itemizedlist>
<listitem><para>
environment variables
(see <xref linkend="sect-Defensive_Coding-Tasks-secure_getenv"/>)
</para></listitem>
<listitem><para>
umask
</para></listitem>
<listitem><para>
user IDs, group IDs and capabilities
</para></listitem>
<listitem><para>
current working directory
</para></listitem>
<listitem><para>
signal handlers, signal masks and signal delivery
</para></listitem>
<listitem><para>
file locks (especially <function>fcntl</function> locks
behave in surprising ways, not just in a multi-threaded
environment)
</para></listitem>
</itemizedlist>
<para>
Library code should avoid manipulating these global process
attributes. It should not rely on environment variables, umask,
the current working directory and signal masks because these
attributes can be inherited from an untrusted source.
</para>
<para>
In addition, there are obvious process-wide aspects such as the
virtual memory layout, the set of open files and dynamic shared
objects, but with the exception of shared objects, these can be
manipulated in a relatively isolated way.
</para>
</section>
</chapter>

View file

@ -1,5 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="sect-Defensive_Coding-Tasks-Locking">
</chapter>

View file

@ -1,483 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="sect-Defensive_Coding-Tasks-Processes">
<title>Processes</title>
<section id="sect-Defensive_Coding-Tasks-Processes-Creation">
<title>Safe process creation</title>
<para>
This section describes how to create new child processes in a
safe manner. In addition to the concerns addressed below, there
is the possibility of file descriptor leaks, see <xref
linkend="sect-Defensive_Coding-Tasks-Descriptors-Child_Processes"/>.
</para>
<section>
<title>Obtaining the program path and the command line
template</title>
<para>
The name and path to the program being invoked should be
hard-coded or controlled by a static configuration file stored
at a fixed location (at an absolute file system path). The
same applies to the template for generating the command line.
</para>
<para>
The configured program name should be an absolute path. If it
is a relative path, the contents of the <envar>PATH</envar>
variable must be obtained in a secure manner (see <xref
linkend="sect-Defensive_Coding-Tasks-secure_getenv"/>).
If the <envar>PATH</envar> variable is not set or untrusted,
the safe default <literal>/bin:/usr/bin</literal> must be
used.
</para>
<para>
If too much flexibility is provided here, it may allow
invocation of arbitrary programs without proper authorization.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Processes-execve">
<title>Bypassing the shell</title>
<para>
Child processes should be created without involving the system
shell.
</para>
<para>
For C/C++, <function>system</function> should not be used.
The <function>posix_spawn</function> function can be used
instead, or a combination of <function>fork</function> and
<function>execve</function>. (In some cases, it may be
preferable to use <function>vfork</function> or the
Linux-specific <function>clone</function> system call instead
of <function>fork</function>.)
</para>
<para>
In Python, the <literal>subprocess</literal> module bypasses
the shell by default (when the <literal>shell</literal>
keyword argument is not set to true).
<function>os.system</function> should not be used.
</para>
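<para>
As a minimal sketch of the Python case, the following fragment
passes the program and its arguments as a list, so that no shell
is involved. The variable <varname>untrusted</varname> and its
contents are made up for illustration, and the Python interpreter
itself stands in for the child program so that the example is
self-contained.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import subprocess
import sys

# Hypothetical attacker-influenced input containing shell metacharacters.
untrusted = "report.txt; echo injected"

# Stand-in child program: prints its first argument unchanged.
child_code = "import sys; print(sys.argv[1])"

# The argument vector is passed as a list and the shell keyword argument
# is left at its default, so the metacharacters are never interpreted.
result = subprocess.run(
    [sys.executable, "-c", child_code, untrusted],
    stdout=subprocess.PIPE,
    check=True,
)
received = result.stdout.decode().rstrip("\n")
```
]]>
</programlisting>
</informalexample>
<para>
The child receives the string as a single argument. If the same
string were interpolated into a command line that is handed to a
shell, the part after the semicolon would run as a second command.
</para>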
<para>
The Java class <type>java.lang.ProcessBuilder</type> can be
used to create subprocesses without interference from the
system shell.
</para>
<important>
<title>Portability notice</title>
<para>
On Windows, there is no argument vector, only a single
argument string. Each application is responsible for parsing
this string into an argument vector. There is considerable
variance among the quoting styles recognized by applications.
Some of them expand shell wildcards, others do not. Extensive
application-specific testing is required to make this secure.
</para>
</important>
<para>
Note that some common applications (notably
<application>ssh</application>) unconditionally introduce the
use of a shell, even if invoked directly without a shell. It is
difficult to use these applications in a secure manner. In this
case, untrusted data should be supplied by other means. For
example, standard input could be used, instead of the command
line.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Processes-environ">
<title>Specifying the process environment</title>
<para>
Child processes should be created with a minimal set of
environment variables. This is absolutely essential if there
is a trust transition involved, either when the parent process
was created, or during the creation of the child process.
</para>
<para>
In C/C++, the environment should be constructed as an array of
strings and passed as the <varname>envp</varname> argument to
<function>posix_spawn</function> or <function>execve</function>.
The functions <function>setenv</function>,
<function>unsetenv</function> and <function>putenv</function>
should not be used. They are not thread-safe and suffer from
memory leaks.
</para>
<para>
Python programs need to specify a <literal>dict</literal> for
the <varname>env</varname> argument of the
<function>subprocess.Popen</function> constructor.
The Java class <literal>java.lang.ProcessBuilder</literal>
provides an <function>environment()</function> method,
which returns a map that can be manipulated.
</para>
<para>
The following list provides guidelines for selecting the set
of environment variables passed to the child process.
</para>
<itemizedlist>
<listitem>
<para>
<envar>PATH</envar> should be initialized to
<literal>/bin:/usr/bin</literal>.
</para>
</listitem>
<listitem>
<para>
<envar>USER</envar> and <envar>HOME</envar> can be inherited
from the parent process environment, or they can be
initialized from the <literal>passwd</literal> structure
for the user. <!-- ??? refer to dropping privileges -->
</para>
</listitem>
<listitem>
<para>The <envar>DISPLAY</envar> and <envar>XAUTHORITY</envar>
variables should be passed to the subprocess if it is an X
program. Note that this will typically not work across trust
boundaries because <envar>XAUTHORITY</envar> refers to a file
with <literal>0600</literal> permissions.
</para>
</listitem>
<listitem>
<para>
The locale-related environment variables
<envar>LANG</envar>, <envar>LANGUAGE</envar>,
<envar>LC_ADDRESS</envar>, <envar>LC_ALL</envar>,
<envar>LC_COLLATE</envar>, <envar>LC_CTYPE</envar>,
<envar>LC_IDENTIFICATION</envar>,
<envar>LC_MEASUREMENT</envar>, <envar>LC_MESSAGES</envar>,
<envar>LC_MONETARY</envar>, <envar>LC_NAME</envar>,
<envar>LC_NUMERIC</envar>, <envar>LC_PAPER</envar>,
<envar>LC_TELEPHONE</envar> and <envar>LC_TIME</envar>
can be passed to the subprocess if present.
</para>
</listitem>
<listitem>
<para>
The called process may need application-specific
environment variables, for example for passing passwords.
(See <xref
linkend="sect-Defensive_Coding-Tasks-Processes-Command_Line_Visibility"/>.)
</para>
</listitem>
<listitem>
<para>
All other environment variables should be dropped. Names
for new environment variables should not be accepted from
untrusted sources.
</para>
</listitem>
</itemizedlist>
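<para>
The guidelines above could be sketched in Python roughly as
follows. The helper name <function>minimal_environment</function>
and the exact pass-through list are assumptions made for this
illustration; a real program would add <envar>DISPLAY</envar>,
<envar>XAUTHORITY</envar> or application-specific variables where
needed.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
# Variables that may be copied from the parent environment if present.
# The list is abbreviated; extend it with the remaining LC_* names.
PASS_THROUGH = (
    "USER", "HOME",
    "LANG", "LANGUAGE", "LC_ALL", "LC_COLLATE", "LC_CTYPE",
    "LC_MESSAGES", "LC_MONETARY", "LC_NUMERIC", "LC_TIME",
)


def minimal_environment(parent_env):
    """Build a small environment dict for a child process.

    Hypothetical helper: PATH is reset to a safe default, a fixed set
    of variables is copied, and everything else is dropped.
    """
    env = {"PATH": "/bin:/usr/bin"}
    for name in PASS_THROUGH:
        if name in parent_env:
            env[name] = parent_env[name]
    return env


# Example parent environment with two entries that must not leak through.
parent = {
    "PATH": "/tmp/untrusted:/bin",
    "LD_PRELOAD": "/tmp/untrusted.so",
    "LANG": "C",
    "HOME": "/home/alice",
}
child_env = minimal_environment(parent)
```
]]>
</programlisting>
</informalexample>
<para>
The resulting dictionary would then be passed as the
<varname>env</varname> argument of
<function>subprocess.Popen</function>.
</para>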
</section>
<section>
<title>Robust argument list processing</title>
<para>
When invoking a program, it is sometimes necessary to include
data from untrusted sources. Such data should be checked
for embedded <literal>NUL</literal> characters because the
system APIs will silently truncate argument strings at the first
<literal>NUL</literal> character.
</para>
<para>
The following recommendations assume that the program being
invoked uses GNU-style option processing using
<function>getopt_long</function>. This convention is widely
used, but it is just that, and individual programs might
interpret a command line in a different way.
</para>
<para>
If the untrusted data has to go into an option, use the
<literal>--option-name=VALUE</literal> syntax, placing the
option and its value into the same command line argument.
This avoids any potential confusion if the data starts with
<literal>-</literal>.
</para>
<para>
For positional arguments, terminate the option list with a
single <option>--</option> marker after the last option, and
include the data at the right position. The
<option>--</option> marker terminates option processing, and
the data will not be treated as an option even if it starts
with a dash.
</para>
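<para>
A minimal Python sketch of these recommendations is shown below.
The helper <function>build_argv</function>, the program path and
the option name are hypothetical, and the invoked program is
assumed to follow the <function>getopt_long</function>
conventions.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
def build_argv(program, option_name, option_value, positional):
    """Build an argument vector for a getopt_long-style program.

    Hypothetical helper; option_value and positional may be untrusted.
    """
    for value in [option_value] + list(positional):
        if "\0" in value:
            raise ValueError("embedded NUL character in argument")
    argv = [program]
    # Option and value share one argument, so a value starting with "-"
    # cannot be mistaken for another option.
    argv.append("--{}={}".format(option_name, option_value))
    # "--" ends option processing; everything after it is positional.
    argv.append("--")
    argv.extend(positional)
    return argv


argv = build_argv("/usr/bin/example", "output", "-rf", ["--help"])
```
]]>
</programlisting>
</informalexample>
<para>
Both untrusted strings start with a dash here, but neither can be
interpreted as an option by the called program.
</para>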
</section>
<section id="sect-Defensive_Coding-Tasks-Processes-Command_Line_Visibility">
<title>Passing secrets to subprocesses</title>
<para>
The command line (the name of the program and its arguments) of
a running process is traditionally available to all local
users. The called program can overwrite this information, but
only after it has run for a bit of time, during which the
information may have been read by other processes. However,
on Linux, the process environment is restricted to the user
who runs the process. Therefore, if you need a convenient way
to pass a password to a child process, use an environment
variable, and not a command line argument. (See <xref
linkend="sect-Defensive_Coding-Tasks-Processes-environ"/>.)
</para>
<important>
<title>Portability notice</title>
<para>
On some UNIX-like systems (notably Solaris), environment
variables can be read by any system user, just like command
lines.
</para>
</important>
<para>
If the environment-based approach cannot be used due to
portability concerns, the data can be passed on standard
input. Some programs (notably <application>gpg</application>)
use special file descriptors whose numbers are specified on
the command line. Temporary files are an option as well, but
they might give digital forensics access to sensitive data
(such as passphrases) because it is difficult to safely delete
them in all cases.
</para>
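<para>
As an illustration of the standard input approach, the following
Python sketch writes a secret to the child's standard input
instead of placing it on the command line. The secret and the
stand-in child program are made up for this example; a real child
would be a separate executable that reads its passphrase from
standard input.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import subprocess
import sys

# Hypothetical secret; real code would obtain it from a prompt or key store.
secret = "correct horse battery staple"

# Stand-in child program: reads one line from standard input and
# reports only its length, never the secret itself.
child_code = "import sys; print(len(sys.stdin.readline().rstrip('\\n')))"

# The secret travels over a pipe and does not appear in the argument
# vector, so it is not visible in process listings.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    input=(secret + "\n").encode(),
    stdout=subprocess.PIPE,
    check=True,
)
reported_length = int(result.stdout.decode())
```
]]>
</programlisting>
</informalexample>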
</section>
</section>
<section>
<title>Handling child process termination</title>
<para>
When child processes terminate, the parent process is signalled.
A stub of the terminated process (a
<emphasis>zombie</emphasis>, shown as
<literal>&lt;defunct&gt;</literal> by
<application>ps</application>) is kept around until the status
information is collected (<emphasis>reaped</emphasis>) by the
parent process. Over the years, several interfaces for this
have been invented:
</para>
<itemizedlist>
<listitem>
<para>
The parent process calls <function>wait</function>,
<function>waitpid</function>, <function>waitid</function>,
<function>wait3</function> or <function>wait4</function>,
without specifying a process ID. This will deliver any
matching process ID. This approach is typically used from
within event loops.
</para>
</listitem>
<listitem>
<para>
The parent process calls <function>waitpid</function>,
<function>waitid</function>, or <function>wait4</function>,
with a specific process ID. Only data for the specific
process ID is returned. This is typically used in code
which spawns a single subprocess in a synchronous manner.
</para>
</listitem>
<listitem>
<para>
The parent process installs a handler for the
<literal>SIGCHLD</literal> signal, using
<function>sigaction</function>, and specifies the
<literal>SA_NOCLDWAIT</literal> flag.
This approach could be used by event loops as well.
</para>
</listitem>
</itemizedlist>
<para>
None of these approaches can be used to wait for child process
termination in a completely thread-safe manner. The parent
process might execute an event loop in another thread, which
could pick up the termination signal. This means that libraries
typically cannot make free use of child processes (for example,
to run problematic code with reduced privileges in a separate
address space).
</para>
<para>
At the moment, the parent process should explicitly wait for
termination of the child process using
<function>waitpid</function> or <function>waitid</function>,
and hope that the status is not collected by an event loop
first.
</para>
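<para>
The explicit wait could look like the following Python sketch,
which assumes Python 3.8 or later for
<function>os.posix_spawn</function>. The child program is a
stand-in that exits with a known status, and the parent
environment is passed through unchanged only to keep the example
short.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import os
import sys

# Start a stand-in child that exits with status 3.
child_pid = os.posix_spawn(
    sys.executable,
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    os.environ,
)

# Wait for this specific process ID; other children are left alone.
reaped_pid, status = os.waitpid(child_pid, 0)

# Decode the status: an exit code, or None if a signal ended the child.
exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```
]]>
</programlisting>
</informalexample>
<para>
If another thread reaps the child first, the
<function>os.waitpid</function> call fails with
<literal>ECHILD</literal>, which is the race described above.
</para>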
</section>
<section>
<title><literal>SUID</literal>/<literal>SGID</literal>
processes</title>
<!-- ??? need to document real vs effective UID -->
<para>
Programs can be marked in the file system to indicate to the
kernel that a trust transition should happen if the program is
run. The <literal>SUID</literal> file permission bit indicates
that an executable should run with the effective user ID equal
to the owner of the executable file. Similarly, with the
<literal>SGID</literal> bit, the effective group ID is set to
the group of the executable file.
</para>
<para>
Linux supports <emphasis>fscaps</emphasis>, which can grant
additional capabilities to a process in a finer-grained manner.
Additional mechanisms can be provided by loadable security
modules.
</para>
<para>
When such a trust transition has happened, the process runs in a
potentially hostile environment. Additional care is necessary
not to rely on any untrusted information. These concerns also
apply to libraries which can be linked into such processes.
</para>
<section id="sect-Defensive_Coding-Tasks-secure_getenv">
<title>Accessing environment variables</title>
<para>
The following steps are required so that a program does not
accidentally pick up untrusted data from environment
variables.
</para>
<itemizedlist>
<listitem><para>
Compile your C/C++ sources with <literal>-D_GNU_SOURCE</literal>.
The Autoconf macro <literal>AC_GNU_SOURCE</literal> ensures this.
</para></listitem>
<listitem><para>
Check for the presence of the <function>secure_getenv</function>
and <function>__secure_getenv</function> functions. The Autoconf
directive <literal>AC_CHECK_FUNCS([__secure_getenv secure_getenv])</literal>
performs these checks.
</para></listitem>
<listitem><para>
Arrange for a proper definition of the
<function>secure_getenv</function> function. See <xref
linkend="ex-Defensive_Coding-Tasks-secure_getenv"/>.
</para></listitem>
<listitem><para>
Use <function>secure_getenv</function> instead of
<function>getenv</function> to obtain the value of critical
environment variables. <function>secure_getenv</function>
will pretend the variable has not been set if the process
environment is not trusted.
</para></listitem>
</itemizedlist>
<para>
Critical environment variables are debugging flags,
configuration file locations, plug-in and log file locations,
and anything else that might be used to bypass security
restrictions or cause a privileged process to behave in an
unexpected way.
</para>
<para>
Either the <function>secure_getenv</function> function or the
<function>__secure_getenv</function> function is available from GNU libc.
</para>
<example id="ex-Defensive_Coding-Tasks-secure_getenv">
<title>Obtaining a definition for <function>secure_getenv</function></title>
<programlisting language="C">
<![CDATA[
#include <stdlib.h>
#ifndef HAVE_SECURE_GETENV
# ifdef HAVE__SECURE_GETENV
# define secure_getenv __secure_getenv
# else
# error neither secure_getenv nor __secure_getenv are available
# endif
#endif
]]>
</programlisting>
</example>
</section>
</section>
<section id="sect-Defensive_Coding-Tasks-Processes-Daemons">
<title>Daemons</title>
<para>
Background processes providing system services
(<emphasis>daemons</emphasis>) need to decouple themselves from
the controlling terminal and the parent process environment:
</para>
<itemizedlist>
<listitem>
<para>Fork.</para>
</listitem>
<listitem>
<para>
In the child process, call <function>setsid</function>. The
parent process can simply exit (using
<function>_exit</function>, to avoid running clean-up
actions twice).
</para>
</listitem>
<listitem>
<para>
In the child process, fork again. Processing continues in
the child process. Again, the parent process should just
exit.
</para>
</listitem>
<listitem>
<para>
Replace the descriptors 0, 1, 2 with a descriptor for
<filename>/dev/null</filename>. Logging should be
redirected to <application>syslog</application>.
</para>
</listitem>
</itemizedlist>
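<para>
The steps above can be sketched in Python as follows. The
function name <function>daemonize</function> is an assumption
made for this illustration, and the sketch leaves out the
redirection of logging to <application>syslog</application>.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import os


def daemonize():
    """Detach the calling process from its terminal and parent.

    Sketch only: returns in the final (grandchild) process; the two
    intermediate processes terminate with os._exit.
    """
    if os.fork() > 0:
        os._exit(0)      # original process exits without clean-up actions
    os.setsid()          # new session without a controlling terminal
    if os.fork() > 0:
        os._exit(0)      # the session leader exits as well
    # Replace descriptors 0, 1 and 2 with /dev/null.
    null_fd = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(null_fd, fd)
    if null_fd > 2:
        os.close(null_fd)
```
]]>
</programlisting>
</informalexample>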
<para>
Older instructions for creating daemon processes recommended a
call to <literal>umask(0)</literal>. This is risky because it
often leads to world-writable files and directories, resulting
in security vulnerabilities such as arbitrary process
termination by untrusted local users, or log file truncation.
If the <emphasis>umask</emphasis> needs setting, a restrictive
value such as <literal>027</literal> or <literal>077</literal>
is recommended.
</para>
<para>
Other aspects of the process environment may have to be changed as
well (environment variables, signal handler disposition).
</para>
<para>
It is increasingly common that server processes do not run as
background processes, but as regular foreground processes under a
supervising master process (such as
<application>systemd</application>). Server processes should
offer a command line option which disables forking and
replacement of the standard output and standard error streams.
Such an option is also useful for debugging.
</para>
</section>
<section>
<title>Semantics of command line arguments</title>
<!-- ??? This applies in two ways, safely calling an other process
and support for being called safely. Also need to address
untrusted current directory on USB sticks. -->
<para>
After process creation and option processing, it is up to the
child process to interpret the arguments. Arguments can be
file names, host names, or URLs, and many other things. URLs
can refer to the local network, some server on the Internet,
or to the local file system. Some applications even accept
arbitrary code in arguments (for example,
<application>python</application> with the
<option>-c</option> option).
</para>
<para>
Similar concerns apply to environment variables, the contents
of the current directory and its subdirectories.
<!-- ??? refer to section on temporary directories -->
</para>
<para>
Consequently, careful analysis is required to determine whether
it is safe to pass untrusted data to another program.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Processes-Fork-Parallel">
<title><function>fork</function> as a primitive for parallelism</title>
<para>
A call to <function>fork</function> which is not immediately
followed by a call to <function>execve</function> (perhaps after
rearranging and closing file descriptors) is typically unsafe,
especially from a library which does not control the state of
the entire process. Such use of <function>fork</function>
should be replaced with proper child processes or threads.
</para>
</section>
</chapter>

View file

@ -1,397 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-Serialization">
<title>Serialization and Deserialization</title>
<para>
Protocol decoders and file format parsers are often the
most-exposed part of an application because they are exposed with
little or no user interaction and before any authentication and
security checks are made. They are also difficult to write
robustly in languages which are not memory-safe.
</para>
<section id="sect-Defensive_Coding-Tasks-Serialization-Decoders">
<title>Recommendations for manually written decoders</title>
<para>
For C and C++, the advice in <xref
linkend="sect-Defensive_Coding-C-Pointers"/> applies. In
addition, avoid non-character pointers directly into input
buffers. Pointer misalignment causes crashes on some
architectures.
</para>
<para>
When reading variable-sized objects, do not allocate large
amounts of data solely based on the value of a size field. If
possible, grow the data structure as more data is read from the
source, and stop when no data is available. This helps to avoid
denial-of-service attacks where small amounts of input data
result in enormous memory allocations during decoding.
Alternatively, you can impose reasonable bounds on memory
allocations, but some protocols do not permit this.
</para>
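<para>
The incremental approach might be sketched in Python as shown
below. The function name
<function>read_sized_object</function> and the chunk size are
assumptions made for this illustration; the point is that memory
use follows the data actually received, not the declared size.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import io

CHUNK_SIZE = 64 * 1024   # assumed upper bound for a single read


def read_sized_object(stream, declared_size):
    """Read declared_size bytes, growing the buffer as data arrives."""
    data = bytearray()
    remaining = declared_size
    while remaining > 0:
        chunk = stream.read(min(remaining, CHUNK_SIZE))
        if not chunk:
            # The size field promised more data than the source delivers.
            raise EOFError("input ended before the declared size was reached")
        data += chunk
        remaining -= len(chunk)
    return bytes(data)


payload = read_sized_object(io.BytesIO(b"abcdef"), 4)
```
]]>
</programlisting>
</informalexample>
<para>
A size field claiming a terabyte of data followed by three bytes
of input leads to an <literal>EOFError</literal> after a single
small read, not to a huge allocation.
</para>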
</section>
<section>
<title>Protocol design</title>
<para>
Binary formats with explicit length fields are more difficult to
parse robustly than those where the length of dynamically-sized
elements is derived from sentinel values. A protocol which does
not use length fields and can be written in printable ASCII
characters simplifies testing and debugging. However, binary
protocols with length fields may be more efficient to parse.
</para>
</section>
<section>
<title id="sect-Defensive_Coding-Tasks-Serialization-Library">Library
support for deserialization</title>
<para>
For some languages, generic libraries are available which can
serialize and deserialize user-defined objects. The
deserialization part comes in one of two flavors, depending on
the library. The first kind uses type information in the data
stream to control which objects are instantiated. The second
kind uses type definitions supplied by the programmer. The
first one allows arbitrary object instantiation, the second one
generally does not.
</para>
<para>
The following serialization frameworks are in the first category,
are known to be unsafe, and must not be used for untrusted data:
</para>
<itemizedlist>
<listitem><para>
Python's <package>pickle</package> and <package>cPickle</package>
modules
</para></listitem>
<listitem><para>
Perl's <package>Storable</package> package
</para></listitem>
<listitem><para>
Java serialization (<type>java.io.ObjectInputStream</type>)
</para></listitem>
<listitem><para>
PHP serialization (<function>unserialize</function>)
</para></listitem>
<listitem><para>
Most implementations of YAML
</para></listitem>
</itemizedlist>
<para>
When using a type-directed deserialization format where the
types of the deserialized objects are specified by the
programmer, make sure that the objects which can be instantiated
cannot perform any destructive actions in their destructors,
even when the data members have been manipulated.
</para>
<para>
JSON decoders do not suffer from this problem. But you must not
use the <function>eval</function> function to parse JSON objects
in JavaScript; even with the regular expression filter from RFC
4627, there are still information leaks remaining.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML">
<title>XML serialization</title>
<para>
</para>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-External">
<title>External references</title>
<para>
XML documents can contain external references. They can occur
in various places.
</para>
<itemizedlist>
<listitem>
<para>
In the DTD declaration in the header of an XML document:
</para>
<informalexample>
<programlisting language="XML">
<![CDATA[<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">]]>
</programlisting>
</informalexample>
</listitem>
<listitem>
<para>
In a namespace declaration:
</para>
<informalexample>
<programlisting language="XML">
<![CDATA[<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">]]>
</programlisting>
</informalexample>
</listitem>
<listitem>
<para>
In an entity definition:
</para>
<informalexample>
<programlisting language="XML">
<![CDATA[<!ENTITY sys SYSTEM "http://www.example.com/ent.xml">
<!ENTITY pub PUBLIC "-//Example//Public Entity//EN"
"http://www.example.com/pub-ent.xml">]]>
</programlisting>
</informalexample>
</listitem>
<listitem>
<para>
In a notation:
</para>
<informalexample>
<programlisting language="XML">
<![CDATA[<!NOTATION not SYSTEM "../not.xml">]]>
</programlisting>
</informalexample>
</listitem>
</itemizedlist>
<para>
Originally, these external references were intended as unique
identifiers, but many XML implementations use them
for locating the data for the referenced element. This causes
unwanted network traffic, and may disclose file system
contents or otherwise unreachable network resources, so this
functionality should be disabled.
</para>
<para>
Depending on the XML library, external references might be
processed not just when parsing XML, but also when generating
it.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-Entities">
<title>Entity expansion</title>
<para>
When external DTD processing is disabled, an internal DTD
subset can still contain entity definitions. Entity
declarations can reference other entities. Some XML libraries
expand entities automatically, and this processing cannot be
switched off in some places (such as attribute values or
content models). Without limits on the entity nesting level,
this expansion results in data which can grow exponentially in
length with the size of the input. (If there is a limit on the
nesting level, the growth is still polynomial, unless further
limits are imposed.)
</para>
<para>
Consequently, the processing of internal DTD subsets should be
disabled if possible, and only trusted DTDs should be
processed. If a particular XML application does not permit
such restrictions, then application-specific limits are called
for.
</para>
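<para>
To give a feeling for the exponential growth, the following
Python sketch computes the length of a fully expanded entity
under a simplified model: each entity consists of
<varname>fan_out</varname> references to the entity one level
below, and the innermost entity is a short string. The function
name and the model are assumptions made for this illustration.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
def expanded_length(base_length, fan_out, levels):
    """Length of the top-level entity after full expansion.

    base_length: characters in the innermost entity
    fan_out:     references to the next lower entity per declaration
    levels:      number of nested entity declarations above the base
    """
    return base_length * fan_out ** levels


# Nine levels with ten references each, above a three-character string.
size = expanded_length(3, 10, 9)
```
]]>
</programlisting>
</informalexample>
<para>
In this model, about ten short declarations in the input expand
to three billion characters.
</para>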
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-XInclude">
<title>XInclude processing</title>
<para>
XInclude processing can reference file and network resources
and include them into the document, much like external entity
references. When parsing untrusted XML documents, XInclude
processing should be turned off.
</para>
<para>
XInclude processing is also fairly complex and may pull in
support for the XPointer and XPath specifications,
considerably increasing the amount of code required for XML
processing.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-Validation">
<title>Algorithmic complexity of XML validation</title>
<para>
DTD-based XML validation uses regular expressions for content
models. The XML specification requires that content models
are deterministic, which means that efficient validation is
possible. However, some implementations do not enforce
determinism, and require exponential (or just polynomial)
amount of space or time for validating some DTD/document
combinations.
</para>
<para>
XML schemas and RELAX NG (via the <literal>xsd:</literal>
prefix) directly support textual regular expressions which are
not required to be deterministic.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-Expat">
<title>Using Expat for XML parsing</title>
<para>
By default, Expat does not try to resolve external IDs, so no
steps are required to block them. However, internal entity
declarations are processed. Installing a callback which stops
parsing as soon as such entities are encountered disables
them, see <xref
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-EntityDeclHandler"/>.
Expat does not perform any validation, so there are no
problems related to that.
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-EntityDeclHandler">
<title>Disabling XML entity processing with Expat</title>
<xi:include href="snippets/Serialization-XML-Expat-EntityDeclHandler.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
This handler must be installed when the
<literal>XML_Parser</literal> object is created (<xref
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-Create"/>).
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-Create">
<title>Creating an Expat XML parser</title>
<xi:include href="snippets/Serialization-XML-Expat-Create.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
It is also possible to reject internal DTD subsets altogether,
using a suitable
<literal>XML_StartDoctypeDeclHandler</literal> handler
installed with <function>XML_SetDoctypeDeclHandler</function>.
</para>
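<para>
Python's <literal>xml.parsers.expat</literal> module exposes the
same Expat callbacks, so the technique of aborting on entity
declarations can also be sketched there. The exception class and
function names below are hypothetical; only the
<literal>EntityDeclHandler</literal> attribute and the
<function>Parse</function> method belong to the module.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import xml.parsers.expat


class EntityDeclarationError(Exception):
    """Raised when a document declares an entity (hypothetical name)."""


def _reject_entity_declaration(entity_name, is_parameter_entity, value,
                               base, system_id, public_id, notation_name):
    # Raising from the handler aborts parsing; the exception propagates
    # out of the Parse call below.
    raise EntityDeclarationError("entity declaration: " + entity_name)


def parse_without_entities(data):
    """Parse data, refusing any document that declares entities."""
    parser = xml.parsers.expat.ParserCreate()
    # Install the handler before any input is fed to the parser.
    parser.EntityDeclHandler = _reject_entity_declaration
    parser.Parse(data, True)
```
]]>
</programlisting>
</informalexample>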
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse">
<title>Using OpenJDK for XML parsing and validation</title>
<para>
OpenJDK contains facilities for DOM-based, SAX-based, and
StAX-based document parsing. Documents can be validated
against DTDs or XML schemas.
</para>
<para>
The approach taken to deal with entity expansion differs from
the general recommendation in <xref
linkend="sect-Defensive_Coding-Tasks-Serialization-XML-Entities"/>.
We enable the feature flag
<literal>javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING</literal>,
which enforces heuristic restrictions on the number of entity
expansions. Note that this flag alone does not prevent
resolution of external references (system IDs or public IDs),
so it is slightly misnamed.
</para>
<para>
In the following sections, we use helper classes to prevent
external ID resolution.
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoEntityResolver">
<title>Helper class to prevent DTD external entity resolution in OpenJDK</title>
<xi:include href="snippets/Serialization-XML-OpenJDK-NoEntityResolver.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoResourceResolver">
<title>Helper class to prevent schema resolution in
OpenJDK</title>
<xi:include href="snippets/Serialization-XML-OpenJDK-NoResourceResolver.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-Imports"/>
shows the imports used by the examples.
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-Imports">
<title>Java imports for OpenJDK XML parsing</title>
<xi:include href="snippets/Serialization-XML-OpenJDK-Imports.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM">
<title>DOM-based XML parsing and DTD validation in OpenJDK</title>
<para>
This approach produces a
<literal>org.w3c.dom.Document</literal> object from an input
stream. <xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM"/>
uses the data from the <literal>java.io.InputStream</literal>
instance in the <literal>inputStream</literal> variable.
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM">
<title>DOM-based XML parsing in OpenJDK</title>
<xi:include href="snippets/Serialization-XML-OpenJDK_Parse-DOM.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
External entity references are prohibited using the
<literal>NoEntityResolver</literal> class in
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoEntityResolver"/>.
Because external DTD references are prohibited, DTD validation
(if enabled) will only happen against the internal DTD subset
embedded in the XML document.
</para>
<para>
To validate the document against an external DTD, use a
<literal>javax.xml.transform.Transformer</literal> class to
add the DTD reference to the document, and an entity
resolver which whitelists this external reference.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-SAX">
<title>XML Schema validation in OpenJDK</title>
<para>
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX"/>
shows how to validate a document against an XML Schema,
using a SAX-based approach. The XML data is read from a
<literal>java.io.InputStream</literal> in the
<literal>inputStream</literal> variable.
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX">
<title>SAX-based validation against an XML schema in
OpenJDK</title>
<xi:include href="snippets/Serialization-XML-OpenJDK_Parse-XMLSchema_SAX.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
The <literal>NoResourceResolver</literal> class is defined
in <xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoResourceResolver"/>.
</para>
<para>
If you need to validate a document against an XML schema,
use the code in <xref
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM"/>
to create the document, but do not enable validation at this
point. Then use
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM"/>
to perform the schema-based validation on the
<literal>org.w3c.dom.Document</literal> instance
<literal>document</literal>.
</para>
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM">
<title>Validation of a DOM document against an XML schema in
OpenJDK</title>
<xi:include href="snippets/Serialization-XML-OpenJDK_Parse-XMLSchema_DOM.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
</section>
</section>
</section>
<section>
<title>Protocol Encoders</title>
<para>
For protocol encoders, you should write bytes to a buffer which
grows as needed, using an exponential sizing policy. Explicit
lengths can be patched in later, once they are known.
Allocating the required number of bytes upfront typically
requires separate code to compute the final size, which must be
kept in sync with the actual encoding step, or vulnerabilities
may result. In multi-threaded code, parts of the object being
serialized might change, so that the computed size is out of
date.
</para>
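<para>
A minimal Python sketch of this encoding style is shown below.
The wire format (a four-byte total length followed by fields with
two-byte lengths) and the function name
<function>encode_record</function> are assumptions made for this
illustration. The buffer grows as data is appended, and the
length is patched in at the end.
</para>
<informalexample>
<programlisting language="Python">
<![CDATA[
```python
import struct


def encode_record(fields):
    """Encode byte strings as a length-prefixed record.

    Assumed format: 4-byte big-endian payload length, then for each
    field a 2-byte big-endian length followed by the field bytes.
    """
    buf = bytearray()        # grows as needed while encoding
    buf += b"\0\0\0\0"       # placeholder for the payload length
    for field in fields:
        # struct.pack raises struct.error if the field is too long
        # for the 2-byte length.
        buf += struct.pack(">H", len(field))
        buf += field
    # Patch the real payload length in once it is known.
    struct.pack_into(">I", buf, 0, len(buf) - 4)
    return bytes(buf)


encoded = encode_record([b"ab", b"c"])
```
]]>
</programlisting>
</informalexample>
<para>
Because the length is derived from the bytes actually written,
there is no separate size computation that could drift out of
sync with the encoder.
</para>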
<para>
You should avoid copying data directly from a received packet
during encoding, disregarding the format. Propagating malformed
data could enable attacks on other recipients of that data.
</para>
<para>
When using C or C++ and copying whole data structures directly
into the output, make sure that you do not leak information in
padding bytes between fields or at the end of the
<literal>struct</literal>.
</para>
</section>
</chapter>

View file

@ -1,257 +0,0 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-Temporary_Files">
<title>Temporary files</title>
<para>
In this chapter, we describe how to create temporary files and
directories, how to remove them, and how to work with programs
which do not create files in ways that are safe with a shared
directory for temporary files. General file system manipulation
is treated in a separate chapter, <xref
linkend="chap-Defensive_Coding-Tasks-File_System"/>.
</para>
<para>
Secure creation of temporary files has four different aspects.
</para>
<itemizedlist>
<listitem>
<para>
The location of the directory for temporary files must be
obtained in a secure manner (that is, untrusted environment
variables must be ignored, see <xref
linkend="sect-Defensive_Coding-Tasks-secure_getenv"/>).
</para>
</listitem>
<listitem>
<para>
A new file must be created. Reusing an existing file must be
avoided (the <filename class="directory">/tmp</filename> race
condition). This is tricky because traditionally, system-wide
temporary directories shared by all users are used.
</para>
</listitem>
<listitem>
<para>
The file must be created in a way that makes it impossible for
other users to open it.
</para>
</listitem>
<listitem>
<para>
The descriptor for the temporary file should not leak to
subprocesses.
</para>
</listitem>
</itemizedlist>
<para>
All functions mentioned below will take care of these aspects.
</para>
<para>
Traditionally, temporary files are often used to reduce memory
usage of programs. More and more systems use RAM-based file
systems such as <literal>tmpfs</literal> for storing temporary
files, to increase performance and decrease wear on Flash storage.
As a result, on such systems, spooling data to temporary files
does not save any memory, and the related complexity can be avoided
if the data is kept in process memory.
</para>
<section id="chap-Defensive_Coding-Tasks-Temporary_Files-Location">
<title>Obtaining the location of the temporary directory</title>
<para>
Some functions below need the location of a directory which
stores temporary files. For C/C++ programs, use the following
steps to obtain that directory:
</para>
<itemizedlist>
<listitem>
<para>
Use <function>secure_getenv</function> to obtain the value
of the <literal>TMPDIR</literal> environment variable. If
it is set, convert the path to a fully-resolved absolute
path, using <literal>realpath(path, NULL)</literal>. If the
resolved path refers to a writable directory, use it as the
temporary directory.
</para>
</listitem>
<listitem>
<para>
Otherwise, fall back to <filename class="directory">/tmp</filename>.
</para>
</listitem>
</itemizedlist>
<para>
In Python, you can use the <function>tempfile.gettempdir()</function>
function.
</para>
<para>
Java does not support SUID/SGID programs, so you can use the
<function>java.lang.System.getenv(String)</function> method to
obtain the value of the <literal>TMPDIR</literal> environment
variable, and follow the two steps described above. (Java's
default directory selection does not honor
<literal>TMPDIR</literal>.)
</para>
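<para>
The two steps described above can be sketched as follows, written
in Python for brevity. This is an illustration only: Python has no
counterpart of <function>secure_getenv</function>, so the sketch
assumes that the process does not run with elevated privileges.
The function name <function>temporary_directory</function> and its
<literal>environ</literal> parameter are assumptions.
</para>
<informalexample>
<programlisting language="Python"><![CDATA[
```python
import os


def temporary_directory(environ=os.environ):
    """Return the directory to use for temporary files."""
    path = environ.get("TMPDIR")
    if path:
        # Absolute path with symbolic links resolved.
        resolved = os.path.realpath(path)
        # Use it only if it is a directory and writable.
        if os.path.isdir(resolved) and os.access(resolved, os.W_OK):
            return resolved
    # TMPDIR is unset or unusable: fall back to /tmp.
    return "/tmp"
```
]]></programlisting>
</informalexample>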
</section>
<section>
<title>Named temporary files</title>
<para>
The <function>mkostemp</function> function creates a named
temporary file. You should specify the
<literal>O_CLOEXEC</literal> flag to avoid file descriptor leaks
to subprocesses. (Applications which do not use multiple threads
can also use <function>mkstemp</function>, but libraries should
use <function>mkostemp</function>.) For determining the
directory part of the file name pattern, see <xref
linkend="chap-Defensive_Coding-Tasks-Temporary_Files-Location"/>.
</para>
<para>
The file is not removed automatically. It is not safe to rename
or delete the file before processing, or transform the name in
any way (for example, by adding a file extension). If you need
multiple temporary files, call <function>mkostemp</function>
multiple times. Do not create additional file names derived
from the name provided by a previous
<function>mkostemp</function> call. However, it is safe to close
the descriptor returned by <function>mkostemp</function> and
reopen the file using the generated name.
</para>
<para>
The Python class <literal>tempfile.NamedTemporaryFile</literal>
provides similar functionality, except that the file is deleted
automatically by default. Note that you may have to use the
<literal>file</literal> attribute to obtain the actual file
object because some programming interfaces cannot deal with
file-like objects. Functionality similar to the C function
<function>mkostemp</function> is available as
<function>tempfile.mkstemp</function>.
</para>
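<para>
As a hedged illustration, the following Python sketch creates a
named temporary file with <function>tempfile.mkstemp</function>,
writes to it, and returns the generated name. The helper name
<function>write_named_temporary</function> and the
<literal>example-</literal> prefix are assumptions. The file is
created with access restricted to its owner, and the descriptor is
not inherited by child processes.
</para>
<informalexample>
<programlisting language="Python"><![CDATA[
```python
import os
import tempfile


def write_named_temporary(data, directory=None):
    """Create a new temporary file, write data to it, return its name."""
    # mkstemp always creates a new file; the caller must remove it later.
    fd, name = tempfile.mkstemp(prefix="example-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as stream:
            stream.write(data)
    except BaseException:
        os.unlink(name)          # do not leave a partial file behind
        raise
    return name
```
]]></programlisting>
</informalexample>
<para>
The returned name can be reopened as it is, but it should not be
renamed or used to derive further file names. The caller removes
the file with <function>os.unlink</function> when it is no longer
needed.
</para>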
<para>
In Java, you can use the
<function>java.io.File.createTempFile(String, String,
File)</function> function, using the temporary file location
determined according to <xref
linkend="chap-Defensive_Coding-Tasks-Temporary_Files-Location"/>.
Do not use <function>java.io.File.deleteOnExit()</function> to
delete temporary files, and do not register a shutdown hook for
each temporary file you create. In both cases, the deletion
hint cannot be removed from the system if you delete the
temporary file prior to termination of the VM, causing a memory
leak.
</para>
</section>
<section>
<title>Temporary files without names</title>
<para>
The <function>tmpfile</function> function creates a temporary
file and immediately deletes it, while keeping the file open.
As a result, the file lacks a name and its space is deallocated
as soon as the file descriptor is closed (including the implicit
close when the process terminates). This avoids cluttering the
temporary directory with orphaned files.
</para>
<para>
Alternatively, if the maximum size of the temporary file is
known beforehand, the <function>fmemopen</function> function can
be used to create a <literal>FILE *</literal> object which is
backed by memory.
</para>
<para>
In Python, unnamed temporary files are provided by the
<literal>tempfile.TemporaryFile</literal> class, and the
<literal>tempfile.SpooledTemporaryFile</literal> class provides
a way to avoid creation of small temporary files.
</para>
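<para>
A short sketch of both Python classes follows. The function names
and the 64 KiB threshold are assumptions chosen for illustration.
</para>
<informalexample>
<programlisting language="Python"><![CDATA[
```python
import tempfile


def roundtrip_unnamed(data):
    """Write data to an unnamed temporary file and read it back."""
    # The file is removed automatically when it is closed.
    with tempfile.TemporaryFile() as stream:
        stream.write(data)
        stream.seek(0)
        return stream.read()


def spool_and_rewind(chunks, max_memory=64 * 1024):
    """Collect chunks in a SpooledTemporaryFile and rewind it."""
    # Data stays in memory until max_memory bytes are exceeded; only
    # then is it moved to a temporary file.
    spool = tempfile.SpooledTemporaryFile(max_size=max_memory)
    for chunk in chunks:
        spool.write(chunk)
    spool.seek(0)
    return spool
```
]]></programlisting>
</informalexample>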
<para>
Java does not support unnamed temporary files.
</para>
</section>
<section id="chap-Defensive_Coding-Tasks-Temporary_Directory">
<title>Temporary directories</title>
<para>
The <function>mkdtemp</function> function can be used to create
a temporary directory. (For determining the directory part of
the file name pattern, see <xref
linkend="chap-Defensive_Coding-Tasks-Temporary_Files-Location"/>.)
The directory is not automatically removed. In Python, this
function is available as <function>tempfile.mkdtemp</function>.
In Java 7, temporary directories can be created using the
<function>java.nio.file.Files.createTempDirectory(Path, String,
FileAttribute...)</function> function.
</para>
<para>
When creating files in the temporary directory, use
automatically generated names, e.g., derived from a sequential
counter. Files with externally provided names could be picked
up in unexpected contexts, and crafted names could actually
point outside of the temporary directory (due to
<emphasis>directory traversal</emphasis>).
</para>
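<para>
One possible way to generate such names is sketched below in
Python. The class name <literal>Scratch</literal>, the
<literal>scratch-</literal> prefix, and the
<literal>.dat</literal> naming scheme are assumptions, not part of
any library.
</para>
<informalexample>
<programlisting language="Python"><![CDATA[
```python
import itertools
import os
import tempfile


class Scratch:
    """Temporary directory whose files get sequential, generated names."""

    def __init__(self, parent=None):
        # mkdtemp creates a new directory accessible only to its owner.
        self.path = tempfile.mkdtemp(prefix="scratch-", dir=parent)
        self._counter = itertools.count(1)

    def new_file(self, data):
        # The name comes from a counter, never from external input.
        name = os.path.join(self.path, "%06d.dat" % next(self._counter))
        # Mode "x" requests exclusive creation: an existing file is
        # never reused.
        with open(name, "xb") as stream:
            stream.write(data)
        return name
```
]]></programlisting>
</informalexample>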
<para>
Removing a directory tree in a completely safe manner is
complicated. Unless there are overriding performance concerns,
the <application>rm</application> program should be used, with
the <option>-rf</option> and <option>--</option> options.
</para>
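<para>
From Python, the <application>rm</application> program could be
invoked as in the following sketch. It assumes that
<application>rm</application> can be found on the search path; the
function name <function>remove_tree</function> is an assumption.
</para>
<informalexample>
<programlisting language="Python"><![CDATA[
```python
import subprocess


def remove_tree(path):
    """Remove a directory tree by running: rm -rf -- path"""
    # "--" ends option parsing, so a path starting with "-" is not
    # mistaken for an option. The argument list is passed directly,
    # without involving a shell.
    subprocess.run(["rm", "-rf", "--", path], check=True)
```
]]></programlisting>
</informalexample>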
</section>
<section>
<title>Compensating for unsafe file creation</title>
<para>
There are two ways to make a function or program which expects a
file name safe for use with temporary files. See
<xref linkend="sect-Defensive_Coding-Tasks-Processes-Creation"/>
for details on subprocess creation.
</para>
<itemizedlist>
<listitem>
<para>
Create a temporary directory and place the file there. If
possible, run the program in a subprocess which uses the
temporary directory as its current directory, with a
restricted environment.
Use generated names for all files in that temporary
directory. (See <xref
linkend="chap-Defensive_Coding-Tasks-Temporary_Directory"/>.)
</para>
</listitem>
<listitem>
<para>
Create the temporary file and pass the generated file name
to the function or program. This only works if the function
or program can cope with a zero-length existing file. It is
safe only under additional assumptions:
</para>
<itemizedlist>
<listitem>
<para>
The function or program must not create additional files
whose names are derived from the specified file name or
are otherwise predictable.
</para>
</listitem>
<listitem>
<para>
The function or program must not delete the file before
processing it.
</para>
</listitem>
<listitem>
<para>
It must not access any existing files in the same
directory.
</para>
</listitem>
</itemizedlist>
<para>
It is often difficult to check whether these additional
assumptions hold, so this approach is not
recommended.
</para>
</listitem>
</itemizedlist>
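<para>
The first approach might look like the following Python sketch.
It is an illustration under stated assumptions: the function name
<function>run_in_private_directory</function>, the file name
<filename>input-000001</filename>, and the restricted
<literal>PATH</literal> value are hypothetical, and the program
being run is expected to refer to its files by relative name.
</para>
<informalexample>
<programlisting language="Python"><![CDATA[
```python
import os
import subprocess
import tempfile


def run_in_private_directory(argv, input_data):
    """Run argv with a fresh temporary directory as current directory.

    Returns the directory path; the caller removes it when done.
    """
    workdir = tempfile.mkdtemp(prefix="work-")
    # Generated name; the program is told to read "input-000001".
    with open(os.path.join(workdir, "input-000001"), "xb") as stream:
        stream.write(input_data)
    subprocess.run(
        argv,
        cwd=workdir,                     # relative names land in workdir
        env={"PATH": "/usr/bin:/bin"},   # restricted environment (assumed)
        stdin=subprocess.DEVNULL,
        check=True,
    )
    return workdir
```
]]></programlisting>
</informalexample>
<para>
Because the directory is new and accessible only to its owner,
file names chosen by the program cannot collide with files planted
by other users.
</para>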
</section>
</chapter>

View file

@ -1,4 +0,0 @@
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
<include rules="../../schemas.xml"/>
</locatingRules>