defensive-coding-guide/defensive-coding/en-US/C/Language.xml
2013-03-11 18:11:16 -04:00

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-C-Language">
<title>The core language</title>
<para>
C provides no memory safety. Most recommendations in this section
deal with this aspect of the language.
</para>
<section id="sect-Defensive_Coding-C-Undefined">
<title>Undefined behavior</title>
<para>
Some C constructs are declared undefined by the C standard.
This means more than that the standard does not describe
what happens when the construct is executed: it also allows
optimizing compilers such as GCC to assume that the
construct is never reached. In some cases, this has caused
GCC to optimize security checks away. (This is not a flaw in GCC
or the C language, but some areas of C are certainly more
difficult to use than others.)
</para>
<para>
Common sources of undefined behavior are:
</para>
<itemizedlist>
<listitem><para>out-of-bounds array accesses</para></listitem>
<listitem><para>null pointer dereferences</para></listitem>
<listitem><para>overflow in signed integer arithmetic</para></listitem>
</itemizedlist>
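<para>
As a minimal sketch of how a security check can be lost, consider
a hypothetical <type>struct packet</type> and a helper
<function>packet_len</function> (both invented for this
illustration). Because a null pointer dereference is undefined, a
compiler may treat a pointer that has already been dereferenced as
non-null and drop a later null check. Testing the pointer before
the first dereference avoids the problem.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct packet {
    size_t len;
};

/*
 * Risky ordering (shown only as a comment):
 *
 *     size_t len = p->len;   // undefined if p is NULL
 *     if (p == NULL)         // the compiler may assume this is false
 *         return 0;
 *
 * Safe ordering: test the pointer before the first dereference.
 */
size_t packet_len(const struct packet *p)
{
    if (p == NULL)
        return 0;
    return p->len;
}
```
]]></programlisting>
</informalexample>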
</section>
<section id="sect-Defensive_Coding-C-Pointers">
<title>Recommendations for pointers and array handling</title>
<para>
Always keep track of the size of the array you are working with.
Often, code is more obviously correct when you keep a pointer
past the last element of the array, and calculate the number of
remaining elements by subtracting the current position from
that pointer. The alternative, updating a separate variable
every time the position is advanced, is usually less
obviously correct.
</para>
<para>
<xref linkend="ex-Defensive_Coding-C-Pointers-remaining"/>
shows how to extract Pascal-style strings from a character
buffer. The two pointers kept for length checks are
<varname>inend</varname> and <varname>outend</varname>.
<varname>inp</varname> and <varname>outp</varname> are the
respective positions.
The number of input bytes is checked using the expression
<literal>len &gt; (size_t)(inend - inp)</literal>.
The cast silences a compiler warning about comparing signed and
unsigned values. It is safe because <varname>inend</varname> is
never smaller than <varname>inp</varname>, so the difference is
never negative.
</para>
<example id="ex-Defensive_Coding-C-Pointers-remaining">
<title>Array processing in C</title>
<xi:include href="snippets/Pointers-remaining.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
It is important that the length checks always have the form
<literal>len &gt; (size_t)(inend - inp)</literal>, where
<varname>len</varname> is a variable of type
<type>size_t</type> which denotes the <emphasis>total</emphasis>
number of bytes which are about to be read or written next. In
general, it is not safe to fold multiple such checks into one,
as in <literal>len1 + len2 &gt; (size_t)(inend - inp)</literal>,
because the expression on the left can overflow or wrap around
(see <xref linkend="sect-Defensive_Coding-C-Arithmetic"/>), in
which case it no longer reflects the number of bytes to be
processed.
</para>
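<para>
The following sketch illustrates the point with a hypothetical
function <function>read_two_fields</function>, which is not part of
the example above. It copies two consecutive fields and checks each
length separately against the remaining input. The sketch assumes
that the caller provides output buffers of at least
<varname>len1</varname> and <varname>len2</varname> bytes. With a
folded check, a huge <varname>len1</varname> could make the sum wrap
around to a small value and pass.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies len1 bytes and then len2 bytes from
 * the input range [inp, inend) into out1 and out2.
 * Returns 0 on success and -1 if the input is too short.
 * Assumption: out1 and out2 have room for len1 and len2 bytes.
 */
int read_two_fields(const unsigned char *inp, const unsigned char *inend,
                    unsigned char *out1, size_t len1,
                    unsigned char *out2, size_t len2)
{
    /* First check: only len1 is compared with the remaining bytes. */
    if (len1 > (size_t)(inend - inp))
        return -1;
    memcpy(out1, inp, len1);
    inp += len1;

    /* Second check: the remaining byte count has shrunk by len1. */
    if (len2 > (size_t)(inend - inp))
        return -1;
    memcpy(out2, inp, len2);
    return 0;
}
```
]]></programlisting>
</informalexample>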
</section>
<section id="sect-Defensive_Coding-C-Arithmetic">
<title>Recommendations for integer arithmetic</title>
<para>
Overflow in signed integer arithmetic is undefined. This means
that it is not possible to check for overflow after it has happened
(see <xref linkend="ex-Defensive_Coding-C-Arithmetic-bad"/>).
</para>
<example id="ex-Defensive_Coding-C-Arithmetic-bad">
<title>Incorrect overflow detection in C</title>
<xi:include href="snippets/Arithmetic-add.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
The following approaches can be used to check for overflow,
without actually causing it.
</para>
<itemizedlist>
<listitem>
<para>
Use a wider type to perform the calculation, check that the
result is within bounds, and convert the result to the
original type. All intermediate results must be checked in
this way.
</para>
</listitem>
<listitem>
<para>
Perform the calculation in the corresponding unsigned type
and use bit fiddling to detect the overflow.
</para>
</listitem>
<listitem>
<para>
Compute bounds for acceptable input values which are known
to avoid overflow, and reject other values. This is the
preferred way for overflow checking on multiplications,
see <xref linkend="ex-Defensive_Coding-C-Arithmetic-mult"/>.
<!-- This approach can result in bogus compiler warnings
with signed types:
http://gcc.gnu.org/bugzilla/post_bug.cgi -->
</para>
</listitem>
</itemizedlist>
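<para>
The first two approaches in the list might be sketched as follows
for 32-bit addition. The function names
<function>add_int32_wide</function> and
<function>add_int32_unsigned</function> are hypothetical. The second
function converts an unsigned result back to a signed type, which is
implementation-defined in standard C; the sketch assumes the
2's complement conversion that GCC provides.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* Approach 1: compute in a wider type, then check the bounds. */
bool add_int32_wide(int32_t a, int32_t b, int32_t *result)
{
    /* The sum of two 32-bit values always fits into 64 bits. */
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    *result = (int32_t)wide;
    return true;
}

/* Approach 2: compute in the unsigned type and inspect the sign bits. */
bool add_int32_unsigned(int32_t a, int32_t b, int32_t *result)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;  /* wraps modulo 2^32, which is well defined */

    /*
     * The signed addition overflows exactly when the sign bit of the
     * sum differs from the sign bits of both operands.
     */
    if (((ua ^ sum) & (ub ^ sum)) >> 31)
        return false;

    /* Assumption: the conversion wraps (2's complement), as with GCC. */
    *result = (int32_t)sum;
    return true;
}
```
]]></programlisting>
</informalexample>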
<example id="ex-Defensive_Coding-C-Arithmetic-mult">
<title>Overflow checking for unsigned multiplication</title>
<xi:include href="snippets/Arithmetic-mult.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
Addition and multiplication are commutative, so for bounds checks,
there are two different but mathematically equivalent
expressions. Sometimes, one of the expressions results in
better code because parts of it can be reduced to a constant.
This applies to overflow checks for multiplication <literal>a *
b</literal> involving a constant <literal>a</literal>, where the
expression is reduced to <literal>b &gt; C</literal> for some
constant <literal>C</literal> determined at compile time. The
other expression, <literal>b &amp;&amp; a &gt; ((unsigned)-1) /
b</literal>, is more difficult to optimize at compile time.
</para>
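<para>
A minimal sketch of the constant case, assuming a hypothetical
element size <literal>ELEMENT_SIZE</literal> and a hypothetical
function <function>array_bytes</function>: the division involves
only constants, so the compiler can evaluate it at compile time and
the check becomes a single comparison.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
```c
#include <stdbool.h>

/* Hypothetical element size; any compile-time constant works. */
#define ELEMENT_SIZE 16u

/*
 * Computes ELEMENT_SIZE * count without wrapping around.
 * Returns false if the product does not fit into unsigned.
 */
bool array_bytes(unsigned count, unsigned *result)
{
    /* The right-hand side consists of constants only. */
    if (count > ((unsigned)-1) / ELEMENT_SIZE)
        return false;
    *result = ELEMENT_SIZE * count;
    return true;
}
```
]]></programlisting>
</informalexample>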
<para>
When a value is converted to a signed integer type that cannot
represent it, the C standard leaves the result
implementation-defined. GCC always chooses the result based on
2's complement arithmetic. This GCC behavior (which is also
implemented by other compilers) helps a lot when implementing
overflow checks.
</para>
<para>
Legacy code should be compiled with the <option>-fwrapv</option>
GCC option. With this option, GCC provides 2's complement
semantics for signed integer arithmetic, so that signed overflow
wraps around instead of being undefined.
</para>
</section>
</section>