<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<section id="sect-Defensive_Coding-C-Language">
  <title>The core language</title>
  <para>
    C provides no memory safety.  Most recommendations in this section
    deal with this aspect of the language.
  </para>

  <section id="sect-Defensive_Coding-C-Undefined">
    <title>Undefined behavior</title>
    <para>
      The C standard leaves the behavior of some constructs undefined.
      This means more than that the standard does not describe what
      happens when the construct is executed.  It also allows
      optimizing compilers such as GCC to assume that the construct
      is never reached.  In some cases, this has caused GCC to
      optimize security checks away.  (This is not a flaw in GCC or
      the C language.  But C certainly has some areas which are more
      difficult to use than others.)
    </para>

    <para>
      Common sources of undefined behavior are:
    </para>
    <itemizedlist>
      <listitem><para>out-of-bounds array accesses</para></listitem>
      <listitem><para>null pointer dereferences</para></listitem>
      <listitem><para>overflow in signed integer arithmetic</para></listitem>
    </itemizedlist>
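    <para>
      As an illustration of how a check can be optimized away, the
      following sketch contrasts a null pointer check that comes too
      late with one that comes first.  The type <type>struct
      device</type> and the function <function>device_flags</function>
      are hypothetical names introduced only for this example.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stddef.h>

/* Hypothetical type, introduced only for this illustration. */
struct device {
  int flags;
};

/*
 * Problematic ordering (shown as a comment, not compiled):
 *
 *   int flags = dev->flags;   // dereference happens first
 *   if (dev == NULL)          // compiler may assume this is false
 *     return -1;
 *
 * Dereferencing a null pointer is undefined, so the compiler may
 * conclude that dev cannot be null at the check and remove it.
 */

/* Check first, dereference afterwards. */
int
device_flags(const struct device *dev)
{
  if (dev == NULL)
    return -1;
  return dev->flags;
}
```
]]></programlisting>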
  </section>

  <section id="sect-Defensive_Coding-C-Pointers">
    <title>Recommendations for pointers and array handling</title>
    <para>
      Always keep track of the size of the array you are working with.
      Often, code is more obviously correct when you keep a pointer
      past the last element of the array, and calculate the number of
      remaining elements by subtracting the current position from
      that pointer.  The alternative, updating a separate variable
      every time the position is advanced, is usually less
      obviously correct.
    </para>
    <para>
      <xref linkend="ex-Defensive_Coding-C-Pointers-remaining"/>
      shows how to extract Pascal-style strings from a character
      buffer.  The two pointers kept for length checks are
      <varname>inend</varname> and <varname>outend</varname>.
      <varname>inp</varname> and <varname>outp</varname> are the
      respective positions.
      The number of input bytes is checked using the expression
      <literal>len > (size_t)(inend - inp)</literal>.
      The cast silences a compiler warning; it is safe because
      <varname>inend</varname> is never smaller than
      <varname>inp</varname>, so the difference is not negative.
    </para>
    <example id="ex-Defensive_Coding-C-Pointers-remaining">
      <title>Array processing in C</title>
      <xi:include href="snippets/Pointers-remaining.xml"
		  xmlns:xi="http://www.w3.org/2001/XInclude" />
    </example>
    <para>
      It is important that the length checks always have the form
      <literal>len > (size_t)(inend - inp)</literal>, where
      <varname>len</varname> is a variable of type
      <type>size_t</type> which denotes the <emphasis>total</emphasis>
      number of bytes which are about to be read or written next.  In
      general, it is not safe to fold multiple such checks into one,
      as in <literal>len1 + len2 > (size_t)(inend - inp)</literal>,
      because the expression on the left can overflow or wrap around
      (see <xref linkend="sect-Defensive_Coding-C-Arithmetic"/>), and it
      no longer reflects the number of bytes to be processed.
    </para>
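    <para>
      A minimal sketch of keeping the checks separate is shown below.
      The function name <function>have_two_fields</function> and its
      parameters are assumptions made for this illustration; the
      function only reports whether two consecutive fields fit into
      the remaining input.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if len1 bytes followed by len2 bytes can be read
   from the range [inp, inend).  Assumes inp <= inend.  */
bool
have_two_fields(const char *inp, const char *inend,
                size_t len1, size_t len2)
{
  /* First check: len1 against the bytes remaining.  */
  if (len1 > (size_t)(inend - inp))
    return false;
  inp += len1;  /* Safe: len1 bytes are known to be available.  */

  /* Second check: len2 against what is left after the first field.  */
  if (len2 > (size_t)(inend - inp))
    return false;
  return true;
}
```
]]></programlisting>
    <para>
      With a 16-byte buffer, <literal>len1</literal> equal to
      <literal>(size_t)-1</literal> and <literal>len2</literal> equal
      to <literal>2</literal>, the folded check would compute a
      wrapped sum of <literal>1</literal> and accept the input, while
      the separate checks reject it.
    </para>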
  </section>

  <section id="sect-Defensive_Coding-C-Arithmetic">
    <title>Recommendations for integer arithmetic</title>
    <para>
      Overflow in signed integer arithmetic is undefined.  This means
      that it is not possible to check for overflow after it has
      happened; see <xref linkend="ex-Defensive_Coding-C-Arithmetic-bad"/>.
    </para>
    <example id="ex-Defensive_Coding-C-Arithmetic-bad">
      <title>Incorrect overflow detection in C</title>
      <xi:include href="snippets/Arithmetic-add.xml"
		  xmlns:xi="http://www.w3.org/2001/XInclude" />
    </example>
    <para>
      The following approaches can be used to check for overflow
      without actually causing it.
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  Use a wider type to perform the calculation, check that the
	  result is within bounds, and convert the result to the
	  original type.  All intermediate results must be checked in
	  this way.
	</para>
      </listitem>
      <listitem>
	<para>
	  Perform the calculation in the corresponding unsigned type
	  and use bit fiddling to detect the overflow.
	</para>
      </listitem>
      <listitem>
	<para>
	  Compute bounds for acceptable input values which are known
	  to avoid overflow, and reject other values.  This is the
	  preferred way for overflow checking on multiplications,
	  see <xref linkend="ex-Defensive_Coding-C-Arithmetic-mult"/>.
	  <!-- This approach can result in bogus compiler warnings
	       with signed types:
	       http://gcc.gnu.org/bugzilla/post_bug.cgi -->
	</para>
      </listitem>
    </itemizedlist>
    <example id="ex-Defensive_Coding-C-Arithmetic-mult">
      <title>Overflow checking for unsigned multiplication</title>
      <xi:include href="snippets/Arithmetic-mult.xml"
		  xmlns:xi="http://www.w3.org/2001/XInclude" />
    </example>
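    <para>
      The first two approaches in the list above might be sketched as
      follows for the addition of two <type>int</type> values.  The
      function names are hypothetical.  The first function assumes
      that <type>long long</type> is wider than <type>int</type>; the
      second assumes a 2's complement representation in which
      <type>unsigned</type> and <type>int</type> have the same width.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <limits.h>
#include <stdbool.h>

/* Approach 1: compute in a wider type, then check the bounds.
   Assumes long long is wider than int.  */
bool
add_int_wide(int a, int b, int *result)
{
  long long sum = (long long)a + (long long)b;
  if (sum < INT_MIN || sum > INT_MAX)
    return false;
  *result = (int)sum;
  return true;
}

/* Approach 2: compute in the unsigned type, where wraparound is
   defined.  The addition overflowed if and only if the sign of the
   sum differs from the sign of both operands.  */
bool
add_int_overflows(int a, int b)
{
  unsigned ua = (unsigned)a;
  unsigned ub = (unsigned)b;
  unsigned sum = ua + ub;
  unsigned sign_bit = ~(UINT_MAX >> 1);
  return ((ua ^ sum) & (ub ^ sum) & sign_bit) != 0;
}
```
]]></programlisting>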
    <para>
      Addition and multiplication are commutative, so for bounds checks,
      there are two different but mathematically equivalent
      expressions.  Sometimes, one of the expressions results in
      better code because parts of it can be reduced to a constant.
      This applies to overflow checks for multiplication <literal>a *
      b</literal> involving a constant <literal>a</literal>, where the
      expression is reduced to <literal>b &gt; C</literal> for some
      constant <literal>C</literal> determined at compile time.  The
      other expression, <literal>b &amp;&amp; a > ((unsigned)-1) /
      b</literal>, is more difficult to optimize at compile time.
    </para>
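    <para>
      As a sketch of the constant case, consider computing the size
      of an array of fixed-size records.  The type <type>struct
      record</type> and the function
      <function>record_array_size</function> are hypothetical, and
      <literal>SIZE_MAX</literal> is used in place of
      <literal>(unsigned)-1</literal> because the operands have type
      <type>size_t</type>.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, introduced only for this illustration. */
struct record {
  char payload[24];
};

/* Stores count * sizeof(struct record) in *size and returns true,
   or returns false if the multiplication would wrap around.  */
bool
record_array_size(size_t count, size_t *size)
{
  /* SIZE_MAX / sizeof(struct record) is a compile-time constant,
     so the check is a single comparison at run time.  */
  if (count > SIZE_MAX / sizeof(struct record))
    return false;
  *size = count * sizeof(struct record);
  return true;
}
```
]]></programlisting>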
    <para>
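      When a value is converted to a signed integer type which cannot
      represent it, the C standard leaves the result
      implementation-defined.  GCC always chooses the result based on
      2's complement arithmetic, reducing the value modulo a power of
      two so that it fits the target type.  This GCC behavior (which
      is also implemented by other compilers) helps a lot when
      implementing overflow checks.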
    </para>
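    <para>
      The following sketch relies on this conversion behavior.  It
      performs the addition in the unsigned type, converts the wrapped
      sum back to <type>int</type>, and only then compares it against
      an operand.  The function name
      <function>add_int_wrapping</function> is hypothetical, and the
      result of the conversion is only guaranteed on compilers which
      document it in the way GCC does.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <limits.h>
#include <stdbool.h>

/* Stores the wrapped sum of a and b in *result.  Returns true if
   the mathematical sum does not fit in an int.  */
bool
add_int_wrapping(int a, int b, int *result)
{
  /* Unsigned addition wraps around and is always defined.  */
  unsigned sum = (unsigned)a + (unsigned)b;

  /* Implementation-defined if sum > INT_MAX; GCC yields the
     2's complement value.  */
  *result = (int)sum;

  /* Adding a non-negative b must not decrease the value; adding
     a negative b must decrease it.  Otherwise the sum wrapped.  */
  return (b >= 0) ? (*result < a) : (*result > a);
}
```
]]></programlisting>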
    <para>
      Legacy code should be compiled with the <option>-fwrapv</option>
      GCC option.  As a result, GCC will provide 2's complement
      semantics for signed integer arithmetic, so that signed integer
      overflow wraps around instead of being undefined.
    </para>
  </section>
</section>