

:experimental:

[[sect-Defensive_Coding-C-Language]]
== The Core Language

C provides no memory safety. Most recommendations in this section
deal with this aspect of the language.

[[sect-Defensive_Coding-C-Undefined]]
=== Undefined Behavior

The C standard leaves the behavior of some constructs undefined.
This means more than that the standard does not describe
what happens when such a construct is executed: optimizing
compilers such as GCC are also allowed to assume that the
construct is never reached. In some cases, this has caused
GCC to optimize security checks away. (This is not a flaw in GCC
or the C language, but some areas of C are certainly more
difficult to use correctly than others.)

Common sources of undefined behavior are:

* out-of-bounds array accesses

* null pointer dereferences

* overflow in signed integer arithmetic
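
As an illustration of how undefined behavior interacts with
security checks, consider a null pointer check. If the pointer is
dereferenced before it is checked, the compiler may conclude that
it cannot be null and remove the check. The following minimal
sketch shows the safe ordering; `struct packet` and
`packet_length` are hypothetical names chosen for this example.

[source,c]
----
#include <stddef.h>

/* Hypothetical structure used only for this illustration. */
struct packet {
  size_t length;
};

/* Returns the length field, or 0 if no packet was supplied.
   The null check comes before any access through p.  If
   p->length were read first, the compiler would be allowed to
   assume that p is not null and to delete the check. */
size_t
packet_length(const struct packet *p)
{
  if (p == NULL) {
    return 0;
  }
  return p->length;
}
----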

[[sect-Defensive_Coding-C-Pointers]]
=== Recommendations for Pointers and Array Handling

Always keep track of the size of the array you are working with.
Code is often more obviously correct when you keep a pointer
just past the last element of the array and calculate the number
of remaining elements by subtracting the current position from
that pointer. The alternative, updating a separate variable
every time the position is advanced, is usually less
obviously correct.

<<ex-Defensive_Coding-C-Pointers-remaining>>
shows how to extract Pascal-style strings from a character
buffer. The two pointers kept for length checks are
`inend` and `outend`.
`inp` and `outp` are the
respective positions.
The number of input bytes is checked using the expression
`len > (size_t)(inend - inp)`.
The cast silences a compiler warning; it is safe because
`inend` is never smaller than
`inp`, so the difference cannot be negative.

[[ex-Defensive_Coding-C-Pointers-remaining]]
.Array processing in C
====

[source,c]
----
include::example$C-Pointers-remaining.adoc[]

----

====

It is important that the length checks always have the form
`len > (size_t)(inend - inp)`, where
`len` is a variable of type
`size_t` which denotes the *total*
number of bytes which are about to be read or written next. In
general, it is not safe to fold multiple such checks into one,
as in `len1 + len2 > (size_t)(inend - inp)`,
because the expression on the left can overflow or wrap around
(see <<sect-Defensive_Coding-C-Arithmetic>>), and it
no longer reflects the number of bytes to be processed.
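
The following sketch shows two length checks that are kept
separate instead of being folded into one. The function name
`two_fields_fit` and its parameters are assumptions made for this
illustration; only the form of the check is taken from the text
above.

[source,c]
----
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if two consecutive fields of
   len1 and len2 bytes fit into the range [inp, inend). */
bool
two_fields_fit(const char *inp, const char *inend,
               size_t len1, size_t len2)
{
  /* First check: len1 against everything that remains. */
  if (len1 > (size_t)(inend - inp)) {
    return false;
  }
  /* Safe to advance: len1 bytes are known to be available. */
  inp += len1;
  /* Second check: len2 against what remains after the first field. */
  if (len2 > (size_t)(inend - inp)) {
    return false;
  }
  return true;
}
----

With `len1` equal to `(size_t)-1` and `len2` equal to `2`, the
folded sum `len1 + len2` wraps around to `1` and would pass a
combined check, while the separate checks reject the input.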

[[sect-Defensive_Coding-C-Arithmetic]]
=== Recommendations for Integer Arithmetic

Overflow in signed integer arithmetic is undefined. This means
that it is not possible to check for overflow after it has
happened; see <<ex-Defensive_Coding-C-Arithmetic-bad>>.

[[ex-Defensive_Coding-C-Arithmetic-bad]]
.Incorrect overflow detection in C
====

[source,c]
----
include::example$C-Arithmetic-add.adoc[]

----

====

The following approaches can be used to check for overflow,
without actually causing it.

* Use a wider type to perform the calculation, check that the
result is within bounds, and convert the result to the
original type. All intermediate results must be checked in
this way.

* Perform the calculation in the corresponding unsigned type
and use bit fiddling to detect the overflow.
<<ex-Defensive_Coding-C-Arithmetic-add_unsigned>>
shows how to perform an overflow check for unsigned integer
addition. For three or more terms, all the intermediate
additions have to be checked in this way.

[[ex-Defensive_Coding-C-Arithmetic-add_unsigned]]
.Overflow checking for unsigned addition
====

[source,c]
----
include::example$C-Arithmetic-add_unsigned.adoc[]
----

====

* Compute bounds for acceptable input values which are known
to avoid overflow, and reject other values. This is the
preferred way of checking for overflow in multiplications;
see <<ex-Defensive_Coding-C-Arithmetic-mult>>.

[[ex-Defensive_Coding-C-Arithmetic-mult]]
.Overflow checking for unsigned multiplication
====

[source,c]
----
include::example$C-Arithmetic-mult.adoc[]
----

====
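
The first approach listed above, performing the calculation in a
wider type, could look like the following sketch for signed
addition. The function name `add_int_wide` is hypothetical, and
the code assumes that `long long` is wider than `int`, which the
static assertion verifies at compile time.

[source,c]
----
#include <limits.h>
#include <stdbool.h>

/* Assumption of this sketch: long long is strictly wider than int. */
_Static_assert(sizeof(long long) > sizeof(int),
               "long long must be wider than int");

/* Hypothetical helper: stores a + b in *result and returns true,
   or returns false if the sum does not fit into int. */
bool
add_int_wide(int a, int b, int *result)
{
  /* The addition is carried out in long long and cannot overflow. */
  long long sum = (long long)a + (long long)b;
  if (sum < INT_MIN || sum > INT_MAX) {
    return false;
  }
  /* The value is in range, so the conversion is exact. */
  *result = (int)sum;
  return true;
}
----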

Basic arithmetic operations are commutative, so for each bounds
check there are two different but mathematically equivalent
expressions. Sometimes, one of them results in
better code because parts of it can be reduced to a constant.
This applies to overflow checks for a multiplication
`a * b` involving a constant `a`, where the
check is reduced to `b > C` for some
constant `C` determined at compile time. The
other expression,
`b && a > ((unsigned)-1) / b`,
is more difficult to optimize at compile time.
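
A common case of multiplication by a constant is the computation
of an allocation size. In the following sketch, `struct item` and
`item_array_alloc` are hypothetical names; the constant operand is
`sizeof(struct item)`, so the compiler can evaluate the bound
`SIZE_MAX / sizeof(struct item)` at compile time.

[source,c]
----
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type used only for this illustration. */
struct item {
  int id;
  double weight;
};

/* Allocates an array of count items.  Returns NULL if the size
   computation would wrap around or if the allocation fails. */
struct item *
item_array_alloc(size_t count)
{
  /* The divisor is a constant, so this is a single comparison
     of count against a compile-time constant. */
  if (count > SIZE_MAX / sizeof(struct item)) {
    return NULL;
  }
  return malloc(count * sizeof(struct item));
}
----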

When a value is converted to a signed integer type which cannot
represent it, the C standard leaves the result
implementation-defined. GCC always
chooses the result based on 2's complement arithmetic. This GCC
extension (which is also implemented by other compilers) helps a
lot when implementing overflow checks.
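
As a sketch of how this conversion behavior can be used, the
following function checks a signed addition by performing it in
the corresponding unsigned type and inspecting the sign bits. The
name `add_int_via_unsigned` is hypothetical, and the code assumes
that `int` uses 2's complement representation without padding
bits and that the compiler converts out-of-range values as
described above.

[source,c]
----
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *result and returns true,
   or returns false if the mathematical sum does not fit into int. */
bool
add_int_via_unsigned(int a, int b, int *result)
{
  unsigned ua = (unsigned)a;
  unsigned ub = (unsigned)b;
  /* Unsigned arithmetic wraps around; this is well-defined. */
  unsigned usum = ua + ub;
  unsigned sign_bit = 1u << (sizeof(unsigned) * CHAR_BIT - 1);
  /* Overflow happened if and only if both operands have the same
     sign bit and the sign bit of the sum differs from it. */
  if (((ua ^ usum) & (ub ^ usum)) & sign_bit) {
    return false;
  }
  /* Assumption: the conversion follows 2's complement, as GCC
     does.  ISO C leaves out-of-range conversions
     implementation-defined. */
  *result = (int)usum;
  return true;
}
----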

Sometimes, it is necessary to compare unsigned and signed
integer variables. This results in a compiler warning,
*comparison between signed and unsigned integer
expressions*, because the comparison often gives
unexpected results for negative values. When adding a cast,
make sure that negative values are covered properly. If the
bound is unsigned and the checked quantity is signed, you should
cast the checked quantity to an unsigned type at least as wide
as either operand type. As a result, negative values will fail
the bounds check. (You can still check for negative values
separately for clarity, and the compiler will optimize away this
redundant check.)
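
A minimal sketch of such a bounds check follows. The function
name `index_in_bounds` is hypothetical; `uintmax_t` is used as
the cast target because it is at least as wide as both operand
types. The sketch assumes that `length` is the size of an actual
array, so it is far smaller than the values which negative
indices convert to.

[source,c]
----
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if index refers to an element
   of an array with length elements. */
bool
index_in_bounds(long index, size_t length)
{
  /* A negative index converts to a very large unsigned value
     and therefore fails the comparison against length. */
  return (uintmax_t)index < length;
}
----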

Legacy code should be compiled with the [option]`-fwrapv`
GCC option. As a result, GCC will provide 2's complement
semantics for integer arithmetic, including defined behavior on
integer overflow.

[[sect-Defensive_Coding-C-Globals]]
=== Global Variables

Global variables should be avoided because they usually lead to
thread safety hazards. In any case, they should be declared
`static`, so that access is restricted to a
single translation unit.

Global constants are not a problem, but declaring them can be
tricky. <<ex-Defensive_Coding-C-Globals-String_Array>>
shows how to declare a constant array of constant strings.
The second `const` is needed to make the
array constant, and not just the strings. It must be placed
after the `*`, and not before it.

[[ex-Defensive_Coding-C-Globals-String_Array]]
.Declaring a constant array of constant strings
====

[source,c]
----
include::example$C-Globals-String_Array.adoc[]

----

====

Sometimes, static variables local to functions are used as a
replacement for proper memory management. Unlike a pointer to a
non-static local variable, a pointer to a static local variable
can be returned to the caller. Such variables are
well hidden, but they are effectively global (just like static
variables at file scope). It is difficult to add thread safety
afterwards if such interfaces are used. Merely dropping the
`static` keyword in such cases leads to
undefined behavior, because the returned pointer then refers to
an object whose lifetime has ended.
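
One way to avoid a static local buffer is to let the caller
supply the storage. The following sketch illustrates this
interface style; `format_version` and its parameters are
hypothetical and not taken from an existing library.

[source,c]
----
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "major.minor" into buf, which has
   room for size bytes.  Returns false if the result, including
   the terminating null byte, does not fit.  No static storage is
   used, so concurrent callers do not interfere with each other
   as long as they pass different buffers. */
bool
format_version(char *buf, size_t size, unsigned major, unsigned minor)
{
  int ret = snprintf(buf, size, "%u.%u", major, minor);
  /* snprintf returns the length the full output would have had. */
  return ret >= 0 && (size_t)ret < size;
}
----

The caller decides where the buffer lives (on its own stack, in a
heap allocation, or inside a larger object), and the function
itself keeps no state between calls.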

Another reason for using static local variables is the desire to
reduce stack space usage on embedded platforms, where the stack may
span only a few hundred bytes. If this is the only reason why
the `static` keyword is used, it can just be
dropped, unless the object is very large (larger than
128 kilobytes on 32-bit platforms). In the latter case, it is
recommended to allocate the object using
`malloc`, to obtain proper array checking, for
the same reasons outlined in <<sect-Defensive_Coding-C-Allocators-alloca>>.