
:experimental:
:toc:

include::partial$entities.adoc[]

= The {cpp} Programming Language

[[sect-Defensive_Coding-CXX-Language]]
== The Core Language

{cpp} includes a large subset of the C language. As far as the C
subset is used, the recommendations in xref:programming-languages/C.adoc#chap-Defensive_Coding-C[Defensive Coding in C] apply.

=== Array Allocation with `operator new[]`

For very large values of `n`, an expression
like `new T[n]` can return a pointer to a heap
region which is too small. In other words, not all array
elements are actually backed with heap memory reserved to the
array. Current GCC versions generate code that performs a
computation of the form `sizeof(T) * size_t(n) + cookie_size`, where `cookie_size` is
currently at most 8. This computation can overflow, and GCC
versions prior to 4.8 generated code which did not detect this.
(Fedora 18 was the first release which fixed this in GCC.)

The `std::vector` template can be used instead of
an explicit array allocation. (The GCC implementation detects
overflow internally.)

If there is no alternative to `operator new[]`
and the sources will be compiled with older GCC versions, code
which allocates arrays with a variable length must check for
overflow manually. For the `new T[n]` example,
the allocation must be rejected if `n < 0 || (n > 0 && n >
(size_t(-1) - 8) / sizeof(T))` is true. (See xref:programming-languages/C.adoc#sect-Defensive_Coding-C-Arithmetic[Recommendations for Integer Arithmetic].) If there are
additional dimensions (which must be constants according to the
{cpp} standard), these should be included as factors in the
divisor.
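
The manual check can be wrapped in a small helper. The following is only a minimal sketch: the name `checked_new_array` and the choice of `long` for the element count are assumptions made for illustration, and the constant 8 is the cookie size bound mentioned above.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper (not part of any library): allocates n objects of
// type T with operator new[] after checking that the size computation
// sizeof(T) * n + cookie_size cannot overflow.
template <typename T>
T *checked_new_array(long n)
{
  // Largest element count for which the byte size, plus a cookie of at
  // most 8 bytes, still fits into size_t.
  const std::size_t max_count = (std::size_t(-1) - 8) / sizeof(T);
  if (n < 0 || (n > 0 && std::size_t(n) > max_count)) {
    throw std::bad_alloc();
  }
  return new T[n];
}
```

A caller would write `int *p = checked_new_array<int>(count);` and release the array with `delete[] p;` as usual. The check only prevents the overflow; a very large but valid count can still exhaust memory.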

These countermeasures prevent out-of-bounds writes and potential
code execution. Very large memory allocations can still lead to
a denial of service. xref:tasks/Tasks-Serialization.adoc#sect-Defensive_Coding-Tasks-Serialization-Decoders[Recommendations for Manually-written Decoders]
contains suggestions for mitigating this problem when processing
untrusted data.

See xref:tasks/programming-languages/C.adoc#sect-Defensive_Coding-C-Allocators-Arrays[Array Allocation]
for array allocation advice for C-style memory allocation.

=== Overloading

Do not overload functions with versions that have different
security characteristics. For instance, do not implement a
function `strcat` which works on
`std::string` arguments. Similarly, do not name
methods after such functions.

=== ABI Compatibility and Preparing for Security Updates

A stable binary interface (ABI) is vastly preferred for security
updates. Without a stable ABI, all reverse dependencies need
recompiling, which can be a lot of work and could even be
impossible in some cases. Ideally, a security update only
updates a single dynamic shared object, and is picked up
automatically after restarting affected processes.

Outside of extremely performance-critical code, you should
ensure that a wide range of changes is possible without breaking
ABI. Some very basic guidelines are:

* Avoid inline functions.

* Use the pointer-to-implementation idiom.

* Try to avoid templates. Use them if the increased type
safety provides a benefit to the programmer.

* Move security-critical code out of templated code, so that
it can be patched in a central place if necessary.

The KDE project publishes a document with more extensive
guidelines on ABI-preserving changes to {cpp} code, link:++https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B++[Policies/Binary
Compatibility Issues With {cpp}]
(*d-pointer* refers to the
pointer-to-implementation idiom).
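
As an illustration of the pointer-to-implementation idiom listed among the guidelines, here is a minimal single-file sketch. The class `Decoder`, its members and the file names in the comments are hypothetical; in a real library, the two parts would be split between an installed header and a source file.

```cpp
#include <cstddef>

// --- Would live in the installed header (hypothetical decoder.h) ---
class Decoder {
public:
  explicit Decoder(std::size_t max_depth);
  ~Decoder();
  std::size_t max_depth() const;
  void set_max_depth(std::size_t max_depth);

private:
  Decoder(const Decoder &);             // not copyable in this sketch
  Decoder &operator=(const Decoder &);  // (declared, never defined)
  struct Impl;  // incomplete type for users of the header
  Impl *impl_;  // callers only ever see a single pointer
};

// --- Would live in the library source file (hypothetical decoder.cpp) ---
struct Decoder::Impl {
  explicit Impl(std::size_t depth) : max_depth(depth) {}
  std::size_t max_depth;
  // Members added here later do not change sizeof(Decoder).
};

Decoder::Decoder(std::size_t max_depth) : impl_(new Impl(max_depth)) {}
Decoder::~Decoder() { delete impl_; }
std::size_t Decoder::max_depth() const { return impl_->max_depth; }
void Decoder::set_max_depth(std::size_t max_depth)
{
  impl_->max_depth = max_depth;
}
```

Because all member functions are defined out of line and the data members are hidden behind `Impl`, a fix inside the library does not require recompiling code that uses `Decoder`.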

[[sect-Defensive_Coding-CXX-Language-CXX11]]
=== {cpp}0X and {cpp}11 Support

GCC offers different language compatibility modes:

* [option]`-std=c++98` for the original 1998 {cpp}
standard

* [option]`-std=c++03` for the 1998 standard with the
changes from the 2003 technical corrigendum

* [option]`-std=c++11` for the 2011 {cpp} standard. This
option should not be used.

* [option]`-std=c++0x` for several different versions
of {cpp}11 support in development, depending on the GCC
version. This option should not be used.

For each of these flags, there are variants which also enable
GNU extensions (mostly language features also found in C99 or
C11):

* [option]`-std=gnu++98`
* [option]`-std=gnu++03`
* [option]`-std=gnu++11`

Again, [option]`-std=gnu++11` should not be used.

If you enable {cpp}11 support, the ABI of the standard {cpp} library
`libstdc++` will change in subtle ways.
Currently, no {cpp} libraries are compiled in {cpp}11 mode, so if
you compile your code in {cpp}11 mode, it will be incompatible
with the rest of the system. Unfortunately, this is also the
case if you do not use any {cpp}11 features. Currently, there is
no safe way to enable {cpp}11 mode (except for freestanding
applications).

The meaning of {cpp}0X mode changed from GCC release to GCC
release. Earlier versions were still ABI-compatible with {cpp}98
mode, but in the most recent versions, switching to {cpp}0X mode
activates {cpp}11 support, with its compatibility problems.

Some {cpp}11 features (or approximations thereof) are available
through the TR1 library extensions, which can be used with
[option]`-std=c++03` or [option]`-std=gnu++03` by including the
`<tr1/*>` header files. This includes
`std::tr1::shared_ptr` (from
`<tr1/memory>`) and
`std::tr1::function` (from
`<tr1/functional>`). For other {cpp}11
features, the Boost {cpp} library contains replacements.

[[sect-Defensive_Coding-CXX-Std]]
== The {cpp} Standard Library

The {cpp} standard library includes most of its C counterpart
by reference; see xref:programming-languages/C.adoc#chap-Defensive_Coding-C[Defensive Coding in C].

[[sect-Defensive_Coding-CXX-Std-Functions]]
=== Functions That Are Difficult to Use

This section collects functions and function templates which are
part of the standard library and are difficult to use.

[[sect-Defensive_Coding-CXX-Std-Functions-Unpaired_Iterators]]
==== Unpaired Iterators

Functions which use output iterators, or other iterators which do
not come in pairs (denoting ranges), cannot perform iterator range
checking (see <<sect-Defensive_Coding-CXX-Std-Iterators>>).
Function templates which involve output iterators are
particularly dangerous:

* `std::copy`

* `std::copy_backward`

* `std::copy_if`

* `std::move` (three-argument variant)

* `std::move_backward`

* `std::partition_copy`

* `std::remove_copy`

* `std::remove_copy_if`

* `std::replace_copy`

* `std::replace_copy_if`

* `std::swap_ranges`

* `std::transform`

In addition, `std::copy_n`,
`std::fill_n` and
`std::generate_n` do not perform iterator
checking, either, but there is an explicit count which has to be
supplied by the caller, as opposed to an implicit length
indicator in the form of a pair of forward iterators.

These output-iterator-expecting functions should only be used
with unlimited-range output iterators, such as iterators
obtained with the `std::back_inserter`
function.
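
A minimal sketch of this pattern follows. The helper name `append_all` is made up for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: appends all elements of src to dst.
// std::back_inserter calls dst.push_back() for every element written,
// so std::copy cannot write past the end of the destination.
void append_all(const std::vector<int> &src, std::vector<int> &dst)
{
  std::copy(src.begin(), src.end(), std::back_inserter(dst));
}
```

By contrast, `std::copy(src.begin(), src.end(), dst.begin())` would write out of bounds whenever `dst` holds fewer elements than `src`.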

Other functions use single input or forward iterators, which can
read beyond the end of the input range if the caller is not careful:

* `std::equal`

* `std::is_permutation`

* `std::mismatch`
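
For these functions, the caller has to establish that the second range is long enough before the call. A minimal sketch for `std::equal`, using a hypothetical helper named `same_contents`:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns true if both vectors hold the same elements.
bool same_contents(const std::vector<int> &a, const std::vector<int> &b)
{
  // Compare the sizes first. Without this check, the three-argument
  // std::equal would read past the end of b whenever b is shorter than a.
  return a.size() == b.size()
    && std::equal(a.begin(), a.end(), b.begin());
}
```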

[[sect-Defensive_Coding-CXX-Std-String]]
=== String Handling with `std::string`

The `std::string` class provides a convenient
way to handle strings. Unlike C strings,
`std::string` objects have an explicit length
(and can contain embedded NUL characters), and storage for their
characters is managed automatically. This section discusses
`std::string`, but these observations also
apply to other instances of the
`std::basic_string` template.

The pointer returned by the `data()` member
function does not necessarily point to a NUL-terminated string.
To obtain a C-compatible string pointer, use
`c_str()` instead, which adds the NUL
terminator.

The pointers returned by the `data()` and
`c_str()` functions and iterators are only
valid until certain events happen. It is required that the
exact `std::string` object still exists (even
if it was initially created as a copy of another string object).
Pointers and iterators are also invalidated when non-const
member functions are called, or functions with a non-const
reference parameter. The behavior of the GCC implementation
deviates from that required by the {cpp} standard if multiple
threads are present. In general, only the first call to a
non-const member function after a structural modification of the
string (such as appending a character) is invalidating, but this
also applies to member functions such as the non-const version of
`begin()`, in violation of the {cpp} standard.

Particular care is necessary when invoking the
`c_str()` member function on a temporary
object. This is convenient for calling C functions, but the
pointer will turn invalid as soon as the temporary object is
destroyed, which generally happens when the outermost expression
enclosing the expression on which `c_str()`
is called completes evaluation. Passing the result of
`c_str()` to a function which does not store
or otherwise leak that pointer is safe, though.
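
The difference can be sketched as follows. The functions `make_greeting`, `greeting_length_direct` and `greeting_length_named` are invented for this example, with `std::strlen` standing in for an arbitrary C function that does not keep the pointer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical function returning a std::string by value.
std::string make_greeting(const std::string &name)
{
  return "Hello, " + name;
}

std::size_t greeting_length_direct(const std::string &name)
{
  // Safe: the temporary string lives until the end of the full
  // expression, and std::strlen does not store the pointer.
  return std::strlen(make_greeting(name).c_str());
}

std::size_t greeting_length_named(const std::string &name)
{
  // Safe: the named object keeps the character array alive for as
  // long as the pointer p is used.
  const std::string greeting = make_greeting(name);
  const char *p = greeting.c_str();
  return std::strlen(p);
}

// Unsafe, shown only as a comment: the temporary is destroyed at the
// end of the first statement, so p dangles in the second one.
//   const char *p = make_greeting(name).c_str();
//   std::strlen(p);  // undefined behavior
```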

As with `std::vector` and
`std::array`, subscripting with
`operator[]` does not perform bounds checks.
Use the `at(size_type)` member function
instead. See <<sect-Defensive_Coding-CXX-Std-Subscript>>.
Furthermore, accessing the terminating NUL character using
`operator[]` is not possible. (In some
implementations, the `c_str()` member function
writes the NUL character on demand.)

Never write to the pointers returned by
`data()` or `c_str()`
after casting away `const`. If you need a
C-style writable string, use a
`std::vector<char>` object and its
`data()` member function. In this case, you
have to explicitly add the terminating NUL character.
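
A minimal sketch of such a conversion, with a hypothetical helper named `writable_copy`:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns a writable, NUL-terminated copy of s.
std::vector<char> writable_copy(const std::string &s)
{
  std::vector<char> buf(s.begin(), s.end());
  buf.push_back('\0');  // the terminator has to be added explicitly
  return buf;
}
```

The result of `buf.data()` can then be passed to a C function which expects a `char *` argument and modifies the string in place. If `s` contains embedded NUL characters, such a function will only see the part before the first of them.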

GCC's implementation of `std::string` is
currently based on reference counting. It is expected that a
future version will remove the reference counting, due to
performance and conformance issues. As a result, code that
implicitly assumes sharing by holding on to pointers or iterators
for too long will break, resulting in run-time crashes or worse.
On the other hand, non-const iterator-returning functions will
no longer give other threads an opportunity for invalidating
existing iterators and pointers because iterator invalidation
does not depend on sharing of the internal character array
object anymore.

[[sect-Defensive_Coding-CXX-Std-Subscript]]
=== Containers and `operator[]`

Many sequence containers similar to `std::vector`
provide both `operator[](size_type)` and a
member function `at(size_type)`. This applies
to `std::vector` itself,
`std::array`, `std::string`
and other instances of `std::basic_string`.

`operator[](size_type)` is not required by the
standard to perform bounds checking (and the implementation in
GCC does not). In contrast, `at(size_type)`
must perform such a check. Therefore, in code which is not
performance-critical, you should prefer
`at(size_type)` over
`operator[](size_type)`, even though it is
slightly more verbose.

Calling the `front()` and `back()`
member functions on an empty vector results in undefined behavior. You
can use `vec.at(0)` and
`vec.at(vec.size() - 1)` as checked
replacements. For an empty vector, `data()` is
defined; it returns an arbitrary pointer, which is not necessarily
a null pointer.
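
The checked replacements can be sketched as two small helpers. The names `checked_first` and `checked_last` are assumptions made for this example.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helpers: checked replacements for front() and back().
// Both throw std::out_of_range if the vector is empty.
int checked_first(const std::vector<int> &vec)
{
  return vec.at(0);
}

int checked_last(const std::vector<int> &vec)
{
  // For an empty vector, vec.size() - 1 wraps around to the largest
  // size_type value, which at() rejects like any other invalid index.
  return vec.at(vec.size() - 1);
}
```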

[[sect-Defensive_Coding-CXX-Std-Iterators]]
=== Iterators

Iterators do not perform any bounds checking. Therefore, all
functions that work on iterators should accept them in pairs,
denoting a range, and make sure that iterators are not moved
outside that range. For forward iterators and bidirectional
iterators, you need to check for equality before moving the
first or last iterator in the range. For random-access
iterators, you need to compute the difference before adding or
subtracting an offset. It is not possible to perform the
operation and check for an invalid iterator afterwards.
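
For random-access iterators, the check could look like the following sketch. The helper name `checked_advance` is hypothetical, and the caller is assumed to pass a valid range in which `it` does not lie beyond `last`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: moves it forward by offset elements, but only if
// the result stays within [it, last]. Returns false and leaves it
// unchanged otherwise.
bool checked_advance(std::vector<int>::const_iterator &it,
                     std::vector<int>::const_iterator last,
                     std::ptrdiff_t offset)
{
  // Compute the remaining distance first. Adding the offset and
  // comparing afterwards would already be undefined behavior if the
  // offset is too large.
  if (offset < 0 || offset > last - it) {
    return false;
  }
  it += offset;
  return true;
}
```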

Output iterators cannot be compared for equality. Therefore, it
is impossible to write code that detects that it has been
supplied an output area that is too small, and their use should
be avoided.

These issues make some of the standard library functions
difficult to use correctly; see <<sect-Defensive_Coding-CXX-Std-Functions-Unpaired_Iterators>>.