<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-Library_Design">
<title>Library Design</title>
<para>
Throughout this chapter, the term <emphasis>client code</emphasis>
refers to applications and other libraries using the library.
</para>
<section>
<title>State management</title>
<section>
<title>Global state</title>
<para>
Global state should be avoided.
</para>
<para>
If this is impossible, the global state must be protected with
a lock. For C/C++, you can use the
<function>pthread_mutex_lock</function>
and <function>pthread_mutex_unlock</function>
functions without linking against <literal>-lpthread</literal>
because the system provides stubs for non-threaded processes.
</para>
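<para>
As a minimal sketch of the locking described above, the following
code guards a piece of global state with a statically initialized
mutex. The names <function>examplelib_next_id</function>,
<literal>id_lock</literal> and <literal>next_id</literal> are
hypothetical and chosen only for illustration.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <pthread.h>

/* Hypothetical global state: the next identifier to hand out. */
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long next_id = 1;

/* Returns a fresh identifier.  Safe to call from several threads. */
unsigned long
examplelib_next_id(void)
{
  unsigned long id;

  pthread_mutex_lock(&id_lock);
  id = next_id++;
  pthread_mutex_unlock(&id_lock);
  return id;
}
]]></programlisting>
</informalexample>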
<para>
For compatibility with <function>fork</function>, these locks
should be acquired and released in helpers registered with
<function>pthread_atfork</function>. This function is not
available without <literal>-lpthread</literal>, so you need to
use <function>dlsym</function> or a weak symbol to obtain its
address.
</para>
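<para>
The fork handlers themselves can be sketched as follows. For
brevity, this sketch calls <function>pthread_atfork</function>
directly, which assumes that the symbol is available at link time;
the lookup through <function>dlsym</function> or a weak symbol is
omitted. The names <literal>cache_lock</literal> and
<function>examplelib_install_fork_handlers</function> are
hypothetical.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <pthread.h>

/* Hypothetical lock protecting some global library state. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent before fork: no other thread can hold the
   lock while the process image is copied. */
static void
before_fork(void)
{
  pthread_mutex_lock(&cache_lock);
}

/* Runs after fork in both the parent and the child, so the lock
   is usable again in each process. */
static void
after_fork(void)
{
  pthread_mutex_unlock(&cache_lock);
}

/* Call once during library initialization.  Returns 0 on success
   or an error number on failure. */
int
examplelib_install_fork_handlers(void)
{
  return pthread_atfork(before_fork, after_fork, after_fork);
}
]]></programlisting>
</informalexample>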
<para>
If you need <function>fork</function> protection for other
reasons, you should store the process ID and compare it to the
value returned by <function>getpid</function> each time you
access the global state. (glibc versions before 2.25 cached the
process ID, so <function>getpid</function> did not perform a
system call; newer versions perform a system call on each
invocation, which is usually still cheap.) If the value
changes, you know that you have to re-create the state object.
(This needs to be combined with locking, of course.)
</para>
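<para>
One possible shape of the process ID check is sketched below. The
state object records the process that created it and is replaced
when a different process is detected. All names
(<literal>struct state</literal>,
<function>get_state_locked</function>,
<function>examplelib_bump</function>) are hypothetical.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-process state object. */
struct state {
  pid_t owner;            /* process which created the object */
  unsigned long counter;
};

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static struct state *state;

/* Returns the state object for the current process, or NULL if
   allocation fails.  The caller must hold state_lock. */
static struct state *
get_state_locked(void)
{
  pid_t self = getpid();

  if (state == NULL || state->owner != self) {
    /* First use, or the process has forked: start afresh. */
    struct state *fresh = calloc(1, sizeof(*fresh));
    if (fresh == NULL)
      return NULL;
    fresh->owner = self;
    free(state);
    state = fresh;
  }
  return state;
}

/* Increments the per-process counter.  Returns the new value, or
   -1 on allocation failure. */
long
examplelib_bump(void)
{
  long result = -1;
  struct state *s;

  pthread_mutex_lock(&state_lock);
  s = get_state_locked();
  if (s != NULL)
    result = (long) ++s->counter;
  pthread_mutex_unlock(&state_lock);
  return result;
}
]]></programlisting>
</informalexample>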
</section>
<section>
<title>Handles</title>
<para>
Library state should be kept behind a curtain. Client code
should receive only a handle. In C, the handle can be a
pointer to an incomplete <literal>struct</literal>. In C++,
the handle can be a pointer to an abstract base class, or it
can be hidden using the pointer-to-implementation idiom.
</para>
<para>
The library should provide functions for creating and
destroying handles. (In C++, it is possible to use virtual
destructors for the latter.) Consistency between creation and
destruction of handles is strongly recommended: If the client
code created a handle, it is the responsibility of the client
code to destroy it. (This is not always possible or
convenient, so sometimes, a transfer of ownership has to
happen.)
</para>
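<para>
In C, a handle based on an incomplete <literal>struct</literal>
might look like the following sketch. The declarations at the top
would go into the installed header file, and the definition of the
<literal>struct</literal> would stay in the library source. The
<literal>examplelib_counter</literal> names are hypothetical.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <stdlib.h>

/* Public interface: client code sees only an incomplete type. */
struct examplelib_counter;
struct examplelib_counter *examplelib_counter_create(void);
int examplelib_counter_add(struct examplelib_counter *, int amount);
void examplelib_counter_destroy(struct examplelib_counter *);

/* Private definition, kept out of the installed header.  Its
   layout can change without recompiling client code. */
struct examplelib_counter {
  int total;
};

/* Returns a new handle, or NULL on allocation failure. */
struct examplelib_counter *
examplelib_counter_create(void)
{
  return calloc(1, sizeof(struct examplelib_counter));
}

/* Adds amount to the counter and returns the new total. */
int
examplelib_counter_add(struct examplelib_counter *c, int amount)
{
  c->total += amount;
  return c->total;
}

/* Destroys a handle obtained from examplelib_counter_create. */
void
examplelib_counter_destroy(struct examplelib_counter *c)
{
  free(c);
}
]]></programlisting>
</informalexample>
<para>
Because client code cannot see the members of the
<literal>struct</literal>, it has to go through the create and
destroy functions, which keeps allocation and deallocation inside
the library.
</para>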
<para>
Using handles ensures that it is possible to change the way
the library represents state in a way that is transparent to
client code. This is important to facilitate security updates
and many other code changes.
</para>
<para>
It is not always necessary to protect state behind a handle
with a lock. This depends on the level of thread safety
the library provides.
</para>
</section>
</section>
<section>
<title>Object orientation</title>
<para>
Classes should either be designed as base classes, or it should
be impossible to use them as base classes (like
<literal>final</literal> classes in Java or C++11). Classes which are
not designed for inheritance and are used as base classes
nevertheless create potential maintenance hazards because it is
difficult to predict how client code will react when calls to
virtual methods are added, reordered or removed.
</para>
<para>
Virtual member functions can be used as callbacks. See
<xref linkend="sect-Defensive_Coding-Tasks-Library_Design-Callbacks"/>
for some of the challenges involved.
</para>
</section>
<section id="sect-Defensive_Coding-Tasks-Library_Design-Callbacks">
<title>Callbacks</title>
<para>
Higher-order code is difficult to analyze for humans and
computers alike, so it should be avoided. Often, an
iterator-based interface (a library function which is called
repeatedly by client code and returns a stream of events) leads
to a better design which is easier to document and use.
</para>
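<para>
To illustrate the iterator-based style, the following sketch splits
a string into comma-separated fields. Client code calls
<function>field_iter_next</function> repeatedly and keeps control
of the loop, so no callback is needed. The
<literal>field_iter</literal> names are hypothetical, and the
iterator structure is exposed only to keep the sketch short.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <stddef.h>

/* Hypothetical iterator over comma-separated fields. */
struct field_iter {
  const char *pos;   /* start of the next field, NULL at the end */
};

/* Prepares the iterator.  The input string must outlive it. */
void
field_iter_init(struct field_iter *it, const char *input)
{
  it->pos = input;
}

/* Stores the start and length of the next field in *start and
   *len.  Returns 1 if a field was produced, 0 at the end. */
int
field_iter_next(struct field_iter *it, const char **start, size_t *len)
{
  const char *p = it->pos;

  if (p == NULL)
    return 0;
  *start = p;
  while (*p != '\0' && *p != ',')
    ++p;
  *len = (size_t) (p - *start);
  /* Skip the separator, or mark the input as exhausted. */
  it->pos = (*p == ',') ? p + 1 : NULL;
  return 1;
}
]]></programlisting>
</informalexample>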
<para>
If callbacks are unavoidable, some guidelines for them follow.
</para>
<para>
In modern C++ code, <literal>std::function</literal> objects
should be used for callbacks.
</para>
<para>
In older C++ code and in C code, all callbacks must have an
additional closure parameter of type <literal>void *</literal>,
the value of which can be specified by client code. If
possible, the value of the closure parameter should be provided
by client code at the same time a specific callback is
registered (or specified as a function argument). If a single
closure parameter is shared by multiple callbacks, flexibility
is greatly reduced, and conflicts between different pieces of
client code using the same library object could be unresolvable.
In some cases, it makes sense to provide a de-registration
callback which can be used to destroy the closure parameter when
the callback is no longer used.
</para>
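<para>
A C callback interface with a per-callback closure parameter and an
optional de-registration callback could be sketched as follows.
The <literal>examplelib_source</literal> type and the functions
operating on it are hypothetical, and
<function>add_to_total</function> stands in for a callback supplied
by client code.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <stddef.h>

typedef void (*examplelib_event_cb)(int event, void *closure);
typedef void (*examplelib_release_cb)(void *closure);

/* Hypothetical event source.  The closure is stored together
   with the callback it belongs to. */
struct examplelib_source {
  examplelib_event_cb on_event;
  examplelib_release_cb on_release;   /* may be NULL */
  void *closure;
};

/* Registers a callback and its closure.  The release function of
   the previously registered callback, if any, is invoked so that
   client code can free the old closure. */
void
examplelib_set_callback(struct examplelib_source *src,
                        examplelib_event_cb cb, void *closure,
                        examplelib_release_cb release)
{
  if (src->on_release != NULL)
    src->on_release(src->closure);
  src->on_event = cb;
  src->closure = closure;
  src->on_release = release;
}

/* Delivers an event to the registered callback, if there is one. */
void
examplelib_emit(struct examplelib_source *src, int event)
{
  if (src->on_event != NULL)
    src->on_event(event, src->closure);
}

/* Client side: the closure points to a running total. */
static void
add_to_total(int event, void *closure)
{
  *(int *) closure += event;
}
]]></programlisting>
</informalexample>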
<para>
Callbacks can throw exceptions or call
<function>longjmp</function>. If possible, all library objects
should remain in a valid state. (All further operations on them
can fail, but it should be possible to deallocate them without
causing resource leaks.)
</para>
<para>
The presence of callbacks raises the question of whether functions
provided by the library are <emphasis>reentrant</emphasis>.
Unless a library was designed for such use, its behavior is
unpredictable if a callback function calls functions in the same
library (particularly if they are invoked on the same objects and
manipulate the same state), because the library can be in an
inconsistent state when the callback is invoked. Reentrant functions
are more difficult to write than thread-safe functions (by
definition, simple locking would immediately lead to deadlocks).
It is also difficult to decide what to do when destruction of an
object which is currently processing a callback is requested.
</para>
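<para>
If full reentrancy is not a design goal, one option (an assumption
of this sketch, not a requirement) is to detect reentrant calls
and reject them with an error code. The
<literal>examplelib_parser</literal> type is hypothetical, and
<function>reenter</function> plays the role of a client callback
which calls back into the library.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#include <errno.h>
#include <stddef.h>

/* Hypothetical object which invokes a callback for each item. */
struct examplelib_parser {
  void (*on_item)(struct examplelib_parser *, void *closure);
  void *closure;
  int in_callback;   /* non-zero while on_item is running */
  int items;
};

/* Processes one item.  Returns 0 on success, or EBUSY if called
   from within the callback of the same object. */
int
examplelib_parser_feed(struct examplelib_parser *p)
{
  if (p->in_callback)
    return EBUSY;
  /* Finish the state update before handing over control. */
  ++p->items;
  if (p->on_item != NULL) {
    p->in_callback = 1;
    p->on_item(p, p->closure);
    p->in_callback = 0;
  }
  return 0;
}

/* Client callback which re-enters the library and records the
   result in the int the closure points to. */
static void
reenter(struct examplelib_parser *p, void *closure)
{
  *(int *) closure = examplelib_parser_feed(p);
}
]]></programlisting>
</informalexample>
<para>
If the callback leaves through <function>longjmp</function>, the
flag stays set and all further calls on the object fail, but the
object can still be deallocated.
</para>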
</section>
<section>
<title>Process attributes</title>
<para>
Several attributes are global and affect all code in the
process, not just the library that manipulates them.
</para>
<itemizedlist>
<listitem><para>
environment variables
(see <xref linkend="sect-Defensive_Coding-Tasks-secure_getenv"/>)
</para></listitem>
<listitem><para>
umask
</para></listitem>
<listitem><para>
user IDs, group IDs and capabilities
</para></listitem>
<listitem><para>
current working directory
</para></listitem>
<listitem><para>
signal handlers, signal masks and signal delivery
</para></listitem>
<listitem><para>
file locks (<function>fcntl</function> locks in particular
behave in surprising ways, and not only in multi-threaded
programs)
</para></listitem>
</itemizedlist>
<para>
Library code should avoid manipulating these global process
attributes. It should not rely on environment variables, umask,
the current working directory or signal masks because these
attributes can be inherited from an untrusted source.
</para>
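<para>
As one illustration, a library can create a file with known
permissions without reading or changing the umask, and without
depending on the current working directory. The sketch below
assumes the caller supplies a directory file descriptor; the
function name <function>examplelib_create_private</function> is
hypothetical.
</para>
<informalexample>
<programlisting language="C"><![CDATA[
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a new file called name in the directory referred to by
   dirfd, with permissions 0600 regardless of the umask.  Returns
   a file descriptor open for writing, or -1 on error. */
int
examplelib_create_private(int dirfd, const char *name)
{
  int fd = openat(dirfd, name,
                  O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
  if (fd < 0)
    return -1;
  /* The umask can only remove permission bits at creation time,
     so set the intended mode explicitly afterwards. */
  if (fchmod(fd, 0600) != 0) {
    close(fd);
    unlinkat(dirfd, name, 0);
    return -1;
  }
  return fd;
}
]]></programlisting>
</informalexample>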
<para>
In addition, there are obvious process-wide aspects such as the
virtual memory layout, the set of open files and dynamic shared
objects, but with the exception of shared objects, these can be
manipulated in a relatively isolated way.
</para>
</section>
</chapter>