defensive-coding-guide/en-US/Tasks-Library_Design.xml

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-Library_Design">
  <title>Library Design</title>
  <para>
    Throught this section, the term <emphasis>client code</emphasis>
    refers to applications and other libraries using the library.
  </para>

  <section>
    <title>State management</title>
    <para>
    </para>
    <section>
      <title>Global state</title>
      <para>
	Global state should be avoided.
      </para>
      <para>
	If this is impossible, the global state must be protected with
	a lock.  For C/C++, you can use the
	<function>pthread_mutex_lock</function>
	and <function>pthread_mutex_unlock</function>
	functions without linking against <literal>-lpthread</literal>
	because the system provides stubs for non-threaded processes.
      </para>
      <para>
	For compatibility with <function>fork</function>, these locks
	should be acquired and released in helpers registered with
	<function>pthread_atfork</function>.  This function is not
	available without <literal>-lpthread</literal>, so you need to
	use <function>dlsym</function> or a weak symbol to obtain its
	address.
      </para>
      <para>
	If you need <function>fork</function> protection for other
	reasons, you should store the process ID and compare it to the
	value returned by <function>getpid</function> each time you
	access the global state.  (<function>getpid</function> is not
	implemented as a system call and is fast.)  If the value
	changes, you know that you have to re-create the state object.
	(This needs to be combined with locking, of course.)
      </para>
    </section>
    <section>
      <title>Handles</title>
      <para>
	Library state should be kept behind a curtain.  Client code
	should receive only a handle.  In C, the handle can be a
	pointer to an incomplete <literal>struct</literal>.  In C++,
	the handle can be a pointer to an abstract base class, or it
	can be hidden using the pointer-to-implementation idiom.
      </para>
      <para>
	The library should provide functions for creating and
	destroying handles.  (In C++, it is possible to use virtual
	destructors for the latter.)  Consistency between creation and
	destruction of handles is strongly recommended: If the client
	code created a handle, it is the responsibility of the client
	code to destroy it.  (This is not always possible or
	convenient, so sometimes, a transfer of ownership has to
	happen.)
      </para>
      <para>
	Using handles ensures that it is possible to change the way
	the library represents state in a way that is transparent to
	client code.  This is important to facilitate security updates
	and many other code changes.
      </para>
      <para>
	It is not always necessary to protect state behind a handle
	with a lock.  This depends on the level of thread safety
	the library provides.
      </para>
    </section>
  </section>

  <section>
    <title>Object orientation</title>
    <para>
      Classes should be either designed as base classes, or it should
      be impossible to use them as base classes (like
      <literal>final</literal> classes in Java).  Classes which are
      not designed for inheritance and are used as base classes
      nevertheless create potential maintenance hazards because it is
      difficult to predict how client code will react when calls to
      virtual methods are added, reordered or removed.
    </para>
    <para>
      Virtual member functions can be used as callbacks.  See
      <xref linkend="sect-Defensive_Coding-Tasks-Library_Design-Callbacks"/>
      for some of the challenges involved.
    </para>
  </section>

  <section id="sect-Defensive_Coding-Tasks-Library_Design-Callbacks">
    <title>Callbacks</title>
    <para>
      Higher-order code is difficult to analyze for humans and
      computers alike, so it should be avoided.  Often, an
      iterator-based interface (a library function which is called
      repeatedly by client code and returns a stream of events) leads
      to a better design which is easier to document and use.
    </para>
    <para>
      If callbacks are unavoidable, some guidelines for them follow.
    </para>
    <para>
      In modern C++ code, <literal>std::function</literal> objects
      should be used for callbacks.
    </para>
    <para>
      In older C++ code and in C code, all callbacks must have an
      additional closure parameter of type <literal>void *</literal>,
      the value of which can be specified by client code.  If
      possible, the value of the closure parameter should be provided
      by client code at the same time a specific callback is
      registered (or specified as a function argument).  If a single
      closure parameter is shared by multiple callbacks, flexibility
      is greatly reduced, and conflicts between different pieces of
      client code using the same library object could be unresolvable.
      In some cases, it makes sense to provide a de-registration
      callback which can be used to destroy the closure parameter when
      the callback is no longer used.
    </para>
    <para>
      Callbacks can throw exceptions or call
      <function>longjmp</function>.  If possible, all library objects
      should remain in a valid state.  (All further operations on them
      can fail, but it should be possible to deallocate them without
      causing resource leaks.)
    </para>
    <para>
      The presence of callbacks raises the question if functions
      provided by the library are <emphasis>reentrant</emphasis>.
      Unless a library was designed for such use, bad things will
      happen if a callback function uses functions in the same library
      (particularly if they are invoked on the same objects and
      manipulate the same state).  When the callback is invoked, the
      library can be in an inconsistent state.  Reentrant functions
      are more difficult to write than thread-safe functions (by
      definition, simple locking would immediately lead to deadlocks).
      It is also difficult to decide what to do when destruction of an
      object which is currently processing a callback is requested.
    </para>
  </section>

  <section>
    <title>Process attributes</title>
    <para>
      Several attributes are global and affect all code in the
      process, not just the library that manipulates them.
    </para>
    <itemizedlist>
    <listitem><para>
      environment variables
      (see <xref linkend="sect-Defensive_Coding-Tasks-secure_getenv"/>)
    </para></listitem>
    <listitem><para>
      umask
    </para></listitem>
    <listitem><para>
      user IDs, group IDs and capabilities
    </para></listitem>
    <listitem><para>
      current working directory
    </para></listitem>
    <listitem><para>
      signal handlers, signal masks and signal delivery
    </para></listitem>
    <listitem><para>
      file locks (especially <function>fcntl</function> locks
      behave in surprising ways, not just in a multi-threaded
      environment)
    </para></listitem>
    </itemizedlist>
    <para>
      Library code should avoid manipulating these global process
      attributes.  It should not rely on environment variables, umask,
      the current working directory and signal masks because these
      attributes can be inherted from an untrusted source.
    </para>
    <para>
      In addition, there are obvious process-wide aspects such as the
      virtual memory layout, the set of open files and dynamic shared
      objects, but with the exception of shared objects, these can be
      manipulated in a relatively isolated way.
    </para>

  </section>

</chapter>