453 lines
19 KiB
XML
453 lines
19 KiB
XML
<?xml version='1.0' encoding='utf-8' ?>
|
|
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
|
|
]>
|
|
<chapter id="chap-Defensive_Coding-Tasks-Serialization">
|
|
<title>Serialization and Deserialization</title>
|
|
<para>
|
|
Protocol decoders and file format parsers are often the
|
|
most-exposed part of an application because they are exposed with
|
|
little or no user interaction and before any authentication and
|
|
security checks are made. They are also difficult to write
|
|
robustly in languages which are not memory-safe.
|
|
</para>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-Decoders">
|
|
<title>Recommendations for manually written decoders</title>
|
|
<para>
|
|
For C and C++, the advice in <xref
|
|
linkend="sect-Defensive_Coding-C-Pointers"/> applies. In
|
|
addition, avoid non-character pointers directly into input
|
|
buffers. Pointer misalignment causes crashes on some
|
|
architectures.
|
|
</para>
|
|
<para>
|
|
When reading variable-sized objects, do not allocate large
|
|
amounts of data solely based on the value of a size field. If
|
|
possible, grow the data structure as more data is read from the
|
|
source, and stop when no data is available. This helps to avoid
|
|
denial-of-service attacks where little amounts of input data
|
|
results in enormous memory allocations during decoding.
|
|
Alternatively, you can impose reasonable bounds on memory
|
|
allocations, but some protocols do not permit this.
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Protocol design</title>
|
|
<para>
|
|
Binary formats with explicit length fields are more difficult to
|
|
parse robustly than those where the length of dynamically-sized
|
|
elements is derived from sentinel values. A protocol which does
|
|
not use length fields and can be written in printable ASCII
|
|
characters simplifies testing and debugging. However, binary
|
|
protocols with length fields may be more efficient to parse.
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title id="sect-Defensive_Coding-Tasks-Serialization-Library">Library
|
|
support for deserialization</title>
|
|
<para>
|
|
For some languages, generic libraries are available which allow
|
|
to serialize and deserialize user-defined objects. The
|
|
deserialization part comes in one of two flavors, depending on
|
|
the library. The first kind uses type information in the data
|
|
stream to control which objects are instantiated. The second
|
|
kind uses type definitions supplied by the programmer. The
|
|
first one allows arbitrary object instantiation, the second one
|
|
generally does not.
|
|
</para>
|
|
<para>
|
|
The following serialization frameworks are in the first category,
|
|
are known to be unsafe, and must not be used for untrusted data:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem><para>
|
|
Python's <package>pickle</package> and <package>cPickle</package>
|
|
modules
|
|
</para></listitem>
|
|
<listitem><para>
|
|
Perl's <package>Storable</package> package
|
|
</para></listitem>
|
|
<listitem><para>
|
|
Java serialization (<type>java.io.ObjectInputStream</type>)
|
|
</para></listitem>
|
|
<listitem><para>
|
|
PHP serialization (<function>unserialize</function>)
|
|
</para></listitem>
|
|
<listitem><para>
|
|
Most implementations of YAML
|
|
</para></listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
When using a type-directed deserialization format where the
|
|
types of the deserialized objects are specified by the
|
|
programmer, make sure that the objects which can be instantiated
|
|
cannot perform any destructive actions in their destructors,
|
|
even when the data members have been manipulated.
|
|
</para>
|
|
<para>
|
|
JSON decoders do not suffer from this problem. But you must not
|
|
use the <function>eval</function> function to parse JSON objects
|
|
in Javascript; even with the regular expression filter from RFC
|
|
4627, there are still information leaks remaining.
|
|
</para>
|
|
</section>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML">
|
|
<title>XML serialization</title>
|
|
<para>
|
|
</para>
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-External">
|
|
<title>External references</title>
|
|
<para>
|
|
XML documents can contain external references. They can occur
|
|
in various places.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
In the DTD declaration in the header of an XML document:
|
|
</para>
|
|
<informalexample>
|
|
<programlisting language="XML">
|
|
<![CDATA[<!DOCTYPE html PUBLIC
|
|
"-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">]]>
|
|
</programlisting>
|
|
</informalexample>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In a namespace declaration:
|
|
</para>
|
|
<informalexample>
|
|
<programlisting language="XML">
|
|
<![CDATA[<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">]]>
|
|
</programlisting>
|
|
</informalexample>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In an entity defintion:
|
|
</para>
|
|
<informalexample>
|
|
<programlisting language="XML">
|
|
<![CDATA[<!ENTITY sys SYSTEM "http://www.example.com/ent.xml">
|
|
<!ENTITY pub PUBLIC "-//Example//Public Entity//EN"
|
|
"http://www.example.com/pub-ent.xml">]]>
|
|
</programlisting>
|
|
</informalexample>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In a notation:
|
|
</para>
|
|
<informalexample>
|
|
<programlisting language="XML">
|
|
<![CDATA[<!NOTATION not SYSTEM "../not.xml">]]>
|
|
</programlisting>
|
|
</informalexample>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
Originally, these external references were intended as unique
|
|
identifiers, but by many XML implementations, they are used
|
|
for locating the data for the referenced element. This causes
|
|
unwanted network traffic, and may disclose file system
|
|
contents or otherwise unreachable network resources, so this
|
|
functionality should be disabled.
|
|
</para>
|
|
<para>
|
|
Depending on the XML library, external referenced might be
|
|
processed not just when parsing XML, but also when generating
|
|
it.
|
|
</para>
|
|
</section>
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-Entities">
|
|
<title>Entity expansion</title>
|
|
<para>
|
|
When external DTD processing is disabled, an internal DTD
|
|
subset can still contain entity definitions. Entity
|
|
declarations can reference other entities. Some XML libraries
|
|
expand entities automatically, and this processing cannot be
|
|
switched off in some places (such as attribute values or
|
|
content models). Without limits on the entity nesting level,
|
|
this expansion results in data which can grow exponentially in
|
|
length with size of the input. (If there is a limit on the
|
|
nesting level, the growth is still polynomial, unless further
|
|
limits are imposed.)
|
|
</para>
|
|
<para>
|
|
Consequently, the processing internal DTD subsets should be
|
|
disabled if possible, and only trusted DTDs should be
|
|
processed. If a particular XML application does not permit
|
|
such restrictions, then application-specific limits are called
|
|
for.
|
|
</para>
|
|
</section>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-XInclude">
|
|
<title>XInclude processing</title>
|
|
<para>
|
|
XInclude processing can reference file and network resources
|
|
and include them into the document, much like external entity
|
|
references. When parsing untrusted XML documents, XInclude
|
|
processing should be truned off.
|
|
</para>
|
|
<para>
|
|
XInclude processing is also fairly complex and may pull in
|
|
support for the XPointer and XPath specifications,
|
|
considerably increasing the amount of code required for XML
|
|
processing.
|
|
</para>
|
|
</section>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-Validation">
|
|
<title>Algorithmic complexity of XML validation</title>
|
|
<para>
|
|
DTD-based XML validation uses regular expressions for content
|
|
models. The XML specification requires that content models
|
|
are deterministic, which means that efficient validation is
|
|
possible. However, some implementations do not enforce
|
|
determinism, and require exponential (or just polynomial)
|
|
amount of space or time for validating some DTD/document
|
|
combinations.
|
|
</para>
|
|
<para>
|
|
XML schemas and RELAX NG (via the <literal>xsd:</literal>
|
|
prefix) directly support textual regular expressions which are
|
|
not required to be deterministic.
|
|
</para>
|
|
</section>
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-Expat">
|
|
<title>Using Expat for XML parsing</title>
|
|
<para>
|
|
By default, Expat does not try to resolve external IDs, so no
|
|
steps are required to block them. However, internal entity
|
|
declarations are processed. Installing a callback which stops
|
|
parsing as soon as such entities are encountered disables
|
|
them, see <xref
|
|
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-EntityDeclHandler"/>.
|
|
Expat does not perform any validation, so there are no
|
|
problems related to that.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-EntityDeclHandler">
|
|
<title>Disabling XML entity processing with Expat</title>
|
|
<xi:include href="snippets/Serialization-XML-Expat-EntityDeclHandler.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
This handler must be installed when the
|
|
<literal>XML_Parser</literal> object is created (<xref
|
|
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-Create"/>).
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Expat-Create">
|
|
<title>Creating an Expat XML parser</title>
|
|
<xi:include href="snippets/Serialization-XML-Expat-Create.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
It is also possible to reject internal DTD subsets altogeher,
|
|
using a suitable
|
|
<literal>XML_StartDoctypeDeclHandler</literal> handler
|
|
installed with <function>XML_SetDoctypeDeclHandler</function>.
|
|
</para>
|
|
</section>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-Qt">
|
|
<title>Using Qt for XML parsing</title>
|
|
<para>
|
|
The XML component of Qt, QtXml, does not resolve external IDs
|
|
by default, so it is not requred to prevent such resolution.
|
|
Internal entities are processed, though. To change that, a
|
|
custom <literal>QXmlDeclHandler</literal> and
|
|
<literal>QXmlSimpleReader</literal> subclasses are needed. It
|
|
is not possible to use the
|
|
<function>QDomDocument::setContent(const QByteArray
|
|
&)</function> convenience methods.
|
|
</para>
|
|
<para>
|
|
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityHandler"/>
|
|
shows an entity handler which always returns errors, causing
|
|
parsing to stop when encountering entity declarations.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityHandler">
|
|
<title>A QtXml entity handler which blocks entity processing</title>
|
|
<xi:include href="snippets/Serialization-XML-Qt-NoEntityHandler.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
This handler is used in the custom
|
|
<literal>QXmlReader</literal> subclass in <xref
|
|
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityReader"/>.
|
|
Some parts of QtXml will call the
|
|
<function>setDeclHandler(QXmlDeclHandler *)</function> method.
|
|
Consequently, we prevent overriding our custom handler by
|
|
providing a definition of this method which does nothing. In
|
|
the constructor, we activate namespace processing; this part
|
|
may need adjusting.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityReader">
|
|
<title>A QtXml XML reader which blocks entity processing</title>
|
|
<xi:include href="snippets/Serialization-XML-Qt-NoEntityReader.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
Our <literal>NoEntityReader</literal> class can be used with
|
|
one of the overloaded
|
|
<function>QDomDocument::setContent</function> methods.
|
|
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-Qt-QDomDocument"/>
|
|
shows how the <literal>buffer</literal> object (of type
|
|
<literal>QByteArray</literal>) is wrapped as a
|
|
<literal>QXmlInputSource</literal>. After calling the
|
|
<function>setContent</function> method, you should check the
|
|
return value and report any error.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-Qt-QDomDocument">
|
|
<title>Parsing an XML document with QDomDocument, without entity expansion</title>
|
|
<xi:include href="snippets/Serialization-XML-Qt-QDomDocument.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
</section>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse">
|
|
<title>Using OpenJDK for XML parsing and validation</title>
|
|
<para>
|
|
OpenJDK contains facilities for DOM-based, SAX-based, and
|
|
StAX-based document parsing. Documents can be validated
|
|
against DTDs or XML schemas.
|
|
</para>
|
|
<para>
|
|
The approach taken to deal with entity expansion differs from
|
|
the general recommendation in <xref
|
|
linkend="sect-Defensive_Coding-Tasks-Serialization-XML-Entities"/>.
|
|
We enable the the feature flag
|
|
<literal>javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING</literal>,
|
|
which enforces heuristic restrictions on the number of entity
|
|
expansions. Note that this flag alone does not prevent
|
|
resolution of external references (system IDs or public IDs),
|
|
so it is slightly misnamed.
|
|
</para>
|
|
<para>
|
|
In the following sections, we use helper classes to prevent
|
|
external ID resolution.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoEntityResolver">
|
|
<title>Helper class to prevent DTD external entity resolution in OpenJDK</title>
|
|
<xi:include href="snippets/Serialization-XML-OpenJDK-NoEntityResolver.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoResourceResolver">
|
|
<title>Helper class to prevent schema resolution in
|
|
OpenJDK</title>
|
|
<xi:include href="snippets/Serialization-XML-OpenJDK-NoResourceResolver.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-Imports"/>
|
|
shows the imports used by the examples.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-Imports">
|
|
<title>Java imports for OpenJDK XML parsing</title>
|
|
<xi:include href="snippets/Serialization-XML-OpenJDK-Imports.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM">
|
|
<title>DOM-based XML parsing and DTD validation in OpenJDK</title>
|
|
<para>
|
|
This approach produces a
|
|
<literal>org.w3c.dom.Document</literal> object from an input
|
|
stream. <xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM"/>
|
|
use the data from the <literal>java.io.InputStream</literal>
|
|
instance in the <literal>inputStream</literal> variable.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM">
|
|
<title>DOM-based XML parsing in OpenJDK</title>
|
|
<xi:include href="snippets/Serialization-XML-OpenJDK_Parse-DOM.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
External entity references are prohibited using the
|
|
<literal>NoEntityResolver</literal> class in
|
|
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoEntityResolver"/>.
|
|
Because external DTD references are prohibited, DTD validation
|
|
(if enabled) will only happen against the internal DTD subset
|
|
embedded in the XML document.
|
|
</para>
|
|
<para>
|
|
To validate the document against an external DTD, use a
|
|
<literal>javax.xml.transform.Transformer</literal> class to
|
|
add the DTD reference to the document, and an entity
|
|
resolver which whitelists this external reference.
|
|
</para>
|
|
</section>
|
|
|
|
<section id="sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-SAX">
|
|
<title>XML Schema validation in OpenJDK</title>
|
|
<para>
|
|
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX"/>
|
|
shows how to validate a document against an XML Schema,
|
|
using a SAX-based approach. The XML data is read from an
|
|
<literal>java.io.InputStream</literal> in the
|
|
<literal>inputStream</literal> variable.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX">
|
|
<title>SAX-based validation against an XML schema in
|
|
OpenJDK</title>
|
|
<xi:include href="snippets/Serialization-XML-OpenJDK_Parse-XMLSchema_SAX.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
<para>
|
|
The <literal>NoResourceResolver</literal> class is defined
|
|
in <xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoResourceResolver"/>.
|
|
</para>
|
|
<para>
|
|
If you need to validate a document against an XML schema,
|
|
use the code in <xref
|
|
linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM"/>
|
|
to create the document, but do not enable validation at this
|
|
point. Then use
|
|
<xref linkend="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM"/>
|
|
to perform the schema-based validation on the
|
|
<literal>org.w3c.dom.Document</literal> instance
|
|
<literal>document</literal>.
|
|
</para>
|
|
<example id="ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM">
|
|
<title>Validation of a DOM document against an XML schema in
|
|
OpenJDK</title>
|
|
<xi:include href="snippets/Serialization-XML-OpenJDK_Parse-XMLSchema_DOM.xml"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
|
</example>
|
|
</section>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Protocol Encoders</title>
|
|
<para>
|
|
For protocol encoders, you should write bytes to a buffer which
|
|
grows as needed, using an exponential sizing policy. Explicit
|
|
lengths can be patched in later, once they are known.
|
|
Allocating the required number of bytes upfront typically
|
|
requires separate code to compute the final size, which must be
|
|
kept in sync with the actual encoding step, or vulnerabilities
|
|
may result. In multi-threaded code, parts of the object being
|
|
deserialized might change, so that the computed size is out of
|
|
date.
|
|
</para>
|
|
<para>
|
|
You should avoid copying data directly from a received packet
|
|
during encoding, disregarding the format. Propagating malformed
|
|
data could enable attacks on other recipients of that data.
|
|
</para>
|
|
<para>
|
|
When using C or C++ and copying whole data structures directly
|
|
into the output, make sure that you do not leak information in
|
|
padding bytes between fields or at the end of the
|
|
<literal>struct</literal>.
|
|
</para>
|
|
</section>
|
|
|
|
</chapter>
|
|
|