530 lines
18 KiB
Text
530 lines
18 KiB
Text
|
|
:experimental:
|
|
|
|
[[chap-Defensive_Coding-Tasks-Serialization]]
|
|
= Serialization and Deserialization
|
|
|
|
Protocol decoders and file format parsers are often the
|
|
most-exposed part of an application because they are exposed with
|
|
little or no user interaction and before any authentication and
|
|
security checks are made. They are also difficult to write
|
|
robustly in languages which are not memory-safe.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-Decoders]]
|
|
== Recommendations for Manually-written Decoders
|
|
|
|
For C and C++, the advice in xref:../programming-languages/C-Language.adoc#sect-Defensive_Coding-C-Pointers[Recommendations for Pointers and Array Handling] applies. In
|
|
addition, avoid non-character pointers directly into input
|
|
buffers. Pointer misalignment causes crashes on some
|
|
architectures.
|
|
|
|
When reading variable-sized objects, do not allocate large
|
|
amounts of data solely based on the value of a size field. If
|
|
possible, grow the data structure as more data is read from the
|
|
source, and stop when no data is available. This helps to avoid
|
|
denial-of-service attacks where little amounts of input data
|
|
results in enormous memory allocations during decoding.
|
|
Alternatively, you can impose reasonable bounds on memory
|
|
allocations, but some protocols do not permit this.
|
|
|
|
== Protocol Design
|
|
|
|
Binary formats with explicit length fields are more difficult to
|
|
parse robustly than those where the length of dynamically-sized
|
|
elements is derived from sentinel values. A protocol which does
|
|
not use length fields and can be written in printable ASCII
|
|
characters simplifies testing and debugging. However, binary
|
|
protocols with length fields may be more efficient to parse.
|
|
|
|
In new datagram-oriented protocols, unique numbers such as
|
|
sequence numbers or identifiers for fragment reassembly (see
|
|
<<sect-Defensive_Coding-Tasks-Serialization-Fragmentation>>)
|
|
should be at least 64 bits large, and really should not be
|
|
smaller than 32 bits in size. Protocols should not permit
|
|
fragments with overlapping contents.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-Fragmentation]]
|
|
== Fragmentation
|
|
|
|
Some serialization formats use frames or protocol data units
|
|
(PDUs) on lower levels which are smaller than the PDUs on higher
|
|
levels. With such an architecture, higher-level PDUs may have
|
|
to be *fragmented* into smaller frames during
|
|
serialization, and frames may need
|
|
*reassembly* into large PDUs during
|
|
deserialization.
|
|
|
|
Serialization formats may use conceptually similar structures
|
|
for completely different purposes, for example storing multiple
|
|
layers and color channels in a single image file.
|
|
|
|
When fragmenting PDUs, establish a reasonable lower bound for
|
|
the size of individual fragments (as large as possible—limits as
|
|
low as one or even zero can add substantial overhead). Avoid
|
|
fragmentation if at all possible, and try to obtain the maximum
|
|
acceptable fragment length from a trusted data source.
|
|
|
|
When implementing reassembly, consider the following aspects.
|
|
|
|
* Avoid allocating significant amount of resources without
|
|
proper authentication. Allocate memory for the unfragmented
|
|
PDU as more and more and fragments are encountered, and not
|
|
based on the initially advertised unfragmented PDU size,
|
|
unless there is a sufficiently low limit on the unfragmented
|
|
PDU size, so that over-allocation cannot lead to performance
|
|
problems.
|
|
|
|
* Reassembly queues on top of datagram-oriented transports
|
|
should be bounded, both in the combined size of the arrived
|
|
partial PDUs waiting for reassembly, and the total number of
|
|
partially reassembled fragments. The latter limit helps to
|
|
reduce the risk of accidental reassembly of unrelated
|
|
fragments, as it can happen with small fragment IDs (see
|
|
<<sect-Defensive_Coding-Tasks-Serialization-Fragmentation-ID>>).
|
|
It also guards to some extent against deliberate injection of fragments,
|
|
by guessing fragment IDs.
|
|
|
|
* Carefully keep track of which bytes in the unfragmented PDU
|
|
have been covered by fragments so far. If message
|
|
reordering is a concern, the most straightforward data
|
|
structure for this is an array of bits, with one bit for
|
|
every byte (or other atomic unit) in the unfragmented PDU.
|
|
Complete reassembly can be determined by increasing a
|
|
counter of set bits in the bit array as the bit array is
|
|
updated, taking overlapping fragments into consideration.
|
|
|
|
* Reject overlapping fragments (that is, multiple fragments
|
|
which provide data at the same offset of the PDU being
|
|
fragmented), unless the protocol explicitly requires
|
|
accepting overlapping fragments. The bit array used for
|
|
tracking already arrived bytes can be used for this purpose.
|
|
|
|
* Check for conflicting values of unfragmented PDU lengths (if
|
|
this length information is part of every fragment) and
|
|
reject fragments which are inconsistent.
|
|
|
|
* Validate fragment lengths and offsets of individual
|
|
fragments against the unfragmented PDU length (if they are
|
|
present). Check that the last byte in the fragment does not
|
|
lie after the end of the unfragmented PDU. Avoid integer
|
|
overflows in these computations (see xref:../programming-languages/C-Language.adoc#sect-Defensive_Coding-C-Arithmetic[Recommendations for Integer Arithmetic]).
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-Fragmentation-ID]]
|
|
=== Fragment IDs
|
|
|
|
If the underlying transport is datagram-oriented (so that PDUs
|
|
can be reordered, duplicated or be lost, like with UDP),
|
|
fragment reassembly needs to take into account endpoint
|
|
addresses of the communication channel, and there has to be
|
|
some sort of fragment ID which identifies the individual
|
|
fragments as part of a larger PDU. In addition, the
|
|
fragmentation protocol will typically involve fragment offsets
|
|
and fragment lengths, as mentioned above.
|
|
|
|
If the transport may be subject to blind PDU injection (again,
|
|
like UDP), the fragment ID must be generated randomly. If the
|
|
fragment ID is 64 bit or larger (strongly recommended), it can
|
|
be generated in a completely random fashion for most traffic
|
|
volumes. If it is less than 64 bits large (so that accidental
|
|
collisions can happen if a lot of PDUs are transmitted), the
|
|
fragment ID should be incremented sequentially from a starting
|
|
value. The starting value should be derived using a HMAC-like
|
|
construction from the endpoint addresses, using a long-lived
|
|
random key. This construction ensures that despite the
|
|
limited range of the ID, accidental collisions are as unlikely
|
|
as possible. (This will not work reliable with really short
|
|
fragment IDs, such as the 16 bit IDs used by the Internet
|
|
Protocol.)
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-Library]]
|
|
== Library Support for Deserialization
|
|
|
|
There are too many subtleties when dealing with Deserialization to be discussed here.
|
|
A more detailed and updated guide is available as https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html[OWASP Deserialization Cheat Sheet]
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML]]
|
|
== XML Serialization
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-External]]
|
|
=== External References
|
|
|
|
XML documents can contain external references. They can occur
|
|
in various places.
|
|
|
|
* In the DTD declaration in the header of an XML document:
|
|
+
|
|
[source,xml]
|
|
----
|
|
|
|
<!DOCTYPE html PUBLIC
|
|
"-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
|
|
----
|
|
|
|
* In a namespace declaration:
|
|
+
|
|
[source,xml]
|
|
----
|
|
|
|
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
|
|
|
|
----
|
|
|
|
* In an entity definition:
|
|
+
|
|
[source,xml]
|
|
----
|
|
|
|
<!ENTITY sys SYSTEM "http://www.example.com/ent.adoc[]>
|
|
<!ENTITY pub PUBLIC "-//Example//Public Entity//EN"
|
|
"http://www.example.com/pub-ent.adoc[]>
|
|
|
|
----
|
|
|
|
* In a notation:
|
|
+
|
|
[source,xml]
|
|
----
|
|
|
|
<!NOTATION not SYSTEM "../not.adoc[]>
|
|
|
|
----
|
|
|
|
Originally, these external references were intended as unique
|
|
identifiers, but by many XML implementations, they are used
|
|
for locating the data for the referenced element. This causes
|
|
unwanted network traffic, and may disclose file system
|
|
contents or otherwise unreachable network resources, so this
|
|
functionality should be disabled.
|
|
|
|
Depending on the XML library, external referenced might be
|
|
processed not just when parsing XML, but also when generating
|
|
it.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-Entities]]
|
|
=== Entity Expansion
|
|
|
|
When external DTD processing is disabled, an internal DTD
|
|
subset can still contain entity definitions. Entity
|
|
declarations can reference other entities. Some XML libraries
|
|
expand entities automatically, and this processing cannot be
|
|
switched off in some places (such as attribute values or
|
|
content models). Without limits on the entity nesting level,
|
|
this expansion results in data which can grow exponentially in
|
|
length with size of the input. (If there is a limit on the
|
|
nesting level, the growth is still polynomial, unless further
|
|
limits are imposed.)
|
|
|
|
Consequently, the processing internal DTD subsets should be
|
|
disabled if possible, and only trusted DTDs should be
|
|
processed. If a particular XML application does not permit
|
|
such restrictions, then application-specific limits are called
|
|
for.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-XInclude]]
|
|
=== XInclude Processing
|
|
|
|
XInclude processing can reference file and network resources
|
|
and include them into the document, much like external entity
|
|
references. When parsing untrusted XML documents, XInclude
|
|
processing should be turned off.
|
|
|
|
XInclude processing is also fairly complex and may pull in
|
|
support for the XPointer and XPath specifications,
|
|
considerably increasing the amount of code required for XML
|
|
processing.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-Validation]]
|
|
=== Algorithmic Complexity of XML Validation
|
|
|
|
DTD-based XML validation uses regular expressions for content
|
|
models. The XML specification requires that content models
|
|
are deterministic, which means that efficient validation is
|
|
possible. However, some implementations do not enforce
|
|
determinism, and require exponential (or just polynomial)
|
|
amount of space or time for validating some DTD/document
|
|
combinations.
|
|
|
|
XML schemas and RELAX NG (via the `xsd:`
|
|
prefix) directly support textual regular expressions which are
|
|
not required to be deterministic.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-Expat]]
|
|
=== Using Expat for XML parsing
|
|
|
|
By default, Expat does not try to resolve external IDs, so no
|
|
steps are required to block them. However, internal entity
|
|
declarations are processed. Installing a callback which stops
|
|
parsing as soon as such entities are encountered disables
|
|
them, see <<ex-Defensive_Coding-Tasks-Serialization-XML-Expat-EntityDeclHandler>>.
|
|
Expat does not perform any validation, so there are no
|
|
problems related to that.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-Expat-EntityDeclHandler]]
|
|
.Disabling XML entity processing with Expat
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-Expat-EntityDeclHandler.adoc[]
|
|
|
|
----
|
|
|
|
====
|
|
|
|
This handler must be installed when the
|
|
`XML_Parser` object is created (<<ex-Defensive_Coding-Tasks-Serialization-XML-Expat-Create>>).
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-Expat-Create]]
|
|
.Creating an Expat XML parser
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-Expat-Create.adoc[]
|
|
|
|
----
|
|
|
|
====
|
|
|
|
It is also possible to reject internal DTD subsets altogether,
|
|
using a suitable
|
|
`XML_StartDoctypeDeclHandler` handler
|
|
installed with `XML_SetDoctypeDeclHandler`.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-Qt]]
|
|
=== Using Qt for XML Parsing
|
|
|
|
The XML component of Qt, QtXml, does not resolve external IDs
|
|
by default, so it is not required to prevent such resolution.
|
|
Internal entities are processed, though. To change that, a
|
|
custom `QXmlDeclHandler` and
|
|
`QXmlSimpleReader` subclasses are needed. It
|
|
is not possible to use the
|
|
`QDomDocument::setContent(const QByteArray
|
|
&)` convenience methods.
|
|
|
|
<<ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityHandler>>
|
|
shows an entity handler which always returns errors, causing
|
|
parsing to stop when encountering entity declarations.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityHandler]]
|
|
.A QtXml entity handler which blocks entity processing
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-Qt-NoEntityHandler.adoc[]
|
|
|
|
----
|
|
|
|
====
|
|
|
|
This handler is used in the custom
|
|
`QXmlReader` subclass in <<ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityReader>>.
|
|
Some parts of QtXml will call the
|
|
`setDeclHandler(QXmlDeclHandler *)` method.
|
|
Consequently, we prevent overriding our custom handler by
|
|
providing a definition of this method which does nothing. In
|
|
the constructor, we activate namespace processing; this part
|
|
may need adjusting.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-Qt-NoEntityReader]]
|
|
.A QtXml XML reader which blocks entity processing
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-Qt-NoEntityReader.adoc[]
|
|
|
|
----
|
|
|
|
====
|
|
|
|
Our `NoEntityReader` class can be used with
|
|
one of the overloaded
|
|
`QDomDocument::setContent` methods.
|
|
<<ex-Defensive_Coding-Tasks-Serialization-XML-Qt-QDomDocument>>
|
|
shows how the `buffer` object (of type
|
|
`QByteArray`) is wrapped as a
|
|
`QXmlInputSource`. After calling the
|
|
`setContent` method, you should check the
|
|
return value and report any error.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-Qt-QDomDocument]]
|
|
.Parsing an XML document with QDomDocument, without entity expansion
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-Qt-QDomDocument.adoc[]
|
|
|
|
----
|
|
|
|
====
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse]]
|
|
=== Using OpenJDK for XML Parsing and Validation
|
|
|
|
OpenJDK contains facilities for DOM-based, SAX-based, and
|
|
StAX-based document parsing. Documents can be validated
|
|
against DTDs or XML schemas.
|
|
|
|
The approach taken to deal with entity expansion differs from
|
|
the general recommendation in <<sect-Defensive_Coding-Tasks-Serialization-XML-Entities>>.
|
|
We enable the the feature flag
|
|
`javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING`,
|
|
which enforces heuristic restrictions on the number of entity
|
|
expansions. Note that this flag alone does not prevent
|
|
resolution of external references (system IDs or public IDs),
|
|
so it is slightly misnamed.
|
|
|
|
In the following sections, we use helper classes to prevent
|
|
external ID resolution.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoEntityResolver]]
|
|
.Helper class to prevent DTD external entity resolution in OpenJDK
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-OpenJDK-NoEntityResolver.adoc[]
|
|
|
|
----
|
|
|
|
====
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoResourceResolver]]
|
|
.Helper class to prevent schema resolution in OpenJDK
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-OpenJDK-NoResourceResolver.adoc[]
|
|
----
|
|
|
|
====
|
|
|
|
<<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-Imports>>
|
|
shows the imports used by the examples.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-Imports]]
|
|
.Java imports for OpenJDK XML parsing
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-OpenJDK-Imports.adoc[]
|
|
----
|
|
|
|
====
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM]]
|
|
==== DOM-based XML parsing and DTD validation in OpenJDK
|
|
|
|
This approach produces a
|
|
`org.w3c.dom.Document` object from an input
|
|
stream. <<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM>>
|
|
use the data from the `java.io.InputStream`
|
|
instance in the `inputStream` variable.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM]]
|
|
.DOM-based XML parsing in OpenJDK
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-OpenJDK_Parse-DOM.adoc[]
|
|
----
|
|
|
|
====
|
|
|
|
External entity references are prohibited using the
|
|
`NoEntityResolver` class in
|
|
<<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoEntityResolver>>.
|
|
Because external DTD references are prohibited, DTD validation
|
|
(if enabled) will only happen against the internal DTD subset
|
|
embedded in the XML document.
|
|
|
|
To validate the document against an external DTD, use a
|
|
`javax.xml.transform.Transformer` class to
|
|
add the DTD reference to the document, and an entity
|
|
resolver which whitelists this external reference.
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-SAX]]
|
|
==== XML Schema Validation in OpenJDK
|
|
|
|
<<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX>>
|
|
shows how to validate a document against an XML Schema,
|
|
using a SAX-based approach. The XML data is read from an
|
|
`java.io.InputStream` in the
|
|
`inputStream` variable.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX]]
|
|
.SAX-based validation against an XML schema in OpenJDK
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_SAX.adoc[]
|
|
----
|
|
|
|
====
|
|
|
|
The `NoResourceResolver` class is defined
|
|
in <<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK-NoResourceResolver>>.
|
|
|
|
If you need to validate a document against an XML schema,
|
|
use the code in <<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-DOM>>
|
|
to create the document, but do not enable validation at this
|
|
point. Then use
|
|
<<ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM>>
|
|
to perform the schema-based validation on the
|
|
`org.w3c.dom.Document` instance
|
|
`document`.
|
|
|
|
[[ex-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM]]
|
|
.Validation of a DOM document against an XML schema in OpenJDK
|
|
====
|
|
|
|
[source,java]
|
|
----
|
|
include::example$Tasks-Serialization-XML-OpenJDK_Parse-XMLSchema_DOM.adoc[]
|
|
----
|
|
|
|
====
|
|
|
|
[[sect-Defensive_Coding-Tasks-Serialization-XML-OpenJDK_Parse-Other]]
|
|
==== Other XML Parsers in OpenJDK
|
|
|
|
OpenJDK contains additional XML parsing and processing
|
|
facilities. Some of them are insecure.
|
|
|
|
The class `java.beans.XMLDecoder` acts as a
|
|
bridge between the Java object serialization format and XML.
|
|
It is close to impossible to securely deserialize Java
|
|
objects in this format from untrusted inputs, so its use is
|
|
not recommended, as with the Java object serialization
|
|
format itself. See <<sect-Defensive_Coding-Tasks-Serialization-Library>>.
|
|
|
|
== Protocol Encoders
|
|
|
|
For protocol encoders, you should write bytes to a buffer which
|
|
grows as needed, using an exponential sizing policy. Explicit
|
|
lengths can be patched in later, once they are known.
|
|
Allocating the required number of bytes upfront typically
|
|
requires separate code to compute the final size, which must be
|
|
kept in sync with the actual encoding step, or vulnerabilities
|
|
may result. In multi-threaded code, parts of the object being
|
|
deserialized might change, so that the computed size is out of
|
|
date.
|
|
|
|
You should avoid copying data directly from a received packet
|
|
during encoding, disregarding the format. Propagating malformed
|
|
data could enable attacks on other recipients of that data.
|
|
|
|
When using C or C++ and copying whole data structures directly
|
|
into the output, make sure that you do not leak information in
|
|
padding bytes between fields or at the end of the
|
|
`struct`.
|