defensive-coding-guide/modules/ROOT/pages/programming-languages/Java.adoc

:experimental:
:toc:

include::partial$entities.adoc[]

= The Java Programming Language

[[sect-Defensive_Coding-Java-Language]]
== The Core Language

Implementations of the Java programming language provide strong
memory safety, even in the presence of data races in concurrent
code. This prevents a large range of security vulnerabilities
from occurring, unless certain low-level features are used; see
<<sect-Defensive_Coding-Java-LowLevel>>.

[[sect-Defensive_Coding-Java-Language-ReadArray]]
=== Increasing Robustness when Reading Arrays

External data formats often include arrays, stored as an integer
indicating the number of array elements, followed by that many
elements in the file or protocol data unit. The specified length
can be much larger than what is actually available in the data
source.

To avoid allocating extremely large amounts of data, you can
allocate a small array initially and grow it as you read more
data, implementing an exponential growth policy. See the
`readBytes(InputStream, int)` function in
<<ex-Defensive_Coding-Java-Language-ReadArray>>.

[[ex-Defensive_Coding-Java-Language-ReadArray]]
.Incrementally reading a byte array
====

[source,java]
----
include::example$Java-Language-ReadArray.adoc[]

----

====

When reading data into array lists, hash maps or hash sets, use
the default constructor and do not pass a length taken from the
input as a size hint. You can simply add the elements to the
collection as you read them.
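
As a minimal sketch of this advice, the following code reads a
length-prefixed list of integers into an `ArrayList` created with
the default constructor. The wire format (a 32-bit count followed
by 32-bit elements) and the `ReadIntList` class are assumptions
made for illustration only.

[source,java]
----
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ReadIntList {
    // Hypothetical format: a 32-bit element count, then that many
    // 32-bit integers.
    static List<Integer> readInts(DataInputStream in) throws IOException {
        int count = in.readInt();
        if (count < 0) {
            throw new IOException("negative element count: " + count);
        }
        // Default constructor: the untrusted count is not used as a
        // size hint, so memory use grows only with the data actually read.
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // readInt() throws EOFException if the stream ends early.
            result.add(in.readInt());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A count of 2, followed by the elements 7 and 9.
        byte[] data = {0, 0, 0, 2, 0, 0, 0, 7, 0, 0, 0, 9};
        System.out.println(
            readInts(new DataInputStream(new ByteArrayInputStream(data))));
    }
}
----

If the declared count is far larger than the available data, the
loop ends with an `EOFException` after reading the few elements
that exist, instead of allocating a huge collection up front.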

[[sect-Defensive_Coding-Java-Language-Resources]]
=== Resource Management

Unlike C++, Java does not offer destructors which can deallocate
resources in a predictable fashion. All resource management has
to be manual, at the usage site. (Finalizers are generally not
usable for resource management, especially in high-performance
code; see <<sect-Defensive_Coding-Java-Language-Finalizers>>.)

The first option is the
`try`-`finally` construct, as
shown in <<ex-Defensive_Coding-Java-Language-Finally>>.
The code in the `finally` block should be as short as
possible and should not throw any exceptions.

[[ex-Defensive_Coding-Java-Language-Finally]]
.Resource management with a `try`-`finally` block
====

[source,java]
----
include::example$Java-Finally.adoc[]

----

====

Note that the resource allocation happens
*outside* the `try` block,
and that there is no `null` check in the
`finally` block. (Both are common artifacts
stemming from IDE code templates.)

If the resource object is created freshly and implements the
`java.lang.AutoCloseable` interface, the code
in <<ex-Defensive_Coding-Java-Language-TryWithResource>> can be
used instead. The Java compiler will automatically insert the
`close()` method call in a synthetic
`finally` block.

[[ex-Defensive_Coding-Java-Language-TryWithResource]]
.Resource management using the `try`-with-resource construct
====

[source,java]
----
include::example$Java-TryWithResource.adoc[]

----

====

To be compatible with the `try`-with-resource
construct, new classes should name the resource deallocation
method `close()`, and implement the
`AutoCloseable` interface (the latter breaking
backwards compatibility with Java 6). However, using the
`try`-with-resource construct with objects that
are not freshly allocated is at best awkward, and an explicit
`finally` block is usually the better approach.

In general, it is best to design the programming interface in
such a way that resource deallocation methods like
`close()` cannot throw any (checked or
unchecked) exceptions, but this should not be a reason to ignore
any actual error conditions.
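
The following sketch shows one way to follow this advice. The
`ScratchBuffer` class is hypothetical; the point is that an
override of `AutoCloseable.close()` may declare no exceptions at
all and can be made safe to call more than once.

[source,java]
----
final class ScratchBuffer implements AutoCloseable {
    private byte[] data;

    ScratchBuffer(int size) {
        data = new byte[size];
    }

    boolean isClosed() {
        return data == null;
    }

    // AutoCloseable.close() is declared with "throws Exception", but
    // an override may declare no exceptions, so callers need no
    // catch block.
    @Override
    public void close() {
        // Idempotent: a second call is a harmless no-op.
        data = null;
    }

    public static void main(String[] args) {
        ScratchBuffer kept = null;
        try (ScratchBuffer buffer = new ScratchBuffer(16)) {
            kept = buffer;
            System.out.println("closed inside try: " + buffer.isClosed());
        }
        System.out.println("closed after try: " + kept.isClosed());
    }
}
----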

[[sect-Defensive_Coding-Java-Language-Finalizers]]
=== Finalizers

Finalizers can be used as a last-resort approach to free resources
which would otherwise leak. Finalization is unpredictable,
costly, and there can be a considerable delay between the last
reference to an object going away and the execution of the
finalizer. Generally, manual resource management is required;
see <<sect-Defensive_Coding-Java-Language-Resources>>.

Finalizers should be very short and should only deallocate
native or other external resources held directly by the object
being finalized. In general, they must use synchronization:
finalization is inherently concurrent because it necessarily
happens on a separate thread. There can be multiple finalization
threads, and despite each object being finalized at most once,
the finalizer must not assume that it has exclusive access to
the object being finalized (through the `this`
reference).

Finalizers should not deallocate resources held by other
objects, especially if those objects have finalizers of their
own. In particular, it is a very bad idea to define a finalizer
just to invoke the resource deallocation method of another object,
or to overwrite some pointer fields.

Finalizers are not guaranteed to run at all. For instance, the
virtual machine (or the machine underneath) might crash,
preventing their execution.

Objects with finalizers are garbage-collected much later than
objects without them, so using finalizers to zero out key
material (to reduce its unencrypted lifetime in memory) may have
the opposite effect, keeping objects around for much longer and
preventing them from being overwritten in the normal course of
program execution.

For the same reason, code which allocates objects with
finalizers at a high rate will eventually fail (likely with a
`java.lang.OutOfMemoryError` exception) because
the virtual machine has finite resources for keeping track of
objects pending finalization. To deal with that, it may be
necessary to recycle objects with finalizers.

The remarks in this section apply to finalizers which are
implemented by overriding the `finalize()`
method, and to custom finalization using reference queues.

[[sect-Defensive_Coding-Java-Language-Exceptions]]
=== Recovering from Exceptions and Errors

Java exceptions come in three kinds, all ultimately deriving
from `java.lang.Throwable`:

* *Run-time exceptions* do not have to be
declared explicitly and can be explicitly thrown from any
code, by calling code which throws them, or by triggering an
error condition at run time, like division by zero, or an
attempt at an out-of-bounds array access. These exceptions
derive from the
`java.lang.RuntimeException` class (perhaps
indirectly).

* *Checked exceptions* have to be declared
explicitly by functions that throw or propagate them. They
are similar to run-time exceptions in other regards, except
that there is no language construct to throw them (except
the `throw` statement itself). Checked
exceptions are only present at the Java language level and
are only enforced at compile time. At run time, the virtual
machine does not know about them and permits throwing
exceptions from any code. Checked exceptions must derive
(perhaps indirectly) from the
`java.lang.Exception` class, but not from
`java.lang.RuntimeException`.

* *Errors* are exceptions which typically
reflect serious error conditions. They can be thrown at any
point in the program, and do not have to be declared (unlike
checked exceptions). In general, it is not possible to
recover from such errors; more on that below, in <<sect-Defensive_Coding-Java-Language-Exceptions-Errors>>.
Error classes derive (perhaps indirectly) from
`java.lang.Error`, or from
`java.lang.Throwable`, but not from
`java.lang.Exception`.

The general expectation is that run-time exceptions are avoided by
careful programming (e.g., not dividing by zero). Checked
exceptions are expected to be caught as they happen (e.g., when
an input file is unexpectedly missing). Errors are impossible
to predict, can happen at any point, and reflect that
something went wrong beyond all expectations.
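
The class hierarchy behind the three kinds can be summarized in
code. The sketch below uses a hypothetical `ThrowableKinds`
helper which classifies a `Throwable` using the derivation rules
described above.

[source,java]
----
import java.io.FileNotFoundException;

final class ThrowableKinds {
    enum Kind { RUNTIME_EXCEPTION, CHECKED_EXCEPTION, ERROR }

    static Kind classify(Throwable t) {
        if (t instanceof RuntimeException) {
            return Kind.RUNTIME_EXCEPTION;
        }
        if (t instanceof Exception) {
            // Derives from Exception, but not from RuntimeException.
            return Kind.CHECKED_EXCEPTION;
        }
        // Everything else: subclasses of Error and classes which
        // derive from Throwable without going through Exception.
        return Kind.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(new ArithmeticException("/ by zero")));
        System.out.println(classify(new FileNotFoundException("input.txt")));
        System.out.println(classify(new StackOverflowError()));
    }
}
----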

[[sect-Defensive_Coding-Java-Language-Exceptions-Errors]]
==== The Difficulty of Catching Errors

Errors (that is, exceptions which do not derive, directly or
indirectly, from `java.lang.Exception`) have the
peculiar property that catching them is problematic. There
are several reasons for this:

* The error reflects a failed consistency check, for example,
`java.lang.AssertionError`.

* The error can happen at any point, resulting in
inconsistencies due to half-updated objects. Examples are
`java.lang.ThreadDeath`,
`java.lang.OutOfMemoryError` and
`java.lang.StackOverflowError`.

* The error indicates that the virtual machine failed to provide
some semantic guarantees of the Java programming language.
`java.lang.ExceptionInInitializerError`
is an example; it can leave behind a half-initialized
class.

In general, if an error is thrown, the virtual machine should
be restarted as soon as possible because it is in an
inconsistent state. Continuing to run as before can have
unexpected consequences. However, there are legitimate
reasons for catching errors, because not doing so can lead to
even greater problems.

Code should be written in a way that avoids triggering errors.
See <<sect-Defensive_Coding-Java-Language-ReadArray>>
for an example.

It is usually necessary to log errors. Otherwise, no trace of
the problem might be left anywhere, making it very difficult
to diagnose related failures. Consequently, if you catch
`java.lang.Exception` to log and suppress all
unexpected exceptions (for example, in a request dispatching
loop), you should consider switching to
`java.lang.Throwable` instead, to also cover
errors.

The other reason mainly applies to such request dispatching
loops: If you do not catch errors, the loop stops looping,
resulting in a denial of service.

However, if possible, catching errors should be coupled with a
way to signal the requirement of a virtual machine restart.
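
A request dispatching loop along these lines might look as
follows. This is only a sketch: the `Dispatcher` class, the use of
`Runnable` for requests, and the restart flag (assumed to be polled
by some external supervisor) are illustrative assumptions.

[source,java]
----
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

final class Dispatcher {
    private static final Logger LOG =
        Logger.getLogger(Dispatcher.class.getName());

    // Hypothetical restart signal: a supervisor is assumed to poll it.
    private final AtomicBoolean restartRequired = new AtomicBoolean();
    private int failures;

    boolean isRestartRequired() { return restartRequired.get(); }

    int failureCount() { return failures; }

    void dispatchAll(List<Runnable> requests) {
        for (Runnable request : requests) {
            try {
                request.run();
            } catch (Throwable t) {
                // Throwable instead of Exception, so that errors are
                // logged as well and do not terminate the loop.
                failures++;
                LOG.log(Level.SEVERE, "request failed", t);
                if (!(t instanceof Exception)) {
                    // An error: the VM state may be inconsistent.
                    restartRequired.set(true);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Runnable> requests = new ArrayList<>();
        requests.add(() -> { throw new IllegalArgumentException("bad request"); });
        requests.add(() -> { throw new StackOverflowError(); });
        requests.add(() -> System.out.println("third request still served"));
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.dispatchAll(requests);
        System.out.println("restart required: " + dispatcher.isRestartRequired());
    }
}
----

The loop keeps serving requests after a failure, but records that
an error occurred so that the process can be replaced at the next
opportunity.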

[[sect-Defensive_Coding-Java-LowLevel]]
== Low-level Features of the Virtual Machine

[[sect-Defensive_Coding-Java-Reflection]]
=== Reflection and Private Parts

The `setAccessible(boolean)` method of the
`java.lang.reflect.AccessibleObject` class
allows a program to disable language-defined access rules for
specific constructors, methods, or fields. Once the access
checks are disabled, any code can use the
`java.lang.reflect.Constructor`,
`java.lang.reflect.Method`, or
`java.lang.reflect.Field` object to access the
underlying Java entity, without further permission checks. This
breaks encapsulation and can undermine the stability of the
virtual machine. (In contrast, without using the
`setAccessible(boolean)` method, this should
not happen because all the language-defined checks still apply.)

This feature should be avoided if possible.
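
To illustrate why, the following sketch shows how little code is
needed to bypass a `private` declaration. The `Account` and
`ReflectionBypass` classes are hypothetical, and the sketch assumes
that no security manager or module boundary blocks the call.

[source,java]
----
import java.lang.reflect.Field;

final class Account {
    private long balance = 100;

    long balance() {
        return balance;
    }
}

final class ReflectionBypass {
    static void overwriteBalance(Account account, long value)
            throws ReflectiveOperationException {
        Field field = Account.class.getDeclaredField("balance");
        // Disables the language-level access check for this Field object.
        field.setAccessible(true);
        // Without the call above, this throws IllegalAccessException.
        field.setLong(account, value);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Account account = new Account();
        overwriteBalance(account, -1);
        System.out.println("balance is now " + account.balance());
    }
}
----

Any invariant that `Account` tries to maintain through its own
methods can be violated this way.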

[[sect-Defensive_Coding-Java-JNI]]
=== Java Native Interface (JNI)

The Java Native Interface allows Java code to call functions
written specifically for this purpose, usually in C or
C++.

The transition between the Java world and the C world is not
fully type-checked, and the C code can easily break the Java
virtual machine semantics. Therefore, extra care is needed when
using this functionality.

To provide a moderate amount of type safety, it is recommended
to recreate the class-specific header file using
[application]*javah* (or `javac -h` on JDK 10 and later, where
[application]*javah* has been removed) during the build process,
include it in the implementation, and compile the C code with the
[option]`-Wmissing-declarations` option.

Ideally, the required data is directly passed to static JNI
methods and returned from them, and the code on the C side does
not have to deal with accessing Java fields (or even methods).

When using `GetPrimitiveArrayCritical` or
`GetStringCritical`, make sure that you only
perform very little processing between the get and release
operations. Do not access the file system or the network, and
do not perform locking, because that might introduce blocking.
When processing large strings or arrays, consider splitting the
computation into multiple sub-chunks, so that you do not prevent
the JVM from reaching a safepoint for extended periods of time.

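If necessary, you can use the Java `long` type
to store a C pointer in a field of a Java class. On the C side,
cast explicitly between the `jlong` value and the
pointer; going through an integer type of pointer width, such as
`intptr_t`, avoids compiler warnings on platforms
where pointers are narrower than `jlong`.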

You should not try to perform pointer arithmetic on the Java
side (that is, you should treat pointer-carrying
`long` values as opaque). When passing a slice
of an array to the native code, follow the Java convention and
pass it as the base array, the integer offset of the start of
the slice, and the integer length of the slice. On the native
side, check the offset/length combination against the actual
array length, and use the offset to compute the pointer to the
beginning of the slice.

[[ex-Defensive_Coding-Java-JNI-Pointers]]
.Array length checking in JNI code
====

[source,java]
----
include::example$Java-JNI-Pointers.adoc[]

----

====
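
The native side must perform this check itself, but the same
comparison can also be written in Java, for instance as an
additional check before the native call. The sketch below uses a
hypothetical `SliceCheck` class; its `sum` method stands in for
a wrapper around a native method and is implemented in plain Java
so that the example is self-contained.

[source,java]
----
final class SliceCheck {
    static void checkSlice(int arrayLength, int offset, int length) {
        // Written without computing offset + length, which could
        // overflow. arrayLength - length cannot overflow because
        // both values are non-negative at that point.
        if (offset < 0 || length < 0 || offset > arrayLength - length) {
            throw new IndexOutOfBoundsException("offset " + offset
                + ", length " + length + ", array length " + arrayLength);
        }
    }

    // Stand-in for a wrapper around a native method (assumption):
    // base array, start offset and length are passed separately.
    static int sum(byte[] array, int offset, int length) {
        checkSlice(array.length, offset, length);
        int total = 0;
        for (int i = 0; i < length; i++) {
            total += array[offset + i];
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        System.out.println(sum(data, 1, 2));
    }
}
----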

In any case, classes referring to native resources must be
declared `final`, and must not be serializable
or cloneable. Initialization and mutation of the state used by
the native side must be controlled carefully. Otherwise, it
might be possible to create an object with inconsistent native
state which results in a crash (or worse) when used (or perhaps
only finalized) later. If you need both Java inheritance and
native resources, you should consider moving the native state to
a separate class, and only keep a reference to objects of that
class. This way, cloning and serialization issues can be
avoided in most cases.

If there are native resources associated with an object, the
class should have an explicit resource deallocation method
(<<sect-Defensive_Coding-Java-Language-Resources>>) and a
finalizer (<<sect-Defensive_Coding-Java-Language-Finalizers>>) as a
last resort. The need for finalization means that a minimum
amount of synchronization is needed. Code on the native side
should check that the object is not in a closed/freed state.
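
The Java side of such a class could be structured as in the
following sketch. The `NativeBuffer` and `FakeNative` classes are
hypothetical: `FakeNative` simulates the native allocator in plain
Java so that the example runs without a native library, and the
last-resort finalizer is left out for brevity.

[source,java]
----
import java.util.HashMap;
import java.util.Map;

// final, neither Serializable nor Cloneable.
final class NativeBuffer implements AutoCloseable {
    // Opaque pointer value; 0 once the buffer has been closed.
    private long handle;

    NativeBuffer(int size) {
        handle = FakeNative.allocate(size);
    }

    synchronized int size() {
        if (handle == 0) {
            throw new IllegalStateException("buffer is closed");
        }
        return FakeNative.sizeOf(handle);
    }

    // Explicit deallocation method; synchronized and idempotent.
    @Override
    public synchronized void close() {
        if (handle != 0) {
            FakeNative.free(handle);
            handle = 0;
        }
    }
}

// Stand-in for native methods (assumption): a map plays the role
// of the native heap.
final class FakeNative {
    private static final Map<Long, Integer> HEAP = new HashMap<>();
    private static long next = 1;

    static synchronized long allocate(int size) {
        long handle = next++;
        HEAP.put(handle, size);
        return handle;
    }

    static synchronized int sizeOf(long handle) {
        return HEAP.get(handle);
    }

    static synchronized void free(long handle) {
        HEAP.remove(handle);
    }

    static synchronized int liveBlocks() {
        return HEAP.size();
    }
}
----

Because the handle is reset under the same lock that guards its
use, a call after `close()` fails with an exception instead of
passing a stale pointer to native code.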

Many JNI functions create local references. By default, these
persist until the JNI-implemented method returns. If you create
many such references (e.g., in a loop), you may have to free
them using `DeleteLocalRef`, or start using
`PushLocalFrame` and
`PopLocalFrame`. Global references must be
deallocated with `DeleteGlobalRef`, otherwise
there will be a memory leak, just as with
`malloc` and `free`.

When throwing exceptions using `Throw` or
`ThrowNew`, be aware that these functions
return normally. You have to return control to the
JVM manually.

Technically, the `JNIEnv` pointer is not
necessarily constant during the lifetime of your JNI module.
Storing it in a global variable is therefore incorrect.
Particularly if you are dealing with callbacks, you may have to
store the pointer in a thread-local variable (defined with
`+__thread+`). It is, however, best to avoid the
complexity of calling back into Java code.

Keep in mind that C/C++ and Java are different languages,
despite very similar syntax for expressions. The Java memory
model is much more strict than the C or C++ memory models, and
native code needs more synchronization, usually using JVM
facilities or POSIX threads mutexes. Integer overflow in Java
is defined, but in C/C++ it is not (for the
`jint` and `jlong` types).

[[sect-Defensive_Coding-Java-MiscUnsafe]]
=== `sun.misc.Unsafe`

The `sun.misc.Unsafe` class is not portable and
contains many functions explicitly designed to break Java memory
safety (for performance and debugging). If possible, avoid
using this class.


[[sect-Defensive_Coding-Java-SecurityManager]]
== Interacting with the Security Manager

The Java platform is largely implemented in the Java language
itself. Therefore, the same JVM runs trusted code which is part
of the Java installation, and possibly also code which comes from
untrusted sources and is restricted by the Java sandbox (to
varying degrees). The *security
manager* draws a line between fully trusted, partially
trusted and untrusted code.

The type safety and accessibility checks provided by the Java
language and JVM would be sufficient to implement a sandbox.
However, only some Java APIs employ such a capabilities-based
approach. (The Java SE library contains many public classes with
public constructors which can break any security policy, such as
`java.io.FileOutputStream`.) Instead, critical
functionality is protected by *stack
inspection*: At a security check, the stack is walked
from top (most-nested) to bottom. The security check fails if a
stack frame for a method is encountered whose class lacks the
permission which the security check requires.

This simple approach would not allow untrusted code (which lacks
certain permissions) to call into trusted code while the latter
retains trust. Such trust transitions are desirable because they
enable Java as an implementation language for most parts of the
Java platform, including security-relevant code. Therefore, there
is a mechanism to mark certain stack frames as trusted (<<sect-Defensive_Coding-Java-SecurityManager-Privileged>>).

In theory, it is possible to run a Java virtual machine with a
security manager that acts very differently from this approach,
but a lot of code expects behavior very close to the platform
default (including many classes which are part of the OpenJDK
implementation).

[[sect-Defensive_Coding-Java-SecurityManager-Compatible]]
=== Security Manager Compatibility

A lot of code can run without any additional permissions at all,
with few changes. The following guidelines should help to
increase compatibility with a restrictive security manager.

* When retrieving system properties using
`System.getProperty(String)` or similar
methods, catch `SecurityException`
exceptions and treat the property as unset.

* Avoid unnecessary file system or network access.

* Avoid explicit class loading. Access to a suitable class
loader might not be available when executing as untrusted
code.
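
The first guideline can be sketched as a small helper. The
`SafeProperties` class and the property name used in `main` are
assumptions for illustration.

[source,java]
----
final class SafeProperties {
    // Returns the property value, or the fallback if the property is
    // unset or may not be read by the calling code.
    static String get(String name, String fallback) {
        try {
            String value = System.getProperty(name);
            return value != null ? value : fallback;
        } catch (SecurityException e) {
            // A restrictive security manager denied access:
            // treat the property as unset.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // "example.cache.dir" is a hypothetical property name.
        System.out.println(get("example.cache.dir", "/tmp"));
    }
}
----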

If the functionality you are implementing absolutely requires
privileged access and this functionality has to be used from
untrusted code (hopefully in a restricted and secure manner),
see <<sect-Defensive_Coding-Java-SecurityManager-Privileged>>.

[[sect-Defensive_Coding-Java-SecurityManager-Activate]]
=== Activating the Security Manager

The usual command to launch a Java application,
[command]`java`, does not activate the security manager.
Therefore, the virtual machine does not enforce any sandboxing
restrictions, even if explicitly requested by the code (for
example, as described in <<sect-Defensive_Coding-Java-SecurityManager-Unprivileged>>).

The [option]`-Djava.security.manager` option activates
the security manager, with the fairly restrictive default
policy. With a very permissive policy, most Java code will run
unchanged. Assuming the policy in <<ex-Defensive_Coding-Java-SecurityManager-GrantAll>>
has been saved in a file `grant-all.policy`,
this policy can be activated using the option
[option]`-Djava.security.policy=grant-all.policy` (in
addition to the [option]`-Djava.security.manager`
option).

[[ex-Defensive_Coding-Java-SecurityManager-GrantAll]]
.Most permissive OpenJDK policy file
====

[source,java]
----

grant {
    permission java.security.AllPermission;
};

----

====

With this most permissive policy, the security manager is still
active, and explicit requests to drop privileges will be
honored.

[[sect-Defensive_Coding-Java-SecurityManager-Unprivileged]]
=== Reducing Trust in Code

The <<ex-Defensive_Coding-Java-SecurityManager-Unprivileged>> example
shows how to run a piece of code with reduced privileges.

[[ex-Defensive_Coding-Java-SecurityManager-Unprivileged]]
.Using the security manager to run code with reduced privileges
====

[source,java]
----
include::example$Java-SecurityManager-Unprivileged.adoc[]

----

====

The example above does not add any additional permissions to the
`permissions` object. If such permissions are
necessary, code like the following (which grants read permission
on all files in the current directory) can be used:

[source,java]
----
include::example$Java-SecurityManager-CurrentDirectory.adoc[]

----

[IMPORTANT]
====

Calls to the
`java.security.AccessController.doPrivileged()`
methods do not enforce any additional restriction if no
security manager has been set. With few
exceptions, the restrictions no longer apply once
`doPrivileged()` has returned, even to
objects created by the code which ran with reduced privileges.
(This applies to object finalization in particular.)

The example code above does not prevent the called code from
calling the
`java.security.AccessController.doPrivileged()`
methods. This mechanism should be considered an additional
safety net, but it still can be used to prevent unexpected
behavior of trusted code. As long as the executed code is not
dynamic and came with the original application or library, the
sandbox is fairly effective.

The `context` argument in <<ex-Defensive_Coding-Java-SecurityManager-Unprivileged>>
is extremely important—otherwise, this code would increase
privileges instead of reducing them.

====

For activating the security manager, see <<sect-Defensive_Coding-Java-SecurityManager-Activate>>.
Unfortunately, this affects the virtual machine as a whole, so
it is not possible to do this from a library.

[[sect-Defensive_Coding-Java-SecurityManager-Privileged]]
=== Re-gaining Privileges

Ordinarily, when trusted code is called from untrusted code, it
loses its privileges (because of the untrusted stack frames
visible to stack inspection). The
`java.security.AccessController.doPrivileged()`
family of methods provides a controlled backdoor from untrusted
to trusted code.

[IMPORTANT]
====

By design, this feature can undermine the Java security model
and the sandbox. It has to be used very carefully. Most
sandbox vulnerabilities can be traced back to its misuse.

====

In essence, the `doPrivileged()` methods
cause the stack inspection to end at their call site. Untrusted
code further down the call stack becomes invisible to security
checks.

The following operations are common and safe to perform with
elevated privileges.

* Reading custom system properties with fixed names,
especially if the value is not propagated to untrusted code.
(File system paths including installation paths, host names
and user names are sometimes considered private information
and need to be protected.)

* Reading from the file system at fixed paths, either
determined at compile time or by a system property. Again,
leaking the file contents to the caller can be problematic.

* Accessing network resources under a fixed address, name or
URL, derived from a system property or configuration file,
information leaks notwithstanding.

The <<ex-Defensive_Coding-Java-SecurityManager-Privileged>> example
shows how to request additional privileges.

[[ex-Defensive_Coding-Java-SecurityManager-Privileged]]
.Using the security manager to run code with increased privileges
====

[source,java]
----
include::example$Java-SecurityManager-Privileged.adoc[]

----

====

Obviously, this only works if the class containing the call to
`doPrivileged()` is marked trusted (usually
because it is loaded from a trusted class loader).

When writing code that runs with elevated privileges, make sure
that you follow the rules below.

* Make the privileged code as small as possible. Perform as
many computations as possible before and after the
privileged code section, even if it means that you have to
define a new class to pass the data around.

* Make sure that you either control the inputs to the
privileged code, or that the inputs are harmless and cannot
affect security properties of the privileged code.

* Data that is returned from or written by the privileged code
must either be restricted (that is, it cannot be accessed by
untrusted code), or must be harmless. Otherwise, privacy
leaks or information disclosures which affect security
properties can be the result.

If the code calls back into untrusted code at a later stage (or
performs other actions under control from the untrusted caller),
you must obtain the original security context and restore it
before performing the callback, as in <<ex-Defensive_Coding-Java-SecurityManager-Callback>>.
(In this example, it would be much better to move the callback
invocation out of the privileged code section, of course.)

[[ex-Defensive_Coding-Java-SecurityManager-Callback]]
.Restoring privileges when invoking callbacks
====

[source,java]
----
include::example$Java-SecurityManager-Callback.adoc[]

----

====