:experimental:

[[sect-Defensive_Coding-Tasks-Processes]]
= Processes
include::partial$entities.adoc[]

[[sect-Defensive_Coding-Tasks-Processes-Creation]]
== Creating Safe Processes

This section describes how to create new child processes in a
safe manner. In addition to the concerns addressed below, there
is also the possibility of file descriptor leaks; see xref:../tasks/Tasks-Descriptors.adoc#sect-Defensive_Coding-Tasks-Descriptors-Child_Processes[Preventing File Descriptor Leaks to Child Processes].

=== Obtaining the Program Path and the Command-line Template

The name and path of the program being invoked should be
hard-coded or controlled by a static configuration file stored
at a fixed location (an absolute file system path). The
same applies to the template for generating the command line.

The configured program name should be an absolute path. If it
is a relative path, the contents of the `PATH` environment variable
must be obtained in a secure manner (see <<sect-Defensive_Coding-Tasks-secure_getenv>>).
If the `PATH` variable is not set or is untrusted,
the safe default `/bin:/usr/bin` must be
used.
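
A minimal sketch of the `PATH` rule above, assuming GNU libc, where `secure_getenv` is available (see <<sect-Defensive_Coding-Tasks-secure_getenv>>). The helper name `program_search_path` is hypothetical, and treating an empty `PATH` like an unset one is a conservative choice of this sketch, not a requirement stated above.

[source,c]
----
#ifndef _GNU_SOURCE
#  define _GNU_SOURCE /* secure_getenv is a GNU libc extension */
#endif
#include <stdlib.h>

/* Safe default search path, as recommended above. */
static const char default_search_path[] = "/bin:/usr/bin";

/* Hypothetical helper: returns the search path used to resolve a
   relative program name.  secure_getenv returns NULL if PATH is unset
   or if the process environment is not trusted (for example, after a
   SUID/SGID transition); the safe default is used in both cases.  An
   empty PATH is treated as unset as well (a choice of this sketch). */
static const char *
program_search_path(void)
{
  const char *path = secure_getenv("PATH");
  if (path == NULL || path[0] == '\0')
    return default_search_path;
  return path;
}
----

The result is only needed when the configured program name is relative; absolute paths can be passed to `posix_spawn` or `execve` unchanged.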

If too much flexibility is provided here, it may allow
invocation of arbitrary programs without proper authorization.

[[sect-Defensive_Coding-Tasks-Processes-execve]]
=== Bypassing the Shell

Child processes should be created without involving the system
shell.

For C/C++, `system` should not be used.
The `posix_spawn` function can be used
instead, or a combination of `fork` and
`execve`. (In some cases, it may be
preferable to use `vfork` or the
Linux-specific `clone` system call instead
of `fork`.)
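
The following sketch shows the `posix_spawn` approach. The helper `run_program` is hypothetical; it passes an explicit argument vector and an explicit environment, so no shell interprets anything, and it waits for the child with `waitpid`.

[source,c]
----
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: starts the program at the absolute path PATH
   with the argument vector ARGV and the environment ENVP, then waits
   for it.  No shell is involved, so characters such as ';', '$' or '*'
   in ARGV reach the program literally.  Returns the exit status, or -1
   if the program could not be started or did not exit normally. */
static int
run_program(const char *path, char *const argv[], char *const envp[])
{
  pid_t pid;
  /* posix_spawn returns an error number instead of setting errno. */
  if (posix_spawn(&pid, path, NULL, NULL, argv, envp) != 0)
    return -1;

  int status;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR)
      return -1;
  }
  if (!WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
----

For example, calling `run_program("/bin/echo", argv, envp)` with `argv` set to `{"echo", "$(id); *", NULL}` prints the second argument verbatim. Depending on the C library, a program that cannot be executed is reported either as a `posix_spawn` failure or as exit status 127 from the child.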

In Python, the `subprocess` module bypasses
the shell by default (when the `shell`
keyword argument is not set to true).
`os.system` should not be used.

The Java class `java.lang.ProcessBuilder` can be
used to create subprocesses without interference from the
system shell.

.Portability notice
[IMPORTANT]
====

On Windows, there is no argument vector, only a single
argument string. Each application is responsible for parsing
this string into an argument vector. There is considerable
variance among the quoting styles recognized by applications.
Some of them expand shell wildcards, others do not. Extensive
application-specific testing is required to make this secure.

====

Note that some common applications (notably
[application]*ssh*) unconditionally introduce the
use of a shell, even if invoked directly without a shell. It is
difficult to use these applications in a secure manner. In this
case, untrusted data should be supplied by other means. For
example, standard input could be used, instead of the command
line.

[[sect-Defensive_Coding-Tasks-Processes-environ]]
=== Specifying the Process Environment

Child processes should be created with a minimal set of
environment variables. This is absolutely essential if there
is a trust transition involved, either when the parent process
was created, or during the creation of the child process.

In C/C++, the environment should be constructed as an array of
strings and passed as the `envp` argument to
`posix_spawn` or `execve`.
The functions `setenv`,
`unsetenv` and `putenv`
should not be used. They are not thread-safe and suffer from
memory leaks.

Python programs need to specify a `dict` for
the `env` argument of the
`subprocess.Popen` constructor.
The Java class `java.lang.ProcessBuilder`
provides an `environment()` method,
which returns a map that can be manipulated.

The following list provides guidelines for selecting the set
of environment variables passed to the child process.

* `PATH` should be initialized to
`/bin:/usr/bin`.

* `USER` and `HOME` can be inherited
from the parent process environment, or they can be
initialized from the `pwent` structure
for the user.

* The `DISPLAY` and `XAUTHORITY`
variables should be passed to the subprocess if it is an X
program. Note that this will typically not work across trust
boundaries because `XAUTHORITY` refers to a file
with `0600` permissions.

* The locale-related environment variables
`LANG`, `LANGUAGE`,
`LC_ADDRESS`, `LC_ALL`,
`LC_COLLATE`, `LC_CTYPE`,
`LC_IDENTIFICATION`,
`LC_MEASUREMENT`, `LC_MESSAGES`,
`LC_MONETARY`, `LC_NAME`,
`LC_NUMERIC`, `LC_PAPER`,
`LC_TELEPHONE` and `LC_TIME`
can be passed to the subprocess if present.

* The called process may need application-specific
environment variables, for example for passing passwords.
(See <<sect-Defensive_Coding-Tasks-Processes-Command_Line_Visibility>>.)

* All other environment variables should be dropped. Names
for new environment variables should not be accepted from
untrusted sources.
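
The guidelines above can be sketched in C as an allowlist filter over the parent's `environ`. This sketch rests on assumptions: the helper `build_child_env` is hypothetical, it copies only `HOME`, `USER` and the locale variables, and it does not handle `DISPLAY`/`XAUTHORITY` or application-specific variables, which would have to be appended explicitly. It does not use `setenv` or `putenv`.

[source,c]
----
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Variables copied from the parent environment if present. */
static const char *const inherited_vars[] = {
  "HOME", "USER", "LANG", "LANGUAGE", "LC_ADDRESS", "LC_ALL",
  "LC_COLLATE", "LC_CTYPE", "LC_IDENTIFICATION", "LC_MEASUREMENT",
  "LC_MESSAGES", "LC_MONETARY", "LC_NAME", "LC_NUMERIC", "LC_PAPER",
  "LC_TELEPHONE", "LC_TIME",
};

/* Length of the NAME part of a "NAME=value" entry, or 0 if malformed. */
static size_t
env_name_len(const char *entry)
{
  const char *eq = strchr(entry, '=');
  return eq == NULL ? 0 : (size_t)(eq - entry);
}

static int
is_inherited(const char *entry, size_t len)
{
  for (size_t i = 0; i < sizeof(inherited_vars) / sizeof(inherited_vars[0]); ++i)
    if (strlen(inherited_vars[i]) == len
        && memcmp(inherited_vars[i], entry, len) == 0)
      return 1;
  return 0;
}

/* Hypothetical helper: returns a NULL-terminated envp array for
   posix_spawn or execve, or NULL if allocation fails.  PATH is fixed;
   all variables not on the list are dropped.  Only the first entry of
   each name is kept, matching what getenv returns in the parent.  The
   strings belong to the parent's environ; release the array with free(). */
static char **
build_child_env(void)
{
  size_t n = 0;
  while (environ != NULL && environ[n] != NULL)
    ++n;
  /* Fixed PATH entry + at most n inherited entries + terminator. */
  char **envp = calloc(n + 2, sizeof(*envp));
  if (envp == NULL)
    return NULL;
  size_t out = 0;
  envp[out++] = (char *)"PATH=/bin:/usr/bin";
  for (size_t i = 0; i < n; ++i) {
    size_t len = env_name_len(environ[i]);
    if (len == 0 || !is_inherited(environ[i], len))
      continue;
    int seen = 0;
    for (size_t j = 1; j < out; ++j)
      if (env_name_len(envp[j]) == len
          && memcmp(envp[j], environ[i], len) == 0)
        seen = 1;
    if (!seen)
      envp[out++] = environ[i];
  }
  envp[out] = NULL;
  return envp;
}
----

The array should be passed to `posix_spawn` or `execve` before the parent environment is modified, because its entries point into `environ`.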

=== Robust Argument List Processing

When invoking a program, it is sometimes necessary to include
data from untrusted sources. Such data should be checked
for embedded `NUL` characters because the
system APIs silently truncate argument strings at the first
`NUL` character.

The following recommendations assume that the program being
invoked uses GNU-style option processing using
`getopt_long`. This convention is widely
used, but it is just that, and individual programs might
interpret a command line in a different way.

If the untrusted data has to go into an option, use the
`--option-name=VALUE` syntax, placing the
option and its value into the same command line argument.
This avoids any potential confusion if the data starts with
`-`.

For positional arguments, terminate the option list with a
single [option]`--` marker after the last option, and
include the data at the right position. The
[option]`--` marker terminates option processing, and
the data will not be treated as an option even if it starts
with a dash.
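
A sketch of the option-argument rule, assuming the untrusted data arrives with an explicit length (for example, from a network buffer), so that embedded `NUL` bytes can be detected with `memchr` before they cause silent truncation. The helper `make_option_arg` is hypothetical.

[source,c]
----
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a single "--NAME=VALUE" argument from
   untrusted DATA of length DATA_LEN.  Returns NULL if DATA contains an
   embedded NUL byte or if memory allocation fails.  The caller
   releases the result with free(). */
static char *
make_option_arg(const char *name, const char *data, size_t data_len)
{
  if (memchr(data, '\0', data_len) != NULL)
    return NULL;
  size_t name_len = strlen(name);
  /* "--" + NAME + "=" + DATA + terminating NUL, checked for overflow. */
  if (data_len > SIZE_MAX - name_len - 4)
    return NULL;
  char *arg = malloc(name_len + data_len + 4);
  if (arg == NULL)
    return NULL;
  memcpy(arg, "--", 2);
  memcpy(arg + 2, name, name_len);
  arg[2 + name_len] = '=';
  memcpy(arg + 3 + name_len, data, data_len);
  arg[3 + name_len + data_len] = '\0';
  return arg;
}
----

For GNU [application]*sort*, the result could be combined with the [option]`--` marker as `{"sort", out_arg, "--", untrusted_file, NULL}`, where `out_arg` comes from `make_option_arg("output", data, len)` and `untrusted_file` has been checked for `NUL` bytes in the same way.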

[[sect-Defensive_Coding-Tasks-Processes-Command_Line_Visibility]]
=== Passing Secrets to Subprocesses

The command line (the name of the program and its arguments) of
a running process is traditionally available to all local
users. The called program can overwrite this information, but
only after it has started running, and other processes may
already have read the information by then. However,
on Linux, the process environment is restricted to the user
who runs the process. Therefore, if you need a convenient way
to pass a password to a child process, use an environment
variable, and not a command line argument. (See <<sect-Defensive_Coding-Tasks-Processes-environ>>.)

.Portability notice
[IMPORTANT]
====

On some UNIX-like systems (notably Solaris), environment
variables can be read by any system user, just like command
lines.

====

If the environment-based approach cannot be used due to
portability concerns, the data can be passed on standard
input. Some programs (notably [application]*gpg*)
use special file descriptors whose numbers are specified on
the command line. Temporary files are an option as well, but
they might give digital forensics access to sensitive data
(such as passphrases) because it is difficult to safely delete
them in all cases.
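
The standard-input approach can be sketched with a pipe whose read end becomes descriptor 0 in the child. The helper `run_with_secret_on_stdin` is hypothetical, and the sketch assumes the caller has set `SIGPIPE` to `SIG_IGN`; otherwise, writing to a child that exited without reading its input would terminate the parent.

[source,c]
----
#include <errno.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts PATH with ARGV and ENVP, writes SECRET
   to the child's standard input through a pipe, and waits for the
   child.  The secret never appears on the command line.  Returns the
   exit status, or -1 on error.  Assumes SIGPIPE is ignored. */
static int
run_with_secret_on_stdin(const char *path, char *const argv[],
                         char *const envp[], const char *secret)
{
  int fds[2];
  if (pipe(fds) < 0)
    return -1;

  /* In the child, the read end becomes descriptor 0 and both original
     pipe descriptors are closed before the program starts. */
  pid_t pid = -1;
  posix_spawn_file_actions_t fa;
  int err = posix_spawn_file_actions_init(&fa);
  if (err == 0) {
    err = posix_spawn_file_actions_adddup2(&fa, fds[0], STDIN_FILENO);
    if (err == 0 && fds[0] != STDIN_FILENO)
      err = posix_spawn_file_actions_addclose(&fa, fds[0]);
    if (err == 0)
      err = posix_spawn_file_actions_addclose(&fa, fds[1]);
    if (err == 0)
      err = posix_spawn(&pid, path, &fa, NULL, argv, envp);
    posix_spawn_file_actions_destroy(&fa);
  }
  close(fds[0]);               /* the parent only writes */
  if (err != 0) {
    close(fds[1]);
    return -1;
  }

  const char *p = secret;
  size_t left = strlen(secret);
  int write_ok = 1;
  while (left > 0) {
    ssize_t n = write(fds[1], p, left);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      write_ok = 0;            /* for example EPIPE: the child did not read */
      break;
    }
    p += n;
    left -= (size_t)n;
  }
  close(fds[1]);               /* end-of-file for the child */

  int status;
  while (waitpid(pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  if (!write_ok || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
----

Descriptors created in a multi-threaded parent can still leak into processes started concurrently by other threads; see xref:../tasks/Tasks-Descriptors.adoc#sect-Defensive_Coding-Tasks-Descriptors-Child_Processes[Preventing File Descriptor Leaks to Child Processes].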

== Handling Child Process Termination

When a child process terminates, the parent process is signalled.
A stub of the terminated process (a
*zombie*, shown as
`<defunct>` by
[application]*ps*) is kept around until the status
information is collected (*reaped*) by the
parent process. Over the years, several interfaces for this
have been invented:

* The parent process calls `wait`,
`waitpid`, `waitid`,
`wait3` or `wait4`,
without specifying a process ID. This returns the status
of any matching child process. This approach is typically used from
within event loops.

* The parent process calls `waitpid`,
`waitid`, or `wait4`,
with a specific process ID. Only data for the specific
process ID is returned. This is typically used in code
which spawns a single subprocess in a synchronous manner.

* The parent process installs a handler for the
`SIGCHLD` signal, using
`sigaction`, and specifies the
`SA_NOCLDWAIT` flag.
This approach could be used by event loops as well.

None of these approaches can be used to wait for child process
termination in a completely thread-safe manner. The parent
process might execute an event loop in another thread, which
could pick up the termination signal. This means that libraries
typically cannot make free use of child processes (for example,
to run problematic code with reduced privileges in a separate
address space).

At the moment, the parent process should explicitly wait for
termination of the child process using
`waitpid` or `waitid`,
and hope that the status is not collected by an event loop
first.
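
A sketch of waiting for one specific child with `waitpid`, retrying when a signal handler interrupts the call. The helper `wait_for_child` and its `128 + signal` convention are assumptions of this sketch; an `ECHILD` error is what the caller sees when another part of the program (such as an event loop) has already collected the status.

[source,c]
----
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h> /* fork and _exit, used together with this helper */

/* Hypothetical helper: waits for the child process PID, and only that
   child, to terminate.  Returns its exit status if it exited normally,
   128 + N if it was killed by signal N (a convention borrowed from
   shells), or -1 on error.  errno == ECHILD means the status was
   already collected elsewhere. */
static int
wait_for_child(pid_t pid)
{
  int status;
  for (;;) {
    pid_t r = waitpid(pid, &status, 0);
    if (r == pid)
      break;
    if (r < 0 && errno == EINTR)
      continue;                /* interrupted by a signal handler, retry */
    return -1;
  }
  if (WIFEXITED(status))
    return WEXITSTATUS(status);
  if (WIFSIGNALED(status))
    return 128 + WTERMSIG(status);
  return -1;
}
----

In typical use, the parent calls `fork`, the child calls `execve` (or `_exit(127)` if that fails), and the parent then calls `wait_for_child` with the returned process ID.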

== `SUID`pass:attributes[{blank}]/pass:attributes[{blank}]`SGID` processes

Programs can be marked in the file system to indicate to the
kernel that a trust transition should happen if the program is
run. The `SUID` file permission bit indicates
that an executable should run with the effective user ID equal
to the owner of the executable file. Similarly, with the
`SGID` bit, the effective group ID is set to
the group of the executable file.

Linux supports *fscaps*, which can grant
additional capabilities to a process in a finer-grained manner.
Additional mechanisms can be provided by loadable security
modules.

When such a trust transition has happened, the process runs in a
potentially hostile environment. Additional care is necessary
not to rely on any untrusted information. These concerns also
apply to libraries which can be linked into such processes.

[[sect-Defensive_Coding-Tasks-secure_getenv]]
=== Accessing Environment Variables

The following steps are required so that a program does not
accidentally pick up untrusted data from environment
variables.

* Compile your C/C++ sources with `-D_GNU_SOURCE`.
The Autoconf macro `AC_GNU_SOURCE` ensures this.

* Check for the presence of the `secure_getenv`
and `__secure_getenv` functions. The Autoconf
directive `AC_CHECK_FUNCS([__secure_getenv secure_getenv])`
performs these checks.

* Arrange for a proper definition of the
`secure_getenv` function. See <<ex-Defensive_Coding-Tasks-secure_getenv>>.

* Use `secure_getenv` instead of
`getenv` to obtain the value of critical
environment variables. `secure_getenv`
will pretend the variable has not been set if the process
environment is not trusted.

Critical environment variables are debugging flags,
configuration file locations, plug-in and log file locations,
and anything else that might be used to bypass security
restrictions or cause a privileged process to behave in an
unexpected way.

Either the `secure_getenv` function or the
`__secure_getenv` function is available from GNU libc,
depending on its version.

[[ex-Defensive_Coding-Tasks-secure_getenv]]
.Obtaining a definition for `secure_getenv`
====

[source,c]
----
#include <stdlib.h>

#ifndef HAVE_SECURE_GETENV
#  ifdef HAVE__SECURE_GETENV
#    define secure_getenv __secure_getenv
#  else
#    error neither secure_getenv nor __secure_getenv are available
#  endif
#endif
----

====

[[sect-Defensive_Coding-Tasks-Processes-Daemons]]
== Daemons

Background processes providing system services
(*daemons*) need to decouple themselves from
the controlling terminal and the parent process environment:

* Fork.

* In the child process, call `setsid`. The
parent process can simply exit (using
`_exit`, to avoid running clean-up
actions twice).

* In the child process, fork again. Processing continues in
the child process. Again, the parent process should just
exit.

* Replace the descriptors 0, 1, 2 with a descriptor for
`/dev/null`. Logging should be
redirected to [application]*syslog*.
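
The steps above can be sketched as follows. The helper `daemonize` is hypothetical; it returns 0 only in the final daemon process, and it does not cover the switch to [application]*syslog* or the other adjustments discussed in this section (umask, environment variables, signal dispositions).

[source,c]
----
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper following the steps above.  Returns 0 in the
   final daemon process and -1 on failure (in whichever process
   detected it).  Intermediate parents leave with _exit, so atexit
   handlers and stdio buffers are not run or flushed twice. */
static int
daemonize(void)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid > 0)
    _exit(0);                  /* original parent */

  if (setsid() < 0)            /* new session without a controlling terminal */
    return -1;

  pid = fork();
  if (pid < 0)
    return -1;
  if (pid > 0)
    _exit(0);                  /* the session leader exits */

  /* Processing continues in the grandchild, which is not a session
     leader and so cannot acquire a controlling terminal by opening one.
     Replace descriptors 0, 1 and 2 with /dev/null. */
  int fd = open("/dev/null", O_RDWR);
  if (fd < 0)
    return -1;
  if (dup2(fd, STDIN_FILENO) < 0 || dup2(fd, STDOUT_FILENO) < 0
      || dup2(fd, STDERR_FILENO) < 0)
    return -1;
  if (fd > STDERR_FILENO)
    close(fd);
  return 0;
}
----

Because `daemonize` terminates the original process, a server should call it only when forking has not been disabled on the command line.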

Older instructions for creating daemon processes recommended a
call to `umask(0)`. This is risky because it
often leads to world-writable files and directories, resulting
in security vulnerabilities such as arbitrary process
termination by untrusted local users, or log file truncation.
If the *umask* needs setting, a restrictive
value such as `027` or `077`
is recommended.

Other aspects of the process environment may have to be changed as
well (environment variables, signal handler disposition).

It is increasingly common that server processes do not run as
background processes, but as regular foreground processes under a
supervising master process (such as
[application]*systemd*). Server processes should
offer a command line option which disables forking and
replacement of the standard output and standard error streams.
Such an option is also useful for debugging.

== Semantics of Command-line Arguments

After process creation and option processing, it is up to the
child process to interpret the arguments. Arguments can be
file names, host names, URLs, and many other things. URLs
can refer to the local network, some server on the Internet,
or to the local file system. Some applications even accept
arbitrary code in arguments (for example,
[application]*python* with the
[option]`-c` option).

Similar concerns apply to environment variables, the contents
of the current directory and its subdirectories.

Consequently, careful analysis is required to determine whether it is safe to
pass untrusted data to another program.

[[sect-Defensive_Coding-Tasks-Processes-Fork-Parallel]]
== `fork` as a Primitive for Parallelism

A call to `fork` which is not immediately
followed by a call to `execve` (perhaps after
rearranging and closing file descriptors) is typically unsafe,
especially from a library which does not control the state of
the entire process. Such use of `fork`
should be replaced with proper child processes or threads.