:experimental:

[[chap-Defensive_Coding-Shell]]
= Shell Programming and [application]*bash*
include::partial$entities.adoc[]

This chapter contains advice about shell programming, specifically
in [application]*bash*. Most of the advice will apply
to scripts written for other shells because extensions such as
integer or array variables have been implemented there as well, with
comparable syntax.

[[sect-Defensive_Coding-Shell-Alternatives]]
== Consider Alternatives

Once a shell script is so complex that advice in this chapter
applies, it is time to step back and consider the question: Is
there a more suitable implementation language available?

For example, Python with its `subprocess` module
can be used to write scripts which are almost as concise as shell
scripts when it comes to invoking external programs, and Python
offers richer data structures, with less arcane syntax and more
consistent behavior.

[[sect-Defensive_Coding-Shell-Language]]
== Shell Language Features

The following sections cover subtleties concerning the shell
programming language. They have been written with the
[application]*bash* shell in mind, but some of these
features apply to other shells as well.

Some of the features described may seem like implementation defects,
but these features have been replicated across multiple independent
implementations, so they now have to be considered part of the shell
programming language.

[[sect-Defensive_Coding-Shell-Parameter_Expansion]]
=== Parameter Expansion

The mechanism by which named shell variables and parameters are
expanded is called *parameter expansion*. The
most basic syntax is
“pass:attributes[{blank}]`$`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]” or
“pass:attributes[{blank}]`${`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]pass:attributes[{blank}]`}`pass:attributes[{blank}]”.

In almost all cases, a parameter expansion should be enclosed in
double quotation marks `"`pass:attributes[{blank}]…pass:attributes[{blank}]`"`.

[source,bash]
----

external-program "$arg1" "$arg2"
  
----

If the double quotation marks are omitted, the value of the
variable will be split according to the current value of the
`IFS` variable, and the resulting words are additionally subject
to pathname expansion (globbing). This may allow the injection of
additional options which are then processed by
`external-program`.
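
The effect of omitting the quotation marks can be observed with a
small helper that reports how many arguments it received. This is
only an illustrative sketch: the `count_args` function and the sample
value are hypothetical, and the default value of `IFS` is assumed.

[source,bash]
----
# Hypothetical helper: prints the number of arguments it received.
count_args () {
  printf '%s\n' "$#"
}

# Sample untrusted value containing spaces and a leading option.
value='-rf important file'

# Unquoted: the value is split at IFS characters into three words.
unquoted_count="$(count_args $value)"

# Quoted: the value is passed as exactly one argument.
quoted_count="$(count_args "$value")"

printf 'unquoted: %s, quoted: %s\n' "$unquoted_count" "$quoted_count"
----

In the unquoted case, the first word (`-rf`) would reach an external
program as a separate argument that looks like an option.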

Parameter expansion can use special syntax for specific features,
such as substituting defaults or performing string or array
operations. These constructs should not be used because they can
trigger arithmetic evaluation, which can result in code execution.
See <<sect-Defensive_Coding-Shell-Arithmetic>>.

[[sect-Defensive_Coding-Shell-Double_Expansion]]
=== Double Expansion

*Double expansion* occurs when, during the
expansion of a shell variable, not just the variable is expanded,
replacing it by its value, but the *value* of
the variable is itself expanded as well. This can trigger
arbitrary code execution, unless the value of the variable is
verified against a restrictive pattern.

The evaluation process is in fact recursive, so a self-referential
expression can cause an out-of-memory condition and a shell crash.

Double expansion may seem like a defect, but it is implemented
by many shells, and has to be considered an integral part of the
shell programming language. However, it does make writing robust
shell scripts difficult.

Double expansion can be requested explicitly with the
`eval` built-in command, or by invoking a
subshell with “pass:attributes[{blank}]`bash -c`pass:attributes[{blank}]”. These constructs
should not be used.
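
The difference between single and double expansion can be made
visible with a deliberately harmless payload. The following sketch
uses hypothetical variable names, and the `eval` line appears only
as a counterexample of what to avoid.

[source,bash]
----
# Sample untrusted value; the embedded command is deliberately harmless.
input='$(echo injected)'

# Ordinary assignment: the value is copied verbatim, not expanded again.
single="$input"

# eval (counterexample): the shell expands "$input" once to build the
# command line, then parses and expands the result a second time, which
# runs the embedded command substitution.
eval "double=$input"

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
----

After this runs, `single` still holds the literal text of the input,
while `double` holds the output of the command embedded in it.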

The following sections give examples of places where implicit
double expansion occurs.

[[sect-Defensive_Coding-Shell-Arithmetic]]
==== Arithmetic Evaluation

*Arithmetic evaluation* is a process by which
the shell computes the integer value of an expression specified
as a string. It is highly problematic for two reasons: It
triggers double expansion (see <<sect-Defensive_Coding-Shell-Double_Expansion>>), and the
language of arithmetic expressions is not self-contained. Some
constructs in arithmetic expressions (notably array subscripts)
provide a trapdoor from the restricted language of arithmetic
expressions to the full shell language, thus paving the way
towards arbitrary code execution. Due to double expansion,
input which is (indirectly) referenced from an arithmetic
expression can trigger execution of arbitrary code, which is
potentially harmful.
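
The trapdoor through array subscripts can be demonstrated with a
harmless payload that merely appends a line to a scratch file. This
is a sketch under the assumption of a current [application]*bash*
version; the variable names and the payload are hypothetical.

[source,bash]
----
# Scratch file that records whether the embedded command ran.
log="$(mktemp)"

# Sample untrusted value: syntactically an array element reference whose
# subscript contains a (harmless) command substitution.
untrusted='a[$(echo "payload ran" >> "$log"; echo 0)]'

# The script author only intended to add 1 to a number. Evaluating the
# variable as an arithmetic expression expands the subscript and thereby
# runs the embedded command.
result=$(( untrusted + 1 ))

payload_trace="$(cat -- "$log")"
rm -f -- "$log"

printf 'result: %s\n' "$result"
printf 'trace: %s\n' "$payload_trace"
----

The arithmetic itself completes without any error message, so the
script gives no indication that the embedded command was executed.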

Arithmetic evaluation is triggered by the following constructs:

* The *expression* in
“pass:attributes[{blank}]`$((`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`))`pass:attributes[{blank}]”
is evaluated. This construct is called *arithmetic
expansion*.

* {blank}
+
“pass:attributes[{blank}]`$[`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`]`pass:attributes[{blank}]”
is a deprecated syntax with the same effect.

* The arguments to the `let` shell built-in
are evaluated.

* {blank}
+
“pass:attributes[{blank}]`((`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`))`pass:attributes[{blank}]”
is an alternative syntax for “pass:attributes[{blank}]`let` *expression*pass:attributes[{blank}]”.

* Conditional expressions surrounded by
“pass:attributes[{blank}]`[[`pass:attributes[{blank}]…pass:attributes[{blank}]`]]`pass:attributes[{blank}]” can trigger
arithmetic evaluation if certain operators such as
`-eq` are used. (The
`test` built-in does not perform arithmetic
evaluation, even with integer operators such as
`-eq`.)
+
The conditional expression
“pass:attributes[{blank}]`[[ $`pass:attributes[{blank}]pass:attributes[{blank}]*variable* `=~` *regexp* `]]`pass:attributes[{blank}]”
can be used for input validation, assuming that
*regexp* is a constant regular
expression.
See <<sect-Defensive_Coding-Shell-Input_Validation>>.

* Certain parameter expansions, for example
“pass:attributes[{blank}]`${`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]pass:attributes[{blank}]`[`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`]}`pass:attributes[{blank}]”
(array indexing) or
“pass:attributes[{blank}]`${`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]pass:attributes[{blank}]`:`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`}`pass:attributes[{blank}]”
(string slicing), trigger arithmetic evaluation of
*expression*.

* Assignment to array elements using
“pass:attributes[{blank}]*array_variable*pass:attributes[{blank}]pass:attributes[{blank}]`[`pass:attributes[{blank}]pass:attributes[{blank}]*subscript*pass:attributes[{blank}]pass:attributes[{blank}]`]=`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]”
triggers evaluation of *subscript*, but
not *expression*.

* The expressions in the arithmetic `for`
command,
“pass:attributes[{blank}]`for ((`pass:attributes[{blank}]pass:attributes[{blank}]*expression1*pass:attributes[{blank}]pass:attributes[{blank}]`;` *expression2*pass:attributes[{blank}]pass:attributes[{blank}]`;` *expression3*pass:attributes[{blank}]pass:attributes[{blank}]`)); do` *commands*pass:attributes[{blank}]pass:attributes[{blank}]`; done`pass:attributes[{blank}]”
are evaluated. This does not apply to the regular
`for` command,
“pass:attributes[{blank}]`for` *variable* `in` *list*pass:attributes[{blank}]pass:attributes[{blank}]`; do` *commands*pass:attributes[{blank}]pass:attributes[{blank}]`; done`pass:attributes[{blank}]”.
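
The difference between the `test` built-in and the
`[[` conditional command noted in the list above can be checked
with the following sketch. It assumes [application]*bash*; other
shells may treat `test` differently. The payload and all variable
names are hypothetical.

[source,bash]
----
log="$(mktemp)"
untrusted='a[$(echo "payload ran" >> "$log"; echo 0)]'

# test compares strictly as integers: the value is rejected with an
# error (silenced here) and nothing is executed.
if test "$untrusted" -eq 0 2>/dev/null; then
  test_result=accepted
else
  test_result=rejected
fi
trace_after_test="$(cat -- "$log")"

# [[ ... -eq ... ]] evaluates both operands as arithmetic expressions,
# so the command embedded in the subscript runs.
if [[ "$untrusted" -eq 0 ]]; then
  cond_result=accepted
else
  cond_result=rejected
fi
trace_after_cond="$(cat -- "$log")"

rm -f -- "$log"
printf 'test: %s, [[: %s\n' "$test_result" "$cond_result"
----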

[IMPORTANT]
====

Depending on the [application]*bash* version, the
above list may be incomplete.

If faced with a situation where using such shell features
appears necessary, see <<sect-Defensive_Coding-Shell-Alternatives>>.

====

If it is impossible to avoid shell arithmetic on untrusted
inputs, refer to <<sect-Defensive_Coding-Shell-Input_Validation>>.

[[sect-Defensive_Coding-Shell-Types]]
==== Type declarations

[application]*bash* supports explicit type
declarations for shell variables:

[source,bash]
----
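
# Attributes: -i integer, -a indexed array, -A associative array.
declare -i integer_variable
declare -a array_variable
declare -A assoc_array_variable

# typeset is a synonym for declare.
typeset -i integer_variable
typeset -a array_variable
typeset -A assoc_array_variable

# local is only valid inside a function.
local -i integer_variable
local -a array_variable
local -A assoc_array_variable

# readonly additionally marks the variable as read-only.
readonly -i integer_variable
readonly -a array_variable
readonly -A assoc_array_variable
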
----

Variables can also be declared as arrays by assigning them an
array expression, as in:

[source,bash]
----

array_variable=(1 2 3 4)
    
----

Some built-ins (such as `mapfile`) can
implicitly create array variables.

Such type declarations should not be used because assignment to
such variables (independent of the concrete syntax used for the
assignment) triggers arithmetic expansion (and thus double
expansion) of the right-hand side of the assignment operation.
See <<sect-Defensive_Coding-Shell-Arithmetic>>.
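
For the integer attribute, the effect can be seen without any
malicious input. In the following sketch (variable names are
hypothetical), the same right-hand side is stored verbatim in an
ordinary variable but evaluated as an expression once the target
variable has been declared with `-i`.

[source,bash]
----
base=41

# Without a type declaration, the right-hand side is stored as a string.
plain='base + 1'

# With the integer attribute, the same right-hand side is evaluated as an
# arithmetic expression, which in turn expands the variable "base".
declare -i number
number='base + 1'

printf 'plain: %s\n' "$plain"
printf 'number: %s\n' "$number"
----

Here `plain` contains the text `base + 1`, whereas `number` contains
`42`. If the assigned string came from an untrusted source, the
evaluation would be under the control of that source.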

Shell scripts which use integer or array variables should be
rewritten in another, more suitable language. See <<sect-Defensive_Coding-Shell-Alternatives>>.

[[sect-Defensive_Coding-Shell-Obscure]]
=== Other Obscurities

Obscure shell language features should not be used. Examples are:

* Exported functions (`export -f` or
`declare -fx`).

* Function names which are not valid variable names, such as
“pass:attributes[{blank}]`module::function`pass:attributes[{blank}]”.

* The possibility to override built-ins or external commands
with shell functions.

* Changing the value of the `IFS` variable to
tokenize strings.

[[sect-Defensive_Coding-Shell-Invoke]]
== Invoking External Commands

When passing shell variables as single command line arguments,
they should always be surrounded by double quotes. See
<<sect-Defensive_Coding-Shell-Parameter_Expansion>>.

Care is required when passing untrusted values as positional
parameters to external commands. If the value starts with a hyphen
“pass:attributes[{blank}]`-`pass:attributes[{blank}]”, it may be interpreted by the external
command as an option. Depending on the external program, a
“pass:attributes[{blank}]`--`pass:attributes[{blank}]” argument stops option processing and treats
all following arguments as positional parameters. (Double quotes
are completely invisible to the command being invoked, so they do
not prevent variable values from being interpreted as options.)
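
As an illustration, the following sketch passes a value that looks
like an option to `grep`, which is one of the programs that honor
“pass:attributes[{blank}]`--`pass:attributes[{blank}]”. The file contents and variable names
are hypothetical.

[source,bash]
----
# Scratch file with two lines; only the first contains the text "-v".
datafile="$(mktemp)"
printf '%s\n' 'option -v given' 'no option' > "$datafile"

# Sample untrusted value that looks like an option.
pattern='-v'

# Without "--", grep would treat "-v" as its invert-match option and take
# the file name as the pattern. With "--", "-v" is the pattern itself.
matches="$(grep -c -- "$pattern" "$datafile")"

rm -f -- "$datafile"
printf 'matching lines: %s\n' "$matches"
----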

Cleaning the environment before invoking child processes is
difficult to implement in a script. [application]*bash*
keeps a hidden list of environment variables which do not correspond
to shell variables, and unsetting them from within a
[application]*bash* script is not possible. To reset
the environment, a script can re-run itself under the “pass:attributes[{blank}]`env
-i`pass:attributes[{blank}]” command with an additional parameter which indicates
the environment has been cleared and suppresses a further
self-execution. Alternatively, individual commands can be executed
with “pass:attributes[{blank}]`env -i`pass:attributes[{blank}]”.
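
Running an individual command under `env -i` could look like the
following sketch. The variable names are hypothetical, and the
explicit `PATH` value is an assumption that has to be adapted to the
target system. The child process here is `env` itself, which prints
the environment it received.

[source,bash]
----
# Sample variable that should not leak into the child process.
export UNTRUSTED_SETTING='inherited'

# Run a single command with an empty environment, apart from an
# explicitly chosen PATH.
child_environment="$(env -i PATH=/usr/bin:/bin env)"

printf '%s\n' "$child_environment"
----

The output should contain the `PATH` setting, but not
`UNTRUSTED_SETTING`.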

[IMPORTANT]
====

Complete isolation from its original execution environment
(which is required when the script is executed after a trust
transition, e.g., triggered by the SUID mechanism) is impossible
to achieve from within the shell script itself. Instead, the
invoking process has to clear the process environment (except for
a few trusted variables) before running the shell script.

====

Checking for failures in executed external commands is recommended.
If no elaborate error recovery is needed, invoking “pass:attributes[{blank}]`set
-e`pass:attributes[{blank}]” may be sufficient. This causes the script to stop on
the first failed command. However, failures in pipes
(“pass:attributes[{blank}]`command1 | command2`pass:attributes[{blank}]”) are only detected for the
last command in the pipe; errors in previous commands are ignored.
This can be changed by invoking “pass:attributes[{blank}]`set -o pipefail`pass:attributes[{blank}]”.
Due to architectural limitations, only the process that spawned
the entire pipe can check for failures in individual commands;
it is not possible for a process to tell if the process feeding
data (or the process consuming data) exited normally or with 
an error.
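
The effect of the `pipefail` option can be observed with the
`false` and `true` commands, as in the following sketch. The
variable names are hypothetical, and the option is switched off
again afterwards only to keep the demonstration self-contained.

[source,bash]
----
# Start from the default setting.
set +o pipefail

# Default behavior: the exit status of a pipe is that of its last
# command, so the failure of "false" goes unnoticed.
if false | true; then default_result=success; else default_result=failure; fi

# With pipefail, the pipe fails if any of its commands fails.
set -o pipefail
if false | true; then pipefail_result=success; else pipefail_result=failure; fi
set +o pipefail

printf 'default: %s, pipefail: %s\n' "$default_result" "$pipefail_result"
----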

See <<sect-Defensive_Coding-Tasks-Processes-Creation>>
for additional details on creating child processes.

[[sect-Defensive_Coding-Shell-Temporary_Files]]
== Temporary Files

Temporary files should be created with the
`mktemp` command, and temporary directories with
“pass:attributes[{blank}]`mktemp -d`pass:attributes[{blank}]”.

To clean up temporary files and directories, write a clean-up
shell function and register it as a trap handler, as shown in
<<ex-Defensive_Coding-Tasks-Temporary_Files>>.
Using a separate function avoids issues with proper quoting of
variables.

[[ex-Defensive_Coding-Tasks-Temporary_Files]]
.Creating and Cleaning up Temporary Files
====

[source,bash]
----

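# mktemp creates the file and prints its unique path.
tmpfile="$(mktemp)"

# "--" keeps rm from interpreting the file name as an option.
cleanup () {
  rm -f -- "$tmpfile"
}

# Signal number 0 denotes shell exit (equivalent to "trap cleanup EXIT").
trap cleanup 0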
 
----

====
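
The same approach can be applied to temporary directories. The
following is a sketch only: `work_in_tmpdir` is a hypothetical
function whose body runs in a subshell, so that the trap handler
fires as soon as the function finishes rather than when the
entire script exits.

[source,bash]
----
# Hypothetical unit of work. The parentheses run the body in a subshell.
work_in_tmpdir () (
  tmpdir="$(mktemp -d)" || exit 1

  cleanup () {
    rm -rf -- "$tmpdir"
  }
  trap cleanup 0

  # Scratch files live inside the private directory.
  printf 'intermediate data\n' > "$tmpdir/scratch"
  printf '%s\n' "$tmpdir"
)

used_dir="$(work_in_tmpdir)"

if [ -e "$used_dir" ]; then
  printf 'still present: %s\n' "$used_dir"
else
  printf 'removed: %s\n' "$used_dir"
fi
----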

[[sect-Defensive_Coding-Shell-Input_Validation]]
== Performing Input Validation

In some cases, input validation cannot be avoided. For example,
if arithmetic evaluation is absolutely required, it is imperative
to check that input values are, in fact, integers. See <<sect-Defensive_Coding-Shell-Arithmetic>>.

<<ex-Defensive_Coding-Shell-Input_Validation>>
shows a construct which can be used to check if a string
“pass:attributes[{blank}]`$value`pass:attributes[{blank}]” is an integer. This construct is
specific to [application]*bash* and not portable to
POSIX shells.

[[ex-Defensive_Coding-Shell-Input_Validation]]
.Input validation in [application]*bash*
====

[source,bash]
----
include::example$Shell-Input_Validation.adoc[]
		
----

====

Using `case` statements for input validation is
also possible and supported by other (POSIX) shells, but the
pattern language is more restrictive, and it can be difficult to
write suitable patterns.
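
As an illustration of the `case` approach, the following sketch
accepts only non-empty strings consisting entirely of decimal
digits. The `is_unsigned_integer` function and the sample values are
hypothetical; negative numbers are rejected by design.

[source,bash]
----
# Hypothetical helper: succeeds only if $1 is a non-empty string of digits.
is_unsigned_integer () {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;  # empty, or contains a non-digit
    *) return 0 ;;
  esac
}

for candidate in 42 007 -5 '4 2' 'a[$(id)]' ''; do
  if is_unsigned_integer "$candidate"; then
    printf 'accepted: "%s"\n' "$candidate"
  else
    printf 'rejected: "%s"\n' "$candidate"
  fi
done
----

Note that a value with leading zeros, such as `007`, passes this
check, but [application]*bash* arithmetic would interpret it as an
octal number.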

The `expr` external command can give misleading
results (e.g., if the value being checked contains operators
itself) and should not be used.

[[sect-Defensive_Coding-Shell-Edit_Guard]]
== Guarding Shell Scripts Against Changes

[application]*bash* only reads a shell script up to
the point needed for executing the next command. This means
that if the script is overwritten while it is running, execution can
jump to a random part of the script, depending on what is modified
in the script and how the file offsets change as a result. (This
behavior is needed to support self-extracting shell archives whose
script part is followed by a stream of bytes which does not follow
the shell language syntax.)

Therefore, long-running scripts should be guarded against
concurrent modification by putting as much of the program logic
into a `main` function, and invoking the
`main` function at the end of the script, using
this syntax:

[source,bash]
----

main "$@" ; exit $?
  
----

This construct ensures that [application]*bash* will
stop execution after the `main` function, instead
of opening the script file and trying to read more commands.