defensive-coding-guide/en-US/Shell.xml

455 lines
18 KiB
XML
Raw Normal View History

2014-10-10 15:36:28 +02:00
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Shell">
<title>Shell Programming and <application>bash</application></title>
<para>
This chapter contains advice about shell programming, specifically
in <application>bash</application>. Most of the advice will apply
to scripts written for other shells because extensions such as
integer or array variables have been implemented there as well, with
comparable syntax.
</para>
<section id="sect-Defensive_Coding-Shell-Alternatives">
<title>Consider alternatives</title>
<para>
Once a shell script is so complex that advice in this chapter
applies, it is time to step back and consider the question: Is
there a more suitable implementation language available?
</para>
<para>
For example, Python with its <literal>subprocess</literal> module
can be used to write scripts which are almost as concise as shell
scripts when it comes to invoking external programs, and Python
offers richer data structures, with less arcane syntax and more
consistent behavior.
</para>
</section>
<section id="sect-Defensive_Coding-Shell-Language">
<title>Shell language features</title>
<para>
The following sections cover subtleties concerning the shell
programming languages. They have been written with the
<application>bash</application> shell in mind, but some of these
features apply to other shells as well.
</para>
<para>
Some of the features described may seem like implementation defects,
but these features have been replicated across multiple independent
implementations, so they now have to be considered part of the shell
programming language.
</para>
<section id="sect-Defensive_Coding-Shell-Parameter_Expansion">
<title>Parameter expansion</title>
<para>
The mechanism by which named shell variables and parameters are
expanded is called <emphasis>parameter expansion</emphasis>. The
most basic syntax is
<literal>$</literal><emphasis>variable</emphasis>” or
<literal>${</literal><emphasis>variable</emphasis><literal>}</literal>”.
</para>
<para>
In almost all cases, a parameter expansion should be enclosed in
double quotation marks <literal>"</literal><literal>"</literal>.
</para>
<informalexample>
<programlisting language="Bash">
external-program "$arg1" "$arg2"
</programlisting>
</informalexample>
<para>
If the double quotation marks are omitted, the value of the
variable will be split according to the current value of the
<envar>IFS</envar> variable. This may allow the injection of
additional options which are then processed by
<literal>external-program</literal>.
</para>
<para>
Parameter expansion can use special syntax for specific features,
such as substituting defaults or performing string or array
operations. These constructs should not be used because they can
trigger arithmetic evaluation, which can result in code execution.
See <xref linkend="sect-Defensive_Coding-Shell-Arithmetic"/>.
</para>
</section>
<section id="sect-Defensive_Coding-Shell-Double_Expansion">
<title>Double expansion</title>
<para>
<emphasis>Double expansion</emphasis> occurs when, during the
expansion of a shell variable, not just the variable is expanded,
replacing it by its value, but the <emphasis>value</emphasis> of
the variable is itself is expanded as well. This can trigger
arbitrary code execution, unless the value of the variable is
verified against a restrictive pattern.
</para>
<para>
The evaluation process is in fact recursive, so a self-referential
expression can cause an out-of-memory condition and a shell crash.
</para>
<para>
Double expansion may seem like as a defect, but it is implemented
by many shells, and has to be considered an integral part of the
shell programming language. However, it does make writing robust
shell scripts difficult.
</para>
<para>
Double expansion can be requested explicitly with the
2014-10-10 15:36:28 +02:00
<literal>eval</literal> built-in command, or by invoking a
subshell with “<literal>bash -c</literal>”. These constructs
should not be used.
</para>
<para>
The following sections give examples of places where implicit
double expansion occurs.
</para>
<section id="sect-Defensive_Coding-Shell-Arithmetic">
<title>Arithmetic evaluation</title>
<para>
<emphasis>Arithmetic evaluation</emphasis> is a process by which
the shell computes the integer value of an expression specified
as a string. It is highly problematic for two reasons: It
triggers double expansion (see <xref
linkend="sect-Defensive_Coding-Shell-Double_Expansion"/>), and the
2014-10-10 15:36:28 +02:00
language of arithmetic expressions is not self-contained. Some
constructs in arithmetic expressions (notably array subscripts)
provide a trapdoor from the restricted language of arithmetic
expressions to the full shell language, thus paving the way
towards arbitrary code execution. Due to double expansion,
input which is (indirectly) referenced from an arithmetic
expression can trigger execution of arbitrary code, which is
potentially harmful.
</para>
<para>
Arithmetic evaluation is triggered by the follow constructs:
</para>
<!-- The list was constructed by looking at the bash sources and
search for the string "expand_". -->
<itemizedlist>
<listitem>
<para>
The <emphasis>expression</emphasis> in
<literal>$((</literal><emphasis>expression</emphasis><literal>))</literal>
is evaluated. This construct is called <emphasis>arithmetic
expansion</emphasis>.
</para>
</listitem>
<listitem>
<para>
<literal>$[</literal><emphasis>expression</emphasis><literal>]</literal>
is a deprecated syntax with the same effect.
</para>
</listitem>
<listitem>
<para>
The arguments to the <literal>let</literal> shell built-in
are evaluated.
</para>
</listitem>
<listitem>
<para>
<literal>((</literal><emphasis>expression</emphasis><literal>))</literal>
is an alternative syntax for “<literal>let
</literal><emphasis>expression</emphasis>”.
</para>
</listitem>
<listitem>
<para>
Conditional expressions surrounded by
<literal>[[</literal><literal>]]</literal>” can trigger
arithmetic evaluation if certain operators such as
<literal>-eq</literal> are used. (The
<literal>test</literal> built-in does not perform arithmetic
evaluation, even with integer operators such as
<literal>-eq</literal>.)
</para>
<para>
The conditional expression
<literal>[[ $</literal><emphasis>variable</emphasis><literal> =~ </literal><emphasis>regexp</emphasis><literal> ]]</literal>
can be used for input validation, assuming that
<emphasis>regexp</emphasis> is a constant regular
expression.
See <xref linkend="sect-Defensive_Coding-Shell-Input_Validation"/>.
</para>
2014-10-10 15:36:28 +02:00
</listitem>
<listitem>
<para>
Certain parameter expansions, for example
<literal>${</literal><emphasis>variable</emphasis><literal>[</literal><emphasis>expression</emphasis><literal>]}</literal>
(array indexing) or
<literal>${</literal><emphasis>variable</emphasis><literal>:</literal><emphasis>expression</emphasis><literal>}</literal>
(string slicing), trigger arithmetic evaluation of
<emphasis>expression</emphasis>.
</para>
</listitem>
<listitem>
<para>
Assignment to array elements using
<emphasis>array_variable</emphasis><literal>[</literal><emphasis>subscript</emphasis><literal>]=</literal><emphasis>expression</emphasis>
triggers evaluation of <emphasis>subscript</emphasis>, but
not <emphasis>expression</emphasis>.
</para>
</listitem>
<listitem>
<para>
The expressions in the arithmetic <literal>for</literal>
command,
<literal>for ((</literal><emphasis>expression1</emphasis><literal>; </literal><emphasis>expression2</emphasis><literal>; </literal><emphasis>expression3</emphasis><literal>)); do </literal><emphasis>commands</emphasis><literal>; done</literal>
are evaluated. This does not apply to the regular
for command,
<literal>for </literal><emphasis>variable</emphasis><literal> in </literal><emphasis>list</emphasis><literal>; do </literal><emphasis>commands</emphasis><literal>; done</literal>”.
</para>
</listitem>
</itemizedlist>
<important>
<para>
Depending on the <application>bash</application> version, the
above list may be incomplete.
</para>
<para>
If faced with a situation where using such shell features
appears necessary, see <xref
linkend="sect-Defensive_Coding-Shell-Alternatives"/>.
</para>
</important>
<para>
If it is impossible to avoid shell arithmetic on untrusted
inputs, refer to <xref
linkend="sect-Defensive_Coding-Shell-Input_Validation"/>.
</para>
2014-10-10 15:36:28 +02:00
</section>
<section id="sect-Defensive_Coding-Shell-Types">
<title>Type declarations</title>
<para>
<application>bash</application> supports explicit type
declarations for shell variables:
</para>
<informalexample>
<programlisting language="Bash">
declare -i integer_variable
declare -a array_variable
declare -A assoc_array_variable
typeset -i integer_variable
typeset -a array_variable
typeset -A assoc_array_variable
local -i integer_variable
local -a array_variable
local -A assoc_array_variable
readonly -i integer_variable
readonly -a array_variable
readonly -A assoc_array_variable
</programlisting>
</informalexample>
<para>
Variables can also be declared as arrays by assigning them an
array expression, as in:
</para>
<informalexample>
<programlisting language="Bash">
array_variable=(1 2 3 4)
</programlisting>
</informalexample>
<para>
Some built-ins (such as <literal>mapfile</literal>) can
implicitly create array variables.
</para>
<para>
Such type declarations should not be used because assignment to
such variables (independent of the concrete syntax used for the
assignment) triggers arithmetic expansion (and thus double
expansion) of the right-hand side of the assignment operation.
See <xref linkend="sect-Defensive_Coding-Shell-Arithmetic"/>.
</para>
<para>
Shell scripts which use integer or array variables should be
rewritten in another, more suitable language. Se <xref
linkend="sect-Defensive_Coding-Shell-Alternatives"/>.
</para>
</section>
</section>
<section id="sect-Defensive_Coding-Shell-Obscure">
<title>Other obscurities</title>
<para>
Obscure shell language features should not be used. Examples are:
</para>
<itemizedlist>
<listitem>
<para>
Exported functions (<literal>export -f</literal> or
<literal>declare -f</literal>).
</para>
</listitem>
<listitem>
<para>
Function names which are not valid variable names, such as
<literal>module::function</literal>”.
</para>
</listitem>
<listitem>
<para>
The possibility to override built-ins or external commands
with shell functions.
</para>
</listitem>
<listitem>
<para>
Changing the value of the <envar>IFS</envar> variable to
tokenize strings.
</para>
</listitem>
</itemizedlist>
</section>
</section>
<section id="sect-Defensive_Coding-Shell-Invoke">
<title>Invoking external commands</title>
<para>
When passing shell variables as single command line arguments,
they should always be surrounded by double quotes. See
<xref linkend="sect-Defensive_Coding-Shell-Parameter_Expansion"/>.
</para>
<para>
Care is required when passing untrusted values as positional
parameters to external commands. If the value starts with a hyphen
<literal>-</literal>”, it may be interpreted by the external
command as an option. Depending on the external program, a
<literal>--</literal>” argument stops option processing and treats
all following arguments as positional parameters. (Double quotes
are completely invisible to the command being invoked, so they do
not prevent variable values from being interpreted as options.)
</para>
<para>
Cleaning the environment before invoking child processes is
difficult to implement in script. <application>bash</application>
keeps a hidden list of environment variables which do not correspond
to shell variables, and unsetting them from within a
<application>bash</application> script is not possible. To reset
the environment, a script can re-run itself under the “<literal>env
-i</literal>” command with an additional parameter which indicates
the environment has been cleared and suppresses a further
self-execution. Alternatively, individual commands can be executed
with “<literal>env -i</literal>”.
</para>
<important>
<para>
Complete isolation from its original execution environment
2014-10-10 15:36:28 +02:00
(which is required when the script is executed after a trust
transition, e.g., triggered by the SUID mechanism) is impossible
to achieve from within the shell script itself. Instead, the
invoking process has to clear the process environment (except for
few trusted variables) before running the shell script.
</para>
</important>
<para>
Checking for failures in executed external commands is recommended.
If no elaborate error recovery is needed, invoking “<literal>set
-e</literal>” may be sufficient. This causes the script to stop on
the first failed command. However, failures in pipes
(“<literal>command1 | command2</literal>”) are only detected for the
last command in the pipe, errors in previous commands are ignored.
This can be changed by invoking “<literal>set -o pipefail</literal>”.
Due to architectural limitations, only the process that spawned
the entire pipe can check for failures in individual commands;
it is not possible for a process to tell if the process feeding
data (or the process consuming data) exited normally or with
an error.
</para>
<para>
See <xref linkend="sect-Defensive_Coding-Tasks-Processes-Creation"/>
for additional details on creating child processes.
</para>
</section>
<section id="sect-Defensive_Coding-Shell-Temporary_Files">
<title>Temporary files</title>
<para>
Temporary files should be created with the
<literal>mktemp</literal> command, and temporary directories with
<literal>mktemp -d</literal>”.
</para>
<para>
To clean up temporary files and directories, write a clean-up
shell function and register it as a trap handler, as shown in
<xref linkend="ex-Defensive_Coding-Tasks-Temporary_Files"/>.
Using a separate function avoids issues with proper quoting of
variables.
</para>
<example id="ex-Defensive_Coding-Tasks-Temporary_Files">
<title>Creating and cleaning up temporary files</title>
<informalexample>
<programlisting language="Bash">
tmpfile="$(mktemp)"
cleanup () {
rm -f -- "$tmpfile"
}
trap cleanup 0
</programlisting>
</informalexample>
</example>
</section>
<section id="sect-Defensive_Coding-Shell-Input_Validation">
<title>Performing input validation</title>
<para>
In some cases, input validation cannot be avoided. For example,
if arithmetic evaluation is absolutely required, it is imperative
to check that input values are, in fact, integers. See <xref
linkend="sect-Defensive_Coding-Shell-Arithmetic"/>.
</para>
<para>
<xref linkend="ex-Defensive_Coding-Shell-Input_Validation"/>
shows a construct which can be used to check if a string
<literal>$value</literal>” is an integer. This construct is
specific to <application>bash</application> and not portable to
POSIX shells.
</para>
<example id="ex-Defensive_Coding-Shell-Input_Validation">
<title>Input validation in <application>bash</application></title>
<xi:include href="snippets/Shell-Input_Validation.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</example>
<para>
Using <literal>case</literal> statements for input validation is
also possible and supported by other (POSIX) shells, but the
pattern language is more restrictive, and it can be difficult to
write suitable patterns.
</para>
<para>
The <literal>expr</literal> external command can give misleading
results (e.g., if the value being checked contains operators
itself) and should not be used.
</para>
</section>
2014-10-10 15:36:28 +02:00
<section id="sect-Defensive_Coding-Shell-Edit_Guard">
<title>Guarding shell scripts against changes</title>
<para>
<application>bash</application> only reads a shell script up to
the point it is needed for executed the next command. This means
that if script is overwritten while it is running, execution can
jump to a random part of the script, depending on what is modified
in the script and how the file offsets change as a result. (This
behavior is needed to support self-extracting shell archives whose
script part is followed by a stream of bytes which does not follow
the shell language syntax.)
</para>
<para>
Therefore, long-running scripts should be guarded against
concurrent modification by putting as much of the program logic
into a <literal>main</literal> function, and invoking the
<literal>main</literal> function at the end of the script, using
this syntax:
</para>
<informalexample>
<programlisting language="Bash">
main "$@" ; exit $?
</programlisting>
</informalexample>
<para>
This construct ensures that <application>bash</application> will
stop execution after the <literal>main</literal> function, instead
of opening the script file and trying to read more commands.
</para>
</section>
</chapter>