:experimental:

[[chap-Defensive_Coding-Shell]]
= Shell Programming and [application]*bash*
include::partial$entities.adoc[]

This chapter contains advice about shell programming, specifically in [application]*bash*. Most of the advice will apply to scripts written for other shells because extensions such as integer or array variables have been implemented there as well, with comparable syntax.

[[sect-Defensive_Coding-Shell-Alternatives]]
== Consider Alternatives

Once a shell script is so complex that advice in this chapter applies, it is time to step back and consider the question: Is there a more suitable implementation language available? For example, Python with its `subprocess` module can be used to write scripts which are almost as concise as shell scripts when it comes to invoking external programs, and Python offers richer data structures, with less arcane syntax and more consistent behavior.

[[sect-Defensive_Coding-Shell-Language]]
== Shell Language Features

The following sections cover subtleties concerning the shell programming languages. They have been written with the [application]*bash* shell in mind, but some of these features apply to other shells as well. Some of the features described may seem like implementation defects, but these features have been replicated across multiple independent implementations, so they now have to be considered part of the shell programming language.

[[sect-Defensive_Coding-Shell-Parameter_Expansion]]
=== Parameter Expansion

The mechanism by which named shell variables and parameters are expanded is called *parameter expansion*. The most basic syntax is “pass:attributes[{blank}]`$`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]” or “pass:attributes[{blank}]`${`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]pass:attributes[{blank}]`}`pass:attributes[{blank}]”.
In almost all cases, a parameter expansion should be enclosed in double quotation marks `"`pass:attributes[{blank}]…pass:attributes[{blank}]`"`.

[source,bash]
----
external-program "$arg1" "$arg2"
----

If the double quotation marks are omitted, the value of the variable will be split into words according to the current value of the `IFS` variable, and each resulting word is then subject to pathname expansion. This may allow the injection of additional options which are then processed by `external-program`.

Parameter expansion can use special syntax for specific features, such as substituting defaults or performing string or array operations. These constructs should not be used because they can trigger arithmetic evaluation, which can result in code execution. See <>.

[[sect-Defensive_Coding-Shell-Double_Expansion]]
=== Double Expansion

*Double expansion* occurs when, during the expansion of a shell variable, not just the variable is expanded, replacing it by its value, but the *value* of the variable is itself expanded as well. This can trigger arbitrary code execution, unless the value of the variable is verified against a restrictive pattern.

The evaluation process is in fact recursive, so a self-referential expression can cause an out-of-memory condition and a shell crash.

Double expansion may seem like a defect, but it is implemented by many shells, and has to be considered an integral part of the shell programming language. However, it does make writing robust shell scripts difficult.

Double expansion can be requested explicitly with the `eval` built-in command, or by invoking a subshell with “pass:attributes[{blank}]`bash -c`pass:attributes[{blank}]”. These constructs should not be used.

The following sections give examples of places where implicit double expansion occurs.

[[sect-Defensive_Coding-Shell-Arithmetic]]
==== Arithmetic Evaluation

*Arithmetic evaluation* is a process by which the shell computes the integer value of an expression specified as a string.
It is highly problematic for two reasons: It triggers double expansion (see <>), and the language of arithmetic expressions is not self-contained. Some constructs in arithmetic expressions (notably array subscripts) provide a trapdoor from the restricted language of arithmetic expressions to the full shell language, thus paving the way towards arbitrary code execution. Due to double expansion, input which is (indirectly) referenced from an arithmetic expression can trigger execution of arbitrary code, which is potentially harmful.

Arithmetic evaluation is triggered by the following constructs:

* The *expression* in “pass:attributes[{blank}]`$((`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`))`pass:attributes[{blank}]” is evaluated. This construct is called *arithmetic expansion*.
* {blank}
+
“pass:attributes[{blank}]`$[`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`]`pass:attributes[{blank}]” is a deprecated syntax with the same effect.
* The arguments to the `let` shell built-in are evaluated.
* {blank}
+
“pass:attributes[{blank}]`((`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`))`pass:attributes[{blank}]” is an alternative syntax for “pass:attributes[{blank}]`let` *expression*pass:attributes[{blank}]”.
* Conditional expressions surrounded by “pass:attributes[{blank}]`[[`pass:attributes[{blank}]…pass:attributes[{blank}]`]]`pass:attributes[{blank}]” can trigger arithmetic evaluation if certain operators such as `-eq` are used. (The `test` built-in does not perform arithmetic evaluation, even with integer operators such as `-eq`.)
+
The conditional expression “pass:attributes[{blank}]`[[ $`pass:attributes[{blank}]pass:attributes[{blank}]*variable* `=~` *regexp* `]]`pass:attributes[{blank}]” can be used for input validation, assuming that *regexp* is a constant regular expression.
See <>.
* Certain parameter expansions, for example “pass:attributes[{blank}]`${`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]pass:attributes[{blank}]`[`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`]}`pass:attributes[{blank}]” (array indexing) or “pass:attributes[{blank}]`${`pass:attributes[{blank}]pass:attributes[{blank}]*variable*pass:attributes[{blank}]pass:attributes[{blank}]`:`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]pass:attributes[{blank}]`}`pass:attributes[{blank}]” (string slicing), trigger arithmetic evaluation of *expression*.
* Assignment to array elements using “pass:attributes[{blank}]*array_variable*pass:attributes[{blank}]pass:attributes[{blank}]`[`pass:attributes[{blank}]pass:attributes[{blank}]*subscript*pass:attributes[{blank}]pass:attributes[{blank}]`]=`pass:attributes[{blank}]pass:attributes[{blank}]*expression*pass:attributes[{blank}]” triggers evaluation of *subscript*, but not *expression*.
* The expressions in the arithmetic `for` command, “pass:attributes[{blank}]`for ((`pass:attributes[{blank}]pass:attributes[{blank}]*expression1*pass:attributes[{blank}]pass:attributes[{blank}]`;` *expression2*pass:attributes[{blank}]pass:attributes[{blank}]`;` *expression3*pass:attributes[{blank}]pass:attributes[{blank}]`)); do` *commands*pass:attributes[{blank}]pass:attributes[{blank}]`; done`pass:attributes[{blank}]” are evaluated. This does not apply to the regular for command, “pass:attributes[{blank}]`for` *variable* `in` *list*pass:attributes[{blank}]pass:attributes[{blank}]`; do` *commands*pass:attributes[{blank}]pass:attributes[{blank}]`; done`pass:attributes[{blank}]”.

[IMPORTANT]
====
Depending on the [application]*bash* version, the above list may be incomplete. If faced with a situation where using such shell features appears necessary, see <>.
====

If it is impossible to avoid shell arithmetic on untrusted inputs, refer to <>.

[[sect-Defensive_Coding-Shell-Types]]
==== Type Declarations

[application]*bash* supports explicit type declarations for shell variables:

[source,bash]
----
declare -i integer_variable
declare -a array_variable
declare -A assoc_array_variable

typeset -i integer_variable
typeset -a array_variable
typeset -A assoc_array_variable

local -i integer_variable
local -a array_variable
local -A assoc_array_variable

readonly -i integer_variable
readonly -a array_variable
readonly -A assoc_array_variable
----

Variables can also be declared as arrays by assigning them an array expression, as in:

[source,bash]
----
array_variable=(1 2 3 4)
----

Some built-ins (such as `mapfile`) can implicitly create array variables.

Such type declarations should not be used because assignment to such variables (independent of the concrete syntax used for the assignment) triggers arithmetic expansion (and thus double expansion) of the right-hand side of the assignment operation. See <>.

Shell scripts which use integer or array variables should be rewritten in another, more suitable language. See <>.

[[sect-Defensive_Coding-Shell-Obscure]]
=== Other Obscurities

Obscure shell language features should not be used. Examples are:

* Exported functions (`export -f` or `declare -fx`).
* Function names which are not valid variable names, such as “pass:attributes[{blank}]`module::function`pass:attributes[{blank}]”.
* The possibility to override built-ins or external commands with shell functions.
* Changing the value of the `IFS` variable to tokenize strings.

[[sect-Defensive_Coding-Shell-Invoke]]
== Invoking External Commands

When passing shell variables as single command line arguments, they should always be surrounded by double quotes. See <>.

Care is required when passing untrusted values as positional parameters to external commands.
If the value starts with a hyphen “pass:attributes[{blank}]`-`pass:attributes[{blank}]”, it may be interpreted by the external command as an option. Depending on the external program, a “pass:attributes[{blank}]`--`pass:attributes[{blank}]” argument stops option processing and treats all following arguments as positional parameters. (Double quotes are completely invisible to the command being invoked, so they do not prevent variable values from being interpreted as options.)

Cleaning the environment before invoking child processes is difficult to implement in a script. [application]*bash* keeps a hidden list of environment variables which do not correspond to shell variables, and unsetting them from within a [application]*bash* script is not possible. To reset the environment, a script can re-run itself under the “pass:attributes[{blank}]`env -i`pass:attributes[{blank}]” command with an additional parameter which indicates the environment has been cleared and suppresses a further self-execution. Alternatively, individual commands can be executed with “pass:attributes[{blank}]`env -i`pass:attributes[{blank}]”.

[IMPORTANT]
====
Complete isolation from its original execution environment (which is required when the script is executed after a trust transition, e.g., triggered by the SUID mechanism) is impossible to achieve from within the shell script itself. Instead, the invoking process has to clear the process environment (except for a few trusted variables) before running the shell script.
====

Checking for failures in executed external commands is recommended. If no elaborate error recovery is needed, invoking “pass:attributes[{blank}]`set -e`pass:attributes[{blank}]” may be sufficient. This causes the script to stop on the first failed command. However, failures in pipes (“pass:attributes[{blank}]`command1 | command2`pass:attributes[{blank}]”) are only detected for the last command in the pipe; errors in previous commands are ignored.
This can be changed by invoking “pass:attributes[{blank}]`set -o pipefail`pass:attributes[{blank}]”. Due to architectural limitations, only the process that spawned the entire pipe can check for failures in individual commands; it is not possible for a process to tell if the process feeding data (or the process consuming data) exited normally or with an error.

See <> for additional details on creating child processes.

[[sect-Defensive_Coding-Shell-Temporary_Files]]
== Temporary Files

Temporary files should be created with the `mktemp` command, and temporary directories with “pass:attributes[{blank}]`mktemp -d`pass:attributes[{blank}]”.

To clean up temporary files and directories, write a clean-up shell function and register it as a trap handler, as shown in <>. Using a separate function avoids issues with proper quoting of variables.

[[ex-Defensive_Coding-Tasks-Temporary_Files]]
.Creating and Cleaning up Temporary Files
====
[source,bash]
----
tmpfile="$(mktemp)"

cleanup () {
    rm -f -- "$tmpfile"
}

trap cleanup 0
----
====

[[sect-Defensive_Coding-Shell-Input_Validation]]
== Performing Input Validation

In some cases, input validation cannot be avoided. For example, if arithmetic evaluation is absolutely required, it is imperative to check that input values are, in fact, integers. See <>.

<> shows a construct which can be used to check if a string “pass:attributes[{blank}]`$value`pass:attributes[{blank}]” is an integer. This construct is specific to [application]*bash* and not portable to POSIX shells.

[[ex-Defensive_Coding-Shell-Input_Validation]]
.Input validation in [application]*bash*
====
[source,bash]
----
include::example$Shell-Input_Validation.adoc[]
----
====

Using `case` statements for input validation is also possible and supported by other (POSIX) shells, but the pattern language is more restrictive, and it can be difficult to write suitable patterns.
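
As a rough illustration of the `case`-based approach, the following sketch checks for a decimal integer using only POSIX shell patterns. The function name `is_integer` is hypothetical, and the accepted format (an optional leading minus sign followed by one or more ASCII digits) is an assumption that may need adjusting for a particular script.

```bash
# Hypothetical helper: succeeds if "$1" is an optional "-" followed by
# one or more ASCII digits. Uses only POSIX "case" patterns.
is_integer () {
    case "$1" in
        '' | '-')
            return 1 ;;    # empty string or a lone minus sign
        *[!0123456789-]*)
            return 1 ;;    # contains a character other than a digit or "-"
        ?*-*)
            return 1 ;;    # "-" appears somewhere after the first character
        *)
            return 0 ;;
    esac
}

value="-42"
if is_integer "$value"; then
    printf '%s looks like an integer\n' "$value"
else
    printf '%s was rejected\n' "$value" >&2
fi
```

The digits are listed explicitly instead of as a range because the meaning of ranges in bracket expressions can depend on the locale. The check says nothing about magnitude, so a very long digit string is accepted even though it may not fit into the shell's integer type.
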
The `expr` external command can give misleading results (e.g., if the value being checked contains operators itself) and should not be used.

[[sect-Defensive_Coding-Shell-Edit_Guard]]
== Guarding Shell Scripts Against Changes

[application]*bash* reads only as much of a shell script as it needs to execute the next command. This means that if a script is overwritten while it is running, execution can jump to a random part of the script, depending on what is modified in the script and how the file offsets change as a result. (This behavior is needed to support self-extracting shell archives whose script part is followed by a stream of bytes which does not follow the shell language syntax.)

Therefore, long-running scripts should be guarded against concurrent modification by putting as much of the program logic as possible into a `main` function, and invoking the `main` function at the end of the script, using this syntax:

[source,bash]
----
main "$@" ; exit $?
----

This construct ensures that [application]*bash* will stop execution after the `main` function, instead of opening the script file and trying to read more commands.
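
To try out the `main` construct, the sketch below writes a small hypothetical script to a temporary file and runs it. The generated script follows the layout described above and deliberately ends with a line that is not valid shell syntax, standing in for the trailing bytes of a self-extracting archive. The function name `process_item` and the argument `alpha` are made up for illustration, and the example assumes that `bash` and `mktemp` are available on the `PATH`.

```bash
# Write a throwaway demo script; the quoted 'EOF' keeps the outer shell
# from expanding anything inside the here-document.
script="$(mktemp)"
cat > "$script" <<'EOF'
#!/bin/bash

process_item () {
    printf 'processing %s\n' "$1"
}

main () {
    local item
    for item in "$@"; do
        process_item "$item"
    done
}

main "$@" ; exit $?
))) stand-in for trailing bytes that are not shell syntax (((
EOF

# Run the demo script, capturing its output and exit status.
output="$(bash "$script" alpha)"
status=$?
rm -f -- "$script"

printf 'output: %s (exit status %d)\n' "$output" "$status"
```

If [application]*bash* behaves as described in this section, the run reports `processing alpha` with exit status 0 and no syntax error for the final line, because the shell exits before reading it.
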