defensive-coding-guide/en-US/Tasks-File_System.xml

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<chapter id="chap-Defensive_Coding-Tasks-File_System">
  <title>File System Manipulation</title>
  <para>
    In this chapter, we discuss general file system manipulation, with
    a focus on access files and directories to which an other,
    potentially untrusted user has write access.
  </para>
  <para>
    Temporary files are covered in their own chapter, <xref
    linkend="chap-Defensive_Coding-Tasks-Temporary_Files"/>.
  </para>
  <section id="sect-Defensive_Coding-Tasks-File_System-Unowned">
    <title>Working with Files and Directories Owned by Other Users</title>
    <para>
      Sometimes, it is necessary to operate on files and directories
      owned by other (potentially untrusted) users.  For example, a
      system administrator could remove the home directory of a user,
      or a package manager could update a file in a directory which is
      owned by an application-specific user.  This differs from
      accessing the file system as a specific user; see
      <xref linkend="sect-Defensive_Coding-Tasks-File_System-Foreign"/>.
    </para>
    <para>
      Accessing files across trust boundaries faces several
      challenges, particularly if an entire directory tree is being
      traversed:
    </para>
    <orderedlist>
      <listitem>
	<para>
	  Another user might add file names to a writable directory at
	  any time.  This can interfere with file creation and the
	  order of names returned by <function>readdir</function>.
	</para>
      </listitem>
      <listitem>
	<para>
	  Merely opening and closing a file can have side effects.
	  For instance, an automounter can be triggered, or a tape
	  device rewound.  Opening a file on a local file system can
	  block indefinitely, due to mandatory file locking, unless
	  the <literal>O_NONBLOCK</literal> flag is specified.
	</para>
      </listitem>
      <listitem>
	<para>
	  Hard links and symbolic links can redirect the effect of
	  file system operations in unexpected ways.  The
	  <literal>O_NOFOLLOW</literal> and
	  <literal>AT_SYMLINK_NOFOLLOW</literal> variants of system
	  calls only affected final path name component.
	</para>
      </listitem>
      <listitem>
	<para>
	  The structure of a directory tree can change.  For example,
	  the parent directory of what used to be a subdirectory
	  within the directory tree being processed could suddenly
	  point outside that directory tree.
	</para>
      </listitem>
    </orderedlist>
    <para>
      Files should always be created with the
      <literal>O_CREAT</literal> and <literal>O_EXCL</literal> flags,
      so that creating the file will fail if it already exists.  This
      guards against the unexpected appearance of file names, either
      due to creation of a new file, or hard-linking of an existing
      file.  In multi-threaded programs, rather than manipulating the
      umask, create the files with mode <literal>000</literal> if
      possible, and adjust it afterwards with
      <function>fchmod</function>.
    </para>
    <para>
      To avoid issues related to symbolic links and directory tree
      restructuring, the “<literal>at</literal>” variants of system
      calls have to be used (that is, functions like
      <function>openat</function>, <function>fchownat</function>,
      <function>fchmodat</function>, and
      <function>unlinkat</function>, together with
      <literal>O_NOFOLLOW</literal> or
      <literal>AT_SYMLINK_NOFOLLOW</literal>).  Path names passed to
      these functions must have just a single component (that is,
      without a slash).  When descending, the descriptors of parent
      directories must be kept open.  The missing
      <literal>opendirat</literal> function can be emulated with
      <literal>openat</literal> (with an
      <literal>O_DIRECTORY</literal> flag, to avoid opening special
      files with side effects), followed by
      <literal>fdopendir</literal>.
    </para>
    <para>
      If the “<literal>at</literal>” functions are not available, it
      is possible to emulate them by changing the current directory.
      (Obviously, this only works if the process is not multi-threaded.)
      <function>fchdir</function> has to be used to change the current
      directory, and the descriptors of the parent directories have to
      be kept open, just as with the “<literal>at</literal>”-based
      approach.  <literal>chdir("...")</literal> is unsafe because it
      might ascend outside the intended directory tree.
    </para>
    <para>
      This “<literal>at</literal>” function emulation is currently
      required when manipulating extended attributes.  In this case,
      the <function>lsetxattr</function> function can be used, with a
      relative path name consisting of a single component.  This also
      applies to SELinux contexts and the
      <function>lsetfilecon</function> function.
    </para>
    <para>
      Currently, it is not possible to avoid opening special files
      <emphasis>and</emphasis> changes to files with hard links if the
      directory containing them is owned by an untrusted user.
      (Device nodes can be hard-linked, just as regular files.)
      <function>fchmodat</function> and <function>fchownat</function>
      affect files whose link count is greater than one.  But opening
      the files, checking that the link count is one with
      <function>fstat</function>, and using
      <function>fchmod</function> and <function>fchown</function> on
      the file descriptor may have unwanted side effects, due to item
      2 above.  When creating directories, it is therefore important
      to change the ownership and permissions only after it has been
      fully created.  Until that point, file names are stable, and no
      files with unexpected hard links can be introduced.
    </para>
    <para>
      Similarly, when just reading a directory owned by an untrusted
      user, it is currently impossible to reliably avoid opening
      special files.
    </para>
    <para>
      There is no workaround against the instability of the file list
      returned by <function>readdir</function>.  Concurrent
      modification of the directory can result in a list of files
      being returned which never actually existed on disk.
    </para>
    <para>
      Hard links and symbolic links can be safely deleted using
      <function>unlinkat</function> without further checks because
      deletion only affects the name within the directory tree being
      processed.
    </para>
  </section>
  <section id="sect-Defensive_Coding-Tasks-File_System-Foreign">
    <title>Accessing the File System as a Different User</title>
    <para>
      This section deals with access to the file system as a specific
      user.  This is different from accessing files and directories owned by a
      different, potentially untrusted user; see <xref
      linkend="sect-Defensive_Coding-Tasks-File_System-Foreign"/>.
    </para>
    <para>
      One approach is to spawn a child process which runs under the
      target user and group IDs (both effective and real IDs).  Note
      that this child process can block indefinitely, even when
      processing regular files only.  For example, a special FUSE file
      system could cause the process to hang in uninterruptible sleep
      inside a <function>stat</function> system call.
    </para>
    <para>
      An existing process could change its user and group ID using
      <function>setfsuid</function> and <function>setfsgid</function>.
      (These functions are preferred over <function>seteuid</function>
      and <function>setegid</function> because they do not allow the
      impersonated user to send signals to the process.)  These
      functions are not thread safe.  In multi-threaded processes,
      these operations need to be performed in a single-threaded child
      process.  Unexpected blocking may occur as well.
    </para>
    <para>
      It is not recommended to try to reimplement the kernel
      permission checks in user space because the required checks are
      complex.  It is also very difficult to avoid race conditions
      during path name resolution.
    </para>
  </section>
  <section id="sect-Defensive_Coding-Tasks-File_System-Limits">
    <title>File System Limits</title>
    <para>
      For historical reasons, there are preprocessor constants such as
      <literal>PATH_MAX</literal>, <literal>NAME_MAX</literal>.
      However, on most systems, the length of canonical path names
      (absolute path names with all symbolic links resolved, as
      returned by <function>realpath</function> or
      <function>canonicalize_file_name</function>) can exceed
      <literal>PATH_MAX</literal> bytes, and individual file name
      components can be longer than <literal>NAME_MAX</literal>.  This
      is also true of the <literal>_PC_PATH_MAX</literal> and
      <literal>_PC_NAME_MAX</literal> values returned by
      <function>pathconf</function>, and the
      <literal>f_namemax</literal> member of <literal>struct
      statvfs</literal>.  Therefore, these constants should not be
      used.  This is also reason why the
      <function>readdir_r</function> should never be used (instead,
      use <function>readdir</function>).
    </para>
    <para>
      You should not write code in a way that assumes that there is an
      upper limit on the number of subdirectories of a directory, the
      number of regular files in a directory, or the link count of an
      inode.
    </para>
  </section>
  <section id="sect-Defensive_Coding-Tasks-File_System-Features">
    <title>File system features</title>
    <para>
      Not all file systems support all features.  This makes it very
      difficult to write general-purpose tools for copying files.  For
      example, a copy operation intending to preserve file permissions
      will generally fail when copying to a FAT file system.
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  Some file systems are case-insensitive.  Most should be
	  case-preserving, though.
	</para>
      </listitem>
      <listitem>
	<para>
	  Name length limits vary greatly, from eight to thousands of
	  bytes.  Path length limits differ as well.  Most systems
	  impose an upper bound on path names passed to the kernel,
	  but using relative path names, it is possible to create and
	  access files whose absolute path name is essentially of
	  unbounded length.
	</para>
      </listitem>
      <listitem>
	<para>
	  Some file systems do not store names as fairly unrestricted
	  byte sequences, as it has been traditionally the case on GNU
	  systems.  This means that some byte sequences (outside the
	  POSIX safe character set) are not valid names.  Conversely,
	  names of existing files may not be representable as byte
	  sequences, and the files are thus inaccessible on GNU
	  systems.  Some file systems perform Unicode canonicalization
	  on file names.  These file systems preserve case, but
	  reading the name of a just-created file using
	  <function>readdir</function> might still result in a
	  different byte sequence.
	</para>
      </listitem>
      <listitem>
	<para>
	  Permissions and owners are not universally supported (and
	  SUID/SGID bits may not be available).  For example, FAT file
	  systems assign ownership based on a mount option, and
	  generally mark all files as executable.  Any attempt to
	  change permissions would result in an error.
	</para>
      </listitem>
      <listitem>
	<para>
	  Non-regular files (device nodes, FIFOs) are not generally
	  available.
	</para>
      </listitem>
      <listitem>
	<para>
	  Only on some file systems, files can have holes, that is,
	  not all of their contents is backed by disk storage.
	</para>
      </listitem>
      <listitem>
	<para>
	  <function>ioctl</function> support (even fairly generic
	  functionality such as <literal>FIEMAP</literal> for
	  discovering physical file layout and holes) is
	  file-system-specific.
	</para>
      </listitem>
      <listitem>
	<para>
	  Not all file systems support extended attributes, ACLs and
	  SELinux metadata.  Size and naming restriction on extended
	  attributes vary.
	</para>
      </listitem>
      <listitem>
	<para>
	  Hard links may not be supported at all (FAT) or only within
	  the same directory (AFS).  Symbolic links may not be
	  available, either.  Reflinks (hard links with copy-on-write
	  semantics) are still very rare.  Recent systems restrict
	  creation of hard links to users which own the target file or
	  have read/write access to it, but older systems do not.
	</para>
      </listitem>
      <listitem>
	<para>
	  Renaming (or moving) files using <function>rename</function>
	  can fail (even when <function>stat</function> indicates that
	  the source and target directories are located on the same
	  file system).  This system call should work if the old and
	  new paths are located in the same directory, though.
	</para>
      </listitem>
      <listitem>
	<para>
	  Locking semantics vary among file systems.  This affects
	  advisory and mandatory locks.  For example, some network
	  file systems do not allow deleting files which are opened by
	  any process.
	</para>
      </listitem>
      <listitem>
	<para>
	  Resolution of time stamps varies from two seconds to
	  nanoseconds.  Not all time stamps are available on all file
	  systems.  File creation time (<emphasis>birth
	  time</emphasis>) is not exposed over the
	  <function>stat</function>/<function>fstat</function>
	  interface, even if stored by the file system.
	</para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="sect-Defensive_Coding-Tasks-File_System-Free_Space">
    <title>Checking Free Space</title>
    <para>
      The <function>statvfs</function> and
      <function>fstatvfs</function> functions allow programs to
      examine the number of available blocks and inodes, through the
      members <literal>f_bfree</literal>, <literal>f_bavail</literal>,
      <literal>f_ffree</literal>, and <literal>f_favail</literal> of
      <literal>struct statvfs</literal>.  Some file systems return
      fictional values in the <literal>f_ffree</literal> and
      <literal>f_favail</literal> fields, so the only reliable way to
      discover if the file system still has space for a file is to try
      to create it.  The <literal>f_bfree</literal> field should be
      reasonably accurate, though.
    </para>
  </section>
</chapter>