ANR Project CoLiS
Correctness of Linux Scripts

ANR-15-CE25-0001,1/10/2015 - 30/9/2019

Logo Project Funded by the ANR

Presentation

This project aims at applying techniques from deductive program verification and analysis of tree transformations to the problem of analyzing shell scripts, in particular those that are used in software installation. These scripts play a central role in the initial installation of UNIX and Linux machines, as well as in recurrent configuration, installation, and software maintenance tasks. Scripts interact with the state of the operating system, in particular via the file system.

Errors in scripts may have catastrophic consequences due to the fact that they are often executed at crucial moments, like the update of software packages that are installed on a UNIX-like machine. Scripts must work correctly in a multitude of different situations that probably have not been planned by the author of the script. For instance, installation scripts have to work correctly regardless of the choice of packages that currently are installed on the system, and they also have to work correctly in abnormal situations, for instance when the system administrator tries to resume an operation that was previously interrupted.

We see three main challenges:

  1. Script languages are difficult. They are Turing-complete programming languages. Typically they provide known features from imperative programming languages like various loop constructions (while, for, ...), assignment to variables, and recursive functions and procedures. Since script languages are often used for interaction with the operating system they provide easy means to invoke UNIX commands. However, their highly dynamic execution model is based on a combination of expansion (e.g. variable expansion), evaluation, and execution of system commands. They stand between macro expansion languages and classical programming languages.
  2. The data structure that is manipulated by these scripts is a complex one, namely the file system tree that is installed on a UNIX-like machine. Our approach is to consider the file system, as seen by the programmer of a shell script (i.e. abstracting from the implementation of a file system) as a tree-like object. However, we have to cope with features like unbounded depth and width, multiple hard links to leaves, symbolic links, ownership and access permissions, and access dates.
  3. The properties we are interested in range from the simple to the complex. Simple properties may be that a script is not allowed to touch certain parts of the file system tree, complex properties involve sequences of executions of possibly different scripts, like executing an installation script then a de-installation script, or re-running a script after the first attempt to execute it has failed.

At the end of the project we want to produce tools that formally analyse and attempt to verify scripts, possibly giving assurance of correctness, or pin-pointing possible errors. This is an ambitious goal - we certainly will have do compromises, make hypotheses on the structure of the scripts, and do some approximations of the file system and the behavior of system commands. These assumptions and approximations will be guided by use cases of installation scripts from the Debian GNU/Linux distribution.

We will start by laying out the formal models of the shell script language and the file system data structure, and by developing a language for expressing properties of scripts to be verified. We will pursue two avenues of formal techniques: one based on tree transducers which are a formal system of transformations on trees (like XML trees, or file system trees), and the other one using deductive program verification techniques as proposed by the Why3 environment. We will also aim at combining these two technologies.

Partner Sites

Contact : Ralf Treinen (project coordinator)

Results

Links


Valid HTML 4.01 Transitional