ENS — Semantics and applications to verification

TP1 — Denotational semantics

Comes mainly from here.Thanks Antoine!

The goal of this session is to program an interpreter to compute the denotational semantics of a simple language.

We will use OCaml, of course. Every version between 4.05.0 and 4.07.1 are ok. Probably older one too, I didn't checked.

Language

Start by downloading the package. You will work with it during the 5 sessions.

The package contains:

a parser for the language, programmed in OCamlLex and Menhir: lexer.mll and parser.mly;
the type of abstract syntax trees output by the parser, defined in abstract_syntax_tree.ml;
a pretty-printer, to print back an abstract syntax tree into the original language, in abstract_syntax_printer.ml;
in analyzer.ml, a simple driver that takes a file name passed as argument, parses it, and prints it back.

Typing

make

should compile an executable that runs the simple driver.

Syntax

The language is a very simple "curly brackets" C-like language. A program is composed of a sequence of statements of the form:

variable declaration: var var1 [= expr1], var2 [=expr2],...;
assignment: var = expr;
tests: if (expr) stat_1; or if (expr) stat_1; else stat_2;
while loops: while (expr) stat;
blocks in curly brackets { stat_1; ... stat_n; }

Non-standard statements include:

assertions of boolean expressions: assert (expr);
variable printing: print (var_1,...,var_n);
failure: halt; which stops the program immediately

Expressions include:

integer arithmetic operators: + (unary and binary), - (unary and binary), *, /, % (modulo);
boolean operators: && (and), || (or), ! (negation);
integer comparisons <, <=, >, >=;
equality == and disequality !=, that can be used to compare either two integers or two boolean values;
constants, including integers, and the boolean constants true and false;
the special expression rand(l,h) that denotes the non-deterministic interval of integers between the constant l and the constant h.

The operators have their usual precedence, and you can group expressions using parentheses.

You can use /* ... */ and // comments.

A single declaration can declare an arbitrary number of variables var1, var2, etc. Note that there is no static type information in declarations: the language is dynamically typed. Each variable name is optionally followed by an initialization of the form = expr, which behaves similarly to an assignment. If there is no initialization expression, the variable is assumed to be non-initialized. Note that using a non-initialized value in an expression results in a value error

A variable declared in a block starts existing at the point of declaration and stops existing at the end of the block; this is the variable scope. It is important to distinguish variables of the same name, in case several of them exist at the same time (in nested scopes). Scoping is already resolved in the base package.

Semantics

Variables have no type and can hold either an integer or a boolean value, à la Python: upon assignation, a variable takes the type of the value it is assigned. In our simple language, there are only boolean and integer values. There is a type error in case an integer value is used with a boolean operator, or the converse. Subsequently, we do not distinguish statically between boolean and integer expressions: only values have a type. It is an error to use operators with values of the wrong type, such as adding two boolean values. This is detected at run-time, when evaluating the expression in the current environment.

Other run-time errors include: divisions and modulos by zero; using a non-boolean value in tests, loops and assert conditions; using a variable that has never been assigned to; asserting a condition that is false; executing the halt statement.

To avoid having to handle overflows, our integer values will not be represented using machine integers, but as unbounded mathematical integers. You will use the ZArith library, which provides a simple API (similar to Int32) to manipulate unbounded integers.

Code architecture

The code shall be modular and as reusable as possible. This is story time! At the very top, there is the Interpreter module: its job is to iterate over the AST to run the program/analysis. There are two iterators: a concrete one (that may not terminate) and an abstract one, that always terminates. You will write the first one now and use it for the two first sessions. The other interpreter is to write in TP3 and use it for all the following sessions.

Under it, there is an Environment module. It represent all the information about the memory state. There are several of them.

concrete: it stores a set of mappings variable -> value, to run a non deterministic program. It needs an underlying module to handle value computations. You will work on it today.
typing: it stores a mapping from variables to types. This is TP2.
values: it stores a mapping from variables to abstract values. It needs an underlying module to handle abstract values computations. This is TP3.
equality: it only stores equality relation between variables. This is TP4.
equality: it has two underlying environments and stores both values and make them cooperate. This is TP5.

Some of them need an underlying module that handles computation on values: this is the Value module. There is only one. It takes two underlying domains: one for numeric values and one for boolean values. The value of a variable is thus an integer and a boolean (abstract or concrete) value. Both of them are optional. This module is in two parts: since the code is very redundant, the module ValueBasics that handle the orgy of pattern matching, and Value just use it with appropriate functions.

Now, there are integer and boolean value modules. Concrete domains stores only one integer (resp. boolean) value with a Z.t (resp. native bool).

Under all of that, there is the error module. It is the common types of errors across all the hierarchy of modules.

Now, look at analyzer.ml. There is a pattern matching to instantiate all the functors. Implement the corresponding modules (everywhere there are assert false). Implement in the order of instantiation, from the bottom to the top.

Extensions

Here are a few possible extensions you can implement in the language:

Uninitialized. This extension adds a notion of "uninitialized" value. If a variable is used before it is assigned a value, the executions continues by returning the "uninitialized" value. The "uninitialized" value is propagated by all operations ("uninitialized"+1 equals "uninitialized").
Machine integers. This extension changes the semantics of the integer data-type so that 32-bit machine integers are used instead of unbounded integers. You can design a version where overflows result in a wrap-around, following two's complement arithmetic, or a version where overflows cause run-time errors.

Top

Pour ce TP

Pensez à avoir une installation fonctionnelle de OCaml (≥ 4.05.0) avec Zarith, Menhir et Dune .

Pensez à Opam !

                        
                            opam install zarith menhir dune