TP1 — Denotational semantics
Comes mainly from here.Thanks Antoine!
The goal of this session is to program an interpreter to compute the denotational semantics of a simple language.
We will use OCaml, of course. Every version between 4.05.0 and 4.10.0 are ok. Probably older one too, I didn't checked.
Language
Start by downloading the package. You will work with it during the 5 sessions.
The package contains:
- a parser for the language, programmed in OCamlLex and Menhir:
lexer.mll
andparser.mly
; - the type of abstract syntax trees output by the parser, defined in
abstract_syntax_tree.ml
; - a pretty-printer, to print back an abstract syntax tree into the original language, in
abstract_syntax_printer.ml
; - in
analyzer.ml
, a simple driver that takes a file name passed as argument, parses it, and prints it back.
Typing
make
should compile an executable that runs the simple driver.
Syntax
The language is a very simple "curly brackets" C-like language. A program is composed of a sequence of statements of the form:
- variable declaration:
var var1 [= expr1], var2 [=expr2],...;
- assignment:
var = expr;
- tests:
if (expr) stat_1;
orif (expr) stat_1; else stat_2;
- while loops:
while (expr) stat;
- blocks in curly brackets
{ stat_1; ... stat_n; }
Non-standard statements include:
- assertions of boolean expressions:
assert (expr);
- variable printing:
print (var_1,...,var_n);
- failure:
halt;
which stops the program immediately
Expressions include:
- integer arithmetic operators:
+
(unary and binary),-
(unary and binary),*
,/
,%
(modulo); - boolean operators:
&&
(and),||
(or),!
(negation); - integer comparisons
<
,<=
,>
,>=
; - equality
==
and disequality!=
, that can be used to compare either two integers or two boolean values; - constants, including integers, and the boolean constants
true
andfalse
; - the special expression
rand(l,h)
that denotes the non-deterministic interval of integers between the constantl
and the constanth
.
The operators have their usual precedence, and you can group expressions using parentheses.
You can use /* ... */
and //
comments.
A single declaration can declare an arbitrary number of variables var1
, var2
, etc.
Note that there is no static type information in declarations: the language is dynamically typed.
Each variable name is optionally followed by an initialization of the form = expr
,
which behaves similarly to an assignment. If there is no initialization expression, the variable is
assumed to be non-initialized. Note that using a non-initialized value in an expression results in a
value error
A variable declared in a block starts existing at the point of declaration and stops existing at the end of the block; this is the variable scope. It is important to distinguish variables of the same name, in case several of them exist at the same time (in nested scopes). Scoping is already resolved in the base package.
Semantics
Variables have no type and can hold either an integer or a boolean value, à la Python: upon assignation, a variable takes the type of the value it is assigned. In our simple language, there are only boolean and integer values. There is a type error in case an integer value is used with a boolean operator, or the converse. Subsequently, we do not distinguish statically between boolean and integer expressions: only values have a type. It is an error to use operators with values of the wrong type, such as adding two boolean values. This is detected at run-time, when evaluating the expression in the current environment.
Other run-time errors include: divisions and modulos by zero; using a non-boolean value in tests,
loops and assert conditions; using a variable that has never been assigned to; asserting a condition
that is false; executing the halt
statement.
To avoid having to handle overflows, our integer values will not be represented using machine integers, but as unbounded mathematical integers.
You will use the ZArith library, which provides a simple API (similar to Int32
) to manipulate unbounded integers.
Code architecture
The code shall be modular and as reusable as possible. This is story time! At the very top, there is
the Interpreter
module: its job is to iterate over the AST to run the program/analysis.
There are two iterators: a concrete one (that may not terminate) and an abstract one, that always
terminates. You will write the first one now and use it for the two first sessions. The other
interpreter is to write in TP3 and use it for all the following sessions.
Under it, there is an Environment
module. It represent all the information about the
memory state. There are several of them.
- concrete: it stores a set of mappings variable -> value, to run a non deterministic program. It needs an underlying module to handle value computations. You will work on it today.
- typing: it stores a mapping from variables to types. This is TP2.
- values: it stores a mapping from variables to abstract values. It needs an underlying module to handle abstract values computations. This is TP3.
- equality: it only stores equality relation between variables. This is TP4.
- equality: it has two underlying environments and stores both values and make them cooperate. This is TP5.
Some of them need an underlying module that handles computation on values: this is the Value
module. There is only one. It takes two underlying domains: one for numeric values
and one for boolean values. The value of a variable is thus an integer and a boolean (abstract or
concrete) value. Both of them are optional. This module is in two parts: since the code is very
redundant, the module ValueBasics
that handle the orgy of pattern matching, and
Value
just use it with appropriate functions.
Now, there are integer and boolean value modules. Concrete domains stores only one integer (resp.
boolean) value with a Z.t
(resp. native bool
).
Under all of that, there is the error module. It is the common types of errors across all the hierarchy of modules.
Now, look at analyzer.ml
. There is a pattern matching to instantiate all the functors.
Implement the corresponding modules (everywhere there are assert false
). Implement in
the order of instantiation, from the bottom to the top.
Extensions
Here are a few possible extensions you can implement in the language:
- Uninitialized. This extension adds a notion of "uninitialized" value. If a variable is used before it is assigned a value, the executions continues by returning the "uninitialized" value. The "uninitialized" value is propagated by all operations ("uninitialized"+1 equals "uninitialized").
- Machine integers. This extension changes the semantics of the integer data-type so that 32-bit machine integers are used instead of unbounded integers. You can design a version where overflows result in a wrap-around, following two's complement arithmetic, or a version where overflows cause run-time errors.