(** * A Study of Clight and its Semantics *)
(* begin hide *)
From Coq Require Import List.
Import ListNotations.
(* end hide *)
(** CompCert is a certified C compiler which comes with a proof of semantics
preservation. What this means is the following: the semantics of the C code
you write is preserved by CompCert compilation passes up to the generated
machine code.
I had been interested in CompCert for quite some times, and ultimately
challenged myself to study Clight and its semantics. This write-up is the
result of this challenge, written as I was progressing.
#
#
#site/posts/ClightIntroduction.v
# *)
(** ** Installing CompCert *)
(** CompCert has been added to <>, and as a consequence can be very easily
used as a library for other Coq developments. A typical use case is for a
project to produce Clight (the high-level AST of CompCert), and to benefit
from CompCert proofs after that.
Installing CompCert is as easy as
<<
opam install compcert
>>
Once <> terminates, the <> namespace becomes available. In
addition, several binaries are now available if you have correctly set your
<> environment variable. For instance, <> takes a C file as
an argument, and generates a Coq file which contains the Clight generated by
CompCert. *)
(** ** Problem Statement *)
(** Our goal for this first write-up is to prove that the C function
<<
int add (int x, int y) {
return x + y;
}
>>
returns the expected result, that is <>. The <> tool
generates (among other things) the following AST (note: I have modified it
in order to improve its readability). *)
From compcert Require Import Clight Ctypes Clightdefs AST
Coqlib Cop.
Definition _x : ident := 1%positive.
Definition _y : ident := 2%positive.
Definition f_add : function :=
{| fn_return := tint
; fn_callconv := cc_default
; fn_params := [(_x, tint); (_y, tint)]
; fn_vars := []
; fn_temps := []
; fn_body := Sreturn
(Some (Ebinop Oadd
(Etempvar _x tint)
(Etempvar _y tint)
tint))
|}.
(** The fields of the [function] type are pretty self-explanatory (as it is
often the case in CompCert’s ASTs as far as I can tell for now).
Identifiers in Clight are ([positive]) indices. The [fn_body] field is of
type [statement], with the particular constructor [Sreturn] whose argument
is of type [option expr], and [statement] and [expr] look like the two main
types to study. The predicates [step1] and [step2] allow for reasoning
about the execution of a [function], step by step (hence the name). It
appears that <> generates Clight terms using the function call
convention encoded by [step2]. To reason about a complete execution, it
appears that we can use [star] (from the [Smallstep] module) which is
basically a trace of [step]. These semantics are defined as predicates (that
is, they live in [Prop]). They allow for reasoning about
state-transformation, where a state is either
- A function call, with a given list of arguments and a continuation
- A function return, with a result and a continuation
- A [statement] execution within a [function]
We import several CompCert modules to manipulate _values_ (in our case,
bounded integers). *)
From compcert Require Import Values Integers.
Import Int.
(** Putting everything together, the lemma we want to prove about [f_add] is
the following. *)
Lemma f_add_spec (env : genv)
(t : Events.trace)
(m m' : Memory.Mem.mem)
(v : val) (x y z : int)
(trace : Smallstep.star step2 env
(Callstate (Ctypes.Internal f_add)
[Vint x; Vint y]
Kstop
m)
t
(Returnstate (Vint z) Kstop m'))
: z = add x y.
(** ** Proof Walkthrough *)
(** We introduce a custom [inversion] tactic which does some clean-up in
addition to just perform the inversion. *)
Ltac smart_inv H :=
inversion H; subst; cbn in *; clear H.
(** We can now try to prove our lemma. *)
Proof.
(** We first destruct [trace], and we rename the generated hypothesis in order
to improve the readability of these notes. *)
smart_inv trace.
rename H into Hstep.
rename H0 into Hstar.
(** This generates two hypotheses.
<<
Hstep : step1
env
(Callstate (Ctypes.Internal f_add)
[Vint x; Vint y]
Kstop
m)
t1
s2
Hstar : Smallstep.star
step2
env
s2
t2
(Returnstate (Vint z) Kstop m')
>>
In other words, to “go” from a [Callstate] of [f_add] to a [Returnstate],
there is a first step from a [Callstate] to a state [s2], then a succession
of steps to go from [s2] to a [Returnstate].
We consider the single [step], in order to determine the actual value of
[s2] (among other things). To do that, we use [smart_inv] on [Hstep], and
again perform some renaming. *)
smart_inv Hstep.
rename le into tmp_env.
rename e into c_env.
rename H5 into f_entry.
(** This produces two effects. First, a new hypothesis is added to the context.
<<
f_entry : function_entry1
env
f_add
[Vint x; Vint y]
m
c_env
tmp_env
m1
>>
Then, the [Hstar] hypothesis has been updated, because we now have a more
precise value of [s2]. More precisely, [s2] has become
<<
State
f_add
(Sreturn
(Some (Ebinop Oadd
(Etempvar _x tint)
(Etempvar _y tint)
tint)))
Kstop
c_env
tmp_env
m1
>>
Using the same approach as before, we can uncover the next step. *)
smart_inv Hstar.
rename H into Hstep.
rename H0 into Hstar.
(** The resulting hypotheses are
<<
Hstep : step2 env
(State
f_add
(Sreturn
(Some
(Ebinop Oadd
(Etempvar _x tint)
(Etempvar _y tint)
tint)))
Kstop c_env tmp_env m1) t1 s2
Hstar : Smallstep.star
step2
env
s2
t0
(Returnstate (Vint z) Kstop m')
>>
An inversion of [Hstep] can be used to learn more about its resulting
state… So let’s do just that. *)
smart_inv Hstep.
rename H7 into ev.
rename v0 into res.
rename H8 into res_equ.
rename H9 into mem_equ.
(** The generated hypotheses have become
<<
res : val
ev : eval_expr env c_env tmp_env m1
(Ebinop Oadd
(Etempvar _x tint)
(Etempvar _y tint)
tint)
res
res_equ : sem_cast res tint tint m1 = Some v'
mem_equ : Memory.Mem.free_list m1
(blocks_of_env env c_env)
= Some m'0
>>
Our understanding of these hypotheses is the following
- The expression [_x + _y] is evaluated using the [c_env] environment (and
we know thanks to [binding] the value of [_x] and [_y]), and its result
is stored in [res]
- [res] is cast into a [tint] value, and acts as the result of [f_add]
The [Hstar] hypothesis is now interesting
<<
Hstar : Smallstep.star
step2 env
(Returnstate v' Kstop m'0) t0
(Returnstate (Vint z) Kstop m')
>>
It is clear that we are at the end of the execution of [f_add] (even if Coq
generates two subgoals, the second one is not relevant and easy to
discard). *)
smart_inv Hstar; [| smart_inv H ].
(** We are making good progress here, and we can focus our attention on the [ev]
hypothesis, which concerns the evaluation of the [_x + _y] expression. We
can simplify it a bit further. *)
smart_inv ev; [| smart_inv H].
rename H4 into fetch_x.
rename H5 into fetch_y.
rename H6 into add_op.
(** In a short-term, the hypotheses [fetch_x] and [fetch_y] are the most
important.
<<
fetch_x : eval_expr env c_env tmp_env m1 (Etempvar _x tint) v1
fetch_y : eval_expr env c_env tmp_env m1 (Etempvar _y tint) v2
>>
The current challenge we face is to prove that we know their value. At this
point, we can have a look at [f_entry]. This is starting to look familiar:
[smart_inv], then renaming, etc. *)
smart_inv f_entry.
clear H.
clear H0.
clear H1.
smart_inv H3; subst.
rename H2 into allocs.
(** We are almost done. Let’s simplify as much as possible [fetch_x] and
[fetch_y]. Each time, the [smart_inv] tactic generates two suboals, but only
the first one is relevant. The second one is not, and can be discarded. *)
smart_inv fetch_x; [| inversion H].
smart_inv H2.
smart_inv fetch_y; [| inversion H].
smart_inv H2.
(** We now know the values of the operands of [add]. The two relevant hypotheses
that we need to consider next are [add_op] and [res_equ]. They are easy to
read.
<<
add_op : sem_binarith
(fun (_ : signedness) (n1 n2 : Integers.int)
=> Some (Vint (add n1 n2)))
(fun (_ : signedness) (n1 n2 : int64)
=> Some (Vlong (Int64.add n1 n2)))
(fun n1 n2 : Floats.float
=> Some (Vfloat (Floats.Float.add n1 n2)))
(fun n1 n2 : Floats.float32
=> Some (Vsingle (Floats.Float32.add n1 n2)))
v1 tint v2 tint m1 = Some res
>>
- [add_op] is the addition of [Vint x] and [Vint y], and its result is
[res].
<<
res_equ : sem_cast res tint tint m1 = Some (Vint z)
>>
- [res_equ] is the equation which says that the result of [f_add] is
[res], after it has been cast as a [tint] value
Surprisingly, neither [add_op] nor [res_equ] can be reduced easily… After
some digging, I have found that this is because both rely on [Archi.ptr64],
which is basically opaque as far as reduction strategies are concerned.
Now, there are probably more elegant (idiomatic) way to deal with this
issue, but a working approach is to unfold until [Archi.ptr64] appears, then
use the [Archi.splitlong_ptr32] lemma to replace it with its value
([false]). *)
unfold sem_binarith in *.
cbn in *.
unfold sem_cast in *.
cbn in *.
rewrite Archi.splitlong_ptr32 in *; auto.
(** We can now simplify [add_op] and [res_equ], and this allows us to
conclude. *)
smart_inv add_op.
smart_inv res_equ.
reflexivity.
Qed.
(** ** Conclusion *)
(** The main difficulty of this exercise was the opaqueness of [Archi.ptr64];
this is surprising, since it means my struggle was technical, not
conceptual. On the contrary, the definitions of Clight are easy to
understand, and the ##CompCert documentation##
is very pleasant to read.
Understanding Clight and its semantics can be very interesting if you are
working on a language that you want to translate into machine code. However,
proving functional properties of a given C snippet using only CompCert can
quickly become cumbersome. From this perspective, the ##VST## project is very
interesting, as its main purpose is to provide tools to reason about Clight
programs more easily. *)