Add a new post about Clight and its semantics

author: Thomas Letan <lthms@soap.coffee> 2020-03-19 21:51:21 +0100
committer: Thomas Letan <lthms@soap.coffee> 2020-03-20 21:18:54 +0100
commit: 70731164b51f44b57044f01d111b1a7bd30298f9 (patch)
tree: 751fb125e754cc633178a3122e6fa79a82b563fe /site/posts/ClightIntroduction.v
parent: Use inline mathematics to display KaTeX (diff)
1 files changed, 368 insertions, 0 deletions
diff --git a/site/posts/ClightIntroduction.v b/site/posts/ClightIntroduction.v
new file mode 100644
index 0000000..eaae907
--- /dev/null
+++ b/site/posts/ClightIntroduction.v
@@ -0,0 +1,368 @@
+(** * A Study of Clight and its Semantics *)
+(* begin hide *)
+From Coq Require Import List.
+Import ListNotations.
+(* end hide *)
+(** CompCert is a certified C compiler which comes with a proof of semantics
+    preservation. What this means is the following: the semantics of the C code
+    you write is preserved by CompCert compilation passes up to the generated
+    machine code.
+
+    I have been interested in CompCert for quite some times now. Today (#<span
+    id="time">#March 18, 2020#</span>#), I have challenged myself to study
+    Clight and its semantics. This write-up is the result of this challenge,
+    written as I was progressing.
+
+    #<div id="generate-toc"></div># *)
+
+(** ** Installing CompCert *)
+
+(** CompCert has been added to <<opam>>, and as a consequence can be very easily
+    used as a library for other Coq developments. A typical use case is for a
+    project to produce Clight (the high-level AST of CompCert), and to benefit
+    from CompCert proofs after that.
+
+    Installing CompCert is as easy as
+
+<<
+opam install compcert
+>>
+
+    Once <<opam>> terminates, the <<compcert>> namespace becomes available. In
+    addition, several binaries are now available if you have correctly set your
+    <<PATH>> environment variable. For instance, <<clightgen>> takes a C file as
+    an argument, and generates a Coq file which contains the Clight generated by
+    CompCert. *)
+
+(** ** Problem Statement *)
+
+(** Our goal for this first write-up is to prove that the C function
+
+<<
+int add (int x, int y) {
+    return x + y;
+}
+>>
+
+    returns the expected result, that is <<x + y>>. The <<clightgen>> tool
+    generates (among other things) the following AST (note: I have modified it
+    in order to improve its readability). *)
+
+From compcert Require Import Clight Ctypes Clightdefs AST
+                             Coqlib Cop.
+
+Definition _x : ident := 1%positive.
+Definition _y : ident := 2%positive.
+
+Definition f_add : function :=
+  {| fn_return := tint
+   ; fn_callconv := cc_default
+   ; fn_params := [(_x, tint); (_y, tint)]
+   ; fn_vars := []
+   ; fn_temps := []
+   ; fn_body := Sreturn
+                  (Some (Ebinop Oadd
+                                (Etempvar _x tint)
+                                (Etempvar _y tint)
+                                tint))
+  |}.
+
+(** The fields of the [function] type are pretty self-explanatory (as it is
+    often the case in CompCert’s ASTs as far as I can tell for now).
+
+    Identifiers in Clight are ([positive]) indices.  The [fn_body] field is of
+    type [statement], with the particular constructor [Sreturn] whose argument
+    is of type [option expr], and [statement] and [expr] look like the two main
+    types to study.  The predicates [step1] and [step2] allow for reasoning
+    about the execution of a [function], step by step (hence the name). It
+    appears that <<clightgen>> generates Clight terms using the function call
+    convention encoded by [step2].  To reason about a complete execution, it
+    appears that we can use [star] (from the [Smallstep] module) which is
+    basically a trace of [step]. These semantics are defined as predicates (that
+    is, they live in [Prop]). They allow for reasoning about
+    state-transformation, where a state is either
+
+      - A function call, with a given list of arguments and a continuation
+      - A function return, with a result and a continuation
+      - A [statement] execution within a [function]
+
+    We import several CompCert modules to manipulate _values_ (in our case,
+    bounded integers). *)
+
+From compcert Require Import Values Integers.
+Import Int.
+
+(** Putting everything together, the lemma we want to prove about [f_add] is
+    the following. *)
+
+Lemma f_add_spec (env : genv)
+    (t : Events.trace)
+    (m m' : Memory.Mem.mem)
+    (v : val) (x y z : int)
+    (trace : Smallstep.star step2 env
+               (Callstate (Ctypes.Internal f_add)
+                          [Vint x; Vint y]
+                          Kstop
+                          m)
+               t
+               (Returnstate (Vint z) Kstop m'))
+  : z = add x y.
+
+(** ** Proof Walkthrough *)
+
+(** We introduce a custom [inversion] tactic which does some clean-up in
+    addition to just perform the inversion. *)
+
+Ltac smart_inv H :=
+  inversion H; subst; cbn in *; clear H.
+
+(** We can now try to prove our lemma. *)
+
+Proof.
+
+(** We first destruct [trace], and we rename the generated hypothesis in order
+    to improve the readability of these notes. *)
+
+  smart_inv trace.
+  rename H into Hstep.
+  rename H0 into Hstar.
+
+(** This generates two hypotheses.
+
+<<
+Hstep : step1
+          env
+          (Callstate (Ctypes.Internal f_add)
+                     [Vint x; Vint y]
+                     Kstop
+                     m)
+          t1
+          s2
+Hstar : Smallstep.star
+          step2
+          env
+          s2
+          t2
+          (Returnstate (Vint z) Kstop m')
+>>
+
+    In other words, to “go” from a [Callstate] of [f_add] to a [Returnstate],
+    there is a first step from a [Callstate] to a state [s2], then a succession
+    of steps to go from [s2] to a [Returnstate].
+
+    We consider the single [step], in order to determine the actual value of
+    [s2] (among other things). To do that, we use [smart_inv] on [Hstep], and
+    again perform some renaming. *)
+
+  smart_inv Hstep.
+  rename le into tmp_env.
+  rename e into c_env.
+  rename H5 into f_entry.
+
+(** This produces two effects. First, a new hypothesis is added to the context.
+
+<<
+f_entry : function_entry1
+            env
+            f_add
+            [Vint x; Vint y]
+            m
+            c_env
+            tmp_env
+            m1
+>>
+
+    Then, the [Hstar] hypothesis has been updated, because we now have a more
+    precise value of [s2]. More precisely, [s2] has become
+
+<<
+State
+  f_add
+  (Sreturn
+    (Some (Ebinop Oadd
+                  (Etempvar _x tint)
+                  (Etempvar _y tint)
+                  tint)))
+  Kstop
+  c_env
+  tmp_env
+  m1
+>>
+
+    Using the same approach as before, we can uncover the next step. *)
+
+  smart_inv Hstar.
+  rename H into Hstep.
+  rename H0 into Hstar.
+
+(** The resulting hypotheses are
+
+<<
+Hstep : step2 env
+          (State
+             f_add
+             (Sreturn
+               (Some
+                 (Ebinop Oadd
+                 (Etempvar _x tint)
+                 (Etempvar _y tint)
+                 tint)))
+             Kstop c_env tmp_env m1) t1 s2
+Hstar : Smallstep.star
+          step2
+          env
+          s2
+          t0
+          (Returnstate (Vint z) Kstop m')
+>>
+
+    An inversion of [Hstep] can be used to learn more about its resulting
+    state… So let’s do just that. *)
+
+  smart_inv Hstep.
+  rename H7 into ev.
+  rename v0 into res.
+  rename H8 into res_equ.
+  rename H9 into mem_equ.
+
+(** The generated hypotheses have become
+
+<<
+res : val
+ev : eval_expr env c_env tmp_env m1
+       (Ebinop Oadd
+               (Etempvar _x tint)
+               (Etempvar _y tint)
+               tint)
+       res
+res_equ : sem_cast res tint tint m1 = Some v'
+mem_equ : Memory.Mem.free_list m1
+                               (blocks_of_env env c_env)
+            = Some m'0
+>>
+
+    Our understanding of these hypotheses is the following
+
+      - The expression [_x + _y] is evaluated using the [c_env] environment (and
+        we know thanks to [binding] the value of [_x] and [_y]), and its result
+        is stored in [res]
+      - [res] is cast into a [tint] value, and acts as the result of [f_add]
+
+    The [Hstar] hypothesis is now interesting
+
+<<
+Hstar : Smallstep.star
+          step2 env
+          (Returnstate v' Kstop m'0) t0
+          (Returnstate (Vint z) Kstop m')
+>>
+
+    It is clear that we are at the end of the execution of [f_add] (even if Coq
+    generates two subgoals, the second one is not relevant and easy to
+    discard). *)
+
+  smart_inv Hstar; [| smart_inv H ].
+
+(** We are making good progress here, and we can focus our attention on the [ev]
+    hypothesis, which concerns the evaluation of the [_x + _y] expression. We
+    can simplify it a bit further. *)
+
+  smart_inv ev; [| smart_inv H].
+  rename H4 into fetch_x.
+  rename H5 into fetch_y.
+  rename H6 into add_op.
+
+(** In a short-term, the hypotheses [fetch_x] and [fetch_y] are the most
+    important.
+
+<<
+fetch_x : eval_expr env c_env tmp_env m1 (Etempvar _x tint) v1
+fetch_y : eval_expr env c_env tmp_env m1 (Etempvar _y tint) v2
+>>
+
+    The current challenge we face is to prove that we know their value.  At this
+    point, we can have a look at [f_entry]. This is starting to look familiar:
+    [smart_inv], then renaming, etc. *)
+
+  smart_inv f_entry.
+  clear H.
+  clear H0.
+  clear H1.
+  smart_inv H3; subst.
+  rename H2 into allocs.
+
+(** We are almost done. Let’s simplify as much as possible [fetch_x] and
+    [fetch_y]. Each time, the [smart_inv] tactic generates two suboals, but only
+    the first one is relevant. The second one is not, and can be discarded. *)
+
+  smart_inv fetch_x; [| inversion H].
+  smart_inv H2.
+  smart_inv fetch_y; [| inversion H].
+  smart_inv H2.
+
+(** We now know the values of the operands of [add]. The two relevant hypotheses
+    that we need to consider next are [add_op] and [res_equ]. They are easy to
+    read.
+
+<<
+add_op : sem_binarith
+           (fun (_ : signedness) (n1 n2 : Integers.int)
+              => Some (Vint (add n1 n2)))
+           (fun (_ : signedness) (n1 n2 : int64)
+              => Some (Vlong (Int64.add n1 n2)))
+           (fun n1 n2 : Floats.float
+              => Some (Vfloat (Floats.Float.add n1 n2)))
+           (fun n1 n2 : Floats.float32
+              => Some (Vsingle (Floats.Float32.add n1 n2)))
+           v1 tint v2 tint m1 = Some res
+>>
+
+      - [add_op] is the addition of [Vint x] and [Vint y], and its result is
+        [res].
+
+<<
+res_equ : sem_cast res tint tint m1 = Some (Vint z)
+>>
+
+      - [res_equ] is the equation which says that the result of [f_add] is
+        [res], after it has been cast as a [tint] value
+
+    Surprisingly, neither [add_op] nor [res_equ] can be reduced easily… After
+    some digging, I have found that this is because both rely on [Archi.ptr64],
+    which is basically opaque as far as reduction strategies are concerned.
+
+    Now, there are probably more elegant (idiomatic) way to deal with this
+    issue, but a working approach is to unfold until [Archi.ptr64] appears, then
+    use the [Archi.splitlong_ptr32] lemma to replace it with its value
+    ([false]). *)
+
+  unfold sem_binarith in *.
+  cbn in *.
+  unfold sem_cast in *.
+  cbn in *.
+  rewrite Archi.splitlong_ptr32 in *; auto.
+
+(** We can now simplify [add_op] and [res_equ], and this allows us to
+    conclude. *)
+
+  smart_inv add_op.
+  smart_inv res_equ.
+  reflexivity.
+Qed.
+
+(** ** Conclusion *)
+
+(** The main difficulty of this exercise was the opaqueness of [Archi.ptr64];
+    this is surprising, since it means my struggle was technical, not
+    conceptual. On the contrary, the definitions of Clight are easy to
+    understand, and the #<a
+    href="http://compcert.inria.fr/doc/index.html">#CompCert documentation#</a>#
+    is very pleasant to read.
+
+    Understanding Clight and its semantics can be very interesting if you are
+    working on a language that you want to translate into machine code. However,
+    proving functional properties of a given C snippet using only CompCert can
+    quickly become cumbersome. From this perspective, the #<a
+    href="https://github.com/PrincetonUniversity/VST">#VST#</a># project is very
+    interesting, as its main purpose is to provide tools to reason about Clight
+    programs more easily. *)
author	Thomas Letan <lthms@soap.coffee>	2020-03-19 21:51:21 +0100
committer	Thomas Letan <lthms@soap.coffee>	2020-03-20 21:18:54 +0100
commit	70731164b51f44b57044f01d111b1a7bd30298f9 (patch)
tree	751fb125e754cc633178a3122e6fa79a82b563fe /site/posts/ClightIntroduction.v
parent	Use inline mathematics to display KaTeX (diff)