Differences between Paper and Implementation

Note

The information contained in the section is valid for the version Gillian that is tagged pldi-20. The implementation may change in the future and implementation might get further away or closer to what the paper says in the future.

In the paper, the formalisation of GIL and the associated meta-theory is streamlined for clarity. The implementation follows the same principles, but is, expectedly, more complex. Here, we outline the main differences.

The GIL Syntax

Here is how the paper defines the GIL syntax:

\[\begin{split}\begin{array}{lcl} v \in \mathcal{V} & \triangleq & n \in \mathcal{N} \mid s \in \mathcal{S} \mid b \in \mathcal{B} \mid l, \varsigma \in \mathcal{U} \mid \tau \in \mathcal{T} \mid f \in \mathcal{F} \mid \bar v \\ e \in \mathcal{E} & \triangleq & v \mid x \in \mathcal{X} \mid \ominus e \mid e_1 \oplus e_2\\ c \in \mathcal{C}_A & \triangleq & x := e \mid \mathsf{ifgoto}\ e\ i \mid x := e(e') \mid \mathsf{return}\ e \mid \mathsf{fail}\ e \\ & & \mid \mathsf{vanish} \mid x := \alpha(e) \mid x := \mathsf{uSym}_j \mid x := \mathsf{iSym}_j \\ proc \in \mathcal{P}roc_A & \triangleq & \mathsf{proc}\ f(x)\{\bar c\}\\ \mathsf p \in \mathcal{P}rog_A & : & \mathcal{F} \rightharpoonup \mathcal{P}roc_A \end{array}\end{split}\]

GIL values, \(v \in \mathcal{V}\), include numbers, strings, booleans, uninterpreted symbols, types, procedure identifiers, and lists of values. Types are standard: they include, for example, the types of numbers, strings, booleans, and lists.

GIL expressions, \(e \in \mathcal{E}\), include values, program variables \(x\), and various unary and binary operators.

GIL commands include, first of all, the standard variable assignment, conditional goto, and dynamic procedure call. We assume single-parameter procedures in the meta-theory; in the implementation, we allow for multiple parameters. Next, the \(\mathsf{return}\) command terminates the execution of the current procedure; \(\mathsf{fail}\) terminates the execution of the entire program with an error; and \(\mathsf{vanish}\) silently terminates program execution without generating a result. Finally, we have three GIL-specific commands: action execution, \(x := \alpha(e)\), which executes the action \(\alpha \in A\) with the argument obtained by evaluating \(e\); and two analysis-related commands, \(x := \mathsf{uSym}_j\) and \(x := \mathsf{iSym}_j\) , which use Gillian’s built-in symbol generator to generate fresh symbols, similarly to the \(\mathsf{gensym}\) command of Lisp and Racket. We call the symbols created using \(\mathsf{uSim}_j\) uninterpreted, and the symbols created using \(\mathsf{iSym}_j\) interpreted symbols. The difference between them is in the way in which they are used in the symbolic analysis: intuitively, we use uninterpreted symbols to represent instantiation-specific constants or unique memory constituents; and interpreted symbols to represent logical variables, as in standard symbolic execution literature.

A GIL procedure, \(proc \in \mathcal{P}roc_A\) is of the form \(\mathsf{proc}\ f(x)\{c\}\), where \(f\) is its identifier, \(x\) is its formal parameter, and its body \(c\) is a sequence of GIL commands. A GIL program, \(p \in \mathcal{P}rog_A\), is a finite partial function, mapping procedure identifiers to their corresponding procedures.

The actual implementation of GIL slightly differs from this.

Commands

Let us start by focusing on commands, here is how commands are defined in the implementation:

(* Cmd.ml *)
type 'label t =
  | Skip                                                                      (** Skip                              *)
  | Assignment    of string * Expr.t                                          (** Assignment                        *)
  | LAction       of string * string * Expr.t list                            (** Local Actions                     *)
  | Logic         of LCmd.t                                                   (** GIL Logic commands                *)
  | Goto          of 'label                                                   (** Unconditional goto                *)
  | GuardedGoto   of Expr.t * 'label * 'label                                 (** Conditional goto                  *)
  | Call          of
      string * Expr.t * Expr.t list * 'label option * logic_bindings_t option (** Procedure call                    *)
  | ECall         of string * Expr.t * Expr.t list * 'label option            (** External Procedure call           *)
  | Apply         of string * Expr.t * 'label option                          (** Application-style procedure call  *)
  | Arguments     of string                                                   (** Arguments of the current function *)
  | PhiAssignment of (string * Expr.t list) list                              (** PHI assignment                    *)
  | ReturnNormal                                                              (** Normal return                     *)
  | ReturnError                                                               (** Error return                      *)
  | Fail          of string * Expr.t list                                     (** Failure                           *)

The assignment is as described in the paper. There is an additional Skip command, that does nothing.

In action call, LAction(x, a, el), x is the variable name the result is going to be assigned to, a is the name of the action and el is a list of parameters. Like function calls, the paper presents a simplified version where actions can only take one parameter. Moreover, there is not action type, but actions are denoted by their name which is a string.

Instead of an ifgoto e i command there are two commands :

  • goto lab that jumps to label lab

  • goto [e] lab1 lab2 that jumps either to lab1 or to lab2 depending on the boolean e evaluates to Note that using ifgoto or these two kinds of gotos is equivalent. Moreover, in the implementation, the type of labels is polymorphic. string Cmd.t corresponds to "labeled commands", meaning one can annotate commands with string labels. int Cmd.t corresponds to "indexed commands", meaning goto j jumps the the j-th command of the current procedure. We write GIL programs with labeled commands for readability, and translate to indexed commands for efficiency.

There is an additional Argument command that returns the list of arguments given to the current procedure, and a PhiAssignment that can be used for Static Single Assignment style programming.

The implementation of GIL comes with a better treatment of errors. There are two return commands:

  • ReturnNormal

  • ReturnError In both cases, the value returned is the one contained in the special "ret" variable. Both command return as explained in the paper, in an intuitive way, but they set the return mode either in Normal mode or in Error mode. If the function returns in Normal mode, the program execution continues normally, otherwise, it depends on how the function was called.

Let us take a closer look at the Call command, and describe the 5 arguments of Call(x, f, el, lab_opt, bindings).

  • x is the name of the variable in which the result will be stored

  • f is the expression that should resolve to the procedure identifier (a string)

  • el is the list of expressions passed as arguments to the procedure

  • lab_opt is an optional label to which the execution will jump if the called procedure returns in Error mode.

  • Some logic bindings that are useful for verification, but out of scope for the PLDI-2020 paper

Fail is very similar to the fail command described in the paper: it terminates the execution of the entire program in failure mode. However, it takes an additional parameter to the "failing value", which is a string that contains an error name or error message.

The implementation also has an external call mechanism (ECall) that is used to model eval in JavaScript, and could be used to model some system calls in Gillian-C.

Apply is an application-style procedure calls. It takes only one expression as parameter, which should evaluate to the list of argument that will be passed to the procedure.

There are Logic commands, such as Assume or Assert. But there are also more kinds of logic commands that are used for the verification mode of Gillian.

Finally the paper describes three more kinds of commands that are not in the implementation. The first one is vanish that is trivially replaced by Assume False in the implementation. Then, ther are the uSym and iSym commands which are mainly theoretical devices that ensure soundness in the presence of fresh-value generation. In the implementation, we provide an allocation mechanism that allows the creators of Gillian instantiations to generate fresh interpreted and uninterpreted symbols.

Procedures and programs

As explained earlier, there is no defined set A of actions, actions are denotted by their name, a string. Also, the procedures and programs contain much more information than what is in the paper.

type ('annot, 'label) proc = {
  proc_name : string;
  proc_body : ('annot * 'label option * 'label Cmd.t) array;
  proc_params : string list;
  proc_spec : Spec.t option;
}

type ('annot, 'label) prog = {
  imports : string list;
  lemmas : (string, Lemma.t) Hashtbl.t;
  preds : (string, Pred.t) Hashtbl.t;
  only_specs : (string, Spec.t) Hashtbl.t;
  procs : (string, ('annot, 'label) Proc.t) Hashtbl.t;
  macros : (string, Macro.t) Hashtbl.t;
  bi_specs : (string, BiSpec.t) Hashtbl.t;
  proc_names : string list;
  predecessors : (string * int * int, int) Hashtbl.t;
}

Procedures have a name, a body and parameters as described in the paper. However, each command in the body is also annotated with an opaque value that can be decided by the user (it has the 'annot polymorphic type). These annotations can be used to keep information during execution that helps understanding the result of an analysis. Every command is also attached to a label, that has polymorphic type 'label. Most often, we use string labels for labeled programs and int labels for labeled programs as explained above. Finally, procedures can also have specifications that are used for verification but are out of scope for the PLDI2020 paper.

Programs are not just a map from procedure identifiers to procedures. There are also:

  • lemmas, predicates and specifications that are used for verification (out of scope her)

  • bi_specs which are precomputed hints for automatic compositional testing

  • macros which are used to define syntactic sugar over lists of logic commands, useful for readability, and unfolded at execution time

  • A predecessors table used for the Phi Assignment

The Memory Interfaces

Here is how Memory models are defined in the paper:

Definition (Concrete Memory Model): A concrete memory model, \(M \in \mathbb{M}\), is a triple \(\langle |M|, A, \underline{\mathsf{ea}}\rangle\), consisting of a set of concrete memories, \(|M| \ni \mu\), a set of actions \(A \ni \alpha\), and the action execution function \(\underline{\mathsf{ea}} : A \rightarrow |M| \rightarrow \mathcal{V} \rightarrow \wp(|M| \times \mathcal{V})\), pretty-printed \(\mu.\alpha(v) \rightsquigarrow (\mu', v)\).

Definition (Symbolic Memory Model): A symbolic memory model, \(\hat M \in \mathbb{M}\), is a triple \(\langle |\hat M|, A, \hat{\underline{\mathsf{ea}}}\rangle\), consisting of a set of symbolic memories, \(|\hat M| \ni \hat \mu\), a set of actions \(A \ni \alpha\), and the action execution function \(\underline{\mathsf{ea}} : A \rightarrow |M| \rightarrow \mathcal{V} \rightarrow \wp(|M| \times \mathcal{V})\), pretty-printed \(\mu.\alpha(v) \rightsquigarrow (\mu', v)\), pretty-printed \(\hat \mu.\alpha(\hat e) \rightarrow (\mu', \hat e', \pi ')\).

In the implementation, Concrete Memory Models and Symbolic Memory Models have an interface a bit more complex. The complete interface can be found in the files GillianCore/engine/SymbolicSemantics/SMemory.ml and GillianCore/engine/ConcreteSemantics/CMemory.ml.

These interfaces do export:

  • type t, the type of memories, which correspond respectively to \(|M|\) and \(|\hat M|\)

  • val execute_action: string -> t -> vt list -> action_ret for the concrete memory models, which corresponds to the theoretical definition apart from the fact that actions are represented by their string name and that concrete actions can return an error, which is used for automatic compositional testing (out of scope here)

  • val execute_action: string -> t -> PFS.t -> TypeEnv.t -> vt list -> action_ret for the symbolic memory models, which correspond to the theoretical definition apart from actions that are represented by their string names, the fact that the actions can return errors which are used for automatic compositional testing (out of scope here), and the path conditions (\(\pi\)) are split into two parts : PFS.t which are set of pure formulae and TypeEnv.t which are special kind of pure formulae corresponding to the type of values.

These interfaces export more definitions. Since, for efficiency reasons, the type of memories can be mutable, the user must define an init function and a copy function. The user also has to define pretty printers for its state, which are used for the log files.

Finally, there are a lot of definitions (ga_to_..., is_overlaping_asrt, assertions, mem_constraints, type err_t, etc.) that are used either for verification or automatic compositional testing and are not presented in the PLDI20 paper because they are out of scope.

The State Model interface

In the paper, the state model interface is defined as below:

Definition (State Model): A state model \(S \in \mathbb{S}\) is a quadruple \(\langle|S|, \mathsf{V}, A, \mathsf{ea}\rangle\), consisting of: (1) a set of states on which GIL programs operate, \(|S| \ni \sigma\); (2) a set of values stored in those states, \(\mathsf{V} \ni v\); (3) a set of actions that can be performed on those states, \(A \ni \alpha\); and (4) a function \(\mathsf{ea}: a \rightarrow |S| \rightarrow \mathsf{V} \rightarrow \wp(|S| \times \mathsf{V})\) for execution actions on states. All GIL states must contain an internal representation of a variable store, denoted by \(\rho\), assigning values to program variables.

We write \(\sigma.\alpha(v) \rightsquigarrow (\sigma', v')\) to mean \((\sigma', v') \in \mathsf{ea}(\alpha, \sigma, v)\), and refer to \(\sigma'\) as the state output and to \(v'\) as the value output of \(\alpha\).

It is also added that:

A state model \(S = \langle |S|, \mathsf V, A, \mathsf{ea}\rangle\) is proper if and only if its set of actions, A, includes the following distinguished actions/families of actions:

  • \(\{ \mathsf{setVar}_x \}_{x \in \mathcal{X}}\) for updating the value of \(x\) in the store of a given state, denoted by \(\sigma.\mathsf{setVar}_x(v)\);

  • \(\mathsf{setStore}\), for replacing the entire store of a given state with a new store, denoted by \(\sigma.\mathsf{setStore}(\rho)\);

  • \(\mathsf{getStore}\), for obtaining the store of the given state, denoted by \(\sigma.\mathsf{getStore}()\);

  • \(\{ \mathsf{eval}_e \}_{e \in \mathcal{E}}\) for evaluationg the expression \(e\) in a given state, denoted by \(\sigma.\mathsf{eval}_e(-)\);

  • \(\mathsf{assume}\), for extending the given state with the information denoted by its argument value, denoted by \(\sigma.\mathsf{assume}(v)\);

  • \(\mathsf{uSym}\) and \(\mathsf{iSym}\), for generating new uninterpreted and interpreted symbols, respectively. From now on, we work with proper state models.

In the implementation, the interface of state models, available in GillianCore/engine/GeneralSemantics/State.ml is a bit difference and more complex.

First of all, the state interface defines "proper state models" in the first place. However, these state models do not define "families of actions". For example, eval_expr is one particular function exposed by the state interface, and has the following signature:

val eval_expr : t -> Expr.t -> vt

setVar is defined in terms of setStore and getStore directly by the interpreter:

let update_store (state : State.t) (x : string) (v : Val.t) : State.t =
    let store = State.get_store state in
    let _ = Store.put store x v in
    let state' = State.set_store state store in
    state'

Note that variables are designated by their string names. Also note the usage of Store.put: stores have their own interface in the implementation which greatly simplify their usage. Setting a variable in the store is simply getting the store of the state, setting the variable to the correct value in the store and putting that new obtained store back in the state.

States can be mutable to improve the performances, and therefore there is an init and a copy function.

The execute_action function defined in the state interface corresponds only to the lifting of user-defined memory-model actions, given that all necessary actions to have a proper state are defined as functions of their own.

val execute_action : string -> t -> vt list -> action_ret

Once again, actions are designated by their string names, and actions can return either a list of successful state or some errors that can be used for automatic compositional testing.

Finally, there are a lot of different functions that do not correspond to any aspect of the state models presented in the paper such as unify_assertion, produce_posts, apply_fixes, etc. which are useful either for the verification mode or the automatic compositional testing mode of Gillian, and are out of scope for the Gillian PLDI2020 paper.

Allocators

In the paper allocators have the following definition:

An allocator \(AL \in \mathbb{A}\mathbb{L}\) is a triple \(\langle|AL|, \mathsf Y, \mathsf{alloc}\rangle\), consisting of: (1) a set \(|AL|\ni \xi\) of allocation records; (2) a set \(Y\) of all values that are allowed to be allocated; and (3) an allocation function:

\[\mathsf{alloc}: |AL| \rightarrow \mathbb{N} \rightarrow \wp(\mathsf Y) \rightharpoonup |AL|\times V\]

pretty-printed as \(\xi.\mathsf{alloc}(j)\rightharpoonup_{\mathsf Y}(\xi', y)\), which takes an allocation record \(\xi\), a, allocation site \(j\), and an allocation range \(Y \subseteq \mathsf Y\), and returns a fresh value \(y \in Y\), together with the appropriately updated allocation record \(\xi'\).

Intuitively, an allocation record maintains information about already allocated values. This apporach is complementary to the free set approach, where information is maintained about values that can still be allocated. An allocation site \(j\) is the program point associated with either the \(\mathsf{uSym}_j\) or the \(\mathsf{iSym}_j\) command.

This could be interpreted in terms of OCaml module signature as:

module type Allocator = sig
  type t    (** Type of allocation records     *)
  type us_t (** Type of uninterpreted symbols **)
  type is_t (**  Type of interpreted symbols   *)

  val alloc_us : t -> int -> t * us_t
  val alloc_is : t -> int -> t * is_t
end

However, for efficiency, we chose this implementation:

(* Allocator.ml *)
module type S = sig
  type t                   (** Type of value to allocate *)

  val alloc : unit -> t    (** Allocation function *)
  val dealloc : t -> unit  (** Deallocation function *)
  val eq : t -> t -> bool  (** Equality of values to allocate *)
  val reset : unit -> unit (** Reset this allocator *)
end

The reset function is useful for bulk-testing. When running a new test, every allocator is reset.

The Abstract location allocator (in ALoc.ml), which corresponds to uninterpreted symbols, are then initiated like this:

include Allocators.Make_with_prefix
          (Basic ())
          (struct
            let prefix = Names.aloc_
          end)

Where Make_with_prefix is a functor that takes:

  • An abstract Allocator AL that produces values which can be stringified.

  • A string prefix

and it returns an Allocator that allocates strings of the form PREFIX_A where PREFIX is the given prefix and A is a stringification of the allocated by AL.

In this case, as the AL parameter, we use Basic () which instantiates an abstract allocator module that internally just allocates integers.