A Deductive Verification Platform for Cryptographic Software Electronic Communications of the EASST Volume 33 (2010) Proceedings of the Fourth International Workshop on Foundations and Techniques for Open Source Software Certification (OpenCert 2010) A Deductive Verification Platform for Cryptographic Software M. Barbosa, J. Pinto, J.-C. Filliâtre and B. Vieira 17 pages Guest Editors: Luis S. Barbosa, Antonio Cerone, Siraj A. Shaikh Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST A Deductive Verification Platform for Cryptographic Software M. Barbosa1, J. Pinto2, J.-C. Filliâtre3 and B. Vieira4 1mbb@di.uminho.pt 2jsp@di.uminho.pt 4barbarasv@di.uminho.pt CCTC/Departamento de Informática, Universidade do Minho, Campus de Gualtar, Braga, Portugal 3 Jean-Christophe.Filliatre@lri.fr INRIA Saclay - Île-de-France, ProVal, Orsay, France LRI, Université Paris-Sud, CNRS, Orsay, France Abstract: In this paper we describe a deductive verification platform for the CAO language. CAO is a domain-specific language for cryptography. We show that this language presents interesting challenges for formal verification, not only in the rich mathematical type system that it introduces, but also in the cryptography-oriented language constructions that it offers. We describe how we tackle these problems, and also demonstrate that, by relying on the Jessie plug-in included in the Frama- C framework, the development time of such a complex verification tool could be greatly reduced. We base our presentation on real-world examples of CAO code, extracted from the open-source code of the NaCl cryptographic library, and illustrate how various cryptography-relevant security properties can be verified. Keywords: formal program verification, cryptography 1 Introduction Background. The development of cryptographic software is clearly distinct from other areas of software engineering. Cryptography is an inherently interdisciplinary subject. The design and implementation of cryptographic software draws on skills from mathematics, computer sci- ence and electrical engineering. However, there is a clear lack of domain specific languages and tools for the development of cryptographic software that can assist developers in facing these challenges. The CACE project (http://www.cace-project.eu) addresses this lack of support by pursuing the development of an open-source toolbox comprising languages, tools and libraries tailored to the implementation of cryptographic algorithms and protocols. The formal verifica- tion tool described in this work has been developed to allow the static analysis of code written in CAO, a new domain-specific language developed as a part of the CACE effort. Currently, this tool is being employed in the formal verification of an open-source library written in CAO. The results in this paper already reflect a part of this verification effort. The CAO language[Bar09] allows practical description of cryptography-relevant programs. Unlike languages used in mathematical packages such as Magma or Maple, which allow the 1 / 17 Volume 33 (2010) mailto:mbb@di.uminho.pt mailto:jsp@di.uminho.pt mailto:barbarasv@di.uminho.pt mailto:Jean-Christophe.Filliatre@lri.fr http://www.cace-project.eu A Deductive Verification Platform for Cryptographic Software description of high-level mathematical constructions in their full generality, CAO is restricted to enabling the implementation of cryptography kernels (e.g. block ciphers and hash functions) and sequences of finite field arithmetics (e.g. for elliptic curve cryptography). CAO has been designed to allow the programmer to work over a syntax that is similar to that of C, whilst focusing on the implementation aspects that are most critical for security and efficiency. The memory model of CAO is extremely simple (there is no dynamic memory allocation, there are no side-effects in expressions, and it has call-by-value semantics). Furthermore, the language does not support any input/output constructions, as it is targeted at implementing the core components in cryptographic libraries. On the other hand, the native types and operators in the language are highly expressive and tuned to the specific domain of cryptography. These languages feature can then be used by the CAO compiler to provide domain-specific analysis and optimization. Deductive program verification and Frama-C. Program Verification is the area of Formal Methods that aims to statically check software properties based on the axiomatic semantics of programming languages. In this paper we focus on techniques based on Hoare logic, brought to practice through the use of contracts – specifications consisting of preconditions and postcondi- tions, annotated into the programs. Verification tools based on contracts are becoming increas- ingly popular, as their scope evolved from toy languages to realistic fragments of languages like C, C#, or Java. We use the expression deductive verification to distinguish this approach from other ways of checking properties of programs, such as model checking. In this work we build on Frama-C [BFM+08], an extensible framework where static analysis of C programs is provided by a series of plug-ins. Jessie [MM10] is a plug-in that can be used for deductive verification of C programs. Broadly speaking, Jessie performs the translation between an annotated C program and the input language for the Why tool. Why is a Verification Condition Generator (VCGen), which then produces a set of proof obligations that can be discharged using a multitude of proof tools that include the Coq proof assistant [The08], and the Simplify [DNS05], Alt-ergo [CCK06], and Z3 [MB08] automatic theorem provers. The gwhy graphical front-end, allows monitoring individual verification conditions. This is particularly useful when used in combination with the possibility of exporting the conditions to various proof tools, allows users to first try discharging conditions with one or more automatic provers, leaving the harder condi- tions to be studied with the help of an interactive proof assistant. Motivation. Experience shows [ABPV09, ABPV10] that a tool such as Frama-C has a great potential for verifying a wide variety of security-relevant properties in cryptographic software implementations. However, it is well-known that the intrinsic characteristics of the C language make it a hard target for formal verification, particularly when the goal is to maximize automa- tion. This problem is amplified when the verification target is in the domain of cryptography, because implementations typically explore language constructions that are little used in other application areas, including bit-wise operations, unorthodox control-flow (loop unrolling, single- iteration loops, break statements, etc.), intensive use of macros, etc. The idea behind the con- struction of a deductive verification tool for CAO is to take advantage of the characteristics of this programming language to construct a domain-specific verification tool, allowing for the same generic verification techniques that can be applied over C implementations, simplifying the ver- ification of security-relevant properties, and hopefully providing a higher degree of automation. Proc. OpenCert 2010 2 / 17 ECEASST Contributions. In this paper we describe an implementation of such a deductive verification platform for the CAO language. We show that CAO presents interesting challenges for formal verification, concerning not only the rich mathematical type system that it introduces, but also the cryptography-oriented language constructions that it offers. We describe how we tackle these problems, namely by presenting what we believe is the first formalisation in first-order logic of the rich mathematical data types that are used in cryptography in the context of deductive veri- fication. We also demonstrate that, by relying on the Jessie plug-in of the Frama-C framework, the development time of such a complex verification tool could be greatly reduced. We base our presentation on real-world examples of CAO code, extracted from the open-source code of the NaCl cryptography library (http://nacl.cr.yp.to). The development of our tool has so far focused on automating safety verification, which we have achieved for a representative set of examples. Organisation of the Paper. The next section expands on the application scenario and functional requirements for our tool. Section 3 describes the high-level implementation choices we have made. In Section 4 we introduce the most relevant parts of the translations performed by the tool. The generation of safety proof obligations is discussed separately in Section 5. Section 6 discusses related work, and Section 7 has some concluding remarks. 2 Deductive verification of CAO programs A detailed specification of CAO can be found in [Bar09]. Appendix A includes an example we will use throughout the paper: a partial CAO implementation of the AES block cipher. As a C-like language, CAO supports analogous definitions of conditionals and loops, as well as global variable declarations, function declarations and procedures. The syntax of expressions is also similar to that of C, although the set of types and operations are significantly different. The CAO type system includes a set of primitive types: arbitrary precision integers int, bit strings of finite length bits[n] , rings of residue classes modulo an integer mod[n] (intuitively, arithmetic modulo an integer, or a finite field of order n if the modulus is prime) and boolean values bool. Derived types allow the programmer to define more complex abstractions. These include the product construction struct, the generic one-dimensional container vector[n] of T, the algebraic notion of matrix, denoted matrix[i,j] of T, and the construction of an extension to a finite field T using a polynomial p(X), denoted mod[T<X>/p(X)]. Algebraic operators are overloaded so that expressions can include integer, ring/finite-field and matrix operations; the natural com- parison operators, extended bit-wise operators, boolean operators and a well-defined set of type conversion (cast) operators are also supported. Bit string, vector and matrix access operations are extended with range selection (also known as slicing operations). An implementation of a type-checker for CAO programs has been derived from the CAO type system formalisation [Bar09]. Hence, for the purpose of this paper we will assume that the CAO verification tool has access to an Abstract Syntax Tree (AST) with complete type information for the input CAO program. Note that this includes the concrete sizes of all container types, the moduli and polynomials in rings and finite fields, etc. Furthermore, the CAO type checker is able to reject all programs where incompatible type parameters are passed to an operator. For 3 / 17 Volume 33 (2010) http://nacl.cr.yp.to A Deductive Verification Platform for Cryptographic Software example, the size restrictions associated with matrix addition and multiplication are enforced by the type system. The same happens for operations involving bit strings, rings and finite fields, where the type system checks that operator inputs have matching lengths, moduli, etc. Safety in CAO. The verification that a program will not reach a point of execution that may result in a catastrophic failure, namely a run-time error, is commonly known as a safety verification. This type of verification goal is admittedly a modest one. Nevertheless, not only is safety veri- fication a critical aspect for most practical applications, but also it is frequent that even this is a challenge for existing tools. In many cases, safety verification cannot be dealt with automatically, and it may become a labour-intensive activity. One of the requirements for the CAO verification tool is that safety verification should be feasible with minimum intervention from the end-user. Program safety in CAO has two dimensions: memory safety and safety of arithmetic opera- tions. A program is said to be memory safe if, at run-time, it never fails by accessing an invalid memory address. Memory safety verification is not in general a trivial problem in languages with pointers and heap-based data structures, and indeed there exist dedicated verification tools for this task. However, for CAO programs, this problem is reduced to making sure that all indices used in vector, bit string and matrix index accesses are within the proper range, which is fully determined by the type of the container and must be fully determined at compile time. The safety of arithmetic operations is more interesting. In CAO we have four algebraic types: arbitrary precision integers, rings of residue classes modulo a non-prime, finite fields, and matri- ces thereof. The semantics of operators over these types is precisely given by the mathematical abstractions that they capture. This means that the concept of arithmetic overflow does not make sense in this context, and it leaves as candidate safety verification goals the possibility that such operators are not defined for some inputs. For integers, this reduces to the classic division-by- zero condition, whereas matrix addition and multiplication introduce are intrinsically safe. Rings and finite fields pose an interesting problem, as they are not distinct CAO types. Take the following declarations: def a : mod[13] := 4; def b : mod[10] := 5; def c : mod[13] := 1/a; def d : mod[10] := 1/b; All of these operations are safe, except for the initialization of d. The reason for this is that the multiplicative inverse modulo 10 is only defined for those integers in the range 1 to 9 that are co-prime with 10. This means that, whenever a division occurs in the mod[n] type, one must also ensure that the divisor is co-prime to the modulus. When the modulus is a prime number, then the mod[n] type represents the finite field of size n. In this case, the previous problem reduces again to the division-by-zero case, as all non-zero elements have a multiplicative inverse. However, this observation does not help, unless there is a way to verify that the modulus is indeed a prime number. One way to do this, of course, is to allow the programmer to vouch for the primality of the modulus. We will return to this issue in Section 4. Finally, a related problem arises when one considers the construction of extension fields. In this case, not only must one ensure that the underlying type represents a finite field (which might not be the case for the mod[n] type) but also that the polynomial that is provided is irreducible in the corresponding ring of polynomials. Proc. OpenCert 2010 4 / 17 ECEASST Extending CAO with annotations. CAO-SL is a specification language to be used in annotations added to CAO programs. These annotations are embedded in comments (so that they are ignored by the CAO compiler) using a special format that is recognised by the verification tool. CAO-SL is strongly inspired by ACSL [BFM+08] and enables the specification of behavioral properties of CAO programs. CAO-SL stands to CAO in the same way that ACSL stands to C. The expressions used in annotations are called logical expressions and they correspond to CAO expressions with additional constructs. The semantics of the logical expressions is based on first- order logic. CAO-SL includes the definition of function contracts with pre- and postconditions, statement annotations such as assertions and loop variants and invariants, and other annotations commonly used in specification languages. CAO-SL also allows for the declaration of new logic types and functions, as well as predicates and lemmas. A complete description of CAO-SL can be found in [Bar09]. In this paper, various features of this language will be introduced gradually, as we describe the tool architecture and implementation. 3 Tool implementation Our tool follows the same approach used in other scenarios for general-purpose languages such as Java [MPU04] and C [FM04]. Furthermore, the tool architecture itself fundamentally relies on the Jessie plug-in included in the Frama-C framework. This allowed us to significantly re- duce the tool development time and effort. Jessie enables reasoning about typical imperative programs, and it is equipped with a first-order logic mechanism, which facilitates the design of new models and extensions. In particular, it is possible to use this feature to define in Jessie a model of the domain-specific types and memory model of CAO. This means that an annotated CAO program can be translated into an annotated Jessie program and, from this point on, our verification tool can rely totally on the functionality of Jessie and Why. The translation is such that correctness of the Jessie program entails the correctness of the source CAO program. The Jessie Input Language. The Jessie input language is a simply typed imperative language with a precisely defined semantics. It is used as an intermediate language for verification of C programs in the Frama-C framework. As an intermediate language, programmers are not expected to produce Jessie source programs from scratch. Jessie was developed in parallel with ACSL, and they share many constructions. The language combines operational and logic features. The operational part refers to statements which describe the control flow and instruc- tions that perform operations on data, including memory updates. The logical part is described through formulas of first-order logic, attached to statements and functions in the form of anno- tations. Jessie provides primitive types such as integers, booleans, reals and unit (the void type), abstract datatypes and also allows the definition of new datatypes. Programs can be annotated using pre- and postconditions, loop invariants, and other interme- diate assertions. The logical language is typed and includes built-in equality, booleans, arbitrary precision integers, and real numbers. Implementation strategy. In order to better illustrate our approach to designing a VCGen for CAO taking advantage of an existing generic VCGen, we introduce a very simple example. 5 / 17 Volume 33 (2010) A Deductive Verification Platform for Cryptographic Software Consider the definition of a VCGen for the subset of CAO that is essentially a while language with applicative arrays1, and how one would deal with both aliasing and safety. The weakest precondition of the array assignment operation would resemble the following wp(u[e] := x, Q) = safe(u[e]) ∧ unalias(Q, e, x) where safe(u[e]) = safe(e) ∧ 0 ≤ e < length(u), safe(e) imposes that the evaluation of e will not produce arithmetic errors, and the function unalias would process Q, updating the contents of index e to x considering aliasing, e.g. unalias(u[ j] > 100, i, 10) = (i = j =⇒ 10 > 100) ∧ (i 6= j =⇒ u[ j] > 100) If one momentarily forgets safety considerations, an alternative possibility is to construct the VCGen on top of an existing VCGen for the base while language, not by adding new dedicated rules, but instead by translating the new type being introduced as a logical type. Applicative arrays can be modeled by the following well-known axioms for the get and set logical functions. ∀u, e, x. get(set(u, e, x), e) = x (1) ∀u, e, e′, x. e 6= e′ =⇒ get(set(u, e, x), e′) = get(u, e′) (2) In the presence of this theory, one can use the VCGen for the core while language to derive verification conditions (VCs) for array lookup expressions and assignment commands, by simply translating these to use the definitions above: T (u[e]) ≡ get(u, T (e)) T (u[e] := x) ≡ u := set(u, T (e), T (x)) A powerful generic VCGen such as Jessie allows us to follow this approach for the entire CAO language. There is an overlap between the CAO language and the Jessie input language that enables a direct translation of many language constructions. Furthermore, for each CAO type that is not supported by Jessie, we are able to declare a set of logical functions and write a theory for them that creates a suitable first-order model of the type. This then enables us to translate arbitrary annotated CAO programs into suitable programs of the Jessie input language. Let us now turn back to the example above, to see how we deal with safety conditions using Jessie’s assert clause to force the generation of arbitrary proof obligations. Safety conditions for applicative arrays can be generated by using the following translation: T (u[e]) ≡ assert 0 ≤ T (e) < length(u); get(u, T (e)) T (u[e] := x) ≡ assert 0 ≤ T (e) < length(u); u := set(u, T (e), T (x)) Of course, in CAO we have to deal with data types that are considerably more sophisticated than arrays. Yet, the general pattern followed in the implementation of our tool is the same. The introduction of each new type implies the introduction of a new theory. The definition of a new theory includes the definition of logic functions together with axioms to model their behavior. Some lemmas and predicates are also introduced to speed up the process of proving some goals. At this point, it makes sense to ask which properties of those types should be included in the corresponding logical model. Soundness is of course the most important property that should be guaranteed by our translation process: the Jessie model should not allow proving assertions about CAO data structures that are not valid according to the language semantics. A second very 1 We use this denomination for typical imperative arrays with destructive update, and with opaque storage. Proc. OpenCert 2010 6 / 17 ECEASST desirable property is that the model should allow for as many assertions as possible to be proved automatically. More precisely, the verification conditions produced by Jessie, and exported to some external theorem prover, should as much as possible be discharged automatically. Emphasis on Automation. The fact that Jessie relies on the Why VCGen, which is a multi- prover tool, means that it is possible to export verification conditions to a large number of differ- ent proof tools, from SMT-solvers to the Coq interactive proof assistant. The typical workflow is to first discharge “easy” VCs using an automatic prover, and then interactively trying to dis- charge the remaining conditions. Once the model of CAO is fixed, different properties of CAO code will naturally have different degrees of automation with respect to discharging VCs. As is true of VCGens for other realistic languages, safety conditions should be proved with a high degree of automation, whereas a lower degree should be expected for other functional proper- ties. Our approach is multi-tiered in the sense that we start with high-level models tuned for automatic verification (in particular of safety properties); these models can then be refined into lower-level models that take advantage of theories supported by specific automatic provers (such as bit strings, integers, and so on). Finally, all models can be further refined to Coq models: inter- active proof is the last resort for discharging VCs, and it may be mandatory for any verification tool based on first-order logic. Our efforts have so far concentrated on maximizing the degree of automation that we can achieve in verifying the safety of CAO programs. We are able, for example, to carry out the safety verification of the entire CAO implementation of the AES block-cipher (see Appendix) without user intervention. This includes heavy use of finite field, vector and matrix operations across several dependent functions. 4 CAO to Jessie translation We will resort to snippets of CAO code to describe the most interesting parts of the CAO to Jessie translation carried out by our verification tool, which essentially correspond to the rich cryptography-specific data types that are available in CAO. In other words, we will focus on the way in which we handle the parts of the CAO language (including the extension to CAO-SL) that do not directly map to constructions in the Jessie input language, leaving out the standard imperative constructions supported by both languages, the CAO types that directly map to Jessie native types, and the translation of annotations. In the following dee denotes the translation of a CAO expression e into Jessie. 4.1 Container Types The container types in CAO include the vector[] of, matrix[] of and bits types. The get and set operations on these types are modeled in Jessie using exactly the second approach that we de- scribed in the example in the previous section. The only caveat is that they are generalized to the two dimensional case in the case of matrices, and that we fix Jessie type bool as the content type in the case of bits. However, CAO includes elaborate operators to deal with these container types that are fine- 7 / 17 Volume 33 (2010) A Deductive Verification Platform for Cryptographic Software blit vector dτe : vector dτe→ vector dτe→ integer → integer → vector dτe shi f t vector dτe : vector dτe→ integer → vector dτe ∀v, ofs, i. get vector dτe(shi f t vector dτe(v, ofs), i) = get vector dτe(v, (ofs + i)) ∀src, dest, ofs, len, i. ofs ≤ i < (ofs + len) =⇒ get vector dτe(blit vector dτe(src, dst, ofs, len), i) = get vector dτe(src, i−ofs) ∀src, dest, ofs, len, i. i < ofs ∨ i ≥ (ofs + len) =⇒ get vector dτe(blit vector dτe(src, dst, ofs, len), i) = get vector dτe(dst, i) Figure 1: Declarations and axioms for vector types. tuned to the implementation of cryptographic algorithms, namely symmetric primitives such as block ciphers and hash functions. As an example, consider the next snippet from the AES implementation in CAO. def ShiftRows( s : S ) : S { def r : S; seq i := 0 to 3 { r[i,0..3] := (Row)(((RowV)s[i,0..3]) |> i); } return r; } What we have here is a sequence of rotation (|>) operations applied to the ith row of matrix s. The way in which this is expressed in CAO takes advantage of the range selection operator (..) that returns a value of the corresponding container type, with the same contents as the original one, but with appropriate dimensions. Here, this operator is used to select an entire row in the matrix, which is cast to the vector type in order to be rotated. The result is then cast back to the correct matrix type that can be assigned to the original row slice in matrix r. Our first-order formalisation of container types deals with shift, rotate, range selection, range assignment and concatenation (@) operators in container types using a pattern that relies on two logic functions (shift and blit). We present the case of the vector type. The model assumes that a vector has infinite length, i.e., it has a start position, but it is represented as an unbounded (infinite length) memory block. The only exception to this rule is the extensional equality operator (==), where translation explicitly refers to the range of valid positions over which equality should hold. We emphasize that this part of the model deals only with the functionality of these operators: safety is handled separately by introducing appropriate assertions, as will be seen in Section 5. Intuitively, the shift logic function takes as input a vector of arbitrary length, starting in posi- tion 0, and produces the sub-vector that starts at position i. The blit logic function involves two vectors, source s and destination d, an index i and a length parameter l. It produces the vector with the contents of d for indices 0 to i-1, and from i+l onwards; the l positions in between con- tain the region 0..l-1 of s. The behaviour of these logic functions is modeled by the declarations and axioms given in Figure 1. Range Selection. Given a CAO variable µ of type vector[n] o f τ , the CAO range selection operation is modeled in Jessie as follows: d µ[i.. j] e ≡ let x1 = die in ( let x2 = d je in assert (0 ≤ x1 < n) && (0 ≤ x2 < n) && (x1 ≤ x2); shi f t(dµe, x1)) Proc. OpenCert 2010 8 / 17 ECEASST where i and j are integer expressions. We remark that although the translation disregards the upper bound j, the typechecker will ensure that the range selection operation µ[i.. j] with µ of type vector[n] o f τ , returns type vector[ j−i + 1] o f τ , thus taking that upper bound into account. Range assignment. Assigning to a region in a vector is modeled directly using the blit function. dµ1[i.. j] := µ2e ≡ let x1 = die in ( let x2 = d je in assert (0 ≤ x1 < n) && (0 ≤ x2 < n) && (x1 ≤ x2); dµ1e = blit(dµ2e,dµ1e, x1, x2 −x1 + 1)) Here, = denotes assignment in Jessie. Concatenation. Consider the CAO variables µ1 and µ2 of types vector[n1] o f τ and vector[n2] o f τ respectively. The concatenation of vectors µ1 and µ2 can also be captured using the blit function. dµ1 @ µ2 e≡ blit(dµ2e,dµ1e, n1, n2) The intuition behind this definition is that concatenation can be seen as a range assignment operation, where µ2 is assigned to the region of µ1 that starts at position n1 (recall that in the model vectors are assumed to have infinite length). Shift and Rotate. To present the shift and rotate operations in a more intuitive way, we will turn to the bits type. Both operations are modeled using the blit function. The rotate operations are commonly known as circular shifts. A downwards circular shift by 1 is defined as a permutation of the entries in a tuple where the last element becomes the first element and all the other elements are down-shifted one position. Conversely, in an upwards circular shift, the first element becomes the last element and all the other are shifted up. As an example, consider the bits literal: 0b1101001. The internal representation of bits in our model stores the least significant bit (the right-most bit in the literal) in the 0-th position. This means that upwards and downwards rotate correspond to the intuitive interpretation of left and right rotations, respectively. An example of a down rotate is therefore 0b1101001 |> 3 = 0b0011101 and an example of an up rotate is 0b1101001 < | 3 = 0b1001110. In our model, for a CAO expression e of type vector[n] o f τ or bits[n], we have: de <| ie≡ de[n−i .. n−1] @ e[0 .. n−i−1]e≡ blit(shi f t(dee, 0), shi f t(dee, n−i), i, n−i) de |> ie≡ de[i .. n−1] @ e[0 .. i−1]e≡ blit(shi f t(dee, 0), shi f t(dee, i), n−i, i) where i is a constant of type int. The intuition is that rotations can be seen as concatenations of the appropriate sub-regions, which in turn are modeled using the blit function. Logical shifts are handled in a similar way, but resorting to bits null vector (the all-zeroes bits value) to fill in the positions left vacant by the operation. Matrices. Our model of matrices for the equivalent vector operators described above is essen- tially a direct generalization of the above strategy to the 2-dimensional case. However, our model of matrices must also account for the fact that the matrix type in CAO is an algebraic type that 9 / 17 Volume 33 (2010) A Deductive Verification Platform for Cryptographic Software supports addition and multiplication operations (indeed this is why in CAO you can only define matrices whose contents are themselves algebraic types). The formalisation of matrices in first-order logic includes the matrix addition and multiplica- tion arithmetic operations as logic functions matrix dτe add, matrix dτe mult : matrix dτe→ matrix dτe→ matrix dτe The functionality of the addition operator is modeled using the following axiom: Axiom 1 Let A and B be matrices of dimensions m×n, and ai j and bi j the elements in the ith row and jth column of A and B, respectively. Then, ∀ j, i. (A + B)i j = ai j + bi j. An equivalent axiom for matrix multiplication was not introduced because, for each possible base type, we would need the (higher-order) logic formalization of summation Σ. The translation of expressions with arithmetic operations of type matrix[n1, n2] o f τ is there- fore the following: d µ1 + µ2 e = matrix dτe add(dµ1e,dµ2e) d µ1 ∗ µ2 e = matrix dτe mult(dµ1e,dµ2e). Bitwise operations. We complete this section with a brief description of how bitwise operations are handled in our model, as these are of critical importance in cryptographic applications. Here we greatly benefit from the design of the CAO language, where the classic ambivalence between integers and their bit-level representations that exists in the C int type is eliminated by introducing the bits type. Indeed, CAO programmers can freely use bit strings of any size, and convert these to and from the type int that represents the mathematical type Z. A very simple model of bit strings based on vectors of bits (boolean values) can be used, although things get more complicated when we need to deal with type conversions. The Jessie model of bitwise operations on bits is based on the following logic functions: bits bitwise xor : bits → bits → bits bits bitwise and : bits → bits → bits bits bitwise or : bits → bits → bits bits bitwise neg : bits → bits which are axiomatized in the obvious way. CAO bitwise operations are translated as: de1 ⊕ e2e≡ bits bitwise d⊕e(de1e,de2e) d! ee≡ bits bitwise neg(dee) where ⊕∈{|, &,ˆ} and µ1 and µ2 are expressions of type bits[n]. 4.2 Rings, fields and extension fields Residue classes modulo n. The mod[n] type is an algebraic type. For n ∈ N, it corresponds to the algebraic ring Zn. Moreover if n is prime, then mod[n] permits programmers to take full advantage of the fact that Zn is a field. The Jessie model for the mod[n] type is based on the theory of integers, taking advantage of optimized models supported by many automatic provers, and fully integrated into Jessie. The model of mod[n] starts with the definition of the logic type modn equipped with logic functions corresponding to the two natural homomorphisms that convert to and from the Jessie integer type, as well as the mapping that results from their composition. Proc. OpenCert 2010 10 / 17 ECEASST int o f modn : modn → integer modn o f int : integer → modn modn : integer → integer Hence, modn represents the mapping from Z to Z that associates to each a ∈ Z the least residue r ∈ Z of [a]. The model is then extended with a set of axioms for the following mathematical properties of these functions ∀x. 0 ≤ int o f modn(x) ≤ n−1 ∀x. 0 ≤ x ≤ n−1 =⇒ modn(x) = x ∀x. modn(int o f modn(modn o f int(x)) = modn(x) ∀x. x ≥ n =⇒ modn(x) = modn(x−n) ∀x. x < 0 =⇒ modn(x) == modn(x + n) The Jessie translation of arithmetic operations involving expressions of type mod[n] is based on the homomorphisms declared above. First, int o f modn is used to get the least residues of the equivalence classes involved in the arithmetic operation. These least residues are integers, which allows us to model the arithmetic operations using the integers theory. Finally, we apply modn o f int to the result to recover the equivalence class that represents that value. Hence, the translation of arithmetic operations on type mod[n] is given as follows for op ∈{+,−,∗,∗∗}. de1 op e2e≡ modn o f int(int o f modn(de1e) opint int o f modn(de2e)) de1 / e2e≡ let x = int o f modn(de2e) in assert gcd(x, n) = 1; modn o f int(int o f modn(de1e)∗int inv mod(x, n)) Note the special case of division. This is justified because the semantics of division modulo n is not the same as integer division. Firstly, one must express the correct semantics, which we do by introducing the logical function inv mod(x, n). Simple properties involving operations with this function, which are used to easily discharge some proof obligations, are axiomatized as follows: ∀x. gcd(int o f modn(x)), n) = 1 =⇒ modn(int o f modn(x)∗int inv mod(int o f modn(x), n)) = modn(1) ∀x, y. modn(int o f modn(x)∗int y) = modn(1) =⇒ inv mod(int o f modn(x), n) = modn(y) Secondly, in the division case, one must generate a proof obligation for the safety condition that CAO programs should not perform undefined divisions. This property is trivially true if the divisor is in the range 1 . . . n-1 and the number n is prime. Hence we add the following axiom to our model, to automatically handle these trivial cases. ∀x, n. is prime(n) ∧ (0 < x < n) =⇒ gcd(x, n) = 1 where is prime : integer → boolean is a predicate to check if an integer number is prime, and gcd : integer → integer → integer is a logic function that calculates the greatest common divi- sor between two integer numbers. Note that is prime and gcd are neither directly defined nor axiomatized, but the programmer can explicitly assert that some n is prime through a CAO-SL annotation. This enables automatically discharging safety assertions using gcd. Extension Fields. Consider the following type declarations taken from the AES implementation: typedef GF2 := mod[2]; typedef GF2N := mod[GF2<X> / X**8+X**4+X**3+X+1]; typedef GF2C := mod[GF2N<Y> / Y**4+1]; 11 / 17 Volume 33 (2010) A Deductive Verification Platform for Cryptographic Software Take the first field extension type GF2N. Types of the form mod[mod[n] <X>/p(X)] are also alge- braic types that model the Galois field of order nd where n is a prime number and d is the degree of the irreducible polynomial p(X ). We emphasize that in CAO each such type represents a specific construction of an extension field, whose representation is fixed as elements of the poly- nomial ring Zn[X ], and the semantics of operations is defined based on polynomial arithmetics modulo p(X ). Furthermore this type is only valid when n is prime and p(X ) is irreducible. The theory of extension fields of this form begins with the definition of a logic type ring modn that represents the ring of polynomials over the base type mod[n] and logic functions to model the elements of the ring and the addition operation that permits combining them. ring modn monomial : modn → integer → ring modn ring modn add : ring modn → ring modn → ring modn Our model explicitly captures the fact that elements of this ring are polynomials, which in turn can be defined as an addition of monomials. The reason for this is that the CAO literal that corresponds to the irreducible polynomial f ield modn poly f (x) generator used to construct these types can then be represented in our logical model. A monomial can be represented by its coefficient (which is an element of modn) and its degree (an integer). Arithmetic operations over the polynomial ring are not included in the model, as they do not exist in CAO. Indeed our model is purposefully incomplete because we do not intend to use automatic theorem provers on verification conditions involving arbitrary extension field algebra. The goal is to use specific interactive proof assistants, namely Coq, to prove these kinds of properties, relying on existing libraries (e.g. SSReflect2) that provide theories for abstract algebra (fields, polynomials, etc). The model is then completed by adding definitions for the type f ield modn poly f (x) and the corresponding arithmetic operations. The Jessie translation of the arithmetic operations defined over mod[mod[n] <X>/p(X)] is then a direct one: de1 op e2e ≡de1e op f ield modn poly f (x) de2e de1 / e2e ≡ let x = de2e in assert x 6= 0 f ield modn poly f (X ) ; de1e div f ield modn poly f (X ) x where op ∈{+,−,∗,∗∗}. Note that there is also a special case for division. This ensures that a safety proof obligation is generated that checks if the divisor is different from zero. A set of axioms that describe basic properties of these operators has been added to the model in order to increase the degree of automation provided by our tool. The goal here is that, given that there is no integrated support for this sort of mathematical construction in the automatic provers interfaced with Jessie, some simple properties can be captured in first order logic that permit dealing with trivial deduction steps, e.g. cancellation rules. The following axioms are included in our model ∀a, b. a 6= 0F ∧ b 6= 0F =⇒ a×F b 6= 0F ∀a, b. a 6= 0F =⇒ a divF b 6= 0F ∀a, b. a 6= b =⇒ a −F b 6= 0F ∀a, b. a 6= −b =⇒ a +F b 6= 0F ∀a, b. a 6= 0F =⇒ a (∗∗)F b 6= 0F ∀a. a 6= 0F =⇒ −F a 6= 0F where F = f ield modn poly f (X ). Literals of the extension field types are modeled in Jessie as 2 http://www.msr-inria.inria.fr/Projects/math-components Proc. OpenCert 2010 12 / 17 http://www.msr-inria.inria.fr/Projects/math-components ECEASST bits[n] ⇒ int mod[n] → int int → bits[n] τ ⇒ mod[τ < X > / f (X )] vector[n] o f τ → mod[τ < X > / f (X )] mod[τ < X > / f (X )] → vector[n] o f τ matrix[1, n] o f τ → vector[n] o f τ matrix[n, 1] o f τ → vector[n] o f τ vector[n] o f τ → matrix[1, n] o f τ vector[n] o f τ → matrix[n, 1] o f τ Figure 2: Casts and coercions vectors of polynomial coefficients. Therefore, logic functions to access and update the coefficient of a given power of some polynomial of type mod[mod[n] <X>/p(X)] are also included in the model, together with the usual two axioms for the theory of arrays. f ield modn poly f (x) get coe f : f ield modn poly f (x) → integer → modn f ield modn poly f (x) set coe f : f ield modn poly f (x) → integer → modn → f ield modn poly f (x) Returning to the example introduced above, it can be seen by examining the type declaration of GF2C that the base type of an extension field can actually be an extension field itself. However, our modeling approach is exactly the same for this case, taking into consideration that the base type must be adjusted when defining the ring of polynomials over the base field. 4.3 Casts and Coercions Type conversion operations in CAO can be explicit, in which case they are called cast operations, or implicit, called coercion operations. Figure 2 presents the allowed cast (→) and coercion (⇒) operations between CAO types. The translation of CAO programs into Jessie handles these conversions in the natural way by using appropriate logical functions. We present a few examples of the simpler conversions: e :: mod[n] =⇒ d(int) ee = int o f modn(dee) e :: int =⇒ d(mod[n]) ee = modn o f int(dee) e :: int =⇒ d(bits[n]) ee = bits o f int(dee) e :: τ =⇒ d(mod[τ<X >/ f (X )]) ee = f ield dτe poly f (x) set coe f ( f ield dτe poly f (x) zero, 0,deeτ ) Conversions between matrices and column/row vectors are handled in the natural way by using get and set operations. Finally, we present the conversion between extension field types and vector types in a bit more detail, since these are very useful CAO operators that permit com- muting between the abstract algebraic view of a finite field, and its concrete representation in a cryptographic implementation. Indeed, one can construct an extension field value from a vector representation that contains the coefficients of the corresponding polynomial over the base field. We model this as d(mod[τ < X > / f (X )]) ee = let x1 = f ield dτe poly f (x) zero in ( let x2 = dee in let x3 = f ield dτe poly f (x) set coe f (x2, n−1, vector dτe get(x2, n−1)) in ... let xn+2 = f ield dτe poly f (x) set coe f (xn+1, 0, vector dτe get(x2, 0)) in xn+2) The inverse conversion is also possible, and is modeled using a similar approach. This translation further justifies our modeling of extension field literals presented in the previous section. 13 / 17 Volume 33 (2010) A Deductive Verification Platform for Cryptographic Software Table 1: Safety proof obligations Type Operation Proof Obligation Auto integer p1/p2 p2 6= 0 × mod[n] p1/p2 gcd(int o f modn(p2), n) = 1; int o f modn(p2) 6= 0 × mod[n] < X > / f (X ) p1 / p2 p2 6= 0 vector[n] o f τ v[e] 0 ≤dee < n v |> i, v <| i 0 ≤die < n v[i.. j] 0 ≤die < n∧0 ≤d je < n∧die < d je matrix[n1, n2] o f τ m[e1, e2] 0 ≤de1e < n1 ∧ 0 ≤de2e < n2 m[i.. j, k..l] 0 ≤die < n1 ∧ 0 ≤d je < n1 ∧ 0 ≤dke < n2 ∧ 0 ≤dle < n2 ∧ die < d je ∧ dke < dle bits[n] b[e] 0 ≤dee < n b |> i, b <| i, b � i, b � i 0 ≤die < n 5 Automatic safety proof obligations Following the same approach adopted in tools such as Frama-C, the CAO to Jessie translation in our tool ensures that all statements in the input program that could potentially result in a safety violation originate the automatic generation of a verification condition that, if proven, guarantees the safe execution of the verified code. We have two classes of safety proof obligations: those related with memory safety, and those related with algebraic type declarations and operations. Some of the proof obligations are au- tomatically generated by the Jessie tool, while others are explicitly introduced in the generated Jessie code as assertions, during the translation process. We have encountered examples of these assertions in the models for division operations presented above. Table 1 presents the proof obli- gations that are generated to ensure the safety of memory access and algebraic operations. Proof obligations automatically generated by the Jessie plug-in are signaled in the table, corresponding to those that originate from the use of the Jessie integer type. The safety proof obligations that are generated when types are declared are limited to the declaration of extension fields of the form mod[mod[n] <X>/p(X)]. In this case, the proof depends on the two following generated lemmas. is prime(n) ring modn is irreducible( f ield modn poly f (x) generator) These are required to allow the automatic discharge of some proof obligations, but they also ensure that the user is aware of the type declarations and their implications. Lemmas can be immediately used in proofs, so for instance the first lemma can be used as an hypothesis in all proof obligations related to division operations in mod[n], requiring that the divisor is relative prime to the modulus. We emphasise however that the presence of lemmas also originates new proof obligations corresponding to the validation of the lemmas themselves. 6 Related work The verification infrastructure introduced in the Jessie plug-in was already used in the devel- opment of other verification tools for C and Java. Caduceus [FM04], a tool for C, and Kraka- toa [MPU04], a tool for Java, are also built on the top of Why tool. The translation into Why performed by Krakatoa is similar to that performed by Frama-C and also adopted in this paper. Proc. OpenCert 2010 14 / 17 ECEASST Boogie [BED+06] is a verification condition generator very similar to Why. The input lan- guages to Boogie and Why are both languages with imperative features and first-order propo- sitions. In both cases, verification condition generation is based on the weakest precondition calculus. Boogie has front-ends for extensions of C# and C which enrich the languages with annotations in first-order logic, such as pre- and postconditions, assertions and loop invariants. The C# extension is known as Spec# [BMF+]. Boogie performs loop-invariant inference using abstract interpretation and then generates the verification conditions for Simplify or Z3. The core component of VCC [CDH+09], a tool for low-level concurrent C programs, is also Boogie. Esc/Java [FLL+02] is another deductive verification tool for Java programs whose annotation language is a subset of JML [LRL+00]. Its architecture is similar to the ones presented above and based on an earlier checker for the Modula-3 language. This tool relies on loop unrolling, and the fact that generation of verification conditions includes optimizations to avoid the exponential blow-up inherent in a naive weakest-precondition computation. It looks for run-time errors in annotated Java programs, but does not model arithmetic overflow. Jack (Java Applet Correctness Kit) [BBC+07] is a static verification tool for JML-annotated programs. It provides support for annotation generation and for interactive verification of func- tional specifications, as well as support for automatic verification of common security policies and for verification of byte-code programs. Its integration in the Eclipse IDE allows for proof obligation inspection, allowing users to visualize where in the code they are originated. 7 Conclusions We have presented a model in first-order logic of certain mathematical objects, taking advantage of the theories implemented in general Satisfiability Modulo Theories (SMT) solvers. The ob- jects that we model have specific interest for cryptographic security and for the verification of CAO programs in particular, but we believe that the model has independent interest and can be of use in other areas, whenever formal verification involving these objects is important. Admittedly, the fact that the tool has not been proved correct is a flaw. Note however that this work is part of a bigger effort on the formalization of the CAO programming language. In related work we are working on an operational semantics for CAO, which we will later use to establish a correctness result for our VCGen. We remark however that the reliability of the VCGen is already high, since we rely on Jessie to capture the semantics of the basic language aspects of CAO, such as control structures and mutually recursive, contract-annotated procedures. References [ABPV09] J. B. Almeida, M. Barbosa, J. S. Pinto, B. Vieira. Verifying Cryptographic Soft- ware Correctness with Respect to Reference Implementations. In Formal Methods for Industrial Critical Systems (FMICS). LNCS 5825, pp. 37–52. Springer, 2009. [ABPV10] J. B. Almeida, M. Barbosa, J. S. Pinto, B. Vieira. Deductive Verification of Crypto- graphic Software. To appear NASA Journal of Innovations in Systems and Software Engineering, 2010. 15 / 17 Volume 33 (2010) A Deductive Verification Platform for Cryptographic Software [Bar09] M. Barbosa. CACE Deliverable D5.2: Formal Specification Language Definitions and Security Policy Extensions. 2009. Available from http://www.cace-project.eu. [BBC+07] G. Barthe, L. Burdy, J. Charles, B. Grégoire, M. Huisman, J.-L. Lanet, M. Pavlova, A. Requet. JACK: a tool for validation of security and behaviour of Java applica- tions. In International Symposium on Formal Methods for Components and Objects (FMCO). LNCS. Springer-Verlag, 2007. [BED+06] M. Barnett, B. yuh Evan Chang, R. Deline, B. Jacobs, K. R. Leino. Boogie: A mod- ular reusable verifier for object-oriented programs. In Formal Methods for Compo- nents and Objects (FMCO 2005). LNCS 4111, pp. 364–387. Springer-Verlag, 2006. [BFM+08] P. Baudin, J.-C. Filliâtre, C. Marché, B. Monate, Y. Moy, V. Prevosto. ACSL: ANSI/ISO C Specfication Language. CEA LIST and INRIA, 2008. [BMF+] M. Barnett, P. Müller, M. Fähndrich, W. Schulte, K. Rustan, M. Leino, H. Venter. Specification and Verification: The Spec # Experience. [CCK06] S. Conchon, E. Contejean, J. Kanig. Ergo : a theorem prover for polymorphic first- order logic modulo theories. 2006. http://ergo.lri.fr/papers/ergo.ps. [CDH+09] E. Cohen, M. Dahlweid, M. Hillebrand, D. Leinenbach, M. Moskal, T. Santen, W. Schulte, S. Tobies. VCC: A practical system for verifying concurrent C. In Conf. Theorem Proving in Higher Order Logics. LNCS 5674. Springer, 2009. [DNS05] D. Detlefs, G. Nelson, J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM 52(3):365–473, 2005. [FLL+02] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, R. Stata. Ex- tended static checking for Java. In ACM SIGPLAN Conference on Programming language design and implementation (PLDI’02). Pp. 234–245. ACM, 2002. [FM04] J.-C. Filliâtre, C. Marché. Multi-prover Verification of C Programs. In Davies et al. (eds.), ICFEM. LNCS 3308, pp. 15–29. Springer, 2004. [LRL+00] G. T. Leavens, C. Ruby, K. R. M. Leino, E. Poll, B. Jacobs. JML: notations and tools supporting detailed design in Java. In Proceedings of OOPSLA ’00 (Poster session addendum). Pp. 105–106. ACM, New York, NY, USA, 2000. [MB08] L. de Moura, N. Bjørner. Z3: An Efficient SMT Solver. LNCS 4963/2008, pp. 337– 340. Springer Berlin, April 2008. [MM10] C. Marché, Y. Moy. Jessie Plugin Tutorial. INRIA, 2010. [MPU04] C. Marché, C. Paulin-mohring, X. Urbain. The Krakatoa Tool for Certication of Java/JavaCard Programs annotated in JML. 2004. [The08] The Coq Development Team. The Coq Proof Assistant Reference Manual – Version V8.2. 2008. http://coq.inria.fr. Proc. OpenCert 2010 16 / 17 http://www.cace-project.eu http://ergo.lri.fr/papers/ergo.ps http://coq.inria.fr ECEASST A CAO implementation of AES typedef GF2 := mod[ 2 ]; typedef GF2N := mod[ GF2<X>/X**8 + X**4 + X**3 + X + 1]; typedef GF2V := vector[8] of GF2; typedef S,K := matrix[4,4] of GF2N; typedef Row := matrix[1,4] of GF2N; typedef Col := matrix[4,1] of GF2N; typedef RowV,ColV := vector[4] of GF2N; def M : matrix[8,8] of GF2 := { ... }; def C : vector[8] of GF2 := { ... }; def mix : matrix[4,4] of GF2N := { ... }; def SBox( e : GF2N ) : GF2N { def x : GF2N; if (e == [0]) { x := [0]; } else { x := [1] / e; } def A : matrix[8,1] of GF2:=(matrix[8,1] of GF2)(GF2V)x; def B : GF2V := (GF2V)(M*A); return ((GF2N)B) + ((GF2N)C); } def SubBytes( s : S ) : S { def r : S; seq i := 0 to 3 seq j := 0 to 3 {r[i,j] := SBox( s[i,j] );} return r; } def SubWord( w : vector[4] of GF2N ) : vector[4] of GF2N { def r : vector[4] of GF2N; seq i := 0 to 3 { r[i] := SBox( w[i] ); } return r; } def ShiftRows( s : S ) : S { def r : S; seq i:= 0 to 3 {r[i,0..3]:=(Row)(((RowV)s[i,0..3])|>i);} return r; } def MixColumns( s : S ) : S { def r : S; seq i := 0 to 3 { r[0..3,i] := mix * s[0..3,i]; } return r; } def AddRoundKey( s : S, k : K ) : S { def r : S; seq i := 0 to 3 seq j := 0 to 3 { r[i,j] := s[i,j] + k[i,j]; } return r; } def FullRound( s : S, k : K ) : S { return MixColumns( ShiftRows( SubBytes(s) ) ) + k; } def Aes( s : S, keys : vector[11] of K) : S { seq i := 1 to 9 { s := FullRound( s,keys[i] ); } return ShiftRows( SubBytes(s) ) + keys[10]; } 17 / 17 Volume 33 (2010) Introduction Deductive verification of CAO programs Tool implementation CAO to Jessie translation Container Types Rings, fields and extension fields Casts and Coercions Automatic safety proof obligations Related work Conclusions CAO implementation of AES