High-level Proofs about Low-level Programs Electronic Communications of the EASST Volume 23 (2009) Proceedings of the Ninth International Workshop on Automated Verification of Critical Systems (AVOCS 2009) High-level Proofs about Low-level Programs Holger Gast and Julia Trieflinger 15 pages Guest Editor: Markus Roggenbach Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST High-level Proofs about Low-level Programs Holger Gast and Julia Trieflinger Wilhelm-Schickard-Institut für Informatik Eberhard Karls Universität Tübingen, Tübingen, Germany gast@informatik.uni-tuebingen.de trieflin@informatik.uni-tuebingen.de http://www-pu.informatik.uni-tuebingen.de/users/gast/ http://www-pu.informatik.uni-tuebingen.de/users/trieflin/ Abstract: Functional verification of low-level code requires abstractions over the memory model to be effective, since the number of side-conditions induced by byte- addressed memory is prohibitive even with modern automated reasoners. We pro- pose a flexible solution to this challenge: assertions contain explicit memory layouts which carry the necessary side-conditions as invariants. The memory-related proof obligations arising during verification can then be solved using specialized auto- matic proof procedures. The remaining verification conditions about the content of data structures directly reflect a developer’s understanding. The development is formalized in Isabelle/HOL. Keywords: verification of C code, pointer programs, precise memory models 1 Introduction The functional verification of low-level C code has recently attracted much attention (e.g. [Tuc08b, TKN07b, CMST09, RH09]). The central challenge in these applications consists in proving the disjointness of memory objects in the C memory model. Unlike in strongly typed languages like Java or C#, the inequality of pointers in C does not imply the disjointness of the memory regions occupied by the referenced objects: pointer arithmetic, pointer casts, and internal point- ers to struct fields allow almost arbitrary overlaps. The strategy proposed in the literature is to maintain a typed view on the untyped memory: Tuch et al. [Tuc08b, TKN07b] employ a variant of separation logic; Cohen et al. [CMST09] maintain a set of disjoint objects in a ghost vari- able; Rakamarić et al. [RH09] use a static analysis to identify parts of C programs that obey the split-heap model [Bur72]. The actual verification is then performed on the typed view. The disjointness of regions is, however, only one aspect of reasoning about low-level pro- grams. Another aspect concerns the invariants and side-conditions associated with data struc- tures. For instance, allocated blocks in C are always contiguous in memory, i.e. overflow in the pointer arithmetic inside them is excluded. This property, in turn, enables pointers into an array to be compared by the less than operator. While it is always possible to augment assertions with suitable side-conditions, these need to be handled explicitly although they are, in fact, invariants that continue to hold through all operations. Reasoning about less precise memory models is more efficient partly because the invariants are implicit. We show that it is possible to associate the invariants with descriptions of low-level memory layouts instead. 1 / 15 Volume 23 (2009) mailto:gast@informatik.uni-tuebingen.de mailto:trieflin@informatik.uni-tuebingen.de http://www-pu.informatik.uni-tuebingen.de/users/gast/ http://www-pu.informatik.uni-tuebingen.de/users/trieflin/ High-level Proofs about Low-level Programs The following function, which initializes raw memory, demonstrates the point. void init(char *p, char *q) { char *r = p; while (r != q) *r++ = 0; } For functional verification, the user would like to give the following natural loop invariant. It asserts that r runs between p and q and that the bytes from p to r have already been initialized (*a denotes reading from address a): p ≤ r ∧ r ≤ q ∧ (∀a. p ≤ a ∧ a < r −→ (*a) = 0) The crucial point to observe is that this invariant requires reasoning about pointer inequalities. To establish the invariant, p ≤ q has to be given as a precondition of the init function. How- ever, a programmer would never initialize a memory region without this property, so that the precondition appears as a merely technical necessity. Building on the lightweight separation method [Gas08], we propose, instead, to make memory layouts explicit in assertions and to associate side-conditions with them. In the example, the layout will contain a ptr-block p q. It describes the memory region between addresses p and q and also includes the side-condition p ≤ q. Furthermore, the proposed approach simplifies proofs about layouts [Gas09] since fewer side-conditions arise (see Section 3). The purpose of this paper is to show how side-conditions and invariants can be maintained implicitly with the memory layout, and that handling them implicitly leads to natural proof obli- gations. In low-level programs it then becomes straightforward to switch between typed and untyped views. Furthermore, we show that the reasoning also covers composite objects such as structs, arrays, and linked lists. The current paper thus extends the earlier work [Gas08, Gas09] to byte-addressed memory and machine-level representations of values. The work presented is carried out in Isabelle/HOL to ensure soundness. The proofs are there- fore partly interactive. However, we structure them to enable a direct comparison with automated approaches: for each algorithm, we first prove auxiliary theorems, which would become axioms in other methods. The verification itself is then essentially automatic (cf. Section 2). Organization Section 2 gives an overview over lightweight separation. Section 3 describes the maintenance of invariants associated with memory layouts. Section 4 applies the resulting framework to two examples. Section 5 discusses related work. Section 6 concludes. Isabelle Notation The notation of Isabelle/HOL mostly uses standard mathematical conven- tions. A few exceptions need to be mentioned. Functions, as usual in higher-order logic, are curried. The function type is denoted by ⇒. Application of function f to argument a is written by juxtaposition f a. Definitions of constants are written by ≡. An interval [a, b) over an or- dered domain is denoted by {a..x and src->y in the typed view. This is shown automatically by proving the respective memory regions disjoint as follows. Proc. AVOCS 2009 8 / 15 ECEASST pre: M I �src : struct point� ‖ �dst : struct point� ∧ point-known Γ ∧ src = SRC ∧ dst = DST ∧ src → x = X ∧ src → y = Y post: M I �SRC : struct point� ‖ �DST : struct point� ∧ SRC → x = X ∧ SRC → y = Y ∧ DST → x = X ∧ DST → y = Y void copy point(struct point *src, struct point *dst) { int i; i = 0; [inv M I �src : struct point� ‖ �dst : struct point� ‖ src ‖ dst ‖ i ∧ point-known Γ ∧ 0 ≤s i ∧ i ≤s sz-of-ty Γ (struct point) ∧ src = SRC ∧ dst = DST ∧ �src → x� = X ∧ �src → y� = Y ∧ ( ∀ j. 0 ≤s j ∧ j