From Hyperedge Replacement to Separation Logic and Back Electronic Communications of the EASST Volume 16 (2009) Proceedings of the Doctoral Symposium at the International Conference on Graph Transformation (ICGT 2008) From Hyperedge Replacement to Separation Logic and Back Mike Dodds and Detlef Plump 18 pages Guest Editors: Andrea Corradini, Emilio Tuosto Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST From Hyperedge Replacement to Separation Logic and Back Mike Dodds1 and Detlef Plump2 1md466@cl.cam.ac.uk University of Cambridge, UK 2det@cs.york.ac.uk University of York, UK Abstract: Hyperedge-replacement grammars and separation-logic formulas both define classes of graph-like structures. In this paper, we relate the different for- malisms by effectively translating restricted hyperedge-replacement grammars into formulas of a fragment of separation-logic with recursive predicates, and vice versa. The translations preserve the classes of specified graphs, and hence the two ap- proaches are of equivalent power. It follows that our fragment of separation-logic inherits properties of hyperedge-replacement grammars, such as inexpressibility re- sults. We also show that several operators of full separation logic cannot be ex- pressed using hyperedge replacement. Keywords: hyperedge replacement, separation logic, graph grammars, program verification, graph transformation 1 Introduction Hyperedge replacement grammars and separation-logic formulas both define classes of graph- like structures. Hyperedge-replacement grammars [DHK97] are a natural extension of context- free string grammars to the world of hypergraphs. Separation logic [Rey02] is a recently- developed logic which allows compositional reasoning about heap structures using so-called spatial assertions. In this paper, we identify a class of heap-graphs corresponding to separation logic heaps, and show that hyperedge-replacement grammars generating languages of heap-graphs correspond to formulas in a fragment of separation-logic with recursively defined predicates. To show that this correspondence exists, we define a mapping g from separation logic formulas to grammars generating heap-graphs, and a mapping s from grammars to formulas. Our major results are as follows: 1. Both g and s are semantics-preserving with respect to a mapping α from states to heap- graphs. That is, the following diagrams commute. Grammar Graphs Formula Heaps language satisfied by g α= Grammar Graphs Formula Heaps language satisfied by s α−1= 1 / 18 Volume 16 (2009) mailto:md466@cl.cam.ac.uk mailto:det@cs.york.ac.uk From Hyperedge Replacement to Separation Logic and Back 2. As a consequence of (1), our fragment of separation logic is of equivalent expressive power to grammars generating heap-graphs. 3. As a consequence of (2), results for hyperedge replacement languages, such as inexpress- ibility results, can be imported into our fragment of separation logic. 4. The logical operators ∧ and ¬ and the primitive formula true are omitted from our frag- ment. We have proved that a formulas in a fragment with these operators cannot in general be simulated by heap-graph grammars. This paper is based on Part III of the PhD thesis [Dod08]. It is structured as follows. Sections 2 and 3 define the semantics of separation logic and hyperedge replacement respectively. Section 4 defines the mapping from formulas to grammars and back. Section 5 describes separation logic constructs that cannot be modelled using hyperedge replacement. Section 6 explores the consequences of the correspondence and describes the relationship of our work to other research. 2 Separation logic Separation logic [Rey02] extends first-order logic with spatial assertions that permit local rea- soning. Formulas are interpreted over heaps. Assumption 1. Loc is a countably infinite set of locations, and nil is a distinct value not in Loc. The set of elements, Elem is defined as Loc∪{nil}. Definition 1 A heap h : Loc ⇀ Elem×Elem is a finite partial function mapping locations to pairs of elements. We write HE for the set of all possible heaps. We write img(h) for the set of locations held in pairs in the heap, that is img(h) = {v ∈ Loc |∃v′. h(v′) = (v, )∨h(v′) = ( , v)}. Note that in general img(h)\dom(h) may be nonempty, meaning elements in h refer to locations outside the heap. The fusion h1 ⊙h2 of two heaps is defined only if dom(h1) and dom(h2) are disjoint, when it is the union of the two functions. In our heap model, each location maps to a pair of elements. This model is simple enough to easily define our translation, but rich enough to define interesting structures such as trees and lists. Other heap models, such as the one used in [Sim06], allow integer values in the heap, but prohibit pointer arithmetic. Here we abstract away to deal only with the structure of the heap. Separation logic extends first-order logic with several constructs permitting reasoning about heaps. The most basic of these is the points-to assertion a 7→ e1, e2, which asserts that the heap has a singleton domain {a} and that it maps a to (e1, e2). Separation logic also introduces the separating conjunction P0 ∗P1, which asserts that P0 and P1 hold in disjoint sections of the heap. This prohibits sharing. For example, the formula ∃xyz. ((x 7→ z, y)∗(y 7→ z, x)) asserts that the heap associates location x with the pair (z, y), and location y with the pair (z, x), and that x and y are distinct. This is satisfied by heap h1 in Figure 1, but not by heap h2 due to sharing. Our mapping between separation-logic formulas and hyperedge replacement grammars oper- ates over a fragment of the full separation logic. The syntax for the fragment is given below. Proc. Doctoral Symposium ICGT 2008 2 / 18 ECEASST h1 : i i′′ i′′ i′ h2 : i i′′ i′ Figure 1: Heap h1 satisfies the formula ∃xyz. ((x 7→ z, y)∗(y 7→ z, x)) with x instantiated with i, y with i′ and z with i′′. Heap h2 does not satisfy the formula, due to sharing. Variables x, x1, . . . , xn are members of a countably infinite set of variable names Var, and σ is a member of a countably infinite set of predicate names Pred. We use SL to stand for the class of all formulas in this fragment. E ::= Var | nil F ::= emp | x 7→ E, E | F ∗F | ∃x. F | F ∨F | σ (E, . . . , E) | let Γ in F Γ ::= σ (x1, . . . , xn) = F | Γ, Γ The only construct in our fragment which does not appear in [Rey02] is let, which is used to recursively define predicates. A let-expression let Γ in F takes as its arguments a list of predicate definitions Γ and a separation logic formula F , over which the definitions scope. Predicates definitions have the form σ (x1, . . . , xn) = R. Here σ is a predicate name, x1, . . . , xn are variable arguments, and R is the predicate body. Intuitively, a formula let σ (x1, . . . , xn) = R in F states that any (recursive) invocation of the predicate σ in F is satisfied by every heap satisfying R. Predicates defined in Γ scope over the definitions of Γ, so let-statements can define mutually-recursive predicates. We call a formula let-free if it does not contain a let-expression. Example 1. The following formula is satisfied by any heap containing a complete binary tree with nil leaves. We abstract away from a particular root node by existentially quantifying the variable x. let bt(x1) = (x1 7→ nil, nil)∨(∃x2, x3. (x1 7→ x2, x3)∗bt(x2)∗bt(x3)) in ∃x. bt(x) In this predicate each branch of the tree holds a pair of values. The leaves of the tree consist of pairs of nil-values. 2.1 Fragment semantics Satisfaction of a formula in our semantics is defined for a heap h, a variable interpretation i and a predicate interpretation η . Definition 2 A variable interpretation i : Var ⇀ Elem is a partial function mapping variables to heap elements. A variable interpretation i defines an interpretation function [[ ]]i : Var∪{nil} ⇀ Elem, where [[nil]]i = nil and [[v]]i = i(v) for all v ∈ Var. Definition 3 A predicate interpretation η : Pred ⇀ Pow(Loc∗×HE) is a partial function map- ping predicate names to (possibly infinite) sets of pairs, consisting of a sequence of locations and a heap. 3 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back h, i, η |= emp iff dom(h) = /0 h, i, η |= (x 7→ E1, E2) iff dom(h) = {[[x]]i} and h([[E]]i) = ([[E1]]i, [[E2]]i) h, i, η |= P∗Q iff there exist h0, h1 such that h0 ⊙h1 = h and h0, i, η |= P and h1, i, η |= Q h, i, η |= P∨Q iff h, i, η |= P or h, i, η |= Q h, i, η |= ∃x. P iff exists v ∈ Loc such that h, i[x → v], η |= P h, i, η |= σ (E1, . . . , En) iff (([[E1]]i, . . . , [[En]]i), h) ∈ η (σ ) h, i, η |= let Γ in P iff h, i, η [β 7→ k(β )]β∈dom(Γ) |= P where k = fix λ k0. λ σ ∈ dom(Γ). bind σ (~x) = Q to Γ(σ ) in {(~l, h) | h, [ ~x 7→~l ] , η [ β 7→ k0(β ) ] β∈dom(Γ) |= Q} Figure 2: Definition of satisfaction for separation logic. The definition of the satisfaction relation |= is given in Figure 2. We make use of several extra concepts in this definition. ~x stands for a sequence of variables x1, . . . , xn for some n. Similarly ~l stands for a sequence of values. Square brackets are used to denote the function update operator, so f ′ = f [x 7→ y] is defined so f ′(a) = f (a) when a 6= x and f ′(a) = y otherwise. We use [~a 7→~b] to stand for f⊥[~a 7→~b], where f⊥ is the empty function with dom( f⊥) = /0. Given a predicate definition Γ, dom(Γ) is defined as the set of names of predicates defined in Γ. We write ‘bind σ (~x) = R to Γ(σ ) in F ’ to associate ~x and R with the variables and right-hand side of the syntactic definition of σ given in Γ. Predicates are defined using a recursive let-statement. A formula let Γ in P is satisfied by a heap h in variable interpretation i and predicate interpretation η if h satisfies P in the new inter- pretation defined by Γ. The new interpretation is defined as the least fixed-point of the following function fη ,Γ. fη ,Γ = λ k0. λ σ ∈ dom(Γ). bind (σ (~x) = Q) to Γ(σ ) in {(~l, h) | h, [ ~x 7→~l ] , η [ β 7→ k0(β ) ] β∈dom(Γ) |= Q} Intuitively fη ,Γ applies a single iteration of the recursive definitions in Γ. Let σ (~x) = Q be a definition in Γ. The function fη ,Γ is defined so that fη ,Γ(k0, σ ) is the set of pairs that satisfy Q in the predicate interpretation k0. The fixed-point is defined with respect to ≤p, the partial order on predicate interpretations defined by set-inclusion for each predicate name. That is, η1 ≤p η2 if for every predicate symbol σ in the domain of η1, it holds that η1(σ ) ⊆ η2(σ ). Lemma 1 (Existence of least fixed-point) Let η and Γ be an arbitrary predicate interpretation and predicate definition respectively. There exists a least fixed-point for fη ,Γ with respect to ≤p. Proof. ≤p defines a complete lattice, and the fragment SL does not include ¬ or ⇒, so fη ,Γ is Proc. Doctoral Symposium ICGT 2008 4 / 18 ECEASST lift(∃x. let Γ in P) def = let Γ in ∃x. P lift((let Γ in P)∗Q) def = let Γ in (P∗Q) lift(P∗(let Γ in Q)) def = let Γ in (P∗Q) lift((let Γ1 in P)∗(let Γ2 in Q)) def = let Γ1, Γ2 in (P∗Q) lift((let Γ in P)∨Q) def = let Γ in (P∨Q) lift(P∨(let Γ in Q)) def = let Γ in (P∨Q) lift((let Γ1 in P)∨(let Γ2 in Q)) def = let Γ1, Γ2 in (P∨Q) lift(let Γ1 in (let Γ2 in P)) def = let lift(Γ1), Γ2 in P lift(let Γ in P) def = let lift(Γ) in P lift(Γ1, Γ2) def = lift(Γ1), lift(Γ2) lift(σ (~x) = let Γ in P) def = Γ, σ (~x) = P Figure 3: Rewriting function lift used by the flattening function flat. monotonically increasing with respect to ≤p. Consequently the existence of a least fixed-point is guaranteed by Tarski’s fixed-point theorem [Tar55]. 2.2 Flattening formulas Mapping directly from our fragment of separation logic to a graph grammar is difficult, because the potential nesting of let-constructs makes the definition of productions complex. For this reason, our mappings operate over the restricted domain of flat formulas. Definition 4 A formula F ∈SL is flat if it is a let-statement let Γ in P where Γ and P are let-free. We write SLF for the class of all flat formulas in SL. Every formula in our fragment can be rewritten as an equivalent flat formula. To show this we define a function flat which takes as its input an arbitrary formula in our fragment and produces a sematically-equivalent flattened formula. A formula is flattened by incrementally ‘promoting’ let-expressions outwards, merging together predicate definitions, until only the outermost let remains. We assume input formulas are conflict-free, meaning that predicate names are not reused between distinct predicate definitions. The flattening is performed by the function lift, which takes as its argument a formula F with flat sub-expressions and produces a flattened formula lift(F ). lift is defined in Figure 3. Here, the variables P and Q always stand for let-free formulas. All unintroduced cases are defined as the identity transformation. The flattening function flat is defined by a bottom-up transformation of the structure of a formula. Function flat recursively applies flat to the sub-expressions of the formula, and then 5 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back applies the rewriting function lift to the resulting formula. Proposition 1 For every heap h, variable interpretation i and predicate interpretation η , h, i, η |= F if and only if h, i, η |= flat(F ). Proof. By appeal to the correctness of lift. See [Dod08] for full details. 3 Hyperedge replacement This section reviews hyperedge-replacement graph transformation, based on the definitions given in [DHK97]. We assume a fixed label alphabet C and a fixed arity function ari : C → N. A hypergraph H over C and ari is a tuple H = 〈VH , EH , attH , lH , extH〉. Here VH and EH are, respectively, finite sets of vertices (or nodes) and hyperedges (often just referred to as edges), lH : E → C assigns an edge label to each edges, and attH : E → V ∗ assigns to edges a sequence of attachment nodes, with |attH (e)| = ari(l(e)) for all e ∈ EH . The first element of attH (e) is the source of hyperedge e, if it exists. We denote by attH (e)[i] the ith attachment point of e. Finally, extH ∈V ∗ defines a sequence of pairwise-distinct external nodes. Given a hypergraph H and set X ⊆ C of labels we denote by E HX the set {e ∈ EH | l(e) ∈ X} of hyperedges of H with labels in X . The class of all hypergraphs over C is denoted by HC. A graph H ∈ HC is said to be a singleton if VH = extH and |EH| = 1. A singleton H with EH = {e} and att(e) = extH is a handle. The unique handle for a label A is denoted A •. Let H ∈ HC be a hypergraph, e ∈ EH a hyperedge to be replaced, and R ∈ HC a hypergraph with |extR| = ari(l(e)). Then the replacement of e in H by R yields the hypergraph H[R/e] by removing e from EH , adding R disjointly, and merging the i-th external node of R with att(e)[i] for i = 1, . . . ,|extR|. A production over a set of labels X is a pair (A, R) such that A ∈ X , R ∈ HC and ari(A) = |extR|. Let H and H′ be graphs in HC and P a finite set of productions over X . Then H directly derives H′, denoted by H ⇒P H ′, if there are an edge e in H and a production (l(e), R) in P such that H′ ∼= H[R/e]. (Here ∼= denotes isomorphism of hypergraphs). A derivation H ⇒∗P H ′ is a sequence of direct derivations H ∼= H0 ⇒P H1 ⇒P ... ⇒P Hn ∼= H ′, where n ≥ 0. Given a derivation R ⇒∗ H and edge e ∈ ER, the projection of e in H , written H(e) is the subgraph of graph H resulting from edge e in the derivation. A hyperedge-replacement grammar (or HR grammar) G over C is a tuple G = 〈T, N, P, Z〉 where T ⊆ C and N ⊆ C are sets of terminal and non-terminal symbols respectively, P is a finite set of productions over N, and Z is a finite set of initial graphs. The language of graphs L(G) generated by G is the set of all graphs H ∈ HT such that there exists a derivation I ⇒ ∗ P H for some I ∈ Z. The system of equations E QP : N → HN∪T associated with G is defined as E QP(A) = {R | (A, R) ∈ P}. Note that we define hyperedge replacement grammars with a finite initial set of graphs, com- pared to the standard definition which has a single initial graph. This change makes no difference to the properties of the grammars. Proc. Doctoral Symposium ICGT 2008 6 / 18 ECEASST nil nil b a c α −→ α−1 ←− K nil K K 3 2 1 32 1 2 3 1 1 b a c Figure 4: Left: A heap h with three locations. Right: The corresponding heap-graph α(h). 3.1 Heap-graphs and heap-graph grammars We define heap-graphs as hypergraphs corresponding to separation logic heaps. A location l in a heap h is modelled in a heap-graph as a node to which the first attachment point of a single hyperedge of arity 3 is attached. This edge is labelled with the distinguished terminal label ‘K’. The second and third attachment points of the edge correspond to the pair of heap elements (l1, l2) = h(l). The nil value in the heap is modelled by a nil-labelled edge of arity 1. Definition 5 Consider the label alphabet C = {K, nil} with ari(K) = 3 and ari(nil) = 1. A hypergraph H over C is a heap-graph if (1) H contains exactly one nil-labelled edge and (2) each node is the source of at most one edge. We write HG for the set of all heap-graphs. We define a representation function α mapping heaps to corresponding heap graphs. The function constructs a node for each location with a defined value and an additional unique node for the atomic value nil. Definition 6 The function α : HE → HG is defined as follows. Given a heap h, the heap- graph α(h) has nodes V = dom(h)∪ img(h)∪{vnil} and edges E = dom(h)∪{enil}. For each k ∈ E −{enil}, l(k) = K and att(k) = k, k1, k2 where h(k) = (k1, k2). Moreover, l(enil) = nil and att(enil) = vnil. In the following we consider heaps and heap-graphs distinct only up to isomorphism. Under this assumption, α is a bijection, and hence there exists an inverse function α−1 : HG → HE that is also a bijection. The domain of the heap on the left-hand side of Figure 4 contains three locations. The resulting heap-graph, shown on the right of Figure 4, contains three K-labelled hyperedges, corresponding to locations defined by the heap function, and four nodes. The heap locations are labelled a, b, c, and the corresponding nodes and edges in the graph are identified by dashed regions. The extra node corresponds to the nil-element of the heap. We call hyperedge-replacement grammars that generate languages of heap-graphs heap-graph grammars, and use HRH to refer to the class of all heap-graph grammars. 7 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back 3.2 Source normalisation and compatible properties When constructing the grammar that corresponds to a separation-logic formula, we must be careful that we construct a heap-graph grammar. It is simple to ensure syntactically that the first clause of Definition 5, that exactly one nil-labelled edge exists, is respected by every graph generated by a grammar. The second clause, that each node is the source of at most one edge, is more tricky to ensure. We say that a graph is well-sourced if each node is the source for at most one hyperedge. Whether all graphs generated by a grammar are well-sourced is a property of the whole grammar, rather than a simple syntactic property of individual productions. A derivation sequence violating this requirement may involve many rules, and removal of a single rule may convert the grammar into a valid heap-graph grammar. Definition 7 Given hyperedge-replacement grammars G and G′, G′ is a source normalisation of G if L(G′) is the set of all well-sourced graphs in L(G). In [DHK97] it is proved that for any hyperedge-replacement grammar G and predicate P expressing a compatible property ([DHK97], Definition 2.6.1), a HR grammar GP can be con- structed so that L(GP) = {H ∈ L(G) | P(H)}. Intuitively, a property is compatible if it can be tested for a graph G derived from graph R just by testing the property for the projections in G of the nonterminal edges of R. Restriction of a grammar to a compatible property works because the property can be ensured statically by embedding property information into the grammar’s nonterminal symbols. We now state formally the definition of a compatible predicate. This definition is taken verba- tim from [Hab92], with minor updates to conform to our notation. Definition 8 Let C be a class of HR grammars, I a finite set, called the index set, PROP a decidable predicate defined on pairs (H, i), with H ∈ HC, and i ∈ I, and PROP ′ a decidable predicate on triples (R, assign, i) with R ∈ HC, a mapping assign : ER → I, and i ∈ I. Then PROP is called (C, PROP′)-compatible if for all H RG = 〈N, T, P, Z〉∈ C and all derivations A•⇒ R ⇒∗ H i with A ∈ N ∪T and H ∈ HT , and for all i ∈ I, PROP(H, i) holds if and only if there is a mapping assign : ER → I such that PROP ′(R, assign, i) holds and PROP(H(e), assign(e)) holds for all e ∈ ER. A predicate PROP0 on HC is called C-compatible if predicates PROP and PROP ′ and an index i0 exist such that PROP is (C, PROP ′)-compatible and PROP0 = PROP(−, i0). (PROP(−, i0) denotes the unary predicate given by PROP(−, i0)(H) = PROP(H, i0) for all H ∈ HC.) This definition requires the existence of three predicates PROP, PROP′ and PROP0. Property information is held using index values. PROP is the property defined without any auxilliary information recorded for non-terminal nodes. PROP′ records auxiliary information for nonter- minal edges using the assignment function assign. PROP0 eliminates the auxilliary information from PROP by just requiring that PROP holds for some index. We now define a predicate W S0 expressing well-sourcedness. Definition 9 Let W S′ be a predicate defined on triples (R, assign, I), with R ∈ HC a hypergraph Proc. Doctoral Symposium ICGT 2008 8 / 18 ECEASST over label-set C, assign : E NR → Pow(N) a mapping from non-terminal hyperedges to sets of nat- ural numbers, and I ⊆ N a finite set of natural numbers. The set assign records the configuration of terminal edges which can replace the non-terminal edges, while I records the edges attached to the external nodes of R. W S′(R, assign, I) holds if and only if: (1) For each node v ∈VR there exists at most one edge e ∈ ER such that either l(e) is terminal and att(e)[1] = v, or l(e) is non-terminal and there exists n ∈ assign(e) such that att(e)[n] = v. (2) For each i ∈ I there exists exactly one edge e such that either l(e) is terminal and s(e) = extR(i), or l(e) is non-terminal and there exists n ∈ assign(e) such that att(e)[n] = extR(i). The predicate W S is defined as W S(H, I) = W S′(H, emp, I). The predicate W S0(H) holds if W S(H, I) holds for some I ⊆ N. Lemma 2 W S0 holds for a graph H if and only if H is well-sourced. Proof. If. If graph H is well-sourced, then the first requirement for W S′ must be satisfied by definition, because every edge is terminal and every node has at most a single E -labelled edge with the node as its first attachment point. The second requirement is automatically satisfied for the same reason; as every node has either zero or one edge with the node as its source, any appropriate I can always be constructed. Only-if. The condition on terminal edges ensures that the graph satisfies the restriction on E - labelled edge sources. The definition of a heap-graph (Definition 5) places no restriction on the external nodes of the graph, meaning that any I for the external nodes will result in a well-sourced graph. Lemma 3 W S0 is a compatible predicate in the sense of Def. 8. Proof. We must show that for any derivation R ⇒∗ H , W S(H, I) holds if and only if there is a mapping assign : E NR → Pow(N) such that W S ′(R, assign, I) and W S(H(e), assign(e)) holds for all e ∈ E NR . If. Assume that the mapping assign : E NR → Pow(N) exists. If we consider a node n ∈ NH , then there must be at most one attached edge with n as a source. This is because the function assign records the attached edges for the external nodes of the subgraphs H(e), so if we replace the non-terminals with the terminal graphs, W S(H, I) must hold. Only-if. Assume that W S(H, I) holds. Then there must exist a function assign such that W S′(R, assign, I) and W S(H(e), assign(e)) holds for all e ∈ E NR . Assignment assign can be con- structed by removing the subgraphs H(e), constructing a set I such that W S(H(e), I) holds, and assigning to assign(e) the set I. The values of I here can be defined by simple examination of the projections H(e) with respect to the function W S. By construction the predicates must then satisfy our requirements. As an immediate consequence of Lemma 3 and the result of [DHK97], a source normalisa- tion can be constructed for any hyperedge-replacement grammar. We write ⌊G⌋ for the source normalisation of a grammar G. 9 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back 4 Mapping from formulas to grammars and back We now define the function g : SLF → HRH that maps from flat separation logic formulas to hyperedge-replacement grammars, and the function s : HRH → SLF that maps from grammars to separation-logic formulas. Graphs in the mappings are constructed by ‘gluing together’ small graphs into larger graphs. Nodes are identified and merged using tags. Definition 10 Let X be a countably infinite set of tags. A tagged graph R = 〈G,t〉 consists of a hypergraph G and a partial tagging function t : VG → Pow(X ) that associates tags sets to nodes. The tag-set for a node v is t(v). tag(R) gives the union of all tag-sets in R. A tagged graph is well-tagged if the tag-sets of the nodes are pairwise disjoint. The tagged graph R\x is the graph R with the tag x removed from all tag-sets. We show tags visually as label-sets next to graph nodes. In the following two-node graph, the first and second attachment points of the X -edge have tag-sets {x, y, z}, and { j, k} respectively. {x, y, z} 1 X 2 { j, k} Definition 11 Let h be a heap and i a variable interpretation. Then αtag(h, i) is the tagged graph 〈G,t〉. Here G = α(h), and the tagging function t is defined so that for any node v ∈VG and any variable x ∈ Var, x ∈ t(v) if i(x) = l and l is the unique heap location in h mapped to v by α . Definition 12 Let H = 〈G,t〉 be a tagged graph. A unification step on R is constructed by fusing any pair of nodes v1 and v2 where t(v1)∩ t(v2) 6= /0. The resulting node is tagged with t(v1)∪ t(v2). The tag unification of R denoted by ⇓R, is the well-tagged graph constructed by applying unification steps as long as possible. A tag unification is guaranteed to exist for any graph because unification steps are size- reducing and can be applied to any non-well-tagged graph. The tag unification is unique because unification steps are strongly confluent. Definition 13 The join of two tagged graphs R0 and R1, denoted by R0 ⊲⊳ R1, is the tag unifi- cation of the disjoint union of R0 and R1: R0 ⊲⊳ R1 =⇓R0 ⊎R1. Tags are used by the expose operator to attach external nodes to a graph. Definition 14 Let R = 〈H,t〉 be a well-tagged graph, and ~x = x1, . . . , xn be a sequence of tags such that each xi belongs to tag(R). Then expose(H,~x) is identical to H , except (1) it has n external nodes, and (2) the ith external node is the node in H tagged with xi. 4.1 Mapping from formulas to grammars Function g : SLF → HRH, defined in Figure 5, maps a formula F in SLF to a set of initial graphs Z and a set of productions P. A minimal set N of non-terminal labels and T of terminal labels Proc. Doctoral Symposium ICGT 2008 10 / 18 ECEASST g[[let Γ in F ]] = 〈Z, P〉 where H = h[[F ]] ⊲⊳ nil1 {nil} Z = expose(H,{nil}) P = r[[Γ]] h[[x 7→ E1, E2]] = {⇓H} where H = {x} K {E1} {E2} 1 2 3 h[[σ (E1, . . . , En)]] = {⇓H} where H = {E1} . . . {En} 1 n σ n + 1 {nil} h[[∃x. P]] = {H′ | H ∈ h[[P]]∧H′ ∼= H \x} h[[emp]] = {〈 empty graph 〉} h[[P∨Q]] = h[[P]]∪h[[Q]] h[[P∗Q]] = h[[P]] ⊲⊳ h[[Q]] r[[Γ1, Γ2]] = r[[Γ1]]∪r[[Γ2]] r[[ σ (x1, . . . , xn) = P ]] = {(σ , H ′) | H ∈ h[[P]]∧ tag(H) = {x1, . . . , xn, nil}∧ H′ = expose(H, (x1, . . . , xn, nil))} Figure 5: Mapping g from separation logic formulas to hyperedge-replacement grammars. can be inferred from the labels used in the productions and the initial graphs. This suffices to define a heap-graph grammar H = 〈N, T, P, Z〉. The function is defined using two subsidiary functions: h, which takes as its argument a let- free formula and recursively constructs a sets of tagged graphs, and r, takes as its argument let-formulas and constructs sets of productions. The base cases for h construct edges from points-to assertions and predicate calls. Terminal K-labelled hyperedges are constructed from points-to assertions, with attached nodes tagged according to the assertion’s arguments. Non-terminal hyperedges are constructed from calls to predicates. Attachment points are tagged corresponding to the predicate’s variable arguments, with the addition of a nil-tagged node needed to track the common nil. In both cases, after generating an edge the tag unification operator ⇓ is applied. This ensures that only a single node exists with any tag x in its variable-set. The recursive cases for h merge graphs using the graph join operator. Disjunction merges by union the graph-sets constructed for each subexpression of the formula. Intuitively, a heap satis- 11 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back fies the disjunction if it satisfies either the left-hand subexpression or right-hand subexpression, so set-union models the intuitive semantics. The separating conjunction applies the graph-join operator ⊲⊳ pairwise to the two sets constructed from its subexpressions. This fuses any nodes that are referred to by the same variable. The function r is called by g to construct productions from predicate definitions. The right- hand sides of productions are constructed by applying h to the right-hand sides of the predicate definitions and then apply the expose operator to attach the appropriate external nodes. Consider the following small formula, which asserts that the heap contains a linked list. let ll(x1) = (x1 7→ nil, nil)∨(∃x2. (x1 7→ nil, x2)∗ll(x2)) in ∃x. ll(x) Applying g to the two subexpressions (x1 7→ nil, x2) and ll(x2) gives the following pair of single-element graph sets. h[[x1 7→ nil, x2]] = { {x1} 1 K 2 3 {nil} {x2} } h[[ll(x2)]] = { {x2} 1 ll 2 {nil} } The fusion of these two graphs gives the following single graph. Note that the label x2 has been removed as per the mapping of existentially quantified variables. h[[∃x2. (x1 7→ nil, x2)∗ll(x2)]] =        {x1} 1 3 2 K 1 ll 2 {nil}        When combined with the other disjunct in the predicate definition, this results in the following production definitions. ll ⇒ 1 1 3 2 K 2 1 ll 2 3 1 K 1 2 2 3 When combined with an initial graph constructed from ∃x. ll(x), these productions give a grammar corresponding to the formula. 4.2 Mapping from grammars to formulas We now define the function s : HRH→SLF, which maps a pair 〈Z, P〉 consisting of a set of initial graphs Z and set of productions P to a recursive let-statement let ΓP in FZ . The definition of s is given in Figure 6. The definition of s uses the subsidiary functions sq, sG, sg and se, which operate over pro- ductions, sets of graphs, graphs and edges respectively. s calls sq to construct the predicate definitions ΓP and sG to construct the let-body FZ . Proc. Doctoral Symposium ICGT 2008 12 / 18 ECEASST s[[〈Z, P〉]] def = let sq[[E QP]] in sG[[Z]] sG[[{g1, . . . , gm}]] def = sg[[g1]]∨ . . .∨sg[[gm]] sg[[〈g,t〉]] def = ∃y1 . . .∃ym. se[[e1,t]]∗ . . .∗se[[en,t]]∗emp where {e1, . . . , en} = {e ∈ Eg | l(e) 6= nil} {y1, . . . , ym} = {t(v) | v ∈Vg ∧v /∈ extg ∧t(v) 6= nil} se[[e,t]] def =                    t(v) 7→ t(v0),t(v1) where att(e) = (v, v0, v1) if lE (e) = K σ (t(v1), . . . ,t(vn)) where σ = lE (e) att(e) = (v1, . . . , vn) if lE (e) ∈ N sq[[E QR]] def = σ (x1, . . . , xn, nil) = sG[[E QR(σ )]], sq[[E QR/σ ]] where n = ari(σ ) Figure 6: Mapping from heap-graph grammars to separation logic formulas. The mapping needs to associate each node with a particular variable name (or nil). To do this, we assume that every initial graph and production right-hand side G in an input grammar has been replaced with a corresponding well-tagged graph 〈G,tG〉. For each graph G, we assume that tG is an arbitrary but fixed injective tagging function satisfying the following formula. Here n is the number of external nodes in G. t(v) =      {xi} if v ∈ extG and posG(v) = i. {nil} if a nil-labelled edge is attached to v. {r} s.t. r ∈Var\{x1 . . . xn, nil} otherwise. The tagging function associates internal nodes with arbitrary variable names, while the ith external node of a graph is given the fixed tag ‘xi’. The nil node is tagged with nil. The function sG maps a set of tagged heap-graphs to a disjunction, each disjunct of which corresponds to a single graph. These single graphs are constructed by the function sg, which takes as its input a tagged graph and constructs a separating conjunction, with each conjunct corresponding to a single hyperedge. The conjunction for a graph is wrapped in existential quantifications, which bind all variables corresponding to internal nodes. Individual conjuncts for a graph are constructed by the function se. Terminally labelled K- edges result in points-to assertions, while non-terminal edges result in calls to recursively-defined predicates. The variable arguments for the point-to assertions and predicate calls are recovered from the tagging function. The function sq takes as its input a system of equations E QR corresponding to a set of produc- tions R, and constructs the predicate definitions for a let-statement. An equation system is used 13 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back rather than the original set of productions because sq needs to recover all the right-hand sides for a single symbol. sq constructs a single recursive definition for each nonterminal symbol in R. The set of right- hand side graphs for productions over a single symbol are mapped to a right-hand side formula by the function sG. The arguments to the predicate are fixed as x1, . . . , xn, which by the definition of the tagging function form the free variables of the right-hand side formula. 4.3 Mappings preserve semantics We want the functions g and s to preserve the semantics of formulas and grammars. That is, for every formula F ∈ SLF and heap h, we would like that h, i0, η0 |= F if and only if α(h) ∈ L(g[[F]])) (where i0 and η0 are empty, i.e. dom(i0) = dom(η0) = /0). Analogously, for every heap- graph grammar G and graph H ∈HC, we would like that H ∈L(G) if and only if α −1(H), i0, η0 |= s[[G]]. However, the first correspondence holds only for dangling-free formulas. These are formulas such that for any satisfying heap h, img(h) ⊆ dom(h). A separating conjunction of points-to assertions is dangling-free if every variable appearing on the left-hand side of a points-to asser- tion is either used on the right-hand side of a points-to assertion, or passed as a dangling-free argument to a predicate. Dangling-free formulas can be identified syntactically by tracking sets of dangling variables. Dangling-free arguments to predicates can be identified by recursion over definitions. Theorem 1 Let formula F = let Γ in P be a dangling-free separation logic formula, and let both Γ and P be let-free. Let h be a heap. Then h, i0, η0 |= F if and only if α(h) ∈ L(g[[F]]). Proof. By induction over the structure of a flat formula. See [Dod08]. Theorem 2 Let G be a heap-graph grammar. Let H ∈ HC be a graph. Then H ∈ L(G) if and only if α−1(H), η0 |= s[[G]]. Proof. By incremental proof of the result first for graphs, then sets of graphs, and finally gram- mars. See [Dod08]. Theorem 3 (Equivalent expressive power) Any class of structures defined by a dangling-free formula in SL can be generated by some heap-graph grammar modulo α , and vice versa. Proof. Function s is total over heap-graph grammars. The composition g◦flat of the flattening function flat and g is total over formulas in SL. The result follows immediately from the cor- rectness of flattening (Proposition 1) and of the correctness of g and s (Theorem 1 and Theorem 2). 5 Inexpressible separation logic operators In this section we show that the logical operators ∧ and ¬, and the primitive assertion true must be omitted from our fragment. Proc. Doctoral Symposium ICGT 2008 14 / 18 ECEASST Lemma 4 There exists HR-expressible heap-graph languages L1, L2 such that L1 ∩L2 is HR- inexpressible. Proof. It is proved in [HK89] that the class of all hyperedge-replacement languages is not closed under intersection. The proof first defines the following pair of string languages. S1 : {a n1 bn1 an2 bn2 . . . bnk−1 ank bnk | k ≥ 1∧n1 . . . nk ≥ 1} S2 : {a n1 bn2 an2 bn3 . . . bnk ank bnk+1 | k ≥ 1∧n1 . . . nk+1 ≥ 1} The proof then defines a mapping from any string w to a corresponding string graph w•. S1 and S2 are used to define the HR-expressible languages S • 1 and S • 2. The intersection between them gives the language {((anbn)k)• | k ≥ 1∧n ≥ 1}. In [DHK97] it is shown by application of the pumping lemma that this language is HR-inexpressible. To show the corresponding result for the class of heap-graph languages, we define a mapping from any string w ∈{a, b}∗ to a corresponding string heap-graph w◦. A string heap-graph for string w of length n, w = c1c2 . . . cn consists of n + 1 unlabelled nodes v1 . . . vn+1 and n K-labelled edges e1 . . . en. If character cn is an ‘a’, then the edge K has attachment points [vn, vn+1, vn+1]. If cn is a ‘b’ then the edge has attachment points [vn, vn+1, vn]. For the string ‘abaab’, the corre- sponding string heap-graph (abaab)◦ is as follows. 1 K a 2 3 1 3 K b 2 1 K a 2 3 1 K a 2 3 1 3 K b 2 1 nil The rest of the proof is then identical to [HK89]. Proposition 2 There exist formulas F1, F2 ∈SL such that no grammar g[[F1∧F2]] can be defined. Proof. Any grammar g[[F1 ∧F2]] must generate the language {α(h) | h, i0, η0 |= F1 ∧F2}. By Lemma 4 there exist heap-graph grammars G and H such that L(G)∩L(H) is HR-inexpressible. Let us construct formulas F1 = s[[G]] and F2 = s[[H]]. By the correctness of g and s (Theorems 1 and 2) L(G) = {α(h) | h, i0, η0 |= F1} and L(H) = {α(h) | h, i0, η0 |= F2}. By the seman- tics of conjunction {α(h) | h, i0, η0 |= F1 ∧F2} equals L(G)∩L(H), which is HR-inexpressible. Consequently no grammar g[[F1 ∧F2]] can be defined. Given a language of graphs L over label alphabet C, we write L for the language complement {g ∈ HC | g /∈ L}. Lemma 5 There exists HR-expressible heap-graph language L such that L is HR-inexpressible. Proof. Consequence of De Morgan’s law (A∪B) = A∩B and Lemma 4. Proposition 3 There exists a formula F ∈ SL such that no grammar g[[¬F]] can be defined. Proof. Corollary of Lemma 5. Proposition 4 No grammar g[[true]] can be defined. 15 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back Proof. Consequence of the result of [Hab92], Chapter IV, where it is shown that any HR lan- guage of simple graphs (meaning graphs without loops or parallel edges) must have a maximum bound on the size of any clique in the language. However, any grammar g[[true]] must con- struct the language {α(h) | h, i0, η0 |= true}, which is the language of all possible heap-graphs, contradicting this result. 6 Consequences and related work We have defined a bijective mapping between the domains of heap-graphs and heaps, and we have shown that the mappings g and s are correct with respect to this mapping. This means that formulas in our fragment and heap-graph grammars can be used interchangeably as methods for defining classes of structures. An immediate consequence of Theorem 3 is that some results for HR grammars hold for formulas in SL. Most obviously, if a language of graphs is proved to be inexpressible by any hyperedge replacement grammar, then the corresponding class of heaps must also be undefinable in our fragment. For example, both the languages of grid graphs and balanced binary trees are inexpressible by any hyperedge replacement grammar (consequences of the pumping lemma and linear-growth theorem of [Hab92] respectively) as is the language of all heap-graphs (consequence of the clique-size limit discussed in §5). The most flexible result for proving inexpressibility results is the pumping lemma for hyperedge-replacement languages – see [Hab92] for an extended dis- cussion. In §3.1 we used the fact that the class of HR languages is closed under intersection with compatible properties to show that we can define a source normalisation operator. This result also holds for any formula in our separation logic fragment. So for any formula in SL, a new formula can be constructed expressing the intersection between the formula and a compatible property. Some other work has been done on the expressiveness of separation logic. The class of closed separation-logic formula is known to be of equivalent expressive power to the class of formulas in first-order logic without separating operators [Loz04]. However, in [CGZ07] it is shown that the correspondence does not hold for separation logic formulas that include logical parameters standing for formulas, and that it also does not hold between separation logic with a list segment predicate and first-order logic with such a predicate. Our results regarding expressiveness are incomparable with these results, because our frag- ment omits several operators from full separation logic but includes a more general notion of recursion. However, our translation gives us a more general framework for deriving expressive- ness results, because the class of hyperedge replacement grammars has many well-understood properties. The work of Lee et al. [LYY05] was a major inspiration for our work. In this, a semantics is given to graph grammars by mapping them to separation logic formulas. These grammars are used as the abstract domain in an automatic shape analysis. However, the mapping is only one-way, meaning no correspondence result can be derived. In addition, the grammars used are severely restricted in their expressiveness compared to our heap-graph grammars. Proc. Doctoral Symposium ICGT 2008 16 / 18 ECEASST Our fragment SL is quite close to symbolic heaps that form the basis of the Space Invader tool [YLB+08]. This suggests that our fragment, while restricted, is still suitable for practical use. Symbolic heaps are a symbolic heap representations defined in a fragment of separation logic (the exact fragment used varies with the paper cited). A symbolic heap Π | Σ as defined in [DOY06] consists of a finite set Π of equalities and a finite set Σ of heap predicates. The elements of Σ are of the form E 7→F , ls(E, F), and junk. ls(x, y) stands for a recursively-defined list segment, while junk can stand for any heap. Semantically, a symbolic heap Π | Σ where Π = {P1, P2, . . . , Pn} and Σ = {Q1, Q2, . . . , Qm} expands to a formula (P1 ∧P2∧ . . .∧Pn)∧(Q1∗Q2 ∗. . .∗Qm). Formulas in our fragment closely resemble the heap portion of a symbolic heap. We include a general notion of recursive predicate, while symbolic heaps use specific predicates defined for particular domains. For example, early work on the Space Invader tool used a simple list fragment predicate ls(x, y), while more recent work has included a nested ‘list of lists’ predicate. In both cases these predicates can be defined using our let statement. The predicate junk is equivalent to true, and so by the result given in §5 cannot be expressed by any hyperedge replacement grammar. However, junk is largely used to handle unconnected sections of the heap. Structure in the heap is specified in the junk-free fragment of symbolic heaps, which suggests that the structure-specifying properties of symbolic heaps may be similar to our fragment. All of the symbolic heap fragments include assertions (such as equality) that describe the relationships between locations. We conjecture that hyperedge replacement as defined in §3 is expressive enough to model equality and inequality. To support this, we refer to [Eng97], where it is proved that for any grammar defined using rules with repetition, a corresponding repetition- free grammar exists that defines the same language. A graph with repetition is one where nodes can appear more than once in the sequence of external nodes. A rule with repetition on its right-hand side will merge some of the attachment points of any nonterminal node to which it is applied. This seems to correspond in some sense to assertions of equality. However, proving this result remains the subject of future work. Acknowledgements: We would like to thank Matthew Parkinson, Colin Runciman and the anonymous referees for their comments on this work. Work completed while the first author was studying for a PhD at the University of York. Publication supported by EPSRC grant EP/F019394/1. Bibliography [CGZ07] C. Calcagno, P. Gardner, U. Zarfaty. Context logic as modal logic: completeness and parametric inexpressivity. In Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. Pp. 123–134. ACM, 2007. [DHK97] F. Drewes, A. Habel, H.-J. Kreowski. Hyperedge Replacement Graph Grammars. In Rozenberg (ed.), Handbook of Graph Grammars and Computing by Graph Trans- formation. Volume I: Foundations. Chapter 2, pp. 95–162. World Scientific, 1997. 17 / 18 Volume 16 (2009) From Hyperedge Replacement to Separation Logic and Back [Dod08] M. Dodds. Graph Transformation and Pointer Structures. PhD thesis, University of York, UK, 2008. http://www.cl.cam.ac.uk/∼md466/publications/phdthesis.mdodds.pdf [DOY06] D. Distefano, P. W. O’Hearn, H. Yang. A local shape analysis based on separation logic. In 12th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science 3920, pp. 287–302. Springer, 2006. [Eng97] J. Engelfriet. Context-free graph grammars. In Handbook of formal languages, vol. 3: beyond words. Pp. 125–213. Springer, 1997. [Hab92] A. Habel. Hyperedge Replacement: Grammars and Languages. Lecture Notes in Computer Science 643. Springer, 1992. [HK89] A. Habel, H.-J. Kreowski. Filtering Hyperedge-Replacement Through Compatible Properties. In Nagl (ed.), WG. Lecture Notes in Computer Science 411, pp. 107– 120. Springer, 1989. [Loz04] E. Lozes. Separation logic preserves the expressive power of classical logic. In Pro- ceedings of the 2st Workshop on Semantics, Program Analysis, and Computing En- vironments for Memory Management. 2004. Informal proceedings. [LYY05] O. Lee, H. Yang, K. Yi. Automatic Verification of Pointer Programs Using Grammar- Based Shape Analysis. In Proceedings of the 14th European Symposium on Pro- gramming. Lecture Notes in Computer Science 3444, pp. 124–140. Springer, April 2005. [Rey02] J. C. Reynolds. Separation Logic: A logic for shared mutable data structures. In Pro- ceedings of the Seventeenth Annual IEEE Symposium on Logic in Computer Science. 2002. [Sim06] É.-J. Sims. Extending separation logic with fixpoints and postponed substitution. Theoretical Computer Science 351(2):258–275, 2006. [Tar55] A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics 5:285–309, 1955. [YLB+08] H. Yang, O. Lee, J. Berdine, C. Calcagno, B. Cook, D. Distefano, P. O’Hearn. Scal- able Shape Analysis for Systems Code. In International Conference on Computer- Aided Verification. Lecture Notes in Computer Science 5123, pp. 385–398. Springer, 2008. Proc. Doctoral Symposium ICGT 2008 18 / 18 http://www.cl.cam.ac.uk/~md466/publications/phdthesis.mdodds.pdf Introduction Separation logic Fragment semantics Flattening formulas Hyperedge replacement Heap-graphs and heap-graph grammars Source normalisation and compatible properties Mapping from formulas to grammars and back Mapping from formulas to grammars Mapping from grammars to formulas Mappings preserve semantics Inexpressible separation logic operators Consequences and related work