A Unification Algorithm for GP 2 Electronic Communications of the EASST Volume 71 (2015) Graph Computation Models Selected Revised Papers from GCM 2014 A Unification Algorithm for GP 2 Ivaylo Hristakiev and Detlef Plump 17 pages Guest Editors: Rachid Echahed, Annegret Habel, Mohamed Mosbah Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST A Unification Algorithm for GP 2 Ivaylo Hristakiev∗ and Detlef Plump The University of York, UK Abstract: The graph programming language GP 2 allows to apply sets of rule schemata (or “attributed” rules) non-deterministically. To analyse conflicts of pro- grams statically, graphs labelled with expressisons are overlayed to construct critical pairs of rule applications. Each overlay induces a system of equations whose solu- tions represent different conflicts. We present a rule-based unification algorithm for GP expressions that is terminating, sound and complete. For every input equation, the algorithm generates a finite set of substitutions. Soundness means that each of these substitutions solves the input equation. Since GP labels are lists constructed by concatenation, unification modulo associativity and unit law is required. This prob- lem, which is also known as word unification, is infinitary in general but becomes finitary due to GP’s rule schema syntax and the assumption that rule schemata are left-linear. Our unification algorithm is complete in that every solution of an input equation is an instance of some substitution in the generated set. Keywords: graph programs, word unification, critical pair analysis 1 Introduction A common programming pattern in the graph programming language GP 2 [Plu12, BFPR15] is to apply a set of graph transformation rules as long as possible. To execute such a loop {r1,...,rn}! on a host graph, in each iteration an applicable rule ri is selected and applied. As rule selection and rule matching are non-deterministic, different graphs may result from the loop. Thus, if the programmer wants the loop to implement a function, a static analysis that establishes or refutes functional behaviour would be desirable. The above loop is guaranteed to produce a unique result if the rule set {r1,...,rn} is ter- minating and confluent. However, conventional confluence analysis via critical pairs [Plu05] assumes rules with constant labels whereas GP employs rule schemata (or “attributed” rules) whose graphs are labelled with expressions. Confluence of attributed graph transformation rules has been considered in [HKT02, EEPT06, GLEO12], but we are not aware of algorithms that check confluence over non-trivial attribute algebras such as GP’s which includes list concatena- tion and Peano arithmetic. The problem is that one cannot use syntactic unification (as in logic programming) when constructing critical pairs and checking their joinability, but has to take into account all equations valid in the attribute algebra. For example, [HKT02] presents a method of constructing critical pairs in the case where the equational theory of the attribute algebra is represented by a confluent and terminating term ∗ This author’s work is supported by a PhD scholarship of the UK Engineering and Physical Sciences Research Council (EPSRC) 1 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 rewriting system. The algorithm first computes normal forms of the attributes of overlayed nodes and subsequently constructs the most general (syntactic) unifier of the normal forms. This has been shown to be incomplete [EEPT06, p.198] in that the constructed set of critical pairs need not represent all possible conflicts. For, the most general unifier produces identical attributes—but it is necessary to find all substitutions that make attributes equivalent in the algebra’s theory. Graphs in GP rule schemata are labelled with lists of integer and string expressions, where lists are constructed by concatenation. In host graphs, list entries must be constant values. Integers and strings are subtypes of lists in that they represent lists of length one. As a simple example, consider the program in Figure 1 for calculating shortest distances. The program expects input graphs with non-negative integers as edge labels, and arbitrary lists as node labels. There must be a unique marked node (drawn shaded) whose shortest distance to each reachable node has to be calculated. The rule schemata init and add append distances main = init; {add, reduce}! init(x: list) add(x,y: list; m,n: int) x 1 ⇒ x:0 1 x:m y 1 2 n ⇒ x:m y:m+n 1 2 n reduce(x,y: list; m,n,p: int) x:m y:p 1 2 n ⇒ x:m y:m+n 1 2 n where m+n < p Figure 1: A program calculating shortest distances to the labels of nodes that have not been visited before, while reduce decreases the distance of nodes that can be reached by a path that is shorter than the current distance. To construct the conflicts of the rule schemata add and reduce, their left-hand sides are overlayed. For example, the structure of the left-hand graph of reduce can match the following structure in two different ways: Consider a copy of reduce in which the variables have been renamed to x′, m′, etc. To match reduce and its copy differently requires solving the system of equations 〈x:m =? y′:p′, y:p =? x′:m′〉. Solutions to these equations should be as general as possible to represent all potential conflicts resulting from the above overlay. In this simple example, it is clear that the substitution σ = {x′ 7→ y, m′ 7→ p, y′ 7→ x, p′ 7→ m} is a most general solution. It gives rise to the following critical pair:1 1 For simplicity, we ignore the condition of reduce. Selected Revised Papers from GCM 2014 2 / 17 ECEASST x:p+n′ y:p 1 2 n n′ ⇐ x:m y:p 1 2 n n′ ⇒ x:m y:m+n 1 2 n n′ In general though, equations can arise that have several independent solutions. For example, the equation 〈n:x =? y:2〉 (with n of type int and x,y of type list) has the minimal solutions σ1 = {x,y 7→ empty, n 7→ 2} and σ2 = {x 7→ z:2, y 7→ n:z} where empty represents the empty list and z is a list variable. Seen algebraically, we need to solve equations modulo the associativity and unit laws AU = {x : (y : z) = (x : y) : z, empty : x = x, x : empty = x}. This problem is similar to word unification [BS01], which attempts to solve equations modulo associativity. (Some authors consider AU -unification as word unification, e.g. [Jaf90]). Solv- ability of word unification is decidable, albeit in PSPACE [Pla99], but there is not always a finite complete set of solutions. The same holds for AU-unification (see Section 3). Fortunately, GP’s syntax for left-hand sides of rule schemata forbids labels with more than one list variable. It turns out that by additionally forbidding shared list variables between different left-hand labels of a rule, rule overlays induce equation systems possessing finite complete sets of solutions. This paper is the first step towards a static confluence analysis for GP programs. In Section 3, we present a rule-based unification algorithm for equations between left-hand expressions of rule schemata. We show that the algorithm always terminates and that it is sound in that each substitution generated by the algorithm is an AU-unifier of the input equation. Moreover, the algorithm is complete in that every unifier of the input equation is an instance of some unifier in the computed set of solutions. 2 GP Rule Schemata We refer to [Plu12, BFPR15] for the definition of GP and more example programs. In this sec- tion, we define (unconditional) rule schemata which are the “building blocks” of graph programs. A graph over a label set C is a system G = (V,E,s,t,l,m), where V and E are finite sets of nodes (or vertices) and edges, s,t : E →V are the source and target functions for edges, l : V → C is the node labelling function and m : E → C is the edge labelling function. We write G (C ) for the class of all graphs over C . Figure 2 shows an example for the declaration of a rule schema. The types int and string represent integers and character strings. Type atom is the union of int and string, and list represents lists of atoms. Given lists l1 and l2, we write l1 : l2 for the concatenation of l1 and l2. The empty list is denoted by empty. In pictures of graphs, nodes or edges without label (such as the dashed edge in Figure 2) are implicitly labelled with the empty list. We equate lists of length one with their entry to obtain the syntactic and semantic subtype relationships shown in Figure 3. For example, all labels in Figure 2 are list expressions. Figure 4 gives a grammar in Extended Backus-Naur Form defining the abstract syntax of labels. The function length return the length of a variable, while indeg and outdeg access the indegree resp. outdegree of a left-hand node in the host graph. 3 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 bridge(x,y: list; a: atom; n: int; s,t: string) a:x 1 n 2 y 3 s t ⇒ a 1 x:n 2 3 n∗n 3 s t Figure 2: Declaration of a GP rule schema Figure 4 defines four syntactic categories of expressions: Integer, String, Atom and List, where Integer and String are subsets of Atom which in turn is a subset of List. Category Node is the set of node identifiers used in rule schemata. Moreover, IVar, SVar, AVar and LVar are the sets of variables of type int, string, atom and list. We assume that these sets are disjoint and define Var = IVar ∪ SVar ∪ AVar ∪ LVar. The mark components of labels are represented graphically rather than textually. Each expression l has a unique smallest type, denoted by type(l), which can be read off the list atom int string char ⊆ ⊆ ⊇ ⊆ (Z∪ Char∗)∗ Z∪ Char∗ Z Char∗ Char ⊆ ⊆ ⊇ ⊆ Figure 3: Subtype hierarchy for labels Integer ::= Digit {Digit} | IVar | ‘−’ Integer | Integer ArithOp Integer | length ‘(’ LVar | AVar | SVar ‘)’ | (indeg | outdeg) ‘(’ Node ‘)’ ArithOp ::= ‘+’ | ‘-’ | ‘∗’ | ‘/’ String ::= ‘ “ ’ {Char} ‘ ” ’ | SVar | String ‘.’ String Atom ::= Integer | String | AVar List ::= empty | Atom | LVar | List ‘:’ List Label ::= List [Mark] Mark ::= red | green | blue | grey | dashed | any Figure 4: Abstract syntax of rule schema labels Selected Revised Papers from GCM 2014 4 / 17 ECEASST hierarchy in Figure 3 after l has been normalised with the rewrite rules shown at the beginning of Subsection 3.2. We write type(l1) < type(l2) or type(l1) ≤ type(l2) to compare types according to the subtype hierarchy. If the types of l1 and l2 are incomparable, we write type(l1)‖ type(l2). The values of rule schema variables at execution time are determined by graph matching. To ensure that matches induce unique “actual parameters”, expressions in the left graph of a rule schema must have a simple shape. Definition 1 (Simple expression) A simple expression contains no arithmetic operators, no length or degree operators, no string concatenation, and at most one occurrence of a list variable. In other words, simple expressions contain no unary or binary operators except list concatena- tion, and at most one occurrence of a list variable. For example, given the variable declarations of Figure 2, a:x and y:n:n are simple expressions whereas n∗2 or x:y are not simple. Our definition of simple expressions is more restrictive than that in [Plu12] because we exclude string concatenation and the unary minus. These operations (especially string concatenation) would considerably inflate the unification algorithm and its completeness proof, without posing a substantial challenge. Definition 2 (Rule schema) A rule schema 〈L, R, I〉 consists of graphs L,R in G (Label) and a set I, the interface, such that I = VL ∩ VR. All labels in L must be simple and all variables occurring in R must also occur in L. When a rule schema is graphically declared, as in Figure 2, the interface I is represented by the node numbers in L and R. Nodes without numbers in L are to be deleted and nodes without numbers in R are to be created. All variables in R have to occur in L so that for a given match of L in a host graph, applying the rule schema produces a graph that is unique up to isomorphism. Assumption 1 (Left-linearity). We assume that rule schemata 〈L, R, I〉 are left-linear, that is, the labels of different nodes or edges in L do not contain the same list variable. This assumption is necessary to ensure that the solutions of the equations resulting from over- laying two rule schemata can be represented by a finite set of unifiers. For example, without this assumption it is easy to construct two rule schemata that induce the system of equations 〈x : 1 =? y, y =? 1 : x〉. This system has solutions {x 7→ empty,y 7→ 1}, {x 7→ 1,y 7→ 1 : 1}, {x 7→ 1 : 1,y 7→ 1 : 1 : 1},... which form a infnite, minimal and compete set of solutions (See Definition 5 below). 3 Unification We start with introducing some technical notions such as substitutions, unification problems and complete sets of unifiers. Then, in Subsection 3.2, we present our unification algorithm. In Subsection 3.3, we prove that the algorithm terminates and is sound. 5 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 3.1 Preliminaries A substitution is a family of mappings σ = (σX )X∈{I,S,A,L} where σI : IVar → Integer, σS : SVar → String, σA : AVar → Atom, σL : LVar → List. Here Integer, String, Atom and List are the sets of expressions defined by the GP label grammar of Figure 4. For example, if z ∈ LVar, x ∈ IVar and y ∈ SVar, then we write σ = {x 7→ x+1, z 7→ y : −x : y} for the substitution that maps x to x+1, z to y : −x : y and every other variable to itself. Applying a substitution σ to an expression t, denoted by tσ , means to replace every variable x in t by σ(x) simultaneously. In the above example, (z : −x)σ = y : −x : y : −(x+1). By Dom(σ) we denote the set {x ∈ Var | σ(x) 6= x} and by VRan(σ) the set of variables occur- ring in the expressions {σ(x) | x ∈ Var}. A substitution σ is idempotent if Dom(σ)∩ VRan(σ) = /0. The composition of two substitutions σ and θ , is defined as x(σ ◦θ) = { (xσ)θ if x ∈ Dom(σ) xθ otherwise and is an associative operation. Definition 3 (Unification problem) A unification problem is a pair of an equation and a substi- tution P = 〈s =? t,σP〉 where s and t are simple list expressions without common variables. The symbol =? signifies that the equation must be solved rather than having to hold for all values of variables. The purpose of σP is for the unification algorithm (Section 3.2) to record a partial solution. An illustration of this concept will be seen in Figure 8. In Section 2, we already assumed that GP rule schemata need to be left-linear. Now, the prob- lem of solving a system of equations {s1 = t1,s2 = t2} can be broken down to solving individual equations and combining the answers - if σ1 and σ2 are solutions to each individual equation, then σ1 ∪ σ2 is a solution to the combined problem as σ1 and σ2 do not share variables. Consider the equational axioms for associativity and unity, AU = {x : (y : z) = (x : y) : z, empty : x = x, x : empty = x} where x,y,z are variables of type list, and let =AU be the equivalence relation on expressions generated by these axioms. Definition 4 (Unifier) Given a unification problem P = 〈s =? t,σP〉 a unifier of P a is a substi- tution θ such that sθ =AU tθ and xiθ =AU tiθ for each binding {xi 7→ ti} in σP . The set of all unifiers of P is denoted by U (P). We say that P is unifiable if U (P) 6= /0. The special unification problem fail represents failure and has no unifiers. A problem P = 〈s =? t,∅〉 is initial and P = 〈∅,σP〉 is solved. Selected Revised Papers from GCM 2014 6 / 17 ECEASST A substitution σ is more general on a set of variables X than a substitution θ if there exists a substitution λ such that xθ =AU xσ λ for all x ∈ X . In this case we write σ ≦X θ and say that θ is an instance of σ on X . Substitutions σ and θ are equivalent on X , denoted by σ =X θ , if σ ≦X θ and θ ≦X σ . Definition 5 (Complete set of unifiers [Plo72]) A set C of substitutions is a complete set of unifiers of a unification problem P if 1. (Soundness) C ⊆ U (P), that is, each substitution in C is a unifier of P, and 2. (Completeness) for each θ ∈ U (P) there exists σ ∈ C such that σ ≦X θ , where X = Var(P) is the set of variables occurring in P. Set C is also minimal if each pair of distinct unifiers in C are incomparable with respect to ≦X . If a unification problem P is not unifiable, then the empty set ∅ is a minimal complete set of unifiers of P. For simplicity, we replace =? with = in unification problems from now on. Example 1 The minimal complete set of unifiers of the problem 〈a : x = y : 2〉 (where a is an atom variable and x,y are list variables) is {σ1,σ2} with σ1 = {a 7→ 2, x 7→ empty, y 7→ empty} and σ2 = {x 7→ z : 2, y 7→ a : z}. We have σ1(a : x) = 2 : empty =AU 2 =AU empty : 2 = σ1(y : 2) and σ2(a : x) = a : z : 2 = σ2(y : 2). Other unifiers such as σ3 = {x 7→ 2, y 7→ a} are instances of σ2. 3.2 Unification Algorithm We start with some notational conventions for the rest of this section: • L,M stand for simple expressions, • x,y,z stand for variables of any type (unless otherwise specified), • a,b stand for (i) simple string or integer expressions, or (ii) string, integer or atom variables • s,t stand for (i) simple string or integer expressions, or (ii) variables of any type 7 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 Preprocessing. Given a unification problem P = 〈s =? t,σ〉, we rewrite the terms in s and t using the reduction rules L : empty → L and empty : L → L where L ranges over list expressions. These reduction rules are applied exhaustively before any of the transformation rules. For example, x : empty : 1 : empty → x : 1 : empty → x : 1. We call this process normalization. In addition, the rules are applied to each instance of a trans- formation rule (that is, once the formal parameters have been replaced with actual parameters) before it is applied, and also after each transformation rule application. Transformation rules. Figure 5 shows the transformation rules, the essence of our approach, in an inference system style where each rule consists of a premise and a conclusion. Remove: deletes trivial equations Decomp1: syntactically equates a list variable with an atom expression or list variable Decomp1’: syntactically equates an atom variable with an expression of lesser type Decomp2/2’: assigns a list variable to start with another list variable Decomp3: removes identical symbols from the head Decomp4: solves an atom variable Subst1: solves a variable Subst2: assigns empty to a list variable Subst3: assigns an atom prefix to a list variable Orient1/2: moves variables to left-hand side Orient3: moves variables of larger type to left-hand side Orient4: moves a list variable to the left-hand side The rules induce a transformation relation ⇒ on unification problems. In order to apply any of the rules to a problem P, the problem part of its premise needs to be matched onto P. Subse- quently, the boolean condition of the premise is checked and the rule instance is normalized so that its premise is identical to P. For example, the rule Orient3 can be matched to P = 〈a : 2 = m, σ〉 (where a and m are variables of type atom and list, respectively) by setting y 7→ a, x 7→ m, M 7→ 2, and L 7→ empty. The rule instance and its normal form are then 〈a : 2 = m : empty,σ〉 〈m : empty = a : 2,σ〉 and 〈a : 2 = m,σ〉 〈m = a : 2,σ〉 where the conclusion of the normal form is the result of applying Orient3 to P. Selected Revised Papers from GCM 2014 8 / 17 ECEASST 〈L = L,σ〉 〈∅,σ〉 Remove 〈x : L = s : M,σ〉 L 6= empty type(x) = list 〈L = M, σ ◦{x 7→ s}〉 Decomp1 〈x : L = a : M,σ〉 L 6= empty type(a) ≤ type(x) ≤ atom 〈(L = M){x 7→ a}, σ ◦{x 7→ a}〉 Decomp1′ 〈x : L = y : M,σ〉 L 6= empty type(x) = list type(y) = list x′ is a fresh list variable 〈x′ : L = M,σ ◦{x 7→ y : x′}〉 Decomp2 〈x : L = y : M,σ〉 L 6= empty type(x) = list type(y) = list y′ is a fresh list variable 〈L = y′ : M,σ ◦{y 7→ x : y′}〉 Decomp2′ 〈s : L = s : M,σ〉 〈L = M,σ〉 Decomp3 〈x = a : y,σ〉 type(x) = atom type(y) = list 〈empty = y,σ ◦{x 7→ a}〉 Decomp4 〈x = L,σ〉 x /∈ Var(L) type(x) ≥ type(L) 〈∅,σ ◦{x 7→ L}〉 Subst1 〈x : L = M,σ〉 L 6= empty type(x) = list 〈L = M,σ ◦{x 7→ empty}〉 Subst2 〈x : L = a : M},σ〉 L 6= empty x′ is a fresh list variable type(x) = list 〈x′ : L = M,σ ◦{x 7→ a : x′}〉 Subst3 〈a : M = x : L,σ〉 a is not a variable 〈x : L = a : M,σ〉 Orient1 〈x : L = y,σ〉 L 6= empty type(x) = type(y) 〈y = x : L,σ〉 Orient2 〈y : M = x : L,σ〉 type(y) < type(x) 〈x : L = y : M,σ〉 Orient3 〈empty = x : L,σ〉 type(x) = list 〈x : L = empty,σ〉 Orient4 Figure 5: Transformation rules 9 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 〈x = L,σ〉 x ∈ Var(L) x 6= L type(x) = list fail Occur 〈a : L = empty,σ〉 fail Clash2 〈a : L = b : M,σ〉 a 6= b Var(a) = /0 = Var(b) fail Clash1 〈empty = a : L,σ〉 fail Clash3 〈x = L,σ〉 type(x)‖ type(L) fail Clash4 Figure 6: Failure rules Showing that a unification problem has no solution can be a lengthy affair because we need to compute all normal forms with respect to ⇒. Instead, the rules Occur and Clash1 to Clash4, shown in Figure 6, introduce failure. Failure cuts off parts of the search tree for a given problem P. This is because if P ⇒ fail, then P has no unifiers and it is not necessary to compute another normal form. Effectively, the failure rules have precedence over the other rules. They are justified by the following lemmata. Lemma 1 A normalised equation x = L with x 6= L has no solution if L is a simple expression, x ∈ Var(L) and type(x) = list. Proof. Since x ∈ Var(L) and x 6= L, L is of the form s1 : s2 : ... : sn with n ≥ 2 and x ∈ Var(si) for some 1 ≤ i ≤ n. As L is normalised, none of the terms si contains the constant empty. Also, since L is simple, it contains no list variables other than x and x is not repeated. It follows σ(x) 6=AU σ(L) for every substitution σ . Lemma 2 Equations of the form a : L = empty or empty = a : L have no solution if a is an atom expression. Lemma 3 An equation a : L = b : M with a 6= b has no solution if a and b are atom expressions without variables. The algorithm. The unification algorithm in Figure 7 starts by normalizing the input equation, as explained above. It uses a queue of unification problems to search the derivation tree of P with respect to ⇒ in a breadth-first manner. The first step is to put the normalized problem P on the queue. The variable next holds the head of the queue. If next is in the form 〈∅,σ〉, then σ is a unifier of the original problem and is added to the set U of solutions. Otherwise, the next step is to construct all problems P′ such that next ⇒ P′. If P′ is fail, then the derivation tree below next is ignored, otherwise P′ gets normalized and enqueued. Selected Revised Papers from GCM 2014 10 / 17 ECEASST Unify(P) : U := /0 create empty queue Q of unification problems normalize P Q.enqueue(〈P,∅〉) while Q is not empty next := Q.dequeue() if next is in the form 〈∅,σ〉 U := U∪{σ} else if next ; fail foreach P′ such that next ⇒ P′ normalize P′ Q.enqueue(P′) end foreach end if end if end while return U Figure 7: Unification algorithm An example tree traversed by the algorithm is shown in Figure 8. Nodes are labelled with unification problems and edges represent applications of transformation rules. The root of the tree is the problem 〈y : 2 = a : x〉 to which the rules Decomp1, Subst2 and Subst3 can be applied. The three resulting problems form the second level of the search tree and are processed in turn. Eventually, the unifiers σ1 = {x 7→ 2, y 7→ a} σ2 = {x 7→ y ′ : 2, y 7→ a : y′} σ3 = {a 7→ 2, x 7→ empty, y 7→ empty} are found, which represent a complete set of unifiers of the initial problem. Note that the set is not minimal because σ1 is an instance of σ2. The algorithm is similar to the A-unification (word unification) algorithm presented in [Sch92] which looks at the head of the problem equation. That algorithm terminates for the special case that the input problem has no repeated variables, and is sound and complete. Our approach can be seen as an extension from A-unification to AU-unification, to handle the unit equations, and presented in the rule-based style of [BS01]. In addition, our algorithm deals with GP’s subtype system. 3.3 Termination and Soundness We show that the unification algorithm terminates if the input problem contains no repeated list variables, where termination of the algorithm follows from termination of the relation ⇒. 11 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 { y : 2 = a : x ∅ } { 2 = x y 7→ a } { x = 2 y 7→ a }    ∅ y 7→ a x 7→ 2    solved Subst1 Orient1 Decomp1 { 2 = a : x y 7→ empty } { a : x = 2 y 7→ empty }    x = empty y 7→ empty a 7→ 2           ∅ y 7→ empty a 7→ 2 x 7→ empty        solved Subst1 Decomp1′ Orient1 Subst2 { y′ : 2 = x y 7→ a : y′ } { x = y′ : 2 y 7→ a : y′ }    ∅ y 7→ a : y′ x 7→ y′ : 2    solved Subst1 Orient2 { y′′ : 2 = empty y 7→ a : x : y′′ } { 2 = empty y 7→ a : x } fail Clash2 Subst2 Decomp2    2 = x′ y 7→ a : y′ x 7→ y′ : x′       x′ = 2 y 7→ a : y′ x 7→ y′ : x′       ∅ y 7→ a : y′ x 7→ y′ : 2    solved Subst1 Orient1 Decomp2′ Subst3 Figure 8: Unification example Selected Revised Papers from GCM 2014 12 / 17 ECEASST Theorem 1 (Termination) If P is a unification problem without repeated list variables, then there is no infinite sequence P ⇒ P1 ⇒ P2 ⇒ ... Proof. Define the size |L| of an expression L by • 0 if L = empty, • 1 if L is an expression of category Atom (see Figure 4) or a list variable, • |M|+ |N|+ 1 if L = M : N. We define a lexicographic termination order by assigning to a unification problem P = 〈L = M,σ〉 the tuple (n1,n2,n3,n4), where • n1 is the size of P, that is, n1 = |L|+ |M|; • n2 = { 0 if L starts with a variable 1 otherwise • n3 = { 1 if type(x) > type(y) 0 otherwise where x and y are the starting symbols of L and M • n4 = |L| The table in Figure 9 shows that for each transformation step P ⇒ P′, the tuple associated with P′ is strictly smaller than the tuple associated with P in the lexicographic order induced by the components n1 to n4. n1 n2 n3 n4 Subst1−3 > Decomp1−4 > Remove > Orient1 = > Orient4 = > Orient3 = = > Orient2 = = = > Figure 9: Lexicographic termination order For most rules, the table entries are easy to check. All the rules except for Orient decrease the size of the input equation s =? t. The Orient rules keep the size equal, but move variables and expressions of a bigger type to the left-hand side. 13 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 Lemma 4 If P ⇒ P′, then U (P) ⊇ U (P′) . Proof. We show that for each transformation rule, a unifier θ of the conclusion unifies the premise. Note that for Remove, Decomp3 and Orient1-4, we have that σP′ = σP . • Remove θ ∈ U (〈∅,σP′〉) ⇐⇒ θ ∈ U (〈∅,σP〉) ⇐⇒ θ ∈ U (〈∅,σP〉)∧ Lθ = Lθ ⇐⇒ θ ∈ U (〈L = L,σP〉) • Decomp3 θ ∈ U (〈L = M,σP′〉) ⇐⇒ θ ∈ U (〈L = M,σP〉) ⇐⇒ θ ∈ U (〈L = M,σP〉)∧sθ = sθ ⇐⇒ θ ∈ U (〈s : L = s : M,σP〉) • Decomp1 - We have xθ = sθ and Lθ = Mθ . Then (x : L)θ = (s : L)θ = (s : M)θ as required. • Decomp1’ - We have xθ = aθ and Lθ = Mθ . Then (x : L)θ = (a : L)θ = (a : M)θ as required. • Decomp2 - We have xθ = (y : x′)θ and (x′ : L)θ = Mθ . Then (x : L)θ = (y : x′ : L)θ = (y : M)θ as required. • Decomp2’ - We have yθ = (x : y′)θ and Lθ = (y′ : M)θ . Then (y : M)θ = (x : y′ : M)θ = (x : L)θ as required. • Decomp4 - We have xθ = aθ and yθ = (empty)θ = empty. Then (a : y)θ = (x : y)θ = (x : empty)θ = xθ as required. • Subst1 - We have xθ = Lθ θ ∈ U (〈∅,σS′〉) ⇐⇒ θ ∈ U (〈∅,σS ◦(x 7→ L)〉) ⇐⇒ θ ∈ U (〈∅,σS ◦(x 7→ L)〉)∧xθ = Lθ ⇐⇒ θ ∈ U (〈x = L,σS〉) • Subst2 - We have xθ = (empty)θ and Lθ = Mθ . Then (x : L)θ = (empty : L)θ = Lθ = Mθ as required. • Subst3 - We have xθ = (a : x′)θ and (x′ : L)θ = Mθ . Then (x : L)θ = (a : x′ : L)θ = (a : M)θ as required. • Orient1-4 - Since =AU is an equivalence relation and hence symmetric, a unifier of the conclusion is also a unifier of the premise. Selected Revised Papers from GCM 2014 14 / 17 ECEASST Theorem 2 (Soundness) If P ⇒+ P′ with P′ = 〈∅,σP′〉 in solved form, then σP′ is a unifier of P. Proof. We have that σP′ is a unifier of P ′ by definition. A simple induction with Lemma 4 shows that σP′ must be a unifier of P. 3.4 Completeness In order to prove that our algorithm is complete, by Definition 5 we have to show that for any unifier δ , there is a unifier in our solution set that is more general. Our proof involves using a selector algorithm that takes a unification problem 〈s =? t〉 together with an arbitrary unifier δ , and outputs a path of the unification tree associated with a more general unifier than δ . This is very similar to [Sie78] where completeness of a A-unification algorithm is shown via such selector. Due to space restrictions the following lemma is given without proof. For the selector algo- rithm and proof (comprising of an extra 9 pages) see the long version of this paper [HP14]. Lemma 5 (Selector Lemma) There exists an algorithm Select(δ ,s =? t) that takes an equation s =? t and a unifier δ as input and produces a sequence of selections B = (b1,...,bk) such that: • Unify(s =? t) has a path specified by B . • For all selections b ∈ B: if σ is the substitution corresponding to b, then there exists an instantiation λ such that σ ◦ λ ≤ δ . For example, consider the unification problem 〈y : 2 = a : x〉 and δ = (x 7→ 1 : 2,y 7→ a : 1) as a unifier. The unification tree was shown in Figure 8. Select(δ ,y : 2 = a : x) would produce selections (Subst3, Decomp2’, Orient1, Subst1), which corresponds to the right-most path in the tree. The unifier at the end of this path is σ = (x 7→ y′ : 2,y 7→ a : y′) which is more general than δ by instantiation λ = (y′ 7→ 1). Now we are able to state our completeness result, which follows directly from Lemma 5. Theorem 3 (Completeness) For every unifier δ of a unification problem 〈s =? t〉, there exist a unifier σ generated by Unify such that σ ≤ δ . 4 Conclusion This paper presents groundwork for a static confluence analysis of GP programs. We have con- structed a rule-based unification algorithm for systems of equations with left-hand expressions of rule schemata, and have shown that the algorithm always terminates and is sound. Moreover, we have proved completeness in that every unifier of the input problem is an instance of some unifier in the computed set of solutions. 15 / 17 Volume 71 (2015) A Unification Algorithm for GP 2 Future work includes establishing a Critical Pair Lemma in the sense of [Plu05]; this entails developing a notion of independent rule schema applications, as well as restriction and embed- ding theorems for derivations with rule schemata. Another topic is to consider critical pairs of conditional rule schemata (see [EGH+12]). In addition, since critical pairs contain graphs labelled with expressions, checking joinability of critical pairs will require sufficient conditions under which equivalence of expressions can be decided. This is because the theory of GP’s label algebra includes the undecidable theory of Peano arithmetic. Acknowledgements: We thank the anonymous referees of GCM 2014 for their comments on previous versions of this paper. Bibliography [BFPR15] C. Bak, G. Faulkner, D. Plump, C. Runciman. A Reference Interpreter for the Graph Programming Language GP 2. In Proc. Graphs as Models (GaM 2015). Electronic Proceedings in Theoretical Computer Science 181, pp. 48–64. 2015. [BS01] F. Baader, W. Snyder. Unification Theory. In Robinson and Voronkov (eds.), Hand- book of Automated Reasoning. Pp. 445–532. Elsevier and MIT Press, 2001. [EEPT06] H. Ehrig, K. Ehrig, U. Prange, G. Taentzer. Fundamentals of Algebraic Graph Trans- formation. Monographs in Theoretical Computer Science. Springer-Verlag, 2006. [EGH+12] H. Ehrig, U. Golas, A. Habel, L. Lambers, F. Orejas. M-Adhesive Transformation Systems with Nested Application Conditions. Part 2: Embedding, Critical Pairs and Local Confluence. Fundamenta Informaticae 118(1):35–63, 2012. [GLEO12] U. Golas, L. Lambers, H. Ehrig, F. Orejas. Attributed Graph Transformation with Inheritance: Efficient Conflict Detection and Local Confluence Analysis Using Ab- stract Critical Pairs. Theoretical Computer Science 424:46–68, 2012. [HKT02] R. Heckel, J. M. Küster, G. Taentzer. Confluence of Typed Attributed Graph Trans- formation Systems. In Proc. International Conference on Graph Transformation (ICGT 2002). Lecture Notes in Computer Science 2505, pp. 161–176. Springer- Verlag, 2002. [HP14] I. Hristakiev, D. Plump. A Unification Algorithm for GP (Long version). http://www.cs.york.ac.uk/plasma/publications/pdf/HristakievPlump.Full.GCM14.pdf, 2014. [Jaf90] J. Jaffar. Minimal and Complete Word Unification. Journal of the ACM 37(1):47–85, 1990. Selected Revised Papers from GCM 2014 16 / 17 http://www.cs.york.ac.uk/plasma/publications/pdf/HristakievPlump.Full.GCM14.pdf ECEASST [Pla99] W. Plandowski. Satisfiability of Word Equations with Constants is in PSPACE. In Symposium on Foundations of Computer Science (FOCS 1999). Pp. 495–500. IEEE Computer Society, 1999. [Plo72] G. Plotkin. Building-in Equational Theories. Machine intelligence 7(4):73–90, 1972. [Plu05] D. Plump. Confluence of Graph Transformation Revisited. In Middeldorp et al. (eds.), Processes, Terms and Cycles: Steps on the Road to Infinity: Essays Dedi- cated to Jan Willem Klop on the Occasion of His 60th Birthday. Lecture Notes in Computer Science 3838, pp. 280–308. Springer-Verlag, 2005. [Plu12] D. Plump. The Design of GP 2. In Proc. International Workshop on Reduction Strategies in Rewriting and Programming (WRS 2011). Electronic Proceedings in Theoretical Computer Science 82, pp. 1–16. 2012. [Sch92] K. U. Schulz. Makanin’s Algorithm for Word Equations: Two Improvements and a Generalization. In Proc. Word Equations and Related Topics (IWWERT ’90). Lecture Notes in Computer Science 572, pp. 85–150. Springer-Verlag, 1992. [Sie78] J. Siekmann. Unification and Matching Problems. PhD thesis, University of Essex, 1978. 17 / 17 Volume 71 (2015) Introduction GP Rule Schemata Unification Preliminaries Unification Algorithm Termination and Soundness Completeness Conclusion