Preserving constraints in horizontal model transformations Electronic Communications of the EASST Volume 29 (2010) Proceedings of the Ninth International Workshop on Graph Transformation and Visual Modeling Techniques (GT-VMT 2010) Preserving constraints in horizontal model transformations Paolo Bottoni, Andrew Fish, Francesco Parisi Presicce 14 pages Guest Editors: Jochen Küster, Emilio Tuosto Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Preserving constraints in horizontal model transformations Paolo Bottoni1, Andrew Fish2, Francesco Parisi Presicce1 1 Dipartimento di Informatica, ”Sapienza” Università di Roma, Italy, 2Computing, Mathematical and Information Sciences, University of Brighton, UK Abstract: Graph rewriting is gaining credibility in the model transformation field, and tools are increasingly used to specify transformation activities. However, their use is often limited by special features of graph transformation approaches, which might not be familiar to experts in the modeling domain. On the other hand, trans- formations for specific domains may require special constraints to be enforced on transformation results. Preserving such constraints by manual definition of graph transformations can be a cumbersome and error-prone activity. We explore the prob- lem of ensuring that possible violations of constraints following a transformation are repaired in a way coherent with the intended meaning of the transformation. In par- ticular, we consider the use of transformation units within the DPO approach for intra-model transformations, where the modeling language is expressed via a type graph and graph conditions. We derive additional rules in a unit from a declarative rule expressing the principal objective of the transformation, so that the constraints set by the type graph and the graph conditions hold after the application of the unit. The approach is illustrated with reference to a diagrammatic reasoning system. Keywords: DPO, automatic generation, model transformation. 1 Introduction Graph rewriting-based tools are increasingly used in the field of model transformation. However, their use is often limited by the special features of the different graph transformation approaches, which might not be familiar to experts in the modeling domain. On the other hand, transforma- tions for specific domains may require constraints to be enforced on the results of the transfor- mation. In this paper we explore the problem of ensuring that possible violations of constraints are managed in a way coherent with the intended meaning of the transformation. We consider horizontal (or in-place) model transformations which destructively update a model expressed in a given language, for the case where the modeling language is expressed via a type graph and a set of graph conditions. In particular, we study transformations in reasoning pro- cesses deriving inferences via logical steps creating or deleting model elements. While modelers are generally clear on what they want to achieve by defining a transformation, the evaluation of all of its consequences may be complex, and the definition of the implied preserving or enforcing actions cumbersome and error-prone. We propose an approach to the automatic construction of transformation units achieving the effect of an intended model transformation while ensuring that all conditions are satisfied at the end of the unit if they held at its start. We consider transformations consisting of the creation or deletion of elements of a specific type, expressed as principal declarative rules. As their appli- 1 / 14 Volume 29 (2010) Preserving constraints cation may violate some conditions, they have to be applied in a proper (condition preserving) context, or (condition enforcing) repair actions have to be taken to restore the satisfaction of such conditions. Hence, additional rules are defined, derived from the principal one and the conditions to be enforced. The approach is illustrated with reference to a diagrammatic reasoning system. Paper organisation. Section 2 discusses related work on constraint preservation in graph trans- formation, and Section 3 provides the relevant formal notions. Section 4 introduces Spider Graphs (SGs) as running example, before presenting the approach in Section 5 and applying it to SGs in Section 6. Finally, Section 7 draws conclusions and points to future developments. 2 Related work Rensink and Kuperus have exploited the notion of nested graphs to deal with the amalgamated application of rules to all matches of a rule. In [RK09], they define a language to specify nested graph formulae. A match can be found from a nested graph rule to a graph satisfying a formula, according to a given morphism, and the application of a composite rule ensues. Their approach is focused on avoiding control expressions when all the matches of a rule have to be applied, while we focus here on preserving constraints with reference to a single match. Bottoni et al. have defined methods to extend single declarative rules for model transforma- tion so that they comply with specific patterns defining consistency of interpretation in triple graphs [BGL08]. They define completions of single rules with respect to several patterns, while we are interested here in constructing several rules, navigating along different sets of constraints. Taentzer et al. have proposed the management of inconsistencies among different viewpoints of a model in distributed graph rewriting. For example, the resolve strategy requires the def- inition of the right-hand sides of rules to be applied when the left-hand side identifying the inconsistency is matched [GMT99]. The detection of inconsistencies between rules representing different model transformations has been attacked by static analysis methods in [HHT02]. Sim- ilarly, Münch et al. have added repair actions to rules in case some post-conditions are violated by rule application [MSW00]. In all these cases, actions were modeled through single rules. Habel and Pennemann [HP09] unify theories about application conditions from [EEHP06] and nested graph conditions from [Ren04], lifting them to high-level transformations. They transform rules to make them preserve or enforce both universal and existential conditions. Their approach leads to the generation of a single rule incorporating several application conditions derived from different conditions with reference to the possible matches of the rule on host graphs. In his dissertation [Pen09], Pennemann expands on the topic, also introducing programs with interfaces, analogous to transformation units, but allowing passing of matches. In [OEP08], Orejas et al. define a logic of graph constraints to allow the use of constraints for language specification, and to provide rules for proving satisfaction of clausal forms. The idea of introducing basic rules derived from entities and associations defined in a meta- model is exploited in [BQV06] to define constraints on the interactive composition of complex rules, by allowing their presence in the rule left or right-hand sides only in accordance with their roles in the meta-model, where only the abstract syntax is taken as a source of constraints. Ehrig et al. describe a procedure, exploiting layers, which derives a grammar to generate (rather than transform) instances of the language defined by a meta-model with multiplici- Proc. GT-VMT 2010 2 / 14 ECEASST ties [EKTW06]. Satisfaction of OCL constraints is checked a posteriori on a generated instance. 3 Background For a graph G = (V (G), E(G), s,t), V (G) is the set of nodes, E(G) ⊂ V (G)×V (G) the set of edges and s,t : E →V the source and target functions. In a type graph T G = (VT , ET , sT ,t T ), VT and ET are sets of node and edge types, while sT : ET →VT and t T : ET →VT define source and target node types for each edge type. G is typed on T G via a graph morphism type : G → T G, where typeV : V →VT and typeE : E → ET preserve sT and t T , i.e. typeV (s(e)) = sT (typeE (e)) and typeV (t(e)) = t T (typeE (e)). |V (G) |t is the number of nodes of type t ∈VT in G. A DPO rule [EEPT06] consists of three graphs: left- and right-hand side (L and R) and inter- face graph K. Two morphisms1 l : K → L and r : K → R model the embedding of K (containing the elements preserved by the rule) in L and R. Figure 1 shows a DPO direct derivation diagram. Square (1) is a pushout (i.e. G is the union of L and D through their common elements in K), modeling the deletion of the elements of L not in K, while pushout (2) adds the new elements, i.e. those present in R but not in K. Figure 1 also illustrates the notion of negative application condition (NAC), as the association of a set of morphisms ni : L → Ni, also noted NAC −→n← L, with a rule. A rule is applicable on G through a match m : L → G if there is no morphism qi : Ni → G, with Ni in NAC, commuting with m (i.e. qi ◦ni = m). We exploit the partial order ≤ induced, up to isomorphisms, by monomorphisms on the set of graphs, i.e. g1 ≤ g2 ⇔∃m : g1 ↪→ g2. Nk qk // 6= . . . N1 q1 ,, 6= Ln1 oo nktt m �� (1) K (2) loo r // k �� R m∗ �� G Dfoo g // H Figure 1: DPO Direct Derivation Diagram for rules with NAC. Graph conditions allow the specification of models by forbidding the appearance of certain subgraphs, or by enforcing others to appear in given contexts. We use here a class of conditions Q similar to those in [HP09], where a condition over a graph A is either of the form true or of the form ∃(a, q), with a : A → Q a morphism from A to some graph Q and q a condition over Q. Conditions are also obtained by using the Boolean connectives ¬ and ∨, and can be written in the form ∀(a, q), equivalent to ¬∃(a,¬q). We assume that all conditions in a set Θ ⊂ Q differ for the a morphism, so that (a1, q1), (a2, q2) ∈ Θ ⇒ (A1 6' A2)∨(Q1 6' Q2). We will also use the short forms ∃(Q) for ∃(a : /0 → Q,true) and @(Q) for ¬∃(a : /0 → Q,true). We restrict here to positive conditions of types ∃(Q) or ∀(a : /0 → Q, q), noted ∀(Q, q) with q = ∨ j∈J q j : Q →Wj a disjunction of existential conditions. In this case, all the conditions of the form ∃(Qi) ∈ Θ can be collapsed into a single condition ∃Q, with Q the colimit of all Qi on the diagram constructed with all pairwise maximal common subgraphs. Simple negative conditions have the form @(Q). Definition 1 Given a graph G, we say: 1 In this paper, when we speak of morphisms, we will always consider them injective. 3 / 14 Volume 29 (2010) Preserving constraints • A morphism m : X → G satisfies a condition C, (m |= C), iff one of the following holds: 1. C = true. 2. C = ∃(Y ) and Y ≤ X . 3. C = ∀(X , ∨ j∈J q j : X →Wj) and ∃m j : m(X ) →Wj s.t. q j = m j ◦m for some q j. 4. C = @(Y ) and Y 6≤ X . 5. C = C1 ∨C2 and m |= C1 or m |= C2. • A graph G satisfies C (G |= C), iff one of the following holds: 1. C = true. 2. C = ∃(Y ) and there exists m : Y → G s.t. m |= C. 3. C = ∀(X , q) and for each m : X → G, m |= C. 4. C = @(Y ) and there is no morphism m : Y → G. 5. C = C1 ∨C2 and G |= C1 or G |= C2. We say that a graph G typed on T G is a model for Θ, noted G |= Θ, if for each Ci ∈ Θ, G |= Ci. We assume Θ to be a consistent set of conditions, whose models are finite non-empty graphs; in particular, simple graphs, with no two instances of the same edge type between two nodes. Transformation units control rule application through control words over rule names [KKS97]. Given: 1) G the class of typed graphs; 2) R the class of DPO rules with NACs on G ; 3) =⇒ the DPO derivation relation; 4) E a class of graph expressions (here defined by type graphs and graph conditions), where the semantics of an expression e is a subclass sem(e)⊂G ; 5) W a class of control words over identifiers of rules in R exploiting single rules, the sequential construct ‘;’, the iteration construct w∗, with w ∈ W , the alternative choice ‘|’; a transformation unit is a construct T U = (e1, e2, P, imp, w), with e1, e2 ∈ E initial and terminal graph class expressions, P ⊂ R a set of DPO rules, imp a set of references to other, imported, units, whose rules can be used in the current one, and w ∈ W a control word enabling rules from P, and units from imp, to be applied. TUs have a transactional behaviour, i.e. a unit succeeds iff it can be executed according to the control condition; it fails otherwise. The semantics of a T U is the set sem(T U ) = {(g1, g2) | g1 ∈ sem(e1), g2 ∈ sem(e2), g1 T U↓ =⇒ g2}, where ↓ indicates successful termination. 4 A Running Example: Spider Diagrams and Spider Graphs Spider Diagrams are a reasoning system based on Euler diagrams. Several variants exist, differ- ing in syntax and semantics [HMT+01]. We adopt a simplified version, based on Venn, rather than Euler, diagrams and omitting shading and strands. We first provide an indication of the con- crete syntax of the diagrams and an informal semantics. Then we propose a graph-based abstract model for them, called Spider Graphs, which differs from the usual algebraic abstract models and is in fact slightly closer to the concrete model, even modelling spider’s feet. Let C = {C1, . . . ,Cn} be a collection of simple closed curves in the plane with finitely many points of intersection between curves. A zone is a region of the form X1 ∩···∩Xn, where Xi ∈ Proc. GT-VMT 2010 4 / 14 ECEASST {int(Ci), ext(Ci)}, the interior of Ci or the exterior of Ci, for i ∈{1, . . . , n}. If each of the 2n possible zones of C are non-empty and connected then C is a Venn diagram (see [Rus97] for more details). Each zone z defines a unique partition of the set C, according to whether z is inside or outside a curve. Two zones are called twins if their inside and outside relations are switched for exactly one curve. In this paper, a Spider Diagram is a Venn diagram whose curves are labelled, together with extra syntax called spiders, which are trees whose vertices (called feet) are placed in unique zones. The set of zones containing a spider’s feet is called its habitat. Special arcs, called ties, can be drawn between feet of different spiders in the same zone. Intuitively, each curve represents a given set (indicated by the label) and each zone represents some set intersection. A spider indicates the existence of an element within the set determined by its habitat, whilst a tie between a pair of feet of different spiders within a zone indicates equality of elements, if both spiders represent an element in the set represented by the zone. Figure 2 (left) shows an example of a Spider Diagram, with two curves {A, B} and four zones described by {({A},{B}), ( /0,{A, B}), ({B},{A}), ({A, B}, /0)}. Here, these zones are the four minimal region of the plane determined by the curves; for example, the zone described by ({A},{B}) is the region int(A)∩ext(B) which is inside A but outside B. The habitat of spider s is the set of zones {({A},{B}), ({A, B}, /0)}, while that of t is the singleton {({A, B}, /0)}. In- formally, the diagram semantics is: there are two sets A and B, there exists an element named s in A and an element named t in A∩B. Moreover, if s is in A∩B then s = t. Figure 2: A Spider Diagram on the left, with the corresponding Spider Graph on the right. We provide here an abstract graph-based model of a Spider Diagram, called a Spider Graph, not taking into account its concrete geometry. Since we are interested here only in syntactic aspects, we do not consider the labeling of the curves. We obtain the type graph of Figure 3 (left), where nodes represent the diagram elements Curve, Foot, Spider and Zone, and edges represent relations between them. A twin edge indicates that two zones are twins w.r.t. some curve and an inside/outside edge indicates whether a curve contains/excludes a zone, respectively. In Figure 2 (right) the Spider Graph associated with the Spider diagram on the left is shown. The names of the nodes show the correspondence with the objects in the diagram. We have two curve nodes in each possible relation with four zones2. For ease of reading, the zone nodes are given names consisting of a list of the lower case letters corresponding to the upper case letters used as names of the curves the zones are inside, and we use O for the name of the node corresponding to the zone outside all curves in the diagram. Zone node pairs ab and b, and O and a are twinned due to curve A, whilst ab and a, and O and b are twinned due to curve B. We now present the conditions completing the definition of the class of Spider Graphs. Fig- 2 To keep the graph simple, we have omitted the outside edges, which are complementary to the inside ones. 5 / 14 Volume 29 (2010) Preserving constraints Figure 3: The type graph (left) and negative conditions (right) for Spider Graphs. ure 3 (right) shows a set of conditions of the form @Q, presented as forbidden graphs. They prevent duplication or inconsistency of information and state the uniqueness of relations be- tween zones and curves. Moreover, we assume the existence of all negative conditions forcing the graphs to be simple. We omit the direction of edges and their labels, when understood from the type graph, and use the abbreviations i and o for the inside/outside case. The remaining conditions force the existence of a partition of the set of curves for all zones, and require suit- able contexts for zones and feet. We present them adopting a visual syntax where a condition ∃(a : A → Q, q) is represented by a box, separated into two parts by a horizontal line, with the top part containing a depiction of the morphism a and the bottom part containing a box depicting the condition q on Q. An empty bottom box corresponds to true. Each condition box has an exter- nal tab containing either quantifier information or the boolean connective ∨,∧ or ¬. As we use conditions with A = /0, we only present Q and we do not repeat Q in the depiction of q. Numbers indicate identification in the morphisms, while not numbered nodes indicate a hidden existential quantification, as usual. Edges between identified nodes are also assumed to be identified in the morphisms. The class of Spider Graphs is the intersection of the languages defined by the type graph and the negative conditions of Figure 3, and the positive conditions in Figures 4 to 6. Figure 4: Conditions on single elements. Reasoning rules are derived on top of the algebraic abstract models for Spider diagrams. These are syntactic transformations whose application corresponds to logical deduction, according to the semantics. They are usually specified by complex algorithmic procedures, during which the intermediate diagrams may not be logical consequences of the premise diagram, with pre and post conditions taking into account the stated semantics of the diagram. For instance a rule to add a new curve must split every zone into two zones, one inside and one outside each existing zone, as well as duplicating spider’s feet in zones. Whereas the first effect derives from the syntactical conditions, the second is a semantic aspect. Proc. GT-VMT 2010 6 / 14 ECEASST Figure 5: Conditions on pairs of elements. Figure 6: Conditions on existence and uniqueness of twins. 5 Condition preserving rules We discuss the derivation of a condition-preserving transformation unit T U tg for the generation of an element of type t. The initial and terminal expressions e1 and e2 for T U tg define the class of graphs typed on T G and satisfying Θ. T U tg is associated with the execution of 3 r : /0 ← /0 → t and is constructed so that given a graph G ∈ sem(e1), for G T U tg↓ =⇒ H, (G, H) ∈ sem(T U tg), and G ≤ G + t ≤ H, where + indicates the pushout along the empty subgraph. Note that in general G + t 6|= Θ, but G + t |= Θ′ for some Θ′ ⊂ Θ. Hence, we admit that some conditions may not be satisfied at intermediate steps of the unit application, and define an operational class in which to perform transformations. Graphs in this class satisfy a subset of the graph conditions and may be typed on some T G′ with additional types and edges w.r.t. T G. In particular, we use here the subset Θ′ containing ∃(Q) and all the conditions @(Qi) in Θ. Before presenting the algorithm, we give its rationale. We only have to consider universal and negative existential conditions, as positive existential conditions cannot be violated by adding an element. However, adding t produces a graph G + t which may not satisfy Θ in two ways: either it contains a forbidden subgraph, or it provides a new match for the premise of a universal condition, but it fails to satisfy the conclusion. 3 Here and in the rest of the paper, t denotes the graph consisting of a single node of type t. 7 / 14 Volume 29 (2010) Preserving constraints To solve the first problem, given4 a rule r : L → R in T U tg (including r : /0 → t ), for each condition @(X ) ∈ Θ, the function genNAC(r, X ) adds to r the set of NACs formed according to the construction in Figure 7 (left). Here M j is a maximal common subgraph of R and X and M′j is a maximal common subgraph to M j and L, s.t. all the squares are pushouts. Hence, L → X′j ← X j is the pushout for L ← M′j → X j, with the second morphism given by arrow composition. The set of NACs contains all the morphisms n′j : L → X ′ j preserving the image of L in X j. This prevents the application of r on a match which could create the forbidden subgraph X (see [HHT96]). M′j // �� M j �� // X �� L r // ��@ @@ R �� L r // n′j (( QQQ QQQ QQQ 11 R // X j �� Mh >>}}}} A AA Lh rh B BB X′j X ??~~~ // Rh Figure 7: Constructing NAC (left) and incorporating available context (right). To solve the second problem, given a (universal) condition C = ∀(Q, ∨ j∈J q j : Q → Wj), s.t. t ≤ Q, the function genU niRules(C) produces the set of rules R(C) where each rule has the form NAC(C) −→n← Q rC. j→ Wj. T U tg will contain an alternative choice among these rules, produced by the function alt(R(C)). In order to prevent these rules from being applied indefinitely in case of iteration on the choice, NAC(C) contains a copy of each Wj so the same match is not reused twice. Intuitively, these rules adjust the relations of the newly added element w.r.t. the contexts defined in their premises. However, several aspects have to be taken into account. For example, consider conditions C2 in Figure 4 and suppose we want to add a Spider. Then, the derived rule will have to create a Foot (condition C2), but this will require a Zone (condition C3), which will require a Curve (condition C4), hence other additional Zones (conditions C8 and C9), with several relations to other curves and zones (conditions C10−C12). On the other hand, a Zone for a Foot is already guaranteed to be present by C1, so that one can reuse existing context to satisfy this. To deal with such situations, given a rule r : L → R and a context X to be reused (more on this later), the function reuseContext(r, X ) produces a collection of rules of the form rh : Lh → Rh according to the construction in Figure 7 (right). Here, L → Lh ← X is the pushout along a maximal common subgraph Mh of L and X and X → Rh ← R is the pushout of X ← Mh → R. In general, one wants to obtain a T U tg which, after applying r : /0 → t to G, proceeds through the following abstract steps, so that context is progressively constructed for the next step. 1. define all edges between the added node and existing nodes of G as required by conditions; 2. generate new nodes as required by the conditions; 3. generate all edges for the new nodes, as required by the conditions. For example, when adding a Curve, one has to: 1) define relations between the new curve and existing zones; 2) create new zones, while defining relations with the new curve; 3a) establish relations between new zones and existing curves; 3b) establish relations between zones. 4 Where not needed, we will omit K. Proc. GT-VMT 2010 8 / 14 ECEASST Two things have to be considered. In general, satisfaction of ∀(Q, q) requires iterating through all possible matches for Q. However, when Q consists of just one node, no iteration is necessary, and if Q is the graph t , the derived rule has to be applied only to the newly added node, as it is already satisfied for the nodes of type t which were in G originally. Hence, we extend T G to admit a special type of loop edge: the first rule is changed to r : /0 → t †, where t † designates a node with a marker loop. For a rule5 r : L → R, the function mark(r) produces a set P†r = {r † h : L † h → R † h | h : t → L} where L † h and R † h are obtained by adding the loop to the images h(L) and r◦h(L), the immersions mh : L ↪→ L†h and m ′ h : R ↪→ R † h preserve such images, and r † h is the unique morphism s.t. L†h r†h→ R†h m′h← R is the pushout of L†h mh← L r→ R. T U tg will apply r or rules from P†r in different situations. The rule delLoop : t † → t will conclude T U tg deleting the loop. Moreover, as in the examples above, some rules create new nodes if they cannot be provided by the context, and so conditions relative to the new nodes have to be satisfied. This potentially creates a situation in which an infinite recursion might start. To avoid this, we study the relations between types for which conditions are mutually recursive. In our example one such pair consists of Curve and Zone. Indeed, generating a curve implies the generation of a collection of zones, whilst the generation of a zone can imply the generation of a single curve and of the collection of zones related to the new curve: we need to distinguish between situations in which context, enriched with the new node which has started the process, has to be reused, and those in which a new node is needed to provide the correct context. Definition 2 provides the needed notation. Definition 2 Let t ∈ VT be a type and Q(t) ⊂ Θ the set of conditions of the form Op(a : A → Q, q), for Op∈{∃,∀, @} s.t. t ≤Q (i.e. a node of type t appears in Q). {Q∃(t), Q∀(t), Q@(t)} is a partition of Q(t) into existential6, universal and negative existential conditions for t, respectively. V∃T = {t | t ≤ Q} is the set of existentially quantified types. A partial order ≤C is induced on Q∀(t) by (C1