Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars Electronic Communications of the EASST Volume 39 (2011) Graph Computation Models Selected Revised Papers from the Third International Workshop on Graph Computation Models (GCM 2010) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars Berthold Hoffmann, Mark Minas 20 pages Guest Editors: Rachid Echahed, Annegret Habel, Mohamed Mosbah Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars Berthold Hoffmann1, Mark Minas2 1 hof@informatik.uni-bremen.de Universität Bremen Postfach 330 440 28344 Bremen, Germany and Fachgebiet Sichere Kognitive Systeme DFKI Bremen Enrique-Schmidt-Str.5 28569 Bremen, Germany 2 Mark.Minas@unibw.de Universität der Bundeswehr München 85577 Neubiberg, Germany Abstract: In model-driven software engineering, class diagrams are used to define the structure of object-oriented software and valid object configurations, i.e., what objects may occur in a program and how they are related. Object configurations are essentially graphs, so that class diagrams define graph languages. Class diagrams are declarative, i.e., it is quite easy to check whether a graph is an instance of a class diagram. Graph grammars, on the other hand, define a graph language by derivation and are thus well suited for constructing instance graphs. This paper describes how a class diagram can be translated into a graph grammar that defines the same graph language as the original class diagram. Such a graph grammar may then be used for, e.g., automatically generating valid object configurations as test cases. In contrast to earlier attempts, the presented approach allows to translate class diagrams with arbi- trary multiplicities, unique and non-unique associations, composition associations, class generalization, association subsetting and redefinition. This is made possible by using adaptive star grammars, a special kind of graph grammars. Keywords: Class diagram, Graph grammar, Adaptive star grammar 1 Introduction The model-driven design of software relies on the precise specification of models, often with diagrams in the uniform modeling language, UML. Class diagrams are the sub-language of UML for specifying the structure of object-oriented classes and possible configurations of objects that are instances of these classes. Configurations primarily consist of sets of class instances and links between them. Links are instances of the class diagram’s associations. Configurations can be considered as graphs, called instance graphs in the following: each object corresponds to a node, 1 / 20 Volume 39 (2011) mailto:hof@informatik.uni-bremen.de mailto:Mark.Minas@unibw.de Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars and each link corresponds to a directed edge where source and target are used to distinguish the two roles of the (binary) link. The set of all object configurations that are compatible with a class diagram, therefore, corresponds to a language of graphs, i.e., a class diagram specifies a graph language. It is easy to check whether a given instance graph is compatible with a class diagram. In con- trast, constructing sample instance graphs for a class diagram is rather difficult. This, however, is important to generate test cases for a software model. In [EKT09], graph transformation rules have been used to do this. Here we use an adaptive star grammar [DHJM10, DHJ+06, DHM08] for this purpose. The approach described in [EKT09] supports only restricted meta models (e.g., class diagrams with constraints): it does not distinguish unique from non-unique associations, considers only very restricted association multiplicities, and does not cover composite associa- tions or associations with redefined or subsetted association ends. Adaptive star grammars, as shown in this paper, allow for a fairly straight-forward treatment of all of these concepts. More- over, adaptive star grammars do not need additional control mechanisms like negative application conditions or prioritizing some rules over others by rule layers as it is necessary in [EKT09]. The rest of the paper is structured as follows. We briefly introduce adaptive star grammars in the next section before we recall some properties of UML class diagrams in Section 3. In Section 4, the main part, the rules generating instance graphs of class diagrams are defined, explained, and illustrated with an example. The conclusions (Section 5) mention some related and future work. 2 Adaptive Star Grammars Graph grammars generalize the idea of Chomsky grammars to graphs: A set of rules defines how the graphs of the language can be derived by applying them to a given initial graph. Node replacement and hyperedge replacement [Eng99] have been studied most thoroughly as grammars for deriving graph languages. Their rules remove a nonterminal node, and attach a replacement graph to its neighbor nodes. The sort (i.e., the label) and the direction of the edge connecting a neighbor to the nonterminal determine completely how the neighbor is attached to the replacement graph; in hyperedge replacement, the number of neighbors is even fixed for every nonterminal. Both formalisms specify context-free compositions of graphs in the sense of [Cou87]; however, they fail to define even simple languages, such as the class of all graphs. Adaptive star grammars [DHJM10] overcome these limitations by means of a cloning mecha- nism that allows the neighbors of a nonterminal to be attached to the replacement graph in an arbitrary fashion. Let us briefly define the concepts needed; for more detailed definitions, see [DHM08] and for a more thorough discussion [DHJM10]. Graphs. Let the set C of sorts be the disjoint union of finite disjoint sets Ċ and C̄ for labeling nodes and edges, respectively. We distinguish a subset N ⊆ Ċ of nonterminal names. A graph G = 〈Ġ,Ḡ,sG,tG, ˙̀G, ¯̀G〉 consists of finite sets Ġ and Ḡ of nodes and edges, respec- tively, source and target functions sG,tG : Ḡ → Ġ, and functions ˙̀G : Ġ → Ċ and ¯̀G : Ḡ → C̄ assigning a sort to each node and edge, respectively. A node x ∈ Ġ is called nonterminal if `G(x)∈ N, and terminal otherwise. GCM 2010 2 / 20 ECEASST For a node x in G, the subgraph consisting of x and its adjacent nodes and incident edges is denoted by G(x). A border node of x is a node in Ġ(x)\{x}. Note that graphs may have several edges with the same source and target; such edges are called parallel. Parallel edges with the same label are called indistinguishable. In the following, we assume that the reader is familiar with common graph terminology, such as subgraph, disjoint union and isomorphism. Rules and Replacement. Adaptive star grammars are based upon a simple kind of graph trans- formation that replaces a subgraph G(x) by another graph. For this, define a rule r = 〈y,R〉 to be a pair consisting of a graph R with a distinguished node y ∈ Ṙ. We call R(y) and R\{y} the left- and right-hand side of r, respectively. Let G be a graph with a node x ∈ Ġ such that G(x)∼=g R(y) for some isomorphism g. Then the graph H = G[x /g r] is obtained from the disjoint union of G and R by identifying R(y) with G(x) according to the isomorphism g and removing x and its incident edges. Multiple Nodes. To make graphs (and rules) adaptive, we distinguish a subset G̈ ⊆ Ġ of ter- minal nodes in a graph G as multiple nodes, similar to the set nodes of Progress [SWZ99]. A multiple node x represents any number of ordinary nodes, which are called clones of x. (The nodes Ġ\G̈ are called singular.) In figures, a multiple node is distinguished by drawing it with double lines (see Example 1 below). A graph that does not contain any multiple node is said to be singular. Note that nonterminal nodes are always singular. Cloning. Let G be a graph. A function ρ : G̈ → N is a replicator for G. The graph Gρ is obtained from G by cloning each node x ∈ G̈ according to ρ , by replicating x and its incident edges ρ(x) times. If ρ(x) = 0, then x and its incident edges are simply deleted. Adaptive Star Grammars. Call a graph simple if it contains neither adjacent nonterminal nodes nor indistinguishable edges. A graph of the form G(x) is a star if it contains neither loops nor indistinguishable edges, x is nonterminal and its border nodes are terminal. An adaptive star rule over C is a rule r = 〈y,R〉, where R is a simple graph with sorts in C, and R(y) is a star. A clone of r is a rule 〈y,R′〉 such that R′ is simple, and there is a replicator ρ for R such that R′ can be obtained from Rρ by identifying some of the border nodes of y with each other (where, of course, only nodes of the same sort can be identified). An adaptive star grammar (ASG) is a system Γ = 〈C,P,Z〉, where P is a finite set of adap- tive star rules, and Z is the initial star, which has no multiple border nodes. (All sorts are taken from C.) Given a graph G, we write G =⇒P H if H = G[x /r] for some node x ∈ G and a clone r of an adaptive star rule in P. The adaptive star language generated by Γ is the set of all terminal graphs G that can be derived from Z: L (Γ) ={G | Z =⇒+P G and ˙̀G(x)∈ Ċ\N for all x ∈ Ġ}, where =⇒+P denotes the transitive closure of =⇒P . Late Cloning. In the terminology of [DHJM10], these definitions make use of early cloning, where neither the graphs in derivations, nor the clones of rules contain multiple nodes. However, it requires to “guess” in advance how many clones of a multiple node must be made, which is not always adequate. Especially for constructing derivations, it is better to do cloning as late as possible. In the following, we will use late cloning, a corresponding way of constructing derivations, which has been considered in [DHJM10] as well. 3 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars A late replicator ρ̈ : G̈ → N×N sends multiple nodes to pairs of numbers that represent the numbers of singular and multiple nodes that shall be made of a multiple node, respectively. The graph Gρ̈ contains, for every multiple node x ∈ G̈ with ρ̈(x) = (n,m), n + m clones, whereof m are designated as multiple. In order to apply an adaptive rule r = 〈y,R〉 to a graph G with late cloning, a combined late replicator ρ̈ : G(x)∪R(y) → N×N is used to clone both G and r so that G(x)ρ ∼= R′(y) where R′ is simple and obtained from Rρ by identifying some of the border nodes of y. A step using late cloning is denoted as G ⇒P H again; H is obtained by applying 〈y,R′〉 to Gρ . Like early cloning, late cloning does not restrict the language of singular graphs generated by an adaptive star grammar [DHJM10]. 1 Example 1 (The Language of Unlabeled Graphs) As an example, consider the adaptive star grammar Γ which is given by Γ = (C,P,Z), where Ċ ={ε,A}, C̄ ={λ}, Z = A , and P = { A ::= A , A ::= A , A ::= } . A rule 〈y,R〉 is drawn as lhs ::= rhs where lhs = R(y) and rhs = R\{y}. The grammar derives arbitrary graphs without loops over the sorts ε and λ , which are not drawn, that is, the class of finite unlabeled graphs. Starting with Z, the first rule makes it possible to add an arbitrary number of border nodes. The second rule adds edges between border nodes, and the third removes the nonterminal node. Let us briefly discuss a major difference between the definition of adaptive star grammars used here and the one in [DHJM10]. The definition of stars and star rules used here is more general since stars are allowed to have parallel (but no indistinguishable) edges which is not allowed in [DHJM10]. Moreover, cloning a star rule r = 〈y,R〉 prior to application may involve taking a quotient that identifies border nodes of y in Rρ with each other like in [DHM08]. This may sound alarming, as it was shown in [DHJ+06] that quite a similar type of adaptive star grammars can generate all recursively enumerable languages. However, note that the resulting graph R′ – after cloning and taking the quotient – must be simple. In particular, R′ must not contain indistinguishable edges. The effective rule 〈y,R′〉, therefore, has simple left-hand and right-hand sides. As a consequence, adaptive star grammars of the sort defined above can be simulated by ordinary ones (i.e., those in [DHJM10]) by using subsets of C̄ to label edges in stars (without parallel edges) and turning every rule into a finite number of rules (corresponding to the allowed quotients). The definition used here just simplifies writing star grammars without extending their power. 3 Class Diagrams Class diagrams are a well-known graphical language of the UML. The focus of this paper is on the specification of graphs without attributes. Hence, we ignore method and attribute specifica- 1 In that paper, we consider conditions under which a late replicator is minimal; this is not necessary here. GCM 2010 4 / 20 ECEASST tions within classes. They can be easily added if required. Moreover, we ignore associations with cardinality greater than 2. The class diagrams used in this paper consist of classes and (binary) associations between them. Concrete classes are distinguished from abstract classes. The former are drawn as rect- angles with white background, the latter with gray background and italic class names. Each association has a name and is drawn as a directed edge in order to distinguish the association end-points.2 Each end-point of an association is equipped with a multiplicity of the form u..v where u ∈ N0 is the lower bound and v ∈ N∪{∗} the upper bound such that u 6 v if v ∈ N. As usual in this context, ∗ stands for “infinity”. An association may be declared unique (the default) or non-unique, indicated by the annotation ’{non-unique}’. We distinguish regular associations (the default) from composite associations. The latter have a black diamond at one of their end- points. The multiplicity at this end-point is either 0..1 or 1..1. The class at the end-point with the diamond is called composite, the class at the other end-point is called its part. Edges represent- ing composite associations are always directed towards the part class, and they may be loops. However, no object can be part of two objects, and no object is (directly or indirectly) part of itself. Moreover, class diagrams may contain generalization arrows, which have triangular arrow heads and point from sub-classes to super-classes. Each class may have an arbitrary number of sub-classes and super-classes. However, generalization arrows are not allowed to form cycles. We usually extend the notion of sub-classes and super-classes to all classes that are reachable by chains of generalization arrows (including chains of length 0). We also adopt features of UML 2 and allow association ends to redefine or subset other association ends. These features are explained in Sections 4.4 and 4.5. A class diagram specifies graphs by the mechanism of instantiation: A node is an instance of a class iff it is labeled with the class name. Note that each node is an instance of exactly one class. A more general concept is defined as follows: A node is called member of a class C iff the node is an instance of C or any of its sub-classes. An edge e is an instance of an association from a class C1 to a class C2 if e is labeled with the association label, and if the source and target nodes are members of C1 and C2, respectively. Moreover, each association defines a multiplicity constraint: Let a be an association from class C1 to class C2 with multiplicities u..v at its source end-point and multiplicity r..s at its target end-point. The multiplicity constraint of a is satisfied iff no member of C1 has less than r or more than s outgoing a-instances as edges and if no member of C2 has less than u or more than v incoming a-instances as edges. The graph language specified by a class diagram consists of all graphs that satisfy the follow- ing conditions: C1: Each node is an instance of a concrete class, and each edge is an instance of an association. C2: The multiplicity constraints of all associations are satisfied. C3: No two instances of any unique association are indistinguishable edges. C4: The subgraph induced by composite edges (i.e., instances of composite associations) must be a collection of trees. 2 These directed edges should not be confused with navigation arrows, which solely represent implementation issues in object-oriented programming. 5 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars Such graphs are called instance graphs in the following. This definition follows the UML specification when nodes are considered as objects and edges as links [OMG]. 4 An Adaptive Star Grammar for Instance Graphs In the following, we assume an arbitrary, but fixed class diagram. We first describe how this class diagram can be translated into an ASG defining the same graph language. The set of terminal node sorts, hence, must consist of the names of all concrete classes. The set of edge sorts must contain all association names, but also additional edge sorts that will be used for labeling edges between nonterminal and terminal nodes. These edge sorts, but also nonterminal nodes sorts, will be introduced as needed in the following. The idea of translating a class diagram into an adaptive star grammar defining the same graph language is to start from a graph (called actual initial graph, AIG, in the following) that closely resembles the class diagram. We add an initial rule that derives the AIG from the initial star that consists of just a single node with a unique nonterminal label. For each concrete class, the AIG contains a multiple node labeled with the class name. Cloning these nodes in a derivation creates the instances of the corresponding classes. The AIG’s other nodes are nonterminal nodes repre- senting the associations of the class diagram: Each regular association is represented by a single node with a unique nonterminal label. It is connected to all multiple nodes of those concrete classes that participate in the association, possibly by being a sub-class of a class at an endpoint of the association. A set of rules is responsible for eventually creating all edges representing instances of the association, i.e., links between objects. The rules make sure that conditions C2 and C3 (see Sect. 3) are satisfied. The construction is more complicated for composite associa- tions since it must make sure that condition C4 is not violated. Actually, each maximal connected subgraph of the class diagram consisting of composite associations and generalizations only has to be represented by a nonterminal node with a unique label. Rules must be defined that create all possible trees consistent with the class diagram. These constructions are described in the following. But first, we introduce some notation that makes drawing of rules easier: In rules, corresponding nodes of the left- and right-hand sides are associated by their positions in the drawing. Terminal nodes in rules are always unlabeled; on the left-hand side of a rule, they match terminal nodes with arbitrary labels. Because none of the following rules introduces new terminal nodes on the right-hand side (new terminal nodes are rather created by cloning), there is no need for labeling them. Finally, we use a simplified notation for bundles in graphs as shown in the following illustration: For edge labels a1,...,an, bundle (a) is drawn as (b), and as (c) if a1 = ···= an = x. A ··· a1 ··· an a1 ··· an A ··· A x (a) (b) (c) The next two sub-sections describe the construction for regular associations, the first one for associations declared “non-unique”, the second for “unique associations”. The presented con- struction must be repeated for each regular association within the class diagram. Afterwards, we GCM 2010 6 / 20 ECEASST discuss the construction for composition associations, and finally present an example. 4.1 Non-Unique Regular Associations Consider an association U W r..s u..v{non-unique} c where r,s,u,v ∈N0,r 6 s,u 6 v within the class diagram. Classes U and W may be arbitrary classes, U and W may even reference the same class, or U may be a sub-class of W or vice versa. We assume the upper bounds s and v to be finite, i.e., different from “∗”. However, the following construction can be extended easily to the case s =∗ and/or v =∗ as we will discuss briefly at the end of this subsection. Let U and W be the sets of concrete sub-classes of U and W , respectively. Note that U and W contain the classes U and W , respectively., if they are concrete. Note also that U∩W contains all concrete classes being sub-classes of both U and W . Each instance of a class in U\W has u..v outgoing, but no incoming c-edges, each instance of a class in W\U has r..s incoming, but no outgoing c-edges, and each instance of a class in W∩U has u..v outgoing as well as r..s incoming c-edges. Since the association is non-unique, indistinguishable c-edges are permitted. The following construction assures these properties. Let {U1,...,Uk} = U\W, {W1,...,Wn} = W\U, and {V1,...,Vm} = U∩W. Note that each of these sets may be empty. Let C be a nonterminal label that is unique for this c-association, and let {x0,x1,...,xv}∪{y0,y1,...,ys} be a set of pairwise distinct edge labels. The AIG must contain the following graph as a subgraph: U1 ... Uk V1 ··· Vm C W1 ... Wn x0 x0 y0 y0 The meaning of the edges labeled xk and yi is the following: An xk-edge connects C with a terminal node that has k outgoing c-edges already, and an yi-edge connects C with a terminal node that has i incoming c-edges already. Note that terminal nodes with labels in W∩U may have incoming as well as outgoing c-edges; they may even have c-loops where each loop counts as an incoming and as an outgoing edge. The number of incoming and outgoing edges of terminal nodes with labels in W∩U is indicated by two edges labeled xk and yi, respectively. We may add another c-edge between two appropriate nodes where the source node must have k < v outgoing c-edges and the target node i < s incoming c-edges. This is the task of the following set of rules rCk,i, 0 6 k < v and 0 6 i < s, which add a new c-edge between two nodes and increment the “counters” realized by edge labels. x0 ... xv y0 ... ys C xk yi rCk,i ::= x0 ... xv y0 ... ys C xk+1 yi+1 c 7 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars Note that such a rule can also be applied in situations where nodes are connected with two parallel edges labeled xk and yi. The corresponding border nodes have to be identified in such cases. A c-loop is added if the two singular border nodes are identified. We can stop adding c-edges if there is no xk-edge and no yi-edge with k < u and i < r; we can then remove the nonterminal node C representing the association. This is realized by the following final rule r̂C: xu ... xv C ... yr ys r̂C ::= xu ... xv ... yr ys The presented construction uses the fact that v and s are finite numbers and different from *. The construction, however, is extended easily to the case v = ∗ and/or s = ∗. Let us assume just v = ∗. The new meaning of an xu-edge connecting C with a terminal node is that the terminal node has at least u outgoing c-edges already. And we change rule rCu,i, so that it no longer creates a new xu+1-edge, but an xu-edge again. It is clear that this construction now creates any number of outgoing c-edges, but at least u, at appropriate nodes. 4.2 Unique Regular Associations We now consider the slightly different situation with a unique instead of a non-unique associa- tion; the other aspects remain unchanged. Let C and C̄ be two nonterminal labels that are unique for this c-association, and let {x,y0,y1,...,ys} be a set of pairwise distinct edge labels. The AIG must contain the following graph as a subgraph: U1 ... Uk V1 ··· Vm C W1 ... Wn x x y0 y0 The meaning of yi-edges is the same as in the previous section, i.e., a terminal node connected to C with a yi edge means that the terminal node has i incoming c-edges already. However, we do not count outgoing c-edges. Instead, an x-edge between a terminal node and C means that the terminal node has not yet any outgoing c-edge. Since indistinguishable c-edges are prohibited, we cannot create c-edges independently from each other like in the previous subsection. Instead, we use rules that create all outgoing c-edges of any singular node in a single derivation step; an x-edge visits those terminal nodes that receive a bundle of outgoing c-edges to pairwise distinct nodes visited by yi-edges. We define a rule rC that selects one singular node visited by an x-edge and some of the terminal nodes visited by yi-edges. The latter nodes will receive new incoming c-edges. Rule rC, therefore, increments the GCM 2010 8 / 20 ECEASST “counters” realized by yi-edges: y0 ··· ys y0 ... ys−1 C x rC ::= y0 ··· ys y1 ... ys C C̄ x x y The rule adds a new nonterminal node labeled with C̄. The following set of rules rC̄k , u 6 k 6 v, adds a bundle of c edges from the node visited by the x-edge to the k nodes visited by the y-edges. C̄ ... ... x y y   k nodes rC̄k ::= ... ... c c Finally, we can remove the C-node as soon as all terminal nodes connected to C with an x-edge have been processed by rule rC. This is the task of rule r̂C: ... yr ys C r̂C ::= ... Like in the previous subsection, these rules use the fact that the upper multiplicity bounds v and s are finite and different from *. The case s =∗ can be realized similarly to the discussion at the end of the previous subsection. As an example, see Sect. 4.7. The case v =∗ is even simpler: The set of rules rC̄k is replaced by a single rule r C̄ that looks like rC̄u+1; however, one of the border nodes visited by an y-edge is turned into a multiple node. Hence, this rule can add bundles with an arbitrary number of c-edges, but at least u. 4.3 Composite Associations With the constructions of the last two subsections, each class diagram whose associations are all regular can be translated into an adaptive star grammar defining the same graph language. Com- posite associations are special since subgraphs induced by composite edges must be collections of trees (condition C4 in Sect. 3). The ASG, hence, must be able to create all such trees. In order to translate this part of a class diagram into an ASG, we have to find those classes whose instances can belong to the same tree. This is done by finding the largest connected subgraphs of the class diagram that contain only composite associations and generalizations. The instance nodes of the classes that belong to the same connected subgraph may, but need not, belong to the same tree. This can be represented in the ASG in the following way: We add a new nonterminal node to the AIG for each of these connected subgraphs, and connect this nonterminal node to each multiple node representing a concrete class within the subgraph. Then we define star rules that create all possible trees consistent with the class diagram. We demonstrate this procedure for just a single composite association as shown here. How- ever, the construction can be generalized easily to more composite associations in the same 9 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars connected subgraph. U W 0..1u..v c where u,v ∈N0,u 6 v Again, classes U and W may be arbitrary classes, U and W may even reference the same class, or U may be a sub-class of W or vice versa. Let the sets U, W, {U1,...,Uk} = U\W, {W1,...,Wn} = W\U, and {V1,...,Vm} = U∩W be defined as in the beginning of Sect. 4.1. Let C,C̄ be two nonterminal labels that are unique for this c-association, and let x,y,z be distinct edge labels. The AIG must contain the following graph as a subgraph: U1 ... Uk V1 ··· Vm C W1 ... Wn x y z The nonterminal node C stands for all trees with instances of classes in U∪W as nodes connected by c-edges; z-edges visit those terminal nodes (roots) that will “receive” u..v outgoing, but no incoming c-edges. Nodes visited by y-edges (inner nodes) will “receive” u..v outgoing and possibly one incoming c-edge. Finally, nodes visited by x-edges (leaves) will “receive” at most one incoming, but no outgoing c-edge. These properties are assured by the following rules. C x y z z rC ::= C̄ C C x y z x y z x yz C r̂C ::= 〈〉 The recursive rule rC selects a singular root (visited by a z-edge) and recursively proceeds with the rest of the roots by adding a new C-node connected to the multiple node representing the rest of the roots. Moreover, rC creates a C̄-node that will be derived by a rule rC̄k,i (see below) to a bundle of c-edges from the root to its children. Those children that are inner nodes become roots of sub-trees, indicated by the other created C-node. The recursion stops as soon as all nodes have been processed, represented by rule r̂C, which replaces a C-node by the empty graph 〈〉. The rules rC and r̂C make sure that each member of U receives an incoming c-edge. However, this is actually not required by the association. Rule r̄C, therefore, allows to turn any inner node into a root and any leaf into a singleton tree node. This rule must be dropped if the multiplicity of association c is 1..1 instead of 0..1 at the composite end-point. C x x y y z r̄C ::= C x z y z GCM 2010 10 / 20 ECEASST Finally, C̄-nodes must be derived to bundles of u..v many c-edges. This is the task of the set of rules rC̄k,i for all i,k ∈N0 so that u 6 i + k 6 v: i nodes   ... ...C̄ zx x y y  k nodes rC̄k,i ::= ... ... c c c c Note that this construction can be easily extended to the situation where v = ∗. The set of rules rC̄k,i must be replaced by the set of rules r ′C̄ k such that 0 6 k 6 u; r ′C̄ k looks like r C̄ k+1,u−k+1, but two of the border nodes, one being visited by an x-edge, the other with an y-edge, are turned into multiple nodes. Hence, these rules can add bundles with an arbitrary number of c-edges, but at least u. 4.4 Class Diagrams with Redefining Associations Class inheritance, i.e., classes that are specializations of others, is one of the original object- oriented concepts. UML 2 [OMG] has extended class diagrams by the concept of specialization for associations. In this paper, we consider redefinition and subsetting. The former is described in this section, the latter in the following. The following class diagram shows an example of a redefined association end. U W X Y r..s u..vc r̄..s̄ ū..v̄d {redefines c.tgt} An end of an association c can be redefined by an end of an association d if d is a specialization of c, i.e., the associated classes must be compatible. This is indicated by the dashed generaliza- tion arrows, meaning that X has to be a sub-class3 of U and Y a sub-class of W . The constraint {redefines c.tgt} marks the redefining association end and declares the redefined association end, here the target end of c. A role name would be used in regular class diagrams. In this paper, we ignore role names and, use src or tgt instead. The meaning of redefined associations is as follows: Associations with their ends correspond to class properties that describe links from objects to objects at the other association ends. Re- definition of association ends actually restricts what objects may be linked. As specified by association c, each instance of class U and its sub-classes may be linked to instances of class W or its sub-classes. In particular, instances of X (and its sub-classes) may be linked to W - objects by association c. The redefinition of c.tgt means that each W -object being linked from an X -object by association C must actually be a Y -object. These Y -objects are directly accessible through association d; each d-link is a c-link at the same time. Note, however, that the source end of d does not redefine the source end of c in our class diagram. This means that U -objects 3 As introduced in Section 3, we extend the notion of sub-classes to all classes that are reachable by chains of generalization arrows, including chains of length 0. 11 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars which are not X -objects may still link to Y -objects. If the source end of d were annotated with {redefines c.src}, only X -objects would be allowed to be linked to Y objects by association c. In the following we show how the situation presented in the class diagram shown above can be translated into an ASG such that the links in each instance graph comply with the association end redefinitions. We will consider non-unique associations only. Unique associations or multiple redefined association ends can be translated analogously. Translation of redefinition semantics to ASGs is actually simple. The scheme for setting up an ASG for the given class diagram is the same as described in Section 4.1. We handle associations c and d as if they were completely independent associations. For c, however, we have to set up the AIG in a more restricted way such that no c-edges can be created between X -nodes and Y -nodes. This can be simply enforced by removing each concrete sub-class of X from set U first and then by proceeding as described in Section 4.1. Note that we use edge label d for representing links of association d, which are c-links at the same time. This information is implicit and cannot be represented in instance graphs explicitly. 4.5 Class Diagrams with Association Subsetting Subsetting of association ends is another extension to associations that has been introduced by UML 2. If an association end may redefine another association end as shown in the previous section, it may also subset it instead. Subsetting association ends do not restrict the subsetted as- sociations, in contrast to redefining association ends. A subsetting association end simply means that each link of the subsetting end must be a link of the corresponding subsetted association end, too. It is easy to see that subsetting one end of an association automatically implies the sub- setting of the other association end. Therefore, we rather say that an association subsets another association instead of association ends subsetting other association ends (as done in the UML standard [OMG]). The following class diagram shows the general situation of an association c which is subsetted by the associations d1,d2,...dn. The classes at the association ends must be compatible to U and W , respectively, indicated by the dashed generalization arrows. U W X1 Y1 Xn Yn . . . . . . r..s u..vc r1..s1 u1..v1d1 {subsets c} rn..sn un..vndn {subsets c} Similar to the previous section, we are going to represent dt -links by dt -edges. Subsetting means that each of these dt -links is a c-link at the same time. Again, this is implicit information that cannot be represented in the instance graph explicitly. Also similar to the previous section, we consider non-unique associations only. Unique associations can be translated analogously. Since subsetting associations do not restrict subsetted associations, we can create c-edges and dt -edges almost independently in the ASG. When a dt -edge is created, however, we must count GCM 2010 12 / 20 ECEASST it as a c-edge, too, since each dt -link is also a c-link. Counting is necessary if multiplicity is re- stricted. For instance, if no U -object may be linked to more than two W -objects using association c, i.e., v = 2, we must not have an X1-object with one leaving c-edge and two leaving d1-edges. Similar to Section 4.1, we use a nonterminal node C for creating all edges labeled c,d1,d2,...,dn. C is connected with terminal nodes by edges labeled xk and yi iff the corresponding terminal node has already k outgoing and i incoming c-edges, respectively. We use edges labeled xtk and y t i iff the corresponding terminal node has already k outgoing and i incoming dt -edges, respectively. The AIG has to contain C and one multiple node for each concrete sub-class of U and W . Nodes representing sub-classes of Xt (t = 1,2,...,n) are connected with C by xt0-edges. Nodes repre- senting sub-classes of W and Yt (t = 1,2,...,n) are connected with C by edges labeled y0 and yt0, respectively. Note that some classes in the class diagram may actually be the same classes or they may inherit from several classes. The corresponding nodes are then connected with C by several edges. As an example, we choose n = 1, i.e., we have one subsetting association only. Similar to Section 4.1, let U,W,X,Y be the sets of concrete sub-classes of U,W,X1,Y1, respectively. Note that X ⊆ U and Y ⊆ W. Note also that U,W,X,Y contain U,W,X1,Y1, respectively, if they are concrete classes. Each concrete sub-class of U or W then falls into exactly one of the following sets: {U1,...,Uk} = (U\X)\W {W1,...,Wn} = (W\Y)\U {U 11 ,...,U 1 k1 } = X\W {W 11 ,...,W 1 n1} = Y\U {R1,...,Rp} = (W\Y)∩X {S1,...,Sq} = (U\X)∩Y {V1,...,Vm} = (U\X)∩(W\Y) {V 11 ,...,V 1 m1} = X∩Y and the AIG, therefore, must contain the following graph as a subgraph: C U 11 ··· U 1k1 R1 ··· Rp U1 ··· Uk V1 ··· Vm V 11 ··· V 1m1 W 11 ··· W 1n1 S1 ··· Sq W1 ··· Wn x0 x10 x10 x0 y0 x0 x0 y0 x10 y 1 0 x0 y0 y0 y10 y10 y0 x0 y0 Like in Section 4.1, we may add another c-edge between two appropriate nodes where the source node must have k < v outgoing c-edges and the target node i < s incoming c-edges. This is the task of the following set of rules rCk,i, 0 6 k < v and 0 6 i < s, that add a new c-edge between 13 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars two nodes and increment the “counters” realized by edge labels. x20 ... x2v2 x10 ... x1v1 x0 ... xv ··· ··· y20 ... y2s2 y10 ... y1s1 y0 ... ysC xk yi rCk,i ::= x20 ... x2v2 x10 ... x1v1 x0 ... xv ··· ··· y20 ... y2s2 y10 ... y1s1 y0 ... ysC xk+1 yi+1 c Analogously, we may add another dt -edge (t = 1,2,...,n) between two appropriate nodes where the source node must have l < vt outgoing dt -edges and the target node j < st incoming dt -edges. However, since each dt -link is also a c-link, we must count this new dt -edge as a c- edge, too. Therefore, the source node must have k < v outgoing c-edges and the target node i < s incoming c-edges. This is the task of the following set of rules r̄Ct,k,i,l, j (0 6 k < v,0 6 i < s,0 6 l < vt,0 6 j < st , and 0 6 t 6 n) that add a new dt -edge between two nodes and increment the “counters” realized by edge labels. x20 ... x2v2 x10 ... x1v1 x0 ... xv ··· ··· y20 ... y2s2 y10 ... y1s1 y0 ... ysC xk xtl yi ytj r̄Ct,k,i,l, j ::= x20 ... x2v2 x10 ... x1v1 x0 ... xv ··· ··· y20 ... y2s2 y10 ... y1s1 y0 ... ysC xk+1 xtl+1 yi+1 ytj+1 dt We can stop adding c-edges and dt -edges if each “counter” has reached its lower multiplicity bound; we can then remove the nonterminal node C representing the association. This is realized by the following final rule r̂C: x2u2 ... x2v2 x1u1 ... x1v1 xu ... xv ··· ··· y2r2 ... y2s2 y1r1 ... y1s1 yr ... ysC r̂C ::= x2u2 ... x2v2 x1u1 ... x1v1 xu ... xv ··· ··· y2r2 ... y2s2 y1r1 ... y1s1 yr ... ys If a class acts as a root of an inheritance hierarchy and must not be instantiated, it is declared as an abstract class. A similar concept exists in UML 2 for associations: Subsetted associations GCM 2010 14 / 20 ECEASST (association c in our class diagram) that must not be instantiated are declared as {union}. Only their subsetting associations (associations dt , t = 1,2,...,n in our class diagram) may be instan- tiated as long as these are not themselves declared as {union}. Union-associations can easily be handled in the construction of the grammar by prohibiting the addition of c-edges, i.e., by dropping the rules rCk,i. 4.6 Class Diagrams with Empty Graph Languages There are class diagrams that cannot be satisfied by any instance graph, i.e., its graph language is empty. This section demonstrates that the ASG constructed from such a class diagram has an empty language, too. As an example, consider the following class diagram. X 1..1 a 2..2 There is no (finite) instance graph compatible with this class diagram because it requires each instance graph node to have an in-degree of 1 and an out-degree of 2 at the same time. Of course, the ASG constructed from this class diagram has an empty graph language, too. Each derivation starting from the initial star Z is infinite. Each derived graph contains a non- terminal node; the terminating rules can never be applied. Such a derivation is shown below; every graph wherein n class nodes satisfy the multiplicity constraints will contain at least n other classes that do not satisfy it. Z ⇒ A X x0 y0 ⇒ A X X x1 y1 x0 y0 a ⇒ A X X X x2 y1 x0 y1 x0 y0 a a ⇒ A X X X X x2 y1 x1 y1 x0 y1 x0 y0 a a a ⇒ A X X X X X x2 y1 x2 y1 x0 y1 x0 y1 x0 y0 a a a a ⇒··· Interestingly, while there is no finite instance of the class diagram, there are, in fact, infi- nite instances. This fits nicely to the fact that derivation sequences of the ASG are infinite and “generate” these infinite graphs. 4.7 Example Finally, this section presents the rules constructed from an example class diagram and shows the derivation of an instance graph. Consider the following class diagram that specifies trees where each tree node may be con- 15 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars nected to an E-object. “Manager”-objects of class M are connected to root nodes: E Node Child Parent MIRoot Leaf Inner Root Single 1..∗ f 0..∗ c 1..∗ 1..1 1..1 g 0..1 We assume that the association f is unique, but that g is not (which is actually irrelevant for an m-to-1 association). The association c is a composite association. Then the initial rule for the initial star Z deriving the AIG looks as follows: Z init ::= ELeaf Inner Root Single M G C F y0x0 x y z y0 y0 x The rules for the composition c are rC and r̂C as in Sect. 4.3, but without r̄C since the lower bound of the association at the composite end-point is 1. Because the upper bound at the part end-point is *, we need the following rules for C̄: C̄ x y z x r′C̄0 ::= c c c C̄ x y z y r′C̄1 ::= c c c The rules for association g are constructed as shown in Sect. 4.1 with r = s = v = 1 and u = 0: x0 x1 y0 y1 G x0 y0 rG0,0 ::= x0 x1 y0 y1 G x1 y1 g x0 x1 y1G y1 r̂G ::= x0 x1 y1 The rules for association f are constructed as described in Sect. 4.2 with r = 1,u = 0,s = v =∗. Because of s = ∗, rule rF differs from the one shown in Sect. 4.2: an y1-edge means that the connected terminal node has at least one incoming f -edge. Moreover, as discussed at the end of Sect. 4.2, there is only a single rule rF̄ creating 0..∗ f -edges from an F̄ -node: y0 y1 y0 y1 Fx rF ::= y0 y1 Fx y1 F̄ x y GCM 2010 16 / 20 ECEASST F y1 r̂ F ::= F̄ yx r F̄ ::= f Fig. 1 shows a sample derivation of an instance graph using these rules. Note that node labels have been abbreviated by initial letters of class names. The derivation arrows d1,...,d9 do not represent single derivation steps, but have the following meaning: d1 clones the multiple R-node to a singular as well as another multiple one, clones the I- and L-nodes three times each, and applies rule rC. d2 clones the topmost I-node to a singular and another multiple one, deletes the R-, L-, and I-nodes at the bottom (by replicating them 0 times), and applies rC again after cloning the remaining I- and L-nodes three times each. d3 deletes the 6 lowermost multiple nodes and clones the remaining two multiple L-nodes to singular ones. d4 applies r̂C twice, deleting the isolated C-nodes, and then r′C̄0 twice after adapting the rule to the context of the corresponding C̄-node. d5 applies rG0,0, and d6 applies r̂ G, deleting the G-node. d7 applies rF ; both the R- and the S-node get visited by y1-edges. d8 deletes the multiple M-node by replicating it 0 times, and applies r̂F , deleting the F -node. Finally, d9 clones the S-node twice and applies rF̄ . 5 Conclusions We have described how a class diagram can be translated into an adaptive star grammar that defines the same language of instance graphs as the class diagram. The translation process works for all class diagrams with generalizations, and with associations that may be unique, non-unique, or even composite, and may have arbitrary multiplicities. We have explicitly shown the translation process for regular associations with arbitrary multiplicities, have outlined the translation for compositions, and demonstrated the situation for a single composition association. Furthermore, we have shown that newer concepts of class diagrams, like association subsetting and redefinition, can also be translated correctly. The presented approach closes the gap between class diagrams as a declarative approach for defining instances and graph grammars, which provide a constructive definition. This makes techniques known from graph grammars available to class diagrams as well. For instance, test cases can be constructed automatically by just creating sample derivations. Furthermore, gram- mars allow for inductive definitions based on their rules and their derivation trees that structure instances of the class diagram hierarchically. Moreover, one may wish to restrict the instances of a class diagram in ways that class diagrams cannot express. The translation into an equivalent grammar may make this possible, because grammars allow for a much finer tuning. The benefits of using adaptive star grammars come with a moderate descriptive complexity. If association subsetting is not used, the size of each rule (by counting nodes and edges on a rule’s left-hand side and right-hand side) depends linearly on the maximum finite multiplicity bound m used in the class diagram, and the number of obtained rules grows linearly with the number of associations and quadratic with m. For class diagrams with association subsetting, the descriptive complexity is worse. Let Ns be the number of associations participating in association subsetting. Then, the size of rules obtained for such associations depends linearly on m and Ns, whereas the number of rules grows linearly with Ns and with the fourth power of m. The only closely related work, to the best of our knowledge, is by K. Ehrig, J. M. Küster, and G. Taentzer [EKT09], which extends the work by R. Bardohl, H. Ehrig, J. de Lara, and 17 / 20 Volume 39 (2011) Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars Z ⇓ init SRIL G EC F M y 0 x 0 zyx y 0 x y 0 ⇒d 1 SRLILIRLI G EC̄CC F M y 0 x 0 zxyzxyzxy y 0 y 0 y 0 x ⇒d 2 SRLILILIILI G EC̄C̄CC F M y 0 y 0 x y 0 x 0 z x yy z x yz x yz x y ⇒d 3 SRLIL G EC̄C̄CC F M y 0 y 0 x y 0 x 0 z x yz x ⇒d 4 S SR L IL E M ff f c c g c ⇐d 9 SR L IL E F̄ M x y y c c g c ⇐d 8 SR L IL E F̄F MM x yy1 x c c g c ⇐d 7 SR L IL G E F M y 0 y 0 x y 0 x 0 c cc⇓ d 5 S R L IL G E F M y 0 y 0 x y 1 x 1 x 0 c c g c⇓ d 6 S R L IL E F M y 0 y 0 x c c g c F igure 1: A sam ple derivation of an instance graph (to be read clockw ise) GCM 2010 18 / 20 ECEASST G. Taentzer [BELT04] considerably. In [EKT09], a meta model is translated into a graph gram- mar that defines the same language of instance graphs. In that paper, composite associations are not considered, multiplicities are restricted to the simplest cases, and unique and non-unique associations are not distinguished. Moreover, the generated graph grammar makes heavy use of negative application conditions for specifying forbidden context graphs, and uses a layering mechanism to give some rules priority over others. This does not only complicate the grammar, but—more important—also the reasoning about the grammar and the graph language it gener- ates.On the other hand, this kind of grammar allows to treat some simple meta model constraints by translating them into application conditions. So far, we have not considered how meta model constraints can be translated along with the class diagram into an adaptive star grammar, possibly with application and graph conditions. A related, but not general approach has been described in previous work [HM10] where rules may be applied only if their application conditions are satisfied. In future work, we intend to make use of generalized results on context conditions and their relation to logical graph properties and constraints by A. Habel and K.-H. Pennemann [HP09] and A. Habel and H. Radke [HR10]. Acknowledgement. We would like to thank the anonymous referees for reading the submitted version carefully, and making valuable suggestions that helped to improve the paper. Bibliography [BELT04] R. Bardohl, H. Ehrig, J. de Lara, G. Taentzer. Integrating Meta-modelling Aspects with Graph Transformation for Efficient Visual Language Definition and Model Ma- nipulation. In Proc. Fundamental Approaches to Software Engineering (FASE’04). Lecture Notes in Computer Science 2984, pp. 214–228. Springer-Verlag, 2004. doi:10.1007/978-3-540-24721-0 16 [Cou87] B. Courcelle. An Axiomatic Definition of Context-free Rewriting and its Application to NLC rewriting. Theoretical Computer Science 55(2-3):141–181, 1987. doi:10.1016/0304-3975(87)90102-2 [DHJ+06] F. Drewes, B. Hoffmann, D. Janssens, M. Minas, N. V. Eetvelde. Adaptive Star Grammars. In Corradini et al. (eds.), 3rd Int’l Conference on Graph Transformation (ICGT’06). Lecture Notes in Computer Science 4178, pp. 77–91. Springer, 2006. doi:10.1007/11841883 7 [DHJM10] F. Drewes, B. Hoffmann, D. Janssens, M. Minas. Adaptive Star Grammars and Their Languages. Theoretical Computer Science 411(34-36):3090–3109, 2010. doi:10.1016/j.tcs.2010.04.038 [DHM08] F. Drewes, B. Hoffmann, M. Minas. Adaptive Star Grammars for Graph Mod- els. In Ehrig et al. (eds.), 4th International Conference on Graph Transformation (ICGT’08). Lecture Notes in Computer Science 5214, pp. 201–216. Springer, 2008. doi:10.1007/978-3-540-87405-8 30 19 / 20 Volume 39 (2011) http://dx.doi.org/10.1007/978-3-540-24721-0_16 http://dx.doi.org/10.1016/0304-3975(87)90102-2 http://dx.doi.org/10.1007/11841883_7 http://dx.doi.org/10.1016/j.tcs.2010.04.038 http://dx.doi.org/10.1007/978-3-540-87405-8_30 Generating Instance Graphs from Class Diagrams with Adaptive Star Grammars [EKT09] K. Ehrig, J. M. Küster, G. Taentzer. Generating Instance Models from Meta Models. Software and System Modeling 8(4):479–500, 2009. doi:10.1007/s10270-008-0095-y [Eng99] J. Engelfriet. Context-Free Graph Grammars. In Rozenberg and Salomaa (eds.), Handbook of Formal Languages. Volume 3: Beyond Words, chapter 3, pp. 125–213. Springer, 1999. [HM10] B. Hoffmann, M. Minas. Defining Models – Meta Models versus Graph Grammars. Elect. Comm. of the EASST 29, 2010. Proc. 6th Workshop on Graph Transformation and Visual Modeling Techniques (GT-VMT’10), Paphos, Cyprus. http://www.easst.org/eceasst/ [HP09] A. Habel, K.-H. Pennemann. Correctness of high-level transformation systems rela- tive to nested conditions. Mathematical Structures in Computer Science 19(2):245– 296, 2009. doi:10.1017/S0960129508007202 [HR10] A. Habel, H. Radke. Expressiveness of graph conditions with variables. Elect. Comm. of the EASST 30, 2010. International Colloquium on Graph and Model Transformation (GraMoT’10). http://www.easst.org/eceasst/ [OMG] OMG. OMG Unified Modeling Language (OMG UML), Infrastructure, Version 2.2. OMG Document Number: formal/2009-02-04. http://www.omg.org/spec/UML/2.2/Infrastructure [SWZ99] A. Schürr, A. Winter, A. Zündorf. The PROGRES Approach: Language and Envi- ronment. In Engels et al. (eds.), Handbook of Graph Grammars and Computing by Graph Transformation. Vol. II: Applications, Languages, and Tools. Chapter 13, pp. 487–550. World Scientific, Singapore, 1999. GCM 2010 20 / 20 http://dx.doi.org/10.1007/s10270-008-0095-y http://www.easst.org/eceasst/ http://dx.doi.org/10.1017/S0960129508007202 http://www.easst.org/eceasst/ http://www.omg.org/spec/UML/2.2/Infrastructure Introduction Adaptive Star Grammars Class Diagrams An Adaptive Star Grammar for Instance Graphs Non-Unique Regular Associations Unique Regular Associations Composite Associations Class Diagrams with Redefining Associations Class Diagrams with Association Subsetting Class Diagrams with Empty Graph Languages Example Conclusions