Electronic Communications of the EASST Volume 3 (2006) Proceedings of the Third Workshop on Software Evolution through Transformations: Embracing the Chance (SeTra 2006) Refactoring Information Systems Michael Löwe, Harald König, Michael Peters, and Christoph Schulz 17 Pages Guest Editors: Jean-Marie Favre, Reiko Heckel, Tom Mens Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst ISSN 1863-2122 ECEASST Refactoring Information Systems - Handling Partial Compositions - Michael Löwe, Harald König, Michael Peters, and Christoph Schulz FHDW Hannover, Freundallee 15 D-30173 Hannover, Germany Abstract: We present our formal framework for the refactoring of complete information systems, i.e., the data model and the data itself. It is described using general and abstract notions of category theory and can handle addition, renaming and removal of model objects as well as folding and unfolding within complete and partial object compositions. Keywords: Refactoring, Migration, Graph transformation, Pullback complement 1. Introduction The only constant thing is change. This is especially true for the information and communica- tion business. Currently, information systems in many companies are subject to change. This is mainly due to the technological progress connected to the Internet which enables completely new sorts of electronic business. Thus, we see big efforts to re-engineer the technical basis on the one hand and to improve the business processes and information models on the other hand [1]. This development has been reflected in the research and development community in the last years. Agile and Extreme Programming Techniques [2] [3] [4] aim at supporting the ongoing reengineering processes by providing refactoring methods, techniques, patterns [5] [6] and tools [7]. These tools enable consistent global changes of a whole software system, for example to introduce some design patterns which are necessary for the system to take the next evolution step. This puts the flexibility into the development process that is needed to keep a system up- to-date (without any over-specifications at the beginning of the development) and to realize changing requirements quickly. For the time being, agile techniques in database engineering were often restricted to the im- provement and change of model artifacts. The main obstacle for agile techniques here is existing data. Attempts to describe semantics-preserving schema transformations that also migrate data can be found in [8] [9]. A transformational approach that considers the instance level is discussed in [10]. If a model of a productive information system is changed, we are faced with one central question: “What shall we do with the data conforming to the old model?” Up to now, we hear two major answers: 2 / 17 Volume 3 (2006) Refactoring Information Systems 1. Leave the data as it is and map the new model to the old one using for example some object- relational-mapping tools [11].1 2. Migrate the data from the old model to the new one by crafting corresponding migration scripts and performing the (long-running) data migration at night or on the weekend. Both solutions possess big disadvantages. The first one leads to complex mappings if applied several times. This complexity is very likely to produce performance problems and reduce the development speed of the engineering team in the long run.2 The second solution requires long production breaks and consumes a lot of development and test time for software (migration scripts) that is thrown away after success. We propose another approach, namely the generation of the necessary data migration directly from model refactoring, compare also [12]. One central issue is the correctness of the induced migrations. We can only benefit from this approach if we can trust in the produced migrations without any further tests. Therefore, we present a theoretical framework in this paper, which 1. is able to represent models and instances in a uniform meta-model, 2. comes equipped with a suitable notion of model refactoring, 3. provides refactoring-induced correct transformations of the instances (migrations), and 4. proves its applicability by satisfying necessary and natural properties for refactorings and migrations, i.e., that refactoring can be composed in a natural way. The framework is built on category theory [13] and algebraic graph transformation [14]. In this theory, we not only have a very general notion of structured object. By the notion of morphism, we also get a natural way of representing (1) typings of instance objects in model objects as well as (2) model changes (refactorings) and instance migrations. Section 2 presents our current framework built on a double-pullback construction, which can handle addition, renaming, and removal of model objects as well as folding and unfolding within complete object compositions [15]. This framework is not able to handle inheritance structures directly. Section 3 provides a slight generalization: We do not longer require that the right-hand side of a migration is a pullback. Instead we re-use the explicit construction of the pullback complements in more general situations. It turns out that this construction enjoys some categorical properties that guarantee uniqueness up to isomorphism. Section 4 shows that the usual sequential composition of refactorings extends to migrations in the generalized framework as well. We sketch in section 5, how the results in this paper can be reformulated on a purely categorical level. We explicitly point out the similarities to the approach of Ehrig et al. using adhesive categories [14]. Section 6 provides a conclusion and contains hints for future research activities. 1 An older and worse version of this approach is: Leave the data-model and the corresponding data as it is and redefine the meaning of the data within the model, for example by using comma-separated multi-value fields in a single string column. 2 The longer this approach is applied, the bigger the problems to switch to the second one. Proc. SeTra 2006 3 / 17 ECEASST 2. Migration Framework For a motivation of the following theoretical aspects, consider the situation of a class Department. A possible refactoring would be to extract an abstract superclass Unit in order to be prepared for additional specializations [5]. If we interpret generalization as object composition on the instance level, an automated migration must add a Unit-Object to each Department-Object in a 1:1 fashion3. After the migration client objects no longer use a single Department-Object but a new object which contains the Unit-information as an aggregated object, see Fig. 1. Since Unit is an abstract class, we can model this refactoring by unfolding Department to two classes. This can be done by a morphism l that maps the new model N to the old model M assigning the two classes Department and Unit in N to Department in M. Having data D which is typed by the morphism t : D M we obviously can generate the migrated data by calculating the pullback object F of t and l4. Another possible refactoring is the addition of a new class [5]. This can be achieved with a (non-surjective) map r from the old to the new model, see Fig. 2. Here the question arises which categorical construct generates a reasonable migration. Moreover, different data structures are possible after the migration: one possibility would be to create no B-object, another to create a default-value or prototype object for B. Both solutions lead to pullback diagrams, if objects a1 and a2 are preserved. Category theory can be applied in the following way. We can express the typing of some data D in a model M by a morphism t : D M . And we need to express refactorings between models and migrations between typed data. We will have to use the two variations l and r discussed above. But we are not only interested in the model states before and after 3 This interpretation is often used when object models are mapped to relational database systems using the “one table per class”-strategy. This strategy provides one relational table for each class and maps each inheritance association to a foreign key relation from the special to the general class. 4 Later, we discuss in which category the construction is carried out. 4 / 17 Volume 3 (2006) Fig. 1: Extracting a superclass D M Unit Client Dept. cl2 dept1 cl1 Client Dept. l' l vt cl2 dept1 cl1 unit1 N F Fig. 2: Adding a new class r t' NM t A A B a1 a2 r' D “A reasonable object collection” Refactoring Information Systems refactoring but in the refactoring process itself. Hence, it is a good choice to represent one model refactoring from model M to N by a combination of the two variants, i.e., by a pair of morphisms: M  l K  r N . The pair (l, r) represents an arbitrary relation between M and N and can model: 1. Deletion of model objects, i.e., l is not surjective, 2. Addition of model objects, i.e., r is not surjective, 3. Renaming of model objects, i.e., l and r are bijective but not identities, 4. Splitting or unfolding of model objects, i.e., l is not injective, and 5. Gluing or folding of model objects, i.e., r is not injective. Given a typed database t : D M and a model refactoring M  l K  r N , we want to canonically construct the induced migration to some typed database u : E  N . As a first step, we can use the pullback construction of t and l, which shall result in a typed database v : F  K . For reasons of symmetry, we need to construct a pullback complement of v and r in the second step. Unfortunately, such a pullback complement is not guaranteed to exist nor need be unique if it exists (see Fig. 2). Even worse, there is no simple property for r that guarantees existence and uniqueness of the pullback complement. Some authors argue that r being epimorphism is sufficient, compare [16] or [17]. This is wrong as the following examples demonstrate. Example 1 (Ambiguous Pullback Complements). Consider the situation depicted in Fig. 3 in the usual category of graphs and graph morphisms. The epimorphism f and the morphism g do not possess a unique (up to isomorphism) pullback complement, since (g, f1*) is pullback of (f, g1*) and (g, f2*) is pullback of (f, g2*) but D1 and D2 are not isomorphic. □ In the category of sets and mappings, however, pullback complements seem to be uniquely determined. This is not (really) true, as is demonstrated by the following example. Proc. SeTra 2006 5 / 17 Fig. 3: Ambiguous Pullback Complement g 2 * 1D 1 3 2 g 1 2 1 ,3 1 2 2 ,3 2 f 2 * 2,3 11 32 B g 1 * 2 1 ,3 2 2 2 D 2 f 1 * 2 1 ,3 2 1 2 1 f 3 1 A C M NK D EF r u v r'l' l t (1) (2) ECEASST Example 2 (Ambiguous Pullback Complements in Set). Let g:{1,2,3,4}→{a,b} be given by g(1)=a, g(2)=a, g(3)=b, g(4)=b and f:{a,b}→{3} be the constant function as in Fig. 4. There are two pullback complements: 1. ({13,24}, f*1:{1,2,3,4}→{13,24}, g*1: {13,24}→{3}) with f*1(1)=13, f*1(2)=24, f*1(3)=13, and f*1(4)=24. 2. ({14,23}, f*2:{1,2,3,4}→{14,23}, g*2: {13,24}→{3}) with f*2(1)=14, f*2(2)=23, f*2(3)=23, and f*2(4)=14. Obviously, {13,24} and {14,23} are isomor- phic. But no isomorphism i:{13,24}→{14,23} translates f*1 to f*2, in the sense: i ○ f*1 = f*2. Hence, we have isomorphic pullback complement objects. But the induced morphisms are ambiguous since they cannot be compared by the existing isomorphisms. □ This type of ambiguity cannot be accepted in our context, since the morphisms represent the transition of the data from the old to the new model. There seems to be no chance to avoid this type of ambiguity, if we do not put additional requirements on the “vertical” morphisms g, g1* and g2*. These properties shall single out a unique choice for the pullback complement extension of g. These examples provide the motivation for the following definitions: Definition 3 (Graph). The category G of graphs is the algebraic category w. r. t. the signature: Sorts: O(bject) Opns: s(ource), t(arget): O → O. This is a simple form of graphs where we do not distinguish between nodes and edges. In such a graph, nodes can be characterized as objects n such that s(n) = n = t(n). Graphs and graph mor- phisms of this type provide more flexibility in the refactoring/migration context we are considering here, for example: if two nodes x and y are mapped to the same node z, it is possible that a morphism maps an edge e with s(e) = x and t(e) = y to z, too. E.g. in Fig. 1, l(Unit) = l(Dept.) = Dept. and the edge between them is mapped to the node Dept. as well. Definition 4 (Component Graph). A component graph g: G → G is a morphism in G. A component graph morphism α: (g: G → G) → (h: H → H) is a pair (α: G → H, α: G → H) such that the resulting square commutes, i.e., ° g=h ° . The comma category CG consists of all component graphs and all morphisms between them. If not otherwise stated, we just write g for a component graph g: G → G. Note that G is the underlying graph and g provides a decomposition of G into parts or components via the the 6 / 17 Volume 3 (2006) Fig. 4: Ambiguous pullback complement in SET f*1 a b 3 1 3 4 2 13 24 14 23 f g f*2 g*1 g*2 Refactoring Information Systems congruence kern(g)5. Thus for the carrier set G we have G= {[ x ]g : x ∈G} where []g denotes congruence classes of kern(g). We also note that congruence classes are not necessarily subgraphs of G as can easily be seen in component graphs id: G → G where G contains edges. The additional component structure on graphs provides means to distinguish typings from refactorings. In a typing, we require that all components are instantiated completely in a 1:1 manner. In a refactoring we allow identification of objects if and only if they belong to the same component. Hence, refactorings map components injectively and typings map objects within components bijectively. Note that CG has all limits and that pullbacks in CG can be constructed component-wise. Definition 5 (Typings, Refactorings, and Migrations). A typing t: g → h is a CG-morphism if for each x ∈G the mapping t :[ x]g [t x]h considered as a SET-morphism is bijective. A refactoring is a pair of morphisms m  l k  r n in CG such that l and r are injective. The morphisms l and r are called refactoring morphisms in this case. A refactoring m  l k  r n and a typing t : d  m induce a migration from typing t : d  m to typing u : e n , if there is a diagram as depicted to the right that satisfies: 1. (1) and (2) are pullbacks, and 2. r' is epimorphism. The proof of the following proposition is straightforward and relies on the fact, that pullbacks preserve monomorphisms and isomorphisms. Propositon 6 (Refactorings, Typings, and Pullbacks). If (n*: l → g, m*: l → k) is the pull- back of (n: k → h, m: g → h) in CG, then 1. m* is a refactoring if m is, 2. n* is a typing if n is, and 3. if n is injective on components, i.e., ∀ x , y∈K : n x=n y∧k x =k  y⇒ x= y , then the same property holds for n*, i.e., ∀ x , y∈ L :n *  x=n *  y∧l x=l  y⇒ x= y Proposition 7 (Existence and Uniqueness of Migrations). Let m  l k  r n be a refactoring and let t: d → m be a typing. If r: K → N is an epimorphism, then: 1. There is a migration as defined in definition 5 and 2. The result of the migration is uniquely determined (up to isomorphism). 5 The relation kern(f) denotes the congruence that the morphism f induces on its domain, i.e., (x, y) ∈ kern(f) iff f(x) = f(y). Proc. SeTra 2006 7 / 17 m k n d f e l r t v(1) (2) u l' r' ECEASST Proof. Subdiagram (1) can be constructed as a pullback. Thus (F, v, l') are unique up to isomor- phism. The morphism v is a typing due to proposition 6. Having a typing v and a refactoring morphism r, we construct diagram (2), i.e., (E, r', u), as follows and depicted in Fig. 5: If the component graph f is the morphism f: F → F, then 1. r' is the identity on F, 2. E=F /≡ where ≡=kern  f  ∩ kernr ° v 3. r ' =[]≡ , 4. u = r °v , 5. u is the unique morphism providing u °r ' =r° v which exists since kern r° v⊇≡ , and 6. component graph e: E → E is the morphism with e° r ' = f which exists since kern  f ⊇≡ . By construction u°r ' =r° v and r' is epimorphism. Since kern r ' =kernr ° v on each component and r is an epimorphism, u is bijective on components and thus a typing. And it is easy to show that (v, r') is pullback of (r, u) in SET and therefore in CG: if there is o such that r(x(o)) = u(y(o)), then choose o ' = r '−1  yo ∩ v−1 xo . This is unique, since v is bijective on components and, by construction, r' folds on components only. This completes the proof of the first statement. To prove the second statement, let (r*: F → E', u*: E' → N) be any other completion with the required properties. It is easy to see, that the two pullback situations project to pullback situations in SET on each component. Here we have u * ° r *=u° r ' where u* and u are bijective. Hence kern(r*) = kern(r') on each component of F. Because r is a refactoring, so are r' and r* (see proposition 6) such that this property holds throughout F. Thus E= ~ E ' . □ Although the framework presented so far allows copying and gluing of objects within the same component only, it provides some nice features for our purposes of information system refactor- ing, as the following example demonstrates. Example 8 (Association Redirection). In Fig. 1 we showed how to introduce a superclass Unit. Subsequently, one needs to check the references to Department- objects and redirect them to Unit- objects if necessary. To do this, consider the model refactoring in 8 / 17 Volume 3 (2006) Fig. 6: Redirection of associations 1 2 4 5 6 3 7 1 2,4,5 6 3 7 1,2,6 4 5 3 7 rl Fig. 5: Constructing the right side of a refactoring K N K N F F E Er' r' r r f e vv u u k n Refactoring Information Systems Fig. 66: All three graphs have 3 components; the non-trivial component in each graph (the com- ponent that has more than one element) is highlighted. Using this refactoring in a migration redirects all associations of type “7” from the source of “6” to the target of “6”. It uses an intermediate vertex “2”, that is introduced by the left-hand side l as an unfolding and removed again by the right-hand side r by a corresponding folding. This example shows, that we are able to redirect association sources and targets as long as we stay in the same component. □ With these features, we should be able to handle all refactorings that are concerned with inheritance structures. Recall, that inheritance can be considered as some sort of static com- position between objects: an object of class c can be considered to be composed of a set of (sub-)objects, namely one object for each direct or indirect ancestor class c' of c. All these objects are created at the same time the most special object is created. And they are also destroyed at the same time. Hence, we can model them as explicit parts in a component graph on the instance level in our framework . But these components are not components in the sense of typings (Def. 5). It is not the case, that the complete inheritance tree of classes needs to be instantiated, if one class is. If there are concrete classes that possess subclasses, an object might instantiate a proper subpart of the com- plete inheritance tree of its class, only. Our approach is not able to handle those incomplete parts, since pullback complements do not always exist in these situations. Example 9 (Missing Pullback Complement). Consider the reverse process as in example 8. An association to class “1,2,6” (a concrete superclass of “4”) shall be redirected to its subclass, see Fig. 7. We apply this rule to an instance of “1,2,6”, called “1,2,6' ”. The pullback on the left produces an intermediate object “2' ” in F. But we can easily deduce, that the right part is not able to complete the diagram to a double-pullback situation. This is mainly due to the fact that the non-trivial component in K is only partially instantiated in F (there is no “4”-object). For suppose, that such a pullback complement r': F → E, u: E → N exists. Then r'(2') is a preimage of “2,4,5” under u. This is only possible if there is a preimage of “4” in F. □ 6 We indicate the model objects numerically to clarify the mappings. Proc. SeTra 2006 9 / 17 Fig. 7: Missing pullback complement F 1 2 4 5 6 3 7 1 2,4,5 6 3 7 r 1' 2' 6' 3' 7' r v 1,2,6 4 5 3 7 l 1,2,6' 3' 7' t l' M K N D ? ECEASST We might use a trick to handle inheritance. We always instantiate complete inheritance graphs, when an object is created and keep the information about the most special real object in the resulting part (of real and extra objects). Then we distinguish two views on the system: (1) the refactoring perspective and (2) the operational perspective. In the first perspective, all objects are visible and our framework is applicable. The second perspective blends out all extra objects in order to keep the system's state consistent from the operational point of view.7 With these additional arrangements, inheritance structures and the typical refactorings could be modeled. But there are also disadvantages of this approach: the additional instantiations might cause a significant memory overhead. In this paper, we use a different approach, which omits this problem: In the next section, we slightly generalize our framework such that partial instantiations of components in the model are allowed. To achieve that, we do no longer require that the right-hand side of a migration is a pullback. 3. Partial Instantiation of Components In this section, we relax the requirements for typings. Weak typings allow partial instantiations of model components, since they are injective on each component but need not be surjective. Definition 10 (Weak Typing). A component graph morphism α: (g: G → G) → (h: H → H) is a weak typing if it is injective on each component, i.e.,  x =  y ∧ g x = g  y ⇒ x = y . Now we use the construction in the proof of proposition 7 to construct the right-hand side of a migration. This works for weak typings as well. Construction 11 (Folding). Consider Fig. 8, where weak typing n and refactoring morphism m are given. We construct the folding completion of this situation as follows: 1. m* = ([]≡, idG), where ≡ = kerng  ∩ kernm° n , 2. n* = (i, m°n ), where i is the unique morphism with i °[]≡ = m° n , since kern m° n⊇≡ , and 3. the component graph j is the unique morphism with j °[]≡ = id ° g , since kern g ⊇≡ . m* is a CG-morphism by construction. Moreover, we obtain k ° i = m° n° j , since []≡ is epi and k °i °[]≡ = k °m° n = m° n ° g = m° n° j °[]≡ . Thus, n* is a CG-morphism, too. □ Lemma 12 (Folding). If (m*, n*) is folding of (m, n), m* is refactoring morphism and n* is typing. 7 Note that the model is stable under the operational perspective! 10 / 17 Volume 3 (2006) Fig. 8: Construction of a Folding H H K G G G G/≡ K m id []≡ m k j h g n n i m ○ n Refactoring Information Systems Proof. The first part is obvious, since m * has been constructed as the identity which is a mono. For the proof of the second statement let i x = i  y and j x = j  y . Now consider arbitrary preimages x' and y' for x and y wrt. []≡, i.e., [x']≡ = x and [y']≡ = y. Since j °[]≡ = id ° g , we conclude g(x') = g(y'). Since i °[]≡ = m° n , it follows that m(n(x')) = m(n(y')). Thus, x ' , y ' ∈ kern g ∩ kernm° n , which means that [x']≡ = [y']≡. Hence, x = y. □ Folding diagrams possess an interesting universal property as the following proposition shows. Proposition 13 (Initiality of Foldings). Let the pair of morphisms (m*: g → f, n*: f → k) be the folding of a weak typing n: g → h and a refactoring morphism m: h → k as it is constructed in construction 11. Then for every triple of morphisms (w: g → b, t: b → a, v: k → a) such that t is weak typing and t ° w = v°m°n , there is a unique morphism u: f → b with t °u = v° n * and u° m * = w . Proof. Let the folding be given as in Fig. 8. We set u = w and get immediately (1) u° m * = u ° id = u = w . We show that ≡ ⊆ kernw . Let m(n(x)) = m(n(y)) and g(x) = g(y). It follows t(w(x)) = v(m(n(x))) = v(m(n(y))) = t(w(y)) and b(w(x)) = w(g(x)) = w(g(y)) = b(w(y)). Since t is weak typing, we get w(x) = w(y) as desired. Now there is a unique u: G/≡ → B with (2) u °[]≡ = u °m * = w . Since b° u °[]≡ = b°w = w ° g = w ° j °[]≡ = u° j °[]≡ , we can conclude (3) b° u = u ° j . And t °u °[]≡ = t °w = v °m° n = v° i °[]≡ provides (4) t °u = v°i = v° n * . Finally, we also have (5) t ° u = v° m° n = v° n * . □ Proposition 13 characterizes foldings up to isomorphism. In the following, we say that a diagram is an abstract folding if it has the property of proposition 13: Definition 14 (Abstract Folding). As depicted in Fig. 9, a pair (m*: g → f, n*: f → k) consisting of a refactoring morphism m* and a weak typing n* is the abstract folding of a weak typing n: g → h and a refactoring morphism m: h → k if (1) n *°m * = m° n and (2) for every triple (w: g → b, t: b → a, v: k → a) such that t is weak typing and t ° w = v°m°n , there is a unique morphism u: f → b with t ° u = v°n * and u° m * = w . Corollary 15 (Uniqueness of Abstract Foldings). Two abstract foldings of a weak typing n: g → h and a refactoring morphism m: h → k coincide up to isomorphism. Hence (m*: g → f, n*: f → k) is the abstract folding if and only if the statement m* is epimorphism and m * x=m *  y⇔ g x=g  y∧mnx=mn y holds. Proof. Direct consequence of Definition 14 and the fact that the first folding compares to the second and vice versa. Therefore, we get two morphisms between the two foldings, which must Proc. SeTra 2006 11 / 17 Fig. 9: Abstract Folding h k a g f b m v n m* n* t w u! ECEASST be inverse morphisms, because their composition coincide with the identity on the folding objects (unique morphism from a folding to itself). □ Abstract foldings enjoy the same composition and decomposition properties as pushouts or pullbacks. Proposition 16 (Composition and Decomposition of Abstract Foldings). Consider the situation depicted in the diagram below8. 1. If the squares (1) and (2) are abstract foldings, then the rectangle (1) + (2) is an abstract folding.9 2. If the rectangle (1) + (2) and the square (1) are abstract foldings, then (2) is an abstract folding. 3. If typing e and refactoring h is the abstract folding of typing c and refactoring b° a , it can be decomposed into two foldings as in the diagram on the right, where h = g° f , if the underlying category has all abstract foldings. Proof. (1) Let morphisms v, t, w, be given such that t is typing and t °w = v° b°a °c . Since (1) is abstract folding, we get u1 such that u1 ° f = w and t ° u1 = v °b° d . Now u1, t and v compare to (2) and we get u2 with u2 ° g = u1 and t °u2 = v°e . Substituting u2 ° g = u1 in u1 ° f = w provides u2 ° g ° f = w . For the proof of uniqueness, let morphism u3 be given such that u3 ° g ° f = w and t ° u3 = v °e . Then u3 ° g ° f = u 2°g ° f and t °u3 °g = v° b° d = t °u 2° g hold. We obtain u3 ° g = u2 ° g , since (1) is abstract folding. But this implies u3 = u2, since (2) is abstract folding. (2) Let v, t, w, be given such that t is typing and t °w = v°b° d . It follows t °w° f = v °b°a° c . Since (1)+(2) is abstract folding, there is u such that u ° g ° f = w° f and t ° u = v°e . We also have t ° u ° g = v °e° g = v °b°d . Since (1) is abstract folding, we get u° g = w . Uniqueness follows from the uniqueness of u for (1)+(2). (3) If there are all abstract foldings, we can construct (d, f) as a folding, which provides diagram (1). The morphism g is obtained as the unique completion of the diagram from the folding (1). That diagram (2) is an abstract folding follows from (2) of this proposition. □ Definition 17 (Generalized Migration). A refactoring m  l k  r n and a weak typing t : d  m induce a generalized migration from t : d  m to weak typing u : e n , if there is a diagram as depicted to the right that satisfies: 1. Subdiagram (1) is pullback and 2. Subdiagram (2) is abstract folding. 8 Here, for the sake of readability, CG-objects are presented in capital letters. 9 (1)+(2) consists of the morphisms b° a , e , g ° f , and c . 12 / 17 Volume 3 (2006) A B D Ef a c d C Fg b e(1) (2) m k n d f e l r t v(1) (2) u l' r' Refactoring Information Systems Theorem 18 (Existence and Uniqueness of Generalized Migrations). Given a weak typing t : D M and refactoring M  l K  r N for the model of t, there is an induced migration and the result of the migration is unique up to isomorphism. Proof. Direct consequence of (1) the existence and uniqueness of pullbacks in CG, (2) the fact that pullbacks in CG preserve weak typings (proposition 6, 3.), and (3) the existence and uniqueness of abstract foldings in CG (Construction 11 and Corollary 15). □ Theorem 18 justifies that we write R(t) for the result typing of a migration from a typing t : D M using a refactoring R = M  l K  r N . Fig. 10 shows the generalized migration that we searched for in Fig. 7. 4. Sequential Composition In this section, we show that there is a natural sequential composition R2 ° R1 of refactorings R1 and R2 and that applying a sequential composition to a weak typing t provides exactly the same result as the sequence of first applying R1 to t and second R2 to R1(t), i.e., R2 ° R 1t =R 2R 1t . Definition 19 (Sequential Composition of refactorings). The sequential composition R2 °R1=l1 ° p1 : J  M , r 2° p2 : J  P of two refactorings R1=l1 : K  M , r1 : K  N  and R2=l 2 : H  N , r2 : H  P is defined with the help of the pullback object  p1 : J  K , p2 : J  H  of r1 and l2 as depicted in Fig. 11. Note that the sequential composition is well-defined due to proposition 6, 1. and the fact that the composition of two refactoring morphisms is a refactoring morphism again. In order to prove our main theorem, i.e., R2 °R1t=R 2R 1t , we need the following technical lemma. Lemma 20 (Pullback Cubes Preserve Abstract Foldings). Consider the commuting diagram in CG below10. If the pair of morphisms (i, q) is the abstract folding of the morphism pair (r, m), 10 Here, we depict CG-objects as arrows with filled tip. Proc. SeTra 2006 13 / 17 Fig. 11: Sequential composition M N K r1 l 1 PH r 2 l 2 J p 2 p 1 (PB) Fig. 10: A generalized migration 1 2 4 5 6 3 7 1 2,4,5 6 3 7 r 1' 2' 6' 3' 7' r u 1,2,6 4 5 3 7 l 1,2,6' 3' 7' t l' M K N D 1' 2' 6' 3' 7'r' v F E ECEASST the pair (p, t) is the pullback of the pair (s, q), and (j, v) is the pullback of (t, i), then the pair of morphisms (j, p) is the abstract folding of (n, k). Proof. The assumptions of the lemma provide that i is the identity. Since pullbacks preserve isomorphisms, we can set j = id without loss of generality. Because the bottom face is a pullback and i is an epimorphism (see Corollary 15), j is an epimorphism as well. Again, from Corollary 15 we deduce that it suffices to show, that j  x = j y ⇔ [ k n x = k n y ∧ g  x = g y] holds for all x , y ∈ G . “⇒”: (1) j  x = j y ⇒ p j  x = p j  y ⇒ k nx = k n y (2) j  x = j y ⇒ c j  x = c j  y ⇒ id g x = id g y ⇒ g  x = g y “⇐”: Let k n x = kn  y ∧ g  x = g  y be given. Since (p, t) is pullback, it is sufficient to show: (3) t  j x = t  j  y ∧ (4) p  j x = p j  y: (3) (a) g x = g y ⇒ vg  x = v g y ⇒ h v x = hv y . (b) k n x = kn  y ⇒ sk n x = skn  y ⇒ mrvx = mr v y . Since (q, i) is abstract folding, it follows from Corollary 15, (a) and (b) that i v x = i v y , which provides t  j x = t  j  y , because t ° j = i °v . (4) k n  x = kn  y ⇒ p j x = p  j y . □ Theorem 21 (Sequential Composition). If R2 R1t  for two refactorings R1 and R2 is defined, we have R2 °R1t=R 2R 1t . 14 / 17 Volume 3 (2006) Fig. 12: Pullback Cube s b c a d f g e h k j id i m id n p q r t u v Refactoring Information Systems Proof. Consider the following diagram, which depicts R2 R1t  . This migration sequence is given by the four squares (1) MKFD, (2) KNEF, (3) NHCE, and (4) HPBC. (1) and (3) are pullbacks and (2) and (4) are abstract foldings. The additional material in the diagram is defined as follows: The pair of morphisms (p1, p2) is given as a pullback of r1 and l2, compare construction of R2 °R1 in definition 19. We write (5) for the resulting square NKJH. We construct (p1',p2') as pullback of (r1',l2'). We write (6) for the resulting square EFIC. The mor- phism u2 is the universal completion of the diagram into the pullback object J. Now, the square (7) KJIF is pullback as well. This is due to the fact that (3)+(6) is pullback11, (3)+(6) = (5)+(7), and (5) is pullback12. The square (8) JICH is abstract folding due to Lemma 20. Now diagram (1)+(7) is pullback, since pullbacks compose. It is the left-hand side of the migration induced by R2 °R 1 . Diagram (8)+(4) is abstract folding, since abstract foldings compose (compare Proposition 16, 1.). It is the right- hand side of the migration induced by R2 °R1 . This together shows that R2 °R1 migrates t to w as well. □ With Theorem 21 we are, on the one hand, able to compose long refactoring sequences into one single refactoring, which can capture the effect of the whole sequence. On the other hand, we can decompose complex refactorings into a composition of simpler ones. 5. General Framework The whole approach presented above is almost independent from the underlying category of graphs resp. component graphs. What we need for the existence and uniqueness of migrations is the existence of pullbacks and abstract foldings. For the results concerning sequential composition, we need the cube lemma 20, i.e., that pullbacks “pull back” abstract foldings. Thus, we can present our requirements for a category to provide the infrastructure for unique migrations and sequential compositions as follows: An abstract migration framework is a category C together with two subcategories T and R which have the same objects as C. The morphisms in T are called typings and the morphisms in R are called refactoring morphisms. The system (C, T, R) is subject to the following requirements: (1) C has all pullbacks 11 Composition property of pullbacks. 12 Decomposition property of pullbacks. Proc. SeTra 2006 15 / 17 Fig. 13: Migration sequence J M K N P H l 1 p 1 p 2 r 1 l 2 r 2 D F I E B C l 1 ' p 1 ' p 2 ' r 1 ' l 2 ' r 2 ' t v1 u 2 u 1 v 2 w ECEASST (2) C has abstract foldings for all pairs of morphisms (f: A → B ∈ T, g: B → C ∈ R). (3) Pullbacks in C preserve morphisms of T and of R. (4) In each cube with corners K, N, H, J, F, E, C, and I, as it is depicted in Fig. 13, the square JICH is an abstract folding if KNEF is abstract folding and the squares IFEC and NECH are pullbacks. Since abstract foldings are a generalization of surjective pullback complements13, the framework presented in section 2 fits into this setting as well. Another instance is given by simple graphs, arbitrary morphisms as typings and injective morphisms as refactoring morphisms. Here we can use surjective pullback complements as abstract foldings as well [15]. 6. Conclusion We propose formalizations of aspects in the process of refactoring information systems. The power of our attempt is that a model refactoring can uniquely and automatically be extended to the instance level. In contrast to other more practical solutions, we can prove correctness of our approach. The framework is described using abstract notions from category theory. With a strong assumption to the typing morphisms we can generalize a migration to a double- pullback diagram. As a first step, it is possible to handle addition, renaming, and removal of model objects. The investigation under which conditions folding and unfolding is possible, leads to a model structure where one had to restrict to 1:1 associations on certain components. A refactoring morphism may fold or unfold on these components, only. In a second step we showed that these settings are correct as well. However, object trees of inheritance structures are, in general, not completely instantiated. To treat this case in a similar way, we have to weaken the assumptions on the type mappings. But weak typings do not always lead to double-pullback constructions. Thus, this third step requires a generalization of pullback complements. We introduced abstract foldings that enjoy some of the well-known properties of pullbacks and pushouts. Abstract foldings are initial in a reasonable context, which reveales a uniqueness statement of generalized migrations and prepares a statement on the composition of refactorings. Composing migrations into larger projects and decomposing migrations into smaller steps leads to the question if there is a minimal set of atomic refactorings, from which each refactoring can be constructed by sequential composition. This might be an interesting topic for future research as well as the question, under which conditions refactorings are parallel or sequential independent and can be performed concurrently. These results are valuable for tools that produce migrations on the basis of the construction of pullbacks and abstract foldings. Finally, in a forth step, we describe a way of integrating refactoring and migration procedures in a more general framework that abstracts away from the underlying category. We define require- ments that are the basis for a generalized system. These requirements are very similar to the axioms for adhesive categories in [14]. It is up to future research to investigate if both frameworks can be seen as two instances of an even more general system. 13 pullback complements such that the morphism into the complement is surjective 16 / 17 Volume 3 (2006) Refactoring Information Systems References 1 Havey, M.: Essential Business Process Modeling. O'Reilly (2005) 2 Martin, R. C.: Agile Software Development, Principles, Patterns, and Practices. Prentice Hall (2002) 3 Beck, K.: Extreme Programming Explained. Addison Wesley (2000) 4 Beck, K.: Test-driven Development by Example. Addison-Wesley (2002) 5 Fowler, M.: Refactoring: Improving the Design of Existing Code. Addison-Wesley (1999) 6 Kerievsky, J.: Refactoring to Patterns. Addison-Wesley (2004) 7 D’Anjou, J et al: The Java Developer’s Guide to Eclipse. Addison-Wesley (2005) 8 Ambler, S. W.: Agile Database Techniques. Wiley (2003) 9 Ambler, S. W.: Refactoring Databases : Evolutionary Database Design. Addison-Wesley (2006) 10 Hainaut, J.-L.: Introduction to database reverse engineering. LIBD Publish. (2002) 11 Bauer, Ch., King, G.: Hibernate in Action. Manning Publications (2004) 12 Löwe, M.: Evolution Patterns – A Graphical Framework for Software Redesign. Proceedings ISAS'99 (1999) 13 Adamek, J., Herrlich, H., Strecker G. E.: Abstract and Concrete Categories – The Joy of Cats. (2004) [http://katmat.math.uni-bremen.de/acc/acc.pdf] 14 Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Fundamentals of Algebraic Graph Transformation. Springer (2006) 15 Löwe, M., König, H., Peters, M., Schulz, Ch.: A Formal Framework for Information System Refactorization. Proceedings WMSCI 2006, Vol. 1, 75-80 (2006) 16 Meisen, J.: Pullbacks in Regular Categories. Canad. Math. Bull. Vol.16(2) (1973) 17 Bauderon, M., Jacquet, H.: Pullback as a generic graph rewriting mechanism. Applied Categorical Structures Vol.9(1) (2001) Proc. SeTra 2006 17 / 17 1.Introduction 2.Migration Framework 3.Partial Instantiation of Components 4.Sequential Composition 5.General Framework 6.Conclusion