Edit languages for information trees Electronic Communications of the EASST Volume 57 (2013) Proceedings of the Second International Workshop on Bidirectional Transformations (BX 2013) Edit languages for information trees Martin Hofmann and Benjamin C. Pierce and Daniel Wagner 14 pages Guest Editors: Perdita Stevens, James F. Terwilliger Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Edit languages for information trees Martin Hofmann and Benjamin C. Pierce and Daniel Wagner Ludwig-Maximilians-Universität and University of Pennsylvania Abstract: We consider a simple set of edit operations for unordered, edge-labeled trees, called information trees by Dal Zilio et al [DLM04]. We define tree languages using the sheaves automata from [FPS07] which in turn are based on [DLM04] and provide an algorithm for deciding whether a complex edit preserves membership in a tree language. This allows us to view sheaves automata and subsets of tree edits as edit languages in the sense of [HPW12]. They can then be used to instantiate the framework of edit lenses between such languages and model concrete examples such as synchronisation between different file systems or address directories. Keywords: Bidirectional programming, lens, edit, information tree 1 Introduction Semantic models of bidirectional transformations are generally presented as transformations be- tween the states of replicas. For example, the familiar framework of asymmetric lenses defines a lens between replicas of types A and B as a pair of a get function from A to B and a put func- tion from A×B to A. An implementation directly based on this semantics would pass to the put function the entire states of the original A and updated B replicas. Though pleasingly simple, this treatment falls short of telling a full story in at least two impor- tant ways. First, it does not explain how the lens should align the parts of the old A and the new B so that the parts of A that are “hidden” in the B view retain their positions in the result. For example, if A and B are both lists of people where each element of A includes a name, address, and email while the corresponding elements of B give only a name and address, the user will rea- sonably expect that inserting a new element at the head of the B replica will lead to an updated A replica where each existing name keeps its associated email. And second, the simple “classical” form of asymmetric lenses fails to capture the reasonable expectation that a small change to the B replica can be transformed into a change to the A replica using time and space proportional to the size of the change, not to the sizes of the replicas. One way to address at least the first concern is to enrich the basic structure of a state-based lens with a new input to the put function, a data structure that explicitly represents the align- ment between the original and updated B replicas; this idea forms the basis for dictionary lenses [BFP+08], matching lenses [BCF+10], and symmetric [DXC+11b] and asymmetric delta lenses [DXC11a] (based on [Ste07]). Another approach is to annotate the B structures themselves with change information [XLH+07, HHI+10]. However, all these approaches still involve whole replica states, either as explicit inputs and outputs of the put function or implicitly as part of the type to which a delta belongs. Thus, it is not clear whether these models can be implemented in such a way that small changes to a replica are propagated in time and space proportional to the size of the change. 1 / 14 Volume 57 (2013) Edit languages for information trees In earlier work [HPW12], we proposed going a step further and defining lenses that work exclusively with edits. We defined a semantic model called edit lenses in which the sets of source and target replicas A and B are enriched with monoids of edits and a lens’s get and put functions map edits to edits. This work was carried out in an abstract algebraic setting where the actual data structures being transformed and the exact shapes of edits to them were left unspecified; in this setting, we showed how a number of familiar constructions on lenses—products, sums, etc.—could be carried out. The present paper takes a first step toward instantiating this abstract semantic model with concrete data structures and a concrete notion of edits. The data model we choose is a very common and expressive one: unordered, edge-labeled trees—called information trees by Dal Zilio et al [DLM04]. These can encode a variety of data formats, including XML-style trees, their original application. Such trees can also be used to represent graphs [BDHS96, HHI+10] by unrolling up to bisimulation. This paper thus makes a first step towards general edit languages for trees. Our edit operations include, in particular, insertion and deletion of subtrees and renaming of edges; we show how these give rise to edit languages on tree languages and can thus be used to instantiate the framework of edit lenses so as to yield bidirectional synchronization between information trees. Our main technical result is that weakest preconditions of our edits can be effectively com- puted for sets of information trees specified by sheaves automata. This allows for effective “type checking” of edits and thus permits automatic checks that an edit language for trees presented as sequences of atomic edits does indeed preserve acceptance by a given sheaves automaton. The present work is thus a first step; we hope that the introduction of tree automata into the world of editing and synchronization will also lead to high-level support for constructing edit lenses themselves and for checking their soundness, but this remains future work. 2 Examples As a running example, we will use a simplified model of a filesystem. In our filesystem, there are three kinds of objects: directories, files, and bit strings. In our simple model, we will identify files by a name that starts with a lower case letter; files will contain bit strings. For example, here’s a simple file named “greeting”: {|greeting 7→0110100001101001|}. We will identify directories by a name that starts with an upper case letter; directories contain filesystems. A filesystem is a collection of files and directories. For example, here is a simple filesystem: {|Etc 7→{|passwd 7→0110 ,alternatives 7→010|} , foo 7→1010110|} |} Call this example filesystem E. Proc. BX 2013 2 / 14 ECEASST In addition to modeling the current state of a file system, we will want to model file system operations like creating, deleting, and moving files and directories. To do so, we will define a collection of edits and an operation that applies edits to filesystems. To create new files and directories at the top level, we define the insert operation, which takes a filesystem to merge into the existing one. For example, to create a “/home/john” directory (with no files or directories inside) in our example filesystem, we would use: insert({|Home 7→{|John 7→{||}|}|})·E = {| Etc 7→{|passwd 7→0110 ,alternatives 7→010|} , Home 7→{|John 7→{||}|} , foo 7→1010110|} |} For simplicity, we will say that any name clashes result in failure; for example, trying to apply the edit insert({|Etc 7→{||}|}) to E would be undefined. To allow the creation of files in other places than at the root, we will define another edit operation, at(n,e), which applies edit e to the subpart of the filesystem rooted at n. For example, to add the “/etc/dbus” file to our example filesystem, we would at(Etc,insert({|dbus 7→10101|}))·E which would result in {| Etc 7→{|passwd 7→0110 ,alternatives 7→010 ,dbus 7→10101|} , Home 7→{|John 7→{||}|}|} , foo 7→1010110|} |}. To support moving files and directories around the filesystem, we define two more kinds of edits, hoist and plunge, which float and sink entire subtrees one level. For example, if we wanted to move the file “/foo” into “/home/john”, we might first move it into “/home” with plunge(Home,foo), then move it into “/home/john” with at(Home,plunge(John,foo)). To round out our operations, we also define delete and rename operations, whose behaviors are predictable from their names. Besides renaming files and directories in-place, rename has a more subtle use. Consider the following filesystem: {|Bin 7→{|cat 7→0000|} , Usr 7→{||} , Opt 7→{|Bin 7→{|coq 7→1111|}|}|} Call this filesystem E′. Now, we decide we would like to move the “/opt/bin” directory into “/usr”. Naively, we would like to hoist “/opt/bin” to “/bin”, then plunge “/bin” to “/usr/bin”; however, the first hoist operation would result in a name clash with the existing “/bin”. One way 3 / 14 Volume 57 (2013) Edit languages for information trees we can avoid this is to temporarily rename Bin to a secret name which definitely will not clash: at(Usr,rename(#Bin,Bin))· plunge(Usr,#Bin)·hoist(Opt,#Bin)· at(Opt,rename(Bin,#Bin))·E′ This sequence of edits is now quite robust: it will work in any filesystem that has both “/opt/bin” and “/usr” but no “/usr/bin”. On the other hand, notice that this sequence of edits temporarily breaks some of the invariants of filesystems: for a time, there are subtrees that are not bit strings, directories, or files. For a more interesting examples where invariants need to be temporarily broken consider that in the spirit of B-trees we impose maximum and minimum numbers of files to be held by any one non-root directory. Then, in order to insert a file, it may be necessary to split a directory into two using a long sequence of basic edits only at the end of which the invariant will be restored. During our exploration of filesystems and how to represent edits, we have briefly seen several ways that edits can “go wrong”. There are three failure classes of interest: 1. It may be clear that a particular edit can never apply to a valid filesystem; e.g. the edit delete(#foo) that tries to delete a file with an invalid name. 2. An edit may be applicable to some filesystems but break filesystem invariants and never properly restore them. We would like to prevent this without ruling out edits which do properly restore invariants. 3. There may be edits that apply to some filesystems correctly but fail on others; e.g. the edit insert({|Foo 7→{||}|}) that inserts a new directory works on any filesystem that does not yet have a “/foo” directory. In this paper, we discuss a way to detect the first two classes of failures. 3 Trees and sheaves automata This section introduces some notations and definitions that we need in the sequel. Some of the descriptions are taken from [FPS07]. 3.1 Trees We assume a finite alphabet Σ and consider unordered trees whose edges are labeled by Σ∗. We write trees with a pair of braces {||} for each node; each subtree is written n 7→ t, where n names the edge that leads to the subtree t. To reduce clutter, we abbreviate the tree {|n 7→{||}|} to just n when no confusion arises. For example, here is an explicit tree with two children labeled “home” and “etc”. . . {|Home 7→{|John 7→{||}|},Etc 7→{|passwd 7→{||}|}|} Proc. BX 2013 4 / 14 ECEASST . . . and its abbreviated form: {|Home 7→John,Etc 7→passwd|} We write t(n)↓ to indicate that tree t has child n, t(n)↑ to indicate that tree t does not have child n, and t(n) for the child under the edge labeled n. We write dom(t) = {n | n ∈ Σ∗∧t(n)↓}. The expression t[n 7→ t′] describes the tree that agrees with t everywhere, except that its child under label n is t′. The expression t\{n1,n2,...,nk} describes the tree that is undefined at n1,n2,...,nk but agrees with t everywhere else. When context makes it clear that n is a name, we write t \n to mean t \{n}. To simplify the definition of partial functions, we take an expression like t[n 7→ E] when E is undefined to be undefined itself. 3.2 Tree edits The set of atomic tree edits is defined as follows1 (where e ranges over edits, t over trees, and m and n over names): e ::= insert(t) | hoist(m,n) | delete(m) | rename(m,n) | at(n,e) The application of an atomic edit e to a tree t is either undefined (⊥) or a new tree defined as follows: insert(t′)·t = t ∪t′ if dom(t)∩dom(t′) = /0 hoist(m,n)·t = t[m 7→ t(m)\n,n 7→ t(m)(n)] if t(m)(n)↓∧ t(n)↑ delete(n)·t = t \n if t(n) ={||}2 rename(n,n′)·t = (t \n)[n′ 7→ t(n)] if t(n)↓∧ t(n′)↑ at(n,e)·t = t[n 7→ e·t(n)] if t(n)↓ e·t =⊥ in all other cases Tree edits are sequences of atomic edits. We overload · as tree edit application, which is the natural lifting of atomic edit application to sequences: 〈〉·t = t 〈a1,...,an〉·t = 〈a1,...,an−1〉·(an ·t) 3.3 Sheaves formulae and automata Conceptually, a sheaves automaton consists of a set of states, each associated with a sheaves formula; each sheaves formula describes a set of trees by specifying which names may occur 1 The edits presented here are fairly simple. One might wonder whether more complicated edits—for example, ones that allowed recursion or tree queries—could also be supported in this framework. For the moment, we take the stance that these more complicated edits might be provided by some external editing tool, but that they should then be “compiled down” to sequences of atomic edits. This is a point that deserves further consideration. 2 Our delete operation deletes a single, explicitly named leaf node, but this is not a fundamental restriction. Operators that delete entire subtrees or all children in a regular language seem like reasonable alternate design choices. 5 / 14 Volume 57 (2013) Edit languages for information trees as immediate child edges and, for each one, an automaton state that describes the subtree found beneath it. To describe sheaves automata more formally, we must first fix a formalism for writing down arithmetic constraints. We use Presburger arithmetic—the decidable first-order theory of the naturals with addition but without multiplication—for this purpose.3 Expressions in Presburger arithmetic include constants, variables, and sums, and formulae include equalities between ex- pressions, boolean combinations of formulae, and quantified formulae: m ::= 0 | 1 | 2 | ··· v ::= m | xm | v + v φ ::= v = v | φ ∨φ | φ ∧φ |¬φ | ∃φ We use a de Bruijn representation—a variable x j within the scope of k quantifiers represents the ( j−k)th free variable if j ≥ k, and otherwise is bound by the jth enclosing quantifier, counting from the inside-out. The semantics of a Presburger formula is the set of vectors of naturals that satisfy it. We write c̄ � φ for formula satisfaction, substituting ci for xi, and fv(φ) for the set of free variables in φ . Next, a sheaves automaton comprises a finite set of states together with a mapping Γ from states to sheaves formulae. The transition behavior from a state is given by the sheaves formula associated with it in Γ. Each sheaves formula has three components—a Presburger formula φ , a list of regular languages r̄, and a list of successor states s̄. The pair of a regular language ri and the successor state si at the corresponding index is called an element. The operation of a sheaves automaton is like a bottom-up regular tree automaton. Let t be a tree and s be an automaton state with Γ(s) = (φ,〈r0,...,rk〉,〈s0,...,sk〉). For each i in the range 0 to k, let ci be the number of children n ∈ dom(t) for which n ∈ ri and t(n) is accepted by si. Then t is accepted by s iff (c0,...,ck) |= φ . Note that the integers that represent variables in de Bruijn notation give the correspondence between free variables in φ and elements—the constraint on xi controls the number of children whose name matches ri with subtrees accepted by si. In the following, it will sometimes be convenient to treat elements (and the corresponding Presburger variables in the sheaves formula) as if they were indexed by some set with more structure than the natural numbers. This is perfectly reasonable, provided there is an injective mapping from the structured set to the naturals. When it is clear that such an injective mapping f exists (and in particular especially when the structured set is finite) we will leave the mapping unspecified and simply use values from the structured set as subscripts to the collections r̄ and s̄ as well as any Presburger variables, understanding rv to stand for r f (v). Sheaves automata and sheaves formulae are subject to some well-formedness conditions. A sheaves formula (φ, r̄, s̄) with |r̄| = k is well-formed iff |s̄| = k; the free variables of φ are {x0,...,xk−1}; the elements are pairwise disjoint—i.e., if there exists a tree accepted by both si and s j, then the regular languages denoted by ri and r j are disjoint; and the elements are generating—i.e., for every tree t and label n ∈ dom(t) there is an i such that n ∈ ri and t(n) is ac- cepted by si. A list of elements obeying these conditions is called a basis. A sheaves automaton is 3 Here we follow the lead of [FPS07] and [DLM04]. Presburger arithmetic is quite an expressive logic, and it is possible that a simpler fragment suffices, but we leave an investigation of this possibility to future work. Proc. BX 2013 6 / 14 ECEASST well-formed iff every sheaves formula in the range of Γ is well-formed. These well-formedness conditions guarantee two properties. First, because the elements are non-overlapping, every tree has a unique decomposition over the basis, which means that the semantics of a sheaves au- tomaton is well-defined. Second, because the elements generate the set of all tree slices, certain constructions are simple. For example, (φ, r̄, s̄) and (¬φ, r̄, s̄) accept complementary sets of trees. One of the simplest automata is the one that accepts any tree. It has a single state, which we will name >, and whose transition relation is given by Γ(>) = (0 = 0,[Σ∗],[>]). In fact, having such a state is so convenient that we will assume in the remainder of the paper that every automaton has a state named > that accepts all trees (and consequently omit the definition of Γ(>)). As a more exciting example, the set of trees {{||}, {|a,b|}} is described by the automaton state s, where Γ(s) is:  ((x0 =0∧x1 =0)∨(x0 =1∧x1 =1))∧(x2 =0),[a,b,{a,b}], [>,>,>]   To see this, observe that the constraints on x0 and x1 force the number of children described by the elements (a,>) and (b,>) to both be 0 or both be 1, and that the constraint on x2 forces the number of children belonging to the final element to be 0, that is, there are no edge labels other than a or b. The relation A,s ` t tells when automaton A accepts tree t when started at state s. We often write sheaves automata as A = (S,s0,Γ), where s0 is a distinguished initial state. We then write A t to mean A,s0 ` t. We also write L(A) ={t | A t} for the language accepted by automaton A. 4 Weakest preconditions of edits So far, we have reviewed a definition for trees and a notion of type-checking for trees, namely, sheaves automata. We have also given a definition for tree edits; what remains is to give a notion of type-checking for tree edits. The type of a tree edit will have two tree types, one describing source trees and one describing target trees. Below, we will show how to take a target tree type T and an atomic edit e and infer a type S for which applying e to an S tree produces a T tree; we will argue that this S is maximal and therefore checking that e has a given type S′ → T amounts to checking that S′ is a subtype of S. Let A,B be sheaves automata and e be a tree edit. We write e : A → B to mean that whenever A t and e·t is defined we have B e ·t. We now define for each atomic edit e and automaton A a new automaton e ·A such that t ∈ L(e ·A) iff e · t ∈ L(A) whenever e · t is defined, that is, e ·A is an automaton representing the weakest precondition that ensures that applying edit e will result in a tree of type A. Formally: L(e ·A) = {t | e.t↓⇒ e.t ∈ L(A)}. It is then clear that e : A → B iff L(A) ⊆ L(e ·B) (notice that if e.t is undefined then, trivially, t ∈ L(e.A)). Language inclusion of sheaves automata being 7 / 14 Volume 57 (2013) Edit languages for information trees decidable [DLM04], this then implies a decision procedure for e : A → B and in particular for deciding whether a given tree edit belongs to A → A. Theorem 1 Let A = (S,s0,Γ) be a sheaves automaton and e be an atomic edit. There exists a sheaves automaton e ·A = (S′,s′0,Γ ′) such that t ∈ L(e ·A) iff e ·t is undefined or e ·t ∈ L(A). Moreover, e·A can be effectively obtained from e and A. Proof. The construction proceeds in two stages. First, we define for each edit e a sheaves au- tomaton De such that L(De) ={t | e ·t↑}. Then, for each edit e, we construct a sheaves automaton e ? A such that for all t with e · t↓ one has e ? A t ⇐⇒ A e ·t. We then define the desired automaton e ·A so that L(e ·A) = L(De)∪L(e ? A) using the union construction from [DLM04]. Given this strategy, the remaining definitions are essentially a programming exercise. • e = insert(t′). For De we need to check that some top-level label from t′ is present. If the top-level labels are, say, r0 ...rn and rn+1 is Σ∗\{r0,...,rn} then the sheaves formula (0 6= ∑ni=0 xi, r̄,>̄) achieves the purpose when attached to the initial state of De. To build e ? A we add a fresh state s′0 that is just like s0 except that it has already “seen” the subtree t′. That is, if Γ(s0) = (φ, r̄, s̄) and ci is the number of labels n ∈ ri∩dom(t′) for which A,si ` t′(n), then we define S′ = S∪{s′0} φ ′ = φ[xi + ci/xi] Γ ′ = Γ[s′0 7→ (φ ′, r̄, s̄)] where φ[e/x] is the formula φ with expression e substituted for variable x. • e = hoist(m,n). Suppose Γ(s0) = (φ, r̄, s̄). The high-level plan for e ? A is to add some new states for each state that the m-branch could be accepted under that tell which state the n-branch was accepted under. To this end, define sets Im and In so that m ∈ ri iff i ∈ Im and likewise n ∈ ri iff i ∈ In. We now perform the following modifications: – Add a fresh state s′0, making it the initial state, and letting it initially be a copy of s0. – Remove the Im languages (and their associated successor states) from the sheaves formula associated with s′0. – For each (i, j) ∈ Im ×In, add a new regular language which is a copy of the original ri and whose successor state is a copy of the automaton that accepts the language of trees that can be split into an n part accepted by s j and a remainder accepted by si. Name the indices of the regular languages and successor states added by this operation ki j. – Modify the Presburger formula associated with s′0 to reflect the changes above: for each i ∈ Im, replace occurrences of xi with ∑ j xki j , and for each j ∈ In, replace occur- rences of x j with ∑i xki j . (If i ∈ Im∩In for some i, then these two operations coincide, because of the partitioning property of sheaves formulae.) Proc. BX 2013 8 / 14 ECEASST m n (a) The trees automaton A accepts. n m (b) The trees hoist(m,n)·A should accept. Figure 1: the hoist operation For De, we must check that the tree t has t(m)(n)↑ or t(n)↓; this is easily done by defining De = ({>,s,n,n̄},s,Γ) where Γ(s) = (x0 = 0∨x1 6= 0∨x2 6= 0,[{m}[n],{m}[n̄],{n}[>],{m,n}[>]]) Γ(n) = (x0 6= 0,[{n}[>],{n}[>]]) Γ(n̄) = (x0 = 0,[{n}[>],{n}[>]]). • e = delete(n). The automaton that accepts exactly the tree {|n 7→{||}|} is easy to construct; call this automaton A′. We then define De = ¬(>+ A′) using the constructions described in [FPS07]. For e ? A, remove n from each of the ri. Then add an extra condition guarded by n. Place no constraint on the corresponding cardinality. Leave the other cardinalities as they were. • e = rename(n,n′). Suppose Γ(s0) = (φ, r̄, s̄). Choose a fresh s′0 and define r′i = { r′i ∪{n} n ′ ∈ r′i r′i \{n} n ′ /∈ r′i S′ = S∪{s0} Γ ′ = Γ[s′0 7→ (φ,r′, s̄)] Then this new automata counts any n-rooted subtree as if it were an n′-rooted one instead. We should briefly argue that the sheaves formula given for s′0 is generating and pairwise disjoint. It is generating: take any tree t and name n′′∈ dom(t), and apply the definition of “generating” to the original sheaves formula using the tree [n′ 7→ t(n′′)] if n′′ = n or using the tree [n′′ 7→ t(n′′)] if not. It is pairwise disjoint: consider r′i[si] and r ′ j[s j] for which both si and s j accept tree t. Take any name n′′ = n (resp. n′′ 6= n). Since the original sheaves 9 / 14 Volume 57 (2013) Edit languages for information trees formula is pairwise disjoint, at most one of ri and r j contain n′ (resp. n′′), hence at most one of r′i and r ′ j contain n ′′. For De we use the automaton ({>,s},s,Γ) where: Γ(s) = (x0 = 0∨x1 6= 0,[{n}[>],{n′}[>],Σ∗\{n,n′}[>]]) • e = at(n,e′). Suppose Γ(s0) = (φ, r̄, s̄). Define the set I so that n ∈ ri iff i ∈ I. We will apply the edit e to each of the automata that start at si such that i∈ I, and use these modified automata as the new states associated with these regular languages. Define (recursively): Ãi = e·(S,si,Γ) Later, we will want to ensure that the Ãi automata are disjoint in the sense that they accept no common trees. This is mostly true of these Ãi: since the states si accept disjoint trees, the trees which arrive at state si after being edited by e are also disjoint sets. However, these Ãi may also accept trees for which the edit e does not apply. Since we don’t care what our final automata does with such trees (as we will be dealing with this situation in De), this subtlety is not important, so we will define it away. The language difference operation can be implemented on automata, so we define Ai = (Si,ai,Γi) = Ãi \Ai−1 \···\A0. Additionally, there are standard constructors for defining an automaton A−1 = (S−1,a−1,Γ−1) which accepts exactly when none of the automata Ai for i ∈ I do. Without loss of gener- ality, we may assume the Si are pairwise disjoint and disjoint from S. (If not, just rename them: this does not change the behavior of the automata.) We are now ready to begin constructing the new automaton (S′,s′0,Γ ′). Its state set is the disjoint union S′ ={s′0}∪S∪S−1 ∪ ⋃ i∈I Si with s′0 chosen afresh. The transition function Γ ′ on the last three components is as in the automata A,Ai,A−1 above, Γ ′(s) = Γ(s) s ∈ S Γ ′(s) = Γ−1(s) s ∈ S−1 Γ ′(s) = Γi(s) s ∈ Si, so that these automata appear as subcomponents. The interesting bit is Γ′(s′0). Its compo- nents are indexed by the following set, where the unions are assumed disjoint. J ={edited(i) | i ∈ I}∪{unedited(i) | 0 ≤ i < |r̄|}∪{illtyped} The sheaves formulae indexed by edited(i) deal with children that would be successfully edited if at(n,e′) were applied. Those indexed by unedited(i) capture those that remain Proc. BX 2013 10 / 14 ECEASST unaffected by the edit. The last sheaves formula indexed by illtyped covers those children that would be edited but for which the edit would not be defined. We now put Γ ′(s′0) = (φ ′,r′,s′) where r′ = (r′j) j∈J and s ′ = (s′j) j∈J and r′edited(i) ={n} r′unedited(i) = ri \{n} r′illtyped ={n} s′edited(i) = ai s′unedited(i) = si s′illtyped = a−1 ρ(xi) = xedited(i) + xunedited(i) i ∈ I ρ(xi) = xunedited(i) i /∈ I φ ′ = ρ φ ∧xilltyped = 0 One might worry about whether the states associated with the regular language {n} above are disjoint and generating. They are generating because of the addition of the catch-all state s′illtyped, and are disjoint because the Ai are disjoint as automata. To build De, we first recursively build De′ = (S,s0,Γ), then create some fresh states s′0 and ¬s0. The new automaton will check whether either there is no n child or there is one, but it’s accepted by D′e: S′ = S∪{s′0,¬s0} Γ ′(s′0) = (x0 6= 0∨x1 = 0,[{n}[s0],{n}[¬s0],Σ ∗\{n}[>]]) Γ ′(¬s0) = (¬φ,e) where Γ(s0) = (φ,e) Γ ′(s) = Γ(s) s ∈ S De = (S ′,s′0,Γ ′) We lift the operation that computes weakest preconditions for atomic edits to tree edits in the obvious way. Lemma 1 Language inclusion of sheaves automata is decidable. Proof. Given sheaves automata A and B, to tell whether L(A)⊆ L(B), build an automaton C such that L(C) = L(A)\L(B) using the algorithms for intersection and complement given in [DLM04]. Then check whether or not L(C) = /0. Corollary 1 Given sheaves automata A and B and tree edit e it is decidable whether e : A → B. Proof. Given the theorem, this amounts to deciding whether L(A)⊆ L(e·B). 11 / 14 Volume 57 (2013) Edit languages for information trees 5 Edit languages and lenses In previous work [HPW12] we defined an edit language E as a set E (or |E|) of elements, a monoid ∂ E of edit operations and a partial action · : ∂ E ×E → E. Given a sheaves automaton A, we can thus form an edit language EA, the full tree edit language over A, whose set of elements is L(A) and whose monoid of edits ∂ EA is {e | e : A → A}. We have seen that membership in ∂ EA is effectively decidable. Continuing the file system example from Section 2, the automaton below checks that a tree is a valid filesystem; it has the five states { f ,d, f ,d,>}, corresponding to files, directories, trees that claim to be files but have an error in them, trees that claim to be directories but have an error in them, and other miscellaneous errors. The transition relation is fairly straightforward; success states demand that there are no included error states, and error states demand the negation of the success states. Γ(d) = (φd,sd) Γ(d) = (¬φd,sd) Γ( f ) = (φ f ,s f ) Γ( f ) = (¬φ f ,s f ) φd = (x1 = x3 = x4 = 0) sd = [rd[d],rd[d],r f [ f ],r f [ f ],rd ∪r f [>]] φ f = (x0 = 1∧x1 = 0) s f = [{0,1}∗[>],{0,1}∗[>]] rd ={A,··· ,Z}Σ∗ r f ={a,··· ,z}Σ∗ Our framework then lets you effectively check that edits like at(p1,at(...,at(pn,at( f ,e))...)) which modify the contents of the file “/ p1/···/ pn/ f ” in the filesystem are in the (full) edit monoid that comprises those sequences of atomic edits that preserve the file system structure. On the other hand, an edit like at( f ,delete(c)) which deletes the contents c of file f but neglects to delete the file itself will be rejected. Also recall from [HPW12] that an edit lens between two edit languages E and E′ comprises • a complement set C • a consistency relation K ⊂|E|×C×|E′| • a rightward translation V∈ ∂ E ×C → ∂ E′×C • a leftward translation W∈ ∂ E′×C → ∂ E ×C such that • consistency and typing are preserved, that is, (a,c,b)∈ K e ∈ ∂ E e ·a↓ V(e,c) = (e′,c′) (e ·a,c′,e′ ·b)∈ K e′ ∈ ∂ E′ e′ ·b↓ and similarly for W, and Proc. BX 2013 12 / 14 ECEASST • composition of edits is respected, that is, V(e1,c) = (e′1,c ′) V(e2,c′) = (e′2,c ′′) V(e2e1,c) = (e′2e ′ 1,c ′′) and similarly for W. It is now possible to design lenses between full tree edit languages of the form EA for a sheaves automaton A, but we believe that this is not necessarily the best way of proceeding, since the edit transformation embodied in the lens might need to get some intensional information about the intended semantics of the edit to be translated. We are thus led to define a tree edit language to be an edit language whose set of elements is of the form |E|= L(A) for some sheaves automaton A and whose monoid of edits ∂ E comes with a monoid morphism f : ∂ E →{e | e : A → A}. Thus, every edit is “implemented” as a sequence of atomic edits and, thus, assuming that ∂ E is finitely generated, we can check well typedness by checking whether f (g) : A → A holds for every generator g. It is then possible to design tree edit lenses that synchronise between different file systems and file structures, for example one having nested subdirectories and the other one being essentially flat. In such a case, the complement will store alignment information in the form of a bijection between files and directories. 6 Conclusion We have defined a simple set of edits for information trees comprising insertions, deletions, relo- cations, and renamings of subtrees. Our main technical result states that tree languages defined by sheaves automata [DLM04] are effectively closed under weakest preconditions for these edits and that therefore, typechecking of edits against tree types defined by sheaves automata [FPS07] is algorithmically tractable. We see this result as a first step towards an automata-based high-level formalism for tree synchronisation; in particular we would like to investigate to what extent complements and con- sistency relations can be defined by automata and how the tree types from [FPS07] and the as- sociated term formers can be lifted to edit lenses. More speculative goals include the automatic discovery of tree edits (“tree diffing”) and the extension to graphs. Bibliography [BCF+10] D. M. J. Barbosa, J. Cretin, N. Foster, M. Greenberg, B. C. Pierce. Matching Lenses: Alignment and View Update. In ACM SIGPLAN International Conference on Functional Programming (ICFP), Baltimore, Maryland. Sept. 2010. [BDHS96] P. Buneman, S. Davidson, G. Hillebrand, D. Suciu. A Query Language and Opti- mization Techniques for Unstructured Data. In ACM-SIGMOD. Pp. 505–516. 1996. 13 / 14 Volume 57 (2013) Edit languages for information trees [BFP+08] A. Bohannon, J. N. Foster, B. C. Pierce, A. Pilkiewicz, A. Schmitt. Boomerang: Resourceful Lenses for String Data. In ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), San Francisco, California. Jan. 2008. [DLM04] S. Dal Zilio, D. Lugiez, C. Meyssonnier. A Logic You Can Count On. In ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Venice, Italy. Pp. 135–146. ACM Press, Jan. 2004. [DXC11a] Z. Diskin, Y. Xiong, K. Czarnecki. From State- to Delta-Based Bidirectional Model Transformations: the Asymmetric Case. Journal of Object Technology 10:6:1–25, 2011. [DXC+11b] Z. Diskin, Y. Xiong, K. Czarnecki, H. Ehrig, F. Hermann, F. Orejas. From State- to Delta-based Bidirectional Model Transformations: The Symmetric Case. Technical report GSDLAB-TR 2011-05-03, University of Waterloo, May 2011. [FPS07] J. N. Foster, B. C. Pierce, A. Schmitt. A Logic Your Typechecker Can Count On: Unordered Tree Types in Practice. In Workshop on Programming Language Tech- nologies for XML (PLAN-X), informal proceedings. Jan. 2007. [HHI+10] S. Hidaka, Z. Hu, K. Inaba, H. Kato, K. Matsuda, K. Nakano. Bidirectionaliz- ing Graph Transformations. In ACM SIGPLAN International Conference on Func- tional Programming (ICFP), Baltimore, Maryland. Sept. 2010. [HPW12] M. Hofmann, B. C. Pierce, D. Wagner. Edit Lenses. In ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Philadelphia, Pennsylvania. Jan. 2012. [Ste07] P. Stevens. Bidirectional Model Transformations in QVT: Semantic Issues and Open Questions. In International Conference on Model Driven Engineering Lan- guages and Systems (MoDELS), Nashville, TN. Lecture Notes in Computer Sci- ence 4735, pp. 1–15. Springer-Verlag, 2007. [XLH+07] Y. Xiong, D. Liu, Z. Hu, H. Zhao, M. Takeichi, H. Mei. Towards automatic model synchronization from model transformations. In IEEE/ACM International Confer- ence on Automated Software Engineering (ASE), Atlanta, GA. Pp. 164–173. 2007. Proc. BX 2013 14 / 14 Introduction Examples Trees and sheaves automata Trees Tree edits Sheaves formulae and automata Weakest preconditions of edits Edit languages and lenses Conclusion