Spreadsheets with a Semantic Layer Electronic Communications of the EASST Volume 62 (2013) Specification, Transformation, Navigation Special Issue dedicated to Bernd Krieg-Brückner on the Occasion of his 60th Birthday Spreadsheets with a Semantic Layer Andrea Kohlhase, Michael Kohlhase 20 pages Guest Editors: Till Mossakowski, Markus Roggenbach, Lutz Schröder Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Spreadsheets with a Semantic Layer Andrea Kohlhase, Michael Kohlhase Jacobs University Bremen Abstract: Spreadsheets are active documents that are heavily employed in adminis- tration, financial forecasting, education, and science because of their intuitive, flex- ible, and direct approach to computation. But they are also error-prone, poorly documented, often contain actual data in legacy form. Therefore, assistance for high-impact spreadsheet users is needed. To determine what kind of help could be useful, we analyze user expectations with an “Wizard-of-Oz” experiment. This shows that background knowledge is missing in spreadsheets. In the SACHS project we approach the missing background knowledge by adding a semantic layer. We illustrate spreadsheets with a semi-formal domain ontology and equip them with a semantically transparent interface that allows new forms of interaction like “semantic navigation”, “framing”, or “playing with variants”, on which a survey is given. Moreover, an integration of assessment knowledge into the SACHS approach is presented. We model it based on theory graphs and sketch a potential SACHS extension with innovative assessment interaction. Keywords: Spreadsheets, Semantic Web, Knowledge Management 1 Introduction In many spreadsheet-based applications even longtime users cannot interpret all data and are not certain about their origins (see [AE06] and its references). This often results in errors on the data level and misinterpretation or misapprehension of the underlying model. Usability and maintenance problems are not only well-known, they have severe consequences, see for exam- ple [Mur08]. It has been estimated that each year tens of millions professionals and managers create hundreds of millions of spreadsheets [Pan00]. ABRAHAM and ERWIG report an astound- ing error rate of up to 90% (!) in spreadsheets [AE06]. In this article we show that (interactional) semantic knowledge management techniques can be used to enhance the interaction with spreadsheets and alleviate usability problems caused by spreadsheet complexity. In particular, we give a survey on the user assistance system “SACHS” (Semantic Annotation for a Controlling Help System) that aims at overcoming usability issues for MS Excel documents. To deepen our understanding of the underlying problem, let us first discuss why spreadsheet technology is so attractive and why this sets knowledge traps. Spreadsheets are active docu- ments, i.e., they are on the one hand of a document type that distinguishes between form and content (like cell layout vs. computed cell values), and on the other hand they exploit the distinc- tion by using a presentation engine like Excel to adapt the surface structure of the document to the environment or user input. For example, if an author adds a column, the concerned relative 1 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer Figure 1: Running Example: A Simple Controlling System Using MS Excel after [Win06] cell references in all formulas of the spreadsheet are automatically updated by the presentation engine. Interestingly, the potential activeness of the document is not at all exploited when reading a spreadsheet. To remedy this we must first understand the ways activeness supports authoring. The underlying semantic model for the content of spreadsheets consists of various layers (see [KK09a]). We will expose them using the spreadsheet in Figure 1, which we will use as a running example for this article1: Data Cell values are contained in the data layer. These are the objects that are actively handled by the presentation engine. Their meaning is defined in other layers. Surface The surface layer sets visual cues for the interpretation of the shown data. For instance, the grid of cells seen in Figure 1 can be roughly divided into three areas. The darker, ochre area in the center contains values of actual and past expenses and revenues; the lighter, yellow box on the right contains values projected from these. The white region that surrounds both boxes supplies explanatory text or header information that helps users to interpret these numbers. We call meaningful grid regions like the ochre one, here the one containing actual data, semantic blocks. Generally, non-empty cells that do not contain input or computed values contain text strings that give auxiliary information (comparable to a legend) on the cells that do. The author’s cell format decision also influences the surface layer as one and the same datum can be for example presented with a chosen currency symbol or as a percentage rate. Formula Authors experience a great deal of satisfaction from spreadsheets, because cell values can be computed automatically via assigned, sometimes very complex underlying formu- lae. Here, a formula is an expression built up from constants, an extended set of numeric and logic operators, and references to other cells. 1 The SACHS system was developed for the DCS system, a financial controlling system based on Excel in daily use at the German Research Center for Artificial Intelligence (DFKI). Of course, we cannot use that as an example here for privacy reasons. Festschrift Bernd Krieg-Brückner 2 / 20 ECEASST In our example, the yearly profit as cell value in [B17: F17] can be computed from the resp. revenues in [B6: F6] and the total expenses in [B15: F15] by a simple subtraction, the total expenses can in turn be computed by summing up the various particular expense categories listed in cells [A9: A13]. We call grid regions containing cells with the same underlying formula like [B17: F17] computational functional blocks. Note that the projected profit values in cell range [G17: H17] do not belong to this computational functional block as they are calculated as projected values, but they could have been computed as well within this computational functional block as difference between the projected revenues and expenses. Background The background layer in a spreadsheet is invisible as it concerns the knowledge that is needed for the interpretation of data, that is not made explicit in the document. For example, the author of the spreadsheet in Figure 1 knew which business the data are taken from or which one they are concerned with, but a reader of this document has to know this information to understand its sense. The interpretation of the data in row 17 for instance consists in the profit/loss situation over time (i.e., in the years 1984-1990 as indicated by the values in row 4). In particular, the meaning of the values in row 17 is that they represent profits and losses as a function π of time: recall that a function is a right-unique relation (i.e., a set of pairs of input values and output values such that for every input there is a unique output). In our example the pair 〈1984,1.662〉 of values of the cells [B4] and [B17] is one of the pairs of π . We call cell ranges each of which corresponds to a specific such function implicit functional blocks. Note that the cell range [B17: F17] in our running example is a computational functional block, whereas it is only part of the implicit functional block [B17: H17]. The distinct notions of “semantic”, “computational functional” and “implicit functional block” show the complexity of making sense of spreadsheets. Even though spreadsheets are appealing as they are deceptively easy to use, they are also error-prone but high-impact, widely-disseminated but poorly documented, and contain actual data in legacy form (see e.g. [Pan00, Mur08]). Errors caused by misunderstandings at the surface and formula layer are tackled via user assistance for the spreadsheet player itself. For instance, in [AN08] a multi-layer interface approach is used and realized for Excel — especially addressing the problem of appropriateness of explanation. Support for comprehending spreadsheets is often concerned with data visualization techniques and data/formula dependency graphs (see [BP08] and [HM08] resp. as examples), which only cover the data layer. In contrast to these layers that can be handled once and for all at the level of the system, the domain modeled in spreadsheets and with that the background knowledge is specific to the spreadsheet itself and must thus be handled specially. This makes the treatment of background knowledge in the background layer into an iceberg that endangers safe passage of spreadsheets. The background layer is touched in [Din09] with a documentation-through-annotation approach, that formulates a collection of design guidelines for supporting rich annotation in spreadsheets. Issues concerning the interpretation of spreadsheets are presented in [BM08]. To the best of our knowledge all of the available solutions for missing background knowledge attack the problem on the social or organizational level. But no system has yet been developed to provide user assistance for spreadsheets on the background layer. 3 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer In the next section we explore the hidden iceberg of background knowledge in spreadsheets using a “Wizard-of-Oz” experiment — a research method in which subjects interact with a computer system that they believe to be autonomous, but which is actually being operated or partially operated by an unseen human being (see [DJA93]). In particular, we wanted to deter- mine distinct knowledge kinds necessary for ‘understanding the numbers’ in spreadsheets. The results suggested to start building our user assistance system for spreadsheets SACHS with the goal to integrate previously only implicit domain knowledge. We give a short overview of the SACHS architecture and system in Section 3. In particular, it illustrates DCS with a semi-formal domain ontology and makes it available with a semantically transparent interface. We show that the semantic enhancement of spreadsheets with SACHS not only enables access to more knowledge, it also allows new forms of spreadsheet interaction. Furthermore, the “Wizard-of- Oz” experiment yielded the need for different knowledge kinds. For example, spreadsheet users often are concerned with questions like “Is it good or bad for my business if this cell has value 0.992?” concerning assessment knowledge. In Section 4 we present a model for the knowledge involved with assessing numbers and concepts within a spreadsheet based on theory graphs. We sketch how this can be incorporated into the SACHS system by providing paper prototypes. Section 5 concludes the article and discusses future research directions. 2 Background Knowledge in Spreadsheets HODNIGG and MITTERMEIR state that “comprehension of a workbook is non-trivial as there are several factors that aggravate its comprehension.” [HM08, p. 82]. But what are the necessary factors for comprehension? To develop the domain ontology for the background knowledge of the DFKI Controlling system DCS we organized interviews with a DFKI expert on the topic and recorded them as MP3 streams2. Even though these interviews were not originally intended as a “Wizard of Oz” experiment, in the following we will interpret them so because any “Wizard of Oz” experiment would have been isomorphic in setup and outcome3. Here, the interviewee plays the part of an ideal SACHS system and gives help to the interviewer who plays the part of the user. This experiment gives us valuable insights about what background knowledge consists of and how it is — or should be — organized. Conceptually, the experiment reveals that the DCS system only models the factual part of the situation it addresses, while important aspects for ‘understanding the numbers’ remain implicit. Concretely, it yields qualitatively distinct expla- nations for a user assistance system, which the expert thought was necessary to understand the specific controlling system spreadsheet. When studying the MP3 streams, we were surprised that in many cases a question of “What is the meaning of . . . ” for a specific knowledge item was answered by the expert with up to six of the following seven explanation types, the occurrence rate of each relative to the number of knowledge items is listed in the brackets (“What is the percentage of this specific explanation 2 We recorded three interview sessions amounting to approximately 1.5 hrs concerning 39 distinct knowledge items and containing 110 explanations. 3 The only difference is that the interviewers were aware that they were talking to a human. We contend that this is immaterial, since the aim of the experiment is to find out which kinds of knowledge were expected, not how a concrete system interface works. Festschrift Bernd Krieg-Brückner 4 / 20 ECEASST type being selected for describing a knowledge item?”): 1. Definition (Conceptual) [71.8%] A definition of a knowledge item like a functional block is a thorough description of its meaning. For example the functional block “cover ratio per project in a research area” was defined as the percentage rate to which the necessary costs are covered by the funding source and own resources. 2. Purpose (Conceptual) [46.2%] The purpose of a knowledge item in a spreadsheet is de- fined by the spreadsheet author’s intention, in particular, the purpose explains why the author put the information in. A principal investigator of a project or the respective de- partment head e.g. needs to get the information about its cover ratio in order to know whether either more costs have to be produced to exploit the full funding money or more equity capital has to be acquired. 3. Assessment of Purpose [30.8%] Given a purpose of a knowledge item in a spreadsheet, its reader must also be able to reason about the purpose, i.e., the reader must be enabled to draw the intended conclusions/actions or to assess the purpose. For understanding whether the cover ratio is as it is because not enough costs have yet been produced, the real costs have to be compared with the necessary costs. If they are still lower, then the costs should be augmented, whereas if they are already exploited, then new money to cover the real costs is needed. 4. Assessment of Value [51.3%] Concrete values given in a spreadsheet have to be inter- preted by the reader as well in order to make a judgement of the data itself, where this assessment of the value is a trigger for putting the assessment of purpose to work. For instance, the size of the cover ratio number itself tells the informed reader whether the project is successful from a financial standpoint. If the cover is close to 100%, “everything is fine” would be one natural assessment of its value. 5. Formula [23.1%] With a given formula for a value in a spreadsheet’s cell the reader knows exactly how the value was computed, so that she can verify her understanding of its inten- tion against the author’s. Note that a lot of errors in spreadsheets result from this distinc- tion. In our experiment, if a value of a cell was calculated with a formula explicitly given in the spreadsheet, then the expert explained the dependency of the items in the formula, but restricted from just reading the formula aloud. In particular, he pointed to the respec- tive cells and tried to convey the notion of the formula by visualizing their dependency, not so much what the dependency was about. 6. Provenance [43.6%] The provenance of data in a cell describes how the value of this data point was obtained, e.g. by direct measurement, by computation from other values via a spreadsheet formula, or by import from another source; see [MGM+08] for a general discussion of provenance. In our interviews — as many of the data of the concrete spread- sheet were simply an output of the underlying controlling data base — the provenance explanations mostly referred to the specific data base where the data comes from. But when the formula for a value was computed, but not within Excel, the expert tried to give the formula as provenance information, e.g. in the case of the cover ratio. This knowledge was often very difficult to retrieve afterwards for the creation of the semantic document. 7. History [15.4%] The history, i.e., the creation process of a spreadsheet over time, often is important to understand its layout that might be inconsistent with its intention. For instance, if an organizational change occurs that alleviates the controlling process and 5 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer makes certain information fragments superfluous, then those fragments will still be shown in the transition phase and beyond, even though their entropy is now 100% in the most of cases. These seven explanation types were distilled from the recorded set of 110 explanations. The percentages given can function as a relevance ranking done by the expert with respect to the importance of explanation types for providing help. Figure 2: Explanation Types Figure 2 portrays the distribution of oc- currences according to each type (“How often occurred this specific explanation type out of all explanations?”). The “Wiz- ard of Oz” experiment interpretation sug- gests that Figure 2 showcases the user re- quirements for SACHS as a user assis- tance system (see also [NW06]). Excel is able to give help for 8% of the explanations we found in the help of a hu- man expert, whereas not surprisingly Def- inition explanations are the most frequent ones. A realized domain ontology together with an appropriate interface (as realized in the SACHS system described in the next section) bumps this up to 33%. Even though this is certainly an improvement, it leaves much more to be desired than we anticipated. 3 SACHS: A Semantic Help System for MS Excel Semantic technologies like the Semantic Web add novel functionalities to existing information resources by enhancing them with explicit representations of the underlying knowledge objects and their relations and exploiting them for computing new information. The information re- sources in spreadsheets are cells and grids, and it is not clear how they can be enhanced seman- tically because they as objects belong to the spreadsheet player, in our case Excel. If we tie user assistance to these objects, then we either get generic help like in an Excel manual, particularly not help for understanding a specific spreadsheet, or we have to extend the (possibly proprietary) player software. Therefore, we used a different approach for the use of semantic technologies which we call Semantic Illustration: Instead of enhancing resources into semiformal ontologies by annotating them with formal objects that allow reasoning as in the Semantic Web ‘paradigm’, here a semantic system illustrates a software artifact A (an application, program, or document) with a semiformal ontology by complementing it with enough information to render new se- mantic services. This approach contains a somewhat analogous requirement phrased in [Tag09]. Conceptually, a help system H is independent of A , even though an implementation may well integrate it into A . In any case H is related to A via an interpretation mapping, so that it can serve as a “semantic ally” for A . The Semantic Illustration approach opens the ‘use of seman- Festschrift Bernd Krieg-Brückner 6 / 20 ECEASST tic data’ for non-semantic software applications: Any system with formal data can be mashed up with semantic applications. In SACHS we made use of the fact that spreadsheets are active documents whose surface structure can adapt to the environment and user input. Here, a specific DCS spreadsheet is the artifact A for which SACHS provides a help system H . The SACHS system is an add-in for MS Excel 2003 written in Visual Basic for Applications (VBA). The system was actively developed 2008-2009 and has been kept stable since then, since we are planning to re-implement it on a newer technology basis (XML-based Office Suite and the new OMDoc technology stack). The system is in a state, where it can be used to explore the information architecture and user interactions afforded by it, but it is not ready for general dissemination. Nonetheless, the code is available from the authors upon request. In SACHS cells in the spreadsheet are linked by the “interpretation mapping” to elements of an accompanying OMDoc document that contains a representation of the background knowledge. In particular, all cells in a functional block are linked to the same OMDoc definition — the definition of its intended function. This assignment is internally represented by an extra work- sheet within the spreadsheet, that is maintained by the spreadsheet author. With respect to the concrete DCS spreadsheets SACHS was created for, these were semi-automatically generated. 3.1 Semi-Formal Ontologies as Theory Graphs In SACHS we utilize a semi-formal domain ontology for the respective spreadsheet as back- ground layer. This ontology is encoded in the OMDoc format, which allows to mix informal elements (e.g. descriptions in natural language) with formal structure elements (like interlinked axioms, definitions, theorems, and proofs, organized into theories). The main structure we uti- lize here is that of a theory graph (i.e., a graph of theories interlinked by theory morphisms; see [RK13] for details), which can be understood as an object-oriented organization of the background knowledge corresponding to the spreadsheet: Closely related information objects (definitions, descriptions, statements of properties, etc.) are grouped into theories related by the imports relation, which corresponds to a functional dependency relation. In particular, the nodes of a theory graph consist of theories and the edges are given by imports relations; Figure 3 demonstrates the richness of a theory graph even for a small example like the running one. For more details consider the situation on the right side of Figure 4, where we base a theory of profits in enterprises on theories of revenues and expenses. Formally the links (thin arrows) in this graph have in common that all assertions that are true in the link source are true in the link target (possibly after translation). Say SemAnteX Profit contains the assertion (*) If c∗ owns c, then the larger π(c,now), the better for c∗. which is proven from axioms in an envisioned theory Income that is imported by SemAnteX Profit. This allows us to introduce the other type of link in a theory graph: views. Formally, a view is a mapping of concepts from the source theory to the target theory, such that all axioms and definitions in the source theory are true in the target theory. Consider for instance a theory The More the Better with the single assumption that “the more of a commodity x I obtain the better it is for me”, then the mapping that maps x to π(c,now), where c is a company I own stock in, is a view by virtue of assertion (∗). Theory graphs with imports relations and views support very efficient reuse of information. For user assistance, they support multiple explanations. In our example, we could now also 7 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer Figure 3: The Complexity of a Theory Graph of Background Knowledge for Figure 1 The More the Better SemAnteX Profit SemAnteX Revenues SemAnteX Expenses Figure 4: A Simple Theory Graph explain the concept of a profit via the red view rather than the two imports relations, resulting in an explanation “the more profit you make the better you will be off ”. In essence, a theory graph can be viewed as an “and/or graph” for user assistance systems, in our example we can explain SemAnteX Profit by SemAnteX Revenues and SemAnteX Expenses or by The More the Better — depending on prior knowledge and preferences of the user. Note that views theoretically allocate help provisions for distinct context cultures as described in [Hug08]. We utilize the theory graph structure of the background ontology of a spreadsheet. 3.2 In-Place Help One of the major issues for user assistance systems consists in “providing help at an appropriate level” (e.g. [NW06]). In recent years, the demand for embedded user assistance, i.e., user assis- tance that is provided without having a user to push a help button and go searching for help, has Figure 5: The SACHS Panel grown and was even called the “future for software help” [Ell07]. But what does ‘embedded’ really signify? The Excel objects that carry meaning are the cells. They are interpreted by the user in both the grid layout like within a table with an assigned row and col- umn specification and the underlying formula. With SACHS we offer a Festschrift Bernd Krieg-Brückner 8 / 20 ECEASST third interpretation by aligning cells with concepts in the domain ontology. Hence, we realized embeddedness by using cell clicks as entry points for the help system, so that every click on a cell generates help. Concretely, the SACHS system addresses the Definition explanation type with the direct help text generator shown in the SACHS panel in Figure 5. The OMDoc domain ontology has three text slots of different granularity that can be used for help texts. Accordingly, the SACHS panel Figure 6: SACHS-Generated Help Texts for Cell [H9] in Figure 1 offers the choice of get- ting “labels” (a title), “com- ments” (a short descrip- tion), or “explanations” (a detailed description); see Figure 6 for an example of the various granularities. The generated help texts are enhanced by instantiat- ing the parameters with con- crete values from the cell context. For instance the time interval is instantiated to the year 1990 in Figure 6. The SACHS panel also of- fers to search the domain on- tology for concepts that contain user-given strings, to present them once selected and (if avail- able) to link to the according cell. In the functional block mode every cell click highlights all those cells that functionally belong together (as showcased in Figure 1 and 6). 3.3 Semantic Navigation Another option in the SACHS panel is the generation of a dependency graph for the concept connected to the selected cell. For instance, if this option is chosen for cell [B15] (expenses 1984), then the first two levels of the graph as seen in Figure 7 are generated. Concretely, all the different expense types like “Salary Costs” and “Utility Costs”, that build up the expenses of the company called “SemAnteX”, are listed as well as a node for the company itself. If the user wants to elaborate on a specific concept like “Salary Costs”, then a click on the corresponding node expands it by another level (as happened for Figure 7). This feature is comparable to hyperlinks in help texts, but adds semantic navigation cues. We mashed-up the graph-based interface with the interactions needed within a spreadsheet to allow the user to navigate the spreadsheet via the structured background ontology by the definitional structure of the intended functions. Here, the color-coding of the nodes indicates whether the concept is connected to a specific cell in the workbook. Darker grey means that it is available on the active spreadsheet, lighter grey hints that the assigned cell is on another spreadsheet but still within the active workbook, and light violet points out a semantic concept with no connection to spreadsheets. Note that the user has the choice of text granularity in each node (via right mouse click) or all nodes (via SACHS panel). 9 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer Figure 7: Dependency Graph for Cell [B15] 3.4 Framing While the domain knowledge provided by the semantic background is important information, first evaluations made it clear that the interpretation mapping (hence the information provided by the SACHS system to the user) is strongly dependent on the author’s point of view — how she frames the data. If we understand framing as the practice of viewing novel situations in terms of something already understood, then we can now model the framing practice by defining a framing to be the establishment (creating or choosing) of an imports relation or view from a source theory (the framing theory) into the theory representing the problem (the framed theory). Hence, a frame is understood as a scaffolding of concepts that influence the understanding of situations. In Figure 4 the theory The More the Better and theories Revenues and Expenses represent a frame for Profit. The view explanation makes use of framing to anchor the explanation of profit in the user’s previous experience. prognosispolynomial charpolynomial lagrangeinterpolation crystalball linear-lagrange quadratic-lagrange cubic-lagrange sax-salarycosts sax-prognosisSemAnteX sax-salarycosts-projected Figure 8: A Fragment of the SACHS Domain- Ontology Theory Graph Let us look at the graph in Figure 8 that represents a part of the domain ontology theory graph. The projected salary costs in cell [G9] feed on two concepts: salary costs and a projection “sax-prognosis”, that is a project-specific prognosis function. Both concepts are realized in distinct theories, which are represented in Figure 8. If we fol- low the graph, then we can see that the cell value in [G9] is computed with a quadratic LaGrange function, which is a LaGrange interpolation, which is used as prognosis function. Now, we don’t know which in- formation the user reading our spreadsheet wants to know: Is she interested which specific La- Grange function is used, is she interested more generally what a LaGrange interpolation is, or Festschrift Bernd Krieg-Brückner 10 / 20 ECEASST does she wonder what a projection is? All these are different frames. A first theory of framing based on theory-graphs and theory morphisms was developed in [KK09b] and implemented in SACHS. Note that the concept of frames does not depend on cells but really on functional blocks. Therefore, SACHS’ interface for framing was implemented in the functional block mode. Con- cretely, in Figure 9 we find the SACHS panel extended by framing features. Once a cell is selected, the assigned definition in the ontology with its home theory is shown as the framed theory. The natural framing theory determines the framing theory in the first step, and all the background information is subsequently shown with respect to this frame. The user is offered to change the frame via frame generalization or frame specialization. Figure 9: The SACHS Functional Block Panel Note that the home theory of cell [G9] — i.e., the theory that con- tains the definition sax-salarycosts- projected.def in the interpretation — is the theory sax-salarycosts-projected as seen in the SACHS panel in Fig- ure 9. It imports the theories sax- salarycosts and sax-prognosis. These theories can hence be used as frame generalizations. If we are more inter- ested in the latter theory, we select it and get a new choice of frame gen- eralizations SemAnteX and quadratic- lagrange. After choosing the latter, the only available frame generalization becomes lagrangeinterpolation. Finally, here we can select prognosis as a frame for the projected salary costs at SemAnteX Corp. Importantly, with each change of frame the semantic information given to the user changes. For instance, in Figure 6 we can see different explanations for the same selected cell with respect to distinct frames. Note that usually the user can only get the information regarding the author’s framing as the OMDoc document is fixed and consequently the imports relation for the home theory. Another author might have chosen to e.g. import the lagrangeinterpolation theory directly instead of importing the more specific sax-prognosis. Here, the SACHS panel broadens the user’s opportunities and takes back the rigor and subjectivity of the author’s choice of framing. The set of frame specializations wrt. a certain framing theory consists of all theories that im- port this framing theory. Frame specializations can supply the user with surprising insights. For example, the theory prognosis is imported by the theory crystallball, which offers the prognosis method of sitting in front of a crystal ball and — disregarding the data set — coming up with a mapping from times to values. With this, the reader may realize that there are always worse possible prognosis functions. 11 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer 3.5 Frame Variants Another interesting service a framing-aware SACHS can offer is the display of variants. That is, the concrete framing assumption reified in the MS Excel formula for a cell can be changed. The conventional way to deal with such variants in a spreadsheet is to just replace the formulae in the functional block with new ones and see what the result is; a destructive and error-prone process at best. Given enough background knowledge we can do better. In our example, we have three theories specializing lagrangeinterpolation with concrete Lagrange extrapolations of differ- ent order, from which we can derive spreadsheet formulae, which in turn can be entered into the spreadsheet. In the example in Figure 10, we are looking for variants for the ’∼Definition’ la- grangeinterpolation.def in the framing theory for the definition sax-salarycostsperti-projected.def assigned by the author to cell [E9]. Concretely, selecting the option “Variants” in the SACHS panel shown in Figure 9 leads to the opening of the “Variants Panel” demonstrated in Figure 10. Figure 10: Frame-Based Variants We see that there are three possible variants for the Lagrange extrapolation function: the linear, the quadratic, and the cubic Lagrange extrapolations. Remember that the quadratic one was originally used as the SemAnteX prognosis function (indicated by the arrow in front of this variant in Figure 10). In the example the user selected the variant linear-extrapolator.def. Once the check box is checked, the SACHS system generates new space in the spreadsheet (the light grey row 10 in Figure 10) enabling the presentation of the variant values for the entire functional block. The according variant formula (in the MS Excel formula box at the top of Figure 10) is evaluated.4 Note that framing influences which concrete variants are available: if we have framed [E9] as the result of a Lagrange extrapolation, we should be allowed to vary the order k of the Lagrange Polynomial (if we have enough data points). If we have however framed [E9] only as the result of a general prognosis function then we should also have crystal ball prognosis at our disposition as a variant. 4 This implementation is just a prototype: one limitation is that computed values on the spreadsheet might change by this action as Excel automatically adjusts any existing formula that use cells moved by the insertion. This is not a problem though, since the user’s focus is on the values of the selected and unchanged functional block and its variant (otherwise using the variants isn’t sensible). Festschrift Bernd Krieg-Brückner 12 / 20 ECEASST 3.6 Background Knowledge Coverage by SACHS? To evaluate the user assistance situation for spreadsheets with the SACHS system: the Formula explanation type (8%) is rudimentarily covered by Excel, and the Definition explanation is offered by SACHS (25%). It can be argued that the ontology-based SACHS architecture is well-suited to cope with Purpose explanations (17%) — indeed, some of the purpose-level explanations have erroneously found their way into definitions for the DCS, where they rather should have been classified as ‘axioms and theorems’ (which are currently not supported by the SACHS interface). The explanation category Provenance (16%) has been anticipated in the SACHS architecture (see [KK09a]) but remains unimplemented in the SACHS system. The Assessment of Purpose type (9%) is completely missing from SACHS as well as History (6%) and Assessment of Value (19%). 4 Assessment Now we take up the problem of Assessment of Value explanations. On the one hand, it is ranked second in the list of explanation types with a stunningly high percentage of 51.3%, which can be interpreted as the second-best type of explanations from the point of view of our expert. On the other hand, the very nice thing about assessment for computational data is that we can hope for a formalization of its assessment in the form of formulas, which can be evaluated by the spreadsheet player in turn. 4.1 Modelling Assessment A naive approach of complementing spreadsheets with assessment knowledge could be the in- clusion of Assessment of Value information into the definition text itself. This is — ontologically speaking — a very impure approach as such judgements do not solely depend on the concept it- self. For instance, they also depend on the respective Community of Practice: At one institution e.g. a cover ratio of 95% might be judged as necessary, at another 100% (or more) might be expected. Clearly we need a better theory of modeling assessment. So before we address the question of how to model assessment, first we have to take a closer look at assessment itself: What is it about? Assessments consist of value judgements passed on situations modeled by (parts of) spreadsheets. As such, we claim that assessments are deeply in the semantic realm. To strengthen our intuition, let us consider some examples; we will use a slightly varied version of the simple spreadsheet document in Figure 1, which we have already used in [KK09a, KK09b] for this. The following can be considered typical assessment statements: I) “Row 6 looks good.” II) “The revenues look good.” III) “I like this [points to cell [E17]] but that [points to cell [F17]] is a disaster.” IV) “I like the profit in 1987 but of course not that in 1988.” V) “Upper Management will be happy about the leftover funds in [nn] that they can now use elsewhere, but the PI of the project will be angry that he got less work out of the project than expected. Not to mention the funding agency; they cannot be told of this at all, because 13 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer it violates their subsistence policy.” On the surface, the first statement refers to a row in the spreadsheet, but if we look closer, then we see that this cannot really be the case since if we shift the whole spreadsheet by one row, then we have to readjust the assessment. So it has to be about the intended meaning of row 6, i.e., the development of revenues over the years. Indeed we can paraphrase I with II — another clue that the assessments are really about situations modeled by a functional block in the spreadsheet. But assessments are not restricted to functional blocks as statements III and IV only refer to individual cells. Note again that the statements are not about the numbers 0.992 and -0.449 (numbers in themselves are not good or bad, they just are). Here the assessment seems to be intensional, i.e., about the intension “the profit in 1987/8” rather than the extension. Another way to view this is that the latter two assessments are about the argument/value pairs 〈1987,0.992〉 and 〈1988,−0.449〉. We will make this view the basis of our treatment of assessment in SACHS: We extend the background ontology by a set of assessment theories that judge the intended functions in the functional blocks of the spreadsheet on their functional properties. 4.2 Assessment via Theories and Morphisms Assessment D,E,av,ad fi : D→E av : E D×D→B ad : E D×℘(D)→B AssessValue pos good ∀x∈D.av( fi,x)⇔ fi(x)>0 AssessDom grow good ∀D′⊆D.ad( fi,D′)⇔asc( fi,D′) Revenue ρ ρ : T→R Profit π π : T→R π(x)=ρ(x)−γ(x) ARevenue AProfit . . . . . . mv : σ, fi 7→ π md : σ, fi 7→ ρ where B is the set of Boolean values, R is the set of real numbers, and T the set of time intervals (over which profits are measured). Furthermore, σ :={D 7→T,E 7→R} Figure 11: A Partial Assessment Graph for Profits Consider the partial theory graph in Figure 11, which we will use to account for the assessments in the examples I to IV above. The figure shows the theories Rev- enue and Profit which are part of the background knowledge, the assessed theories ARevenue and AProfit, and the assessment theo- ries (set in the gray part) that will cover assessment itself. The theory Assessment pro- vides three concepts: a generic function fi (used as a placeholder for the intended function of the functional block we are assess- ing), a function av for assess- ing whether a value in a cell is ‘good’, and finally a function ad for assessing whether a function is ‘good’ over a subdomain. This generic theory — note that this does not provide any judgements yet, only the functions to judge — is then refined into concrete as- sessment theories by adding ax- ioms that elaborate the judgement functions av and ad , which are then used to provide concrete Festschrift Bernd Krieg-Brückner 14 / 20 ECEASST judgement functions to the assessed theories, via interpreting theory morphisms. The theory As- sessValue pos good restricts the interpretation of av so that it assesses the function fi as ‘good’ on an argument x, iff fi(x) is positive, and the theory AssessDom grow good restricts the in- terpretation of ad to a function asc to evaluate fi as ‘good’ on a subdomain D′ ⊆ D, iff fi is increasing on D′. Note that these assessments are still on the ‘generic function’ fi over a ‘generic domain’ D with a ‘generic range’ in E. These are made concrete by the theory morphisms mv and md that map these concrete sets and functions into the assessed theories, thereby applying the judgmental axioms in the assessment theories in the assessed theories. Of course theories AssessValue pos good and AssessDom grow good are just chosen to model the examples from the start of this section. A realistic formalization of assessment, would provide a large tool-chest of theories describing the “shape” of the function fi for knowledge engineers to choose from. With this, providing a judgement about a value becomes as simple as choosing a cell and an assessment theory: the cell determines the intended function, with its domain and range and thus the mapping of the theory morphism. Thus the assessed theory can be constructed automatically by the SACHS system. In our example we have restricted ourselves to unary functions, but of course it is very simple to provide assessment theories for any arity that occurs in practice. Moreover, we have only used assessment theories that only refer to inherent properties of the intended functions (e.g. being monotonically increasing), but many real-world assessments are context-dependent. E.g. one might want the profit of a German Company to grow more rapidly than the DAX. This is where the knowledge-based approach we are proposing really starts to shine: we just add an assessment theory with an axiom ∀t.av( fi,t)⇔ fi(t) fi(p(t)) > DAX(t) DAX(p(t)) where p(t) is the predecessor time interval of t. 4.3 The Envisioned Assessment Extension in SACHS We will now show how assessments modeled in the theory graph can be made useful for the user. As the assessments are bound to (the intended function of) a functional block, we extend the context menu with entries for all assessment functions. On the right we assume a right mouse click on the cell [B17] to show the context menu with the two assessment functions av and ad . When the “Assess Values of fBlock” entry is selected, SACHS is put into a special “assess- ment mode”, which brings assessment information to the user’s attention. In the background the SACHS system determines the version of the av axiom inherited by the AProfit, translates it into an Excel formula, and evaluates it to obtain the judgements. Here the axiom is ∀t.av(π,t)⇔π(t) > 0, and it is evaluated on all cells in the functional block, resulting in the values t,t,t,t, f , which SACHS color-codes as shown in Figure 13 to warn the user of any cells that get a negative judgement. At the same time, the assessment mode extends the explanatory labels by explanations texts from the background ontology. Selecting the menu element “Assess Domain of fBlock” gives the result in Figure 12. 15 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer Figure 12: Assess the Domain Figure 13: Assess the Values But as the assessments are synchronized with the assessed theories in the background theory graph, we can also analyze the assessments for possible causes. Recall that profits are defined as the difference between revenues and expenses, it makes sense to trace assessments through the dependency graph provided by the SACHS system for understanding the definitional structure of the spreadsheet concepts. Figure 14: Assess all Values Assess AAssess UAssess AAssessProfit UAssessProfit Profit ϕ ϕ ϕ ϕ pA pU ϕ := σ, f 7→ π and σ as in Figure 11 Figure 15: Multi-Context Assessment Note that this analysis is anchored to the cell: Figure 14 shows the definitional graph for the negatively assessed cell [F17] for the profits in the year 1988. Here the revenues are also negatively assessed (color-coded red in the definitional graph), so the problem might be with the revenues. Note as well that this graph cannot be used for a causal analysis, as the arrows here are still definitional dependency relations. We conjecture that causal analysis knowledge can transparently be included in the background ontology and can be made effective for the user in a similar interface. But we leave this for further research. Festschrift Bernd Krieg-Brückner 16 / 20 ECEASST 4.4 Multi-Context Assessments and Framing Note that the assessments above are “author assessments” since they are supposedly entered into the background ontology by the spreadsheet author. But the author’s assessment is not the only relevant one for the user to know: In the exemplary assessment statement V in Section 4.1 we have a single explanation that refers to three different assessments that differ along the role of the “assessor”. Multiple assessment contexts can be accommodated in our proposed model — any user of the system can enter assessments. These user assessments can even be stored in a private extension to the background ontology if the user does not have write access to the system-provided one. In fact we can enable multi-context assessment by just providing the av and ad functions with another argument that determines a fitting user or Community of Practice (see [KK06] for an introduction to Communities of Practice and their reification in the background knowledge). This will generally get us into the situation in Figure 15, where we have an assessment of profits by the author — in theory AAssessProfit — and one by the user — UAssessProfit (we have abstracted from the internal structure of the theories). The dashed arrow is the (functional) interpretation that maps the functional block to the author-assessed theory. In the framing-based user interface described above we use imports relations and views as framings and provide frame-based exploration of variants. In this example the canonical frame (the identity morphism from AAssessProfit to itself) can be generalized to the frame pA with source theory Assess, which spans a frame variant space that includes the frame pU and thus the user assessment, which the user can choose to explore this assessment. Needless to say, this works for any number of assessments (private or public). 5 Conclusion We have analyzed the reasons for users’ difficulties in understanding and appropriating complex spreadsheets, as they are found e.g. in financial controlling systems. A Wizard-Of-Oz experi- ment shows that one of the causes for this is that underlying semantic documents are biased to computational aspects and fail to model the provenance, interpretation, and ontological relations of the objects and concepts operationalized by the system. To remedy the situation we propose to explicitly model the intention of a spreadsheet as an intention model in an OMDoc-encoded theory graph that serves as an explicit knowledge base for a spreadsheet help system. We have developed the SACHS system that draws on such intention models to offer various semantic services that aid the user in understanding and interacting with the spreadsheets. An unanticipated benefit of our design decision to use theory graphs (a data structure inherited from formal methods in Software Engineering) was that we could model the practice of framing a mathematical object as establishing an imports relation or view into a theory describing it. The model is able to account for the salient aspects of framing: We have shown that taking framings into account in the user interface allows users to find their subjective perspective in the semantic help system. The necessary framing possibilities were naturally present in the background theory graph for our example. We attribute this to the fact that the theory graph was developed as a comprehensive overview over the background knowledge and not just tailored to the single spreadsheet application at hand. 17 / 20 Volume 62 (2013) Spreadsheets with a Semantic Layer But the Wizard-Of-Oz experiment also revealed that significant categories of explanations are still systematically missing from this setup, severely limiting the usefulness of the system. We have tried to extend the background ontology with a model of assessment to see whether the Semantic Illustration paradigm is sufficiently flexible to handle assessment. The proposed model shows that this is indeed the case, but still has various limitations. For instance, the need to pollute the background ontology with one new theory per assessment theory, where each assessed theory is largely empty seems somewhat unnatural and possibly intractable. Also, we lack a convincing mechanism for coordinating the exploration of assessment variants: In our example in Figure 1, if we change the assessment of a profit value, we would like to change that of the respective revenue cell to a corresponding assessment. Finally, we have only talked about Assessment of Value explanations. It seems that we can model Purpose and Assessment of Purpose explanations with a similar construction as the one proposed in Section 4.1: We start out with a base assessment theory which provides an assess- ment function like av, which in turn acts on a generic intended function fi of the functional block in question. But this — instead of mapping into Boolean values — maps into a set of purposes and tasks formalized in a ‘task ontology’ by which we would extend the background ontology. This might also make it possible to generate explanations for assessments in SACHS. Acknowledgements: This paper contains our view on the foundations of the SACHS (funded by DFKI) and the ensuing SiSsI project (funded by the German Research Council under grant KO-2484-12-1). Both project aim to extend semantic document modeling to spreadsheets. Bernd Krieg-Brückner has been proposing for years to do this. He always maintained that spreadsheets would be ideal targets for semantic preloading. Indeed our research shows that spreadsheets are semantically much more interesting than anticipated. We also thank the other SACHS and SiSsI project members as well as Klaus Hofmann for valuable discussions. The quality of the article was greatly enhanced by the helpful feedback of our reviewers, which we really appreciate. Bibliography [AE06] R. Abraham, M. Erwig. Inferring templates from spreadsheets. In ICSE ’06: Pro- ceedings of the 28th international conference on Software engineering. Pp. 182– 191. ACM, New York, NY, USA, 2006. doi:http://doi.acm.org/10.1145/1134285.1134312 [AN08] O. D. Andrade, D. G. Novick. Expressing Help at Appropriate Levels. In SIG- DOC’08 Conference Proceedings. Pp. 125–130. ACM, 2008. [BM08] D. A. Banks, A. Monday. Interpretation as a factor in understanding flawed spread- sheets. CoRR abs/0801.1856, 2008. [BP08] R. Brath, M. Peters. Spreadsheet Validation and Analysis through Content Visual- ization. CoRR abs/0803.0166, 2008. Festschrift Bernd Krieg-Brückner 18 / 20 http://dx.doi.org/http://doi.acm.org/10.1145/1134285.1134312 ECEASST [CDSW09] J. Carette, L. Dixon, C. Sacerdoti Coen, S. M. Watt (eds.). MKM/Calculemus Pro- ceedings. LNAI 5625. Springer Verlag, July 2009. [Din09] M. Dinmore. Documenting Problem-Solving Knowledge: Proposed Annota- tion Design Guidelines and their Application to Spreadsheet Tools. CoRR abs/0908.1192, 2009. [DJA93] N. Dahlbäck, A. Jönsson, L. Ahrenberg. Wizard of Oz studies: why and how. In Proceedings of the 1st international conference on Intelligent user interfaces. IUI ’93, pp. 193–200. ACM, New York, NY, USA, 1993. doi:http://doi.acm.org/10.1145/169891.169968 http://doi.acm.org/10.1145/169891.169968 [Ell07] M. Ellison. Embedded user assistance: The future for software help? interactions 14(1):30–31, 2007. doi:http://doi.acm.org/10.1145/1189976.1189997 [HM08] K. Hodnigg, R. T. Mittermeir. Metrics-Based Spreadsheet Visualization: Support for Focused Maintenance. CoRR abs/0809.3009, 2008. [Hug08] M. Hughes. User Assistance: Writing for a High-Context Culture. Online publication (http://www.uxmatters.com/mt/archives/2008/05/ user-assistance-writing-for-a-high-context-culture.php), May 2008. Accessed on 2009-05-15. [KK06] A. Kohlhase, M. Kohlhase. Communities of Practice in MKM: An Extensional Model. In Borwein and Farmer (eds.), Mathematical Knowledge Management (MKM). LNAI 4108, pp. 179–193. Springer Verlag, 2006. http://kwarc.info/kohlhase/papers/mkm06cp.pdf [KK09a] A. Kohlhase, M. Kohlhase. Compensating the Computational Bias of Spreadsheets with MKM Techniques. Pp. 357–372 in [CDSW09]. http://kwarc.info/kohlhase/papers/mkm09-sachs.pdf [KK09b] A. Kohlhase, M. Kohlhase. Spreadsheet Interaction with Frames: Exploring a Mathematical Practice. Pp. 341–356 in [CDSW09]. http://kwarc.info/kohlhase/papers/mkm09-framing.pdf [MGM+08] L. Moreau, P. Groth, S. Miles, J. Vazquez, J. Ibbotson, S. Jiang, S. Munroe, O. Rana, A. Schreiber, V. Tan, L. Varga. The Provenance of Electronic Data. Com- munications of the ACM 51(4):52–58, 2008. doi:http://doi.acm.org/10.1145/1330311.1330323 [Mur08] S. Murphy. Spreadsheet Hell. CoRR abs/0801.3118, 2008. [NW06] D. G. Novick, K. Ward. What Users Say They Want in Documentation. In SIG- DOC’06 Conference Proceedings. Pp. 84–91. ACM, 2006. 19 / 20 Volume 62 (2013) http://dx.doi.org/http://doi.acm.org/10.1145/169891.169968 http://doi.acm.org/10.1145/169891.169968 http://dx.doi.org/http://doi.acm.org/10.1145/1189976.1189997 http://www.uxmatters.com/mt/archives/2008/05/user-assistance-writing-for-a-high-context-culture.php http://www.uxmatters.com/mt/archives/2008/05/user-assistance-writing-for-a-high-context-culture.php http://kwarc.info/kohlhase/papers/mkm06cp.pdf http://kwarc.info/kohlhase/papers/mkm09-sachs.pdf http://kwarc.info/kohlhase/papers/mkm09-framing.pdf http://dx.doi.org/http://doi.acm.org/10.1145/1330311.1330323 Spreadsheets with a Semantic Layer [Pan00] R. R. Panko. Spreadsheet Errors: What We Know. What We Think We Can Do. In Symp. of the European Spreadsheet Risks Interest Group (EuSpRIG 2000). 2000. [RK13] F. Rabe, M. Kohlhase. A Scalable Module System. 2013. Manuscript, submitted to Information & Computation. http://kwarc.info/frabe/Research/mmt.pdf [Tag09] T. Tague. The Big Picture – How Semantic Technologies Introduce a New Paradigm for Interaction. Invited talk at the Semantic Technology Conference, 2009. http://www.semantic-conference.com/session/2120/ [Win06] T. Winograd. The Spreadsheet. In Winograd et al. (eds.), Bringing Design to Soft- ware. Pp. 228–231. Addison-Wesley, 2006. Festschrift Bernd Krieg-Brückner 20 / 20 http://kwarc.info/frabe/Research/mmt.pdf http://www.semantic-conference.com/session/2120/ Introduction Background Knowledge in Spreadsheets SACHS: A Semantic Help System for MS Excel Semi-Formal Ontologies as Theory Graphs In-Place Help Semantic Navigation Framing Frame Variants Background Knowledge Coverage by SACHS? Assessment Modelling Assessment Assessment via Theories and Morphisms The Envisioned Assessment Extension in SACHS Multi-Context Assessments and Framing Conclusion