Safe Integration of Annotated Components in Open Source ProjectsThis work was supported by project CROSS, funded by FCT (PTDC/EIA-CCO/108995/2008) Electronic Communications of the EASST Volume 33 (2010) Proceedings of the Fourth International Workshop on Foundations and Techniques for Open Source Software Certification (OpenCert 2010) Safe Integration of Annotated Components in Open Source Projects Sérgio Areias, Daniela da Cruz, Pedro Rangel Henriques, Jorge Sousa Pinto 17 pages Guest Editors: Luis S. Barbosa, Antonio Cerone, Siraj A. Shaikh Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Safe Integration of Annotated Components in Open Source Projects∗ Sérgio Areias1, Daniela da Cruz1, Pedro Rangel Henriques1, Jorge Sousa Pinto1 Departamento de Informática, Universidade do Minho, Braga, Portugal1 Abstract: The decision of using existing software components versus building from scratch custom software is one of the most complex and important choices of the entire development/integration process. However, the reuse of software compo- nents raises a spectrum of issues, from requirements negotiation to product selection and integration. The correct tradeoff is reached after having analyzed advantages and issues correlated to the reuse. Despite the reuse failures in real cases, many efforts have been made to make this idea successful. In this context of software reuse in open source projects, we address the problem of reusing annotated components proposing a rigorous approach to assure the quality of the application under construction. We introduce the concept of caller-based slicing as a way of certifying that the integration of a component annotated with a contract into a system will preserve the correct behavior of the former, avoiding malfunctioning after integration. To complement the efforts done and the benefits of slicing techniques, there is also a need to find an efficient way to visualize the main program with the annotated components and the slices. To take full profit of visualization, it is crucial to combine the visualization of the control/data flow with the textual representation of source code. To attain this objective, we extend the notions of System Dependence Graph and Slicing Criterion to cope with annotations. Keywords: Caller-based slicing, Annotated System Dependency Graph 1 Introduction Reuse is a very simple an natural concept, however in practice is not so easy. According to the literature, selection of reusable components has proven to be a difficult task [MS93]. Sometimes this is due to the lack of maturity on supporting tools that should easily find a component on a repository or library [SV03]. Also, non experienced developers tend to reveal difficulties when describing the desired component in technical terms. Most of the times, this happens because they are not sure of what they want to find [SV03, SS07]. Another barrier is concerned with reasoning about component similarities in order to select the one that best fits in the problem solution; usually this is an hard mental process [MS93]. Integration of reusable components has also proven to be a difficult task, since the process of understanding and adapting components is hard, even for experienced developers [MS93]. ∗ This work was supported by project CROSS, funded by FCT (PTDC/EIA-CCO/108995/2008) 1 / 17 Volume 33 (2010) Safe Integration of Annotated Components in Open Source Projects1 Another challenge to component reuse is to certify that the integration of such component in a open-source software system (OSS) keeps it correct. This is, to verify that the way the component is invoked will not lead to an incorrect behavior. A strong demand for formal methods that help programmers to develop correct programs has been present in software engineering for some time now. The Design by Contract (DbC) approach to software development [Mey92] facilitates modular verification and certified code reuse. The contract for a software component (a sub-program, or commonly, a procedure) can be regarded as a form of enriched software documentation composed of annotations (pre-conditions, post-conditions and invariants) that fully specifies the behavior of that component. So, a well- defined annotation can give us most of the information needed to integrate a reusable component in a OSS, as it contains crucial information about some constraints to obtain the correct behavior from the component. In this context, we say that the annotations (the pre-, post-conditions and invariants that form the contract) can be used to verify that each component invocation is valid (preserves the con- tract); in that way, we can guarantee that a correct system will still be correct after the integration of that component. This is the motivation for our research: to find a way to help on the safety reuse of components. This article introduces the concept of caller-based slicing, an algorithm that takes into ac- count the calls to an annotated component in order to certify that it is being correctly used. To support the idea, we also introduce GamaPolarSlicer, a tool that implements such algorithm: to identify when an invocation is violating the component annotation; and to display a diagnostic or guidelines to correct it. The remainder of paper is composed of 8 sections. Section 2 is devoted to basic concepts cru- cial to the understanding of the rest of the paper: the notions of slicing and system dependency graph are introduced. Section 3 formalizes the definition of caller-based slicing that supports our approach to safety reused of annotated components. Section 4 defines the concept of annotated System Dependency Graph (SDGa), used for the visual analysis of the slices and pre-conditions preservation. Section 5 illustrates the main idea through a concrete example. Section 6 gives a general overview of GamaPolarSlicer, introducing its architecture, functionalities and implemen- tation details. Section 7 discusses related work on slicing programs with annotated components as it is the main idea behind our proposal. Section 8 discusses related work on visualization of (sliced) programs, because we strongly believe that good visual tool is crucial for software analysis. Then the paper is closed in Section 9. 2 Basic Concepts In this section we introduce both the original concepts of slicing and system dependency graph. 2.1 Slicing Since Weiser first proposed the notion of slicing in 1979 in his PhD thesis [Wei79], hundreds of papers have been proposed in this area. Tens of variants have been studied, as well was al- gorithms to compute them. Different notions of slicing have different properties and different Proc. OpenCert 2010 2 / 17 ECEASST applications. These notions vary from Weiser’s syntax-preserving static slicing to amorphous slicing which is not syntax-preserving; algorithms can be based on dataflow equations, informa- tion flow relations or dependence graphs. Slicing was first developed to facilitate program debugging [M.93, ADS93, WL86], but it is then found helpful in many aspects of the software development life cycle, including software testing [Bin98, HD95], software metrics [OT93, Lak93], software maintenance [CLM96, GL91], program comprehension [LFM96, HHF+01], component re-use [BE93, CLM95], program inte- gration [BHR95, HPR89] and so on. Program slicing, in its original version, is a decomposition technique that extracts from a pro- gram the statements relevant to a particular computation. A program slice consists of the parts of a program that potentially affect the values computed at some point of interest referred to as a slicing criterion. Definition 1 (Slicing Criterion) A static slicing criterion of a program P consists of a pair C = (p,Vs), where p is a statement in P and Vs is a subset of the variables in P. A slicing criterion C = (p,Vs) determines a projection function which selects from any state trajectory only the ordered pairs starting with p and restricts the variable-to-value mapping func- tion σ to only the variables in Vs. Definition 2 (State Trajectory) Let C = (p,Vs) be a static slicing criterion of a program P and T =< (p1,σ1), (p2,σ2), ...,(pk,σk) > a state trajectory of P on input I. ∀i,1 ≤ i ≤ k: Pro j′C(pi,σi) = { λ i f pi 6= p < (pi,σi|Vs) > i f pi = p where σi|Vs is σi restricted to the domain Vs, and λ is the empty string. The extension of Pro j′ to the entire trajectory is defined as the concatenation of the result of the application of the function to the single pairs of the trajectory: Pro jC(T ) = Pro j ′ C(p1,σ1)...Pro j ′ C(pk,σk) A program slice is therefore defined behaviorally as any subset of a program which preserves a specified projections in its behavior. Definition 3 (Static Slicing) A static slice of a program P on a static slicing criterion C = (p,Vs) is any syntactically correct and executable program P′ that is obtained from P by deleting zero or more statements, and whenever P halts, on input I, with state trajectory T , then P′ also halts, with the same input I, with the trajectory T ′, and Pro jC(T ) = Pro jC(T ′). Related work of slicing programs taking into account the annotations of a program will be referred in Section 7. 3 / 17 Volume 33 (2010) Safe Integration of Annotated Components in Open Source Projects2 2.2 System Dependency Graph The use of dependency graphs to visualize the data and control flow of a program has been widely accepted in the last years (Section 8). Before exploring the use Dependency Graphs for visualization and comprehension, we present below the definitions of Procedure Dependency Graph and System Dependency Graph. Definition 4 (Procedure Dependence Graph) Given a procedure P, a Procedure Dependence Graph, PDG, is a graph whose vertices are the individual statements and predicates (used in the control statements) that constitute the body of P, and the edges represent control and data dependencies among the vertices. In the construction of the PDG, a special node, considered as a predicate, is added to the vertex set: it is called the entry node and is decorated with the procedure name. A control dependence edge goes from a predicate node to a statement node if that predicate condition the execution of the statement. A data dependence edge goes from an assignment statement node to another node if the variable assigned at the source node is used (is referred to) in the target node. Additionally to the natural vertices defined above, some extra assignment nodes are included in the PDG linked by control edges to the entry node: we include an assignment node for each formal input parameter, another one for each formal output parameter, and another one for each returned value — these nodes are connect to all the other by data edges as stated above. More- over, we proceed in a similar way for each call node; in that case we add assignment nodes, linked by control edges to the call node, for each actual input/output parameter (representing the value passing process associated with a procedure call) and also a node to receiving the returned values. Definition 5 (System Dependence Graph) A System Dependence Graph, SDG, is a collection of Procedure Dependence Graphs, PDGs, (one for the main program, and one for each compo- nent procedure) connected together by two kind of edges: control-flow edges that represent the dependence between the caller and the callee (an edge goes from the call statement into the entry node of the called procedure); and data-flow edges that represent parameter passing and return values, connecting actualin,out parameter assignment nodes with formalin,out parameter assign- ment nodes. 3 Caller-based slicing In this section, we introduce our slicing algorithm. We start by extending the notion of static slicing and slicing criterion to cope with the contract of a program. Definition 6 (Annotated Slicing Criterion) An annotated slicing criterion of a program P con- sists of a triple Ca = (a,Si,Vs), where a is an annotation of Pa (the annotated callee), Si corre- spond to the statement of P calling Pa and Vs is a subset of the variables in P (the caller), that are the actual parameters used in the call and constrained by α or δ . Proc. OpenCert 2010 4 / 17 ECEASST Definition 7 (Caller-based slicing) A caller-based slice of a program P on an annotated slicing criterion Ca = (α,call f ,Vs) is any subprogram P′ that is obtained from P by deleting zero or more statements in a two-pass algorithm: 1. a first step to execute a backward slicing with the traditional slicing criterion C = (call f ,Vs) retrieved from Ca — call f corresponds to the call statement under consideration, and Vs corresponds to the set of variables present in the invocation call f and intervening in the precondition formula (α ) of f 2. a second step to check if the statements preceding the call f statement will lead to the satisfaction of the callee precondition. For the second step in the two-pass algorithm, in order to check which statements are re- specting or violating the precondition we are using abstract interpretation, in particular symbolic execution. According to the original idea of James King in [Kin76], symbolic execution can be described as “instead of supplying the normal inputs to a program (e.g. numbers) one supplies symbols representing arbitrary values. The execution proceeds as in a normal execution except that values may he symbolic formulas over the input symbols.” Using symbolic execution we will be able to propagate the precondition of the function being called through the statements preceding the call statement. In particular, to integrate symbolic execution with our system, we are thinking to use JavaPathFinder [APV07]. JavaPathFinder is a tool than can perform program execution with symbolic values. Moreover, JavaPathFinder can mix concrete and symbolic execution, or switch between them. JavaPathFinder has been used for finding counterexamples to safety properties and for test input generation. To sum up, the main goal of our caller-based slicing algorithm is to facilitate the use of an- notated components by discovering statements that are critical for the satisfaction of the pre- condition, i.e., that do not verify the precondition or whose statements values can lead to its non-satisfaction (a kind of tracing call analysis of annotated procedures). 4 Annotated System Dependency Graph (SDGa) In this section we present the definition of Annotated System Dependency Graph, SDGa for short, that is the internal representation that supports our slicing-based code analysis approach. Definition 8 (Annotated System Dependence Graph) An Annotated System Dependency Graph, SDGa, is a SDG in which some nodes of its constituent PDGs are annotated nodes. Definition 9 (Annotated Node) Given a PDG for an annotated procedure Pa, an Annotated Node is a pair < Si,a > where Si is a statement or predicate (control statement or entry node) in Pa, and a is its annotation: a pre-condition α , a post-condition ω , or an invariant δ . The differences between a traditional SDG and an SDGa are: • Each procedure dependency graph (PDG) is decorated with a precondition as well as with a postcondition in the entry node; 5 / 17 Volume 33 (2010) Safe Integration of Annotated Components in Open Source Projects3 • The while nodes are also decorated with the loop invariant (or true, in case of invariant absence); • The call nodes include the pre- and postcondition of the procedure to be called (or true, in case of absence); these annotations are retrieved from the respective PDG and instantiated as explained below. We can take advantage from the call linkage dictionary present in the SDGa (inherited from the underlying SDG) to associate the variables used in the calling statement (the actual parameters) with the formal parameters involved in the annotations. Given a program and an annotated slicing criterion, we identify the node of the respective SDGa that corresponds to the criterion (yellow node in Figure 1). After building the respective caller-based slice, the critic statements will be highlighted in the graph, making easier to identify the statements violating the precondition (red nodes in Figure 1). 5 An illustrative example To illustrate the previous definitions and our proposal, consider the program listed below (Exam- ple 1) that computes the maximum difference among student ages in a class. Example 1 DiffAge p u b l i c i n t D i f f A g e ( ) { i n t min = S y s t e m . I n t 3 2 . MaxValue , max = S y s t e m . I n t 3 2 . MinValue , d i f f ; S y s t e m . o u t . p r i n t ( ” Number o f e l e m e n t s : ” ) ; i n t num = S y s t e m . i n . r e a d ( ) ; i n t [ ] a = new i n t [ num ] ; f o r ( i n t i = 0 ; i y)? \result == x : \result == y @∗/ 1: public int Min(int x, int y) { 2: int res; 3: res = x−y; 4: return ((res > 0)? y : x); 5: } Example 3 Max /∗@ requires x ≥ 0 && y ≥ 0 @ ensures (x > y)? \result == y : \result == x @∗/ 1: public int Max(int x, int y) { 2: int res; 3: res = x−y; 4: return ((res > 0)? x : y); 5: } Example 4 Backward Slice for a[i] i n t [ ] a = new i n t [ num ] ; f o r ( i n t i = 0 ; i 3.3.CO;2-0 http://dx.doi.org/http://dx.doi.org/10.1002/(SICI)1096-908X(199605)8:3<145::AID-SMR127>3.3.CO;2-0 http://dx.doi.org/http://doi.acm.org/10.1145/372202.372784 citeseer.ist.psu.edu/gallagher91using.html Safe Integration of Annotated Components in Open Source Projects12 [GO97] K. Gallagher, L. O’Brien. Reducing Visualization Complexity using Decomposition Slices. In Proc. Software Visualisation Work. Pp. 113–118. Department of Computer Science, Flinders University, Adelaide, Australia, 11–12 Dezembro 1997. [HD95] M. Harman, S. Danicic. Using Program Slicing to Simplify Testing. Software Test- ing, Verification & Reliability 5(3):143–162, 1995. citeseer.ist.psu.edu/100763.html [HHF+01] M. Harman, R. Hierons, C. Fox, S. Danicic, J. Howroyd. Pre/Post Conditioned Slic- ing. icsm 00:138, 2001. doi:http://doi.ieeecomputersociety.org/10.1109/ICSM.2001.972724 [HPR89] S. Horwitz, J. Prins, T. Reps. Integrating noninterfering versions of programs. ACM Trans. Program. Lang. Syst. 11(3):345–387, 1989. doi:http://doi.acm.org/10.1145/65979.65980 [Kin76] J. C. King. Symbolic execution and program testing. Commun. ACM 19(7):385–394, 1976. doi:http://doi.acm.org/10.1145/360248.360252 [Kri04] J. Krinke. Visualization of Program Dependence and Slices. In ICSM ’04: Proceed- ings of the 20th IEEE International Conference on Software Maintenance. Pp. 168– 177. IEEE Computer Society, Washington, DC, USA, 2004. [Lak93] A. Lakhotia. Rule-based approach to computing module cohesion. In ICSE ’93: Pro- ceedings of the 15th international conference on Software Engineering. Pp. 35–44. IEEE Computer Society Press, Los Alamitos, CA, USA, 1993. [LC04] G. T. Leavens, Y. Cheon. Design by Contract with JML. 2004. [LFM96] A. D. Lucia, A. R. Fasolino, M. Munro. Understanding Function Behaviors through Program Slicing. In Proceedings of the 4th Workshop on Program Comprehension. Pp. 9–18. 1996. citeseer.ist.psu.edu/delucia96understanding.html [M.93] K. M. Interprocedural dynamic slicing with applications to debugging and testing. PhD thesis, Linkoping University, Sweden, 1993. [Mey92] B. Meyer. Applying ”Design by Contract”. Computer 25(10):40–51, 1992. doi:http://dx.doi.org/10.1109/2.161279 [MS93] N. A. M. Maiden, A. G. Sutcliffe. People-oriented Software Reuse: the Very Thought. In Advances in Software Reuse - Second International Workshop on Soft- ware Reusability. Pp. 176–185. IEEE Computer Society Press, 1993. [OT93] L. Ottenstein, J. Thuss. Slice based metrics for estimating cohesion. 1993. citeseer.ist.psu.edu/ott93slice.html Proc. OpenCert 2010 16 / 17 citeseer.ist.psu.edu/100763.html http://dx.doi.org/http://doi.ieeecomputersociety.org/10.1109/ICSM.2001.972724 http://dx.doi.org/http://doi.acm.org/10.1145/65979.65980 http://dx.doi.org/http://doi.acm.org/10.1145/360248.360252 citeseer.ist.psu.edu/delucia96understanding.html http://dx.doi.org/http://dx.doi.org/10.1109/2.161279 citeseer.ist.psu.edu/ott93slice.html ECEASST [San95] G. Sander. Graph Layout through the VCG Tool. In GD ’94: Proceedings of the DIMACS International Workshop on Graph Drawing. Pp. 194–205. Springer-Verlag, London, UK, 1995. [SS07] S. G. Shiva, L. A. Shala. Software Reuse: Research and Practice. In ITNG. Pp. 603– 609. IEEE Computer Society, 2007. http://dblp.uni-trier.de/db/conf/itng/itng2007.html#ShivaS07 [SV03] K. Sherif, A. Vinze. Barriers to adoption of software reuse a qualitative study. Inf. Manage. 41(2):159–175, 2003. doi:http://dx.doi.org/10.1016/S0378-7206(03)00045-4 [Wei79] M. D. Weiser. Program slices: formal, psychological, and practical investigations of an automatic program abstraction method. PhD thesis, Ann Arbor, MI, USA, 1979. [WL86] M. Weiser, J. Lyle. Experiments on slicing-based debugging aids. In Papers pre- sented at the first workshop on empirical studies of programmers on Empirical stud- ies of programmers. Pp. 187–197. Ablex Publishing Corp., Norwood, NJ, USA, 1986. 17 / 17 Volume 33 (2010) http://dblp.uni-trier.de/db/conf/itng/itng2007.html#ShivaS07 http://dx.doi.org/http://dx.doi.org/10.1016/S0378-7206(03)00045-4 Introduction Basic Concepts Slicing System Dependency Graph Caller-based slicing Annotated System Dependency Graph (SDGa) An illustrative example GamaPolarSlicer Architecture Implementation Related Work — Slicing Related Work — Visualization of (sliced) programs Conclusion