Climbing the Software Assurance Ladder - Practical Formal Verification for Reliable Software Electronic Communications of the EASST Volume 076 (2019) Automated Verification of Critical Systems 2018 (AVoCS 2018) Climbing the Software Assurance Ladder - Practical Formal Verification for Reliable Software Claire Dross , Guillaume Foliard, Théo Jouanny, Lionel Matias, Stuart Matthews, Jean-Marc Mota, Yannick Moy, Pascal Pignard, Romain Soulat 17 pages Guest Editors: David Pichardie, Mihaela Sighireanu ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Climbing the Software Assurance Ladder - Practical Formal Verification for Reliable Software Claire Dross1∗ , Guillaume Foliard2, Théo Jouanny3, Lionel Matias2, Stuart Matthews4, Jean-Marc Mota5, Yannick Moy1, Pascal Pignard3, Romain Soulat5 1 AdaCore, F-75009 Paris 2 Thales Air Systems, F-91470 Limours 3 Thales Communications & Security, F-49300 Cholet 4Altran, Bath BA1 1AN, United Kingdom 5 Thales Research & Technologies, F-91767 Palaiseau Abstract: There is a strong link between software quality and software reliabil- ity. By decreasing the probability of imperfection in the software, we can augment its reliability guarantees. At one extreme, software with one unknown bug is not reliable. At the other extreme, perfect software is fully reliable. Formal verifica- tion with SPARK has been used for years to get as close as possible to zero-defect software. We present the well-established processes surrounding the use of SPARK at Altran UK, as well as the deployment experiments performed at Thales to fine- tune the gradual insertion of formal verification techniques in existing processes. Experience of both long-term and new users helped us define adoption and usage guidelines for SPARK based on five levels of increasing assurance that map well with industrial needs in practice. Keywords: Certification, Formal methods, Programming by contract 1 Introduction Taken literally, reliable software is the notion that we can rely on software to perform as intended. This is also how the international standard bodies and academic experts define it, as phrased in IEC 60050 terms applied to software: “reliability [is the] ability to perform as required, without failure, for a given time interval, under given conditions”. Currently, almost no software is reli- able in this sense, because the intention is usually expressed in ambiguous natural language, and the confidence that software behaves as intended is obtained by a combination of development discipline (to avoid introducing errors) and partial testing of all the possible software behaviors (to detect errors that were introduced). Hence, reliable software today is more an aspiration when building the software than a quality of the software produced. However, a link between software quality and reliability does exist, and it was clarified by researcher John Rushby [Rus09]: “prob- ability of (im)perfection [..] provides a bridge between correctness, which is the goal of software ∗ Work partly supported by the Joint Laboratory ProofInUse (ANR-13-LAB3-0007, https://why3.gitlabpages.inria.fr/ proofinuse/) 1 / 17 Volume 076 (2019) https://why3.gitlabpages.inria.fr/proofinuse/ https://why3.gitlabpages.inria.fr/proofinuse/ Climbing the Software Assurance Ladder verification (and especially formal verification), and the probabilistic properties such as relia- bility that are the targets for system level assurance.” This interpretation of reliable software as probably perfect software has been taken seriously in some companies like Altran UK, where specifications are routinely expressed with the preci- sion of a formal language, and confidence is obtained by a combination of classical techniques plus the guarantees provided by the use of formal verification. Tools for formal verification of software have reached a degree of automation and usability that makes them suitable for use in commercial contexts across a large set of industries, from the well-established - space, railway, aerospace & defense - to industries that more recently included software as a critical compo- nent like automotive and medical. The main tool used at Altran UK for formal specification, programming and formal verification of software is SPARK [MC15], a subset of the Ada pro- gramming language targeted at safety- and security-critical applications. The use of SPARK allows Altran UK to provide assurance that software will not crash or behave erratically, and that critical properties are satisfied, which it demonstrates by committing to these properties with its customers. While the benefits obtained by formal verification at Altran UK are clearly desirable, it may be intimidating for companies without formal verification know-how to start on this path. Knowing that others have replicated these benefits in other contexts is an important argument to make. Here, we are describing the experiments done at Thales with the support of AdaCore, during the years 2016 and 2017, to assess the costs and benefits of using formal verification of software using SPARK. With little investment in training (2 days in one experiment, self-training only in the other) and consulting (20 days in one experiment, online support only in the other), either provided externally or through self-training, the operational teams managed to specify intended behavior formally. The engineers in these teams were knowledgeable about Ada but not familiar with formal methods. The teams also proved critical properties of their software. Specifically: in multiple case studies the code was fully proved to be free of run-time errors (like buffer overflows and divisions by zero); in a subset of these case studies the code was proved to implement func- tional API specifications; while in another case study the code was proved to follow a specified safety automaton. In addition, the collaboration of AdaCore and Thales resulted in a set of guidelines [AT] that should be followed for an easier adoption of formal verification in existing projects, codebases and processes. These guidelines are based on five levels of assurance that can be achieved on software, in increasing order of costs and benefits. These guidelines are a major result of this work, as there is very little available guidance on the use of specific formal methods and tools in industrial context. It could be used as an inspiration by other formal verification platforms for software. In section 2, we introduce the SPARK formal verification platform. In section 3, we present the experience of Altran UK, a long-time user of formal verification with SPARK, and how it relates to traditional assurance levels (DAL and SIL) considered in industry. In section 4, we present two studies carried at Thales, a recent user of formal verification with SPARK, and how such adoption can be facilitated by the use of suitable guidelines. We finish with related works in section 5 and conclude. AVoCS 2018 2 / 17 ECEASST 2 SPARK: Formal Verification Focused on Practical Use For his PhD defense in 1969 on “A Program Verifier”, J. King submitted a manuscript that started with these sentences [Kin70]: This research is a first step toward developing a “verifying compiler”. Such a com- piler, as well as doing the standard translation of a program to machine executable form, attempts to prove that the program is “correct”. In order to do this a pro- gram must be annotated with propositions in a mathematical notation which define the “correct” relations among the program variables. The verifying compiler then checks for consistency between the program and its propositions. To most programmers, this may sound like a naive dream, whose illusory nature is exemplified by C.A.R. Hoare’s call in 2003 for researchers to tackle the “verifying compiler” as a Grand Challenge, more than thirty years later. Yet, “verifying compilers” are available today. For example, the formal development environments Coq and Isabelle/HOL have been used to create a compiler for C [Ler09] and a microkernel [KEH+09] which are guaranteed to be “correct” (related to a set of requirements and assumptions). The problem is that these “verifying compilers” are operating on proof languages that are reserved for experts. Since King’s PhD defense, there have been numerous attempts at defining practical “verifying compilers” for programming languages used in industry (first Pascal, then Ada, more recently Java and C#), none of which has succeeded in gaining industrial adoption. It is difficult to prove that a program is “correct”, and it will remain so for the foreseeable future. As Rustan Leino, a prominent researcher in formal program verification, put it in 2010: “Program verification is unusable. But perhaps not useless.” [LM10] Departing from this academic tradition, SPARK has been focused on practical formal verifi- cation from the start. SPARK has been adopted in numerous large industrial projects and only critical parts of the software were proved “correct” with respect to full functional (i.e. behav- ioral) specification. SPARK was used to prove specific properties of interest about the software, like the absence of all possible run-time errors (no division by zero, no buffer overflow, etc.) and some user-specified safety or security properties. SPARK is both a language and a toolset, supported by specific development and verifica- tion processes. In this article, we are focusing on the latest generation of SPARK technology, called SPARK 2014 [DEL+14], in which the specification language and the programming lan- guage have been unified as a subset of the programming language Ada 2012. Constraints on both program data and control can be specified using respectively type contracts (predicates and invariants) and function contracts (preconditions and postconditions). The concept of program contracts was invented by the researcher C.A.R. Hoare in 1969 in the context of reasoning about programs. In the mid-1980s, another researcher, Bertrand Meyer, introduced the modern function contract and type invariant in the Eiffel programming language [Mey88]. In its simplest formulation, a function contract consists of two Boolean expressions: a precondition to specify input constraints and a postcondition to specify output constraints. Function contracts have subsequently been included in many other languages, either as part of the language (e.g. contracts for SPARK), as part of the standard library (e.g. Code- Contracts for .NET [Fäh10]) or as an annotation language (such as JML for Java [CKLP06] or 3 / 17 Volume 076 (2019) Climbing the Software Assurance Ladder ACSL for C [BCF+]). Type invariants may come in two forms, depending on whether they can be temporarily violated (type invariants in SPARK) or not (type predicates in SPARK). Contracts can be executed as runtime assertions, interpreted as logic formulas by analysis tools, or both. The latest version of SPARK has opted for both. This design choice has far-reaching conse- quences. First, specifying properties of programs is similar to programming: there is no addi- tional language to learn and the tools available to the programmer also work on specifications. Second, contracts are executable, which means that they can be tested and debugged like code. Another important design choice was to allow SPARK and Ada code to coexist in the same files. Hybrid verification is obtained by using proof on SPARK code and test on Ada code. This is possible because contracts can be executed, and because test and proof use the exact same se- mantics for contracts [Cha10]. Other formal program verification technologies like Frama-C for C programs have made similar although not identical choices [KMMS16]. SPARK toolset focuses on automation and usability. Generation of implicit specifications lowers the cost of writing specifications, and generation of loop invariants, use of multiple state- of-the-art automatic provers, possible generation of counterexamples when proof fails, combi- nation of static analysis and proof, all lower the cost of proof by reducing the time and effort required to prove that the code respects its contracts. Usability is similar to other tools in the developer’s toolbox, mostly because formal verification can be performed by developers while they are developing, using their personal computers, thanks to the modularity and parallelization of the analysis. We identify five levels of assurance that can be achieved with SPARK, which are - in increasing order: 1. Stone level - valid SPARK 2. Bronze level - initialization and correct data flow 3. Silver level - absence of run-time errors (AoRTE) 4. Gold level - proof of key properties 5. Platinum level - full functional correctness At Stone level, strict SPARK rules are enforced on the code, having the effect of ensuring that a strong semantic coding standard is followed, which leads to better code quality and maintain- ability. At Bronze level, the SPARK code is guaranteed to be free from a number of defects like reads of uninitialized variables. At Silver level, the SPARK code is guaranteed to be free of run- time errors. At Gold level, the SPARK code is guaranteed to respect key integrity properties. At Platinum level, the SPARK code is guaranteed to implement a complete specification of intended behavior. Note that each level builds on the previous one, so that at Platinum level the guarantees given by all the lower levels are also achieved. 3 The Practice of Formal Verification Altran UK has a special relationship with the SPARK technology, being the heir of both PVL and then Praxis, the companies which have developed SPARK since 1987 [CS14]. Along the AVoCS 2018 4 / 17 ECEASST Software Integrity Level SPARK Verification Objective DAL SIL Bronze Silver Gold Platinum A 4 B 3 C 2 D 1 E 0 Figure 1: Technical Planning Guidelines for the Application of SPARK. The filled cells denote the three most common categories of application. years, Altran has used SPARK both directly and in partnership with our customers - through training, support and consulting - in a number of project domains which range across air traffic management, airborne systems, avionics, railway control & protection, security and defense systems. SPARK is used at Altran as an efficient means to both get as close as possible to zero-defect software and as a means to address the objectives of the relevant standards. This technical strat- egy has been subject to careful evaluation of costs and benefits, in order to apply formal verifi- cation where it brings more value to the business. At Altran UK, SPARK fits within an overall software development philosophy known as Correctness By Construction [Cha06a]. The key principles of this approach are: • to use techniques that prevent the introduction of errors (e.g. language subsets); • to maximize the ability to detect defects early (e.g. through the use of formal techniques); • to generate assurance evidence as you progress. The detail of how SPARK is applied varies from project to project, depending on factors which include the required integrity level, applicable standards, and the overall verification strategy for the system (in which SPARK will play a part amongst other techniques and tools). Together, these considerations will lead to a set of verification objectives for SPARK, which will be doc- umented in the technical plan at the start of a project (and which in turn support the assurance case either implicitly or explicitly if it is a formal deliverable). In spite of these variations, one can identify certain typical ways in which SPARK is applied on projects which have been shown to deliver high value in relation to the effort required. The table in Figure 1 summarizes Altran’s experience of how best to apply the different assurance levels possible with SPARK vs. the relative design integrity level of the software under development. Stone level is not represented as it is more an intermediate level during adoption of SPARK than a target assurance level. The way to understand this table is as both a summary of experience on industrial projects at Altran UK and as a starting point for how Altran UK approaches new projects. Every project at Altran will tailor its own approach. However, one would expect new projects to fall within 5 / 17 Volume 076 (2019) Climbing the Software Assurance Ladder the typical region(s) indicated in the table; any which did not would require justification in the planning phase. We have chosen to represent the software integrity level using two commonly understood scales: DAL (Design Assurance Level) is the terminology from DO-178 and SIL (Software Integrity Level) is the terminology used in DEFSTAN 00-55, IEC 61508, EN 50128 et al. The correspondence between DAL and SIL is necessarily informal because different standards define the levels according to different criteria. Note also that while DAL-E is defined by DO-178 its counterpart “SIL-0” is an informal but widely used term taken here to mean software below SIL-1 but which is still well-engineered. Experience shows that projects can be grouped into three broad categories, shown by the three filled regions in the table. Category 1, shown in black, represents our practice at the highest levels of integrity SIL-3 and SIL-4. Within this category, Silver (AoRTE proof) is considered the “default level”, but may be increased to Gold or even Platinum depending on whether key prop- erties and functional correctness respectively are verified by other means. Targeting Platinum (full functional proof) becomes less likely for a SIL-3 system where verification by testing could more easily be argued to be sufficient. Category 2, shown in gray, captures our practice at medium levels of integrity SIL-1 and SIL-2. Silver is still the default level, and it is very unlikely that Platinum would be employed on systems below SIL-3. However, proof of key properties (Gold) should still be strongly considered. There may be some key property where proof represents a very efficient means of verification, i.e. it is relatively easy to prove and relatively difficult to verify by any other means. The nature of such properties will vary from system to system, but could include even one key safety property (“the lift will not move when the doors are open”) or security property (“the account details cannot be accessed when the user is not logged in”). While testing can provide some level of confidence in such properties it can never provide a complete guarantee for any realistically-sized system, due to the impossibility of covering all possible states and input combinations. Category 3, shown in light gray, represents the lowest levels of integrity, so-called SIL-0. Even here, Silver is the default objective, but this could be weakened to Bronze if there is enough confidence that AoRTE was being sufficiently-well assured by other means or mitigation. The table shows that - for all but SIL-0 software - SPARK code will as a minimum be checked for AoRTE. Note that this level of verification implicitly means that all SPARK code has also been shown to be free of references to uninitialized variables and basic data flow errors. Experi- ence shows that the presence of this kind of flaw - which can have far-reaching consequences - can be immensely difficult to detect by other forms of verification [KHCP00a]. A key part of the software engineering process which maximizes the benefit of SPARK is a careful delineation of the “SPARK boundary” i.e. choosing which parts of the application software will be written in SPARK. Although the benefits of SPARK would push towards max- imizing the proportion of the software written in SPARK, other factors are likely to affect this engineering decision. For example, there may be pre-existing libraries to support the user inter- face or other external communication protocols that one wishes to use and which are qualified by alternative means. It is not unusual even to use different levels of SPARK verification within the same application. For example, SHOLIS [Cha06a] used this approach with SIL-4 parts of the application attaining full functional proof (Platinum level) while in lower-integrity functions (SIL-2) they verified only up to AoRTE (Silver level). The non-interference between different AVoCS 2018 6 / 17 ECEASST sections of the code was assured by the use of information flow analysis: a contract was attached to each subprogram specifying which global data items it could access in accordance with its SIL and the SPARK tools were used to check that the implementation respected these contracts. More generally, consideration has to be given to the assumptions that are made to support the verification objectives - how these are satisfied or mitigated by other activities in the overall V&V strategy [KCC+14]. The use of SPARK within the Correctness By Construction framework as described above has been demonstrated to produce software with very low defect density when compared to other high-integrity processes [Cha06a]. Although the above approach is the standard approach within Altran UK, the company has continued to explore new ways in which benefits can be gained from the use of SPARK, in particular the possibility of so-called “hybrid” approaches to verification, where a mixture of static and dynamic verification techniques are used to exploit the SPARK contracts. The hybrid approach that Altran is currently pioneering, called ConTestor, uses SPARK ver- ification at Silver level i.e. assurance of AoRTE using proof. In addition, SPARK contracts are used to add a functional specification to the code. Rather than verifying these contracts by proof using the SPARK tools (as per the standard Platinum approach), they are verified dynamically by testing. To perform these tests a fully-integrated version of the code is compiled with the run- time checks enabled for the functional contracts. Test cases for the integrated code are generated using constrained-random test generation and if no exceptions are raised during execution then the code has passed this functional test. The contracts effectively provide a test oracle i.e. an independent calculation of the expected outcome for each test case. However, rather than having to manually calculate the expected outcomes per test case, the contracts are written once and provide an implicit definition of the expected outcome for all possible test cases. 4 The Adoption of Formal Verification Contary to Altran UK, Thales has no established use of formal verification, but different units in Thales have been experimenting since 2015 with formal verification of programs using SPARK and Frama-C. The two case studies in this section describe the experiments with SPARK in the context of two different units working respectively in the domains of air defense systems software and cryptography. 4.1 First Study: Define an Adoption Strategy One trait of established industrial software development processes is their inertia in accepting new practices which could be considered as too disrupting, either by lack of understanding and know-how or, mostly for early adopters, because of the difficulty to assess costs and benefits. In the latter case, the upfront adoption effort is hiding the longer term process optimisation opportunities. In order to get a first idea of the possibilities SPARK-based formal verification could provide at Thales, a study was carried throughout 2016 with the aim of producing a first set of deployment guidelines supported by real life experiments on actual software applications. As formal verification with SPARK is not a widespread technology in the software industry, 7 / 17 Volume 076 (2019) Climbing the Software Assurance Ladder a prerequisite is to picture the range of its capabilities with a simple to remember concept. This led to the definition of the five levels of assurance previously introduced. As part of this study, AdaCore and Thales wrote a guidance document [AT] describing how SPARK could be adopted at these different levels. This is further conditioned by the phase of the software development lifecycle, which has a significant impact on the definition of activities to be performed when deploying SPARK. Given the current state of progress of some ongoing software development projects, four case studies were identified as potential targets for SPARK deployment experiments, from teams working on air defense systems software. The first case study meant to assess the effort for transitioning from Ada to SPARK code (Stone level) using a mature software application about to be ported onto a new execution plat- form. As porting the application on a new platform using a different compiler may introduce a different behavior in case of errors such as references to uninitialized variables, reaching the Bronze level seemed a desirable aim. A significant refactoring effort was required in order to cope with constructs excluded from the SPARK subset of the Ada language, the most promi- nent one being pointers. Thales engineers started using successfully the refactoring solutions described in the guidance document, but did not manage to complete refactoring in the expected time frame (5 person-days), due to the size of the chosen code base (around 300 klocs). This is expected to be completed in the coming year. The second case study focused on programmer proficiency. In that case study the small sub- program of less than 10 lines of code listed in Figure 2 was given to an experienced Ada program- mer with the goal of performing validation activities, both using the usual unitary test approach and a contract-based approach. Based on current tools, it took less than one hour for the expe- rienced software test engineer to set up a working test environment for the subprogram. On the other hand, writing relevant contracts on that same subprogram to formally prove properties took an order of magnitude more time for the same engineer. Interestingly, the amount of code to im- plement a contract was in that case as long and complex as the code to prove. As a consequence, there is no intention to invest in Gold level verification on numerical computations in the near future. The lesson is that one should start with the lowest levels of assurance and work upwards, as practiced in the subsequent case studies. The third case study was designed to complement test result artifacts on automatically gener- ated code. A large amount of the unit’s software application source code relating to data binary serialization and deserialization is automatically generated. The code generator compiles data models described through a domain specific language into Ada code. Up to now the test strategy for the code generator was mostly based on a limited set of regression tests and the confidence acquired over time as this technology was deployed across many projects over the last fifteen years. However, a hard to trigger weakness was lying dormant, which was cleaned up using a Gold level approach. With the support of SPARK experts, a first stage was to correct and refac- tor the code (2 kloc for the runtime and 21 kloc of generated code) to pass Stone, Bronze and Silver levels. For code written by savvy programmers making a moderate use of specialized AVoCS 2018 8 / 17 ECEASST subtype Nb Type i s N a t u r a l range 0 . . 100 ; subtype D Time Type i s F l o a t range 0 . 0 . . 1 000 . 0 ; subtype D e l t a T i m e T y p e i s D Time Type range 0 . 0 . . 1 . 0 ; procedure Study Case ( Nb Of Fp : i n Nb Type ; Nb Of Pp : i n Nb Type ; D e l t a T i m e : i n D e l t a T i m e T y p e ; Time : i n out F l o a t ) with Pre ⇒ Nb Of Pp > 0 and D e l t a T i m e > 0 . 0 and Time ≥ 0 . 5 ∗ F l o a t ( Nb Of Fp + Nb Of Pp ) ∗ D e l t a T i m e and Time < F l o a t ’ L a s t − F l o a t ( Nb Of Fp + Nb Of Pp ) ∗ D e l t a T i m e , Post ⇒ ( i f Nb Of Fp > 0 then Time ≥ Time ’ Old ) i s D : D Time Type ; T Fp : F l o a t ; T Pp : F l o a t ; begin D := F l o a t ( Nb Of Fp + Nb Of Pp ) ∗ D e l t a T i m e ; T Fp := Time − (D / 2 . 0 ) ; T Pp := T Fp + F l o a t ( Nb Of Fp ) ∗ D e l t a T i m e ; Time := T Pp + 0 . 5 ∗ F l o a t ( Nb Of Fp ) ∗ D e l t a T i m e ; end Study Case ; Figure 2: Simple problematic case for formal verification. The initial postcondition contained a strict inequality Time > Time’Old that is not true for high values of Time where the offset is absorbed. Even the fixed postcondition with a non-strict inequality is not provable as it depends on the respective magnitures of Nb Of Fp and Nb Of Pp which are not specified in precondition. 9 / 17 Volume 076 (2019) Climbing the Software Assurance Ladder language features such levels are easy targets, in this case less than half a person-day for a few hundreds lines of code. Reaching Gold level to prove one property related to buffer overflow on the generated code required a larger effort, two person-days in that case, in order to refactor the code for proof (to avoid the weakness mentioned above related to buffer overflow), interact with automatic provers through intermediate assertions and provide the required loop invariants. Given the extra level of confidence regarding the robustness these changes provide, Thales plans to deploy them in the next release of the code generator. The fourth case study targeted the proof of safety properties in a context where safety stan- dards apply. Safety properties are usually written as “nothing bad will ever happen” and, since their scope is usually on a large part of the code, need to be specified at the highest level of the code, almost at the entry point. Inside a 70 kloc control commands project, Thales and Ada- Core engineers identified a few units (7 kloc) defining a set of high level automata where those properties could be specified. As a first step, the engineers reached the Stone, Bronze and Silver levels on this code in less than a person-day. Then, contracts were added on subprograms imple- menting the automata, mostly to express the effect of calling each automaton, also in less than a person-day. Automatic proof was obtained without much difficulty after that, with no need for intermediate assertions, loop invariants or specific proof switches. The lessons learned here are that SPARK is expressive enough for typical safety automata properties, and powerful enough for automatic proof of such properties. Lessons learned. From an adoption point of view, Thales concluded from this first study that formal verification as implemented by SPARK 2014 and its associated toolset can be considered as a toolbox providing various opportunities for subsetting and tailoring. This flexibility gives the possibility to fine-tune the gradual insertion of formal verification techniques in existing processes, while mitigating risks both on their efficiency from a cost and planning point of view and their ability to output software with a defect density under control. 4.2 Second Study: Implement and Refine the Adoption Strategy In the field of high-security applications, which is particularly important for Thales, testing rep- resents a considerable part of the software development process. In addition to unit tests, other principles are implemented such as enforcing coding rules, peer code reviews and qualimetry surveys with many tools checking that those principles are strictly followed. One solution to lighten and improve this process, to produce software of improved quality, is the use of more suitable tools, such as formal verification tools to replace part of the tests. Indeed, formal proofs allow a comprehensive checking of proved parts, unlike testing that can only guarantee a partial checking of the software. After a previous internship in 2015 comparing some available environments for formal verifi- cation (eCv [Cro14], Frama-C [KKP+15], SPARK), another six-months internship in 2017 was dedicated to the study of the benefits of the Ada language and particularly the SPARK language for the security software developed at Thales. During this internship, Thales evaluated the vari- ous advantages of Ada and SPARK, by implementing the Adacore and Thales adoption guidance on two proofs of concept in the field of cryptography. AVoCS 2018 10 / 17 ECEASST The first case study was porting from C to Ada, then to SPARK, part of a cryptographic library which is used as an abstraction layer between a lower level cryptographic library (also in C) and client applications. This case study followed the guidance document produced in the previously mentioned first study, to convert an application from Ada to SPARK. The preliminary stage consisted of porting the C library code to valid Ada code. Porting API (.h files) was facilitated by g++ switch “–fdump-ada-spec” which produced comprehensive Ada specifications (.ads files) as well as Ada body skeletons (.adb files) generated automatically with the gnatstub tool. The body code was completed manually without difficulties as most C idioms are available with Ada. Interfacing with the C low-level cryptographic library was essential and was supported natively by Ada. This small step brought simpler code with pointer-related defensive code eliminated thanks to the use of handy Ada array attributes and warnings from the Ada compiler. Firstly, Stone level was reached by transforming the Ada code to be valid SPARK code. It mostly consisted in suppression of pointers (or at least encapsulating them in a non-SPARK unit) and transformation of functions with side effects into procedures (or at least encapsulating them in wrappers within a non-SPARK unit). Thus, it was possible to make a first analysis of the code with SPARK tools. This first step didn’t require major changes in the code but it pinpointed parts of the code with potential security vulnerabilities (pointer casts and side effects in particular). In a second stage, Bronze level was reached, analyzing the code for data flow and variable initialization. Data flow (Global) and information flow (Depends) contracts were added in the code to specify precisely the intended behavior. The analysis detected unused inputs which could then be removed, which is useful for maintenance, as well as partially initialized data structures, which is useful for debugging. In a third stage, Silver level was reached, ensuring absence of run-time errors in the code (AoRTE). Preconditions were added in the code, mostly to link the right algorithm with the right variant of a discriminated structure. In a fourth stage, Gold level was reached, verifying the functional behavior of the code. Pre- conditions and postconditions were added in the code to specify key security requirements: cleanup of security-sensitive working variables, correctness of output value, and consistency between parameters as presented in Figure ??. At this level, all the existing defensive code had been replaced by contracts. By achieving complete proof of these specifications, the propagation of error codes from low-level subprograms to high-level ones was no longer necessary. The second case study was about producing an API similar to the API ported from C during the first proof of concept, this time based on a low-level cryptographic library in Ada, which was also later proved with SPARK. The whole process from Stone level to Gold level was followed again. New technical issues arised: the need for loop invariants, contracts on type hierarchies for subprograms supporting dispatching, visibility of global variables in contracts of high level subprograms, and non-provable Ada code. Though loop invariants are the basis of formal proof, they are considered as tricky. Many unproved properties came mostly from weak preconditions or weak postconditions of subprograms called inside a loop, which were not obvious to under- stand. Object Oriented Programming brings another layer of complexity, with specific rules for inheriting subprograms and contracts over these subprograms. Global variables mentioned in 11 / 17 Volume 076 (2019) Climbing the Software Assurance Ladder procedure computeSha ( i n p u t : i n u i n t 8 t a r r a y ; i n p u t B y t e L e n : i n s t d i n t h . u i n t 3 2 t ; d i g e s t : out u i n t 8 t a r r a y ; o u t p u t B y t e L e n : i n s t d i n t h . u i n t 3 2 t ; hashByteLen : i n s t d i n t h . u i n t 3 2 t ) with Contract Cases ⇒ ( hashByteLen = 20 ⇒ d i g e s t ’ Length = 20 , hashByteLen = 32 ⇒ d i g e s t ’ Length = 32 , hashByteLen = 48 ⇒ d i g e s t ’ Length = 48 , hashByteLen = 64 ⇒ d i g e s t ’ Length = 64 ) , Pre ⇒ ( i n p u t B y t e L e n = i n p u t ’ Length and ( hashByteLen = 20 or hashByteLen = 32 or hashByteLen = 48 or hashByteLen = 64 ) ) ; Figure 3: Simple case of contract for expressing consistency between parameters, checking here that the length of the hashed message digest’Length is consistent with the type of hash used hashByteLen. data flow contracts propagate to the upper levels of the call tree, where they may not be visible anymore (due to abstraction mechanisms in Ada), which required costly workarounds. A better solution would have been to hide this particular effect in a low-level non-SPARK package body, or to use the data abstraction feature available in SPARK. Finally, some idiomatic Ada code did not lead to automatic proofs in SPARK, which led to changes for simpler and more readable code. Lessons learned. Thales learned a few lessons from this second study. First, the adoption guidance document was really helpful: it eased the implementation of SPARK during the second internship. As a result, it was also refined for future uses inside and outside Thales. Secondly, as stated in the guidance document, Users should refrain from changing the program for the benefit of only getting fewer messages from the tool, a principle that could be phrased as “do not please the tools”. Of course, it is sometimes adequate to change the program in a way that will cause some messages about unproved properties to disappear, provided this favors code quality, readability or maintenance. Otherwise, tools provide ways to silence messages, that should be used instead of changing the program. Thirdly, reaching Gold Level is more easily achievable when clear and meaningful software specifications are available. Finally, not all code can be proved but non-provable parts that are well identified can undergo peer code reviews. For instance, 90% of the code was automatically provable in the second case study, after suitable addition of contracts where necessary. 5 Related Works Formal methods have long been considered as a means of compliance to satisfy verification ob- jectives in critical software development for some certification domains, for example in railway (EN 50128) and industrial processes (IEC 61508). The avionics standard DO-178C in 2012 has more recently recognized formal methods as a means of compliance on a par with the dominant technique of testing. Other certification standards in the domains of automotive (ISO 26262), AVoCS 2018 12 / 17 ECEASST nuclear (IEC 60880) and space (ECSS-QST-80C) also recognize some uses of formal methods as verification techniques. The adequateness of formal methods for certification was thoroughly investigated by John Rushby in his report for the NASA in 1993 on “Formal Methods and the Certification of Critical Systems” [Rus93]. Although Rushby’s report talks about “formal methods”, this mostly corre- sponds to what we call today “theorem proving”: model checking techniques are mentioned en passant, and nothing is said about abstract interpretation techniques, which did not have then the recognition that they do today. More recently, Graf and Garavel studied extensively the use of formal methods for developing critical systems, and they cover in particular the impact of formal methods on development and verification processes [GG13]. More specific guidance exists in certain application domains, such as in avionics [BDHW10]. There is on the contrary very little guidance on the use of specific formal methods and tools. This is somewhat remediated by the availability of tool specific user guides and publicly available experience reports [WLBF09]. Company-specific guidance is developed to carry-over the expe- rience gained from project to project, in the companies using formal methods, but such guidance is kept confidential. Indeed, the experience gathered through previous projects is considered as a business advantage over the competition, and the guidance having been developed in the specific business context of the company, the information related to formal methods usage is very tied to other confidential information. In their dual role of SPARK tools providers and practitioners, Praxis and then Altran have always been keen on publicizing best practices and lessons learned with formal verification on industrial projects [KHCP00b, Cha06b, CS14]. The publication of the guidance co-developed between AdaCore and Thales [AT] on SPARK adoption follows this lead, which was possible because it was written since the start with the tool provider. This is sim- ilar to the joint effort by tool provider CEA, certifier Bureau Veritas and user Sirehna to publish guidelines on the use of Frama-C [DFK+16]. Formal methods have been divided between heavyweight and lightweight ones, with the for- mer being the original formal methods and the latter also being called the disappearing formal methods [Rus00]. SPARK is a case of lightweight or disappearing formal methods, in which the user does not directly manipulate the underlying formalism, but instead interacts with tools through multiple interfaces. Formal methods and tools are usually placed at some point in be- tween the heavyweight and lightweight extreme points. With the notion of software assurance levels, we have shown that a given tool can be placed at multiple places along this axis, and that a project can move between these places using the same tool. In particular, it is likely that other formal verification platforms for software such as Atelier B and Frama-C could similarly define their own software assurance levels. For example, the plugin structure of Frama-C could be used to define levels in terms of plugin usage [KKP+15]. 6 Conclusion Formal program verification with SPARK has been used for years at companies like Altran UK to get as close as possible to zero-defect software. Altran UK has developed software engineering processes to maximize the costs-benefit ratio of using SPARK. In particular, it has defined a mapping between levels of use of SPARK and software assurance targets (SIL/DAL), which is 13 / 17 Volume 076 (2019) Climbing the Software Assurance Ladder used by all projects at Altran UK. Altran UK is now investing in its use of SPARK for the future, by investigating innovative ways to generate tests from contracts, to combine tests and proofs and to analyze code generated from Simulink. Other companies like Thales are starting to use SPARK to obtain similar benefits. We have presented in this article the lessons learned at Thales on various deployment experiments at different levels of use of SPARK. As for every promising but complex technology, the success of its deployment is conditioned by the pace at which adopters can climb the learning curve and identify relevant insertion points and strategies into established development processes. While AdaCore expertise was essential in the success of these experiments, Thales has identified typical use cases where the methodology used could be replicated without external help. Thales is now aiming at clarifying how SPARK can be adapted to its internal processes. The guidance document written as a result of Thales experiments is being used to support adoption of SPARK in other teams inside Thales and is available for other companies to start on this path. We have benefited in multiple ways from the definition of the five software assurance levels that can be achieved with SPARK. First, the five levels clarify the verification objectives that can be achieved with formal verification: not only they provide simple and easy-to-remember names for communicating between stakeholders, they also make it explicit that upper levels build on the lower levels, and they provide at each level a clear identification of the costs and benefits. Secondly, the five levels make it easier to plan for progressive adoption of higher levels of software assurance, with lower levels requiring less effort than higher levels, and each level providing already very valuable benefits. These results could be translated to other formal methods that similarly provide different depths of use that could be translated to assurance levels. Acknowledgements. We would like to thank the anonymous referees for their useful remarks, as well as our colleagues at AdaCore, Altran and Thales for their reviews on earlier drafts of this article. Bibliography [AT] AdaCore, Thales. Implementation Guidance for the Adoption of SPARK. https: //www.adacore.com/books/implementation-guidance-spark. [BCF+] P. Baudin, P. Cuoq, J.-C. Filliâtre, C. Marché, B. Monate, Y. Moy, V. Prevosto. ACSL: ANSI/ISO C Specification Language. http://frama-c.com/download/acsl. pdf. [BDHW10] D. Brown, H. Delseny, K. Hayhurst, V. Wiels. Guidance for Using Formal Methods in a Certification Context. In Proc. ERTS. 2010. [Cha06a] R. Chapman. Correctness by Construction: A Manifesto for High Integrity Soft- ware. In Proceedings of the 10th Australian Workshop on Safety Critical Systems and Software - Volume 55. SCS ’05, pp. 43–46. Australian Computer Society, Inc., Darlinghurst, Australia, Australia, 2006. http://dl.acm.org/citation.cfm?id=1151816.1151820 AVoCS 2018 14 / 17 https://www.adacore.com/books/implementation-guidance-spark https://www.adacore.com/books/implementation-guidance-spark http://frama-c.com/download/acsl.pdf http://frama-c.com/download/acsl.pdf http://dl.acm.org/citation.cfm?id=1151816.1151820 ECEASST [Cha06b] R. Chapman. Correctness by Construction: A Manifesto for High Integrity Soft- ware. In Proceedings of the 10th Australian Workshop on Safety Critical Systems and Software - Volume 55. SCS ’05, pp. 43–46. Australian Computer Society, Inc., Darlinghurst, Australia, Australia, 2006. http://dl.acm.org/citation.cfm?id=1151816.1151820 [Cha10] P. Chalin. Engineering a Sound Assertion Semantics for the Verifying Compiler. IEEE Trans. Software Eng. 36(2):275–287, 2010. [CKLP06] P. Chalin, J. R. Kiniry, G. T. Leavens, E. Poll. Beyond Assertions: Advanced Spec- ification and Verification with JML and ESC/Java2. In Boer et al. (eds.), Formal Methods for Components and Objects. Pp. 342–363. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006. [Cro14] D. Crocker. Can C++ Be Made As Safe As SPARK? In Proceedings of the 2014 ACM SIGAda Annual Conference on High Integrity Language Technology. HILT ’14, pp. 5–12. ACM, New York, NY, USA, 2014. doi:10.1145/2663171.2663179 http://doi.acm.org/10.1145/2663171.2663179 [CS14] R. Chapman, F. Schanda. Are We There Yet? 20 Years of Industrial Theorem Proving with SPARK. In Klein and Gamboa (eds.), Interactive Theorem Proving. Pp. 17–26. Springer International Publishing, Cham, 2014. [DEL+14] C. Dross, P. Efstathopoulos, D. Lesens, D. Mentré, Y. Moy. Rail, Space, Security: Three Case Studies for SPARK 2014. In Proc. ERTS. 2014. [DFK+16] L. Duboc, S. Flanc, F. Kirchner, H. Marteau, V. Prévosto, F. Sadmi, F. Védrine. Safer Marine and Offshore Software with Formal-Verification-Based Guidelines. In Proc. ERTS. 2016. [Fäh10] M. Fähndrich. Static Verification for Code Contracts. In Cousot and Martel (eds.), Static Analysis. Pp. 2–5. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010. [GG13] H. Garavel, S. Graf. Formal Methods for Safe and Secure Computers Systems - BSI Study 875. BSI German Federal Office for Information Security, 2013. [KCC+14] J. Kanig, R. Chapman, C. Comar, J. Guitton, Y. Moy, E. Rees. Explicit Assumptions -A Prenup for Marrying Static and Dynamic Program Verification. 07 2014. [KEH+09] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elka- duwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, S. Winwood. seL4: Formal Verification of an OS Kernel. In Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles. SOSP ’09, pp. 207–220. ACM, New York, NY, USA, 2009. doi:10.1145/1629575.1629596 http://doi.acm.org/10.1145/1629575.1629596 15 / 17 Volume 076 (2019) http://dl.acm.org/citation.cfm?id=1151816.1151820 http://dx.doi.org/10.1145/2663171.2663179 http://doi.acm.org/10.1145/2663171.2663179 http://dx.doi.org/10.1145/1629575.1629596 http://doi.acm.org/10.1145/1629575.1629596 Climbing the Software Assurance Ladder [KHCP00a] S. King, J. Hammond, R. Chapman, A. Pryor. Is proof more cost-effective than testing? IEEE Transactions on Software Engineering 26(8):675–686, Aug 2000. doi:10.1109/32.879807 [KHCP00b] S. King, J. Hammond, R. Chapman, A. Pryor. Is Proof More Cost-Effective Than Testing? IEEE Trans. Software Eng. 26(8):675–686, 2000. [Kin70] J. C. King. A Program Verifier. PhD thesis, Pittsburgh, PA, USA, 1970. AAI7018026. [KKP+15] F. Kirchner, N. Kosmatov, V. Prevosto, J. Signoles, B. Yakobowski. Frama-C: A software analysis perspective. Formal Aspects of Computing 27(3):573–609, May 2015. doi:10.1007/s00165-014-0326-7 https://doi.org/10.1007/s00165-014-0326-7 [KMMS16] N. Kosmatov, C. Marché, Y. Moy, J. Signoles. Static versus Dynamic Verification in Why3, Frama-C and SPARK 2014. In 7th International Symposium on Leverag- ing Applications. 7th International Symposium on Leveraging Applications, p. 16. Springer, Corfu, Greece, Oct. 2016. https://hal.inria.fr/hal-01344110 [Ler09] X. Leroy. A formally verified compiler back-end. Journal of Automated Reasoning 43(4):363–446, 2009. http://gallium.inria.fr/∼xleroy/publi/compcert-backend.pdf [LM10] K. R. M. Leino, M. Moskal. Usable Auto-Active Verification. In Usable Verifica- tion Workshop. 2010. http://fm.csl.sri.com/UV10/ [MC15] J. W. McCormick, P. C. Chapin. Building High Integrity Applications with SPARK. Cambridge University Press, 2015. [Mey88] B. Meyer. Object-Oriented Software Construction. Prentice-Hall, Inc., Upper Sad- dle River, NJ, USA, 1st edition, 1988. [Rus93] J. Rushby. Formal Methods and the Certification of Critical Systems. Technical report, 1993. [Rus00] J. Rushby. Disappearing Formal Methods. In High-Assurance Systems Engineering Symposium. Pp. 95–96. Association for Computing Machinery, Albuquerque, NM, nov 2000. http://www.csl.sri.com/papers/hase00/ [Rus09] J. Rushby. Software Verification and System Assurance. In 2009 Seventh IEEE International Conference on Software Engineering and Formal Methods. Pp. 3–10. Nov 2009. doi:10.1109/SEFM.2009.39 AVoCS 2018 16 / 17 http://dx.doi.org/10.1109/32.879807 http://dx.doi.org/10.1007/s00165-014-0326-7 https://doi.org/10.1007/s00165-014-0326-7 https://hal.inria.fr/hal-01344110 http://gallium.inria.fr/~xleroy/publi/compcert-backend.pdf http://fm.csl.sri.com/UV10/ http://www.csl.sri.com/papers/hase00/ http://dx.doi.org/10.1109/SEFM.2009.39 ECEASST [WLBF09] J. Woodcock, P. G. Larsen, J. Bicarregui, J. Fitzgerald. Formal Methods: Practice and Experience. ACM Comput. Surv. 41(4):19:1–19:36, Oct. 2009. doi:10.1145/1592434.1592436 http://doi.acm.org/10.1145/1592434.1592436 17 / 17 Volume 076 (2019) http://dx.doi.org/10.1145/1592434.1592436 http://doi.acm.org/10.1145/1592434.1592436 Introduction SPARK: Formal Verification Focused on Practical Use The Practice of Formal Verification The Adoption of Formal Verification First Study: Define an Adoption Strategy Second Study: Implement and Refine the Adoption Strategy Related Works Conclusion