Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML Electronic Communications of the EASST Volume 72 (2015) Proceedings of the 15th International Workshop on Automated Verification of Critical Systems (AVoCS 2015) Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML Vincent Rahli, David Guaspari, Mark Bickford, and Robert L. Constable 15 pages Guest Editors: Gudmund Grov, Andrew Ireland ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML Vincent Rahli1∗, David Guaspari, Mark Bickford2, and Robert L. Constable2 1 vincent.rahli@gmail.com SnT, University of Luxembourg, Luxembourg 2 www.nuprl.org Cornell University, Ithaca, NY, USA Abstract: Distributed programs are known to be extremely difficult to implement, test, ver- ify, and maintain. This is due in part to the large number of possible unforeseen interactions among components, and to the difficulty of precisely specifying what the programs should accomplish in a formal language that is intuitively clear to the programmers. We discuss here a methodology that has proven itself in building a state of the art implementation of Multi-Paxos and other distributed protocols used in a deployed database system. This article focuses on the basic ideas of formal EventML programming illustrated by implementing a fault-tolerant consensus pro- tocol and showing how we prove its safety properties with the Nuprl proof assistant. Keywords: functional programming, formal methods, formal verification, theorem proving, distributed systems, fault tolerance, event logic, event-based programming 1 Introduction Protocol Specification, Verification, and Synthesis. There is good evidence that appropriate formal methods can substantially improve the reliability of distributed protocols and that such methods are especially valuable for this kind of programming because of its intrinsic complexity. We have invested in this line of work for several years, using constructive logic because it sup- ports provably correct code synthesis from proofs and because aspects of distributed computing are essentially constructive: agents make decisions according to some local information, and a protocol specifies how that information is acquired. “Provably correct” here means that machine checked proofs guarantee that programs satisfy desired correctness properties. One reason that distributed systems are especially difficult to code correctly and maintain is that there are many intricate failure scenarios to consider. Failure scenarios can be hard to generate and testing them all is not usually possible. Model checkers are often used to verify that distributed systems are correct [30, 1, 20]. However, only models of the actual code are verified correct, and such tools may not be able to exhaustively search the space of failure scenarios. Proof assistants allow one to provide definitive arguments. We use the EventML language to develop protocols. EventML works synergistically with the Nuprl proof assistant [13, 2], which is closely related to the Coq [4, 14] proof assistant. Nuprl is ∗ This work was partially supported by the DARPA CRASH project, award number FA8750-10-2-0238, by the SnT, and by the National Research Fund Luxembourg (FNR), through PEARL grant FNR/P14/8149128. 1 / 15 Volume 72 (2015) mailto:vincent.rahli@gmail.com www.nuprl.org Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML a programing/logical environment based on Constructive Type Theory (CTT) [13, 2], that allows one to both prove mathematical results, and program and prove properties about these programs, and this in a single formal method tool. EventML. EventML is a domain-specific ML-like functional programming language for dis- tributed protocols based on asynchronous message passing. It allows programming distributed programs in an event-based style, hence the name “EventML”. The language provides combi- nators to implement what can be regarded as event recognizers and event handlers. EventML is based on two formal models of distributed computing implemented in Nuprl: (1) the Logic of Events (LoE) [5, 7] to specify and reason about the information flow of distributed program runs, (2) a General Process Model (GPM) [6] to implement these information flows. The seman- tic meaning of an EventML program is expressed both by a LoE formula and a GPM program. Because of this dualism we also refer to EventML programs as constructive specifications. Currently, EventML docks with Nuprl, but in principle can connect to any prover that imple- ments LoE and GPM. Because every EventML type is a Nuprl type, docking means that any Nuprl expression whose type is an EventML type can be imported into an EventML program. The diagram below shows the interaction between EventML and Nuprl. Once we have ex- tracted the semantic meaning of an EventML specification in terms of a LoE formula and a GPM program, we automatically prove that the program satisfies the formula. It remains to interac- tively prove that the LoE formula satisfies the desired correctness properties. Computation Model. EventML’s computation model is based on GPM. A process that takes inputs of type A, and outputs elements of type B, is an element of the following type (the definition of the Nuprl corec type is outside the scope of this paper): corec(λ P.(A → P × Bag(B))+Unit). Unit is a singleton type and + is the disjoint union type. Therefore, a process is one of two things: a function that given an input of type A, generates a (possibly empty) bag of outputs of type B, and becomes a possibly different process; or a special value called halt, that is used to denote a terminated process. Because GPM is implemented in Nuprl, a process is a Nuprl program (i.e., an expression of Nuprl’s programming language, an untyped λ -calculus) that can be executed by interpreting it according to the rules of Nuprl’s computation system. The Logic of Events. The Logic of Events (LoE) [5, 7], related to Lamport’s notion of causal order [22], was developed to reason about events occurring in the execution of a distributed system. LoE has been used among other things to verify consensus protocols [29] and cyber- physical systems [3]. In the context of this paper, an event is an abstract entity corresponding to the reception of a message1; the message is called the primitive information of the event. An event happens at a specific point in space/time. The space coordinate of an event is called its location, and the time coordinate is given by a well-founded causal ordering on events that totally orders all events at the same location. Using LoE one can describe systems in terms of the causal relations among events and (ultimately) their primitive information. 1 Formally events are more general than that because they might correspond to something else than just the reception of messages. Proc. AVoCS 2015 2 / 15 ECEASST Event orderings. To reason about a protocol in LoE, we reason about its possible runs. An event ordering is an abstract representation of one run of a distributed system; it provides a formal definition of a message sequence diagram as used by systems designers. It is a structure consisting of: (1) a set of events; (2) a function loc that associates a location with each event; (3) a function info that associates primitive information with each event; and (4) a well-founded causal ordering relation, <, on events [22]. We express system properties as predicates on event orderings. A system satisfies such a property if every execution satisfies the predicate. The message sequence diagram on the right depicts a simple event ordering. L1 L2 `̀ echo`̀ • e1 `̀ forward`̀ •e2 `̀ ackn`̀•e3 Event e1 corresponds to the reception of a message with header `̀ echo`̀ at location L1. Upon receipt of that `̀ echo`̀ message, L1 forwards it to L2, which causes e2. Upon re- ceipt of that `̀ forward`̀ message, L2 sends an acknowledgment to L1, which causes e3. Events e1 and e3 have same location, and e1 happens causally before e2, which happens causally before e3. We write e1 < e2, e2 < e3, and e1 >= Round ) ; ; o b s e r v e r V o t e r ( n , cmd ) = ( Rounds ( n , cmd ) u n t i l ( N o t i f y n ) ) || ( N o t i f y n ) ; ; (∗ ============ NewVoters : a s t a t e machine ============ ∗) (∗ −− i n p u t s −− ∗) o b s e r v e r RcvProposal = p r o p o s e ’ b a s e || ( ( \ ( ( ( n , r ) , c ) , s ) . { ( n , c )}) o v o t e ’ b a s e ) ; ; (∗ −− f i l t e r −− ∗) l e t n e w p r o p o s a l ( n , cmd ) ( max , m i s s i n g ) = n > max o r deq−member ( op = ) n m i s s i n g ; ; (∗ −− update −− ∗) l e t u p d r e p l i c a ( n , cmd ) ( max , m i s s i n g ) = i f n e w p r o p o s a l ( n , cmd ) ( max , m i s s i n g ) t h e n i f n > max t h e n ( n , m i s s i n g ++ ( from−u p t o ( max + 1 ) n ) ) e l s e ( max , l i s t− d i f f ( op = ) m i s s i n g [ n ] ) e l s e ( max , m i s s i n g ) ; ; (∗ −− o u t p u t −− ∗) l e t o u t p r o p o s a l l o c ( n , cmd ) s t a t e = i f n e w p r o p o s a l ( n , cmd ) s t a t e t h e n {( n , cmd)} e l s e {}; ; (∗ −− s t a t e machine −− ∗) o b s e r v e r R e p l i c a S t a t e = Memory(\ l o c . ( 0 , [ ] ) , u p d r e p l i c a , RcvProposal ) ; ; o b s e r v e r NewVoters = o u t p r o p o s a l o ( RcvProposal , R e p l i c a S t a t e ) ; ; (∗ ============ R e p l i c a & Main program ============ ∗) o b s e r v e r R e p l i c a = NewVoters >>= V o t e r ; ; main SC where SC = R e p l i c a @ r e p s array of commands gets filled whenever a quorum of agents reach consensus on which command to place in n. Decisions result from holding elections, and we spawn a separate process to conduct each one. In this case, for each slot number n, we hold an election to decide which proposals of form (n, ) to accept. The tally from any particular ballot may be indecisive, so additional rounds of balloting will be spawned as needed. The crucial decisions are when to begin a new 5 / 15 Volume 72 (2015) Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML round of balloting, what constraints participants must observe in their successive votes, and how to detect that consensus has been achieved (complicated by the fact that multiple rounds in the same election may be occurring simultaneously). Interface. An input event to the protocol is the arrival of a message with header `̀ propose`̀ whose body is a proposal—i.e., a value of type: t y p e P r o p o s a l = I n t ∗ Cmd The type of commands is a parameter of the specification: p a r a m e t e r Cmd, cmdeq : Type ∗ Cmd Deq One subtlety: The protocol requires the ability to determine whether two values of Cmd are equal. So we require an additional parameter, an “equality decider”—here called cmdeq—able to perform that computation. The inputs to the protocol are messages with header `̀ propose`̀ and body of type Proposal i n p u t propose : P r o p o s a l This declaration implicitly defines the base observer propose’base that detects these input events and observes their data. Outputs of the protocol are directed messages with header `̀ notify`̀ . The data component of an output contains a Proposal value that has been accepted. o u t p u t n o t i f y : P r o p o s a l This declaration does not introduce a base observer recognizing the arrival of `̀ notify`̀ mes- sages, because those events occur outside our system. However, it implicitly declares the func- tions notify’send and notify’bcast for creating directed messages. If m is the `̀ notify`̀ message with body p, then the expression (notify’send l p) is the directed message (l,m) instructing that m be sent to l; and the expression (notify’bcast {l1,l2,...} p) is the bag {(l1,m),(l2,m),...} of such instructions. Typically, the complete interface of a system is defined in terms of its input messages, its out- put messages, and its internal messages, i.e., messages that can only be produced and consumed by the participants of the system. The internal messages exchanged by the participants of the protocol presented in this section are as follows: `̀ vote`̀ messages, by which the replicas cast their votes; `̀ decided`̀ messages, which inform replicas that consensus has been detected on a particular proposal; and `̀ retry`̀ messages, which are described below. Replicas. To characterize top-level agents in the protocol we will define the event observer Replica. The main program, SC, executes the protocol: main SC where SC = R e p l i c a @ r e p s The bag of locations reps, another parameter of the specification, denotes the locations at which the replicas will execute. We may think of SC as the restriction of Replica to an observer that responds only to events at the locations in reps, or as the result of installing an “instance” of Replica at each of those locations. SC can be implemented by a finite number of instances, while Replica cannot because it responds to events at all possible locations. For each n, the protocol conducts a separate election to vote on proposals for the nth command. Replica spawns subprocesses that cast votes in these elections and identify the winners. The Proc. AVoCS 2015 6 / 15 ECEASST spawning (delegation) operator “_>>=_” is an EventML primitive which is used by processes to start sub-processes:4 o b s e r v e r R e p l i c a = NewVoters >>= V o t e r The observer NewVoters decides when to spawn a new voting process. Voter is a higher-order function; the values it returns are observers that do the voting. When some NewVoters-event e occurs and v ∈ NewVoters(e), Replica spawns a local instance of the observer Voter(v). By local instance we mean this: each subprocess spawned at a NewVoters-event e at location loc acts only at loc and can only react to messages arriving at loc after e. For any e there will be at most one v such that v ∈ NewVoters(e). So a NewVoters-event spawns only one subprocess. (Though it is not required, we typically apply delegation only to such “singled- valued” observers.) A note on terminology: SC requires several higher-order functions, such as Voter, that return event observers. For convenience we will use “a Voter observer” or “a Voter” as a shorthand for “an event observer returned by Voter.” State machines. Informally, we will call an observer a state machine if it defines a distinct state machine at each location. We will say that it reacts to an event if it recognizes the event or if the event can cause its internal state to change. NewVoters is a state machine. It reacts to `̀ proposal`̀ (coming from outside the system) and `̀ vote`̀ messages (from inside), and it filters those events. At any location loc, NewVoters recognizes the first time that loc has received a proposal or vote about the nth command and, when it does, outputs (a singleton bag containing) its value. If the value of such an event is (n,c), the effect of (NewVoters >>= Voter) is there- fore to spawn a local instance of the event observer Voter(n,c) at location loc. The initialization data (n,c) instructs that Voter to vote for (n,c) on the first round. Voter. Voter observers cast votes and tally the votes they receive to determine whether some proposal has achieved consensus. A Voter will not announce a consensus for proposal (n,c) unless it has received 2 ∗F+ 1 votes for (n,c) from 2 ∗F+ 1 different replicas. We cannot guarantee that any particular poll of the Voter observers will achieve such a result. Accordingly, for each n we allow arbitrarily many do-over polls: Successive polls for slot number n are assigned consecutive integers called round numbers. Voting rounds (or just rounds for short) are pairs of the form (n,r)—(slot number/round number). Ballots are pairs of the form ((n,r),c)—(voting round/command). Thus, a Voter casts votes for a particular proposal in a particular round. Votes are pairs of the form (((n,r),c),loc)—(ballot/location). A voter includes its location in each vote. By arranging that replicas ignore duplicate votes, we guarantee that the protocol works even if messages get duplicated. A Replica spawns Voter subprocesses to conduct separate elections for each slot number. A Rounds observer uses essentially the same idiom to spawn Round observers that handle indi- vidual balloting rounds within a single election. A Voter is, essentially, a Rounds process that runs until its election has been decided: o b s e r v e r Rounds ( n , c ) = Round ( ( n , 0 ) , c ) || ( NewRounds n >>= Round ) ; ; o b s e r v e r V o t e r ( n , c ) = ( Rounds ( n , c ) u n t i l ( N o t i f y n ) ) || ( N o t i f y n ) ; ; where “_||_” performs parallel composition. For any event observers A and B, the observer (A until B) acts like A until a B-event occurs, at which point it terminates. We use this to 4 We use the symbol “>>=” because the event observers have the structure of a monad having this combinator as its bind operation. 7 / 15 Volume 72 (2015) Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML terminate any voting for n once consensus has been reached on n. Rounds, Round, NewRounds, and Notify are also functions that return event observers. A local instance of Round((n,r),c) conducts the voting for round (n,r) at a particular location. By definition it will cast its vote in round (n,r) for (n,c). Therefore, the first component of Rounds(n,c) ensures that Voter(n,c) votes for proposal (n,c) in round (n,0); other instances of Round, spawned by the second component of Rounds, may cast votes for other proposals in later rounds. Round (detailed in Sec. 2.2) inputs `̀ vote`̀ messages and outputs directed messages of various kinds: `̀ vote`̀ ; `̀ decided`̀ ; and `̀ retry`̀ , an internal message calling for a new round when a poll does not achieve consensus. NewRounds(n) recognizes events that call for new rounds of voting for the nth command. Thus (NewRounds n >>= Round) spawns instances of Round as required. Notify(n) handles `̀ decided`̀ message with data (n,c) indicating that consensus has been reached about the nth command, by sending notifications to the clients of the system indicating that slot n has been filled with command c. 2.2 Detecting Consensus Round((n,r),c) has two components: o b s e r v e r Round ( ( n , r ) , c ) = O u t p u t (\ l o c . v o t e ’ b c a s t r e p s ( ( ( n , r ) , c ) , l o c ) ) || Once ( Quorum ( n , r ) ) The first component multicasts a vote for (n,c) in round (n,r) to all locations in reps and then terminates. The second executes the consensus-detecting process, Quorum(n,r), and terminates once it has either announced a consensus or called for a new round. Once(A) is an observer that acts like A but terminates after the first A-event. Because there is at most one Quorum(n,r) event at any location the use of Once is logically redundant; but effects an optimization that guarantees that a process is cleaned up once it has produced an output. Quorum(n,r) produces an output as soon as it has received votes in round (n,r) from 2 ∗F+ 1 distinct locations. If all of them are votes for the same proposal, call it (n,d), it decides that (n,d) has achieved consensus and sends appropriate `̀ decided`̀ messages (which will be handled by Notify observers which will send `̀ notify`̀ messages). If the received votes are not unanimous then it is possible that, however many more votes are tallied, no proposal will receive 2 ∗F+ 1 votes on this round. (Note that if F failures have occurred, no more votes will arrive, so Quorum cannot wait for more votes or it might become permanently stuck.) In that case it sends a `̀ retry`̀ message to call for round (n,r + 1). That `̀ retry`̀ message also tells the Voter that spawned the Quorum how to vote in the new round. If some command d received a majority of the 2 ∗F+ 1 votes, the Voter must vote for (n,d). (If no command gets a majority, how it votes does not matter to the logical correctness of the protocol.) It is possible that a round will occur in which a Quorum(n,i) at one location detects a consen- sus and a Quorum(n, j) at another location calls for a new round of voting. As a result, multiple notifications may be sent about n, in a single round or in different rounds. Sec. 3 shows that, for any n, all notifications about the nth command will agree on which command has been chosen. 2.3 Implementing Quorum Quorum(n,r) is a Mealy machine: in response to inputs it may change state and produce outputs. Let us factor its definition. We first define QuorumState(n,r), a Moore machine whose state is the collection of votes for round (n,r) that the process has received thus far. Proc. AVoCS 2015 8 / 15 ECEASST Quorum(n,r) observes QuorumState(n,r) and issues appropriate directed messages. EventML provides primitives such as Memory for defining Moore machines: o b s e r v e r QuorumState ( n , r ) = Memory(\ l o c . ( [ ] , [ ] ) , upd quorum ( n , r ) , v o t e ’ b a s e ) A QuorumState(n,r) state is a pair of lists (cmds,locs), where cmds is a list of commands and locs is a list of locations. The state ([c1; c2;...],[l1; l2;...]) means that, in round (n,r), the state machine has thus far received a vote from l1 for c1, a vote from l2 for c2, etc. By maintaining that location list in addition to the command list, QuorumState can ignore duplicates; thus, as mentioned above, we need not assume that messages are delivered only once. In the definition of QuorumState, the arguments to Memory have the following meanings: (a) The expression (\loc.([],[])) assigns the initial state to each location, i.e., a pair of empty lists. (b) The transition function upd_quorum(n,r) computes the next state from the location and value of the input event and the current state. If an input vote arrives for c from l, and l is not listed in the current state, then upd_quorum adds c and l to its state, otherwise the current state stays unchanged. (c) vote’base recognizes input `̀ vote`̀ events and supplies their values. Memory is defined so that QuorumState will recognize every `̀ vote`̀ event, update its internal state, and then return (a singleton bag containing) the value of the internal state before performing that update. Had it been more convenient that QuorumState return the value of the internal state after the update we would have used the primitive combinator State instead of Memory. We define the observer Quorum from QuorumState using the primitive composition combi- nator (f o (X1,...,Xn)), which combines the function f with the event observers X1, . . . , Xn. This combinator behaves as follows: for all i ∈ {1,...,n}, if Xi observes xi at event e then the event observer (f o (X1,...,Xn)) observes each value of the bag (f loc(e) x1 ... xn) at event e. Quorum is defined as follows: o b s e r v e r Quorum ( n , r ) = ( when quorum ( n , r ) ) o ( v o t e ’ b a s e , QuorumState ( n , r ) ) This computes the response of Quorum(n,r) to event e by applying when_quorum(n,r) to loc(e), and to the values observed at e by vote’base and QuorumState(n,r). Note that Quorum(n,r) observes only votes, but not all of them since when_quorum(n,r) sometimes returns an empty bag. If an input vote arrives for c from l, and l is listed in the current state, then when_quorum does not output anything. Otherwise, it calls roundout, which requires the most complex definition: l e t r o u n d o u t l o c ( ( ( n , r ) , c ) , sender ) ( cmds , l o c s ) = i f l e n g t h cmds = 2 ∗ F t h e n l e t ( k , c ’ ) = poss−maj cmdeq ( c . cmds ) c i n i f k = 2 ∗ F + 1 t h e n d e c i d e d ’ b c a s t r e p s ( n , c ’ ) e l s e { r e t r y ’ s e n d l o c ( ( n , r + 1 ) , c ’ ) } e l s e {} The first argument loc is the location of the Quorum process calling roundout on receipt of a vote; the second argument (((n,r),c),sender) matches the data from the input vote; and the third argument (cmds,locs) matches the state when the input arrives. Therefore c.cmds, where . is the cons operation on lists, is the command list that results from processing the input. We can now understand the outer conditional: If its condition is false then we have not seen 2 ∗ F + 1 votes; so Quorum returns an empty bag, and the input event is not a Quorum(n,r)- event. Suppose now that the condition is true and consider the inner conditional. The poss-maj function, imported from EventML’s library (a snapshot of Nuprl’s library), implements the Boyer-Moore majority vote algorithm. The pair (k,c’) satisfies the following 9 / 15 Volume 72 (2015) Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML property: If there is a majority entry in the list c.cmds, c’ is its value and k is the number of times c’ occurs in that list. The condition (k = 2 * F + 1) therefore tests whether the vote is unanimous. If so, the function returns instructions that the choice of c’ be broadcast in appro- priate `̀ decided`̀ messages; if not, it returns the instruction to send a `̀ retry`̀ message. Recall that the declaration of `̀ retry`̀ messages introduces the operation retry’send, for construct- ing directed messages. Therefore, retry’send loc ((n,r+1),c’) is the instruction to send to loc a `̀ retry`̀ message with body ((n,r+1),c’). So Quorum sends a message to its own location, which will be observed by NewRounds, which will spawn the round (n,r+1). The message data directs the spawned instance of Round to vote for c’ in the new round. 3 The Safety Properties of 2/3 Consensus From SC, our EventML specification of the 2/3 consensus protocol, we generate a LoE specifi- cation and a GPM program that express SC’s semantic meaning in our two models of distributed computing. We verify SC’s correctness using the LoE specification, and we execute it using the GPM program. This section describes the formal verification, in the Nuprl proof assistant, of this protocol using the LoE specification, and Sec. 4 below describes the process of generating the GPM program and automatically verifying that it implements the LoE specification. 3.1 Agreement and Validity The basic safety properties of any consensus protocol are agreement and validity. Both these properties have been formally proved by induction on the causal order of events in Nuprl for the 2/3 consensus protocol of Sec. 2. We state them in terms of notifications. Recall that system properties are predicates on event orderings; we must prove that the predicates are true of all pos- sible runs of the system consistent with the SC specification5. Agreement says that notifications never contradict one another: ∀e1,e2 : E. ∀l1,l2 : Loc. ∀n : Z. ∀c1,c2 : Cmd. (notify ′send l1 (n,c1)) ∈ SC(e1) ∧ (notify ′send l2 (n,c2)) ∈ SC(e2) ⇒ c1 = c2 Validity says that any proposal decided on must be one that was proposed: ∀e : E. ∀l : Loc. ∀v : Proposal. (notify′send l v) ∈ SC(e) ⇒ ↓∃e′ : E. e′ < e ∧ v ∈ propose′base(e′) One subtlety: The reader can think of ↓∃ as a classical existential. The ↓ type operator, called “squash”, enforces proof erasure, which is necessary here because, generally, there is no con- structive way to pinpoint the exact `̀ propose`̀ event that led to a notification being sent. For example, there might have been two such proposals sent, and once we receive them, we have no way to distinguish between them if the content of these messages is identical. 3.2 Assumptions For every distributed system we assume that every internal or output message received must have been sent by one of the agents of the system. Formally, we make a separate assumption for each base observer that observes an internal or an output message. For example, if v ∈ vote’base(e), and e occurs at location loc, there must exist some e′ < e such that (vote’send loc v) ∈ SC(e′). Our tool does not enforce that the generated code respects such assumptions, therefore, they have to be enforced by other means. For example, the above assumption can be enforced, e.g., by physical means or by message encryption. We also assume that reps is a bag of size 3 ∗F+ 1 without repetitions. 5 The formal statements of these properties contain a universally quantified event ordering eo that the notation suppresses. Proc. AVoCS 2015 10 / 15 ECEASST Figure 2 ILF instance for `̀ vote`̀ messages ∀[Cmd:{T:Type| valueall-type(T)}]. ∀[clients,reps:bag(Id)]. ∀[cmdeq:EqDecider(Cmd)]. ∀[F:Z]. ∀[f:headers_type{i:l}(Cmd)]. ∀[es:EO]. ∀[e:E]. ∀[i,sender:Id]. ∀[d,n,r:Z]. ∀[v:Cmd]. (, c>, sender>)> ∈ main(Cmd;clients;cmdeq;F;reps;f)(e) ⇐⇒ loc(e) ↓∈ reps ∧ i ↓∈ reps ∧ (d = 0) ∧ (↓∃n’:Z. ∃c’:Cmd. ∃e’:{e’:E| e’ ≤loc e }. ((((header(e’) = ‘‘propose‘‘) ∧ = body(e’)) ∨ (has-es-info-type(es;e’;f;Z × Z × Cmd × Id) ∧ (header(e’) = ‘‘vote‘‘) ∧ (n’ = (fst(fst(fst(msgval(e’)))))) ∧ (c’ = (snd(fst(msgval(e’))))))) ∧ (((fst(ReplicaStateFun(Cmd;f;es;e’))) < n’) ∨ (n’ ∈ snd(ReplicaStateFun(Cmd;f;es;e’)))) ∧ (no Notify(Cmd;clients;f) n’ between e’ and e) ∧ (((<<, c>, sender> = <<, c’>, loc(e)>) ∧ (e = e’)) ∨ (∃r’:Z. ∃c’’:Cmd. ((<<, c>, sender> = <<, c’’>, loc(e)>) ∧ (∃e1:{e1:E| e1 ≤loc e } ((((header(e1) = ‘‘retry‘‘) ∧ <, c’’> = body(e1)) ∨ (has-es-info-type(es.e’;e1;f;Z × Z × Cmd × Id) ∧ (header(e1) = ‘‘vote‘‘) ∧ (n’ = (fst(fst(fst(msgval(e1)))))) ∧ (r’ = (snd(fst(fst(msgval(e1)))))) ∧ (c’’ = (snd(fst(msgval(e1))))))) ∧ (NewRoundsStateFun(Cmd;f;n’;es.e’;e1) < r’) ∧ (e = e1))))))))) 1 2 3 4 5 6 7 8 3.3 Automation We have developed two automation tools that help us prove properties of distributed systems. One is a rewriting tool that consists in using the ILFs mentioned in Sec. 1 in order to prove properties by induction on causal order. The other one consists in the automation of standard patterns of reasoning on state machines. Inductive Logical Form. ILFs are declarative logical statements that precisely answer questions such as: “What led the process at location l1 to send a vote to the process at location l2?”, in terms of input messages’ content and state machines’ states. ILFs are automatically generated from main observers using logical simplifications, and characterizations of the LoE combinators. For example, one of the simplest but subtle such characterizations is the one for “_||_”: v ∈ X||Y(e) ⇐⇒ ↓(v ∈ X(e) ∨ v ∈ Y(e)). This says that v is produced by X||Y iff it is produced by either of its components. The ↓, which enforces proof erasure, is needed because just by knowing that X||Y produced v, we cannot in general know whether v was produced by X or Y . For example, if identical replicas run in parallel, and receive the same inputs, then there is no way to distinguish between their outputs if they do not label them with different tags. Given a main observer X, we wrote a program that starts with a formula of the form v ∈ X(e), and keeps on rewriting it using equivalences such as the one presented above, to finally generate a formula of the form v ∈ X(e) ⇐⇒ C, where C is a complete declarative characterization of X’s outputs. In addition, our program also applies various logical simplifications to C. Finally, we have built a proof tactic that automatically proves such double implications. An ILF provides a characterization of all the messages sent by a system. Because it is of- ten useful to get these characterizations for specific kinds of messages, we also generate ILF instances for all the kinds of messages that the system outputs. Intuitively, an ILF instance for a given kind of message K, provides a slice (expressed as a mathematical formula) of the LoE specification that corresponds to the output K only. Such slices allow us to analyze specifications without having to consider the entire code. Fig. 2 shows the ILF instance for `̀ vote`̀ messages as generated by Nuprl. The details of this formula are not critical for understanding our methodology. However, let us explain how it characterizes the sending of `̀ vote`̀ messages. This formula says that a vote of the form 11 / 15 Volume 72 (2015) Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML <,c>,sender> (< , > is Nuprl’s pair constructor) is sent by SC at event e to location i (see box 1) iff: (box 2): e happens at a replica location, which we call R; (box 3): i is also a replica location; (box 4): there exists a proposal that was received by R in a `̀ propose`̀ or `̀ vote`̀ message at a prior event e’; (box 5): is such that n’ has never been received by R prior to e’ (there is no important distinction between ReplicaStateFun and ReplicaState, which maintains the list of proposed slot numbers); (box 6): is such that no decision has been made about n’ between e’ and e; finally, (box 7): either is and is being voted for at the initial round r=0 in response to the `̀ propose`̀ or `̀ vote`̀ message mentioned above (see box 4) that led to a new Voting process being spawned; (box 8): or comes from a `̀ retry`̀ or `̀ vote`̀ message, and r is not the initial round, i.e., either some replica believed that consensus could not have been reached at round r-1 (in case of a `̀ retry`̀ ), or R was still working on a smaller round number when it received r (in case of a `̀ vote`̀ ), and is now voting at round r. Using such formulas we can trace back a distributed system’s outputs to the states of its state machines, and to its inputs. For example, to prove SC’s validity property we start from the characterization of `̀ notify`̀ messages and trace these messages back to `̀ proposals`̀ using the various ILF instances. State Machine Properties. As mentioned in Sec. 2.3, one can define Moore machines in EventML using the Memory and State keywords. Reasoning about such state machines often turns out to be a large part of the verification effort of a distributed program’s correctness. There- fore, our system provides some automation to prove four kinds of local properties of Memory and State state machines, called: invariant, ordering, progress, and memory. Informally, a state machine invariant is a unary property about all possible states of the state machine. A state machine ordering property is a binary property about all pairs of states ordered in time. A state machine progress property w.r.t. some predicate P is a binary property about all pairs of states ordered in time, such that P is true about at least one of the transitions made between the two states, i.e., such that some progress characterized by P has been made between the two states. A state machine memory property is a ternary property between an input, the current state of the machine at the time it received this input, and a later state. Memory properties are used to specify that state machines keep track of some parts of their inputs in their states. We have proved in Nuprl, by induction on causal order, that Memory and State state machines satisfy each of these properties if, among other things, they satisfy some transition property regarding consecutive states (in the case of invariants, a base case is also necessary). Therefore, to prove that a state machine satisfies an instance of one of these four properties, we simply have to instantiate the corresponding general lemma and prove the simpler transition property. We have developed an annotation language to state such properties in EventML, as well as general Nuprl tactics that try to prove these properties automatically (and often succeed) using logical simplifications and simple reasoners on lists, integers, etc. We illustrate invariants using QuorumState (the other properties are described in a longer version of this paper [28]): an in- variant of a QuorumState state of the form (cmds,locs) is that locs has no repeats and same length as cmds. We call that invariant quorum_inv, which we state in EventML as follows: i m p o r t n o r e p e a t s l e n g t h i n v a r i a n t q u o r u m i n v on ( cmds , l o c s ) i n ( QuorumState n i ) == n o r e p e a t s : : Loc l o c s /\ l e n g t h ( cmds ) = l e n g t h ( l o c s ) ; ; The Nuprl tactic we have designed tries to automatically prove this statement by unfolding Proc. AVoCS 2015 12 / 15 ECEASST QuorumState’s definition to a Memory observer and by instantiating the corresponding general lemma. It (mainly) remains to prove that the base and induction properties are satisfied, which are trivial to prove in this case. Because we have already proved the general principle by induction on causal order, the tactic does not have to use induction on causal order to prove quorum_inv. 3.4 Proof Effort Thanks to our automation tools and to the rich library of definitions, facts, and proof tactics about LoE and GPM that we have developed over the years, we have specified 2/3 consensus and have proved its two safety properties in Nuprl in merely two days. Proving these two properties involved: automatically generating and proving 8 state machine properties; automatically gener- ating and proving 1 ILF and 4 instances of that ILF; and interactively proving 8 other lemmas (3 of them being trivial, and therefore candidates for future automation). In terms of size: the LoE specification has a size of about 850 AST nodes, and the proof’s size is about 8200 AST nodes. 4 Correct-by-Construction Program Generation As mentioned in Sec. 1, the semantic meaning of an EventML program is both a LoE event observer and a GPM program. We carry out our correctness proofs on the LoE description of the main event observer. To gain trust in the program we run, we prove that the GPM program implements that LoE description, i.e., that it outputs exactly the same observations. Given an EventML specification, proving that the corresponding GPM program satisfies the corresponding LoE specification is trivial: For each EventML combinator C, there exists a corresponding LoE combinator LC and a corresponding GPM combinator PC which provably implements LC. For example, let X1 and X2 be event observers of type T , implementable by pr1 and pr2, respectively. The LoE parallel combinator X1||X2 is defined as λ eo.λ e.(X1 eo e) + (X2 eo e), where + is the append operation on bags. The GPM parallel combinator pr1||pr2 is defined as follows (for simplicity we use the same symbol as for the LoE combinators): λ l.fix     λ R.λ s.let p1,p2 = s in if halted(p1)∧b halted(p2) then halt else run ( λ m.let p′1,out1 = p1(m) in let p′2,out2 = p2(m) in (R (p ′ 1,p ′ 2),out1 + out2) )     (pr1 l,pr2 l) This function takes a location l and returns a process that runs p1 and p2 in parallel at l. This process maintains a state s composed of two processes: its two components. Its initial state is (pr1 l,pr2 l). If the current state s of the process is a pair (p1,p2), then if both p1 and p2 have halted, i.e., they are the special halted process halt, then the process becomes halt. Otherwise, the process waits for an input message m, and once it has received one, then (1) for i ∈ {1,2}, it applies6 pi to m to obtain a new process p ′ i and a bag of output values outi; (2) it outputs out1 + out2 and recursively calls itself on the new state (p ′ 1,p ′ 2). We proved that pr1||pr2 imple- ments X1||X2. The same is true about the other combinators. 5 Related Work Much work has been done on specifying and reasoning about distributed systems [25, 18, 8, 11, 12, 10, 20, 32] (to only cite a few). 6 The application of a process p to a message m is defined as follows: if halted(p) then return (halt,{}), otherwise p is of the form run( f ), and therefore, return (f m). 13 / 15 Volume 72 (2015) Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML IOA. IOA [17, 16, 18] is a programming/specification language for describing asynchronous distributed systems as I/O automata (labeled state transition systems) and stating their properties. IOA can interact with a large range of tools such as type checkers, simulators, model checkers, theorem provers, and there is support for synthesis of Java code. Both I/O automata and event observers can specify I/O observations of distributed systems. While IOA is state-based, LoE is event-based (states are implicitly maintained by recursive combinators). Also, our methodology allows us to both prove protocol properties and generate code within Nuprl, and does not require any translation to another language. TLA. TLA is a temporal logic, based on first-order logic and set theory, that “provides a math- ematical foundation for describing systems” [24]. TLA+ [24, 12] is a language for specifying systems described in TLA. TLAPS “is a platform for the development and mechanical verifi- cation of TLA+ proofs” [12]. To validate proofs, TLAPS uses a collection of theorem provers, proof assistants, SMT solvers, and decision procedures. One can use a model checker to help catch errors before attempting any proof. At its current stage, TLAPS allows one to prove safety properties (the safety property of a variant of Paxos has been verified using TLAPS) but not liveness/non-blocking properties (we have not yet proved such properties either). TLA+ does not perform program synthesis. seL4. Our approach is similar to the one taken by Klein et al. to verify the seL4 microkernel [21]. They use Haskell as their specification language, which roughly corresponds to the level of abstraction of EventML in our framework. Then, they translate this code to an Isabelle/HOL version. They prove that this executable specification refines an abstract one, which corresponds to LoE’s level. Finally, they generate by hand a C implementation of the specification, which they translate into Isabelle/HOL, in which they defined a model of C, and manually prove that this implementation refines their executable specification. This corresponds to GPM’s level. Among other things, our paper shows that a similar methodology can be used to design and implement correct fault-tolerant distributed systems. Verdi. More recently, Wilcox et al. developed Verdi [32], which is a framework, similar to ours, to develop and reason about distributed systems using Coq. As in our framework they do not have gaps between the code they verify and the code they run: they run OCaml code that they extract from Coq. Verdi provides a compositional way of specifying distributed systems. This is done by applying verified system transformers. For example, Paxos transforms a distributed system into a crash-tolerant distributed system (they verified Raft [26] instead of Paxos). One difference between our respective methods is that they verify systems by reasoning about the evolution of the state of the world, while our approach relies on the notion of causal order. 6 Conclusion, Current and Future Work Our methodology scales to more complicated and subtle distributed protocols: we have specified the Multi-Paxos protocol [23, 31] in EventML and proved its safety properties to be correct in Nuprl. We have also built an ordered broadcast service that can switch between various consen- sus protocols [29]. To get efficient code, we have built in Nuprl a formal tool tuned to automat- ically optimize GPM programs and prove that the optimized and non-optimized programs are bisimilar [27]. We are also experimenting with compilers to Lisp and Scala. We are now build- ing support in EventML and Nuprl to: (1) abstract away from implementation details such as specific data structures, (2) automatically prove simple properties such as validity properties, (3) Proc. AVoCS 2015 14 / 15 ECEASST replay large proofs in order to support modifications to specifications. This paper only discussed safety properties. We have started proving progress and non-blocking properties of 2/3 consen- sus (similar proofs have been carried out in [11, 10]). However, it turns out that these proofs are tedious. Next, we want to build automation tools to assist us in proving such properties. Bibliography [1] Francesco Alberti, Silvio Ghilardi, Elena Pagani, Silvio Ranise, and Gian Paolo Rossi. Automated support for the design and validation of fault tolerant parameterized systems: a case study. ECEASST, 35, 2010. [2] Stuart F. Allen, Mark Bickford, Robert L. Constable, Richard Eaton, Christoph Kreitz, Lori Lorigo, and Evan Moran. Innovations in compu- tational type theory using Nuprl. J. Applied Logic, 4(4):428–469, 2006. http://www.nuprl.org/. [3] Abhishek Anand and Ross Knepper. ROSCoq: Robots powered by constructive reals. ITP 2015 (http://www.cs.cornell.edu/∼aa755/ROSCoq/), 2015. [4] Yves Bertot and Pierre Casteran. Interactive Theorem Proving and Program Development. SpringerVerlag, 2004. http://www.labri.fr/perso/casteran/CoqArt. [5] Mark Bickford. Component specification using event classes. In Component-Based Software Engineering, 12th Int’l Symp., volume 5582 of LNCS, pages 140–155. Springer, 2009. [6] Mark Bickford, Robert Constable, and David Guaspari. Generating event logics with higher-order processes as realizers. Technical report, Cornell University, 2010. [7] Mark Bickford, Robert L. Constable, and Vincent Rahli. Logic of events, a framework to reason about distributed systems. In Languages for Distributed Algorithms Workshop, 2012. [8] Jeremy W. Bryans. Developing a consensus algorithm using stepwise refinement. In ICFEM 2011, volume 6991 of LNCS, pages 553–568. Springer, 2011. [9] B. Charron-Bost and A. Schiper. The Heard-Of model: computing in distributed systems with benign failures. Distributed Computing, 22(1):49–71, 2009. [10] Bernadette Charron-Bost, Henri Debrat, and Stephan Merz. Formal verification of consensus algorithms tolerating malicious faults. In SSS 2011, volume 6976 of LNCS, pages 120–134. Springer, 2011. [11] Bernadette Charron-Bost and Stephan Merz. Formal verification of a consensus algorithm in the heard-of model. Int. J. Software and Informatics, 3(2-3):273–303, 2009. [12] Kaustuv Chaudhuri, Damien Doligez, Leslie Lamport, and Stephan Merz. Verifying Safety Properties With the TLA+ Proof System. In Jürgen Giesl and Reiner Haehnle, editors, IJCAR 2010, volume 6173 of Lecture Notes in Artificial Intelligence, pages 142–148. Springer, 2010. [13] R.L. Constable, S.F. Allen, H.M. Bromley, W.R. Cleaveland, J.F. Cremer, R.W. Harper, D.J. Howe, T.B. Knoblock, N.P. Mendler, P.Panangaden, J.T. Sasaki, and S.F. Smith. Implementing mathematics with the Nuprl proof development system. Prentice-Hall, Inc., Up- per Saddle River, NJ, USA, 1986. [14] The Coq Proof Assistant. http://coq.inria.fr/. [15] Michael J. Fischer, Nancy A. Lynch, and Mike Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374– 382, 1985. [16] S. Garland, N. Lynch, J. Tauber, and M. Vaziri. IOA user guide and reference manual. Technical Report MIT/LCS/TR-961, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 2004. [17] Stephen J. Garland and Nancy Lynch. Using I/O automata for developing distributed systems. In Gary T. Leavens and Murali Sitaraman, editors, Foundations of componentbased systems, pages 285–312. Cambridge University Press, New York, NY, USA, 2000. [18] Chryssis Georgiou, Nancy Lynch, Panayiotis Mavrommatis, and Joshua A. Tauber. Automated implementation of complex distributed algo- rithms specified in the IOA language. Int. J. Softw. Tools Technol. Transf., 11:153–171, February 2009. [19] Michael J. C. Gordon, Robin Milner, and Christopher P. Wadsworth. Edinburgh LCF: A Mechanised Logic of Computation., volume 78 of LNCS. Springer-Verlag, 1979. [20] Annu John, Igor Konnov, Ulrich Schmid, Helmut Veith, and Josef Widder. Parameterized model checking of fault-tolerant distributed algo- rithms by abstraction. In FMCAD 2013, pages 201–209. IEEE, 2013. [21] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal verification of an OS kernel. In SOSP 2009, pages 207–220. ACM, 2009. [22] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978. [23] Leslie Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16(2):133–169, 1998. [24] Leslie Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, 2004. [25] Patrick Lincoln and John M. Rushby. A formally verified algorithm for interactive consistency under a hybrid fault model. In FTCS 1993, pages 402–411. IEEE Computer Society, 1993. [26] Diego Ongaro and John K. Ousterhout. In search of an understandable consensus algorithm. In USENIX ATC ’14, pages 305–319. USENIX Association, 2014. [27] Vincent Rahli, Mark Bickford, and Abhishek Anand. Formal program optimization in Nuprl using computational equivalence and partial types. In ITP’13, volume 7998 of LNCS, pages 261–278. Springer, 2013. [28] Vincent Rahli, David Guaspari, Mark Bickford, and Robert L. Constable. Formal specification, verification, and implementation of fault- tolerant systems. http://nuprl.org/KB/show.php?ID=709, 2013. [29] Nicolas Schiper, Vincent Rahli, Robbert van Renesse, Mark Bickford, and Robert L. Constable. Developing correctly replicated databases using formal tools. In DSN 2014, pages 395–406. IEEE, 2014. [30] Tatsuhiro Tsuchiya and André Schiper. Using bounded model checking to verify consensus algorithms. In DISC 2008, volume 5218 of LNCS, pages 466–480. Springer, 2008. [31] Robbert van Renesse and Deniz Altinbuken. Paxos made moderately complex. ACM Comput. Surv., 47(3):5:1–5:36, 2015. [32] James R. Wilcox, Doug Woos, Pavel Panchekha, Zachary Tatlock, Xi Wang, Michael D. Ernst, and Thomas Anderson. Verdi: A framework for formally verifying distributed system implementations. PLDI 2015 http://verdi.uwplse.org/, 2015. 15 / 15 Volume 72 (2015) http://www.nuprl.org/ http://www.cs.cornell.edu/~aa755/ROSCoq/ http://www.labri.fr/perso/casteran/CoqArt http://coq.inria.fr/ http://nuprl.org/KB/show.php?ID=709 http://verdi.uwplse.org/ Introduction A Specification of 2/3 Consensus A Top-Down Look at the Protocol Detecting Consensus Implementing Quorum The Safety Properties of 2/3 Consensus Agreement and Validity Assumptions Automation Proof Effort Correct-by-Construction Program Generation Related Work Conclusion, Current and Future Work