Electronic Communications of the EASST Volume ? (2009) Proceedings of the Ninth International Workshop on Automated Verification of Critical Systems (AVOCS 2009) Automatic translation of C/C++ parallel code into synchronous formalism using an SSA intermediate form Loı̈c Besnard, Thierry Gautier, Matthieu Moy, Jean-Pierre Talpin, Kenneth Johnson and Florence Maraninchi 15 pages Guest Editors: Markus Roggenbach Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 ECEASST Automatic translation of C/C++ parallel code into synchronous formalism using an SSA intermediate form Loı̈c Besnard1, Thierry Gautier1, Matthieu Moy2, Jean-Pierre Talpin1, Kenneth Johnson1 and Florence Maraninchi2 1INRIA Centre Rennes-Bretagne Atlantique/CNRS IRISA. Campus de Beaulieu. 35042 Rennes Cedex, FRANCE. e-mail: FirstName.LastName@irisa.fr 2Verimag. Centre Équation. 2, avenue de Vignate. 38610 Gières, FRANCE. e-mail: FirstName.LastName@imag.fr Abstract: We present an approach for the translation of imperative code (like C, C++) into the synchronous formalism SIGNAL, in order to use a model-checker to verify properties on the source code. The translation uses SSA as an intermediate formalism, and the GCC compiler as a front-end. The contributions of this paper with respect to previous work are a more efficient translation scheme, and the man- agement of parallel code. It is applied successfully on simple SYSTEMC examples. Keywords: SSA, synchronous, Signal, C, compilation, GCC, model-checking 1 Introduction 1.1 Context and motivations Nowadays, embedded systems are becoming more and more complex, with stronger and stronger constraints of many kinds: cost, reliability, short life-cycle, and so on. Design correctness of software and hardware functionalities of embedded systems is one of the major challenges and priorities for designers using software programming languages such as SYSTEMC and C/C++ to describe their systems. These programming languages allow for a comfortable design entry, fast simulation, and software/hardware co-design. Moreover, as the complexity of systems increases, designers are bound to reuse existing Intellectual Property components (IPs) in their design to improve the design productivity. However, system validation is a critical challenge for design reuse based on software programming languages. In recent years, many automated simulators and test tools have been developed to deal with design verification problems. However, mere simulation with non-formal development tools does by no means cover all design errors. What we therefore need is to use formal methods to ensure the quality of system designs. Among formal methods, model-checking [CGP99] has proved successful at increasing the reliability of some systems. On the other hand, synchronous languages [BB91, BCE+03] have been introduced and used successfully for the design and implementation of real-time critical software. They rely on math- ematical models such as data-flow equations or finite state machines that enable formal reasoning 1 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism on designs. As a matter of fact, their associated toolsets provide among other formal transforma- tions, automatic code generation, and verification of properties. Relying on these bases, we propose an approach in which we automatically translate C/C++ models into the synchronous formalism SIGNAL [LGLL91, LTL03], hence enabling the appli- cation of formal methods without having to deal with the complex and error prone task to build formal models by hand. In particular, this allows one to use the SIGNAL toolbox, which includes in particular a code generator and a model-checker. We base our approach and associated tool on previous studies for the translation of imperative languages to the synchronous data-flow language SIGNAL [TLSG05, KTBB06]. This transla- tion relies on the use of SSA (“Static Single Assignment”) as intermediate representation of programs. Until now however, those previous works had not been fully implemented nor exper- imented. Only manual experiments had been completed. Moreover we extend the existing approach in two ways: first, the new translation scheme is more efficient, since it generates as few state variables as possible, thus reducing the work of the model-checker, and secondly, the tool now manages parallel, non-preemptive code. As an example of such parallel language, we study the particular case of SYSTEMC, a library for C++ for the high-level modeling of Systems-on-a-Chip, which provides among other things a non-preemptive scheduler for C++ processes. The translation specified here was implemented in the POLYCHRONY toolset [INRa]. Model- checking was successfully applied with the tool SIGALI. An approach that could be compared to ours, although different (uses neither SSA nor syn- chronous formalisms) is that presented in [HFG08]. The authors define a semantic mapping from SYSTEMC to UPPAAL timed automata in order to get model-checking for SYSTEMC designs. It can be observed that our approach, thanks to the different strategies of code generation avail- able for SIGNAL programs [BGT09], also provide simulation code corresponding to the parallel SYSTEMC description, including e.g. static scheduling code. In this paper, we give in Section 1.2 an overview of the synchronous data-flow language SIG- NAL and in Section 1.3 a brief description of SSA, which is the basis for our translation. The principles of the translation from SSA to SIGNAL and its implementation are described in Sec- tion 2. Then Section 3 addresses the addition of co-routine parallelism in the programs, using a SYSTEMC flavour. Experimental results are provided for the model-checking of properties on some use case. Some concluding remarks are given in Section 4. 1.2 An overview of SIGNAL In SIGNAL, a process P consists of the composition of simultaneous equations x := f (y, z) over signals x, y, z. A signal x ∈ X is a possibly infinite flow of values v ∈ V sampled at a discrete clock noted ˆx. P, Q ::= x := y f z | P where x | P|Q (SIGNAL process) The synchronous composition of processes P|Q consists of the simultaneous solution of the equations in P and in Q. It is commutative and associative. The process P where x restricts the signal x to the lexical scope of P. Proc. AVOCS 2009 2 / 15 ECEASST An equation x := y f z denotes a relation between the input signals y and z and an output signal x by a combinator f . An equation is usually a ternary and infixed relation noted x := y f z but it can in general be an m + n-ary relation noted (x1, . . . xm) := f (y1, . . . yn). Such equations are built with usual boolean or arithmetic operations such as or, and, not, =, <, +, ∗, . . . In addition, SIGNAL requires primitive combinators to perform delay x := y $1 init v, sam- pling x := y when z, merge x = y default z and specify scheduling constraints x → y when ˆz. The equation x := y f z where f is or, =, +, . . . defines the nth value of the signal x by the result of the application of f to the nth values of signals y, z. All signals x, y, z are synchronous (have the same clock): they share the same set of tags t1,t2, . . . (a tag represents a clock tick). The equation x := y $1 init v initially defines the signal x by the value v and then by the previous value of the signal y. The signal y and its delayed copy x are synchronous: they share the same set of tags t1,t2, . . . Initially, at t1, the signal x takes the declared value v and then, at tag tn, the value of y at tag tn−1. The equation x := y when z defines x by y when z is true (and both y and z are present); x is present with the value v1 at t1 only if y is present with v1 at t1 and if z is present and true at t1. The equation x := y default z defines x by y when y is present and by z otherwise. If y is absent and z present with v1 at t1 then x holds v1 at t1. If y is present with v2 (at t2 or t3) then x holds its value v2 whether z is present (at t2) or not (at t3). The equation x → y when ˆz forces the constraint that y cannot occur before x when z is present. In SIGNAL, the presence of a value along a signal x is the proposition noted ˆx that is true when x is present and that is absent otherwise. The clock expression ˆx can be defined by the boolean operation x = x (i.e. y := ˆx =defy := (x = x)). Specific operators are defined on clocks. For instance, yˆ+z is the union of the clocks of signals y, z (x := yˆ+z =defx := ˆy default ˆz). Clock expressions naturally represent control, the clock when x represents the time tags at which the boolean signal x is present and true (i.e. y := when x =defy := true when x). The clock when not x represents the time tags at which the boolean signal x is present and false. We write ˆ0 for the empty clock (the empty set of tags). A clock constraint E is a SIGNAL process. The constraint eˆ= e′ synchronizes the clocks e and e′. It corresponds to the process (x := (ˆe = ˆe′)) where x. Composition E |E′, written also (|E |E′ |), corresponds to the union of constraints, and restriction E where x to the existential quantification of E by x. A useful derived operator is the memory x := y cell z init v, that allows to memorize in x the latest value carried by y when y is present or when z is true (x := y cell z init v =def(|x := y default (x $1 init v)|xˆ= yˆ+(when z)|)). 1.3 SSA: an intermediate representation A program is said to be in static single assignment (SSA) form whenever each variable in the pro- gram appears only once on the left hand side of an assignment. Following [CFR+91], a program is converted to SSA form by replacing assignments of a program variable x with assignments to new versions x1, x2, . . . of x, uniquely indexing each assignment. Each use of the original vari- able x in a program block is replaced by the indexed variable xi when the block is reachable by the ith assignment. For variables in blocks reachable by more than one program block, the φ 3 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism operator is used to choose the new variable value depending on the program control-flow. For example, x3 = φ (x1, x2) means “x3 takes the value x1 when the flow comes from the block where x1 is defined, and x2 otherwise”. This is needed to represent C programs where a variable can be assigned in both branches of a conditional statement or in the body of a loop. In this paper, we consider the SSA intermediate representation of the GCC compiler (other modern compilers usually have a similar intermediate format). The compiler provides a language independent, locally optimized intermediate representation for C, C++, and Java programs where programming units such as functions and methods are transformed into a structure in which all native operations are represented by 3-address instructions x = f (y, z). A C program pgm is represented by a sequence of labeled blocks L:blk, where each block is a sequence of statements. Statements may be function calls x = f (y∗) or branches ifxgotoL. Each block is terminated by either a return rtn or gotoL statement. We summarise this representation in Figure 1. (program) pgm ::= L:blk|pgm; pgm (instruction) stm ::= x = f (y∗) (call) (instruction) stm ::= x = φ (y∗) (phi) | ifxgotoL (test) (block) blk ::= stm; blk|rtn (return) rtn ::= gotoL (goto) | return (return) Figure 1: Intermediate representation for C programs 2 Translating sequential code from SSA to SIGNAL 2.1 SSA to SIGNAL: an example We depict the structure of the SSA form of a typical C program, Figure 2. The function ones counts the number of bits set to 1 in a bit-array data. It consists of four blocks (labeled bb 0, L0, L1, L2). The block bb 0 initializes the local state variable idata to the value of the input signal data and icount to 0. Then it passes control to the block L1. Label L1 evaluates the termination condition of the loop and passes control accordingly. As there are several possible sources in the control flow for the variables idata and icount, it determines the most recent value with the help of φ functions. If the termination condition is not yet satisfied, control goes to block L0, which corresponds to the actual loop contents that shifts idata right and adds its right-most bit to icount. If the termination condition is satisfied (all available bits have been counted) control goes to block L2 where the result is copied to the output signal ocount. Figure 3 depicts the translation of function ones into SIGNAL. Signals data and ocount are respectively input signal (line 2) and output signal (line 3) of the corresponding SIGNAL process. Lines 4–8 define the labels as boolean signals being true when control flow is in the corresponding block. For instance, bb 0 is true at the first instant, then it is false forever; L1 is true when either L0 or bb 0 is (control can go to L1 from L0 or bb 0). Note that there is no need to introduce a delay when control passes to L1. This is not the case for L0, for which there is a transition from L1 when the termination condition of the loop is not satisfied ((idata 3/=0) when L1): in that case, control will be in L0 at the next step. Proc. AVOCS 2009 4 / 15 ECEASST void ones(int data, int *ocount) { int icount, idata; idata = data; icount = 0; while (idata) { icount += idata & 1; idata >>= 1; } *ocount = icount; } bb_0: idata_1 = data; icount_1 = 0; goto L1; L0: D = idata_3 & 1; icount_2 = D + icount_3; idata_2 = idata_3 >> 1; L1: idata_3 = PHI<idata_1, idata_2>; icount_3 = PHI<icount_1, icount_2>; if (idata_3 != 0) goto L0; L2: *ocount = icount_3; return; Figure 2: From C to static single assignment Lines 14–15, 17–19 and 24 represent respectively the computations that are done in blocks bb 0, L0 and L2: this is expressed by the sampling (when) on the corresponding boolean. Note that state variables are necessary to memorize the values of idata 3 and icount 3 (lines 10– 11). Line 18, the operands of the plus operator have to be memorized at some common clock ( pk 1, line 27) since the arguments of the plus must be synchronous. The φ functions of block L1 are simply defined with merge (default) operators in SIGNAL (lines 21–22). 2.2 SSA to SIGNAL: translation scheme A general scheme for the translation is described in Figure 4 with a function I [[pgm]], defined by induction on the formal syntax of pgm. The overall idea is to translate one (or several) SSA basic blocks into a parallel assignment in SIGNAL. The sequence of instructions in SSA is then considered within a SIGNAL clock tick. In the presence of loops (i.e. backward gotos), we must represent the successive values taken by variables at different laps of the loop with different clock ticks. The control flow is modeled with the notion of clock: a parallel assignment in SIGNAL has its clock activated when the corresponding piece of SSA code would be executed. The present value of a signal is noted x, its next value is noted x′. With each block of label L ∈ L f in a given function f , the function I [[pgm]] associates an input clock xL, an immediate clock ximmL and an output clock x exit L (note that all these clocks are not necessarily generated in the effective translation). The clock xL is true iff L has been activated in the previous transition (by emitting the event x′L). The clock x imm L is set to true to activate the block L immediately. The clock xexitL is set to true when the execution of the block labeled L terminates. The default activation condition of this block is the clock xL ∨ ximmL (union of clocks xL and ximmL : equation (1) of Figure 4). The block blk is executed iff the proposition xL ∨ ximmL holds, meaning that the program counter is at L. For a return instruction or for a block, the function returns a SIGNAL process P. For a block instruction stm, the function I [[stm]]e1L = 〈P〉e2 takes three arguments: an instruction stm, the label L of the block it belongs to, and an input clock e1. It returns the process P corresponding to the instruction and its output clock e2. The output clock of stm corresponds to the input clock of the instruction that immediately follows it in the execution sequence of the block. Rules (1-2) in Figure 4 are concerned with the iterative decomposition of a program pgm into 5 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism 1 process ones = 2 ( ? integer data; 3 ! integer ocount; ) 4 (| (| bb_0 := (not (ˆbb_0)) $1 init true 5 | next_L0 := ((idata_3/=0) when L1) default false 6 | L0 := next_L0 $1 init false 7 | L1 := (true when L0) default (true when bb_0) 8 | L2 := (not (idata_3/=0)) when L1 9 |) 10 | (| Z_idata_3 := idata_3 $1 11 | Z_icount_3 := icount_3 $1 12 |) 13 | (| data_1 := data cell (ˆbb_0) |) 14 | (| idata_1 := data_1 when bb_0 15 | icount_1 := 0 when bb_0 16 |) 17 | (| D := bit_and(Z_idata_3, 1) when L0 18 | icount_2 := ((D cell _pK_1)+(Z_icount_3 cell _pK_1)) when L0 19 | idata_2 := bit_right_shift(Z_idata_3, 1) when L0 20 |) 21 | (| idata_3 := idata_2 default (idata_1 default Z_idata_3) 22 | icount_3 := icount_2 default (icount_1 default Z_icount_3) 23 |) 24 | ocount_1 := icount_3 when L2 25 | (| when bb_0 ˆ= data 26 | bb_0 ˆ= L0 ˆ= idata_3 ˆ= icount_3 ˆ= data_1 27 | _pK_1 := Z_icount_3 ˆ+ D 28 |) 29 | (| ocount := (ocount_1 cell L2) when L2 |) 30 |) 31 where ... end; Figure 3: From SSA to SIGNAL blocks blk and with the decomposition of a block into stm and rtn instructions. In rule (2), the input clock e of the block stm; blk is passed to stm. The output clock e1 of stm becomes the input clock of blk. The input and output clocks of an instruction may differ. This is the case, rule (3), for an ifxgotoL1 instruction in a block L. Let e be the input clock of the instruction. When x is false, then control is passed to the rest of the block, at the output clock e∧¬x (intersection of clocks e and ¬x). Otherwise, the control is passed to the block L1, at the clock e∧x. There are two ways of passing the control from L to L1 at a given clock e. They are defined by the function GL(L1, e): either immediately, by activating the immediate clock ximmL1 , i.e., e ⇒ ximmL1 (the notation e ⇒ P means: if e is present then P holds); or by a delayed transition to L1 at e, i.e., e ⇒ x′L1 . This choice depends on whether L1 is after L in the control flow, i.e. whether the block L1 can be executed immediately after the block L. Rule (4) is concerned with the translation of native and external function calls x = f (y∗). The generic translation of f is taken from an environment E ( f ). It is given the name of the result x, of the actual parameters y∗ and of the input clock e to obtain the translation of x = f (y∗). This Proc. AVOCS 2009 6 / 15 ECEASST (1) I [[L:blk; pgm]] =I [[blk]] xL∨ximmL L |I [[pgm]] (2) I [[stm; blk]]eL=let〈P〉e1 = I [[stm]]eL in P|I [[blk]]e1L (3) I [[ifxgotoL1]]eL=〈GL(L1, e∧x)〉e∧¬x (4) I [[x = f (y∗)]]eL=〈E ( f )(xy∗e)〉e (5) I [[gotoL1]]eL=(e ⇒ xexitL |GL(L1, e)) (6) I [[return]]eL=(e ⇒ (xexitL |xexitf )) where GL(L1, e)= if L1 is after L in the control-flow then e ⇒ ximmL1 else e ⇒ x′L1 E ( f )(xyze)=e ⇒ (x̂|x = [[ f ]](y, z)), ∀ f xyze Figure 4: Translation scheme translation works when there is only one call to f at the same time. Recursive calls of f would require an explicit stack, and parallel calls would require duplicating the generated code for f for each thread. The generic translation of 3-address instructions x = f (y, z) at clock e is given by E ( f )(xyze). Instructions goto and return, rules (5-6), define the output clock xexitL of the current block L by their input clock e. This is the right place to do that: e defines the very condition upon which the block actually reaches its return statement. A gotoL1 instruction, rule (5), passes control to block L1 unconditionally at the input clock e by GL(L1, e). A return instruction, rule (6), sets the exit clock x f to true at clock e to inform the caller that f is terminated. 2.3 C/C++ to SIGNAL: implementation SIGNAL models are automatically generated from C/C++ component descriptions with the help of the GNU Compiler Collection (GCC) [Fre] and its static single assignment (SSA) intermedi- ate representation [GCC03, The]. This is obtained in three main stages, as described below: 1. Converting C/C++ into SSA: The first step of the translation scheme consists in con- verting C/C++ models into the SSA form. This step is performed by GCC, which goes through several intermediate formats (Gimple Trees, then Control-Flow Graph (CFG)), and then produces the SSA form which is used by GCC for optimizations. 2. Converting SSA into SIGNAL: The next step of the translation scheme consists in con- verting SSA into SIGNAL processes. It is implemented in the GCC front-end. The output of this step is a SIGNAL program which reflects directly the SSA code in a SIGNAL syn- tax (but without a correct semantics at this point). The implementation of the SIGNAL generation is inserted in the GCC source tree as an additional front-end optimization pass. GCC currently features over fifty optimization passes. It can be chosen to use all of these by inserting this additional pass at the very end, but it may also make sense to exclude some of the optimizations. The resulting syntactic SIGNAL program is another view of the SSA code without any transformation (so the connexion to some other C compiler with an 7 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism SSA internal representation would be easily possible). This code is composed of a set of labeled blocks, composed of a set of φ definitions, a set of computations, and a branching. 3. Transforming the SIGNAL program: The next step consists in the definition of (i) the control induced by the references to the labels in the branching statements; (ii) the mem- ories induced by the loops and the control. The control is given first to the first block (through the signal bb_0 in the function ones example). 3 Modeling parallelism in the SSA to SIGNAL line The previous section described a translation of sequential, imperative code, into SIGNAL. We now present a way to extend this translation scheme to parallel code. We consider the case of co-routine semantics (i.e. non-preemptive scheduling with a single processor). SYSTEMC is an example of an execution platform with co-routine semantics. It is built on top of the C++ language, augmented with a scheduler, and communication and synchronization primitives. We implemented a translation from a small subset of SYSTEMC which has basically two elements with respect to parallelism: pieces of code to be executed “atomically”, and a yield() instruction, that stops the execution of a thread, and yields the control back to the scheduler. The scheduler then elects any thread, non-deterministically. The official SYSTEMC library does not have a yield() instruction, but this instruction can be implemented either with a slight modification of the scheduler as proposed in [Hel07], or more simply by a wait(random(), SC NS);. The motivation for choosing yield() instead of the usual wait() statements of SYSTEMC is to start with the simplest scheduling, to keep the focus on the notion of parallelism. We will show later that implementing an arbitrary scheduling policy on top of this is possible. In this subset of SYSTEMC, we do not have any specific communication and synchronization primitives, but processes can communicate using shared variables. 3.1 Presentation of SYSTEMC The core of the SYSTEMC syntax relevant to the present study is represented in Figure 5. A system consists of the composition of classes and modules sys. A class declaration class m{dec} associates a class name m with a sequence of fields dec. It is optionally parameterized by a class with template〈class m1〉. To enforce a strong typing policy, we annotate the class parameter m1 with #TYPE(m1, m2) to denote the type of m1 with the virtual class m2. A module SC MODULE(m) is a class that defines an architecture component. Its constructor SC CTOR(m) {new; pgm} allocates threads (e.g. SC THREAD( f )) and executes an initialization program pgm. Modules define threads whose execution is concurrent. Declarations dec associate locations x with native classes or template class instances m〈m∗〉, and procedures with a name f and a definition pgm. pgm can be any C++ code. We assume x to denote the name of a variable or signal and to be possibly prefixed as m :: x by the name of the class it belongs to. A simple example in given in Figure 6. It defines two n-bits counters in parallel. The macro DECLARE COUNTER declares n boolean state-variables bi, the function step performs one step (applying next(bi) = bi xor ci−1 and ci = bi and ci−1, ci being the carry), and the macro Proc. AVOCS 2009 8 / 15 ECEASST sys ::= [ template〈class m1〉#TYPE(m1, m2) ] class m{dec} (class) | SC MODULE(m) {dec; SC CTOR(m) {new}}; (module) | sys sys (sequence) dec ::= m〈m∗〉x (field) | void f () {pgm}; (thread) | dec dec (sequence) new ::= SC THREAD( f ); sensitive ¿ x∗ | new; pgm (constructor) Figure 5: Abstract syntax for SYSTEMC BEGINNING OF PROCESS declares the local variables ci. Each counter comes with two addi- tional variables ... started and ... finished, maintained up-to-date by step(), that are true respectively after the counter did its first step, and once each bit of the counter is true. SC_MODULE(module) { DECLARE_COUNTER(count1_); DECLARE_COUNTER(count2_); void compute1() { BEGINNING_OF_PROCESS; while(!(count2_started)) { yield(); } while(!(count1_finished)) { step(count1_); yield(); } ASSERT(count2_finished); } void compute2() { BEGINNING_OF_PROCESS; while(!(count2_finished)) { step(count2_); // yield(); } } SC_CTOR(module) { INIT_COUNTER(count1_); INIT_COUNTER(count2_); SC_THREAD(compute1); SC_THREAD(compute2); } }; Figure 6: Parallel counters 3.2 Principle of the translation for SYSTEMC code In SYSTEMC, the bodies of processes are described using plain C++ code (plus function calls to the SYSTEMC API, i.e. yield in our example). As a consequence, the translation from C/C++ to SIGNAL can be reused for that purpose with a few adjustments, detailed in the following. The general principle is to represent each thread by a SIGNAL process, and the scheduler may be also represented as another SIGNAL process. These processes communicate through added control signals corresponding to communication events. The clocks of these signals have to be precisely defined as SIGNAL expressions. 9 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism 3.2.1 Isolation of system calls First it can be noticed that GCC considers SYSTEMC macros (including synchronization prim- itives (“system calls”) like yield as plain C++. As opposed to that, our approach requires a special handling of these macros in the SIGNAL code. Thus they have first to be visible in the SSA code generated for the SYSTEMC threads. To this end, they have to be viewed by GCC as external function calls. This is the case, for instance, for the instruction yield used in the program of Figure 6: it is passed as such in the SSA code. However, if system calls are processed by GCC as usual external calls, they are not distin- guished from other instructions in the SSA code and they may appear among other computations in SSA labeled blocks. A first requirement for being able to process system calls specifically in SIGNAL is thus to isolate them in specific blocks in SSA. This is an easy transformation that consists in breaking up the blocks containing system calls, while respecting the control flow. In the SSA to SIGNAL transformation, it is implemented as the very first step of the transforma- tions applied on the syntactic SIGNAL code (see Section 2.3, step 3). In the resulting SIGNAL translation, the label of the block containing a system call will be viewed as the activation clock of the call of the primitive. Then, suppose that l0, . . . , ln are the labels corresponding to the different calls of a given primitive (say yield, for instance) in a given thread, then the following signal: yield := (when l0) default . . . default (when ln) represents the clock at which control has to be returned to the scheduler, from a yield primitive, for the considered thread. Also, it is necessary to take into account that the block that follows a system call cannot be run in the same logical instant than this system call (the OS has to take the control). Thus, the signal representing the label of this block has to be delayed by one instant and additional memories may be required for some variables. 3.2.2 Addition of control signals In the C to SIGNAL translation, input and output signals of the SIGNAL process resulting from the translation of a C procedure correspond to the parameters of the procedure. When translating a thread in a multi-thread context, a few input or output control signals have to be added, in order to communicate with the system. These signals are the following: input signal running: This signal is defined by the system. It specifies the clock at which the process is actually running (the processor is attributed to this process). Remind that in the SIGNAL code obtained from a SSA form, each operation is sampled, either directly or recursively, by the clock corresponding to a given label (for instance, ocount 1 := icount 3 when L2). In the process corresponding to a thread, each label is addition- ally sampled by the signal running (for instance, L2 := (not (idata 3/=0)) when L1 when running). output signal end processing: This signal is defined by the clock corresponding to the fi- nal block of the SSA (for the example of Figure 3, it would be: end processing := when L2). This is the way for processes to inform the scheduler that a yielding instruc- tion was reached, letting the scheduler decide which process to wake up after. output signals corresponding to system calls in the thread: for example, a signal yield , as defined above, is produced, corresponding to the clock of the calls of the yield prim- Proc. AVOCS 2009 10 / 15 ECEASST itive in the thread. Other signals correspond to wait or signal primitives, for instance. If the primitives are not used in the process, their clock is the empty clock. These sig- nals complement end processing in that end processing says whether a yielding instruction was reached, while other signals like yield tell which one. These signals are added automatically in the translation when the application is a multi- threaded one. Note that another input signal, start, can be added if restart is possible. It is then used to replace the definition of the initial label of the process: bb 0 := (start default false) when running. 3.2.3 Shared variables Care has to be taken for variables shared by different threads. First, when a variable is not de- clared in the procedure where it is used, GCC does not produce real SSA for these variables: there is no creation of distinct instances for them, no φ function for merging their different def- initions. They are considered as “virtual SSA”. For these variables, the mechanism of partial definitions provided in SIGNAL is used. Let x be such a variable and suppose one of its def- initions is x = E in a SSA block labeled li. The corresponding SIGNAL definition will be: x ::= E when li (the use of ::= means that there are possibly other definitions for x). The shared variables are necessarily state variables in SIGNAL, for which their previous value is kept, when they are not redefined. 3.2.4 Inclusion in a simulator In order to validate the translation scheme described above, a mock-up scheduler is described in SIGNAL. This scheduler contains a non-preemptive scheduler that attributes non-deterministically the processor to one process when several processes are ready to execute. This corresponds to the SYSTEMC scheduler with the yield() instruction described above. In SIGNAL, if conflict represents the clock at which there is a potential conflict for the attribution of the processor (several processes are ready), the running process is chosen by: pid := (any when conflict) default ..., where any is an input signal that can take any value. The scheduler manages the status of each one of the processes pi corresponding to the threads of the application. A SIGNAL process SET STATUS is associated with each pi, with state vari- ables representing the current and next status of pi (ready, running). The scheduler receives and uses the control signals that are produced in the processes pi. For instance, the clock of the signal yield produced in some process pk defines instants at which the next status of pk, whose current status is running, will be ready (so that the scheduler will have to choose, non-deterministically, a new running process). Thus, in return, the scheduler defines the control signals running provided to each one of the pi’s. For a given pk, the corresponding signal running represents the clock at which pk has the running status. It is worth noticing that the control of a given application is represented very differently in SIGNAL than it would be in some usual imperative parallel language. There is no explicit pro- gram counter, no imperative description of suspend or resume actions. The control is fully de- scribed by the clocks of the signals of the application. The SIGNAL equations defining some process pi define its behavior as invariant equations. We explained that all operations in pi are 11 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism conditioned by a given input signal running. Periods of time in which pi is not running (or is otherwise suspended) correspond to the instants at which the signal running is absent. So that suspend/resume, for instance, is automatically handled through the clock of the signal running. 3.2.5 Possible extensions The scheduler described above is very simple. With the signals running, and end processing, it can model a non-preemptive scheduler. By adding more signals between the scheduler and the processes, and a more complex state-machine in the scheduler, one can relatively easily model a more complex scheduler, like the complete scheduler of SYSTEMC. Indeed, a similar approach was followed in the tool LusSy [MMM06] and could easily be adapted, since LusSy also uses a synchronous formalism to model the scheduler. The scheduler used in LusSy omits a few details of the actual scheduler specifications, but a more complete version is described in [TKVS08], and even this version is still only a dozen states, and could be modeled with a few tens of lines of code in SIGNAL. The main difference with the translation implemented in LusSy is that the latter does not use SSA as an intermediate form, and is indeed less efficient for the translation of plain C++ code (a more detailed comparison of the tools is in progress and will not be detailed here). 3.3 Experiments The example described in Section 3.1 (Figure 6) is used for basic experiments. The program is automatically translated into SIGNAL via SSA following the general scheme described above. Then simulation code (in C) is generated from the SIGNAL program with the POLYCHRONY toolset [INRa]. Traces have been added in the SIGNAL program to be able to follow the simula- tion. The results are those expected, whatever is the choice of the scheduler. Besides these first results, a main objective of the experiments is to demonstrate the possibility of formal validation of models using this methodology. We use again the same example (parallel counters) to prove formal properties using model-checking. The first counter waits that the second one has started counting to count also. At the end, it checks that the second counter has finished (property ASSERT(count2 finished)). Indeed, from the point of view of the first counter, when the second one has started, it has also terminated, so that the property is true. A variant of the program (parallel counters with variant) is when a yield() is introduced in the body of the loop of the second counter. In that case, it is possible to start the second counter without finishing it, and then the first counter can run till the end, so that the property is false. The SIGNAL compiler included in the POLYCHRONY toolset allows for checking static prop- erties such as contradictory clock constraints, cycles, null clocks, exclusive clocks. . . In order to check dynamic properties, the SIGNAL companion model-checker SIGALI [INRb, MR] may be used. It is an interactive tool specialized on algebraic reasoning in Z/3Z logic. SIGALI trans- forms SIGNAL programs into sets of dynamic polynomial equations that basically describe an automaton. Then it can analyze this automaton and prove properties such as liveness, reachabil- ity, and deadlock. The fact that it is reasoning only on a Z/3Z logic constrains the conditions to the boolean data type (true, false, absent). This is practical in the sense that true numerical verifi- cation very soon would result in state spaces that are no longer manageable, however it requires, Proc. AVOCS 2009 12 / 15 ECEASST depending on the nature of the underlying model, major or minor modifications prior to formal verification. For many properties, numerical values are not needed at all and can be abstracted away thus speeding up verification. When verification of numerical manipulations is sought, an abstraction to boolean values can be performed (like replacing any condition depending on inte- gers with non-deterministic Boolean), that suffices in most cases to satisfy the needs. Note that the translation to the SIGALI format is done after the so-called clock calculus completed by the SIGNAL compiler. This clock synthesis allows to reduce significantly the number of constraints. Unfortunately, SIGALI does not provide counter-examples for the system when the proof fails. For the parallel counters example and its variant, all data types are boolean (and the mock-up scheduler has been encoded also using only boolean types). The results of the verification of properties using SIGNAL and SIGALI are those expected. Performances (time and size of the system) for obtaining the results are provided in Figure 7 for 2-bits and 8-bits versions of the counters (they are obtained using a reordering of variables in SIGALI). size time program state var. states reach. states transitions 2-bits parallel counters (property true) 24 224 36 116 0.15 s 2-bits parallel counters with variant (prop. false) 25 225 107 359 0.27 s 8-bits parallel counters (property true) 36 236 1.296 3.896 66 s 8-bits parallel counters with variant (prop. false) 37 237 328.715 1.117.223 124 s Figure 7: Performances for proving properties with SIGALI 3.4 Discussion on performances One common mis-conception about SSA is that since multiple assignments to the same variable are translated into assignments to multiple intermediate variables, the explosion of the number of variables introduces a huge overhead. This paper shows that the explosion of the number of variables is indeed not a problem: most variables are encoded into temporary variables in SIGNAL, and will be dealt with very efficiently by a model-checker (no additional nodes in BDDs). Our experiments show that the number of state variables does not explode. On the other hand, one real advantage of this approach is that it creates very few transitions in the generated code. In the absence of loops, a portion of code between two yielding in- structions is indeed encoded in one clock tick in the synchronous world. As opposed to this, a naive approach translating each instruction with an explicit control-point would generate huge automata, with a lot of control-points. The encoding of this automaton would introduce either a lot of Boolean state variables (with a one-hot encoding) or state variables with a large number of possible values. Our SSA-based translation avoids this overhead. 13 / 15 Volume ? (2009) Translation of C/C++ parallel code into synchronous formalism 4 Conclusion We described the principle and implementation of a translation of imperative code into the syn- chronous data-flow language SIGNAL. Using SSA as an intermediate format, the translation is natural. Since the SSA form we are using (the one of GCC) is very simple, the implementation does not have to deal with all the particular cases of the source language (C++), and can focus on optimizations and performances. We also showed that the extension to a simple model of parallelism was possible and relatively straightforward, and showed the way to encode a more complex scheduling policy. The main limitation of the approach with respect to parallelism is that although the co-routine semantics (non-preemptive scheduling) is encoded as a natural extension of the sequential trans- lation, a preemptive scheduling, or even real parallelism, would be much harder to model faith- fully with this principle. Indeed, the main point of our approach is to encode a sequence of assignments, in one or several basic blocks, into a single parallel assignment, within one clock tick. This turns the sequential piece of code into large atomic actions, which were not atomic in the original code. In other words, applying the translation naively would reduce the set of possible interleaving, thus reducing the set of possible behaviors, and missing bugs in the proof. Applying the translation to real parallel code would therefore require identifying which por- tions of code can be considered atomic, and where to introduce preemption points. Each pre- emption point could then be encoded in the same way as the yield() instruction. Actually, identifying which section can be considered atomic, which instructions can permute and move from a section to another, is an active research domain (see for example [QRR04]). Another limitation of the current implementation is that we currently manage only a small subset of SYSTEMC. Modeling the scheduling algorithm itself would probably not be the most difficult part. One bigger difficulty is to take the architecture of the platform into account. For ex- ample, when one process writes wait(event); and the other writes event.notify();, the translation obviously has to take into account the fact that event is the same object in both cases. Another example is when one does a port.write(value); and the other a another port.read(): in this case, the translation has to depend on whether port and another port are bound together or not. Extracting such information from SYSTEMC code requires a SYSTEMC front-end, and not just a C++ one. Many such front-ends exist, like Pinapa [MMM05] used by LusSy, but none of them use SSA as an intermediate representation. Unfortunately, re-using an existing compiler to get both an SSA intermediate form and the architecture of the platform, linked together, is not easy [BGM+08]. The approach followed by Pinapa does reuse an existing compiler, but relies partly on the fact that the intermediate format is a high-level abstract syntax tree. We are working on a new version of Pinapa that would use an SSA-based compiler, but this requires a substantial rework of the approach, and a complete rewrite of the code itself. Bibliography [BB91] A. Benveniste, G. Berry. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE 79(9):1270–1282, Sep. 1991. Proc. AVOCS 2009 14 / 15 ECEASST [BCE+03] A. Benveniste, P. Caspi, S. A. Edwards, N. Halbwachs, P. Le Guernic, R. de Simone. The synchronous languages 12 years later. In Proceedings of The IEEE. Pp. 64–83. 2003. [BGM+08] L. Besnard, T. Gautier, F. Maraninchi, M. Moy, J.-P. Talpin. Comparative study of ap- proaches to semantics extraction and virtual prototyping of system-level models. Tech- nical report, Verimag, Grenoble INP, France; IRISA, INRIA Rennes, France, 2008. http://www-verimag.imag.fr/∼moy/fotovp/rapport-fotovp.pdf. [BGT09] L. Besnard, T. Gautier, J.-P. Talpin. Code generation strategies in the Polychrony environment. Research report RR-6894, INRIA, 2009. http://hal.inria.fr/inria-00372412/en/ [CFR+91] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, F. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (TOPLAS) 13(4):451–490, 1991. [CGP99] E. Clarke, O. Grumberg, D. Peled. Model checking. Springer, 1999. [Fre] Free Software Foundation. The GNU Compiler Collection. http://gcc.gnu.org. [GCC03] Proceedings of the 2003 GCC Developers Summit. Ottawa, Ontario Canada, 2003. [Hel07] C. Helmstetter. Validation de modèles de systèmes sur puce en présence d’ordonnancements indéterministes et de temps imprécis. PhD thesis, INPG, Grenoble, France, March 2007. http://www-verimag.imag.fr/∼helmstet/these-fr.html [HFG08] P. Herber, J. Fellmuth, S. Glesner. Model checking SystemC designs using timed automata. In CODES/ISSS ’08: Proceedings of the 6th IEEE/ACM/IFIP international conference on Hard- ware/Software codesign and system synthesis. Pp. 131–136. ACM, New York, NY, USA, 2008. [INRa] INRIA Espresso Team. Polychrony tool. http://www.irisa.fr/espresso/Polychrony. [INRb] INRIA Vertecs/Espresso Teams. Sigali tool. http://www.irisa.fr/vertecs/Softwares/sigali.html. [KTBB06] H. Kalla, J.-P. Talpin, D. Berner, L. Besnard. Automated translation of C/C++ models into a synchronous formalism. In Engineering of Computer Based Systems, 2006. ECBS 2006. 13th Annual IEEE Interna- tional Symposium and Workshop on. Pp. 426–436. March 2006. [LGLL91] P. Le Guernic, T. Gautier, M. Le Borgne, C. Le Maire. Programming Real-Time Applications with SIG- NAL. Proceedings of the IEEE 79(9):1321–1336, Sep. 1991. [LTL03] P. Le Guernic, J.-P. Talpin, J.-C. Le Lann. Polychrony for System Design. Journal for Circuits, Systems and Computers 12(3):261–304, April 2003. [MMM05] M. Moy, F. Maraninchi, L. Maillet-Contoz. Pinapa: An Extraction Tool for SystemC descriptions of Systems-on-a-Chip. In EMSOFT. Pp. 317 – 324. September 2005. [MMM06] M. Moy, F. Maraninchi, L. Maillet-Contoz. LusSy: an open Tool for the Analysis of Systems-on-a-Chip at the Transaction Level. Design Automation for Embedded Systems, 2006. special issue on SystemC- based systems. http://www-verimag.imag.fr/∼moy/publications/springer.pdf [MR] H. Marchand, E. Rutten. Sigali User Manual. http://www.irisa.fr/espresso/Polychrony. [QRR04] S. Qadeer, S. K. Rajamani, J. Rehof. Summarizing procedures in concurrent programs. In POPL ’04: Proceedings of the 31st symposium on Principles of programming languages. Pp. 245–255. ACM, New York, NY, USA, 2004. [The] The Tree SSA project. Tree-SSA. http://gcc.gnu.org/projects/tree-ssa. [TKVS08] D. Tabakov, G. Kamhi, M. Vardi, E. Singerman. A Temporal Language for SystemC. In Formal Methods in Computer-Aided Design, 2008. FMCAD ’08. Pp. 1–9. 2008. [TLSG05] J.-P. Talpin, P. Le Guernic, S. K. Shukla, R. Gupta. A compositional behavioral modeling framework for embedded system design and conformance checking. International Journal of Parallel Programming 33(6):613–643, 2005. 15 / 15 Volume ? (2009)