Parallel Evaluation of Interaction Nets: Case Studies and Experiments

Electronic Communications of the EASST, Volume 73 (2016)
Graph Computation Models: Selected Revised Papers from GCM 2015
Ian Mackie and Shinya Sato
20 pages

Guest Editors: Detlef Plump
Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer
ECEASST Home Page: http://www.easst.org/eceasst/
ISSN 1863-2122

Ian Mackie(1) and Shinya Sato(2)

(1) LIX, CNRS UMR 7161, École Polytechnique, 91128 Palaiseau Cedex, France
(2) University Education Center, Ibaraki University, 2-1-1 Bunkyo, Mito-shi, Ibaraki 310-8512, Japan

Abstract: Interaction nets are a particular kind of graph rewriting system with many properties that make them useful for capturing sharing and parallelism. There have been a number of research efforts towards implementing interaction nets in parallel, and these have focused on the implementation technologies. In this paper we investigate a related question: when is an interaction net system suitable for parallel evaluation? We observe that some nets cannot benefit from parallelism (they are sequential) and some have the potential to be evaluated in a highly parallel way. This first investigation aims to highlight a number of issues by presenting experimental evidence from a number of case studies. We hope this can help pave the way to a wider use of this technology for parallel evaluation.

Keywords: Interaction nets, parallel evaluation, case studies

1 Introduction

Interaction nets are a model of computation based on a restricted form of graph rewriting: each rewrite rule has exactly two nodes on its left-hand side, rules are local (they change no part of the graph other than those two nodes), and there is at most one rule for each pair of nodes. These constraints have no impact on the expressive power of interaction nets (they are Turing complete), but they offer a very useful feature: the system is confluent by construction. The confluence property, taken together with the locality constraint, means that interaction nets lend themselves to parallel evaluation: all possible rewrites can be performed in parallel.

There have been a number of studies on the parallel evaluation of interaction nets. Examples include Pinto [Pin00], Jiresch [Jir14], and more recently Kahl [Kah15]. In all these works, the parallel implementation is the focus. Pinto's work focuses on building abstract machines for interaction nets that bring out the parallelism. Jiresch studies compiling nets for evaluation on graphics processing units (GPUs), and finally, the work of Kahl uses a parallel implementation of Haskell to build a simple interpreter on the underlying Haskell parallel framework. All these studies provide evidence that interaction nets can benefit from parallel implementation: examples are shown where speedup is possible.

In this paper, we look at a completely different aspect of this issue. We want to study which kinds of programs are suitable for, and can thus take advantage of, parallel evaluation: when is a net able to exploit parallelism, and when is it not (i.e., when is it sequential)? We do this through a number of case studies, and back up the work with experimental evidence using our own parallel evaluator developed to support this study.
The specific question that we propose in this paper is: when is a particular interaction net system well suited for parallel evaluation? More precisely, are some interaction nets "more parallel" than others? A question that naturally follows is: can we transform a net so that it is better suited for parallel evaluation? Once we have understood this, we can also ask the reverse question: can a net be made sequential?

The purpose of this paper is to start investigating these questions, and we begin with an empirical study of interaction systems to identify when they are suitable for parallel evaluation. We take a number of typical examples (some common ones from the literature together with some new ones developed for this paper) to see if they can benefit from parallel evaluation. In addition, we make some observations about how programs can be transformed so that parallelism is more useful. Using these examples, we give some heuristics for getting more parallelism out of an interaction net system. Our focus is restricted to comparing different interaction net systems. Other platforms and programming languages supporting parallel evaluation can be found in the literature (see for example [JS08, HS08]); we leave for a future study how useful interaction nets are in comparison to these alternative approaches.

The main contributions of the paper are:

1. Through examples and case studies we identify that some nets are sequential and some have potential for parallelism. We observe that it is also possible to transform a net to make it more parallel (and therefore, by the reverse transformation, more sequential).

2. We define an unbounded notion of parallel evaluation (all possible reductions are done in one step). We build an evaluator to measure this, and we test it on a number of examples.

3. Finally, we use our own parallel evaluator, which not only gives a practical perspective on the case studies but also allows comparison with existing parallel implementations.

This paper is the start of a research effort to better understand the parallel evaluation of interaction nets, and to put them to practical use.

Structure. In the next section we recall the definition of interaction nets and describe the notion of parallel evaluation that we are interested in. Through examples we motivate the ideas behind this work. In Section 3 we give a few small case studies to show how parallelism can have a significant impact on the evaluation of a net. In Section 4 we give some experimental results and discuss them. Finally, we conclude in Section 5.

2 Background and Motivation

In the graphical rewriting system of interaction nets [Laf90], we have a set Σ of symbols, which are the names of the nodes in our diagrams. Each symbol has an arity ar that determines the number of auxiliary ports that the node has. If ar(α) = n for α ∈ Σ, then α has n + 1 ports: n auxiliary ports and a distinguished one called the principal port.

[Figure: an agent α, drawn with its principal port and auxiliary ports x1, ..., xn]

Nodes are drawn variably as circles, triangles or squares. A net built on Σ is an undirected graph with nodes at the vertices. The edges of the net connect nodes together at the ports such that there is only one edge at every port. A port which is not connected is called a free port. Two nodes (α, β) ∈ Σ × Σ connected via their principal ports form an active pair, which is the interaction net analogue of a redex. A rule ((α, β) ⟹ N) replaces the active pair (α, β) by the net N. All the free ports are preserved during reduction, and there is at most one rule for each pair of agents. The following diagram illustrates the idea, where N is any net built from Σ.

[Figure: an interaction rule: the active pair (α, β), with free ports x1, ..., xn and y1, ..., ym, rewrites to a net N with the same free ports]
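At this point one can already picture a textual representation of nets. The following Standard ML fragment is a minimal sketch of our own (it is not the paper's evaluator, and all names are illustrative): agents are terms whose principal port is the implicit edge to their parent, free ports are variables, and a connection between two principal ports, written here as a pair, is an active pair.

    datatype term = Agent of string * term list  (* symbol with auxiliary ports *)
                  | Var of string                (* a free port *)

    (* A pair of terms connected via their principal ports is an active
       pair exactly when both sides are agents. *)
    fun isActive (Agent _, Agent _) = true
      | isActive _                  = false

    (* Example: Z facing the addition agent, the active pair Z >< +
       from the first addition rule given below. *)
    val ex = (Agent ("Z", []), Agent ("add", [Var "y", Var "r"]))

This is essentially the shape of textual calculi for interaction nets, where a rule maps such a pair of agents to new connections among their auxiliary ports.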
The most powerful property of this system is that it is one-step confluent: the order of rewriting is not important, and all sequences of rewrites have the same length (in fact they are just permutations of each other). This has practical consequences: the diagrammatic transformations can be applied in any order, or even in parallel, to give the correct answer. It is this parallelism that we develop in this paper.

We define some notions of nets and evaluation. A net is called sequential if there is at most one active pair that can be reduced at each step. We say that a net is evaluated sequentially if one active pair is reduced at each step. For our notion of parallel evaluation, we require that all active pairs in a net are reduced simultaneously, and then any redexes that were created are evaluated at the next step. We do not bound the number of active pairs that can be reduced in parallel. We remark that the number of parallel steps will always be less than or equal to the number of sequential steps (for a sequential net, the number of steps is the same under sequential and parallel evaluation).

As an example, consider unary numbers with addition. We represent the following term rewriting system

    add(Z, y)    = y
    add(S(x), y) = add(x, S(y))

as a system of nets with agents Z, S and +, together with two rewrite rules:

[Figure: the rules Z >< +, which rewrites to a wire connecting the second argument to the result, and S >< +, which rewrites to + with the S moved onto the second argument]

In this example, nets correspond to terms in a very straightforward way. If we construct the net corresponding to the addition of two numbers, then we can observe that the addition of two numbers is sequential: at any time there is just one active pair, and reducing this active pair creates one more active pair, and so on. We call operations like this batch operations. In terms of cost, reducing the net corresponding to add(n, m) requires n + 1 interactions. If we consider the net corresponding to the term add(add(m, n), p), then the system is still sequential, and the cost is now 2m + n + 2.

Using associativity of addition, the situation changes significantly. The net corresponding to add(m, add(n, p)) has sequential cost m + 1 + n + 1 = m + n + 2, and parallel cost max(m + 1, n + 1). Thus, using the associativity property of numbers, we obtain a system that is significantly more efficient sequentially, and moreover able to benefit from parallel evaluation.

The example becomes even more interesting if we change the rules of the rewrite system to an alternative version of addition:

    add(Z, y)    = y
    add(S(x), y) = S(add(x, y))

The two interaction rules are now:

[Figure: the rules Z >< +, as before, and S >< +, which now emits an S on the result port and pushes + underneath it]

Unlike the previous system, the term add(add(m, n), p) already has scope for parallelism. We call operations like this stream operations. The sequential cost is still 2m + n + 2, but the parallel cost is m + n + 2. Again, if we use associativity then we can do even better, achieving sequential cost m + n + 2 and parallel cost max(m + 1, n + 1) for the term add(m, add(n, p)).

These examples illustrate that some nets are sequential; some nets can use properties of the system (in this case associativity of addition) to get better sequential and parallel behaviour; and some systems can have modified rules that are more efficient, and also more appropriate for exploiting parallelism.
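For reference, the two addition variants above transcribe directly into Standard ML (the language used again in Section 3); this is our own illustrative code, not part of the net encoding. The batch version produces no part of its result until its first argument is exhausted, while the streaming version emits a constructor at every step, which is what allows a consumer, such as an enclosing addition, to proceed in parallel.

    datatype nat = Z | S of nat

    (* Batch: the result is only available once the recursion finishes. *)
    fun addB Z     y = y
      | addB (S x) y = addB x (S y)

    (* Streaming: each step immediately emits an S that a consumer can use. *)
    fun addS Z     y = y
      | addS (S x) y = S (addS x y)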
The next section gives examples where there is scope for parallelism in nets.

3 Case studies

The previous arithmetic example demonstrates that some systems are more amenable to parallel evaluation than others. In this section we give some empirical case studies for a number of different systems to show that, when a suitable system can be found, parallel evaluation gives significantly better results than sequential evaluation.

Fibonacci. The Fibonacci function is a good example where many recursive calls generate a lot of possibilities for parallel evaluation. We build the interaction net system that corresponds to the term rewriting system:

    fib 0 = 1
    fib 1 = 1
    fib n = fib(n-1) + fib(n-2)

Using a direct encoding of this system together with the addition defined previously, we obtain an interaction system:

[Figure: interaction rules for Fib, Fib2 and Dup on unary numbers: Fib of Z and Fib2 of Z return S(Z); Fib of S becomes Fib2; Fib2 of S duplicates its argument with Dup and builds Fib(n-1) and Fib(n-2) joined by +; Dup copies S and Z agents]

The following is an example of rewriting:

[Figure: an example reduction of fib 3, unfolding to the net for fib 1 + fib 2]

With respect to the two versions of the addition operation introduced in Section 2, we call the former a batch operation, which returns the computational result only after processing all of the given data, and the latter a streaming operation, which processes one (or a small number of) elements of the given data at a time and returns partial results immediately. The graphs in Figure 1 show the number of interactions in each version, where we plot sequential steps against parallel steps to indicate the rate of growth of each one.

[Figure 1: Comparing batch and streaming operations: steps for fib n with the batch addition (sequential vs. parallel), with the streaming addition (sequential vs. parallel), and batch vs. streaming in parallel]

Both graphs demonstrate that the sequential computation is exponential, while the parallel one is quadratic. We remark that, in the parallel execution, the number of steps with the streaming operation is less than half the number with the batch operation, as illustrated by the third graph in the figure.

By allowing attributes as labels of agents, we can include integer numbers in agents. In addition, we can use conditional rewrite rules, preserving one-step confluence as long as the conditions on attributes are disjoint. In this case, the interaction net system representing the Fibonacci function is written as follows:

[Figure: rules with integer attributes: Fib(n) returns 1 when n = 0 or n = 1; otherwise it spawns Fib(n-1) and Fib(n-2) joined by Add, where Add captures its first argument n as Addn(n), and Addn(n) meeting m returns n + m]

There is very little difference between the loads of fib(n-1) and fib(n-2), and thus this system gives the following graph, demonstrating that the growth rate for parallel computation is linear, while the sequential rate is exponential:

[Figure: fib n with integers, sequential vs. parallel steps]
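The recursion these nets encode is the familiar one below. The step-count functions are a back-of-the-envelope sketch of our own that ignores the interactions spent on addition and duplication: sequentially the two recursive calls are paid for one after the other, while under the unbounded parallel evaluation only the deeper branch matters.

    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n - 1) + fib (n - 2)

    (* Idealised step counts, ignoring the cost of the additions. *)
    fun seqSteps n = if n < 2 then 1
                     else 1 + seqSteps (n - 1) + seqSteps (n - 2)
    fun parSteps n = if n < 2 then 1
                     else 1 + Int.max (parSteps (n - 1), parSteps (n - 2))

Here seqSteps grows exponentially while parSteps grows linearly, matching the trends reported for the integer version above.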
Ackermann. The Ackermann function is defined by three cases:

    ack 0 n = n+1
    ack m 0 = ack (m-1) 1
    ack m n = ack (m-1) (ack m (n-1))

We can build the interaction net system on the unary natural numbers that corresponds to this term rewriting system as follows:

[Figure: interaction rules for A and A2 on unary numbers: A meeting Z returns the successor of its argument; A meeting S becomes A2; A2 meeting Z becomes A applied via Pred to (m-1, 1); A2 meeting S builds two A agents with Pred and Dup, encoding ack (m-1) (ack m (n-1)); the agent Dup duplicates S and Z agents]

The following is an example of rewriting:

[Figure: an example reduction of ack 1 2, reaching the net for ack 0 (ack 1 1)]

When we use numbers as attributes, the system can be written as:

[Figure: rules with integer attributes: A(m) returns n + 1 via Addn(1) when m = 0, and becomes A2(m) when m ≠ 0; A2(m) meeting n becomes A(m-1) applied to 1 when n = 0, and A(m-1) applied to A(m)(n-1) when n ≠ 0]

Figure 2 shows the number of interactions in the cases of (a) unary natural numbers and (b) integer numbers, where we plot sequential steps against parallel steps to indicate the rate of growth of each one.

[Figure 2: Benchmarks of the execution of the Ackermann function in sequential and parallel: (a) unary natural numbers, (b) integers]

Unfortunately, in Figure 2 (b) there is no significant difference between the sequential and the parallel execution, so there is no prospect of improvement from parallel execution. This is because the Addn agent works as a batch operation: it waits for its argument to be fully computed. For instance, in the following computation of ack 2 1, after the last step the Addn(1) agent, which arises from ack 0 (ack 1 0), waits for the computational result of ack 1 0. However, the computation of A2 could already proceed, because the result of the Addn(1) is certain to be greater than 0.

[Figure: reduction of ack 2 1 with integer attributes, where Addn(1) blocks on the result of ack 1 0]

On the other hand, in the case of the computation on unary natural numbers, A2 interacts with the streaming result of ack 0 (ack 1 0):

[Figure: the corresponding reduction on unary numbers, where A2 consumes the S agent emitted by ack 0 (ack 1 0) without waiting for the full result]

Here, borrowing the S agent to denote numbers greater than 0, we change the rules, especially in the case of ack 0 n, by replacing Addn with S as follows:

[Figure: the streaming rules: when m = 0, A emits an S agent immediately; Sum(n) meeting m returns n + m, and Sum(n) meeting S becomes Sum(n+1)]

Thanks to the introduction of the S agent, A2 can be processed without waiting for the result of ack 1 0. This therefore gives a streaming operation:

[Figure: reduction of ack 2 1 in the streaming version, where A2 consumes the emitted S agent and Sum(0) collects the result]

In addition, the benchmark graphs show that the improved system is more efficient and more appropriate for exploiting parallelism:

[Figure: ack 3 n with integers, streaming version, sequential vs. parallel steps; and batch vs. streaming in parallel]

To summarise so far: a system can exploit parallelism by changing some batch operations into streaming ones. We leave for future work the criteria for determining when this transformation benefits from parallelism.
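For reference, the three cases of the Ackermann system transcribe directly into Standard ML (our own illustrative code; in the nets, A dispatches on m and A2 on n):

    fun ack 0 n = n + 1
      | ack m 0 = ack (m - 1) 1
      | ack m n = ack (m - 1) (ack m (n - 1))

The nesting ack (m-1) (ack m (n-1)) is what creates the dependency discussed above: the outer call can only proceed early if the inner call streams partial information (here, the fact that its result is at least 1).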
Bubble sort. The simple sorting algorithm bubble sort can benefit from parallel evaluation in interaction nets. One version of this algorithm, written in Standard ML [MTHM97], is as follows:

    fun bsortsub (x::x2::xs) =
          if x > x2 then x2 :: bsortsub (x::xs)
                    else x :: bsortsub (x2::xs)
      | bsortsub x = x

    fun bsort t =
          let val s = bsortsub t
          in if t = s then s else bsort s end;

Using a direct encoding of this program, we obtain the interaction system:

[Figure: interaction rules for BS, B(x), EQ and EQn(x): BS starts a pass, B(x) carries the element x and compares it with the next element y (with the two disjoint cases x ≤ y and not(x ≤ y)), and EQ/EQn(x) test whether the pass changed the list, restarting BS when not(x = y)]

where the δ and ε agents are defined as a duplicator and an eraser:

[Figure: δ duplicates list constructors and numbers; ε erases them]

For instance, the list [3,4,2] is sorted as follows:

[Figure: an example reduction sorting [3,4,2], where each pass copies the list with δ, compares the pass's output with its input using EQ/EQn, and repeats until the list is unchanged]

This system shows that parallel bubble sorting is linear, whereas sequential evaluation is quadratic, as indicated in the graph below.

[Figure: BS n (direct translation), sequential vs. parallel steps]

However, the system contains the equality test, performed by EQ and EQn, that checks whether the sorted list is the same as the given list. In contrast to typical functional programming languages, interaction nets must copy and erase lists for this test, which can make the computation inefficient. Moreover, the sorting pass is applied to the already sorted list again and again. Taking into account that B moves the maximum element of the given unsorted list into the head of the sorted list, we can obtain a more efficient system:

[Figure: the improved rules for BS, B(x) and M: B(x) bubbles the maximum to the end of the unsorted part, the marker M separates off the sorted part, and BS restarts only on the unsorted part, with no equality test]

For instance, the list [3,4,2] is sorted as follows:

[Figure: an example reduction sorting [3,4,2] with the improved system, where each pass deposits the current maximum behind the marker M]

This system reduces the number of computational steps significantly, and gives the best expected behaviour:

[Figure: BS n (improved), sequential vs. parallel steps; and direct vs. improved in parallel]
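A functional rendering of the improved algorithm may make the idea clearer; the following Standard ML sketch is our own (the M agent corresponds to the sorted accumulator). Each pass returns the remaining elements together with the maximum, so no equality test and no copying of the list are needed:

    (* One pass: bubble the maximum out of the list, return (rest, max). *)
    fun pass [x] = ([], x)
      | pass (x :: y :: ys) =
          if x > y then let val (r, m) = pass (x :: ys) in (y :: r, m) end
                   else let val (r, m) = pass (y :: ys) in (x :: r, m) end
      | pass [] = raise Empty

    (* Repeated passes; the maxima accumulate in ascending order. *)
    fun bsort xs =
      let fun loop []   sorted = sorted
            | loop rest sorted =
                let val (rest', m) = pass rest in loop rest' (m :: sorted) end
      in loop xs [] end

For example, bsort [3,4,2] evaluates to [2,3,4] using n passes over a list of length n, mirroring the quadratic sequential cost, while in the nets successive passes can proceed concurrently, giving the linear parallel cost shown above.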
Map function. The map function takes a function f and a list [a1, a2, ..., an], and returns the list

    map f [a1, a2, ..., an] = [f(a1), f(a2), ..., f(an)].

This function is well known as a higher-order function in functional languages, and also from Google's MapReduce [DG04]. Generally, for an agent f that has one auxiliary port, we can build an interaction net system in which an agent map_f works as the map function:

[Figure: the rules for map_f: meeting a cons cell, map_f applies f to the head via the agent opCons and recurses on the tail; meeting Nil it returns Nil; opCons attaches the computed head to the output list]

For instance, the computation of map fib [0,1,2] is performed in the interaction net system as follows:

[Figure: an example reduction of map fib [0,1,2], where the three applications of fib proceed independently, yielding [1,1,2]]

For the benchmark, we write MAPFib m n for the application of map with the Fibonacci function to an n-element list [m, m, ..., m]; thus MAPFib 10 4 means map fib [10,10,10,10]. The following graph shows the benchmark of the execution of MAPFib 10 n in sequential and parallel evaluation:

[Figure: MAPFib 10 n, sequential vs. parallel steps]

This shows that both evaluations are linear, but the slope of the sequential evaluation is steeper than that of the parallel one: in the sequential evaluation the executions of fib 10 accumulate, whereas in the parallel evaluation they are performed simultaneously.
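The net system implements exactly the usual recursion on lists; as a reference point, here is a Standard ML sketch of our own, where mapFib is our stand-in for the MAPFib benchmark and fib is reused from the earlier sketch:

    fun mapList f []        = []
      | mapList f (x :: xs) = f x :: mapList f xs

    (* MAPFib m n: map fib over an n-element list [m, ..., m]. *)
    fun mapFib m n = mapList fib (List.tabulate (n, fn _ => m))

Since the n applications of f are independent of one another, this is the easiest kind of parallelism for the nets to exploit.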
All these examples show the scope for harnessing parallelism in an empirical study: some systems do not benefit, whereas others allow quadratic computations to be executed with linear parallel complexity. These results give a flavour of the potential, however, and do not necessarily mean that they can be implemented like this in practice.

4 Discussion

Here we examine the potential for parallelism illustrated by the graphs in Section 3, using a multi-threaded parallel interpreter of interaction nets called Inpla [Sat14], implemented with gcc 4.6.3 and the POSIX thread library. We compare the execution time of Inpla with other evaluators and interpreters. The programs were run on a Linux PC (2.4 GHz Core i7-3630QM, 16 GB) and the execution time was measured using the UNIX time command as the average of five executions.

First, for executions of pure interaction nets, we take INET [HMS09], amineLight [HMS10] and HINet [Kah15], and compare Inpla with them using two programs: the Fibonacci function (with the streaming addition) and the Ackermann function. Table 1 shows the execution time in seconds for the interaction net evaluators; HINet runs on a single thread here, although it supports multi-threaded execution.

                  INET   amLight  HINet    Inpla  Inpla_1  Inpla_2  Inpla_3  Inpla_4  Inpla_5
    fib 29        2.31   2.05     127.62   0.80   0.82     0.52     0.43     0.41     0.43
    fib 30        3.82   3.40     609.64   1.25   1.26     0.76     0.63     0.61     0.60
    ack 3 10      18.26  11.40    438.79   4.30   4.42     2.31     1.60     1.54     1.40
    ack 3 11      66.79  46.30    1697.30  17.55  18.18    9.42     6.80     5.81     6.06

    Table 1: Execution time in seconds of pure interaction nets on interaction net evaluators

We see that HINet runs slowest, since it evaluates a given net in an interpreter written in Haskell (compiled with the Glasgow Haskell Compiler, GHC), whereas INET compiles nets to C source code, and amineLight and Inpla are interpreters written in C. Inpla runs faster than INET since Inpla is a refined version of amineLight, which is the fastest interaction net evaluator [HMS10]. In the table, the subscript of Inpla gives the number of threads in the thread pool; for instance, Inpla_2 means execution with two threads. Figure 3 illustrates the speedup ratio S(n) = T(1)/T(n), where T(i) is the execution time with i threads.

[Figure 3: The speedup ratio of multi-threaded execution on HINet and Inpla (fib 29, fib 30, ack 3 10, ack 3 11)]

HINet uses Haskell's parallel and memory management frameworks, whereas Inpla manages these itself with simple mechanisms for the sake of achieving the fastest computation. Generally, we see similar trends between the two, although they fluctuate.

Next, we compare Inpla with Standard ML of New Jersey (SML v110.74) [MTHM97] and Python (2.7.3) [RD11] in the extended framework of interaction nets, which includes integer numbers and lists. SML is a functional programming language whose eager evaluation strategy is similar to the execution method of interaction nets. Python is a widely used interpreted language, and thus the comparison with Python gives a good indication of efficiency. Here we benchmark the Fibonacci function, the streaming version of the Ackermann function, and the improved version of the bubble sort algorithm on randomly generated lists.

                  SML   Python  Inpla  Inpla_1  Inpla_2  Inpla_3  Inpla_4  Inpla_5
    fib 34        0.12  2.09    1.67   1.50     0.80     0.70     0.68     0.82
    fib 38        0.66  16.32   11.39  10.22    5.68     4.47     4.40     4.75
    ack 3 6       0.03  0.05    0.02   0.03     0.02     0.02     0.02     0.02
    ack 3 9       0.06  (*)     0.69   0.72     0.38     0.27     0.24     0.24
    BS 10000      1.64  6.71    2.11   2.25     1.17     0.87     0.76     0.68
    BS 20000      8.38  30.35   8.38   8.93     4.57     3.64     2.98     2.49
    MAPFib 34 5   0.49  9.89    8.92   8.09     4.55     3.21     2.54     2.73
    MAPFib 34 10  0.94  19.77   17.81  17.23    9.28     6.44     5.22     5.38

    (*) RuntimeError: maximum recursion depth exceeded

    Table 2: Execution time in seconds on interpreters

Table 2 shows that SML computes the arithmetic functions fastest. Inpla uses agents to represent the functions and the integer numbers, and these agents are consumed and reproduced repeatedly during computation; the execution therefore becomes slower than in SML, which performs the computation by function calls and stacked arguments. In comparison with Python, Inpla computes these functions faster. The sorting algorithm is a special case in that interaction nets are efficient for implementing such algorithms. In the case of the map application, because fib 34 is applied 5 and 10 times, the execution time also increases by factors of about 5 and 10, respectively. Figure 4 illustrates the speedup ratio of multi-threaded execution on Inpla. Since Core i7 processors have four cores, the speedup tends to peak at four or five execution threads.

[Figure 4: The speedup ratio of multi-threaded execution on Inpla (fib 38, ack 3 9, BS 20000, MAPFib 34 10)]

Table 3 shows the execution time in seconds on another Linux PC (4.2 GHz Core i7-6700K, 32 GB), corresponding to Tables 1 and 2. The speedup ratio of multi-threaded execution is illustrated in Figure 5. This processor also has four cores, so the speedup again tends to peak at four or five execution threads.

                  Inpla  Inpla_1  Inpla_2  Inpla_3  Inpla_4  Inpla_5  Inpla_6  Inpla_7  Inpla_8
    fib 29        0.70   0.75     0.52     0.46     0.45     0.45     0.45     0.44     0.44
    fib 30        1.04   1.11     0.73     0.63     0.59     0.59     0.58     0.58     0.58
    ack 3 10      2.61   2.82     1.46     1.08     0.86     0.75     0.75     0.70     0.70
    ack 3 11      10.62  11.44    5.92     4.64     3.74     2.99     2.86     2.81     2.82
    fib 34        0.93   0.96     0.51     0.39     0.38     0.34     0.40     0.40     0.41
    fib 38        6.28   6.51     3.41     2.58     2.38     2.17     2.39     2.34     2.50
    ack 3 6       0.01   0.02     0.02     0.02     0.02     0.02     0.02     0.02     0.02
    ack 3 9       0.39   0.43     0.23     0.17     0.16     0.13     0.12     0.12     0.12
    BS 10000      1.33   1.58     0.79     0.55     0.51     0.46     0.39     0.38     0.37
    BS 20000      5.30   6.11     3.11     2.57     2.04     1.71     1.49     1.43     1.38
    MAPFib 34 5   4.74   4.90     2.54     1.87     1.64     1.40     1.49     1.48     1.46
    MAPFib 34 10  9.48   9.78     5.06     3.49     3.25     2.81     2.75     2.72     2.62

    Table 3: Execution time in seconds on another PC

[Figure 5: The speedup ratio of multi-threaded execution on another PC: unary natural numbers (fib 29, fib 30, ack 3 10, ack 3 11) and integers (fib 38, ack 3 9, BS 20000, MAPFib 34 10)]

Next we analyse the results of the parallel execution in Inpla using the graphs of Section 3, which show the trends of the step counts in parallel execution under the assumption of unbounded resources. We write "parallel(n)" in the following graphs to make explicit that Inpla_n is used for the experiment.
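As a quick worked example of the speedup ratio, take ack 3 11 on Inpla from Table 1: T(1) = 18.18 s and T(4) = 5.81 s, giving S(4) = 18.18/5.81 ≈ 3.13 on a four-core machine. In SML this is just:

    (* S(n) = T(1) / T(n), where T(i) is the time with i threads. *)
    fun speedup t1 tn : real = t1 / tn

    val s4 = speedup 18.18 5.81   (* ack 3 11, Table 1: about 3.13 *)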
Fibonacci function. Figure 6 shows the execution time of each program for the Fibonacci function using Inpla.

[Figure 6: The execution time of the Fibonacci function with Inpla: unary natural numbers with the batch addition, unary natural numbers with the streaming addition, and integers; sequential, parallel(2) and parallel(4)]

We see that each sequential execution is exponential, as in the graphs drawn under the assumption of unbounded resources (Figure 1). As we increase the number of threads, the growth rate of the execution time of the parallel execution in Inpla gradually approaches the trend of the parallel computation given in Figure 1.

We note that, in the computation on unary natural numbers, the execution of the streaming version is slower than the batch version, as shown in the graph on the left of Figure 7. The graph on the right shows the ratio of steps in the streaming version to steps in the batch version under the assumption of unbounded resources. The ratio approaches about 0.4 as n increases in fib n, which means that the benefit of parallelism is limited even if we assume unbounded resources. In the real computation, the cost of parallel execution affects the execution time more than this limited benefit, and thus the streaming version becomes slower.

Ackermann function. Figure 8 shows the execution time of each program for the Ackermann function using Inpla. We see that, except for the batch operation version, the parallel computation follows well the trends predicted under the assumption of unbounded resources. On the other hand, the parallel execution of the batch version takes quite a long time compared to the streaming version. This is because, under unbounded resources, there is already no significant difference between sequential and parallel execution, and the real execution additionally pays the cost of parallelism, such as the useless scheduling of thread executions.
These are some of the reasons why parallel execution does not always give good performance; the streaming version improves on them.

[Figure 7: Comparison between the batch and the streaming addition in parallel execution with Inpla: execution time of fib n on unary natural numbers (left), and the ratio of streaming steps to batch steps under unbounded resources (right)]

[Figure 8: The execution time of the Ackermann function with Inpla: unary natural numbers, integers with the batch operation, and integers with the streaming operation; sequential, parallel(2) and parallel(4)]

Bubble sort. Figure 9 shows the execution time of the two programs for bubble sort using Inpla. As anticipated by the graphs drawn under the assumption of unbounded resources, the improved version performs best.

[Figure 9: The execution time of bubble sort with Inpla: direct and improved versions; sequential, parallel(2) and parallel(4)]

Map function. Figure 10 shows the execution time of the map application MAPFib 34 n using Inpla. As with bubble sort, this algorithm performs well as we increase the number of threads.

[Figure 10: The execution time of the map application MAPFib 34 n with Inpla; sequential, parallel(2) and parallel(4)]

5 Conclusion

Although discussed for many years, we believe that the parallel implementation of interaction nets is still a very new area and much remains to be done. In this work we have assumed unbounded resources in terms of the number of processing elements available; this is a reasonable assumption with GPUs, where many thousands of processing elements are available. We analysed the results of the multi-threaded execution in the light of the investigation under this assumption, and showed that, on the one hand, some executions perform as well as the best expected behaviour, while on the other hand, some fall short of the idealised results because of the overhead of the parallel technology, as anticipated by the investigation. We hope the ideas in this paper may help in moving this work forward.

Bibliography

[DG04] J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Brewer and Chen (eds.), 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, December 6-8, 2004. Pp. 137-150. USENIX Association, 2004. http://www.usenix.org/events/osdi04/tech/dean.html

[HMS09] A. Hassan, I. Mackie, S. Sato. Compilation of Interaction Nets. Electr. Notes Theor. Comput. Sci. 253(4):73-90, 2009.

[HMS10] A. Hassan, I. Mackie, S. Sato. A lightweight abstract machine for interaction nets. ECEASST 29, 2010.

[HS08] M. Herlihy, N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008.
[Jir14] E. Jiresch. Towards a GPU-based implementation of interaction nets. In Löwe and Winskel (eds.), DCM. EPTCS 143, pp. 41-53. 2014.

[JS08] S. P. Jones, S. Singh. A Tutorial on Parallel and Concurrent Programming in Haskell. In Lecture Notes in Computer Science. Springer Verlag, 2008.

[Kah15] W. Kahl. A Simple Parallel Implementation of Interaction Nets in Haskell. In Lago and Harmer (eds.), Proceedings Tenth International Workshop on Developments in Computational Models, DCM 2014. EPTCS 179, pp. 33-47. 2015.

[Laf90] Y. Lafont. Interaction Nets. In Proceedings of the 17th ACM Symposium on Principles of Programming Languages (POPL'90). Pp. 95-108. ACM Press, 1990.

[MTHM97] R. Milner, M. Tofte, R. Harper, D. MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997.

[Pin00] J. S. Pinto. Sequential and Concurrent Abstract Machines for Interaction Nets. In Tiuryn (ed.), Proceedings of Foundations of Software Science and Computation Structures (FOSSACS). Lecture Notes in Computer Science 1784, pp. 267-282. Springer-Verlag, 2000.

[RD11] G. van Rossum, F. L. Drake. The Python Language Reference Manual. Network Theory Ltd., 2011.

[Sat14] S. Sato. Design and implementation of a low-level language for interaction nets. PhD thesis, University of Sussex, September 2014.