Acta Polytechnica Vol. 52 No. 5/2012

A Network Simulation Tool for Task Scheduling

Ondřej Votava

Dept. of Computer Science, Czech Technical University, Karlovo nám. 13, 121 35 Praha 2, Czech Republic

Corresponding author: votavon1@fel.cvut.cz

Abstract

Distributed computing may be looked at from many points of view. Task scheduling is the viewpoint in which a distributed application is described as a Directed Acyclic Graph and every node of the graph is executed independently. There are, however, data dependencies, and the nodes have to be executed in a specified order, so the parallelism of the execution is limited. The scheduling problem is difficult, and heuristics are therefore used. However, many inaccuracies are caused by the model of the system in which the heuristics are tested. In this paper we present a tool for simulating the execution of a distributed application on a "real" computer network, and we show how the execution is influenced in comparison with the model.

Keywords: task scheduling, DAG scheduling, simulation, network simulation.

1 Introduction

Heterogeneous computation platforms have become very popular in the past decade. They are cheap and easy to construct, and they offer good computation power. Compared to parallel computers, distributed systems offer a better price-to-performance ratio. However, the properties of distributed systems are different. Communication is provided by a high-speed network, which is still slow in comparison with the specialized networks used in parallel systems [15]. Above all, communication makes it necessary to modify traditional parallel algorithms into distributed algorithms [25, 14, 19].

Task scheduling is one of many approaches used for distributed algorithms. The idea is simple. Consider an application that consists of several parts which may be executed independently. These parts can then be computed on different computers concurrently, and the application can be sped up. Task scheduling tries to answer which parts should be computed on which computers, and when, so that the total computation time is minimized.

The structure of the paper is as follows. In the next section we describe the problem of task scheduling itself and show several approaches that are widely used for solving it. At the end of the next section we show the network-related weaknesses of the simplified models that are used. Section 3 then describes the simulation tool that we used for our measurements, and in Section 4 we show some interesting results obtained from the simulations.

2 Task scheduling

The application that is to be scheduled can be described as a Directed Acyclic Graph (DAG), i.e. AM = (V, E, B, C), where:

V = {v_1, v_2, ..., v_v}, |V| = v, is the set of tasks; task v_i ∈ V represents a piece of code that has to be executed sequentially on the same machine;

E = {e_1, e_2, ..., e_e}, |E| = e, is the set of edges; edge e_j = (v_k, v_l) represents a data dependency, i.e. task v_l cannot start its computation until the data from task v_k have been received; task v_k is called the parent of v_l, and v_l is called the child of v_k;

B = {b_1, b_2, ..., b_v}, |B| = v, is the set of computation costs (e.g. numbers of instructions), where b_i ∈ B is the computation cost of task v_i;

C = {c_1, c_2, ..., c_e}, |C| = e, is the set of data dependency costs, where c_j = c_{k,l} is the data dependency cost (e.g. the amount of data) corresponding to edge e_j = (v_k, v_l).

Figure 1: Application model described as a DAG.

A task which has no parents or children is called an entry task or an exit task, respectively. If there is more than one entry/exit task in the graph, a new virtual entry/exit task can be added to the graph. Such a task has zero weight and is connected by zero-weight edges to the real entry/exit tasks.
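For concreteness, the four sets above map directly onto a simple data structure. The following C++ fragment is only an illustrative sketch of the AM = (V, E, B, C) model with our own field names; it is not code from the tool described later in this paper.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the application model AM = (V, E, B, C).
// A task v_i is identified by its index into compCost.
struct Edge {
    std::size_t parent;   // v_k, the task producing the data
    std::size_t child;    // v_l, the task consuming the data
    double      dataCost; // c_{k,l}, e.g. the amount of data to transfer
};

struct AppModel {
    std::vector<double> compCost; // b_i for every task v_i (|V| = v entries)
    std::vector<Edge>   edges;    // one entry per edge e_j (|E| = e entries)
};
```

In this representation an entry task is simply an index that never appears as a child, and an exit task one that never appears as a parent.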
The application has to be scheduled on a heterogeneous computation system (CS), which can be described as a general graph with one very important restriction. The graph represents a connection structure, and even though there may be no direct connection between two computation nodes, there must be an edge between every pair of nodes that are able to communicate. This restriction leads to the observation that the CS is always a complete graph (Figure 2). The computation system can then be described as CS = (P, Q, R, S), where:

P = {p_1, p_2, ..., p_p}, |P| = p, is the set of computation nodes;

Q = {q_1, q_2, ..., q_p}, |Q| = p, is the set of speeds of the computation nodes, where q_i is the speed of node p_i;

R is a matrix describing the communication costs; the size of R is p × p;

S is a matrix describing the communication startup costs; it is usually one-dimensional, i.e. its size is p × 1.

Figure 2: The computation system described as a complete graph.

Scheduling is tied to the specific application and the specific CS. The computation time t_{i,j} of task v_i on node p_j can be calculated using the equation

$$t_{i,j} = \frac{b_i}{q_j}. \qquad (1)$$

For static scheduling, a matrix W can be used; W contains the computation times of all of the tasks on every node, i.e. the size of W is v × p. The scheduling algorithm also has to take the communication delay into account. The duration of the transfer of edge e_i from node p to node q is then defined as

$$t^{(i)}_{p,q} = S[p] + c_i \cdot R[p][q]. \qquad (2)$$

2.1 Scheduling algorithms

The problem of task scheduling is known to be NP-complete [21, 7], and intensive research has therefore been done on heuristics. The heuristics can be divided into several categories, the common criterion being when the schedule is created. If the schedule is computed before the computation of the application begins, i.e. if the schedule is known a priori, the heuristic is called static or offline. In contrast, when the schedule is computed as a part of the computation of the application, the heuristic is called dynamic or online.

Both static and dynamic algorithms have been proposed in the literature. For heterogeneous computation platforms, a very well-known static algorithm is HEFT [20]. The main idea of HEFT is to order the tasks in a list and to assign each ready task to the computer which minimizes its execution time. Another algorithm proposed in [20] is CPOP. This algorithm finds a critical path and minimizes the execution time of the tasks on this path. The quality of the schedule is then very dependent on how the critical path was constructed. CPOP has slightly worse computational complexity than HEFT, but the quality of its schedules is close.

The idea of creating a list of tasks ordered in a specific manner is common to a whole group of scheduling heuristics, called list scheduling algorithms. Modifying HEFT in such a way that some tasks can be duplicated, we obtain the algorithm presented in [8]. This algorithm can also be applied to the specific context of cluster-based computation systems, where it achieves very good results [11].
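Equations (1) and (2) form the cost model that list schedulers such as HEFT evaluate repeatedly while placing tasks. A minimal sketch, assuming the structures below stand in for Q, R and S (the names are ours, not the paper's):

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of the cost model from equations (1) and (2).
struct CompSystem {
    std::vector<double>              speed;       // Q: q_j for each node p_j
    std::vector<std::vector<double>> commCost;    // R, size p x p
    std::vector<double>              startupCost; // S, size p x 1
};

// Equation (1): computation time t_{i,j} of task v_i (cost b_i) on node p_j.
// Precomputing this for all i and j yields the v x p matrix W.
double execTime(double b_i, const CompSystem& cs, std::size_t j) {
    return b_i / cs.speed[j];
}

// Equation (2): transfer duration of edge e_i (data cost c_i) from node p
// to node q. Schedulers commonly treat the p == q case as a free transfer,
// since no network and no startup are involved locally.
double transferTime(double c_i, const CompSystem& cs,
                    std::size_t p, std::size_t q) {
    if (p == q) return 0.0;
    return cs.startupCost[p] + c_i * cs.commCost[p][q];
}
```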
Many other algorithms have been published. Some of them are summarized in [2], and many others, focused on homogeneous computation systems, are compared in [13].

Unlike static algorithms, dynamic algorithms are usually used to schedule more than one application at a time. There are, however, some exceptions. For example, [24] schedules several applications in a static way and compares several approaches that are permissible for this problem. A semi-dynamic approach is described in [1], where the tasks are scheduled statically, but there is a global application structure which contains all of the applications, and this global structure changes whenever a new application arrives in the system. A completely different approach to dynamic scheduling is presented in [17], where there is one central scheduling node and each computation node collects statistics of its own usage. These data are sent to the central node, and the scheduling algorithm moves tasks from its queue to the queues of specific nodes according to a prediction of node utilization computed from the statistics. A system with more than one scheduling node is described in [10]. The scheduling nodes are independent machines and therefore have no information about the schedules of the others, so a statistical approach to node utilization is used. A two-level scheduling algorithm is described in [9]. The first level schedules a task to a specific "server", which is the leader of a set of close nodes; the "server" then schedules the task to a specific node. A very simple dynamic algorithm was proposed in [23]. The idea is that the nodes are not differentiated, i.e. a node can be both a "worker" and a "server". The schedule is created in steps, and in each step several messages are sent to obtain information on the utilization of the node's neighbours.

2.2 Weaknesses of the model

The model of the application (AM) describes the application quite well. However, the description is limited and does not reflect reality in every respect. There is, for example, a hidden prerequisite that all of the nodes know the code of the application, and that all the data needed as input are available before the application is computed. Similarly, the output of the application is not targeted at a specific node, and the computation can finish on any node, which may be confusing in reality.

Figure 3: A real network (left), its parameters (all links 100 bps), and the false representation from matrix R (right), where

$$R = \begin{pmatrix} 0 & 0.01 & 0.01 & 0.01 \\ 0.01 & 0 & 0.01 & 0.01 \\ 0.01 & 0.01 & 0 & 0.01 \\ 0.01 & 0.01 & 0.01 & 0 \end{pmatrix}.$$

The computation system CS is also simplified. All of the properties of the network are merged into the two matrices R and S, and the values of these matrices cannot capture all of the possible properties of the network. In Figure 3 the network contains one bottleneck. However, the properties of the network obtained from independent measurements of the individual links do not indicate this, and communication may therefore be delayed against the plan of the schedule. This delay may or may not be critical for the subsequent computation, and it is the purpose of this paper to show how the communication delay may change the execution order of the schedule.
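To see why this matters, consider two transfers that equation (2) prices independently but that in reality cross the bottleneck link of Figure 3 at the same time. The numbers below are invented for illustration, and the model that concurrent flows share the bottleneck's bandwidth fairly is our own simplifying assumption, not a result from this paper:

```cpp
#include <cstdio>

int main() {
    const double perUnit   = 0.01;  // R[p][q]: seconds per data unit,
                                    // measured on the link in isolation
    const double dataUnits = 100.0; // c_i of each of the two edges

    // Equation (2) predicts this for each transfer (startup cost ignored):
    double planned = dataUnits * perUnit;  // 1 s

    // If both transfers overlap on the bottleneck and share its bandwidth
    // fairly, each sees half the bandwidth and takes about twice as long:
    double contended = 2.0 * planned;      // ~2 s

    std::printf("planned %.2f s, contended %.2f s\n", planned, contended);
    return 0;
}
```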
2.3 Related work

The problem of evaluating the correctness of generated schedules has been studied extensively. Several tools have been presented, all of which try to help researchers validate their algorithms. There are two main approaches to the problem: testing the algorithms on real platforms, and simulating the experiments. The problem with using real systems for testing is that the possible system architectures are very limited, due to the limited hardware resources. There are, however, some systems that are focused on this type of testing. Grid'5000 [4] and PlanetLab [6] are two examples of platforms available for application testing. The results provided by these tools are very reliable, but the scalability of the network is limited. Another very important problem, tightly coupled with task scheduling, is that the number of existing applications is limited. Since only generated structures of non-existing applications are tested, real systems cannot be used. The same problem emerges for emulation tools.

Simulating the experiments suits the task scheduling problem much better. This method is very widely used, though not all authors mention the system and the simulation method that they have used [16]. However, several systems have become well known. GridSim [3] is a simulation tool focused on modeling the resources of the nodes of the CS. The network layer is modeled using a simple discrete event simulation and starts at the third layer of the ISO/OSI model. The ALEA framework [12], an extension of GridSim, provides a tool for various GRID scheduling problems. Another well-known simulator is SimGrid [5]. SimGrid has evolved from a tool for scheduling applications with a DAG structure into a system that is able both to simulate and to emulate distributed algorithms. SimGrid uses a mathematical model of the network, but version 3 also introduces a hybrid system that allows the use of GTNetS [18] as a transport simulation tool.

3 Simulation tool

The purpose of this paper is to show the influence of network parameters on schedule execution. The simulation tool therefore has to offer a realistic simulation of the network, and we decided to use the OMNeT++ simulation tool [22]. OMNeT++ aims primarily at network simulations and is used as a core for many projects. There are also many extensions to OMNeT++; for example, the INet Framework is a set of OMNeT++ modules that simulate Internet devices. INet contains modules for both physical devices (routers, switches, hubs and access points for wireless networks) and protocols (the TCP/IP protocol family, SCTP, OSPF and MPLS). Since we want to make the simulation of the network as realistic as possible, we chose OMNeT++ with the INet Framework as the simulation core.

OMNeT++ itself offers no support for scheduling. As mentioned above, the applications that we use for testing the scheduling algorithms are randomly generated and have no real representation (i.e. no code). The simulation of the execution of a schedule therefore consists only of sequences in which data are sent or received and in which the nodes pretend to be working; in terms of the simulation, they sleep for a specified amount of time. This behavior is implemented in the TaskExecutor module, which may be connected to two modules, the first for TCP communication and the second for UDP communication. The communication itself is then simulated by the standard INet Framework; this involves packet collisions, routing, queuing of packets, etc.
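The paper does not list the TaskExecutor source, but the "sleep for a specified amount of time" behavior maps naturally onto OMNeT++'s self-message mechanism. The following hypothetical sketch (the module and parameter names are our own) shows the idea in OMNeT++ 5 style:

```cpp
#include <omnetpp.h>
using namespace omnetpp;

// Hypothetical sketch of a TaskExecutor-like module: on startup it
// "computes" by sleeping for the task's duration, then reports completion.
// Real data transfers would go through the attached TCP or UDP modules.
class SimpleTaskExecutor : public cSimpleModule {
  protected:
    cMessage *wakeUp = nullptr;

    virtual void initialize() override {
        wakeUp = new cMessage("taskDone");
        // Pretend a task starts now and runs for `taskDuration` seconds.
        simtime_t taskDuration = par("taskDuration").doubleValue();
        scheduleAt(simTime() + taskDuration, wakeUp);
    }

    virtual void handleMessage(cMessage *msg) override {
        if (msg == wakeUp)
            EV << "task finished at " << simTime() << "\n"; // send data here
    }

    virtual ~SimpleTaskExecutor() { cancelAndDelete(wakeUp); }
};

Define_Module(SimpleTaskExecutor);
```

A completed task would then hand the output data of its outgoing DAG edges to the TCP or UDP module, as described above.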
4 Simulation results

We created four network topologies that were used for the simulations. The topologies of the networks are shown in Figure 4, and the main properties of each network are as follows.

Figure 4: Topologies of the four testing networks.

Network 1 contains 10 nodes connected by 1 GBit Ethernet links to a central switch; the cable length is 10 meters. Network 2 contains 2 groups of 10 nodes. The nodes are connected by 1 GBit Ethernet to a router; the routers are connected by a 10 GBit point-to-point line with a delay corresponding to a distance of 10 km. Network 3 contains 4 groups of 5 nodes. The nodes are connected by 1 GBit Ethernet to a router, and the routers are connected as shown in Table 1. Network 4 contains 1 group of 10 nodes and two groups of 5 nodes. The nodes are connected by 1 GBit Ethernet to a router, and the routers are connected as shown in Table 2.

Table 1: Speeds and distances used in network 3 (links between the routers).
R1–R2: 10 GBit, 1000 m
R3–R2: 100 MBit, 100 m
R4–R2: 1 GBit, 10000 m

Table 2: Speeds and distances used in network 4 (links between the routers).
R1–R2: 100 MBit, 100 ms
R3–R2: 10 MBit, 100 m

The communication delays are specified either in time units (ms) or in distance units (m); in the latter case the real delay is computed from the following equation, where c is the speed of light:

$$\mathrm{delay} = \frac{\mathrm{distance}}{0.64\,c}. \qquad (3)$$
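Equation (3) simply divides the cable length by the signal propagation speed, taken as 0.64 of the speed of light in vacuum. A one-function sketch (our own helper, not part of the tool):

```cpp
// Equation (3): propagation delay of a cable of the given length,
// assuming a velocity factor of 0.64 relative to the speed of light.
double propagationDelay(double distanceMeters) {
    const double c = 299792458.0;        // speed of light in vacuum [m/s]
    return distanceMeters / (0.64 * c);  // delay in seconds
}
```

For the 10 km line in Network 2, for example, this gives a delay of roughly 52 µs.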
Networks 1 to 4 were the computation systems for a set of 300 randomly generated applications; the method for generating them was copied from [20]. A schedule was created for each application on each network. We used HEFT and CPOP as the scheduling algorithms and TCP and UDP as the transport layer, so in the end we had a set of 4800 schedules. All of the schedules were simulated, and the differences between the execution times of the scheduled tasks and the simulated tasks were recorded.

The results of the simulations showed that the differences between the real execution time and the expected execution time are not large. We had expected that these small differences would accumulate and grow, but the difference seems to be almost constant; see Figures 5, 6 and 7.

Figure 5: Differences between the expected and real start of the tasks of one application (TCP and UDP), with Network 1 as the execution platform.

Figure 6: Average, minimal and maximal differences between the expected and real start of applications with more than 50 tasks, across all platforms.

Figure 7: Average, minimal and maximal time difference between the expected and real start of all tasks of all applications executed on Network 1.

The structure of the network influences the execution of the schedule. It may not be a coincidence that the time differences are usually of the order of the startup delay. For example, the time differences in Network 1 were very small, the maximum being about 10^-6 s, which is of the same order as the startup delay for Network 1 stored in S. The differences in Network 4 were higher, about 10^-1 s, which is above the order of the values in the startup delay matrix S. Nevertheless, this value is much smaller than the total execution time of the computation: the average length of a schedule was of the order of 10^3 s, while the average difference caused by the network transport ranged from -10^-6 to -10^-1 s. The difference is really small compared to the schedule length.

5 Conclusion

In this paper we have presented the problem of task scheduling. Since the problem is difficult, heuristics are used, and great progress has been achieved in this area. However, the models that are used as a standard input for most of the algorithms are only models, and they may suffer from many simplifications. We have focused here on the computation system, especially on its networking part. We have shown that there are several points that may lead to misunderstandings about how the network may behave. In order to show whether these points affect real-world applications, we created a simulation tool which is able to simulate the execution of a schedule and the corresponding networking activity. The tool is based on OMNeT++, which is often used for network-based simulations.

We created a set of randomly generated applications and schedules for them. We ran the simulations on four network topologies, and our results show that the communication caused some differences against the expected execution times. However, the differences were very small. For this specific set of applications and networks we may say that the differences are insignificant.

We also mentioned the idea that the time differences caused by the network are of the order of the startup delay. To make this idea bullet-proof, we need to run more simulations on various types of networks and with various types of applications. Of course, the best way would be to execute real applications on real networks. Real networks have more problems than just collisions and delays: there may be other traffic, and the bandwidth may therefore change during the computation. These and many other problems are still to be solved.

Acknowledgements

The research reported in this paper has been supported by the Ministry of Education, Youth and Sports of the Czech Republic under research program MSM 6840770014 and by the Grant Agency of the Czech Technical University in Prague, grant No. SGS11/158/OHK3/3T/13.

References

[1] J. Barbosa, B. Moreira. Dynamic job scheduling on heterogeneous clusters. In Parallel and Distributed Computing, 2009 (ISPDC '09), Eighth International Symposium on, pp. 3–10, 2009.

[2] Tracy D. Braun, Howard Jay Siegel, Noah Beck, et al. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing 61(6):810–837, 2001.

[3] Rajkumar Buyya, Manzur Murshed. GridSim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing. Concurrency and Computation: Practice and Experience 14(13–15):1175–1220, 2002.

[4] F. Cappello, E. Caron, M. Dayde, et al. Grid'5000: a large scale and highly reconfigurable grid experimental testbed. In Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing, GRID '05, pp. 99–106. IEEE Computer Society, Washington, DC, USA, 2005.

[5] Henri Casanova, Arnaud Legrand, Martin Quinson. SimGrid: a generic framework for large-scale distributed experiments. In 10th IEEE International Conference on Computer Modeling and Simulation, 2008.
[6] Brent Chun, David Culler, Timothy Roscoe, et al. PlanetLab: an overlay testbed for broad-coverage services. SIGCOMM Computer Communication Review 33:3–12, 2003.

[7] Michael R. Garey, David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.

[8] T. Hagras, J. Janeček. A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems. Parallel Computing 31(7):653–670, 2005.

[9] M. A. Iverson, F. Özgüner. Hierarchical, competitive scheduling of multiple DAGs in a dynamic heterogeneous environment. Distributed Systems Engineering 6(3):112, 1999.

[10] Michael Iverson, Fusun Ozguner. Dynamic, competitive scheduling of multiple DAGs in a distributed heterogeneous environment. In Heterogeneous Computing Workshop, p. 70, 1998.

[11] Jan Janeček, Peter Macejko, Tarek Morad Gomaa Hagras. Task scheduling for clustered heterogeneous systems. In M. H. Hamza (ed.), IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 2009), pp. 115–120, 2009. ISBN 978-0-88986-783-3, ISBN (CD) 978-0-88986-784-0.

[12] Dalibor Klusáček, Hana Rudová. Alea 2 – job scheduling simulator. In Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques (SIMUTools 2010). ICST, 2010.

[13] Yu-Kwong Kwok, Ishfaq Ahmad. Benchmarking the task graph scheduling algorithms. In Proc. IPPS/SPDP, pp. 531–537, 1998.

[14] Claudia Leopold. Parallel and Distributed Computing: A Survey of Models, Paradigms and Approaches. John Wiley & Sons, Inc., New York, NY, USA, 2001.

[15] Jiuxing Liu, B. Chandrasekaran, Jiesheng Wu, et al. Performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics. In Supercomputing, 2003 ACM/IEEE Conference, p. 58, 2003.

[16] S. Naicken, A. Basu, B. Livingston, S. Rodhetbhai. Towards yet another peer-to-peer simulator. In Proceedings of the Fourth International Working Conference on Performance Modeling and Evaluation of Heterogeneous Networks (HET-NETs), pp. 37/1–37/10. Ilkley, UK, 2006.

[17] Xiao Qin, Hong Jiang. Dynamic, reliability-driven scheduling of parallel real-time jobs in heterogeneous systems. In International Conference on Parallel Processing, p. 113, 2001.

[18] George F. Riley. Simulation of large scale networks II: large-scale network simulations with GTNetS. In Proceedings of the 35th Conference on Winter Simulation: Driving Innovation, WSC '03, pp. 676–684. Winter Simulation Conference, 2003.

[19] Gerard Tel. Introduction to Distributed Algorithms. Cambridge University Press, New York, NY, USA, 2nd edn., 2001.

[20] H. Topcuoglu, S. Hariri, M. Y. Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13:260–274, 2002.

[21] J. D. Ullman. NP-complete scheduling problems. Journal of Computer and System Sciences 10(3):384–393, 1975.

[22] András Varga, Rudolf Hornig.
An overview of the OMNeT++ simulation environment. In Proceedings of the 1st International Conference on Simulation Tools and Techniques for Communications, Networks and Systems & Workshops, Simutools '08, pp. 60:1–60:10. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, Belgium, 2008.

[23] O. Votava, P. Macejko, J. Kubr, J. Janeček. Dynamic local scheduling of multiple DAGs in a distributed heterogeneous systems. In Proceedings of the 2011 International Conference on Telecommunication Systems Management, pp. 171–178. American Telecommunications Systems Management Association Inc., Dallas, TX, 2011.

[24] Henan Zhao, R. Sakellariou. Scheduling multiple DAGs onto heterogeneous systems. In International Parallel and Distributed Processing Symposium, p. 130, 2006.

[25] Albert Y. H. Zomaya (ed.). Parallel and Distributed Computing Handbook. McGraw-Hill, Inc., New York, NY, USA, 1996.