ijcccv4n3Draft.pdf Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. IV (2009), No. 3, pp. 283-290 On Parallel Graph Rewriting Systems Dragoş Sburlan Ovidius University of Constantza, Romania Faculty of Mathematics and Informatics 124 Mamaia Blvd., 900527 Constantza E-mail: dsburlan@univ-ovidius.ro Received: April 5, 2009 Accepted: May 30, 2009 Abstract: In this paper we introduce a new theoretical paradigm, called PGR systems, which can be used to model in a discrete manner some natural phenomena occurring in-vivo/in-vitro en- vironments. PGR systems make use of graphs to describe the spatial structure of space of individuals, while the system dynamics caused by the movement/interaction of individuals is captured by the parallel applications of some graph rewriting rules. In this frame, an il- lustrative example is studied and based on it, an eloquent comparison between the abstract rewriting machines and PGR systems is done. Several further ideas to overcome the global computational effort needed for simulations, but still maintaining the overall ability for mod- eling are finally proposed. Keywords: parallel multiset processing, abstract rewriting systems, P systems 1 Introduction Nowadays an increasing interest regards the study of the development of biological systems in which more species of individuals interact (usually to perform a certain global task). Research ranging from completely dif- ferent areas like the study of metapopulations (the study of groups of spatially separated populations of the same species that live in fragmented habitats and interact at some level) and HIV infections was done in essentially the same manner. Traditionally, such studies were done by employing continuous models where (partial) differential equations were used to capture the dynamics of these systems. Currently, the usage of discrete models where the system dynamics is captured from the collective actions of individual entities has been shown to be a promising choice (see [4], [6], [7], [9]). This is based on the fact that living organisms are spatially discrete and the individuals occupy particular localities at a given time. The interactions between individuals are strongly connected with their neighborhood relations. While characterizing these facts a basic issue regards the way the space is represented. Simple models that involve no detailed spatial structure are in general analytically easily solvable. However, as the complexity of the reaction-diffusion dynamics grows, the models based on partial differential equations become intractable to be analyzed. On the other hand, integrating within the model a detailed spatial structure (as cellular automata models do, for instance) the setback comes in general from the impossibility to analyze the models except only by performing simulations. Although such models have much greater biological reality, they suffer from the difficulty of gen- eralization (hence of finding the exact behavior). This is especially important while formulating some practical testable predictions regarding a given model. P systems are formal computing devices that were initially inspired and abstracted from the cell functioning (see [10]). In general, P systems make use of multisets to represent the computational support. These multisets are placed inside the membranes which in their turn are disposed in some hierarchical tree structure. The (maximal) parallel applications of some multiset rewriting rules (particular to each membrane) were used to process the multisets. Although these formal systems were extensively studied with respect to their computational power and ef- ficiency, while representing some biological processes many difficulties arise. Representing the data support as multisets essentially simplifies the structure of the environment and of the individuals interaction (the neighboring relations between the individuals are completely ignored), the focus being over the system dynamics. However, in Copyright c© 2006-2009 by CCC Publications 284 Dragoş Sburlan this case, two main assumptions are considered: the environment is homogeneous so that the concentration of the individuals do not change with respect to space and the number of individuals of each species in the environment is “adequately" large (hence the concentration of the individuals might be assumed to vary continuously over the time). Moreover, the rules that describe the interactions between the individuals are assumed to be executed in a maximal parallel manner and governed by a global clock that marks equal steps. Even if all these simplifications are useful while defining a computing formal framework, they are questionable if the aim is to model and simulate actual biological systems. This is way many new features that are meaningful to biologists were added to the original paradigm in order to extend its functionality and versatility for modeling. In order to cope with these issues, probabilistic/stochastic P systems were introduced (see [4], [14], [2]). In general, the main idea was to associate to the rules some weights describing how they should be applied at a given moment. For a particular rule, the weight gives the susceptibility of its execution at certain instant. Hence, employing this principle to all interaction rules it sets up more realistic bounds of the nondeterministic application of the rules. The ultimate goal of this approach is to integrate the structural and dynamical characteristics of a real bio-system into the way the rules of the model are selected to be applied and executed step by step (preserving at the same time the unstructured computing support). Although this method has in general good simulation time complexity it is inadequate if the interacting species are poorly represented, when there exist many “inactive" individuals (that are not the subject of any rule) with respect to the entire population of individuals, or when the environment is not homogeneous. 2 Preliminaries We assume the reader familiar with the basic notions of P systems (one can consult [10] for more details), so that here we only recall some notions regarding the abstract rewriting systems on multisets. ARM systems represent a variant of P systems which was proposed in order to perform simulations of some bio-chemical processes. Later on, due to its modeling flexibility, it was used to study some symbiotic mechanism of an ecological system and even for proposing a novel theory of evolution. ARMS is a stochastic model that uses multisets to represent the bio-chemical support. Multiset rewriting rules are used to describe the bio-chemical reactions. As opposed to the classical definition of P systems where the rules are applied in a nondeterministic, maximal parallel manner and with competition on the objects composing the multisets, in ARMS the rules obey the Mass Action Law where the frequency of a reaction follows the concentra- tion of bio-chemicals and a rate constant. Consequently, the rules to be applied are chosen probabilistically from the rules set and each probability is given by the ratio of the total number of colliding chemicals of a reaction to the sum of the total number of colliding chemicals of every reaction in the rule; the applications of the rules remain parallel and with competition on the objects. More formally, an ARM system is a construct Π = (O, w, R) where O is the alphabet of objects, w represents the multiset of objects at the beginning of computation, and R is a set of multiset rewriting rules of type u k → v, where u ∈ O+, v ∈ O∗, and k ∈ R is the rate constant of the rule. For example, in case of a cooperative rule of type ri : aA + bB → cC + dD, where a, b, c, d ∈ IN, A, B,C, D ∈ O, and a given multiset of objects M ∈ O∗, the probability of rule execution is defined as Prob(ri) = kiM a A Mb B R , where ki is the rate constant (determined experimentally), MA and MB are the multiplicities of objects A and B in M, and R is a coefficient for normalizing the probabilities (i.e., ∑ i Prob(ri) = ). In a straightforward way probabilities can be defined for any type of rules. The system Π starts to evolve from the initial configuration (represented by w) by applying the rules in parallel, randomly selecting the rules but according to the probabilities computed as above. Π is governed by a universal clock that marks equal time units. A simple example, meaningful for our work, is presented bellow. We ran various tests using an ARM system Π where O = {A, B,C, D, X , F}, and the set of rules R is given below: The initial configuration of Π was w = ABCD and in our tests we used several values for ki,  ≤ i ≤ . The system attempts to simulate the behavior of some interacting individuals, represented here as the objects A, B, C, and D, sharing the same environment. In addition, the individuals corresponding to the objects C and D (which are much less than the individuals corresponding to the objects A and B) share a localized patch in the environment. Thus, we assumed the environment not to be homogeneous and we aimed to test the ARM system ability to simulate such conditions. On Parallel Graph Rewriting Systems 285 If at least once the objects C and D interact (i.e., the rule CD k → F is applied) they will produce an object F which will trigger the conversion of all existing objects in the multiset into F (the rules r, r, and r). The rest of the rules (r till r) are used to slow down the rewriting rate of objects A, B, C, D, and X . r : AB k→ X r : F k→ F r : AC k→ X r : A k → A r : BD k → X r : B k→ B r : CD k → F r : C k→ C r : F X k → F F r : D k→ D r : FA k→ F F r : X k → X r : F B k → F F Since we have assumed the existence in the environment of a patch of individuals corresponding to objects C and D, then we could make another further assumption. If the patch is "large enough” so that there exists at least two individuals C and D which can interact with each other but which cannot interact initially with the individuals A and B, then there exists a "significant” probability that the rule CD k → F is executed (assuming that k > k and k > k). While using multisets to represent the individuals in the environment we lose the structure, hence when simulating such systems we actually have to relay on the probabilities of the executions of the rules (which in their turn depend on some constants experimentally determined). In Figure 1, one can notice the different behaviors of the same system and they are related to the usage of such probabilities. The diagrams shown on the left hand side present a simulation when the rule CD k → F was executed at least once, while the charts on the right hand side present a simulation when the rule CD k → F was not executed at all. Although the model considered is very simple a similar situation might happen when representing some complex systems. Even more, such situations might emerge during the system evolution and sudden shifts in the behavior might arise from some minor changes in the circumstances (as it is in the presented charts); if this is the case, then it would make almost impossible the precise identification of the rate constants associated to the rules. Figure 1: Two runs of system Π . The results are presented on columns and they show the different behaviors of the same system when some minor changes in the circumstances happen. Besides all of these issues, if the number of objects in the model decreases under a certain limit, the usage of probabilities to specify the way the rules are applied becomes inadequate. 286 Dragoş Sburlan 3 PGR Systems Aiming to tackle the mentioned issues, in this section we introduce a new model for simulating bio-systems that are composed by interacting individuals of various species in a given environment. Denote by C the finite set of species in an environment represented here as a metric space (for simplicity, let R k, k ≥ , be the environment). Let V ⊆ L ×C be the finite set of labeled individuals in the environment (L denotes a finite set of labels that uniquely identify the individuals in the environment). In addition, let f : V → Rk, k ≥ , be a bijective mapping; for a node v = (n, l) ∈ V , the value h(v) denotes the position of the individual v in the environment. In addition let r ∈ R, r > , be a positive constant. Based on the above definitions one can represent the environment and the individuals from within as a graph G = (V, E) where V = V and the set of edges is constructed as follows: for two nodes v, v ∈ V , if h(v) belongs to the open ball centered in h(v) and with radius r (i.e., h(v) ∈ B(h(v), r)), then there exists an edge from v to v. For simplicity we assume that G is connected, that is, for any two nodes m, n ∈ V there exists a sequence m = v, v, . . . , vt = n ∈ V such that h(vi) ∈ B(h(vi−), r), for  ≤ i ≤ t. For example, in Figure 2 it is presented a set of 4 individuals which initially lay on an environment represented as R and the way the corresponding labeled graph is constructed. Figure 2: The construction of the labeled graph representing the initial computational support Motivated by these facts we introduce the following model. A parallel graph rewriting system (in short, a PGR system) is a construct Γ = (C, G, R) where: • C = {c, . . . , ck} is a finite set of symbols; • G = (V, E) is the initial global graph – a connected graph such that V ⊆ L ×C is a set of labeled nodes and E ⊆ V ×V is a set of edges between nodes from V; • R is a finite set of graph rewriting rules. A graph rewriting rule r ∈ R is of the following type: r = (G = (V, E), G = (V, E)), where Vi ⊆ Li ×C, Ei ⊆ Vi ×Vi, i ∈ {, }. The graphs G and G are connected graphs; G represents the neigh- boring relations between the individuals that are required for an interaction to take place and G represents the output of an actual interaction between the individuals represented in G. In addition we will assume that G and G are not arbitrary graphs, but rather they obey some physical constraints: any node from G and G cannot be the subject of more than a constant tr ∈ IN edges – a condition that assumes the nonexistence of more than tr individuals in an open ball of radius r. A graph rewriting rule r = (G, G) ∈ R can be applied on a graph G if G is label isomorphic with one subgraph Gs = (Vs, Es) of G, that is, there exists a bijective mapping h : V → Vs such that h((m, c)) = (n, c) and h−((n, c)) = (m, c), where (m, c) ∈ V, (n, c) ∈ Vs and such that any two nodes u, v ∈ Vs are adjacent in Gs if and only if h(u) and h(v) are adjacent in G (see Figure 3). In other words, a graph rewriting rule r can be applied on G iff the left-hand side rule’s graph is "contained" in G both as layout and as corresponding node labels (via an edge/label-preserving bijection). On Parallel Graph Rewriting Systems 287 Figure 3: A graph G = (V, E) denoting the computing support and a graph rewriting rule r = (G, G). The sites where the rule r can be applied in G are explicitly figured. If Gs = (Vs = {(n, B), (n, A)}, Es = {((n, B), (n, A))}) then G is label isomorphic with Gs. The neighborhood set of degree k =  of Gs is B = {(n, A), (n,C), (n,C), (n, B)}. The following steps are accomplished when a rule r is applied over G: • eliminate Gs from G (all the nodes from Vs are eliminated from V ; all the edges of the type (v, vs), v ∈ V , vs ∈ Vs are deleted from E); • add G to G (some relabeling of the nodes from G is required in order to avoid duplicates of nodes at multiple application of r). All the (relabeled) nodes and edges of G are added to G; • add a set of edges from some nodes of V to some nodes of V \ (Vs ∪ V). The edges are established as described below. For the graph Gs let us define the neighborhood set of degree k Bk = {v ∈ V \ Vs | there exists a path of length less or equal with k from v to a node u ∈ Vs} As we mentioned above, the output of the application of a rule consists of new individuals that, by hypothesis, at the moment of their apparition it is assumed to belong to the same vicinity. How big is that vicinity and how the new individuals are related to the rest depend on many factors among which we just mention the type of the rule and the environment. Consequently, in our framework, the set Bk is useful when defining the new neighborhood relations triggered by the application of a rule. By some straightforward physical arguments, the output graph G of the rule r is likely to be "connected" to G via the nodes from Bk. However, for simplicity, we will consider the neighborhood set of degree  in our simulations 1. Let E = {(n, n) ∈ E | n ∈ B, n ∈ Vs}. Then, a number equals with card(E) of random edges from the nodes of G to the nodes from B are added to G but such that any node considered is not the subject of more than tr ∈ IN + edges. Starting from the initial configuration (the initial global graph G), the system evolves according to the rules from R and the current labeled graph in a non-deterministic parallel manner (but not necessarily maximal). The labeled graph of Γ at any given moment constitutes the configuration of the system at that moment. For two configurations GA and GB we can define a transition from GA to GB if we can pass from GA to GB by applying rules from R. The problem of determining whether two graphs are isomorphic is referred to as the graph isomorphism problem. Although this problem belong to NP it is neither known to be solvable in polynomial time nor it is NP- complete. A generalization of this problem (that is used in our formalism) is the subgraph isomorphism problem which is NP-complete; hence the known deterministic algorithms for this problem are exponential. 1If the studied individuals are particles that perform the Brownian motion, then at each application of a rule a random positive integer k can be generated and correspondingly a new neighborhood set can be defined; hence, the set Bk might be also useful to describe the random particle movement in the environment. 288 Dragoş Sburlan Remark 3.1. There is a physical motivation to assume that after applying a rule of the system, the newly produced objects (that correspond to the output nodes of the rule) belong to the same vicinity, hence the right hand side graph of any rule should be complete. Remark 3.2. For a given PGR system, as much as the radius r grows (hence the number of edges in the initial global graph is close to n(n−)  where n is the number of the nodes, that is, the initial global graph is "almost” complete) and the degrees of the neighboring sets grow as well, the result of a simulation is as “similar" to the one obtained using parallel multiset rewriting. This is because multisets can be seen in our formalism as complete graphs, hence any individual in the system is in a neighboring relation with any other individual (hence, they can interact if proper rules exist). 4 PGRS Simulator and a Test Case The simulator implements the model introduced in Section 3. Its main characteristics regard the definition of the rules set by using an XML file, and the possibility to save/load intermediate configurations. The simulator is written in Java language hence it benefits of cross-platform compatibility, parallelism, and possibility to distribute the computational effort. The task that has the most computational resource consumption is the subgraph isomorphism problem which is addressed whenever a rule r = (G, G) is selected for application and the set S of all the subgraphs of the global graph that are isomorphic with G has to be determined. Even more, whenever a subgraph G ∈ S is selected to be rewritten by r, a run through all the elements of S has to be performed in order to eliminate those subgraphs that have some nodes from G (a task useful when multiple applications of the same rule are performed). Considering all these matters for all the rules from the rule set and a relatively small global graph, the overall time complexity for simulating just one computational step is exponential, hence in general unfeasible. Nevertheless, if the left hand side graphs of the rules from the rule set are very simple (i.e., less than 4 nodes) and the global graph contains at most hundreds of nodes, the simulation is viable. Moreover, taking into account that the problem can be easily parallelized one can divide the problem into smaller instances and distribute them over a network. Let us consider the following PGR system Γ = (C, G, R) where • C = {A, B,C, D, F, X }, • R = {r, r, r, r, r, r, r, r} is defined as follows: r : (n, A) (n, B) (n, X ) y yy r : (n, A) (n,C) (n, X ) y yy r : (n, B) (n, D) (n, X ) y yy r : (n,C) (n, D) (n, F) y yy r : (n, F) (n, X ) (n, F) (n, F) y yy y r : (n, F) (n, A) (n, F) (n, F) y yy y r : (n, F) (n, B) (n, F) (n, F) y yy y r : (n, F) (n, F) yy - - - - - - - - In our tests, the initial global graph G was build to obey some properties. First of all, a random graph G ′  was generated and this graph contains 500 nodes labeled only with A and B (for each test case, the apparition of these On Parallel Graph Rewriting Systems 289 Figure 4: The results of 100 simulations of different GPR systems but having the same properties. The minimal and maximal obtained values are explicitly marked. labels are equally probable) and 2000 edges. A second graph G′′ was generated and this graph contains 10 nodes labeled only with C and D (also in this case, the apparition of these labels are equally probable) and 30 edges. Finally, G′ and G ′′  were merged together in order to form G by connecting 10 randomly chosen nodes from G ′  with 10 randomly chosen nodes from G′′ . We ran the simulator for 100 times, considering for each run a new initial global graph generated as above. In Figure 4 there are represented the minimal and the maximal values at each step of the simulation for the objects A, B, C, and D. Any particular simulation graphic from our test case lay between the boundaries established. 5 Conclusions Simulations performed using PGR systems in some cases give more accurate answers than ARMS simula- tions because they explicitly use the spatial distribution of individuals (hence the neighborhood relations can be extensively expressed). However the price to pay while using PGR systems regards the computational effort which in their case is exponential as time complexity. Nevertheless, for some cases when the number of interacting in- dividuals in the environment is small and they are not dense, the PGR systems might be useful for performing simulations. In order to handle these issues, a hybrid system combining features from the ARM and PGR systems might be proposed. Two directions could be taking into account: • one can use alternatively an ARMS-type simulation whenever the number of individuals from all the species is large and a PGRS-type simulation whenever the number of individuals from certain species goes below some threshold; in this case the newly obtained system uses in a more careful manner the probabilities for the rules executions. • one can use in parallel an ARMS-type simulation over a multiset of many individuals and a PGRS-type simulation on relatively small instances of graphs. Then one can consider a time sequence and at each moment in the sequence one can merge the ARMS configuration with the multiset of labels of the nodes from the graph (or one can exchange some data between these simulations). In this way, the newly obtained hybrid systems become more robust against some unexpected changes in the behavior (which might be triggered by some minor changes). Acknowledgments The author gratefully acknowledges the support by CNCSIS-IDIEI grant, Romanian Ministry of Education, Research and Innovation, No. 804/2008. Bibliography [1] D. Besozzi, G. Mauri, D. Pescini, C. Zandron, Dynamical Probabilistic P Systems. Int. J. Found. Comput. Sci., 17, 1 (2006), pp. 183–204. [2] D. Besozzi, P. Cazzaniga, G. Mauri, D. Pescini, Modelling Metapopulations with Stochastic Membrane Systems, BioSystems, 91 (2008), pp. 499–514. 290 Dragoş Sburlan [3] B. Bollobas, Modern Graph Theory, Springer, 1991. [4] M. Cavaliere, I.I. Ardelean, Modeling Biological Processes by Using a Probabilistic P Sys- tem Software, Natural Computing, 2, 2 (2003), pp. 173–197. [5] H. Ehrig, Introduction to the Algebraic Theory of Graph Grammars, Lecture Notes in Com- puter Science 73 (1979), pp. 1–69. [6] P. Frisco, The Conformon-P System: A Molecular and Cell Biology-Inspired Computability Model, Theoretical Computer Science, 312, 2-3 (2004), pp. 295–319. [7] P. Frisco, R.T. Gibson, A Simulator and an Evolution Program for Conformon-P Systems, Proc. of the th Int. Symp. on Symbolic and Numeric Algorithms for Scientific Computing, 2005, pp. 427–430. [8] D.T. Gillespie, Exact Simulation of Coupled Chemical Reactions, J. Physical Chemistry, 81 (1977), pp. 2340–2361. [9] V. Manca, L. Bianco, Biological Networks in Metabolic P Systems. Biosystems, 91, 3 (2008), pp. 489–498. [10] G. Păun, Membrane Computing. An Introduction, Springer, Berlin, 2002. [11] D. Sburlan, Parallel Graph Rewriting Systems, Proc. of the th BWMC, Seville, Spain, 2009, in print. [12] Y. Suzuki, H. Tanaka, S. Tsumoto, Analysis of Cycles in Symbolic Chemical System Based on Abstract Rewriting System on Multisets, Proceedings of International Conference on Artificial Life 5 (Alife 5), 1996, pp. 482–489. [13] Y. Suzuki, J. Takabayashi, H. Tanaka, Investigation of Tritrophic System in Ecological Systems by Using an Artificial Chemistry, J. Artif. Life Robot., 6 (2002), pp. 129–132. [14] Y. Suzuki, H. Tanaka, Modelling p53 Signaling Pathways by Using Multiset Processing, Applications of Membrane Computing (G. Ciobanu, G. Păun, M. Pérez-Jiménez, Eds.), Springer, Berlin, 2006, pp. 203–214. Dragoş Sburlan graduated the Faculty of Mathematics and Informatics, the Master Pro- gram Computational Mathematics and Modern Computer Science Technologies at the Ovidius University of Constanta, Romania. He defended his European PhD in computer science at the University of Seville, Spain. He followed several research stages (Leiden Institute of Advanced Computer Science, Holland and STAKI, Budapest, Hungry) and he is involved in several na- tional and international research projects. Currently, he is Senior Lecturer at the Faculty of Mathematics and Informatics, Ovidius University of Constantza. His research interests include formal languages, theory of computing, natural computing, and software engineering.