Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. IV (2009), No. 4, pp. 363-373 Fuzzy Logic in Genetic Regulatory Network Models C. Muñoz, F. Vargas, J. Bustos, M. Curilem, S. Salvo, H. Miranda Carlos Muñoz Poblete, Francisco Vargas Parra, Jaime Bustos Gomez Millaray Curilem Saldias, Sonia Salvo Garrido, Horacio Miranda Vargas University of La Frontera Avenue Francisco Salazar 01145, Temuco, Chile E-mail: comunoz@ufro.cl Abstract: Interactions between genes and the proteins they synthesize shape genetic regulatory networks (GRN). Several models have been proposed to describe these interactions, been the most commonly used those based on ordinary differential equations (ODEs). Some approximations using piecewise linear differential equa- tions (PLDEs), have been proposed to simplify the model non linearities. However they not allways give good results. In this context, it has been developed a model capable of representing small GRN, combining characteristics from the ODE’s models and fuzzy inference systems (FIS). The FIS is trained through an artificial neural network, which forms an Adaptive Nertwork-based Fuzzy Inference System (ANFIS). This network allows to adapt the membership and output functions from the FIS according to the training data, thus, reducing the previous knowledge needed to model the specific phenomenon. In addition, Fuzzy Logic allows to express their rules through linguistic labels, which also allows to incorporate expert knowledge in a friendly way. The proposed model has been used to describe the Lac Operon in E. Coli and it has been compared with the models already mentioned. The outcome errors due to the training process of the ANFIS network are comparable with those of the models based on ODEs. Additionally, the fuzzy logic approach provides modeling flexibility and knowledge acquisition advantages. Keywords: Genetic Regulatory Network, Fuzzy Logic, ANFIS, Differential Equa- tions, Lac Operon 1 Introduction Factors in charge of regulating the expression of a gene can be both environmental (such as produced by other factors genes) or those produced by other genes or even the same gene under regulation. The latter is the basis for understanding the so-called Genetic Regulatory Networks (GRN), since they are real networks of interaction between genes in a cell. Being able to accurately predict these interactions using mathematical and / or computational models would benefit a wide variety of applications such as medicine or agriculture. There have been many approaches to model these networks, such as Bayesian Networks, Boolean Networks, models based on Ordinary Differential Equations, Piecewise Linear Models, Stochastic Mod- els and others [13], [14], [13], [17]. From them, we can to highlight the models based on Ordinary Differential Equations (ODEs) since they describe biological phenomena in large detail, being used pri- marily for modeling small regulatory networks. A disadvantage of these models is the large number of parameters (to be known a priori) acceptable for a biological description, requiring an exhaustive study Copyright c© 2006-2009 by CCC Publications 364 C. Muñoz, F. Vargas, J. Bustos, M. Curilem, S. Salvo, H. Miranda of the available literature to specify the parameters and/or designing experiments to estimate each of the parameters as required. It is now possible, using the necessary experimental data, to use optimization tools or artificial intelligence to solve this problem [2], [25]. An approach to the ODE-based models is based on a Piecewise Linear Differential Equations (PLDEs) [3], [5], [7]. On the other hand, artificial intelligence techniques, including fuzzy logic, have been incorporated primarily to the classification and analysis of data obtained through Microarrays [8], [11], [21], [15], [23]. Moreover, the techniques of fuzzy logic have also been considered for describing biological systems of which some a priori knowl- edge exists [12], [18], [19]. In this context, it is necessary to propose models that attempt to get good predictions, reducing the need for prior knowledge. We must also consider that these models should have the ability to easily incorporate the knowledge of experts in the field of genomics, as well as experimental information thus complementing previous work with the new developments. Similarly, the steady states of the model should be analyzed in order to ensure a biologically acceptable description. In this work we propose the development of a model that integrates the Fuzzy Inference with Differential Equations. We have chosen the differential equations because they constitute a model that describes with sufficient fidelity the regulatory processes; we also incorporate fuzzy logic because they have the ability to work with non- linear systems. The proposed model can reduce the need for prior knowledge of the phenomenon due to the training of the network ANFIS, transforming it into a grey box model [1], combining differential equations, and network training. In addition, the proposed model makes it possible to express their Fuzzy Rules across linguistic labels, which gives the ability to incorporate expert knowledge in a friendly language. 1.1 Biological Background A gene is active when it is able to synthesize one or more (depending on the body) types of proteins, which can play a regulatory role in the expression of the same or other genes, and also can catalyze chemical reactions within a cell. Therefore the function of a cell in an organism depends on the genes that are active, or in other words, it depends on the expression of its genes. The process of synthesis of a protein consists of 2 phases, the first of them is known as transcription. At this stage the segment of DNA that contains the information of the gene is transcribed into a string of messenger RNA (mRNA) through the enzyme called RNA polymerase. The action of this enzyme is regulated by a series of molecules called transcription factors (TF), which use certain areas of DNA, called zones cis-regulatory that are specific to this end. Then, when we have the chain of mRNA with the information on the protein synthesis it comes a second stage called Translation. Here an internal organelle called a ribosome reads the information chain mRNA and, together with the transfer RNA (tRNAs), it links the amino acids needed to form the protein that indicates the information of the gene. This protein may regulate the expression of the same or other genes, and can also participate in metabolic processes of the cell. 2 Materials and Methods This section defines the model based on a fuzzy inference system (FIS), presenting also the charac- teristics of the first models based on ODEs and its piecewise linear approximations in order to compare all these models in a real system. 2.1 ODEs based Models These models are based on a series of ordinary differential equations that relate mRNA molecules with proteins that they synthesize, the action of other molecules present in the regulation can also be Fuzzy Logic in Genetic Regulatory Network Models 365 incorporated. Usually this kind of differential equations presents the form: dxi dt = ∑ j αi j fi j(x j) − γixi (1) Where xi represents a molecule produced in the process, ai j is the production rate of the molecule i due to the molecule j, γi is the degradation rate of the molecule i, and fi j is a function that determines the interaction of the molecule x j with the molecule xi, which is called regulation function. This is a non- linear function, which provides for realism from a biological point of view. It is generally defined as a function of sigmoidal type, commonly the Hill function [3], [5]: fi j = h + i j (x j, θi j, mi j) = x mi j j θ mi ji j + x mi j j (2) This equation shows that for values x j well over the threshold, θi j, the function tends to a value of 1, whereas when x j tends to values below the threshold, θi j, the function is close to the value 0, as seen in Figure 1.a. The speed with which the function passes from the value 0 to 1 (while x j varies) depends on the slope at the point threshold. This slope changes depending on the value of mi j. 2.2 PLDE-Step Models Due to the nonlinear nature of the ODEs-based models, piecewise linear approximations have emerged that attempt to simplify the ODEs-based model to a set of linear models. Such models are based on a Piecewise Linear Differential Equations (PLDEs). The number of potential resulting linear models de- pends on the amount of regulation functions to approximate and the amount of linear segments on each approximation. A widely used approach approximates the regulation function, fi j, to only 2 cases: fi j = si j(x j, θi j) = { , if x j > θi j , if x j ≤ θi j (3) The regulation function is then approximated to a step function [1], so this model has been named Piece- wise Linear Differential Equations-Step (PLDE-Step) [20]. In this case, the value of the threshold, θi j, is the only parameter to estimate for each regulation function. The curve is shown in Figure 1.b. 2.3 PLDE-Logoid Models In addition to the step function, there are other features to approximate a nonlinear model to a piece- wise linear. Thus, we find in the literature approaches that use the ramp function as part of the linear segments of the model [5]. This approximation of the regulation function is also known as a logoid func- tion [3] calling such models as based on Piecewise Linear Differential Equations-Logoid(PLDE-Logoid) [20]. In this case, the curve takes the form shown in Figure 1.c, and the regulation function is defined as: fi j = li j(x j, θi j, δi j) =    , if x j > θi j + δi j   δi j (x j − θi j) +  , if θi j − δi j  < x j ≤ θi j + δi j , if x j ≤ θi j − δi j (4) Where the new parameter, δi j, corresponds to the piece at which the function moves from 0 to the value set to 1, corresponding to the inverse of the ramp function in that segment. As shown in (4), there are 3 possible cases for every regulation function, which increase the number of potential linear differential equations to solve compared to the model PLDE-Step. However this also increases the accuracy of the approximation. 366 C. Muñoz, F. Vargas, J. Bustos, M. Curilem, S. Salvo, H. Miranda Figure 1: Different regulation function: a) the Hill function, b) Step function and c) Logoid function. 2.4 Proposed Model In this case each regulation function is approximated by a Fuzzy Inference System (FIS), of the Takagi-Sugeno type. This FIS is capable of representing the nonlinear behavior of the regulation function making it possible to define linguistic labels to determine the concentrations of molecules. Moreover, one can assume the production rates as unknown and include their action within the FIS, which diminishes the prior knowledge required for modeling. Thus, the differential equations take the form: dxi dt = ∑ j fisi j(x j) − γixi (5) By comparing (5) to (1) we observe that it has been replaced αi j fi j by the fuzzy inference system f isi j, i.e. not only approaching the regulation function but also including the production rates. For the design of the FIS it should mainly be considered the characteristics of the membership func- tions, the number of them, the fuzzy rules, and the output functions. The latter implies that when design- ing the fuzzy inference system a good knowledge of the phenomenon is required, including aspects such as the ranges of concentrations, the fuzzy rules, and so on. That is why we use a training network called ANFIS [16] to get a fuzzy system that approximates the ODEs-based model. This network is trained with experimental data and is capable of adapting, using a hybrid learning algorithm, the characteristics of membership functions and output functions so to reduce the error between the experimental data and data generated by the FIS. Figure 2, shows a fuzzy inference system with 2 inputs x and y, 2 membership functions for each input (A, A, B and B), 2 fuzzy rules which consequences are f  and f , and 1 output f . Among the most important factors to be considered for the training of the network are the type and number of membership functions, the maximum amount of training epochs and the error goal. We must also consider that the data for training must provide sufficient information to model the dynamics of the system. The training network allows complementing the phenomenological model with the information obtained from the experimental data. Consequently, the model is considered a hybrid model or grey box model. In this work the proposed model is called ODE-FIS model due to its characteristics. 2.5 Implementation in a real system All models are compared representing the Lac Operon in E. Coli. The Lac operon is a very well stud- ied process in the bacterium Escherichia coli [10], [22], [24], and although broadly it appears simple, in reality it can be modeled so detailed that you can include more than 100 biochemical reactions. In this paper we use the model of the lac operon shown in [24]. Fuzzy Logic in Genetic Regulatory Network Models 367 Figure 2: Structure of a ANFIS network: a) Takagi-Sugeno type Fuzzy Inference System, b) ANFIS representation of the system. The description of the process is as follows. The main source of carbon for the bacteria E. Coli is glucose. When glucose is not present in the environment cell, the bacterium is able to form glucose through lactose. For this, there is a regulatory mechanism that allows to synthesize the enzymes nec- essary to obtain glucose. This mechanism is called lac operon. This model describes the synthesis of glucose from lactose by the bacterium E Coli. This is due to the fact that in the absence of glucose, but in the presence of lactose, the bacteria activates the synthesis of β -galactosidase and Permeases. The β -galactosidase breaks down the lactose into alolactosa, glucose and galactose, being the Alolactosa the inducer in the operon regulation. Moreover permeases allow the passage of external Lactose towards the cell. An outline of the system as used in [7], is shown in Figure 3. The model presented in [24]is based on 5 nonlinear differential equations, where we can find 6 Hill functions. These equations describe the production of mRNA, β -galactosidase, alolactosa, lactose and permeases, also allowing to manipulate the external lactose and feed phosphate. Figure 3: Simplified outline of the lac operon in E. Coli As the model used is based on Hill functions, for comparison purposes it should be approximated by the models PLDE and ODE-FIS. In the case of PLDE-step model we use the same threshold values pre- sented by the ODEs-based model, and therefore it does not require a new design parameter. In the case of PLDE-Logoid model the thresholds do not vary, however we must design the value δi j, shown in (4). For doing this, the coefficient k must be calculated, such that δi j j = ki j, which in turn delivers the lowest value of steady state error with respect to the ODE models. In the case of the ODE-FIS model we train 6 regulation functions with data obtained from the model shown in [24], and which entries correspond to 368 C. Muñoz, F. Vargas, J. Bustos, M. Curilem, S. Salvo, H. Miranda states of the system. To cover a wide range of training conditions we reduce the external lactose concen- tration of 0.08 mM to 0 mM, through 4 negative steps, which in turn allows obtaining more information on the dynamics of the system. The structure of the model corresponds to 3 membership functions per entry, each one of the Gauss type, labeled Low, Medium and High referring to the concentration level of each entry. The training epochs are 80 and the error goal is 0. For comparison purposes we reproduce the 2 experiments shown in [24] using the 4 models described previously. In each case, both the expression of β -galactosidase and permeases are plotted. The first of these experiments consists on monitoring the changing states of the system in time for a given set of initial conditions, and where the external lactose level and the feeding phosphate rate are kept constant. In the second experiment we maintain a constant level of external lactose changing periodically the phos- phate feeding. For the comparison of β -galactosidase we also count with the experimental data [9] and [14] for the experiment 1 and data from [6]to experiment 2, which were provided by the authors of [24]. To compare, on a quantitative basis the approximations of models PLDEs and ODE-FIS, we have replicated experiment 1 and formulated a table with the steady state error (SSE) and the integrated square error (ISE) with respect to the ODEs-based model, considering a simulation time of 500. It was also implemented an experiment to assess whether the training of the network ANFIS was able to capture the main equilibrium points of the system. Thus, we performed a sweep sampling of the values of the external lactose and the initial conditions to see if the model ODE-FIS has the same equilibrium points as the ODEs model. All the experimental work is developed in the Matlab software, using primarily the Simulink, Fuzzy Logic and Optimization Toolboxes. 3 Results In the work of Yildirim and Mackey [24] the expression of β -galactosidase for two types of experi- ments is presented, being reproduced for all models. The standard profile of β -galactosidase, is shown in Figure 4 for the ODEs-based model, the PLDE-Step model, the PLDE-Logoid model, the ODE-FIS model, and the experimental data of the work of [9] and [14]. It is noted that all models evolve to the same steady state except PLDE-Step, which clearly does not represent adequately the profile of β - galactosidase. In addition to the β -galactosidase we present the dynamics of the permeases, standardized and shown in Figure 5. Once again the poor performance of the PLDE-Step model is repeated. We also stress the monitoring overshoot achieved by the ODE-FIS model. Figure 6 shows the normalized profile of β -galactosidase for all models, in addition to the experi- mental work of [6]. This figure shows the similarity between the experimental data, and the ODE and ODE-FIS models, but not PLDEs models (having a lower yield), particularly the PLDE-Step model. For the experiment 2, the permeases level is followed properly for the ODE-FIS model, as shown in Figure 7. In addition, the models PLDEs again show a poor performance. In order to determine which of the approaches better represents the model based on ODEs we obtain error rates by simulating the experiment 1. Table 1 shows the steady state error (SSE) and the integrated square error (ISE) for a time simulation of 500. Fuzzy Logic in Genetic Regulatory Network Models 369 Figure 4: Comparison of the dynamics of β -galactosidase for different models depending on the condi- tions of the experiment 1. Figure 5: Comparison of the dynamics of permeases for different models depending on the conditions of the experiment 1. 370 C. Muñoz, F. Vargas, J. Bustos, M. Curilem, S. Salvo, H. Miranda Figure 6: Comparison of the dynamics of β -galactosidase for different models depending on the condi- tions of the experiment 2. Figure 7: Comparison of the dynamics of permeases for different models depending on the conditions of the experiment 2. SSE ISE β -gal Permease β -gal Permease PLDE - Step 7.300 e-4 1.503 e-2 2.272 e-4 1.100 e-1 PLDE - Logoid 2.739 e-8 3.349 e-7 1.030 e-6 9.080 e-4 ODE - FIS 5.257 e-7 1.080 e-5 3.314 e-9 6.677 e-6 Table 1: Steady state error with respect to ODEs model Fuzzy Logic in Genetic Regulatory Network Models 371 To demonstrate the use of rules and linguistic labels we mention the case of a term describing the behavior of the alolactosa, which depends on 2 inputs; internal lactose (L) and β -galactosidase (B). Given that we define 3 fuzzy sets ( Low, Medium, High), there are 8 possible rules () that connect the 2 inputs. As an example we mention 3 rules that were obtained from the training process: R1-If L is High and B is Low then the influence of the term is null R2-If L is Medium and B is Low then the influence of the term is null R3-If L is Low and then B is Low then the influence of the term is null The label null is associated to the output function of the Takagi-Sugeno system, and corresponds to a value of 0. When analyzing these rules we observe that if the concentration of β -galactosidase is low, the associated term does not influence the production of alolactosa, which is consistent with reality. Due to the fact that these 3 rules depend mainly on the value of B, they can be edited and replaced by a single one having the form: R- then the influence of the term is null In addition to editing rules, you can edit the features of the membership functions. The toolbox of Matlab allows you to graphically edit the membership functions, without the need of a detailed mathe- matical knowledge of these functions. In addition, using Simulink you can see the level of activation of each rule and the degree of membership of the states in the fuzzy sets in the inference systems, all this while the simulation takes place. Figure 8: Equilibrium points of ODEs model and its comparison with the equilibrium points found for ODE-FIS model. With regards to the stability of the ODE-FIS model, Figure 8 shows the equilibrium points for Alo- lactosa as external lactose level, as shown in [24]. Additionally the figure also shows the steady states of the ODE-FIS model. Most of the equilibrium points of ODEs model are also equilibrium points of the ODE-FIS model. It must be noticed that for each value of the external lactose in the ODE-FIS model we do not see more than 1 equilibrium point. On the contrary, in the ODEs model we find up to 3 points. 372 C. Muñoz, F. Vargas, J. Bustos, M. Curilem, S. Salvo, H. Miranda 4 Summary and Conclusions The flexibility that delivers the fuzzy logic and the capacity of training provided by ANFIS allowed to represent the non-linear system behaviour with enough similarity to the obtained with ODE, showing a better performance than the PLDE models, even without describing the dynamic transient problems of interacting molecules. It should be mentioned that the PLDE models can be enhanced with optimization techniques and/or artificial intelligence tool to better design parameters θi j and δi j as needed. However this requires additional algorithms, not necessarily trivial. The ANFIS network allowed to obtain (from the training data) all the information needed to describe not only the transient state of the performed experiments, but also was able to achieve a large quantity of stable points of ODEs. This shows the great training capacity of the network, and the flexibility of the fuzzy inference system. The fuzzy logic has proved to be an important tool due to its ability to represent non-linear systems, its friendly language to express knowledge and the ability to incorporate and edit fuzzy rules. In addi- tion, complementing the fuzzy logic with an artificial neural network for training (ANFIS) turns to be a powerful tool for obtaining knowledge from experimental data, suggesting the development of new tech- niques based both on fuzzy logic as in the networks of training and differential equations. The natural step now is to work with larger regulatory networks, adressing the criteria for modeling them in such a way to obtain an acceptable representation of biological phenomena without compromising the viability of its computational implementation, while striving to maintain a simple and understandable language that allows systems analysis from a qualitative perspective. Bibliography [1] G. Acuña, E. Cubillos, Development of a Matlab Toolbox for the Design of Grey-Box Neural Mod- els, International Journal of Computers, Communications and Control, Vol. 1, pp. 7-14, 2006. [2] D. Akçay, Inference Of Switching Networks By Using A Piecewise Linear Formulation, Institute of Applied Mathematics, METU, MSc thesis, 2005 [3] F. A. Cubillos, G. Acuña and E. L. Lima, Real-Time Process Optimization Based on Grey-Box Neural Models, Brazilian Journal of Chemical Engineering, Vol. 24, pp. 433-443, 2007. [4] H. De Jong, Modeling and Simulation of Genetic Regulatory Systems: A Literature Review, Journal of Computacional Biology, Vol. 9, pp. 67-1003, 2002. [5] N. Friedman, M. Linial, I. Nachman, and D. Pe’er, Using Bayesian Networks to Analyze Expression Data, Journal of Computational Biology, Vol. 7, pp. 601-620, 2000. [6] J. Gebert, N. Radde, and G. Weber, Modeling gene regulatory networks with piecewise linear differ- ential equations, European Journal of Operational Research, in press, 2006. [7] B. C. Goodwin, Oscillatory behaviour in enzymatic control process, Adv. Enz. Regul. Vol. 3, pp. 425-438, 1969. [8] A. Halász, V. Kumar, M. Imielinski, C. Belta, O. Sokolsky, S. Pathak, and H. Rubin, Analysis of lactose metabolism in E.coli using reachability analysis of hybrid systems, IET Systems Biology, Vol. 1, pp. 130-148, 2007. [9] S. Kim, J. Kim, and C. Kwang-Hyun, Inferring gene regulatory networks from temporal expression profiles under time-delay and noise, Computational Biology and Chemistry, Vol. 31, pp. 239-245, 2007. Fuzzy Logic in Genetic Regulatory Network Models 373 [10] W. A. Knorre, Oscillation of the rate of synthesis of β -galactosidase in Escherichia Coli ML 30 and ML 308, Biochem. Biophys. Res. Commun, Vol. 31, pp. 812-817, 1968. [11] A. Kremling, K. Bettenbrock, B. Laube, K. Jahreis, W. Lengeler, and E. Gilles, The Organiza- tion of Metabolic Reaction Networks III. Application for Diauxic Growth on Glucose and Lactose, Metabolic Engineering, Vol 3, pp. 362-379, 2001. [12] R. Linden, and A. Bhaya, Evolving fuzzy rules to model gene expression, BioSystems, Vol. 88, pp. 76-91, 2007. [13] B. Lee, J. Yen, L. Yang, and J. Liao, Incorporating Qualitative Knowledge in enzyme kinetic Models Using Fuzzy Logic, Biotechnology and Bioengineering, Vol. 62, pp. 722-729, 1999. [14] H. McAdams, and L. Shapiro, Circuit Simulation of Genetic Networks, Science, Vol. 269, pp. 650-656, 1995. [15] S. Pestka, B. L. Daugherty, V. Jung, K. Hotta, and R. K. Pestka, Anti-mRNA: specific inihbition of translation of single mRNA molecules, Proc. Natl. Acad. Sci. USA, Vol. 81, pp. 7525-7528, 1984. [16] H. Ressom, P. Natarajan, R. S. Varghese, and M. T. Musavi, Applications of fuzzy logic in ge- nomics, Fuzzy Sets and Systems, Vol. 152, pp. 125-138, 2005. [17] J. Shing, and R. Jang, ANFIS: Adaptive-Network-Based Fuzzy Inference System, Trans. on Sys- tems, Man and Cybernetics, Vol. 23, pp. 665-685, 1993. [18] P. Smolen, D. Baxter, and J. Byrne, Modeling Transcriptional Control in Gene Networks - Methods, Recents Results, and Future Directions, Bulletin of Mathematical Biology, Vol. 62, pp. 247-292, 2000. [19] B. A. Sokhansanj, and J. P. Fitch, URC Fuzzy Modeling and Simulation of Gene Regulation, 23rd Annual Internacional Conference of the IEEE Engineering in Medicine and Biology, Instanbul, Turkey, 2001. [20] A. Tepeli, and A. Hortaçsu, A fuzzy logic approach for regulation in flux balance analysis, Bio- chemical Engineering Journal, in press, 2007. [21] F. Vargas, C. Muñoz, and J. Bustos, Fuzzy Logic in Gene Regulatory Networks Models, 1st In- ternacional CGNA Workshop, Utilizacion Of Novel And Sustainable Plant Products In Aqua-Feeds, Temuco, Chile, 2008. [22] S. Vinterbo, E. Y. Kim, and L. Ohno-Machado, Small, fuzzy and interpretable gene expression based classifiers, Bioinformatics, Vol. 21, pp. 1964-1970, 2005. [23] P. Wong, S. Gladney, and J. Keasling, Mathematical Model of the lac Operon: Inducer Exclusion, Catabolite Repression, and Diauxic Growth on Glucose and Lactose, Biotechnol. Prog, Vol. 13, pp. 132-143, 1997. [24] P. Woolf, and Y. Wang, A fuzzy logic approach to analyzing gene expression data, Physiol Ge- nomics, Vol. 3, pp. 9-15, 2000. [25] N. Yildirim, and M. Mackey, Feedback Regulation in the Lactose Operon: A Matematical Modeling Study and Comparison with Experimental Data, Biophysical Journal, Vol. 84, pp. 2841-2851, 2003. [26] X. Zhu, and J. Xu, Estimation of time varying parameters in nonlinear systems by using dynamic optimization, Industrial Electronics Society. IECON 2005. 31st Annual Conference of IEEE, 2005.