www.biomathforum.org/biomath/index.php/biomath ORIGINAL ARTICLE Non-monotonicity of Fano factor in a stochastic model for protein expression with sequesterisation at decoy binding sites Michal Hojcka, Pavol Bokes Department of Applied Mathematics and Statistics Comenius University Bratislava, Slovakia michal.hojcka@fmph.uniba.sk, pavol.bokes@fmph.uniba.sk Received: 11 July 2017, accepted: 21 October 2017, published: 2 November 2017 Abstract—We present a stochastic model moti- vated by gene expression which incorporates unspe- cific interactions between proteins and binding sites. We focus on characterizing the distribution of free (i.e. unbound) protein molecules in a cell. Although it cannot be expressed in a closed form, we present three different approaches to obtain it: stochastic simulation algorithms, system of ODEs and quasi- steady-state solution. Additionally we use a large- system-size scaling to derive statistical measures of approximate distribution of the amount of free protein, such as the Fano factor. Intriguingly, we report that while in the absence of or in the excess of decoy binding sites the Fano factor is equal to one (suggestive of Poissonian fluctuations), in the intermediate regimes of moderate levels of binding sites the Fano factor is greater than one (suggestive of super-Poissonian fluctuations). We support and illustrate all of our results with numerical simula- tions. Keywords-Gene Expression; Master Equation; Small Noise Approximation; Stochastic Simulation I. INTRODUCTION The number of proteins and other species present in the biological processes inside the cells such as gene expression is usually small [6], [28]. Therefore, deterministic modeling of such reactions can be quite inaccurate and we often turn to stochastic methods [18]. They account for discrete number of molecules and can easily be simulated through stochastic simulation algo- rithms, in particular the Gillespie algorithm [10], [11]. Being a very timely topic, gene expression spurred a revival of interest in Markovian models of chemical kinetics, e.g. [25]. We assume that the protein is produced with a constant rate and that the rate of its decay is proportional to the number of proteins. We study the protein dynamics in pres- ence of so-called decoy binding sites [17] on the DNA. Our model takes into account protein bind- ing/unbinding reactions with these binding sites. Similar models have already been studied previ- Copyright: c© 2017 Hojcka et al. This article is distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Citation: Michal Hojcka, Pavol Bokes, Non-monotonicity of Fano factor in a stochastic model for protein expression with sequesterisation at decoy binding sites, Biomath 6 (2017), 1710217, http://dx.doi.org/10.11145/j.biomath.2017.10.217 Page 1 of 17 http://www.biomathforum.org/biomath/index.php/biomath https://creativecommons.org/licenses/by/4.0/ https://creativecommons.org/licenses/by/4.0/ http://dx.doi.org/10.11145/j.biomath.2017.10.217 Michal Hojcka, Pavol Bokes, Non-monotonicity of Fano factor in a stochastic model for protein ... ously; in particular [9] investigated the model with protected complexes, i.e. the case when bound proteins were immune to degradation, showing that the steady-state distribution is Poissonian. Our model allows bounded proteins to degrade, which introduces additional noise into the model [2], [3]. For simplicity, we ignore effects of burst-like protein synthesis or transcriptional auto-regulation [2], [4], [26]. In section II we formulate the Master equation for the probability distribution of free protein; unfortunately, we are unable to find its solution in a closed form. However, biochemical reactions often operate on different timescales as was already thoroughly investigated in works such as [14], [12], [13]. Specifically, in the current context the interactions between the protein and its binding sites occur on a substantially faster timescale than the turnover of protein (by tran- scription and decay) does [1]. This allows us to successfully use singular perturbation methods [5], [22], [23] to obtain the quasi-steady-state solution to our problem. Obtaining the quasi-steady-state approximations in our model involves finding an equilibrium of binding/unbinding reaction, which is a specific case of a reversible bimolecular reac- tion studied by Laurenzi in [16]. In section III we expand the Master equation using the linear noise expansion as proposed in [29]. II. STOCHASTIC APPROACH The number of proteins expressed from a single gene can often be quite small [31]; it would therefore be inaccurate to use deterministic ap- proach, which treats reactants as continuous vari- ables. Instead we take a stochastic approach, in which we model each species as a discrete random variable and each reaction as a random event with a given probability to occur. Some protein species are present at even less than 10 copies per E. coli cell [28], therefore we also work with mean number of proteins below 100 in this paper. As we use discrete variables it is quite clear that we need to simulate these reactions through a simulation algorithm. Such algorithms gained a wider recognition after the work of Gillespie [10]. A. Model Description In this section we present a minimalistic model for gene expression. We neglect the effect of mRNA translation and production of proteins in bursts; instead we focus on the interaction between the proteins and decoy binding sites. Unlike in [26], we assume that bounded proteins are subject to degradation processes. Let us use the following notation for our vari- ables: X - protein (free or bound), Xf - free protein, Y - binding site, Yf - free binding site, C - complex (protein bound to the binding site). For the sake of simplicity we omit ’decoy’ from the binding site notation as we do not take into account any other binding sites. We assume that three reversible reactions can take place: 1) Protein production/decay. ∅ k−⇀↽− γ Xf 2) Protein binding/unbinding reaction. Xf + Yf k+−⇀↽− k− C 3) Decay of the complex (whereby a binding site is vacated). C γ −→ Yf We use upper-case letters in italics to represent a number of corresponding species throughout this paper. We reserve the corresponding lower-case letter as a notation for a concentration of a given species. In order to avoid confusion with X , we use N instead of Xf as the number of free protein. Although we mentioned five different variables, the problem is just two-dimensional, with the following straightforward conservation laws held between the variables: Biomath 6 (2017), 1710217, http://dx.doi.org/10.11145/j.biomath.2017.10.217 Page 2 of 17 http://dx.doi.org/10.11145/j.biomath.2017.10.217 Michal Hojcka, Pavol Bokes, Non-monotonicity of Fano factor in a stochastic model for protein ... Fig. 1: Simulation using the Gillespie algorithm. - Y is a known constant (the number of binding sites) - C = X −N (the number of complexes is the same as the number of bound protein) - Yf = Y −C = Y −X + N (the number of free binding sites is the same as the number of all binding sites without the complexes) Therefore we can express the Master Equation of the system in terms of X and N (with a constant total number of binding sites Y ): ṖX ,N = kPX−1,N−1 −kPX,N +γ(N +1)PX+1,N+1−γNPX,N +k+(N +1)(Y −X +N +1)PX,N+1 −k+N (Y −X +N )PX,N +k−(X−N +1)PX,N−1−k−(X−N )PX,N +γ(X−N +1)PX+1,N −γ(X−N )PX,N , (1) where PX ,N is the abbreviation of P(X(t) = X , Xf (t) = N ). We can use the Gillespie al- gorithm to simulate this process (parameters of the reactions are set to Y = 10, k = 3, γ = 0.1, k+ = 1, k− = 10 with no proteins at the Biomath 6 (2017), 1710217, http://dx.doi.org/10.11145/j.biomath.2017.10.217 Page 3 of 17 http://dx.doi.org/10.11145/j.biomath.2017.10.217 Michal Hojcka, Pavol Bokes, Non-monotonicity of Fano factor in a stochastic model for protein ... beginning: PX ,N (0) = δX ,0δN ,0. We illustrate the individual components of the process in Figure 1. B. Total Protein distribution Our first goal is to obtain the marginal distribu- tion of the total protein number. It can be obtained as the sum of PX ,N through all possible N ’s: PX = ∞∑ N =−∞ PX ,N . Let us use an abbreviation ∑ N for the sum through all integers. In order to derive an equation PX from (1) let us sum both sides of the equation through all N ’s. As the sum goes through all integers (if N < 0 is the probability is naturally equal to zero), we can use the fact that∑ N f(N ) = ∑ N f(N ± 1), obtaining ṖX =k ∑ N (PX−1,N −PX,N ) +γ ∑ N N (PX+1,N −PX,N) +γ ∑ N ((X−N +1)PX+1,N −(X−N )PX,N) , which simplifies to ṖX =k (PX−1−PX )+γ ((X +1)PX+1−XPX ) . (2) A system of differential equations in this form can be solved using the method of generating functions [15]. First we multiply both sides by sX and sum them through all integer X ’s. ∂ ∂t ∑ X sX P(X, t) =k (∑ X sX PX−1− ∑ X sX PX ) +γ ( (X +1) ∑ X sX PX +1−X ∑ X sX PX ) . (3) Now we can use the definition of the generating function G(s,t) = ∑ X sX PX and its derivative ∂G ∂s = ∑ X XsX−1PX in order to transform (3) into a partial differential equation: ∂G ∂t = k(s− 1)G + γ(1 −s) ∂G ∂s = (s− 1) ( kG−γ ∂G ∂s ) . (4) In the steady state we can omit the left-hand side of the equation (as ∂G ∂t = 0) and easily solve it using the separation of variables. We come to the solution G(s,∞) = e k γ (s−1), which can be recognized as the probability generating function of the Poisson distribution parametrized by λ = k γ . Using the notation 〈X〉 = λ for the distribution’s mean we can write PX (∞) = 〈X〉X e−〈X〉 X ! . Away from the steady state, (4) is an example of a non- homogeneous first-order linear partial differential equation, which can be solved for suitable initial conditions. Using a common extension of the method of characteristics for (quasi-)linear PDE’s (see e.g. [8]) we introduce an auxiliary function u = u(s,t,G), which satisfies ∂u ∂t + γ(s− 1) ∂u ∂s + k(s− 1)G ∂u ∂G = 0. The characteristic system of this equation has the form: ṫ = 1, ṡ = γ(s− 1), Ġ = k(s− 1)G, t(τ) = τ + C1 s(τ) = C2e γτ + 1 dG G = k(C2e γτ )dτ (5) By finding the functions which are constant on the characteristics, we obtain solutions in the form G(s,t) = ψ ( t− ln(s−1) γ ) ·e ks γ . We can specify the results for any particular initial conditions. Let us investigate the case when there is no protein at the beginning, i.e. P(x, 0) = δx,0, which gives us an initial condition G(s, 0) = 1 also for the generat- ing function. Using the initial condition we come to the solution G(s,t) = e− k γ (e−γt(s−1)+1)+ ks γ = e −k γ (s−1)(e−γt−1) , which is again the Poisson dis- tribution, just with different, time-dependent, pa- rameter meaning that 〈X (t)〉 = k γ · ( 1 −e−γt ) . Taking t → ∞ we can easily see that the time- dependent solution converges to the steady-state one. Biomath 6 (2017), 1710217, http://dx.doi.org/10.11145/j.biomath.2017.10.217 Page 4 of 17 http://dx.doi.org/10.11145/j.biomath.2017.10.217 Michal Hojcka, Pavol Bokes, Non-monotonicity of Fano factor in a stochastic model for protein ... C. Free Protein distribution Searching for an exact formula for the free protein distribution yields no apparent results: the master equation (1) does not have solution in the closed form (unless Y = 0). Therefore we have to look for alternative ways to obtain it. 1) Stochastic Simulation Algorithms: The first approach consists of using the techniques of stochastic simulation algorithms, in particular the Gillespie algorithms (see [7] for a practical guide). With their help we can simulate the time evolution of all species involved in a system of reactions. The simulation technique is based upon the follow- ing scheme: we generate two uniformly distributed random numbers, the first of which determines the time of the next reaction and the second one determines which reaction will occur. When we use a sufficient number of sample trajectories, we obtain a robust estimate for the distribution of any variable. The main problem of this approach is its limited computational capacity. In order to obtain a robust distribution of variables we have to run a substantial number of simulations; this can be very time-consuming and even unrealistic when calculating results for many different parameter sets. 2) System of ODEs: The second option is to transform the Master equation, which is an infinite system of equations, into a finite system of ordi- nary differential equations. The most straightfor- ward way to obtain this is to set threshold values of the discrete variables, replacing the unknown probability by zero whenever these thresholds are exceeded (see [20], in our case we set all PX ,N to zero, whenever X > 100). In this way, the overall sum of probability distribution is no longer conserved at one, but in return we get a finite system of differential equations. We use the MAT- LAB ode15s solver for stiff problems to calculate the solution. But this system is also very time- consuming, as the number of equations and the computational difficulty grows rapidly with raising the thresholds for the non-zero probabilities of PX ,N . 3) Quasi-steady-state solution: In order to ob- tain the solution in a closed form, which is as close to exact solution as possible, we proceed to obtain the so-called quasi-steady-state solution [24], [27]. We utilize the fact from biological background that binding/unbinding reactions are fast compared to protein production/degradation (k− � γ). In order to use this fact we perform the singular perturbation reduction [9] using the ratio γ/k− as a small parameter. Let us introduce new dimensionless time and the parameters: ε = γ k− , kb = k− k+ , t = τ γ . The master equation then reads ε d dτ PX ,N = ε k γ (PX−1,N−1 −PX ,N ) + ε ((N + 1)PX +1,N +1 −γNPX ,N ) + 1 kb ( (N + 1)(Y −X + N + 1)PX ,N +1 −N (Y −X + N )PX ,N ) + (X −N + 1)PX ,N−1 − (X −N )PX ,N + ε ((X −N + 1)PX +1,N − (X −N )PX ,N ) . (6) As ε tends to zero, we obtain an equation for the leading-order approximation of PX ,N . The number of total protein X is constant after this approximation, so that we can abbreviate PX ,N by PN , writing (N + 1)(Y −X + N + 1)PN +1 = (N (Y −X + N ) + kb(X −N )) PN− −kb(X −N + 1)PN−1. (7) Solving (7) subject to boundary conditions (see e.g. [16]) PN (N <0) = 0 and PN (N >X ∨N