CHEMICAL ENGINEERING TRANSACTIONS, VOL. 86, 2021
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Sauro Pierucci, Jiří Jaromír Klemeš
Copyright © 2021, AIDIC Servizi S.r.l.
ISBN 978-88-95608-84-6; ISSN 2283-9216
DOI: 10.3303/CET2186151
Paper Received: 17 August 2020; Revised: 21 January 2021; Accepted: 9 May 2021
Please cite this article as: Lisci S., Gitani E., Mulas M., Tronci S., 2021, Modeling a Biological Reactor Using Sparse Identification Method, Chemical Engineering Transactions, 86, 901-906 DOI:10.3303/CET2186151

Modeling a Biological Reactor using Sparse Identification Method

Silvia Lisci (a), Elisa Gitani (a), Michela Mulas (b), Stefania Tronci (a,*)

(a) Dipartimento di Ingegneria Meccanica, Chimica e dei Materiali, Università degli Studi di Cagliari, Cagliari, Italy
(b) Department of Teleinformatics Engineering, Federal University of Ceará, Campus of Pici, Fortaleza (Ceará), Brazil
stefania.tronci@dimcm.unica.it.com

In this work, a model-based controller for a fermentation bioreactor has been developed. By simulating a model of the process that acts as a virtual plant, input-output data have been generated and used to identify the system with the sparse identification of nonlinear dynamics methodology. The identified model is then used in a model-based algorithm to control the bioreactor temperature, where the manipulated action is obtained by solving a constrained nonlinear optimization problem that minimizes the mismatch between the predicted trajectory and the desired one. Good performance has been obtained with the proposed control strategy for both set-point changes and disturbance rejection.

1. Introduction

Industrial biological processes are considered an important technological asset for the production of biochemicals and biofuels.
Considerable effort has been devoted to developing mathematical models for different types of biotechnological processes, to be used for design, operation, optimization, scale-up and model-based control. However, although biological processes are widely used, they have not reached the same level of development as traditional chemical processes, particularly with regard to automatic control solutions. Although bioreactors are relatively simple to operate, the complex network of reactions involved in microorganism growth makes their control very challenging (Wang et al., 2018). Even slight changes in raw-material characteristics or in process operating conditions can act as disturbances that affect the growth of the organisms and, in turn, the product quality. This issue becomes more demanding when considering the production of biochemicals or biofuels derived from waste, because the type of feedstock and the pre-treatment used to obtain fermentable sugars strongly affect the fermentation, which in turn affects the purification step (Robak and Balcerek, 2020).

The design of a proper control system can improve the efficiency of the biological system and reduce the effect of incoming disturbances. Unfortunately, this is not an easy task because of model uncertainties, the nonlinear nature of the system and the slow response of the process. The complexity of biological processes is mostly due to the presence of living organisms, whose metabolism is sensitive to process conditions such as temperature, pH and substrate concentrations (Spigno and Tronci, 2015; Pachauri et al., 2017). A good description of the input-output relationships of the bioreactor is one of the main ingredients for designing a proper controller and guaranteeing that the desired conditions are met. The model can be used to derive the control action required to minimize (or maximize) an objective function, with the possibility of including known operating constraints in the optimization problem.
Care must be taken when developing the bioreactor model, because an inaccurate model could lead to either poor performance or an unstable closed-loop system (Cogoni et al., 2014). On the other hand, first-principles models can be difficult to obtain, particularly for bioreactors, where it is very hard to understand and describe all the phenomena that occur. Data-driven identification can be a possible solution (Armenise et al., 2018; Taris et al., 2017), but algorithms such as neural networks may require a large amount of data and, for complex systems, may have a large number of parameters (weights). Brunton et al. (2016) proposed a solution for system identification based on sparse identification of nonlinear dynamics, which looks for the main functions describing the dynamics of the observed states. The procedure was successfully applied to different dynamical systems, considering measurement noise and partial observation of the states.

The main objective of the present work is to design a model-based controller for a fermentation bioreactor, used as a case study, exploiting the sparse identification approach. The bioreactor model was proposed by Nagy (2007) and involves a detailed kinetic model and equations that express the heat transfer, the dependence of the kinetic parameters on temperature, the mass transfer of oxygen, as well as the influence of temperature and ionic strength on the mass transfer coefficient. By simulating the model of the process (virtual plant), input-output data have been generated and used to identify the system using the sparse identification of nonlinear dynamics methodology (Brunton et al., 2016; Kaiser et al., 2018). Only a subset of the states is considered measured, as usually occurs in real plants. The study is aimed at obtaining a simple and parsimonious model that can be successfully applied in a nonlinear optimal control algorithm.

2.
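Fermentation reactor model

A virtual plant (Nagy, 2007) is used here to address the problem of developing a nonlinear model predictive controller for a bioreactor.

Figure 1. The continuous fermentation bioreactor

The system consists of six states, which are biomass concentration ($c_X$), ethanol concentration ($c_P$), substrate concentration ($c_S$), dissolved oxygen concentration ($c_{O_2}$), reactor temperature ($T_r$), and jacket temperature ($T_{ag}$), as reported in Eqs. (1-6). The reactant volume is constant.

$$\frac{dc_X}{dt} = \mu_X c_X \frac{c_S}{K_S + c_S} e^{-K_P c_P} - \frac{F_e}{V} c_X \tag{1}$$

$$\frac{dc_P}{dt} = \mu_P c_X \frac{c_S}{K_{S1} + c_S} e^{-K_{P1} c_P} - \frac{F_e}{V} c_P \tag{2}$$

$$\frac{dc_S}{dt} = -\frac{1}{R_{SX}} \mu_X c_X \frac{c_S}{K_S + c_S} e^{-K_P c_P} - \frac{1}{R_{SP}} \mu_P c_X \frac{c_S}{K_{S1} + c_S} e^{-K_{P1} c_P} + \frac{F_i}{V} c_{S,in} - \frac{F_e}{V} c_S \tag{3}$$

$$\frac{dc_{O_2}}{dt} = k_l a \left( c^*_{O_2} - c_{O_2} \right) - r_{O_2} - \frac{F_e}{V} c_{O_2} \tag{4}$$

$$\frac{dT_r}{dt} = \frac{F_i}{V} (T_{in} + 273) - \frac{F_e}{V} (T_r + 273) - \frac{r_{O_2} \Delta H_r}{32 \rho_r C_{heat,r}} - \frac{K_T A_T (T_r - T_{ag})}{V \rho_r C_{heat,r}} \tag{5}$$

$$\frac{dT_{ag}}{dt} = \frac{F_{ag}}{V_j} (T_{in,ag} - T_{ag}) + \frac{K_T A_T (T_r - T_{ag})}{V_j \rho_{ag} C_{heat,ag}} \tag{6}$$

$$\mu_X = A_1 e^{-E_{a1}/(R (T_r + 273))} - A_2 e^{-E_{a2}/(R (T_r + 273))}, \qquad c^*_{O_2} = 14.6 - 0.3943 T_r + 0.007714 T_r^2 - 0.0000646 T_r^3 \tag{7}$$

where $c_{S,in}$ is the glucose concentration in the feed flow; $F_i$ is the flow entering the bioreactor; $F_e$ is the outlet flow; $F_{ag}$ is the flow of the coolant agent; $r_{O_2} = \mu_{O_2} \frac{1}{Y_{O_2}} c_X \frac{c_{O_2}}{K_{O_2} + c_{O_2}}$ is the oxygen uptake rate; $K_i$ is the constant for oxygen consumption ($i = O_2$), of growth inhibition by ethanol ($i = P$), of fermentation inhibition by ethanol ($i = P1$), in the substrate term for growth ($i = S$), and in the substrate term for ethanol production ($i = S1$); $\mu_i$ is the maximum specific rate for oxygen consumption ($i = O_2$), fermentation ($i = P$), and growth ($i = X$); $R_{SP}$ ($R_{SX}$) is the ratio of ethanol produced per glucose consumed by fermentation (ratio of cells produced per glucose consumed for growth); $Y_{O_2}$ is the yield factor for biomass on oxygen; $K_T$ is the heat transfer coefficient; $A_T$ is the heat transfer area; $k_l a$ is the product of the mass-transfer coefficient and the specific area; $c^*_{O_2}$ is the equilibrium concentration of oxygen in the liquid phase; $V$ is the volume of the reaction mass; $V_j$ is the volume of the jacket; $C_{heat,r}$ ($C_{heat,ag}$) is the heat capacity of the reaction mass (heat capacity of the cooling agent); $\rho_r$ ($\rho_{ag}$) is the reaction mass density (cooling agent density). Other details on the model and the parameter values can be found in Nagy (2007). Table 1 reports the nominal conditions for the fermentation system.

Table 1: Input values at the nominal conditions

Inputs    $F_i$ [L/h]    $F_{ag}$ [L/h]    $T_{in,ag}$ [°C]    $c_{S,in}$ [g/L]
Value     51             18                15                  60

3.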
Model identification

A data-driven approach has been used to identify the model from the available outputs and inputs. The algorithm used in this work is the one proposed by Brunton et al. (2016), which determines the governing equations of the bioreactor by sparse identification of nonlinear dynamical systems. To obtain the model, the available $n$ states and $l$ manipulated inputs have been collected, obtaining a time series of length $m$ for each measurement. The sampled data can be arranged in matrices, as reported in Eq. (8).

$$X = \begin{bmatrix} x^T(t_1) \\ \vdots \\ x^T(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & \cdots & x_n(t_1) \\ \vdots & \ddots & \vdots \\ x_1(t_m) & \cdots & x_n(t_m) \end{bmatrix}, \qquad U = \begin{bmatrix} u^T(t_1) \\ \vdots \\ u^T(t_m) \end{bmatrix} = \begin{bmatrix} u_1(t_1) & \cdots & u_l(t_1) \\ \vdots & \ddots & \vdots \\ u_1(t_m) & \cdots & u_l(t_m) \end{bmatrix} \tag{8}$$

Defining the augmented vector $z = [x, u]$, the matrix of the collected data is reported in Eq. (9).

$$Z = \begin{bmatrix} z^T(t_1) \\ \vdots \\ z^T(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & \cdots & x_n(t_1) & u_1(t_1) & \cdots & u_l(t_1) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & \cdots & x_n(t_m) & u_1(t_m) & \cdots & u_l(t_m) \end{bmatrix} \tag{9}$$

A set of candidate functions needs to be selected for describing the dynamics of the observed states. In this work, only functions up to quadratic order have been considered, aiming at a simple model, as reported in Eq. (10).

$$\Theta(Z) = \begin{bmatrix} 1 & Z & Z^2 \end{bmatrix} \tag{10}$$

The term $Z^2$ collects the quadratic terms of the model, as represented in Eq. (11).

$$Z^2 = \begin{bmatrix} z_1^2(t_1) & z_1(t_1) z_2(t_1) & \cdots & z_2^2(t_1) & \cdots & z_{n+l}^2(t_1) \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ z_1^2(t_m) & z_1(t_m) z_2(t_m) & \cdots & z_2^2(t_m) & \cdots & z_{n+l}^2(t_m) \end{bmatrix} \tag{11}$$

The reconstructed dynamics has the form of Eq. (12).

$$\dot{X} = \Theta(X, U) \, \Xi \tag{12}$$

Each column of the matrix $\Xi$ is a sparse vector of coefficients determining which terms are active in the right-hand side of Eq. (12). The sparse solution has been calculated by means of the sequential threshold algorithm (Brunton et al., 2016), which starts with the least-squares solution for $\Xi$ and sets to zero all coefficients that are smaller than a given cutoff value ($\lambda$). Then, another least-squares solution for $\Xi$ is obtained using only the functions that have not been eliminated in the previous step. The new coefficients are thresholded again, and the procedure is repeated until convergence.

To obtain the data set for identification, the model (1-6) has been excited by appropriate input signals.
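The sequential threshold procedure can be illustrated with a minimal Python sketch. It is not the implementation used in this work: NumPy is assumed, the function names `quadratic_library` and `stlsq` are hypothetical, and the data come from a small made-up system with two states and one input rather than from the bioreactor. Exact derivatives are used here for clarity, whereas in practice they would be estimated numerically from the sampled states.

```python
import numpy as np


def quadratic_library(Z):
    """Return Theta(Z) = [1, Z, Z^2]; Z^2 holds every product z_i*z_j with i <= j."""
    Z = np.asarray(Z, dtype=float)
    m, p = Z.shape
    cols = [np.ones(m)]                                    # constant term
    cols += [Z[:, i] for i in range(p)]                    # linear terms
    cols += [Z[:, i] * Z[:, j] for i in range(p) for j in range(i, p)]  # quadratic terms
    return np.column_stack(cols)


def stlsq(theta, dxdt, lam, max_iter=10):
    """Sequentially thresholded least squares; lam is a scalar or one cutoff per state."""
    lam = np.asarray(lam, dtype=float)
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]       # initial least-squares solution
    for _ in range(max_iter):
        small = np.abs(xi) < lam                           # coefficients below the cutoff
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):                     # refit each state on surviving terms
            active = ~small[:, k]
            if active.any():
                xi[active, k] = np.linalg.lstsq(
                    theta[:, active], dxdt[:, k], rcond=None)[0]
        if np.array_equal(small, np.abs(xi) < lam):        # sparsity pattern unchanged
            break
    return xi


# Toy data (hypothetical system, not the bioreactor): two states and one input.
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(200, 3))                  # columns: x1, x2, u
x1, x2, u = Z.T
dxdt = np.column_stack([-1.5 * x1 + 0.8 * u,               # exact derivatives, for clarity
                        2.0 - 0.5 * x2 + 0.4 * x1 * u])
xi = stlsq(quadratic_library(Z), dxdt, lam=[0.1, 0.2])
print(np.round(xi, 3))
```

The rows of `xi` follow the column order of the library (1, x1, x2, u, x1², x1·x2, x1·u, x2², x2·u, u²), so each column of `xi` lists the active terms of one state equation. Passing one cutoff per state mirrors the use of a different $\lambda$ for each identified state.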
The quality of the identification depends on the quality of the data used to obtain the model, which means that the data should contain enough information on the state dynamics.

4. Control algorithm

The identified model was introduced in a model predictive control scheme as the internal model used for prediction during the calculation of the control move. In each sampling period, the current temperature measurement is obtained from the virtual plant, and the control action is calculated by solving the optimization problem (13) (Ogunnaike and Ray, 1994):

$$\min_{u(k)} \sum_{i=1}^{N_p} \left[ y^{sp}(k+i) - \hat{y}(k+i) - \left( y(k) - \hat{y}(k) \right) \right]^2 + w \left[ u(k-1) - u(k) \right]^2 \tag{13}$$

where $\hat{y}(k+i)$ is calculated by the identified model over the prediction horizon $N_p$, $y(k)$ is the measured temperature, $y^{sp}$ is the set-point trajectory, $u(k)$ is the coolant flow rate (manipulated input) and $w$ is a positive parameter, used to penalize the variation of the manipulated input in order to avoid aggressive control action.

5. Results

5.1 Model identification

Only four states have been considered observable ($c_S$, $c_{O_2}$, $T_r$, $T_{ag}$), together with only one manipulated variable ($F_{ag}$); therefore $n = 4$ and $l = 1$ in Eq. (8). The derivatives of the states in (12) have been obtained numerically using a fourth-order approximation. Different input sequences have been used to excite the system, and the best results in terms of reconstruction capability and robustness have been obtained with the pseudo-random binary sequence (PRBS) shown in Figure 2. Data have been sampled every 0.025 h, and the following cutoff values have been used to obtain the best prediction capability:

$$\lambda = \left[ \lambda_{c_S}, \lambda_{c_{O_2}}, \lambda_{T_r}, \lambda_{T_{ag}} \right] = [0.1, 0.2, 0.3, 0.5] \tag{14}$$

Figure 2. PRBS signal used to excite the bioreactor.

The comparison between the states calculated by integrating the model (1-6) and the ones predicted with the identification procedure described in Section 3 is reported in Figure 3 for the validation set. The identified model shows good prediction capability, as shown by the fact that the prediction and virtual plant curves are nearly superimposed.
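To make the structure of the optimization problem (13) in Section 4 concrete, the following Python sketch computes a single bounded control move and runs a short closed-loop simulation. It rests on several assumptions that are not taken from this work: the one-state temperature model and its coefficients are hypothetical stand-ins for the identified model, the same toy model also plays the role of the plant (so the correction term $y(k) - \hat{y}(k)$ is zero in this demo), the horizon and weight values are arbitrary, and a bounded golden-section search is swapped in for the gradient-based solver because the decision variable is a single scalar.

```python
import math


def model_rhs(T, u):
    """Hypothetical quadratic model: dT/dt = 3.0 + 0.03*u - 0.002*u*T (degC/h)."""
    return 3.0 + 0.03 * u - 0.002 * u * T


def predict(T0, u, n_steps, dt):
    """Explicit-Euler prediction of the temperature with the input held constant."""
    T, out = T0, []
    for _ in range(n_steps):
        T += dt * model_rhs(T, u)
        out.append(T)
    return out


def golden_section(f, lo, hi, tol=1e-6):
    """Minimise a unimodal scalar function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                       # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                             # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def mpc_move(T_meas, T_hat, u_prev, setpoint, n_pred=10, dt=0.05, w=1e-4,
             u_min=0.0, u_max=200.0):
    """Return the input minimising the cost of Eq. (13) within [u_min, u_max]."""
    bias = T_meas - T_hat                 # y(k) - y_hat(k)

    def cost(u):
        tracking = sum((setpoint - T_pred - bias) ** 2
                       for T_pred in predict(T_hat, u, n_pred, dt))
        return tracking + w * (u_prev - u) ** 2

    return golden_section(cost, u_min, u_max)


# Closed-loop demo: the toy model is used both as internal model and as plant.
T_plant = T_model = 32.0                  # initial temperature, degC
u, setpoint, dt = 100.0, 30.0, 0.05       # dt = 0.05 h, i.e. 3 minutes
moves = []
for _ in range(100):
    u = mpc_move(T_plant, T_model, u, setpoint, dt=dt)
    moves.append(u)
    T_plant += dt * model_rhs(T_plant, u)
    T_model += dt * model_rhs(T_model, u)
print(round(T_plant, 3), round(u, 2))
```

In this sketch the input is held constant over the whole prediction horizon, so only one decision variable is optimized per sampling period. With a mismatch between plant and internal model, the `bias` term would shift the predicted trajectory by the current prediction error, which is the role of $y(k) - \hat{y}(k)$ in Eq. (13).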
5.2 Control

The nonlinear predictive control has been implemented using the bioreactor model in (1-6) as the virtual plant. The constrained optimization problem was solved using a gradient-based method, where the minimum coolant flow rate was set equal to 0 L/h and the maximum one was set equal to 200 L/h, according to Nagy (2007). The sample time of the controller is 3 minutes. It is important to note that the reactor is controllable; it is therefore possible, in principle, to obtain the required product composition with the proper control action. The temperature set-point can thus be selected such that the required ethanol production is obtained.

Figure 3. Comparison between prediction (dashed red line) and actual values (black continuous line) of substrate concentration (upper panel, left), oxygen concentration (upper panel, right), reactor temperature (bottom panel, left), cooling agent temperature (bottom panel, right).

Control performance has been evaluated in terms of set-point tracking using the same temperature set-point variations proposed in Nagy (2007), who developed a neural network nonlinear temperature controller for the bioreactor. The comparison shows that the simple identification algorithm and the parsimonious model used in the present work lead to satisfactory results (Figure 4), with the controller able to efficiently drive the system to the set-point.

Figure 4. Proposed controller performance for set-point changes. Controlled reactor temperature (left panel, continuous line), temperature set-point (left panel, dashed line) and manipulated variable (right panel).

The controller has also been evaluated in the presence of disturbance variations, obtained by varying the inlet temperature of the cooling agent (Figure 5, right panel, right y-axis).
Observing the left panel of Figure 5, it can be noticed that the deviation of the reactor temperature from the set-point at 30 °C is very small, thanks to the optimal trajectory obtained for the manipulated variable, as reported in the right panel of Figure 5 (left y-axis).

Figure 5. Proposed controller performance for disturbance rejection. Controlled reactor temperature (left panel, continuous line), temperature set-point (left panel, dashed line). Manipulated variable (right panel, continuous line, left y-axis) and inlet coolant agent temperature (right panel, dashed line, right y-axis).

6. Conclusions

One of the most demanding issues in the development of advanced control systems for bioreactors is obtaining an accurate input-output model. This is because of the complex phenomena occurring in the biosystem and the lack of adequate monitoring tools. This problem has been addressed here by exploiting the sparse identification of nonlinear dynamics algorithm, which has been used to identify the nonlinear model of a bioreactor for ethanol production by fermentation of glucose. The aim was to obtain a parsimonious model with a limited number of measured states that could be used to develop a model-based controller, where the manipulated variable action was calculated by solving a nonlinear optimization problem. The identified model gave a good reconstruction of the observed states, which led to good performance of the optimal model-based controller. Some issues need to be addressed in the future, such as the introduction of measurement noise and the development of a MIMO optimal controller to guarantee the product quality even in the presence of disturbances that can affect the conversion (e.g., variation of pH, presence of contaminants in the reactor feed, variation of inlet substrate concentration).
References

Armenise, G., Vaccari, M., Di Capaci, R.B., Pannocchia, G., 2018, An Open-Source System Identification Package for Multivariable Processes, In 2018 UKACC 12th International Conference on Control (CONTROL) (pp. 152-157), IEEE.
Cogoni, G., Tronci, S., Baratti, R., Romagnoli, J.A., 2014, Controllability of semibatch nonisothermal antisolvent crystallization processes, Industrial and Engineering Chemistry Research, 53(17), 7056-7065.
Brunton, S.L., Proctor, J.L., Kutz, J.N., 2016, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 113(15), 3932-3937.
Kaiser, E., Kutz, J.N., Brunton, S.L., 2018, Sparse identification of nonlinear dynamics for model predictive control in the low-data limit, Proceedings of the Royal Society A, 474(2219), 20180335.
Lisci, S., Grosso, M., Tronci, S., 2020, A Geometric Observer-Assisted Approach to Tailor State Estimation in a Bioreactor for Ethanol Production, Processes, 8(4), 480.
Nagy, Z.K., 2007, Model based control of a yeast fermentation bioreactor using optimally designed artificial neural networks, Chemical Engineering Journal, 127(1-3), 95-109.
Ogunnaike, B.A., Ray, W.H., 1994, Process dynamics, modeling and control, Oxford University Press.
Pachauri, N., Singh, V., Rani, A., 2017, Two degree of freedom PID based inferential control of continuous bioreactor for ethanol production, ISA Transactions, 68, 235-250.
Robak, K., Balcerek, M., 2020, Current state-of-the-art in ethanol production from lignocellulosic feedstocks, Microbiological Research, 126534.
Spigno, G., Tronci, S., 2015, Development of hybrid models for a vapor-phase fungi bioreactor, Mathematical Problems in Engineering, 2015.
Taris, A., Grosso, M., Brundu, M., Guida, V., Viani, A., 2017, Application of combined multivariate techniques for the description of time-resolved powder X-ray diffraction data, Journal of Applied Crystallography, 50, 451-461.
Wang, P., Tong, H., Shao, H., 2018, Application of Computer Monitoring Technology in Industrial Ethanol Production and Fermentation, Chemical Engineering Transactions, 71, 409-414.