CHEMICAL ENGINEERING TRANSACTIONS
VOL. 43, 2015
A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet
Chief Editors: Sauro Pierucci, Jiří J. Klemeš
Copyright © 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-34-1; ISSN 2283-9216

Use of Artificial Neural Networks to predict Aqueous Two-Phases System Optimal Conditions on Bromelain's Purification

Diego de Freitas Coêlho (a), Camila Alves Silva (a), Camila Sacconi Machado (a), Edgar Silveira (b), Elias Basile Tambourgi (a)

(a) Department of Chemical Engineering Systems, School of Chemical Engineering, State University of Campinas - UNICAMP - Av. Albert Einstein, 500. P.O. 6066, Zip Code: 13083-970. Campinas-SP, Brazil.
(b) Institute of Genetics and Biochemistry, Federal University of Uberlandia - UFU - Campus Umuarama - Bloco 2E - Sala 246 - 2º Piso, Av. Pará, 1720. Zip Code: 38400-902, Uberlândia - MG, Brazil.
dfcoelho@feq.unicamp.br

Bromelain is the name given to the group of endoproteases obtained from pineapple and from most plants belonging to the Bromeliaceae family. These enzymes have been widely studied across the world due to their physiological activity and biotechnological potential. As Brazil still cultivates over 60,000 hectares of pineapple, there is an optimistic trend towards recovering bromelain from agricultural residues (stalk and leaves) and fruit processing residues (stem and bark), leading to a fully integrated process that adds value to vegetal residues. Our previous studies applied Aqueous Two-Phase Systems (ATPS) and Fractional Precipitation to purify bromelain and achieved a purification factor and yield of 11.80 and 87.36, respectively. However, such studies were designed and analysed using Design of Experiments (DOE), which leads to an optimal condition but cannot accurately predict the complex phenomena of partitioning in ATPS.
This work is part of an initiative that aims to establish a protocol for calculating more accurate partitioning data through the use of Artificial Neural Networks (ANNs) over a dataset that is being continuously improved. The ANN determines the relationship between five input parameters (temperature, PEG molar mass, PEG concentration, ammonium sulphate concentration and sample dilution factor) and three output parameters (protein partition coefficient, activity partition coefficient and purification factor). The method applied a feed-forward neural network trained with the Levenberg-Marquardt algorithm and Bayesian regularization over the normalized experimental data. The resulting network demonstrated the reliability of the method, which combined datasets from different DOEs and obtained a regression coefficient (~0.99) and an error (MSE ~0.02) that are satisfactory for the amount of data used so far.

DOI: 10.3303/CET1543237
Please cite this article as: Coelho D.F., Silva C.A., Machado C.S., Silveira E.C., Tambourgi E.B., 2015, Use of artificial neural networks to predict aqueous two-phases system optimal conditions on bromelain's purification, Chemical Engineering Transactions, 43, 1417-1422

1. Introduction

The group of thiol-endopeptidases known as bromelain can be extracted from any plant belonging to the Bromeliaceae family (Heinicke and Gortner, 1957) and was originally used as a folk medicine by the aboriginal inhabitants of Central and South America to treat several sicknesses (Taussig and Batkin, 1988). These enzymes have proven therapeutic applications as an anti-inflammatory drug (Salas et al., 2008), in the treatment of allergic disease (Secor Jr et al., 2005), as a carcinopreventive agent (Harrach et al., 1994), and through antithrombotic and fibrinolytic activities (Maurer, 2001). Unlike most enzymes, bromelain is stable and highly active in both acid and alkaline solutions (which expands its range of possible applications) and retains its proteolytic activity even at 60 °C, when most enzymes denature (Bhattacharya and Bhattacharyya, 2009).

Their purification has been studied extensively in the last decade: while Harrach et al. (1995) applied Fast Protein Liquid Chromatography to isolate and characterise the enzymes, Rabelo et al. (2004) studied the use of Aqueous Two-Phase Systems, which have a higher throughput capacity than chromatography techniques, to purify the same enzymes. Over the years, researchers have investigated alternative bulk recovery techniques, such as expanded bed adsorption (Silveira et al., 2009) and fractional precipitation (Martins et al., 2014), but ATPS is by far the most employed: with reverse micelles (Umesh Hebbar et al., 2008), with high-speed counter-current chromatography (Yin et al., 2011), in combination with fractional precipitation (Coelho et al., 2012), and with an endless variety of salts and polymers, such as PEG/potassium phosphate (Ferreira et al., 2014) and PEO-PPO-PEO block polymers (Rabelo et al., 2004).

While researchers seem to have tried to exhaust the possible combinations of components and modes when using ATPS, they all lack a fast and reliable method to determine the best operational conditions. At present, no method can determine such conditions without time-consuming and laborious experimental work. ATPS characterisation relies on the empirical determination of purification parameters for every single modification of the system under study. One might use statistical methods (such as Design of Experiments) to reduce the experimental work, but these still fall short when handling trade-off problems such as a purification process. What if we could use a cluster of randomly distributed data, originally obtained to optimize specific parameters, for a much broader purpose?
That is exactly the purpose of this initiative: to combine all data generated through decades of research in a database that can be constantly improved.

2. Materials and Methods

2.1 ATPS Data acquisition

All experimental data were acquired in previous projects, in which we used Design of Experiments and Response Surface Methodology to optimize the parameters or to determine a specific operational condition. In this study we restricted the data to a limited number of variables, each within a specific range. The chosen input variables and their corresponding ranges are presented in Table 1. These variables were selected from studies in which their effects on the purification were evaluated; the ones presented here showed the greatest impact during the experiments.

Table 1: Input variables used in the neural model and their range

Input variables       Description                           Range
Temperature (°C)      Operational Temperature               5 - 25
MMPEG                 PEG Molecular Mass                    2,000 - 4,000 - 6,000
(NH4)2SO4 (m/m, %)    Concentration of Ammonium Sulphate    7 - 20
PEG (m/m, %)          PEG Concentration                     9 - 30
Dilution (%)          Dilution Factor                       25 - 50 - 75

2.2 Mathematical definition of Output Variables

As output variables, we chose the protein partition coefficient (KP), the enzymatic partition coefficient (KA) and the purification factor (PF), as described in Table 2. Coelho et al. (2013) describe in detail the equations for the chosen output variables.

Table 2: Output variables used in the neural model and their range.

Output variables    Description                        Minimum    Maximum
KP                  Protein Partition Coefficient      0          100
KA                  Enzymatic Partition Coefficient    0          100
PF                  Purification Factor                0          98.75

2.3 Results and Discussion

As no mathematical model can predict the complex nature of aqueous two-phase systems with sufficient accuracy, we decided to evaluate the application of Artificial Neural Networks (ANNs) to the modelling and prediction of partitioning parameters.
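The restriction of the pooled data to the ranges of Table 1, described in Section 2.1, can be sketched as a simple domain check. This is a minimal illustration in Python, not the procedure used in the original work: the dictionary keys and function names are hypothetical, and it assumes that the three values listed for PEG molecular mass and for dilution are discrete levels while the other variables are continuous ranges.

```python
# Illustrative encoding of Table 1; all key names are hypothetical.
CONTINUOUS_RANGES = {
    "temperature_c": (5.0, 25.0),          # operational temperature, deg C
    "ammonium_sulphate_pct": (7.0, 20.0),  # (NH4)2SO4, m/m %
    "peg_pct": (9.0, 30.0),                # PEG, m/m %
}

# Assumption: the three values listed in Table 1 are discrete levels.
DISCRETE_LEVELS = {
    "peg_molar_mass": {2000, 4000, 6000},
    "dilution_pct": {25, 50, 75},
}


def in_model_domain(sample):
    """Return True if a sample lies inside the Table 1 input domain."""
    in_ranges = all(
        lo <= sample[name] <= hi
        for name, (lo, hi) in CONTINUOUS_RANGES.items()
    )
    on_levels = all(
        sample[name] in levels
        for name, levels in DISCRETE_LEVELS.items()
    )
    return in_ranges and on_levels


def restrict_dataset(samples):
    """Keep only the samples that fall inside the Table 1 domain."""
    return [s for s in samples if in_model_domain(s)]


if __name__ == "__main__":
    inside = {"temperature_c": 15, "ammonium_sulphate_pct": 12,
              "peg_pct": 20, "peg_molar_mass": 4000, "dilution_pct": 50}
    outside = dict(inside, temperature_c=40)  # temperature above 25 deg C
    print(len(restrict_dataset([inside, outside])))
```

A check of this kind is useful when merging datasets from different experimental designs, since points outside the trained domain would force the network to extrapolate.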
Basically, an artificial neural network is a system composed of hundreds of units, called artificial neurons (AN) or processing elements (PE), which are connected by coefficients (weights) that constitute the neural structure and are arranged in layers, as can be seen in Figure 1 (Chrislb, 2005).

Figure 1: Diagram of an artificial neuron

When a set of input and output data is used to stimulate a "learning" network, the data are used to adjust each neuron's weights through successive changes in their values, so that the network implements and executes the desired functions (Brumatti, 2005) and applies the "knowledge" gained from past experiences to new problems or conditions. This study used Levenberg-Marquardt optimization as the training algorithm, together with Bayesian regularization to improve generalization and avoid overfitting. This gain is a consequence of the smaller weights calculated by the algorithm, which make the network response smoother (Foresee and Hagan, 1997).

As mentioned, the neural model used either a backpropagation network or a feedforward network coupled with the Levenberg-Marquardt and Bayesian regularization optimization algorithms, all available in the Neural Network Toolbox of the MATLAB® software (The MathWorks Inc., 2013). All variables were normalized between 0 and 0.9. The activation functions tested were the hyperbolic tangent function, the sigmoid function and a linear function. The neural networks were set up and trained by combining the different neural models and activation functions (as well as the number of neurons) in order to determine which topology converged fastest.

To estimate the deviation between the ANN results and the experimental data, we used the Mean Squared Error (MSE) and the regression coefficient (R), which are the most common parameters used in this kind of analysis (Beale et al.). The neural network is considered to fit the experimental data when the MSE tends to zero and R tends to 1.
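The normalization, network structure and fitting criteria described above can be illustrated with a short sketch. It is written in plain Python for readability and is not the MATLAB toolbox code used in this work: the function names are hypothetical, R is computed here as the Pearson correlation between targets and predictions (an assumption about how the regression coefficient is defined), and training with Levenberg-Marquardt and Bayesian regularization is deliberately left out.

```python
import math


def normalize(values, lo, hi, upper=0.9):
    """Min-max scale values from the range [lo, hi] to [0, upper]."""
    span = hi - lo
    return [upper * (v - lo) / span for v in values]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: one tanh hidden layer, then a linear output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


def mse(targets, predictions):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def regression_r(targets, predictions):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Temperature values scaled with the 5 - 25 deg C range of Table 1.
    print(normalize([5, 15, 25], lo=5, hi=25))
    # Toy network with one input, two hidden neurons and one output.
    print(forward([0.45], [[1.0], [-1.0]], [0.0, 0.0], [[0.5, 0.5]], [0.1]))
```

With these definitions, a perfect fit gives an MSE of zero and an R of 1, which is the acceptance criterion stated in the text.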
Among the results (Table 3) obtained from the topology optimization of the neural network, the best configuration is T3, which used 30 neurons in a single layer, with no additional intermediary layers. We compared the values of R and MSE as well as the convergence time, the latter being the main factor that made T3 better than T7. These results were obtained during the initial step of this research project and hence used a dataset with only 120 experiments, so the variance obtained is expected. Although Aqueous Two-Phase Systems have been used for decades, there is no model that can precisely predict any property of those systems without experimental data. This makes it even harder to find an appropriate approach to study them.

Table 3: Neural Networks training results using KP, KA and PF as output variables

T     (Nrn,Lyr)   R        MSE       KP        KA        PF        Inter. Function   Inter. Function 2   Output Function
T1    (10,1)      0.8862   0.14369   0.88394   0.91154   0.67951   Tansig            Purelin             Purelin
T2    (20,1)      0.9514   0.06371   0.95463   0.95836   0.87268   Tansig            Purelin             Purelin
T3    (30,1)      0.9846   0.02045   0.99418   0.98574   0.92548   Tansig            Purelin             Purelin
T4    (30,2)      0.6465   0.40178   0.63385   0.66119   0.49902   Tansig            Purelin             Purelin
T5    (30,2)      0.9807   0.02573   0.99149   0.98097   0.91795   Logsig            Purelin             Purelin
T6    (40,2)      0.9858   0.01901   0.99351   0.98713   0.93540   Tansig            Purelin             Purelin
T7    (50,2)      0.9854   0.01953   0.99307   0.98720   0.93223   Tansig            Purelin             Purelin
T8    (5,2)       0.7577   0.28486   0.75959   0.76642   0.54001   Tansig            Purelin             Purelin
T9    (10,2)      0.8814   0.14927   0.88072   0.90113   0.70156   Tansig            Purelin             Purelin
T10   (20,2)      0.9688   0.04160   0.98592   0.97862   0.80179   Tansig            Purelin             Purelin
T11   (30,3)      0.9840   0.02118   0.99243   0.98553   0.92806   Tansig            Purelin             Purelin
T12   (30,4)      No Convergence

Tansig: Hyperbolic Tangent, Purelin: Linear, Logsig: Sigmoidal, Nrn: Neurons, Lyr: Layers

In this set of simulations we tried to test an even larger number of combinations of the number of neurons, the number of intermediary layers and the activation functions, but most of them did not converge. Thus, the topology with 30 neurons and a hyperbolic tangent activation function provided the best results.

Figure 2: Convergence (A) and Regression (B) for the best topology obtained (T3, Table 3)

Figure 2 presents the fitting results obtained for the T3 topology, which returned the best results. Figure 3 presents the regression data using topology T3 for the output variables (KP, KA and PF). The model represents the experimental data well, but we still expect to be able to improve it by at least 5 %. The positive results are mainly due to Bayesian regularization, which improves the generalization capability of the model even over a reasonably wide operational range (Fileti et al., 2010).

Figure 3: Output test with the T3 topology neural network for KA, KP and PF, respectively

However, the study does not yet explain why we could not further decrease the MSE and the variation observed in Figure 2. At this point the network has only shown that it is able to correlate data from several experiments, and that we can improve the network and use it to obtain a better understanding of the process.

3. Conclusions

The network generated demonstrated the reliability of the method by modelling combined data from different experimental designs and obtaining a reasonable regression coefficient (~0.99) and error (MSE ~0.02). At this stage the neural network was able to model and predict the data handled with a certain precision. However, it is necessary to improve its robustness by increasing the amount of input data in the database used to train the network. When complete, the neural model will be able to predict operational points, analyse the influence of different factors and select conditions in which a trade-off is present.
Acknowledgements

The authors would like to acknowledge the financial support of FAPESP (São Paulo Research Foundation), PROPP-UFU (Dean of Research and Graduate Studies at the Federal University of Uberlândia) and CNPq (National Council for Scientific and Technological Development).

References

Beale, M. H., Hagan, M. T. & Demuth, H. B., Neural network toolbox 7.
Bhattacharya, R. & Bhattacharyya, D., 2009, Resistance of bromelain to SDS binding, Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, 1794, 698-708.
Brumatti, M., 2005, Redes neurais artificiais, Vitória, Espírito Santo.
Chrislb, 2005, Diagram of an artificial neuron, ArtificialNeuronModel_english.png (created by Chrislb) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons, http://commons.wikimedia.org/wiki/File:ArtificialNeuronModel_english.png.
Coelho, D., Silveira, E., Pessoa Junior, A. & Tambourgi, E., 2012, Bromelain purification through unconventional aqueous two-phase system (PEG/ammonium sulphate), Bioprocess and Biosystems Engineering, 35, 1-8.
Coelho, D. F., Silveira, E., Pessoa Junior, A. & Tambourgi, E. B., 2013, Bromelain purification through unconventional aqueous two-phase system (PEG/ammonium sulphate), Bioprocess and Biosystems Engineering, 36, 185-192.
Ferreira, J. F., Sbruzzi, D., Barros, K. V. G., Ehrhardt, D. D. & Basile, E., 2014, Purification of bromelain enzyme from curauá (Ananas erectifolius L.B. Smith) white variety, by aqueous two-phase system PEG 4000/potassium phosphate, J. Chem, 8, 395-399.
Fileti, A. M. F., Fischer, G. A. & Tambourgi, E. B., 2010, Neural modeling of bromelain extraction by reversed micelles, Brazilian Archives of Biology and Technology, 53, 455-463.
Foresee, F. D. & Hagan, M. T., 1997, Gauss-Newton approximation to Bayesian learning, Proceedings of the 1997 International Joint Conference on Neural Networks, Piscataway: IEEE, 1930-1935.
Harrach, T., Eckert, K., Schulze-Forster, K., Nuck, R., Grunow, D. & Maurer, H. R., 1995, Isolation and partial characterization of basic proteinases from stem bromelain, Journal of Protein Chemistry, 14, 41-52.
Harrach, T., Garbin, F., Munzig, E., Eckert, K. & Maurer, H. R., 1994, Bromelain: An immunomodulator with anticancer activity, European Journal of Pharmaceutical Sciences, 2, 164.
Heinicke, R. M. & Gortner, W. A., 1957, Stem bromelain - a new protease preparation from pineapple plants, Economic Botany, 11, 225-234.
Martins, B. C., Rescolino, R., Coelho, D. F., Zanchetta, B., Tambourgi, E. B. & Silveira, E., 2014, Characterization of bromelain from Ananas comosus agroindustrial residues purified by ethanol factional precipitation, Chemical Engineering Transactions, 37, 781-786.
Maurer, H. R., 2001, Bromelain: Biochemistry, pharmacology and medical use, Cellular and Molecular Life Sciences, 58, 1234-1245.
Rabelo, A. P. B., Tambourgi, E. B. & Pessoa, A., 2004, Bromelain partitioning in two-phase aqueous systems containing PEO-PPO-PEO block copolymers, Journal of Chromatography B, 807, 61-68.
Salas, C. E., Gomes, M. T. R., Hernandez, M. & Lopes, M. T. P., 2008, Plant cysteine proteinases: Evaluation of the pharmacological activity, Phytochemistry, 69, 2263-2269.
Secor Jr, E. R., Carson IV, W. F., Cloutier, M. M., Guernsey, L. A., Schramm, C. M., Wu, C. A. & Thrall, R. S., 2005, Bromelain exerts anti-inflammatory effects in an ovalbumin-induced murine model of allergic airway disease, Cellular Immunology, 237, 68-75.
Silveira, E., Souza-Jr, M. E., Santana, J. C. C., Chaves, A. C., Porto, A. L. F. & Tambourgi, E. B., 2009, Expanded bed adsorption of bromelain (E.C. 3.4.22.33) from Ananas comosus crude extract, Brazilian Journal of Chemical Engineering, 26, 149-157.
Taussig, S. J. & Batkin, S., 1988, Bromelain, the enzyme complex of pineapple (Ananas comosus) and its clinical application. An update, Journal of Ethnopharmacology, 22, 191-203.
The MathWorks Inc., 2013, MATLAB® software, version 8.1.0.604, Natick, Massachusetts: The MathWorks Inc.
Umesh Hebbar, H., Sumana, B. & Raghavarao, K. S. M. S., 2008, Use of reverse micellar systems for the extraction and purification of bromelain from pineapple wastes, Bioresource Technology, 99, 4896-4902.
Yin, L., Sun, C. K., Han, X., Xu, L., Xu, Y., Qi, Y. & Peng, J., 2011, Preparative purification of bromelain (EC 3.4.22.33) from pineapple fruit by high-speed counter-current chromatography using a reverse-micelle solvent system, Food Chemistry, 129, 925-932.