CHEMICAL ENGINEERING TRANSACTIONS
VOL. 96, 2022

A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it

Guest Editors: David Bogle, Flavio Manenti, Piero Salatino
Copyright © 2022, AIDIC Servizi S.r.l.
ISBN 978-88-95608-95-2; ISSN 2283-9216
DOI: 10.3303/CET2296047

Paper Received: 9 December 2021; Revised: 21 June 2022; Accepted: 4 July 2022
Please cite this article as: Del Duca V., Chirone R., Coppola A., Scala F., Salatino P., 2022, Application of Multivariate Statistical Analysis for
Pyrolysis Process Optimization, Chemical Engineering Transactions, 96, 277-282  DOI:10.3303/CET2296047

Application Of Multivariate Statistical Analysis For Pyrolysis 
Process Optimization 

Vincenzo Del Duca^a,*, Roberto Chirone^b, Antonio Coppola^a, Fabrizio Scala^a,b, Piero Salatino^a,b
^a STEMS, Consiglio Nazionale delle Ricerche, Piazzale Vincenzo Tecchio 80, 80125 Napoli, Italy
^b DICMaPI, Università degli Studi di Napoli Federico II, Piazzale Vincenzo Tecchio 80, 80125 Napoli, Italy
vincenzodelduca@stems.cnr.it  

Identifying the most efficient biomass valorization paths is vital for reaching the 2030 target for Renewable
Energy Sources consumption. In this context, within a national project named ‘Biofeedstock’, a multivariate
statistical technique, Canonical Correlation Analysis (CCA), is applied to derive specific correlations that
describe, quantitatively and qualitatively, the outputs of the fast pyrolysis process.
The database used for the CCA contains 59 observations and was built from literature data on non-catalytic
fast pyrolysis in fluidized beds in the temperature range 450-550 °C. The results show that the CCA describes
the process with a fair degree of confidence. However, it has two main drawbacks: first, its dependence on how
the dataset is constituted and, second, its ability to identify only linear correlations between inputs and outputs.

1. Introduction 

Concerns about environmental burdens and security of energy supply are stimulating the exploitation of
residual biomass for the production of biochemical and biofuel platforms (IEA, 2019). In December 2018, the
European Commission published the new renewable energy directive 2018/2001/EU, known as RED II (Council,
2018; European Union, 2009). The overall target for Renewable Energy Sources consumption by 2030 was
raised to 32%, and a minimum share of 14% of renewable energy consumed in road and rail transportation
is targeted by 2030. Furthermore, the lower bounds on the contribution of advanced biofuels as a share of final
energy consumption in the transport sector are set at 0.2% in 2022, 1% in 2025 and 3.5% in 2030 (Littlejohns
et al., 2018). Substituting fossil fuels with biofuels aims at minimizing the environmental burdens related to both
production and consumption (Littlejohns et al., 2018), as well as at decreasing net CO2 emissions (Colling
Klein et al., 2018). Biofuels are classified as first-, second- and third-generation based on the carbon
source of the biomass feedstock (Ziolkowska, 2020). First-generation biofuels, which are produced directly from
food crops, are attractive from a techno-economic perspective, as they show higher conversion efficiency and
lower costs than the others. However, societal constraints such as food vs fuel competition are driving
regulations and markets toward the exploitation of residual biomass and non-food crops, namely second- and
third-generation biofuels, i.e. cellulosic/waste biomass and algae (Hirani et al., 2018).
Moreover, engineering supply systems that deliver affordable, high-quality biomass or “biofeedstock” is a
challenge for the emerging bioenergy industry (Lamers et al., 2015). Biomass feedstocks are distributed over
broad spatial and temporal scales and have widely different physical and chemical properties (Razik et al.,
2019). “Biofeedstock” is a national project funded by the Italian Ministry of University and Research, comprising
12 Italian industrial and academic partners and aimed at developing smart technology platforms for
residual biomass valorization paths. The basic idea behind the project is the development of extended supply
chains based on decentralized biomass harvesting and preprocessing stages for the production of
“biofeedstocks”, namely biogenic energy carriers.


“Biofeedstocks” may conform to specification standards so that they represent tradable commodities. They can
eventually be upgraded at centralized processing sites or biorefineries to generate end products (biofuels
and biochemicals) of commercial interest. In the frame of the Biofeedstock project, viable conversion routes
belong to either thermochemical or biochemical pathways. The thermochemical pathways include slow, fast and 
catalysed pyrolysis, gasification, torrefaction and hydrothermal liquefaction, while the biochemical pathways 
include anaerobic and aerobic fermentation of organic substrates. The comparative assessment of alternative 
valorization strategies may be accomplished by assuming relevant objective functions expressing the yield and 
quality of the biofeedstock as well as the fate of pollutant precursors.  
Clearly, the different processes can be compared only if models describing the performance of
each process, in terms of quantity and quality of the products, are available. Generally, models adopted for
process simulations are based on equations that describe the chemical-physical phenomena. This approach
offers high reliability; however, developing models based on physicochemical relations is expensive in
terms of time and money. An alternative is to apply statistical analysis to
datasets compiled from the scientific literature. This approach provides less insight and a lower-quality process
description than the first; however, it can be useful for building ‘light’ tools for
decision-support systems related to biomass valorisation. This work focuses on the identification
of specific correlations for the fast pyrolysis process, as a first step toward building
the decision-support system. The fast pyrolysis model was implemented by means of
multivariate statistical analysis, in particular Canonical Correlation Analysis (CCA). The database
used for the statistical analysis was compiled from data presented in the scientific literature.

2. Methodology 

Canonical Correlation Analysis (CCA), described for the first time by Harold Hotelling in 1936 (Hotelling, 1992;
see footnote 1), is widely used to extract the correlated patterns between two sets of variables, x and y. x generally
represents the input matrix with size (n × I), where n is the number of observations and I the number of specific input
variables; conversely, y represents the output matrix with size (n × J), where J is the number of specific output
variables.
CCA looks at two sets of variables for modes of maximum correlation between the two sets. Thus, CCA sits at 
the top of a hierarchy of regression models which are able to manage multiple predictors (inputs) and multiple 
predictands (outputs). If x is the set of predictors and y the predictands, then CCA can be used to predict y 
when new observations of x become available. The method finds linear combinations of the original variables, 

$$v_m = a_m^T x' = \sum_{i=1}^{I} a_{m,i}\, x'_i, \qquad m = 1, \dots, \min(I, J) \tag{1a}$$

and

$$w_m = b_m^T y' = \sum_{j=1}^{J} b_{m,j}\, y'_j, \qquad m = 1, \dots, \min(I, J) \tag{1b}$$

by projecting them onto coefficient vectors $a_m$ and $b_m$, which are chosen such that each pair of the new variables
$v_m$ and $w_m$, called canonical variates, exhibits maximum correlation, while being uncorrelated with the projections
of the data onto any of the other identified patterns. In other words, CCA identifies new variables that maximize
the interrelationships between two data sets in this sense. The vectors of linear combination weights, $a_m$ and
$b_m$, are called the canonical vectors. The number of pairs, M, of canonical variates that can be extracted from
the two data sets is equal to the smaller of the dimensions of x and y. The canonical vectors $a_m$ and $b_m$ are the
choices that result in the canonical variates having the following properties:

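$$\mathrm{Corr}(v_1, w_1) \ge \mathrm{Corr}(v_2, w_2) \ge \dots \ge \mathrm{Corr}(v_M, w_M) \ge 0 \tag{2a}$$

$$\mathrm{Corr}(v_k, w_m) = \begin{cases} r_{C_m}, & k = m \\ 0, & k \ne m \end{cases} \tag{2b}$$

$$\mathrm{Corr}(v_k, v_m) = \mathrm{Corr}(w_k, w_m) = 0, \qquad k \ne m \tag{2c}$$

and

$$\mathrm{Var}(v_m) = \mathrm{Var}(w_m) = 1, \qquad m = 1, \dots, M \tag{2d}$$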

 
1 The reference refers to the reprinting of the original paper in the book ‘Breakthroughs in Statistics. Springer Series in 
Statistics’ of 1992. 



Equation 2a states that each of the M successive pairs of canonical variates exhibits no greater correlation than
the previous pair. The correlations between the pairs of canonical variates are called the canonical
correlations, $r_C$, and are collected in the diagonal matrix $[R_C]$:

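$$[R_C] = \begin{bmatrix} r_{C_1} & 0 & 0 & \cdots & 0 \\ 0 & r_{C_2} & 0 & \cdots & 0 \\ 0 & 0 & r_{C_3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & r_{C_M} \end{bmatrix} \tag{3}$$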

Equations 2b and 2c state that each canonical variate is uncorrelated with all the other canonical variates except
its specific counterpart in the m-th pair; finally, Equation 2d states that each of the canonical variates has variance
equal to 1.
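The properties in Equations 2a to 2d can be checked numerically. The following is a minimal Python/NumPy sketch, not the code used in this work: it computes CCA with a QR decomposition followed by a singular value decomposition, one common way to obtain the canonical vectors. The function name `cca` and the synthetic data (random numbers with the same shape as the dataset, n = 59, I = 5, J = 4) are assumptions made only for illustration.

```python
import numpy as np

def cca(x, y):
    """Return canonical vectors a, b, canonical correlations r and variates v, w."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)              # centred inputs x'
    yc = y - y.mean(axis=0)              # centred outputs y'
    qx, rx = np.linalg.qr(xc)            # orthonormal basis of the input space
    qy, ry = np.linalg.qr(yc)            # orthonormal basis of the output space
    u, r, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    scale = np.sqrt(n - 1)               # gives unit sample variance (Eq. 2d)
    a = np.linalg.solve(rx, u) * scale   # canonical vectors a_m as columns
    b = np.linalg.solve(ry, vt.T) * scale
    return a, b, r, xc @ a, yc @ b

# Synthetic stand-in data (assumption): 59 observations, 5 inputs, 4 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(59, 5))
y = x @ rng.normal(size=(5, 4)) + 0.5 * rng.normal(size=(59, 4))

a, b, r, v, w = cca(x, y)

# Correlation matrix of all canonical variates [v_1..v_M, w_1..w_M].
c = np.corrcoef(np.hstack([v, w]), rowvar=False)
print("canonical correlations:", np.round(r, 3))
print("Corr(v, w) is diag(r):", np.allclose(c[:4, 4:], np.diag(r)))
print("unit variances:", np.allclose(v.var(axis=0, ddof=1), 1.0))
```

The off-diagonal blocks of `c` reproduce the diagonal matrix of Equation 3, while the diagonal blocks are identity matrices, as required by Equations 2c and 2d.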
The basic idea behind forecasting with CCA is straightforward: simple linear regressions are constructed that
relate the predictand canonical variates $w_m$ to the predictor canonical variates $v_m$:

$$\hat{w}_m = \hat{\beta}_{0,m} + \hat{\beta}_{1,m}\, v_m, \qquad m = 1, \dots, M \tag{4}$$

where $\hat{\beta}_{0,m} = 0$, because the CCA is calculated from the centered data x’ and y’, and $\hat{\beta}_{1,m} = r_{C_m}$, because the
canonical variates are scaled to have unit variance, so the regression slopes are simply equal to the
corresponding canonical correlations.
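The forecasting step can be sketched in Python/NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used here: the helper names `fit_cca` and `predict` and the synthetic data are hypothetical, and the back-transformation from the predicted canonical variates to the original outputs assumes J ≤ I, so that the matrix of canonical vectors b is square and invertible.

```python
import numpy as np

def fit_cca(x, y):
    """Fit CCA on training data (QR + SVD); assumes J <= I and full-rank data."""
    n = x.shape[0]
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    qx, rx = np.linalg.qr(x - x_mean)
    qy, ry = np.linalg.qr(y - y_mean)
    u, r, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    a = np.linalg.solve(rx, u) * np.sqrt(n - 1)     # (I x M) canonical vectors
    b = np.linalg.solve(ry, vt.T) * np.sqrt(n - 1)  # (J x M), square when M = J
    return x_mean, y_mean, a, b, r

def predict(x_new, model):
    """Predict outputs for new inputs using the regressions of Eq. 4."""
    x_mean, y_mean, a, b, r = model
    v = (x_new - x_mean) @ a                  # predictor canonical variates
    w_hat = r * v                             # zero intercept, slope r_Cm
    return w_hat @ np.linalg.inv(b) + y_mean  # back to the original outputs

# Synthetic stand-in data (assumption): 59 observations, 5 inputs, 4 outputs.
rng = np.random.default_rng(1)
x = rng.normal(size=(59, 5))
y = x @ rng.normal(size=(5, 4)) + 0.5 * rng.normal(size=(59, 4))
x_new = rng.normal(size=(3, 5))

model = fit_cca(x, y)
y_hat = predict(x_new, model)
print(np.round(y_hat, 3))
```

When all M canonical pairs are retained and J ≤ I, this prediction coincides with ordinary multiple linear regression of each output on the inputs, which is why the resulting model can be written as one linear equation per output.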
The database used for the statistical analysis was compiled from data presented in the scientific literature;
specifically, only papers reporting pyrolysis tests in fluidized beds without any catalyst were considered
(Chai et al., 2020; Christoforou et al., 2018; Garcia-Perez et al., 2010; Greenhalf et al., 2013; Iisa et al., 2016;
Jung et al., 2008; Ly et al., 2019, 2020; Mullen et al., 2018; Paasikallio et al., 2014; Williams and Nugranad, 2000;
Zhang et al., 2009). Furthermore, only tests carried out in the temperature range of 450-550 °C were collected,
because this range is representative of the optimum for the yield of bio-liquid (see footnote 2), giving a dataset
with 59 observations. The input data x are combinations of the ultimate and proximate analyses of the
raw biomass, while the output data y are the bio-liquid and bio-char yields and the H/C and O/C weight ratios in
the bio-liquid. Table 1 summarizes the input and output variables used for CCA.

 
Table 1: Variables of x and y used for CCA statistical analysis.

| x data | Definition | y data | Definition |
|---|---|---|---|
| M/CM | Moisture to Combustible matter weight ratio of raw biomass on dry basis | w% Bio Liquid | Weight yield of bio liquid |
| A/CM | Ash to Combustible matter weight ratio of raw biomass on dry basis | w% Bio Char | Weight yield of bio char |
| H/C | Hydrogen to Carbon weight ratio of raw biomass on dry basis | (O/C)BL | Oxygen to Carbon weight ratio in the bio liquid |
| N/C | Nitrogen to Carbon weight ratio of raw biomass on dry basis | (H/C)BL | Hydrogen to Carbon weight ratio in the bio liquid |
| O/C | Oxygen to Carbon weight ratio of raw biomass on dry basis | | |

The CCA was implemented in the MATLAB™ environment using the command
canoncorr(x,y), which computes the canonical vectors $a_m$ and $b_m$ and the canonical variates $v_m$ and $w_m$. The
performance indicators MAE, MSE, SAE, MAPE and R2 (see footnote 3) were calculated to evaluate the
correlations between x and y but, for simplicity, only MAPE is reported here.
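The performance indicators can be sketched in Python/NumPy as below. The formulas are the standard textbook definitions and are an assumption here, since the exact variants used in this work are not stated; the function name `performance_indicators` and the three sample values are hypothetical and are not taken from the dataset.

```python
import numpy as np

def performance_indicators(y_true, y_pred):
    """Standard error metrics between experimental and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    abs_err = np.abs(err)
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return {
        "MAE": abs_err.mean(),                        # mean absolute error
        "MSE": np.mean(err ** 2),                     # mean square error
        "SAE": abs_err.sum(),                         # sum absolute error
        "MAPE": 100.0 * np.mean(abs_err / np.abs(y_true)),  # in percent
        "R2": 1.0 - ss_res / ss_tot,                  # coefficient of determination
    }

# Hypothetical yields in w% (illustration only).
metrics = performance_indicators([50.0, 60.0, 40.0], [55.0, 57.0, 42.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAPE is scale-free, which makes it convenient for comparing outputs of very different magnitude, such as yields in w% and elemental ratios below 1.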

3. Results 

Table 2 reports the linear correlations and the corresponding MAPE obtained by CCA from the x and y data. In general, good
prediction performance has been obtained for all outputs, as shown by the MAPE values, which are lower than
10% except for (O/C)BL, for which the MAPE is around 13%. Figure 1 shows a comparison between experimental and
predicted values for all outputs considered. Interestingly, the results show that the bio-liquid production yield is highly
dependent on the presence of ash (A/CM). In particular, the larger the ash content, the less bio-liquid
and the more bio-char are produced.

 
2 The bio liquid corresponds to the liquid organic phase + the pyrolytic water. 
3 MAE=mean absolute error; MSE=mean square error; SAE= sum absolute error; MAPE=mean absolute percentage error; 
R2=coefficient of determination. 



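Table 2: x-y correlations from CCA Analysis

| Correlations | MAPE (%) | Eq. |
|---|---|---|
| $w\%\ \text{Bio Liquid} = 1.36 \times \frac{M}{CM} - 20.92 \times \frac{A}{CM} - 262.55 \times \frac{H}{C} - 46.74 \times \frac{N}{C} - 26.09 \times \frac{O}{C} + 118.63$ | 7.66 | (5) |
| $w\%\ \text{Bio Char} = 81.30 \times \frac{M}{CM} + 21.35 \times \frac{A}{CM} + 199.60 \times \frac{H}{C} - 38.64 \times \frac{N}{C} + 12.25 \times \frac{O}{C} - 23.05$ | 9.11 | (6) |
| $\left(\frac{O}{C}\right)_{BL} = -2.02 \times \frac{M}{CM} + 0.60 \times \frac{A}{CM} - 4.77 \times \frac{H}{C} - 4.22 \times \frac{N}{C} - 0.08 \times \frac{O}{C} + 1.53$ | 13.27 | (7) |
| $\left(\frac{H}{C}\right)_{BL} = -0.27 \times \frac{M}{CM} + 0.07 \times \frac{A}{CM} - 0.42 \times \frac{H}{C} - 0.58 \times \frac{N}{C} + 0.04 \times \frac{O}{C} + 0.16$ | 7.27 | (8) |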
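The correlations of Table 2 (Equations 5 to 8) can be evaluated for a given biomass with a short Python/NumPy sketch such as the one below. The coefficients are those of Table 2; the function name `predict_outputs` and the example composition are hypothetical, chosen only to illustrate the calculation, and do not correspond to any observation in the dataset.

```python
import numpy as np

# Order of the inputs (Table 1): M/CM, A/CM, H/C, N/C, O/C of the raw biomass.
OUTPUTS = ["w% Bio Liquid", "w% Bio Char", "(O/C)BL", "(H/C)BL"]

# One row of coefficients per output, taken from Eqs. 5-8 of Table 2.
COEFFS = np.array([
    [1.36, -20.92, -262.55, -46.74, -26.09],   # Eq. 5
    [81.30, 21.35, 199.60, -38.64, 12.25],     # Eq. 6
    [-2.02, 0.60, -4.77, -4.22, -0.08],        # Eq. 7
    [-0.27, 0.07, -0.42, -0.58, 0.04],         # Eq. 8
])
INTERCEPTS = np.array([118.63, -23.05, 1.53, 0.16])

def predict_outputs(x):
    """Evaluate the four linear correlations for one biomass composition."""
    x = np.asarray(x, dtype=float)
    return dict(zip(OUTPUTS, COEFFS @ x + INTERCEPTS))

# Hypothetical biomass (assumption): M/CM, A/CM, H/C, N/C, O/C weight ratios.
example = [0.08, 0.02, 0.12, 0.01, 0.85]
pred = predict_outputs(example)
for name, value in pred.items():
    print(f"{name}: {value:.4f}")
```

Because the correlations are linear, they should be applied only within the range of compositions and temperatures (450-550 °C) covered by the dataset.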

 
The effect of ash can be explained as follows: i) the ash, as a non-pyrolyzable fraction, remains in the solid state,
so a higher ash content (typical of agricultural residues) increases the production of the solid phase to
the detriment of the others; ii) the ash has catalytic properties that favour cracking of the
pyrolysis vapours, thus producing a smaller amount of bio-liquid. Furthermore, a higher carbon
content in the raw biomass also has a positive effect on the production of bio-liquid: lower ratios of H/C,
N/C and O/C tend to increase the yield of bio-liquid, while the situation is almost the reverse for the solid phase.
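As a worked illustration of the ash effect, the A/CM coefficients of Equations 5 and 6 give the predicted change in the yields for a change in ash content with all other inputs held fixed. The increment Δ(A/CM) = 0.05 used below is a hypothetical value chosen only for illustration, not a case from the dataset.

```latex
\begin{align*}
\Delta(w\%\ \text{Bio Liquid}) &= -20.92 \times \Delta\!\left(\frac{A}{CM}\right) = -20.92 \times 0.05 \approx -1.05 \\
\Delta(w\%\ \text{Bio Char})   &= +21.35 \times \Delta\!\left(\frac{A}{CM}\right) = +21.35 \times 0.05 \approx +1.07
\end{align*}
```

In these linear correlations the two changes nearly cancel, which is consistent with ash shifting mass from the liquid to the solid product.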

 
Figure 1: Comparison of the predicted value of the CCA model with experimental values. 

Regarding the O/C ratio in the bio-liquid, a significant effect of the M/CM, H/C and N/C ratios can be noted, whereas
the O/C of the raw biomass does not seem to be very relevant. This result is perhaps attributable to the nature of the
macro-components making up the biomass, to their respective decomposition pathways and to the role that moisture can
play in these pathways. Conversely, H/C in the bio-liquid appears to have a roughly equal dependence on all
inputs.



4. Conclusions 

The present work investigated the applicability of statistical analysis for describing thermochemical processes
with simple linear correlations. The CCA method represents a first attempt to create simple tools for evaluating
the performance of different processes for residual biomass valorisation. In particular, CCA was used to
describe the process outputs quantitatively and qualitatively. The results show that the predicted values of the
bio-liquid and bio-char yields, and of the O/C and H/C ratios, are within 20% of their experimental
values. The main advantage of CCA is its simple implementation. Its main limitation is that
it captures only linear correlations between inputs and outputs; despite this, the results can be considered acceptable
in view of the trade-off between prediction performance and computational time.
Future work will look for prediction models offering the best compromise between prediction
efficiency and computation speed, in order to develop an efficient decision-support system and a platform for the
comparative assessment of alternative pathways for the production of bio-based fuels and chemicals from raw
biomass residues and their blends. Other techniques, such as machine learning, can be used for more complex
predictions based on non-linear correlations. Furthermore, the applicability of all techniques based on data mining
depends strictly on the quality of the dataset used. In this sense, techniques based on natural language
processing for text generation can help in compiling large datasets.

Acknowledgments 

This study has been carried out in the frame of the project PON ARS01_00985: Biofeedstock: Development of 
Integrated Technological Platforms for Residual Biomass Exploitation, funded by the Italian Ministry for 
University and Research. 

References 

Chai, M., He, Y., Nishu, Sun, C., & Liu, R. (2020). Effect of fractional condensers on characteristics, compounds 
distribution and phenols selection of bio-oil from pine sawdust fast pyrolysis. Journal of the Energy Institute, 
93(2), 811–821. doi: 10.1016/j.joei.2019.05.001 

Christoforou, E. A., Fokaides, P. A., Banks, S. W., Nowakowski, D., Bridgwater, A. V., Stefanidis, S., 
Kalogiannis, K. G., Iliopoulou, E. F., & Lappas, A. A. (2018). Comparative Study on Catalytic and Non-
Catalytic Pyrolysis of Olive Mill Solid Wastes. Waste and Biomass Valorization, 9(2), 301–313. doi: 
10.1007/s12649-016-9809-5 

Colling Klein, B., Bonomi, A., & Maciel Filho, R. (2018). Integration of microalgae production with industrial 
biofuel facilities: A critical review. Renewable and Sustainable Energy Reviews, 82, 1376–1392. doi: 
10.1016/j.rser.2017.04.063 

Council (2018). Directive (EU) 2018/2001 of the European Parliament and of the
Council of 11 December 2018 on the promotion of the use of energy from renewable sources (recast).

European Union. (2009). DIRECTIVE 2009/28/EC of the European Parliament and of the Council. Official 
Journal of the European Union, 5(1), L 140/16-L 140/62. 

Garcia-Perez, M., Shen, J., Wang, X. S., & Li, C. Z. (2010). Production and fuel properties of fast pyrolysis 
oil/bio-diesel blends. Fuel Processing Technology, 91(3), 296–305. doi: 10.1016/j.fuproc.2009.10.012 

Greenhalf, C. E., Nowakowski, D. J., Harms, A. B., Titiloye, J. O., & Bridgwater, A. V. (2013). A comparative 
study of straw, perennial grasses and hardwoods in terms of fast pyrolysis products. Fuel, 108, 216–230. 
doi: 10.1016/j.fuel.2013.01.075 

Hirani, A. H., Javed, N., Asif, M., Basu, S. K., & Kumar, A. (2018). A review on first-and second-generation 
biofuel productions. In Biofuels: Greenhouse Gas Mitigation and Global Warming (pp. 141–154). Springer. 

Hotelling, H. (1992). Relations Between Two Sets of Variates. In S. Kotz & N. L. Johnson (Eds.), Breakthroughs
in Statistics: Methodology and Distribution (pp. 162–190). New York, NY: Springer New York. doi:
10.1007/978-1-4612-4380-9_14

IEA (2019). World Energy Outlook 2019. IEA, Paris. https://www.iea.org/reports/world-energy-outlook-2019

Iisa, K., French, R. J., Orton, K. A., Yung, M. M., Johnson, D. K., Ten Dam, J., Watson, M. J., & Nimlos, M. R. 
(2016). In Situ and ex Situ Catalytic Pyrolysis of Pine in a Bench-Scale Fluidized Bed Reactor System. 
Energy and Fuels, 30(3), 2144–2157. doi: 10.1021/acs.energyfuels.5b02165 

Jung, S., Kang, B., & Kim, J. (2008). Production of bio-oil from rice straw and bamboo sawdust under various
reaction conditions in a fast pyrolysis plant equipped with a fluidized bed and a char separation system.
Journal of Analytical and Applied Pyrolysis, 82, 240–247. doi: 10.1016/j.jaap.2008.04.001



Lamers, P., Roni, M. S., Tumuluru, J. S., Jacobson, J. J., Cafferty, K. G., Hansen, J. K., Kenney, K., Teymouri, 
F., & Bals, B. (2015). Techno-economic analysis of decentralized biomass processing depots. Bioresource 
Technology, 194, 205–213. doi: 10.1016/j.biortech.2015.07.009 

Littlejohns, J., Rehmann, L., Murdy, R., Oo, A., & Neill, S. (2018). Current state and future prospects for liquid 
biofuels in Canada. Biofuel Research Journal, 5(1), 759–779. 

Ly, H. V., Choi, J. H., Woo, H. C., Kim, S. S., & Kim, J. (2019). Upgrading bio-oil by catalytic fast pyrolysis of 
acid-washed Saccharina japonica alga in a fluidized-bed reactor. Renewable Energy, 133, 11–22. doi: 
10.1016/j.renene.2018.09.103 

Ly, H. V., Park, J. W., Kim, S. S., Hwang, H. T., Kim, J., & Woo, H. C. (2020). Catalytic pyrolysis of bamboo in 
a bubbling fluidized-bed reactor with two different catalysts: HZSM-5 and red mud for upgrading bio-oil. 
Renewable Energy, 149, 1434–1445. doi: 10.1016/j.renene.2019.10.141 

Mullen, C. A., Tarves, P. C., Raymundo, L. M., Schultz, E. L., Boateng, A. A., & Trierweiler, J. O. (2018). 
Fluidized Bed Catalytic Pyrolysis of Eucalyptus over HZSM-5: Effect of Acid Density and Gallium 
Modification on Catalyst Deactivation. Energy and Fuels, 32(2), 1771–1778. doi: 
10.1021/acs.energyfuels.7b02786 

Paasikallio, V., Lindfors, C., Kuoppala, E., Solantausta, Y., Oasmaa, A., Lehto, J., & Lehtonen, J. (2014). 
Product quality and catalyst deactivation in a four day catalytic fast pyrolysis production run. Green 
Chemistry, 16(7), 3549–3559. doi: 10.1039/c4gc00571f 

Razik, A. H. A., Khor, C. S., & Elkamel, A. (2019). A model-based approach for biomass-to-bioproducts supply
chain network planning optimization. Food and Bioproducts Processing, 118, 293–305.

Williams, P. T., & Nugranad, N. (2000). Comparison of products from the pyrolysis and catalytic pyrolysis of rice 
husks. Energy, 25(6), 493–513. doi: 10.1016/S0360-5442(00)00009-8 

Zhang, H., Xiao, R., Huang, H., & Xiao, G. (2009). Comparison of non-catalytic and catalytic fast pyrolysis of 
corncob in a fluidized bed reactor. Bioresource Technology, 100(3), 1428–1434. doi: 
10.1016/j.biortech.2008.08.031 

Ziolkowska, J. R. (2020). Biofuels technologies: An overview of feedstocks, processes, and technologies. In 
Biofuels for a More Sustainable Future (pp. 1–19). Elsevier. 

 