CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 76, 2019 

A publication of 

 
The Italian Association 
of Chemical Engineering 
Online at www.aidic.it/cet 

Guest Editors: Petar S. Varbanov, Timothy G. Walmsley, Jiří J. Klemeš, Panos Seferlis 
Copyright © 2019, AIDIC Servizi S.r.l. 

ISBN 978-88-95608-73-0; ISSN 2283-9216 

Novel Parameter Estimation Method for Molecular 

Reconstruction of Naphtha by Gamma Distribution 

Yu Rena, Zuwei Liaob,*, Jingyuan Suna, Binbo Jianga, Jingdai Wangb,  

Yongrong Yangb, Qing Wuc 

aZhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and 

 Biological Engineering, Zhejiang University, Hangzhou 310027, P. R. China  
bState Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 

 Hangzhou 310027, P. R. China  
cChina National Offshore Oil Corporation, Beijing 100010, P. R. China 

 liaozw@zju.edu.cn 

Molecular reconstruction is an effective tool to predict the detailed composition of complex mixtures under 

obtainable bulk properties and chemical details. With the introduction of statistical distribution, although the 

number of variables in the model is reduced, the ranges of statistical distribution parameters have a great 

influence on the prediction accuracy. A novel parameter estimation method using the analysis of the probability 

density plot of gamma distribution based on the basic available information is proposed to obtain parameter 

ranges with wide universality and non-specificity. The obtained ranges are used in a molecular reconstruction 

model to predict the bulk properties and molecular composition of naphtha. The model assumes molecular 

composition within each homologous series follows a gamma distribution against carbon number. The results 

of a naphtha sample show that the calculated bulk properties and molecular composition are in good agreement 

with the experimental values. 

1. Introduction 

Facing the increasingly serious energy crisis and environmental pressure, energy conservation and 

consumption reduction are important issues for the chemical industry. Other than process intensification tools 

(Yang et al., 2018), energy integration (Hong et al., 2018) and recovery (Li et al., 2019), system optimization 

(Hong et al., 2019) and integration (Lou et al., 2019), molecular management is an effective alternative tool to 

improve refinery efficiency. The basic idea of molecular reconstruction is to determine the detailed molecular 

composition of petroleum fractions through obtainable bulk properties and chemical details (Ren et al., 2019). 

Due to the complexity of petroleum composition, there are numerous variables in the model while available 

constraints are insufficient. To reduce the degree of freedom, the introduction of statistical distribution, which is 

an important feature of petroleum, has been widely adopted. Based on the Molecular type-homologous series 

(MTHS) matrix model (Peng, 1999), as shown in Figure 1, Wu and Zhang (2010) assumed that the compositions 

of the entries within each homologous series follow a gamma distribution with boiling point. The numbers of 

variables to represent molecular composition are reduced from the unknown fractions of each matrix entry to 

three parameters for the gamma distribution and one fraction of each homologous series. Therefore, the 

degrees of freedom are reduced dramatically. Liu (2015) followed the above method of using gamma distribution 

but improved the matrix form of the MTHS model by combining carbon numbers and pseudo-components in the 

matrix framework. Wang and Li (2017) used normal distribution instead of the gamma distribution to describe 

the statistical distribution within each homologous series in the proposed novel MTHS matrix. Recently, Cui et 

al. (2018) also used gamma probability density function to describe the fractions of species in each chemical 

family with a library containing 89 hydrocarbons and 81 heteroatom-containing species in their molecular 

representation of the petroleum gasoline fraction. 

 
                                                                                                                                                                 DOI: 10.3303/CET1976133 
 
 
Paper Received: 16/03/2019; Revised: 01/07/2019; Accepted: 02/07/2019 
Please cite this article as: Ren Y., Liao Z., Sun J., Jiang B., Wang J., Yang Y., Wu Q., 2019, Novel Parameter Estimation Method for Molecular 
Reconstruction of Naphtha by Gamma Distribution, Chemical Engineering Transactions, 76, 793-798  DOI:10.3303/CET1976133 
  

793


The introduction of statistical distribution results in reducing model variables. However, it brings a problem that 

the boundary of variables is difficult to determine during the optimization of the model. Noted that the tuning 

parameters for the optimization engine at this time are no longer the fractions of each molecule whose boundary 

fall within [0, 1], but the parameters of the statistical distribution whose boundary are wide enough to reach 

infinity (Cui et al., 2018). Therefore, reasonable ranges of parameters are critical to the accuracy of composition 

prediction. Wu and Zhang (2010) stated that to avoid the distribution between molecules within each 

homologous series being too narrow or too wide, the scale parameter α and the shape parameter β of gamma 

distribution are limited in the range of 1~20 and 19~100, respectively. However, the authors did not indicate how 

these ranges were determined. Wang and Li (2017) summarized the ranges of normal distribution parameters 

by analyzing the parameters of 16 naphtha and gasoline samples. These ranges were used to generate the 

initial value of distribution parameters, and the prediction calculation was optimized by the multi-population 

genetic optimization algorithm in this range too. By this way, the authors guaranteed the initial value and optimal 

value of distribution parameters obtained in the correct range, which could decrease the predicting error and 

calculating time largely. 

 
Figure 1: Molecular type-homologous series (MTHS) matrix (adapted from Wu and Zhang (2010)). 

In summary, the ranges of statistical distribution parameters and the estimation method are important in 

molecular reconstruction. If the initial ranges are obtained by parameter regression of existing molecular 

composition data of sample oils, it will result that a set of parameters can only be applied to specific oils. In order 

to solve the parameter estimation problem and ensure that the parameters ranges obtained have wide 

universality and are not specific to any oil characteristics, a novel complete parameter estimation procedure is 

proposed in this article. The pre-set ranges of parameters are determined from the analysis of the probability 

density plot of gamma distribution based on the basic available information. Using the range of parameters 

obtained above, the molecular reconstruction of a naphtha sample shows that the accuracy of the calculated 

bulk properties and molecular composition is within an acceptable range. 

2. Parameter estimation 

The gamma distribution function gampdf(x; α, β)is a two-parameter family of continuous probability distributions. 

The parameter α is called shape parameter, and the parameter β is called scale parameter:  

gampdf(x; α, β) =
𝑥𝛼−1𝑒 −𝑥 𝛽⁄

𝛽𝛼Γ(α)
     𝑓𝑜𝑟 𝑥 > 0 𝑎𝑛𝑑 𝛼, 𝛽 > 0  (1) 

Where 𝑥 is a random variable; Γ(α) is the gamma function. 

 
Figure 2: Probability density plots of gamma distributions. 

C0
C1
C2
.

.

.

.

C11
C12

.

.

.

.

C23
C24

.

.

.

nP iP nO iO N5 N6 2N 3N+ 1A 1A1N 1A2N 2A 2A1N 2A2N 3A 3A1N 4A 5A+ SI SII SIII SIV SV NI NII

Columns represent homologous series

Rows represent carbon numbers

Mass or molar fraction of molecule Sulphur and nitrogen

compounds

nP: normal Paraffins

iP: iso-Paraffins

O: Olefins

N: Naphthenes

A: Aromatics

S: mercaptans

SII: thiophenes & benzothiophenes

SIII: dibenzothiophenes not substituted

 at the 4 or 6 position

SIV: dibenzothiophenes substituted

 at the 4 or 6 position

SV: dibenzothiophenes substituted

 at both the 4 or 6 position

NI: basic nitrogen compounds

NII: non-basic nitrogen compounds

794


Figure 2 shows the probability density plots of gamma distributions. As can be seen from Figure 2, the 

parameters α and β simultaneously control the shape of the gamma distribution. According to the practical 

issues of molecular reconstruction, the shape of probability density plots of gamma distributions can be restricted 

according to the characteristics of the model. In this paper, it is assumed that the molecular composition within 

each homologous series follows a gamma distribution against carbon number, e.g., the random variable 𝑥 in 

Eq(1) stands for the carbon number of each matrix entry. Therefore, according to the common knowledge of 

naphtha composition distribution, the following constraints can be obtained. 

α𝑗 > 1  (2) 

β𝑗>0 (3) 

gampdf(CN𝑗
𝑚𝑖𝑛; α𝑗 , β𝑗 ) > 0  (4) 

gampdf(CN𝑗
𝑚𝑎𝑥; α𝑗 , β𝑗 ) > 0  (5) 

CN𝑗
𝑚𝑖𝑛 <(α𝑗 − 1)β𝑗<CN𝑗

𝑚𝑎𝑥   (6) 

gampdf((α𝑗 − 1)β𝑗 , α𝑗 , β𝑗 ) < 1  (7) 

Eq(2) and Eq(3) are the lower limits of parameters α and β of the homologous series j, respectively. Only when 

α>0, the gamma distribution is a unimodal function, which satisfies the requirements of the model. Eq(4) and 

Eq(5) control that the function values of the starting point (CN𝑗
𝑚𝑖𝑛 ) and ending point (CN𝑗

𝑚𝑎𝑥 ) of each plot must 

be larger than 0. These constraints are to ensure that gamma distribution is effective in fitting molecular 

composition without too many zeros between the range of minimum carbon number (CN𝑗
𝑚𝑖𝑛 ) and maximum 

carbon number (CN𝑗
𝑚𝑎𝑥 ). (α𝑗 − 1)β𝑗 is the extreme point obtained by deriving the gamma distribution function. 

Eq(6) defines that the peak position of gamma distribution must fall within the range of maximum and minimum 

carbon number in order to avoid monotonous increase or decrease in the studied range. Eq(7) indicates that 

the maximum value of gamma distribution does not exceed 1, to match the fact that the molecular fractions in a 

homologous series are between 0 and 1. The carbon number range for the five homologous series divided in 

this paper is as follows: C4~C12 for n-Paraffins and iso-Paraffins, C4~C7 for Olefins, C5~C11 for Naphthenes, 

C6~C11 for Aromatics. 

 
Figure 3: Flowsheet of parameter estimation. 

Initial range of parameters α  and β 

α:  [1, 10
5
], β: [0, 10

5
]

Minimize α  by genetic algorithm

Constraints

Equation(4)~(7)

Determine the minimum  of α  

and update the initial range

Maximize α  by genetic algorithm

Determine the maximum of α  

and update the initial range

Minimize β by genetic algorithm

Determine the minimum of β   

and update the initial range

Maximize β by genetic algorithm

Get  the  final range of 

a set of  α and β 

795


The process of parameter estimation is shown in Figure 3. The genetic algorithm is used to solve this nonlinear 

programming problem. In the above constraints, there is only a lower limit of the parameters α and β. To 

reasonably control the solution space of the model, the upper limit is set to 105. Based on such initial range, the 

objective function is to minimize the parameter α firstly under the above constraints, e.g., Eq(4) ~ (7). The 

optimization process is repeated 20 times and the minimum value of α is taken as the lower limit of parameter 

α. Then, the obtained minimum of α is used to replace the lower limit of α in the initial range, thereby realizing 

the update of the initial range. Subsequently, the optimization processes of the maximum of α, the minimum of 

β, and the maximum of β are sequentially performed. The operation of each optimization process is the same. 

Finally, the pre-set ranges of gamma distribution parameters α and β of each homologous series as shown in 

Table 1 are obtained. 

Table 1: The pre-set range of parameters α and β of the gamma distribution. 

Parameters Lower limit Upper limit 

𝛼𝑛𝑃   1.04 87.00 

𝛽𝑛𝑃   0.05 118.61 

𝛼𝑖𝑃  1.04 87.01 

𝛽𝑖𝑃   0.05 118.61 

𝛼𝑂  1.05 96.02 

𝛽𝑂   0.05 99.74 

𝛼𝑁  1.05 106.39 

𝛽𝑁   0.05 100.79 

𝛼𝐴  1.06 115.63 

𝛽𝐴  0.05 99.33 

3. Mathematical model 

The formulas to describe the assumption gamma distribution are as follows.  

𝑦𝑖,𝑗 =
𝐶𝑁

𝑖,𝑗

𝛼𝑗−1
𝑒

−𝐶𝑁𝑖,𝑗 𝛽𝑗⁄

𝛽𝑗
𝛼𝑗 Γ(𝛼𝑗)

  (8) 

𝑥𝑖,𝑗 = 𝑋𝑗
𝑦𝑖,𝑗

∑ 𝑦𝑖𝑖,𝑗𝑖,,𝑖
  (9) 

where  𝑦𝑖,𝑗  in Eq(8) stands for the fraction of the entry i, j within the j homologous series; 𝐶𝑁𝑖,𝑗 is the carbon 

number of the entry i, j; 𝑥𝑖,𝑗  in Eq(9) stands for the mass fraction of the entry i, j in the whole matrix; 𝑋𝑗  is the 

total mass fraction of the homologous series j. 

obj = ∑ (𝑤2,𝑇
𝑉𝑇

𝑒𝑥𝑝
−𝑉𝑇

𝑐𝑎𝑙

𝑉
𝑇
𝑒𝑥𝑝 )

2 + ∑ (𝑤3,𝑗
𝐹

𝑗
𝑒𝑥𝑝

−𝐹𝑗
𝑐𝑎𝑙

𝐹
𝑗
𝑒𝑥𝑝 )

2 + ∑ (𝑤1,𝑃
𝑃𝑒𝑥𝑝−𝑃𝑐𝑎𝑙

𝑃𝑒𝑥𝑝
)2𝑃𝑗∈𝑃𝐼𝑂𝑁𝐴𝑇   (10) 

where obj is the objective function, T is the distillation curve temperature, V is the volume fraction at the 

corresponding distillation curve temperature T, F stands for n-Paraffins, iso-Paraffins, Olefins, Naphthenes, and 

Aromatics fractions, P stands for other available bulk properties, and w is the weight factor.  

𝐶𝑁𝐶8,𝐴 ≤ (𝛼𝐴 − 1)𝛽𝐴 ≤ 𝐶𝑁𝐶9,𝐴  (11) 

Eq(11) is a constraint according to refinery reality to constrain the fraction of C8 or C9 for naphtha fraction is the 

maximum within the aromatic homologous series (Wu and Zhang, 2010). 

Particle swarm optimization (PSO) is used to solve the above mathematical model. The whole calculation 

processes including parameter estimation are executed by MATLAB R2016b on a PC constituted by Intel Xeon 

E31230 processor 3.20 GHz. 

4. Result and discussion 

A naphtha sample is used to verify the accuracy of the above mathematical model with the obtained parameters 

ranges in Table 1. The available bulk properties including molecular weight, special-density, PIONA weight 

fractions and ASTM D 86 distillation data are shown in Table 2. 

Table 3 shows the comparison of experimental and calculated values of molecular weight and special-density. 

The relative errors of these two properties are both less than 1.2 %, indicating that the calculated values agree 

well with the experimental values. Figure 4 shows the comparison between experimental and calculated data of 

796


simulated distillation curve. It can be seen that after 20 % distillation volume, the simulated distillation curve 

agrees well with the true boiling point distillation curve with the relative error of less than 5 %. 

Table 2: Bulk properties data of the naphtha sample. 

Bulk properties 

Molecular weight 101.32 Distillation (ASTM D 86)℃ 

IBP 33.04 

Special-density [kg / m3] 715.85 5 % 51.15 

10 % 71.28 

n-Paraffins (wt %) 28.89 20 % 80.01 

30 % 88.75 

iso-Paraffins (wt %) 39.59 40 % 94.75 

50 % 100.75 

Olefins (wt %) 0.26 60 % 110.98 

70 % 121.21 

Naphthenes (wt %) 23.54 80 % 138.5 

90 % 155.78 

Aromatics (wt %) 7.72 95 % 174.75 

FBP 189.41 

Table 3: Comparison of experimental and calculated properties. 

Properties Experiment Calculation Relative error 

Molecular weight 101.32 102.53 -1.19 % 

Special-density kg / m3 715.85 717.02 -0.16 % 

Figure 4: Comparison between experimental and calculated data of simulated distillation curve. 

-20

0

20

40

60

80

100

120

140

160

180

200

220

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Te
m

p
e

ra
tu

re
 ℃

Cumulative volume fraction

experimental

calculted

n-Paraffins iso-Paraffins

Naphthenes Aromatics

0

2

4

6

8

10

12

14

C4 C5 C6 C7 C8 C9 C10 C11 C12

W
e

ig
h

t 
fr

a
ct

io
n

%

Carbon number

experimental

calculated

0

2

4

6

8

10

12

C4 C5 C6 C7 C8 C9 C10 C11 C12

W
e

ig
h

t 
fr

a
ct

io
n

%

Carbon number

experimental

calculated

0

1

2

3

4

5

6

7

8

C5 C6 C7 C8 C9 C10 C11

W
e

ig
h

t 
fr

a
ct

io
n

%

Carbon number

experimental

calculated

0

0.5

1

1.5

2

2.5

3

C6 C7 C8 C9 C10 C11

W
e

ig
h

t 
fr

a
ct

io
n

%

Carbon number

experimental

calculated

797

Figure 5: Experimental values and calculated values of the molecular composition of the naphtha sample. 


The predicted results of molecular composition are shown in Figure 5. The figure shows the comparison of the 

experimental and calculated molecular composition of n-Paraffins, iso-Paraffins, Naphthenes and Aromatics. 

Olefins are neglected for their low content. The mean absolute differences (abbreviated as MAD,  MAD =
1

𝑛
∑ |𝑥𝑖

𝑐𝑎𝑙 − 𝑥𝑖
𝑒𝑥𝑝

|𝑛𝑖=1 ) of the calculated mass fractions of all components is 1.1, which is within an acceptable range. 

The prediction results for aromatics are more accurate due to the additional constraint on the peak position. 

Although there are errors in the specific content, an approximate molecular composition simulation result is 

obtained. To some extent, the prediction accuracy of sample oils with compositional distributions more in line 

with the statistical law of the gamma distribution will be higher. 

5. Conclusions

A novel parameter estimation method is proposed for the molecular reconstruction model with the statistical 

distribution introduced. The model is based on the Molecular type-homologous series (MTHS) matrix method 

where the molecular composition within each homologous series is assumed to follow a gamma distribution 

against carbon number. The pre-set ranges of gamma distribution parameters are calculated from the analysis 

of the probability density plot of gamma distribution based on the basic available information, so that the obtained 

parameter ranges have wide universality and are not specific to any oil characteristics. A naphtha sample is 

used to verify the model with the obtained parameters range calculated by the proposed parameter estimation 

method. The relative errors between experimental and calculated values of molecular weight and special-density 

are less than 1.2 %, and that of the simulated distillation curve is less than 5 % after 20 % distillation volume. 

The accuracy of the calculated mass fractions of molecular composition is also within an acceptable range with 

the MAD of 1.1. This work proves that the approximate bulk properties and molecular composition prediction 

could also be obtained without parameter regression but only through analyzing the features of the statistical 

distribution plot. In order to obtain better prediction accuracy, more implicit graphical features need to be 

excavated. 

Acknowledgments 

The financial support provided by the Project of National Natural Science Foundation of China (21822809 & 

61590925), the National Science Fund for Distinguished Young (21525627) are gratefully acknowledged. 

References 

Cui, C., Billa, T., Zhang, L., Shi, Q., Zhao, S., Klein, M.T., Xu, C., 2018, Molecular Representation of the 

Petroleum Gasoline Fraction, Energy and Fuels, 32, 1525-1533. 

Hong, X., Liao, Z., Lou, Y., Sun, J., Jiang, B., Wang, J., Yang, Y., 2018, Transshipment type model for total site 

heat integration using non-isothermal utility loop, Chemical Engineering Transactions, 70, 1135-1140. 

Hong, X., Liao, Z., Sun, J., Jiang, B., Wang, J., Yang, Y., 2019, Transshipment type heat exchanger network 

model for intra- and inter-plant heat integration using process streams, Energy, 178, 853-866. 

Li, H., Liao, Z., Sun, J., Jiang, B., Wang, J., Yang, Y., 2019, Modelling and simulation of two-bed PSA process 

for separating H2 from methane steam reforming, Chinese Journal of Chemical Engineering, Accepted. 

Liu, L., 2015, Molecular characterisation and modelling for refining processes, PhD Thesis, University of 

Manchester Institute of Science and Technology, Manchester, UK. 

Lou, Y., Liao, Z., Sun, J., Jiang, B., Wang, J., Yang, Y., 2019, A novel two-step method to design inter-plant 

hydrogen network, International Journal of Hydrogen Energy 44, 5686-5695. 

Ren, Y., Liao, Z., Sun, J., Jiang, B., Wang, J., Yang, Y., Wu, Q., 2019, Molecular reconstruction: Recent progress 

toward composition modeling of petroleum fractions, Chemical Engineering Journal, 357, 761-775. 

Van de Vijver, R., Devocht, B.R., Van Geem, K.M., Thybaut, J.W., Marin, G.B., 2016, Challenges and 

opportunities for molecule-based management of chemical processes, Current Opinion in Chemical 

Engineering, 13, 142-149. 

Wang, K., Li, S., 2017, Modified molecular matrix model for predicting molecular composition of naphtha, 

Chinese Journal of Chemical Engineering, 25, 1856-1862. 

Wu, Y., Zhang, N., 2010, Molecular Characterization of Gasoline and Diesel Streams, Industrial & Engineering 

Chemistry Research, 49, 12773-12782. 

Yang, Y., Sun, J., Huang, Z., Hu, D., Liao, Z., Yang, Y., Wang, J., 2018, Dynamic and Steady-State 

Characterization of the Liquid Spray Zone in an Externally Heated Gas-Solid Fluidized Bed, Industrial and 

Engineering Chemistry Research, 57, 2988-3001. 

798