CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 61, 2017 

A publication of 

 
The Italian Association 

of Chemical Engineering 
Online at www.aidic.it/cet 

Guest Editors: Petar S Varbanov, Rongxin Su, Hon Loong Lam, Xia Liu, Jiří J Klemeš
Copyright © 2017, AIDIC Servizi S.r.l. 
ISBN 978-88-95608-51-8; ISSN 2283-9216 

Just-in-time Modeling with a Combination of Input and Output 
Similarity Criterions for Soft Sensor Modeling in Fermentation 

Processes 
Congli Mei*, Yao Chen, Hui Jiang, Yuhan Ding, Xu Chen, Guohai Liu 
Jiangsu University, Zhenjiang, 212013, Jiangsu, China 
clmei@ujs.edu.cn. 

Just-in-time learning (JITL) has been used to construct soft sensor models online for its ability of handling 
strong nonlinearity and changes in processes. The most key procedure in JITL modelling is selecting relevant 
samples similar to a query sample. However, the common similarity criterions used to select relevant samples 
do not always function well for only considering the similarity of input data. Large noise or outliers in output 
data may result in inappropriate predictions of JITL based soft sensors. In this work, a combination of similarity 
measures, the conventional similarity of input and a novel similarity of output, is proposed for comprehensively 
understanding and selecting relevant samples. The effectiveness of the proposed soft sensor is demonstrated 
through an industrial fed-batch Erythromycin fermentation process. 

1. Introduction 

In process plants, soft sensors become popular to estimate those variables difficult to measure online. 
Compared to mechanism models, data-driven soft sensors are more popular in recent years (Kadlec et al., 
2009), e.g. artificial neural network (ANN) (Pani et al., 2013), principle component regression(PCR) (Ge et al., 
2014), partial least-squares(PLS) (Wang et al., 2015), support vector machine(SVM) (Jin et al., 2015b), and 
Gaussian process regression(GPR) (Jin et al., 2015a). Those soft sensors relied on offline modelling using the 
recorded historical data. However, in order to guarantee the success of the offline soft sensors, there are 
several conditions should be fulfilled. Most critically, the historical data should contain all possible future states 
and conditions of the process. Even if the collected data contains all the required states, another difficulty is 
the model type, and parameters, in such a way that the model can comprehend all the different conditions. 
This results in high model complexity, which in turn demand large number of data for the model development, 
and most processes are existing some kind of time-varying behaviour and that requires a strategy for online 
adaptation. 
Just-in-time learning (JITL) is useful to cope with such kind of situation and have attracted extensive attention 
in process modelling and soft sensor development (Fan et al., 2014). However, there are still some practical 
challenges. The first key issue is to establish an appropriate similarity criterion for selecting relevant historical 
samples. Generally, distance between query sample and historical sample is usually used to design similarity 
criterions, such as Mahalanobis distance (Nakabayashi et al., 2010) and Euclidian distance (Ito et al., 2004). 
According to Liu et al. (2012), only utilization of the distance for description of the similarity is not 
comprehensive, then a new similarity criterion was proposed to select samples adopting the distance and 
angle between two samples and it showed better performance than only based on distance. However, 
correlations among variable are neglected in the above mentioned two similarity criterions. Consequently, 
some good data may not be selected. To this end, correlation based similarity criterion was proposed by 
Fujiwara et al. (2012). But it was pointed out that it is difficult to obtain optimum parameters of the correlation 
based criterion (Saptoro, 2014). Besides, it should be noticed that large noise or outliers always exist in 
process data. Empirically, those in historical input data predefined in the process can be identified easily. 
However, those in historical output data are difficult to be detected because of complex process nonlinearity. 

                               
DOI: 10.3303/CET1761172

 
Please cite this article as: Mei C., Chen Y., Jiang H., Ding Y., Chen X., Liu G., 2017, Just-in-time modeling with a combination of input and 
output similarity criterions for soft sensor modeling in fermentation processes, Chemical Engineering Transactions, 61, 1045-1050  
DOI:10.3303/CET1761172  

1045


In a word, the existing similarity criterions do not consider the quality of output data in samples. It means that 
large noise or outliers in output data would result in inappropriate estimates (Saptoro, 2014). 
In this study, a novel combined similarity criterion is proposed for JITL based soft sensor modelling. In the 
proposed similarity combination of input and output (SCIO), the typical distance and angle based similarity 
criterion is still used and a new similarity criterion of output is designed based on the idea of membership 
functions in clustering algorithms (Yong feng et al., 2008). The effectiveness of the proposed SCIO for JITL 
modelling method through an industrial fed-batch Erythromycin fermentation process. 

2. Similarity combination for relevant sample selection and soft sensor modelling 

In the JITL modelling, it is crucial to construct similarity criterion for selecting suitable training samples 
matching the query sample. It is nature to use distance to describe similarity between two data points. The 
study of Saptoro (2014) gives few variants of distance based similarity criterions: Eucledian distance based, 
weighted Eucledian distance based and Mahalanobis distance based. The needed JITL based soft sensor is 
constructed with the relevant data selected by using similarity criterion. However, it was pointed out that the 
distance-based similarity criterion, commonly utilized in JITL is not comprehensive because of only 
considering the distance of input. The distance and angle based similarity criterion was proposed to describe 
similarity of input comprehensively. Now, it is popular in the field of JITL modelling.  
The distance and angle based similarity criterion between the query data point and historical data point  is 
defined as follows(Liu et al. 2012) 

( ) ( ) ( )= − + −qi qi qiS ω d ω θexp 1 cos  (1) 

( ) 1,≥ = For  cos 0,     qiθ i k   

= −qi i qd x x 2  (2) 

( ) ( )= 2 2cos ,qi i q i qθ x x x x   (3) 
Where qid  and ( )cos qiθ  are the distance similarity and the angle similarity between qx  and ix  respectively. 

( )≤ ≤ω ω0 1 is a weight parameter, and only distance similarity (or angle similarity) is adopted when = 1ω  (or
= 0ω ). The value of qiS  is bounded between 0 and 1, and when qiS  approaches 1, qx closely resembles ix . 

It should be noticed that Eq. (1) cannot be used to compute the similarity qiS  between qx  and ix  if ( )cos qiθ  
is negative. 
It can be seen that only similarity of input is considered in the distance and angle based similarity criterion. It 
was pointed out that those existing similarity criterions only considering similarity of input can be badly 
influenced by large noise or outliers (Saptoro, 2014). In fact, input is always predefined in a process, in which 
large noise and outliers are easy to be identified. However, the quality of output cannot be judged easily for 
lacking predefined references. With large noise or outliers in output, selected relevant samples used for JITL 
modelling with conventional similarity criterions may result in inappropriate predictions.  
For selected relevant samples by using the distance and angle based similarity criterion, it can be assumed 
that most of them are appropriate and close to each other. It means few samples with large noise or outliers 
are far to most of the selected samples. A distance based index was designed to evaluate the quality of 
output, which is described as follows 

= −yi yiS' d'exp( )  (4) 

( )
≠

= −
L'

yi i k
k k i

d' y y
2

=1,  
(5) 

Where iy  and ky are outputs of the ith and the kth selected relevant samples with a conventional similarity 

criterion respectively. It is obvious that the values of yiS'  corresponding to the samples with large noise or 

outliers must be much smaller than those normal selected relevant samples.  

1046


To combine similarity of input and output and avoid inappropriate selected samples, a similarity combination 
can be defined as 

( ) ( ) ( )= qi yih i S i S' i  (6) 
The Eq.6) can be interpreted that only those samples with similar inputs to the query sample and without 
larger noise or outliers in outputs can achieve high values of ( )h i   The detailed steps of JITL soft sensor 
modeling based on the proposed combination Eq(6) are list as follows: 
(1) Set the value of ω   and relevant sample size ′L . 
(2) For a new query sample, use Eq(1) -( 3) to calculate the similarity qiS . 

(3) Sort qiS  in descending order and choose the first ′L   relevant samples.       

(4) For the ′L  relevant samples, use Eq(4) - (6) to calculate ( )h i . 
(5) Sort ( )h i  in descending order, then select first ′<( )L L relevant samples for JITL modelling. 
(6) After outputting predictions related to the query samples, discard the JITL model. 

3. Case study 

In this study, GPR is used to construct JITL based soft sensors for its advantages of less parameters, easily 
optimizing and modelling uncertainty (Rasmussen and Williams, 2006). For comparisons, two above-
mentioned conventional similarity criterions, i.e. the Mahalanobis distance (MD) based similarity criterion and 
the distance and angle (DA) based similarity criterion, are also studied. To evaluate different methods, the 
estimation error between predictions and real values (Error) and the root-square error (RMSE) are used and 
defined as follows 

ˆ= −i iError y y  (7) 

( )ˆ= −=  i iy yRMSE
n

2

i 1

n

 
(8) 

Where ˆiy and iy  are the predicted and observed values respectively, n denotes the number of query samples, 
and i = 1,2,...,n .The RMSE values are commonly used to indicate prediction accuracy of soft sensors. 
For an Erythromycin fermentation process, biomass concentration plays a decisive role in the final product 
(Erythromycin) concentration. So, the primary way of ensuring product quality of Erythromycin is to control 
biomass concentration which can be affected by many process factors. In this example, 182 samples are 
selected as the query data, and 1274 samples are used as the database. The relevant samples to a query 
sample are selected for JITL modelling from the database. Every sample contains fifteen input variables and 
one output variable.  
After variable selection through the principal component analysis based method described by Shakil et al. 
(2009), five input variables, i.e. DO saturation, pH, Temperature, Agitator power, Aeration rate, are selected as 
secondary variables, and the output variable is biomass concentration. The data characteristics of the 
secondary variables and the primary variable are shown in Figure 1. From the figure, it can be easily seen that 
the process has strong nonlinear characteristics. The predicted outputs with different relevant sample 
selection methods are shown in Figure 2. Also, prediction errors of three methods for this case are shown in 
Figure 3. In here a query sample comes, L (=15) training samples are selected from the database for JITL 
based soft sensor modelling. From the figure, it can be easily seen that SCIO performs the best.  
The three criterions were studied for soft sensor modelling with varying L . In this case, ′ = + 5L L . The RMSEs 
for query samples with different similarity criterions and varying L  are shown in Figure 4. These results 
suggest that MD and DA based similarity criterions do not function very well, and the proposed SCIO can track 
the nonstationary behaviour of the process data more closely. In fact, inputs of a plant are always known in 
advance. However, the outputs of that depend on the performance of signal transfer. Therefore, it is easy to 
find out those incorrect inputs. But, great errors in outputs are difficult to be identified. It is necessary to 
consider the quality of output data for improving prediction performance of JITL based soft sensors. In our 
method, the similarity index is used to overcome the effects of those outputs with great errors or noises. 

L

1047


0 50 100 150
300

400

500

600

700

800

900

Samples

A
er

at
io

n 
ra

te
(L

/h
)

0 50 100 150
0

10

20

30

40

Samples

B
io

m
as

s 
co

nc
(g

/L
)

0 50 100 150
22

24

26

28

30

32

34

Samples

A
gi

ta
to

r P
ow

er
(W

)

0 50 100 150
30

40

50

60

70

80

Samples

D
O

 c
on

c(
%

 s
at

ua
tio

n)

0 50 100 150
6.8

7

7.2

7.4

7.6

7.8

Samples
P

H
0 50 100 150

31

32

33

34

35

Samples

Te
m

pe
ra

tu
re

(K
)

 
Figure 1: Data characteristics of the input and output variable in the fed-batch Erythromycin fermentation 
process.   

 
0 20 40 60 80 100 120 140 160 180

0

20

40
DA

B
io

m
as

s 
co

nc
(g

/L
)

 
Real
Predicted

0 20 40 60 80 100 120 140 160 180

0

20

40

Samples 

B
io

m
as

s 
co

nc
(g

/L
)

SCIO

 
Real
Predicted

0 20 40 60 80 100 120 140 160 180

0

20

40

B
io

m
as

s 
co

nc
(g

/L
)

MD

 
Real
Predicted

 
Figure 2: Prediction results with different similarity criterions in the fed-batch Erythromycin fermentation 
process. 

 
1048


0 20 40 60 80 100 120 140 160 180

-20

0

20
MD

E
rr

or

0 20 40 60 80 100 120 140 160 180

-20

0

20 DA

E
rr

or

0 20 40 60 80 100 120 140 160 180

-20

0

20 SCIO

E
rr

or

Samples
 

Figure 3: Prediction errors with different similarity criterions in the fed-batch Erythromycin fermentation 
process.  

10 15 20 25 30 35
2

3

4

5

6

7

8

L

R
M

S
E

 
MD
DA
SCIO

 
Figure 4: RMSEs with varying L in fed-batch Erythromycin fermentation process. 

4. Conclusions 

Conventional similarity criterions can only be used to describe the similarity between inputs of a query sample 
and a historical sample. However, large noise or outliers are often in outputs of historical samples. It may 
result in selecting inappropriate relevant samples by using conventional similarity criterions for JITL based soft 
sensor modelling. In this paper, a novel similarity combination is proposed for JITL based soft sensor 
modelling. In the criterion, conventional distance and angle based similarity criterion was combined with a 
novel quality criterion of output. After using the presented method, inappropriate samples with large noise or 
outliers can be avoided to be selected for JITL modelling. An industrial case is used to verify the proposed 
method. Results show that the proposed similarity combination performs better than the MD based similarity 
criterion and the distance and angle based similarity criterion.   

1049


Acknowledgments  

The authors gratefully acknowledge the financial support provided by Natural Science Foundation of Jiangsu 
Province of China (Grant No. BK20130531, BK20140538), the Priority Academic Program Development of 
Jiangsu Higher Education Institutions (Grant No. PAPD 6), the Graduate practical innovation Foundation of 
Jiangsu province (Grant No. SJLX16_0441). 

References 

Fan M., Ge Z., Song Z., 2014, Adaptive Gaussian Mixture Model-Based Relevant Sample Selection for JITL 
Soft Sensor Development, Industrial & Engineering Chemistry Research, 53, 19979-19986. 

Fujiwara K., Kano M., Hasebe S., 2012, Development of correlation-based pattern recognition algorithm and 
adaptive soft-sensor design, Control Engineering Practice, 20, 371-378. 

Ge Z., Huang B., Song Z., 2014, Nonlinear semisupervised principal component regression for soft sensor 
modeling and its mixture form, Journal of Chemometrics, 28, 793-804 

Ito M., Matsuzaki S., Odate N., Uchida K., Ogai H., Akizuki K., 2004. Large scale database online modeling for 
blast furnace, Control Applications, 2004, Proceedings of the 2004 IEEE International Conference on, 
IEEE, pp. 906-911. 

Jin H., Chen X., Wang L., Yang K., Wu L., 2015a, Adaptive soft sensor development based on online 
ensemble Gaussian process regression for nonlinear time-varying batch processes, Industrial & 
Engineering Chemistry Research, 54. 

Jin H., Chen X., Yang J., Zhang H., Wang L., Wu L., 2015b, Multi-model adaptive soft sensor modeling 
method using local learning and online support vector regression for nonlinear time-variant batch 
processes, Chemical Engineering Science, 131, 282-303. 

Kadlec P., Gabrys B., Strandt S., 2009, Data-driven soft sensors in the process industry, Computers & 
Chemical Engineering, 33, 795-814. 

Liu Y., Gao Z., Li P., Wang H., 2012, Just-in-Time Kernel Learning with Adaptive Parameter Selection for Soft 
Sensor Modeling of Batch Processes, Ind.eng.chem.res , 51(11), 4313–4327. 

Nakabayashi A., Nakaya M., Ohtani T., Chen D., Wang D., Li X., 2010, A process simulator based on hybrid 
model of physical model and just-in-time model, Proceedings of SICE Annual Conference, pp. 97-100. 

Pani A.K., Vadlamudi V.K., Mohanta H.K., 2013, Development and comparison of neural network based soft 
sensors for online estimation of cement clinker quality, ISA transactions, 52, 19-29. 

Rasmussen C.E., Williams C.K.I., 2006, Gaussian processes for machine learning, MIT Press, 14(481), 69-
106 

Saptoro A., 2014, State of the art in the development of adaptive soft sensors based on just-in-time models. 
Procedia Chemistry, 9, 226–234. 

Shakil M., Elshafei M., Habib M.A., Maleki F.A., 2009, Soft sensor for NOx and O2 using dynamic neural 
networks. Computers & Electrical Engineering, 35, 578–586. 

Wang Z.X., He Q.P., Wang J., 2015, Comparison of variable selection methods for PLS-based soft sensor 
modeling, Journal of Process Control, 26, 56-72. 

Yongfeng F.U., Hongye S.U., Zhang Y., Chu J., 2008, Adaptive soft-sensor modeling algorithm based on 
FCMISVM and its application in PX adsorption separation process, Chinese Journal of Chemical 
Engineering, 16(5), 746-751.   

1050