

 CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 55, 2016 

A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet

Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian
Copyright © 2016, AIDIC Servizi S.r.l., 
ISBN 978-88-95608-46-4; ISSN 2283-9216 

Classification and Detection of Adulteration in Olive Oil Using Improved Gaussian Mixture Model and
Regression by Artificial Bee Colony Algorithm
Xin Xie^a, Ying Gao^b, Weimin Shi^c, Qi Shen^c,*
^a Henan Polytechnic College, Zhengzhou, Henan, China
^b Henan Vocational College of Applied Technology, Kaifeng, Henan, China
^c The College of Chemistry and Molecular Engineering, Zhengzhou University, Zhengzhou, Henan, China
656271158@qq.com 

Gaussian mixture models (GMM) and Gaussian mixture regression (GMR) can be used to detect adulteration in
extra virgin olive oil. The GMM parameters are commonly estimated with the expectation-maximization (EM)
algorithm, which has limitations such as convergence to local optima and sensitivity to the initial values.
In this paper, the artificial bee colony (ABC) algorithm is used to determine the optimal parameters of GMM
and GMR. To improve the optimization performance and reduce the computational effort of the ABC algorithm,
an information-sharing mechanism based on the global best food source is introduced in ABC. The improved
GMM and GMR by artificial bee colony algorithm (GMMRABC) were used to discriminate and quantify the
adulteration of extra virgin olive oil with rapeseed oil using FT-IR spectroscopy. The results indicate that
the proposed method is an accurate, rapid and stable strategy for identifying and quantifying adulteration
in extra virgin olive oil.

1. Introduction 
Extra virgin olive oil is far more valuable and expensive than most other vegetable oils, so adulterated
olive oil has become the biggest source of agricultural fraud in the European Union (Rohman et al.,
2010). Edible oil adulteration may threaten food safety. To protect consumers, effective methods for
identifying adulteration and supporting prosecution are needed.
The classical methods for detecting adulteration of extra virgin olive oil are often complicated, and no
single test can accomplish the task. A battery of tests is employed to identify the adulterant, such as the
determination of free acidity, peroxide value, UV extinction, fatty acid composition, sterol composition,
triglyceride composition, wax content and steroidal hydrocarbons. Commonly used methods for examining
adulterated oil also include chromatographic techniques (Miloudi et al., 2007), nuclear magnetic
resonance (Alonso-Salces et al., 2010), mid- and near-infrared spectroscopy (Wang et al., 2006) and
differential scanning calorimetry (Chiavaro et al., 2008). However, these methods are often time-consuming,
costly and cumbersome to operate. Fourier transform infrared (FT-IR) spectroscopy (Rohman and Man, 2011)
has been widely used in oil adulteration detection because it requires only simple sample handling, is
fast, and allows direct determination of several compounds without sample pretreatment. As the spectra of
different chemical components overlap severely, the FT-IR technique is often coupled with chemometric
methods such as principal component analysis (PCA) (Kamruzzaman et al., 2013), partial least squares (PLS)
(Oussama et al., 2012), back-propagation artificial neural network (BP-ANN) (Ni et al., 2012), linear
discriminant analysis (LDA) (Sinelli et al., 2007), and support vector machine (SVM) (Caetano et al., 2007).
Gaussian mixture model (GMM) (Jacques et al., 2010) and Gaussian mixture regression (GMR) (Yuan et al.,
2014) can approximate a wide range of data distributions given an adequate number of Gaussian components.
GMM and GMR are widely used statistical tools in pattern classification and nonlinear regression.
Determining the optimal parameters of the Gaussian components is the most important consideration when
training GMM and GMR. These parameters are commonly estimated with the expectation-maximization (EM)
algorithm. Because EM is essentially a hill-climbing procedure in a multi-parameter space, it can converge
to a local optimum and is sensitive to the initial values. One remedy is to run the EM algorithm many times
with different initial values and to select the result that maximizes the expected log-likelihood of the
data. Even with many restarts, EM may still overfit high-dimensional data. To overcome these defects,
intelligent optimization algorithms (Lu et al., 2014) have been introduced for parameter estimation in GMM.

Please cite this article as: Xie X., Gao Y., Shi W.M., Shen Q., 2016, Classification and detection of
adulteration in olive oil using improved Gaussian mixture model and regression by artificial bee colony
algorithm, Chemical Engineering Transactions, 55, 349-354, DOI: 10.3303/CET1655059
The artificial bee colony (ABC) algorithm (Li et al., 2015) is a relatively new optimization algorithm
motivated by the intelligent foraging behavior of honey bees, and it can also be applied to parameter
estimation in GMM. With few control parameters, ABC is simple to implement and robust compared with other
intelligent algorithms (Devos and Duponchel, 2011). It does not require problem-specific information about
the solution. ABC brings out the global optimum of the whole colony through the local search of each bee,
which favors fast convergence. In this paper, ABC was used to determine the optimal parameters of GMM and
GMR. To improve the optimization performance and reduce the computational effort of the ABC algorithm, an
information-sharing mechanism based on the global best food source was introduced in ABC. The improved
GMM and GMR by artificial bee colony algorithm (GMMRABC) were used to discriminate the adulteration of
extra virgin olive oil with rapeseed oil using FT-IR spectroscopy. GMMRABC was also applied to quantify
the level of rapeseed oil adulterant present. For comparison, the classic GMM and GMR with EM parameter
estimation, LDA, back-propagation artificial neural network (BP-ANN) and PLS were also used to classify
and quantify these oil samples. The results indicate that the proposed method is an accurate, rapid and
stable strategy for identifying and quantifying adulteration in extra virgin olive oil.

2. Methods 
2.1 Gaussian mixture model and Gaussian mixture regression 
For adulterated oil examination, GMM assumes that pure oil and adulterated oil can each be represented by
a different Gaussian density, so that the dataset consists of two underlying Gaussian probability
distributions. GMM expresses the probability density function of the data (FT-IR spectra) as a weighted
sum of K unimodal Gaussian component densities (K = 2 in this paper). Each component density is
parameterized by a mean vector and a covariance matrix. The GMM parameters are usually obtained from the
training set using the EM algorithm. In the prediction stage, the likelihood of each predicted sample
under each component is calculated, and the sample is assigned to the component (class) with the largest
likelihood.
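
The prediction stage described above can be illustrated with a short Python sketch. It is not the authors' Matlab implementation: the function names `log_gaussian_diag` and `classify`, the use of diagonal covariances (the constraint stated in Section 2.3), and the toy numbers are assumptions made only for illustration. Mixing weights are ignored here, so each sample is simply assigned to the component with the largest Gaussian log-likelihood.

```python
import numpy as np

def log_gaussian_diag(X, mean, var):
    """Log density of a diagonal-covariance Gaussian, one value per row of X."""
    X = np.atleast_2d(X)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var, axis=1)

def classify(X, means, variances):
    """Assign each row of X to the component with the largest log-likelihood."""
    scores = np.column_stack(
        [log_gaussian_diag(X, m, v) for m, v in zip(means, variances)]
    )
    return np.argmax(scores, axis=1)

# Toy two-dimensional example (made-up numbers, not the paper's data):
# component 0 could stand for pure oil, component 1 for adulterated oil.
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
X_new = np.array([[0.2, -0.1], [2.8, 3.3]])
labels = classify(X_new, means, variances)
```

Working with log-likelihoods instead of raw likelihoods avoids numerical underflow and does not change which component is selected.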
To calculate the expected level of adulterant, GMR uses several Gaussian components to model the joint
probability density of X and Y. In this paper, X is the FT-IR data and Y is the level of adulterant. Each
component is again determined by its mean and covariance matrix, from which the conditional expectation
of the response can be calculated. The contribution of each component is represented by a mixing weight.
The predicted level of adulterant in extra virgin olive oil is the weighted sum of the conditional
expectations of all components.
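
For reference, the textbook form of Gaussian mixture regression can be written as follows. The paper does not spell out its formulas, so the notation here (mixing weights \pi_k, block means \mu_k^{X}, \mu_k^{Y} and block covariances \Sigma_k^{XX}, \Sigma_k^{YX}) is an assumption and may differ from the authors' implementation.

```latex
% Joint density of spectra x and adulterant level y with K Gaussian components
p(x, y) = \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(
    \begin{bmatrix} x \\ y \end{bmatrix}
    \,\middle|\,
    \begin{bmatrix} \mu_k^{X} \\ \mu_k^{Y} \end{bmatrix},
    \begin{bmatrix} \Sigma_k^{XX} & \Sigma_k^{XY} \\ \Sigma_k^{YX} & \Sigma_k^{YY} \end{bmatrix}
  \right)

% Conditional expectation of y given x under component k
m_k(x) = \mu_k^{Y} + \Sigma_k^{YX} \left( \Sigma_k^{XX} \right)^{-1} \left( x - \mu_k^{X} \right)

% Responsibility (weight) of component k for the input x
h_k(x) = \frac{\pi_k \, \mathcal{N}\!\left( x \mid \mu_k^{X}, \Sigma_k^{XX} \right)}
              {\sum_{l=1}^{K} \pi_l \, \mathcal{N}\!\left( x \mid \mu_l^{X}, \Sigma_l^{XX} \right)}

% Predicted adulterant level: weighted sum of the conditional expectations
\hat{y}(x) = \sum_{k=1}^{K} h_k(x) \, m_k(x)
```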

2.2 Artificial bee colony algorithm 
The ABC algorithm is an optimization algorithm inspired by the foraging behavior of honeybees. In ABC,
the position of a food source represents a candidate solution to the optimization problem, and the nectar
amount of a food source represents the quality (fitness) of that solution. There are three kinds of bees:
employed bees, onlooker bees and scout bees. The number of employed bees and the number of onlooker bees
are each equal to the number of food sources.
At first, initial solutions (food source positions) are generated randomly. After initialization, the
three categories of bees repeatedly search for good food sources, which corresponds to searching for the
optimal solution. Each employed bee generates a new food source vi from its current solution xi and a
neighboring solution xk as follows:

$$v_{i,j} = x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{k,j}) \qquad (1)$$
where k and j are randomly chosen indexes, k must differ from i, and φi,j is a random number in the range
[-1, 1]. Once the new solution vi is obtained, greedy selection is applied to compare vi with the old
solution xi: if the fitness of vi is equal to or better than that of xi, vi replaces xi; otherwise, xi is
retained. When all the employed bees have finished their search, they share their food source information
with the onlooker bees.

Each onlooker bee chooses a food source by roulette wheel selection, so that better food sources are
chosen with higher probability. Then, in the same way as the employed bees, the onlooker bee generates a
new candidate food source, which is evaluated and accepted or rejected by greedy selection.
If a food source is not improved within a predetermined number of iterations, it is abandoned and the
employed bee associated with it becomes a scout. The scout randomly generates a new food source and then
becomes an employed bee again. This search process is repeated until a stopping criterion is satisfied.
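
The employed, onlooker and scout phases described above can be sketched in Python as follows. This is a minimal illustration of the standard ABC algorithm, not the authors' code: the function name `abc_minimize`, the default settings, the bound clipping, the quality measure 1/(1 + cost) used for the roulette wheel, and the sphere test function are all assumptions introduced here.

```python
import numpy as np

def abc_minimize(cost, lower, upper, n_sources=20, limit=30, n_iter=200, seed=0):
    """Minimal standard ABC for minimization; names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    X = rng.uniform(lower, upper, size=(n_sources, dim))  # food sources
    costs = np.array([cost(x) for x in X])
    trials = np.zeros(n_sources, dtype=int)               # stagnation counters
    best = {"x": X[np.argmin(costs)].copy(), "cost": float(costs.min())}

    def remember_best():
        b = int(np.argmin(costs))
        if costs[b] < best["cost"]:
            best["x"], best["cost"] = X[b].copy(), float(costs[b])

    def try_neighbour(i):
        # Eq. (1): change one random coordinate j using a random partner k != i
        k = int(rng.choice([s for s in range(n_sources) if s != i]))
        j = int(rng.integers(dim))
        v = X[i].copy()
        v[j] = X[i, j] + rng.uniform(-1.0, 1.0) * (X[i, j] - X[k, j])
        v[j] = min(max(v[j], lower[j]), upper[j])          # assumption: clip to bounds
        c = cost(v)
        if c <= costs[i]:                                  # greedy selection
            X[i], costs[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_sources):                         # employed bees
            try_neighbour(i)
        quality = 1.0 / (1.0 + costs)                      # assumes cost >= 0
        for i in rng.choice(n_sources, size=n_sources, p=quality / quality.sum()):
            try_neighbour(int(i))                          # onlooker bees (roulette wheel)
        remember_best()
        worst = int(np.argmax(trials))                     # scout bee
        if trials[worst] > limit:
            X[worst] = rng.uniform(lower, upper)
            costs[worst] = cost(X[worst])
            trials[worst] = 0
            remember_best()
    return best["x"], best["cost"]

# Usage on a toy problem: minimize the 3-D sphere function on [-5, 5]^3.
best_x, best_cost = abc_minimize(lambda x: float(np.sum(x ** 2)), [-5, -5, -5], [5, 5, 5])
```

The best solution is stored separately so that it is not lost when a scout replaces a stagnated food source.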
In the standard ABC algorithm, employed bees and onlooker bees update food sources using the neighborhood
operator of Eq. (1). This operator can lead to weak local exploitation and slow convergence. To improve
the optimization performance of the ABC algorithm, an information-sharing mechanism based on the global
best food source is introduced.
The employed bee generates a new food source vi as:

$$v_{i,j} = x_{i,j} + (x_{i,j} - GBest_{j}) \cdot (r_{ij} - 0.5) \cdot 2 \qquad (2)$$
where GBest is the best solution found so far and rij is a random number in the range [0, 1].
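
A minimal Python sketch of the global-best-guided update of Eq. (2) is given below. The function name `gbest_candidate` and the choice to perturb a single randomly chosen coordinate j (as in Eq. (1)) are assumptions. The difference is written as x minus GBest by analogy with Eq. (1); because the factor (r - 0.5) * 2 is symmetric about zero, the opposite sign yields the same distribution of candidates.

```python
import numpy as np

def gbest_candidate(x_i, gbest, rng):
    """Eq. (2): perturb one random coordinate of x_i relative to the global best."""
    j = int(rng.integers(x_i.size))      # assumption: one random coordinate per update
    r = rng.uniform(0.0, 1.0)            # r in [0, 1), so (r - 0.5) * 2 lies in [-1, 1)
    v = x_i.copy()
    v[j] = x_i[j] + (x_i[j] - gbest[j]) * (r - 0.5) * 2.0
    return v, j

# Toy usage with made-up vectors
rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 3.0])
g = np.zeros(3)
v, j = gbest_candidate(x, g, rng)
```

The step size shrinks automatically as a food source approaches GBest, which concentrates the search around the best solution found so far.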

2.3 Improved Gaussian mixture model and regression by artificial bee colony 
The mean vectors and covariance matrices are the most important quantities to determine when training GMM
or GMR, and the ABC algorithm is an efficient scheme for obtaining them. To simplify the calculation, each
covariance matrix was constrained to be diagonal. In ABC, real-number strings were adopted to encode the
food sources. Each real-coded string represents the parameters of a GMM, i.e. a set of means and
covariances. The improved Gaussian mixture model and regression by artificial bee colony (GMMRABC)
proceeds as follows.
Step 1. Randomly initialize a population of solution vectors of appropriate size.
Step 2. Calculate the fitness of each individual in the population. If the best fitness value of the
current generation satisfies the stopping condition, stop training and output the results; otherwise, go
to Step 3.
Step 3. Update the population with the employed bees, onlooker bees and scout bees of the ABC algorithm.
Step 4. Return to Step 2 to calculate the fitness of the updated population.
In the GMMRABC algorithm, the misclassification rate and the root-mean-square error were used as the
fitness for classification and quantification, respectively.
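
The encoding and the classification fitness described above might look as follows in Python. This is a hedged sketch: the layout of the parameter vector (all means first, then all diagonal variances), the helper names `decode` and `misclassification_rate`, the use of the absolute value plus a small constant to keep variances positive, and the toy data are assumptions, since the paper does not give these details.

```python
import numpy as np

def decode(theta, n_classes, n_features):
    """Split a flat real-coded string into class means and diagonal variances."""
    theta = np.asarray(theta, dtype=float)
    half = n_classes * n_features
    means = theta[:half].reshape(n_classes, n_features)
    # assumption: force variances to be strictly positive
    variances = np.abs(theta[half:2 * half]).reshape(n_classes, n_features) + 1e-6
    return means, variances

def misclassification_rate(theta, X, y, n_classes=2):
    """Fitness for classification: fraction of samples assigned to the wrong class."""
    means, variances = decode(theta, n_classes, X.shape[1])
    diff = X[:, None, :] - means[None, :, :]
    loglik = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2
    )
    return float(np.mean(np.argmax(loglik, axis=1) != y))

# Toy data (made up): two samples per class in a two-dimensional score space
X_toy = np.array([[0.0, 0.0], [0.1, -0.1], [3.0, 3.0], [2.9, 3.2]])
y_toy = np.array([0, 0, 1, 1])
theta_good = np.array([0.0, 0.0, 3.0, 3.0, 1.0, 1.0, 1.0, 1.0])
theta_swapped = np.array([3.0, 3.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
rate_good = misclassification_rate(theta_good, X_toy, y_toy)
rate_swapped = misclassification_rate(theta_swapped, X_toy, y_toy)
```

A function of this kind would be passed to the optimizer as the cost to minimize in Step 2; for quantification, the root-mean-square error of the GMR prediction would take its place.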

2.4 Samples 
Different brands of extra virgin olive oil and rapeseed oil used in the experiment were bought from the
local market. Rapeseed oil was used as the adulterant in this study. The volume fractions of rapeseed oil
in the extra virgin olive oil were set at fifteen levels: 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 35%, 45%,
50%, 55%, 70%, 80% and 90%. FT-IR spectra of the extra virgin olive oil samples and the adulterated
samples were obtained without any chemical pretreatment. A total of 28 extra virgin olive oil samples and
60 adulterated samples were randomly split into a training set of 64 samples and a test set of 24
samples.

2.5 Instrumentation and Software 
All FT-IR spectra were collected on a NICOLET 6700 FT-IR spectrometer (Thermo Electron Corporation)
equipped with a DTGS detector and a KBr beam splitter. The spectra were recorded in the range of 4000 to
650 cm⁻¹ with a resolution of 4 cm⁻¹ using a ZnSe single-bounce attenuated total reflectance (ATR)
accessory. All spectra were recorded at 25 °C as an average of 32 scans. The spectral data were scaled to
the range (0, 1) for chemometric analysis.
The FT-IR spectral data were then processed using different algorithms written in Matlab 7.8 and run on a
personal computer. The algorithms used in this paper include the proposed GMMRABC, GMM with EM parameter
estimation, BP-ANN, LDA and PLS.

3. Results and discussion 
3.1 Classification 
FT-IR spectra of the extra virgin olive oil samples and the adulterated oil samples are shown in Figure 1.
The infrared absorption spectra of these oil samples are similar and appear virtually indistinguishable,
which makes it difficult to classify and quantify the samples by inspection. The proposed GMMRABC was
therefore applied to the analysis of the olive oil samples.

For the 28 extra virgin olive oil samples and 60 adulterated oil samples, two-thirds of the samples were
randomly selected as the training set and the remaining samples as the prediction set. The training set
consisted of 19 extra virgin olive oil samples and 40 adulterated oil samples, and the remaining 29
samples formed the prediction set.

                    
Figure 1: FT-IR spectra (4000-650 cm⁻¹) of pure olive oil samples and adulterated oil samples

Figure 2: Convergence curves for GMMRABC

When dealing with IR spectral classification, the number of samples is small compared with the number of
variables (wavenumber points per spectrum). This may degrade the generalization ability of the learned
model and lead to overfitting in GMM and GMR analysis. As principal component analysis (PCA) is the most
popular linear feature extraction method, we first reduced the number of variables by PCA. To find the
optimal number of principal components (PCs), we added PCs one at a time and measured the performance of
the selected PCs by the classification error rate of the proposed GMMRABC averaged over 5-fold
cross-validation. The misclassification rate was lowest when the number of PCs was two. The first two PCs
explained 90% of the variance, so the first two PCs with the largest eigenvalues were selected for
GMMRABC to classify and quantify the olive oil samples.
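
The PCA step above can be sketched in Python with a singular value decomposition. This is an illustrative sketch, not the authors' Matlab code: the function name `pca_scores` and the synthetic "spectra" (30 samples, 200 variables, two latent factors plus small noise) are assumptions made only to give a runnable example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centred rows of X onto the leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)        # fraction of variance per component
    loadings = Vt[:n_components]
    scores = Xc @ loadings.T
    return scores, explained[:n_components], mean, loadings

# Synthetic stand-in for spectra: two latent factors plus small noise (made up)
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
profiles = rng.normal(size=(2, 200))
X_syn = latent @ profiles + 0.01 * rng.normal(size=(30, 200))
scores, explained, mean, loadings = pca_scores(X_syn, n_components=2)
```

New samples would be projected with the training mean and loadings, `(X_new - mean) @ loadings.T`, so that no information from the prediction set enters the model.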
In the proposed GMMRABC, ABC was used to determine the mean vectors and covariance matrices. The
population size of ABC was set to 50 and the algorithm was stopped after 100 iterations. The means and
covariance matrices were initialized randomly. The misclassification rates for the training set and the
test set were both 0%; that is, by optimizing the mean vectors and covariance matrices, the GMMRABC model
correctly classified all the training and prediction samples. The convergence curve is shown in Figure 2.
During the updating process, the fitness value dropped quickly and stopped decreasing after about 20
generations.
To evaluate the performance of GMMRABC more reliably, the samples were randomly partitioned into training
and test sets 50 times and the misclassification rates were averaged. The average misclassification rates
for the training set and the test set were 0% and 1.79%, respectively. Among the 50 runs, the
misclassification rate for the prediction set was 0% in 30 runs, and the largest misclassification rate
was 6.90%, which corresponds to at most two misclassified samples out of the 29 prediction samples. The
result shows that the GMMRABC model is stable and reliable, even for an adulteration content of 1%.
For comparison with the proposed GMMRABC, the classic GMM with EM parameter estimation (GMM-EM), BP-ANN
and LDA were also applied to the extra virgin olive oil spectral data. The first two PCs with the largest
eigenvalues were used for these methods. To be consistent with the proposed algorithm, the samples were
again randomly partitioned into training and test sets 50 times and the misclassification rates were
averaged. With GMM-EM, the averaged misclassification rates for the training samples and the prediction
samples were 6.85% and 7.38%, respectively. Compared with GMM-EM, the proposed GMMRABC performed better
and depended less on the initial values, which indicates that optimizing the GMM parameters with the ABC
algorithm is an efficient scheme that improves the performance of GMM. In the present work, a three-layer
ANN with 10 hidden nodes was used for the 2 input variables, and BP-ANN training was stopped after 500
iterations. The results of BP-ANN and LDA are listed in Table 1. GMMRABC performed better than the other
classification methods.

Table 1: Misclassification rates of different methods

Method      Misclassification rate
            Training set    Test set
GMMRABC     0%              1.79%
GMM-EM      6.85%           7.38%
BP-ANN      4.81%           5.79%
LDA         6.71%           7.03%

3.2 Quantification 
The GMMRABC model was also used to predict the rapeseed oil content of the adulterated extra virgin olive
oil. The variable selection and optimization procedures for quantification were the same as for
classification, and the first two PCs with the largest eigenvalues were used. The correlation coefficient
(R) and the root mean square error (RMSE) for the training set were 0.9938 and 0.0341, while those for
the test set were 0.9936 and 0.0357, respectively. The correlation between the calculated and
experimental rapeseed oil contents of the adulterated oil samples is shown in Figure 3.
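
The two figures of merit used above can be computed as in the following Python sketch. The function names `rmse` and `correlation` and the toy values are assumptions for illustration; the sketch does not depend on whether the adulterant level is expressed as a fraction or a percentage.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient R between actual and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Toy example (made-up values): predictions shifted by a constant 0.1
y_actual = np.array([0.0, 0.1, 0.2, 0.5])
y_predicted = y_actual + 0.1
toy_rmse = rmse(y_actual, y_predicted)
toy_r = correlation(y_actual, y_predicted)
```

The toy example shows why both measures are reported together: a constant bias leaves R at 1 while the RMSE equals the bias, so a high R alone does not guarantee accurate predictions.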

 
Figure 3: Calculated versus actual rapeseed oil content using GMMRABC 

For comparison, GMR with EM parameter estimation (GMR-EM), BP-ANN and PLS were also applied to these oil
samples. The optimal number of latent variables for PLS was determined by the predicted residual sum of
squares and was set to 7. The first two PCs with the largest eigenvalues were used for GMR-EM and BP-ANN.
Table 2 summarizes the results of these regression methods. With EM parameter estimation, GMR gave
R = 0.9912 for the training set, and the correlation coefficient and RMSE for the test set were 0.9892
and 0.0437, respectively. The BP-ANN model gave correlation coefficients of 0.9900 and 0.9853 for the
training and test sets, respectively, with an RMSE of 0.0411 for the training set and 0.0795 for the test
set, which suggests slight overfitting. GMMRABC performed better than the common GMR, indicating that the
ABC algorithm is better than EM for parameter optimization in this application. These results suggest
that the proposed GMMRABC method is a promising alternative for detecting adulteration in oil samples.

Table 2: Quantification results of different regression methods

Model       Training set                      Prediction set
            Correlation    Root mean          Correlation    Root mean
            coefficient    square error       coefficient    square error
GMMRABC     0.9938         0.0341             0.9936         0.0357
GMR-EM      0.9912         0.0379             0.9892         0.0437
BP-ANN      0.9900         0.0411             0.9853         0.0795
PLS         0.9989         0.0139             0.9941         0.0322

4. Conclusions 
This paper proposed estimating the parameters of GMM and GMR with the ABC algorithm. A total of 88 extra
virgin olive oil and adulterated oil samples were classified and quantified by the proposed method, and
several other classification and regression methods were applied for comparison. The results indicate
that the proposed method performed better than GMM and GMR with EM parameter estimation and that it is an
accurate and stable method for the qualitative and quantitative analysis of edible oil.

Acknowledgments  

The work was financially supported by the National Natural Science Foundation of China (Grant No. 
21575131). 

References

Ahmed B., Fouad B., Djalil B.A., Mohamed B.B., Abdelouahed T., Bedia E.A., 2016, The Thermal Study of 
Wave Propagation in Functionally Graded Material Plates (FGM) Based on Neutral Surface Position, 
Mathematical Modelling of Engineering Problems, 3(4), 202-205, Doi: 10.18280/mmep.030410. 

Alam M.S., 2016, Mathematical Modelling for the Effects of Thermophoresis and Heat Generation/Absorption 
on MHD Convective Flow along an Inclined Stretching Sheet in the Presence of Dufour-Soret Effects, 
Mathematical Modelling of Engineering Problems, 3(3), 119-128, Doi: 10.18280/mmep.030302. 

Alonso-Salces R.M., Heberger K., Holland M.V., Moreno-Rojas J.M., Mariani C., Bellan G., Reniero F., Guillou 
C., 2010, Multivariate analysis of NMR fingerprint of the unsaponifiable fraction of virgin olive oils for 
authentication purposes, Food Chemistry, 118(4), 956-965, Doi: 10.1016/j.foodchem.2008.09.061. 

Almeida M., Vargas-Zerwes F., Ferreira-Bastos L., Costa A., Souza-Schneider R., Machado E., Kohle A., 
2015, Cation and anion monitoring in a wastewater treatment pilot project, Revista de la Facultad de 
Ingeniería, 30(3), 82-89, Doi: 10.17533/udea.redin.n76a10. 

Caetano S., Ustun B., Hennessy S., Smeyers-Verbeke J., Melssen W., Downey G., Buydens L., Heyden Y.V., 
2007, Geographical classification of olive oils by the application of CART and SVM to their FT-IR, Journal 
of Chemometrics, 21(7-9), 324-334, Doi: 10.1002/cem.1077. 

Chiavaro E., Vittadini E., Rodriguez-Estrada M.T., Cerretani L., Bendini A., 2008, Differential scanning 
calorimeter application to the detection of refined hazelnut oil in extra virgin olive oil, Food Chemistry, 
110(1), 248-256, Doi: 10.1016/j.foodchem.2008.01.044. 

Devos O., Duponchel L., 2011, Parallel genetic algorithm co-optimization of spectral pre-processing and 
wavelength selection for PLS regression, Chemometrics & Intelligent Laboratory Systems, 107(1), 50-58, 
Doi: 10.1016/j.chemolab.2011.01.008. 

Han Z.H., Zhu P.X., Guo Y., Zhou S.G., Fan N.J., 2015, Synthesis and property study of layered Ti/TiB2 
composite electrode materials for wet electrolytic, Mathematical Modelling of Engineering Problems, 2(2), 
11-14, Doi: 10.18280/mmep.020203. 

Jacques J., Bouveyron C., Girard S., Devos O., Duponchel L., Ruckebusch C., 2010, Gaussian mixture 
models for the classification of high-dimensional vibrational spectroscopy data, Journal of Chemometrics, 
24(11-12), 719-727, Doi: 10.1002/cem.1355. 

Kamruzzaman M., Sun D.W., ElMasry G., Allen P., 2013, Fast detection and visualization of minced lamb 
meat adulteration using NIR hyperspectral imaging and multivariate image analysis, Talanta, 103(2), 130-
136, Doi: 10.1016/j.talanta.2012.10.020. 

Li B., Chiong R., Lin M., 2015, A balance-evolution artificial bee colony algorithm for protein structure 
optimization based on a three-dimensional AB off-lattice model, Computational Biology & Chemistry, 54, 1-
12, Doi: 10.1016/j.compbiolchem.2014.11.004. 

Lu S.J., Salleh A.H.M., Mohamad M.S., Deris S., Omatu S., Yoshioka M., 2014, Identification of gene knockout 
strategies using a hybrid of an ant colony optimization algorithm and flux balance analysis to optimize 
microbial strains, Computational Biology and Chemistry, 53, 175-183, Doi: 
10.1016/j.compbiolchem.2014.09.008. 

Miloudi H., Zoubida C., Abdelaziz S., Larbi H., Dominique G., 2007, Detection of argan oil adulteration using 
quantitative campesterol GC-Analysis, Journal of the American Oil Chemists' Society, 84(8), 761-764, Doi: 
10.1007/s11746-007-1084-y. 

Ni Y.N., Li B.H., Kokot S., 2012, Discrimination of Radix Paeoniae varieties on the basis of their geographical 
origin by a novel method combining high-performance liquid chromatography and Fourier transform 
infrared spectroscopy measurements, Analytical Methods, 4(12), 4326-4333, Doi: 10.1039/C2AY25950H. 

Oussama A., Elabadi F., Platikanov S., Kzaiber F., Tauler R., 2012, Detection of Olive Oil Adulteration Using 
FT-IR Spectroscopy and PLS with Variable Importance of Projection (VIP) Scores, Journal of the American 
Oil Chemists' Society, 89(10), 1807-1812, Doi: 10.1007/s11746-012-2091-1. 

Rohman A., Che M.Y., Ismail A., Hashim P., 2010, Application of FTIR Spectroscopy for the Determination of 
Virgin Coconut Oil in Binary Mixtures with Olive Oil and Palm Oil, Journal of the American Oil Chemists' 
Society, 87, 601-606, Doi: 10.1007/s11746-009-1536-7. 

Rohman A., Man Y.B.C., 2011, The use of Fourier transform mid infrared (FT-MIR) spectroscopy for detection 
and quantification of adulteration in virgin coconut oil, Food Chemistry, 129(2), 583-588, Doi: 
10.1016/j.foodchem.2011.04.070. 

Sinelli N., Cosio M.S., Gigliotti C., Casiraghi E., 2007, Preliminary study on application of mid infrared 
spectroscopy for the evaluation of the virgin olive oil "freshness", Analytica Chimica Acta, 598(1), 
128-134, Doi: 10.1016/j.aca.2007.07.024. 

Wang L., Lee F.S.C., Wang X., He Y., 2006, Feasibility study of quantifying and discriminating soybean oil 
adulteration in camellia oils by attenuated total reflectance MIR and fiber optic diffuse reflectance NIR, 
Food Chemistry, 95(3), 529-536, Doi: 10.1016/j.foodchem.2005.04.015. 

Yuan X.F., Ge Z.Q., Song Z.H., 2014, Soft sensor model development in multiphase/multimode processes 
based on Gaussian mixture regression, Chemometrics & Intelligent Laboratory Systems, 138, 97-109, Doi: 
10.1016/j.chemolab.2014.07.013.  
