

FACTA UNIVERSITATIS  
Series: Mechanical Engineering Vol. 20, No 3, 2022, pp. 479 - 501 

https://doi.org/10.22190/FUME220307022M 

© 2022 by University of Niš, Serbia | Creative Commons License: CC BY-NC-ND 

Original scientific paper 

ASSESSMENT AND PERFORMANCE ANALYSIS OF MACHINE LEARNING TECHNIQUES FOR GAS SENSING E-NOSE SYSTEMS

Lubna Mahmood1, Zied Bahroun2, Mehdi Ghommem3, Hussam Alshraideh2

1Engineering Systems Management Graduate Program, American University of Sharjah, Sharjah, United Arab Emirates
2Department of Industrial Engineering, American University of Sharjah, Sharjah, United Arab Emirates
3Department of Mechanical Engineering, American University of Sharjah, Sharjah, United Arab Emirates

Abstract. E-noses that combine machine learning and gas sensor arrays (GSAs) are widely used for the detection and identification of various gases. GSAs produce signals that provide the machine learning algorithms with vital information about the exposed gases, rendering them indispensable within the smart gas sensing arena. In this work, we present a detailed assessment of several machine learning techniques employed for the detection of gases and the estimation of their concentrations. The modeling and predictive analysis conducted in this paper are based on kNN, ANN, Decision Trees, Random Forests, SVM and other ensemble-based techniques. Predictive models are implemented and tested on three different MoX gas sensor-based experimental datasets reported in the literature. The assessment includes a detailed analysis of the performance of the different models, followed by a comparison against results found in the literature. It highlights factors that play a pivotal role in machine learning for gas sensing and sheds light on the predictive capability of different machine learning approaches applied to experimental GSA datasets.

Key words: Gas sensor arrays, E-nose, Machine learning, Feature extraction, Feature selection, Classification, Regression

 
Received: March 07, 2022 / Accepted May 10, 2022 

Corresponding author: Mehdi Ghommem  
Department of Mechanical Engineering, American University of Sharjah, P.O. Box 26666, Sharjah, United Arab Emirates. 

E-mail: mghommem@aus.edu 


480 L. MAHMOOD, Z. BAHROUN, M. GHOMMEM, H. ALSHRAIDEH 

 
 1. INTRODUCTION  

Gas sensing technology, which now encompasses E-nose systems, has grown tremendously over the past few decades. Since the advent of E-noses in the late 20th century, the development of different types of GSAs that can be employed in E-noses has accelerated [1-3]. These developments have been driven by the different technologies deployed for manufacturing gas sensors. Sensors based on metal-oxide (MoX) semiconductors [4,5], carbon nanomaterials [6,7] and polymer composites [8,9] operate on electrochemical principles, whereas acoustic [10] and infrared [11] sensors rely on non-electrochemical operating principles. As a consequence, E-noses are currently used in a wide range of applications that require detection of the gas type and, in many cases, estimation of the gas concentration. Given the correlation between radon and the occurrence of lung cancer, Blanco-Novoa et al. proposed an IoT system for estimating radon concentrations that could further activate mitigation devices to reduce radon levels in indoor environments [12]. Besides air quality control, E-noses have been employed to detect the freshness of different kinds of meat based on their odor, showing promising results for food quality control [13]. Wireless sensor networks are also used to detect and monitor ammonia levels in order to minimize the environmental hazard that ammonia poses in livestock environments [14]. Furthermore, the ability of E-noses to offer a relatively inexpensive and fast mechanism for detecting various chemical compounds has made them increasingly viable for medical diagnosis, for example, monitoring diabetes based on exhaled acetone levels [15].

Machine learning forms a vital component of the E-nose as a smart gas-detection platform. The sensing materials of the GSA develop a response to the chemical compounds they are exposed to, which depends on the type of sensor and is then exploited by machine learning algorithms to identify the compounds [16]. Despite their functionality, GSAs often face challenges that affect their selectivity, sensitivity, stability and ability to produce a reliable sensor response [17]. While many of these challenges are influenced by the sensors' external environment and have been tackled with different sensor fabrication technologies, some challenges, such as sensor drift [18,19], remain a persistent problem. There are two primary types of drift. 'First-order' drift arises from the chemical interaction between the sensor surface and the exposed gas, which can cause aging and irreversible poisoning of the sensor surface under prolonged gas exposure. 'Second-order' drift, on the other hand, is attributed to external factors such as environmental fluctuations or aberrant changes in the experimental conditions. To overcome the limitations of a single gas sensor, E-noses combine GSAs with machine learning to mimic the human olfactory system in detecting and identifying various chemical compounds.

Machine learning for E-nose systems generally addresses two primary tasks: classification, in which the type of chemical compound is identified, and regression, in which the compound concentration is estimated [20]. Starting from the sensor response (the sensor signal) to a particular chemical compound, machine learning techniques extract relevant characteristics from the signals using feature extraction and feature selection [21]. This feature extraction approach was implemented by Vergara et al. to classify six different gases, i.e., ethylene, ethanol, ammonia, acetaldehyde, acetone and toluene, while accounting for the challenge of sensor drift [19]. An extensive dataset recording the gas type and concentration from 16 MoX sensors was used to build Support Vector Machine (SVM) ensemble classifiers that improve gas detection and counteract sensor drift simultaneously. Similar to the work presented in [19], Adhikari and Saha proposed the use of Artificial Neural Networks (ANNs) as deep learning networks [22,23] and k-Nearest Neighbors (kNN) ensembles to correctly identify the six gases in the presence of drift [24]. With six gas classes, the dataset poses a multi-class classification task, which was tackled using ensembles that combine several machine learning algorithms to produce the final classification result. In addition to classification, the dataset was used for regression to predict the gas concentrations. Rehman and Bermak proposed a machine learning algorithm that incorporated feature selection within the regression model [25]. Through feature selection, the predictive model was able to identify relevant characteristics of the sensor responses to help estimate the concentration of each of the six gases. By applying predictive models, feature extraction and feature selection to classification and regression on different GSA datasets, previous studies have worked towards improving the performance of the E-nose system. Advances in machine learning techniques thus continue to help tackle the challenges faced by GSAs and improve their performance across different applications.

This paper first presents the general computational framework used for smart gas sensing and discusses several predictive techniques used in an E-nose system. The framework is applied to assess the performance of different machine learning techniques on three experimental datasets previously obtained from MoX gas sensors. The assessment entails a detailed analysis of their implementation on the selected datasets and a comparison of their results against those of previously published studies. Finally, we summarize the main findings of the present work and conclude with some remarks.

2. MACHINE LEARNING-BASED APPROACH FOR GAS SENSING 

Smart gas sensing using E-noses for classification and regression involves three vital steps, as shown in Fig. 1. Each sensor in the GSA generates a sensory response in the form of a time-series signal for each gas it is exposed to, depending on the type of sensor and its operating principle. In several cases, the sensors undergo a physical or chemical change that results in a measurable sensor output. For instance, the sensor output could be measured as the change in electrical resistance or conductance of a sensitive material when exposed to a particular gas. As a result, the sensor response is typically regarded as a unique fingerprint specific to a certain gas at a particular concentration. Once the signal is acquired, the next key step, signal pre-processing, transforms the raw data into meaningful, representative features through feature extraction. This converts the sensor response into a set of n features that capture its vital information. In many cases, the transformation produces a high-dimensional dataset because of the large number of features that can be extracted, which complicates the implementation of machine learning models and can lead to poor performance of the overall E-nose system [26,27]. This can be countered using feature selection techniques, which retain a subset of important features from the original feature set to improve model performance [20,21,28,29]. The generated dataset is then divided into training and testing datasets: the training dataset is used for model building, whereas the testing dataset is used to evaluate model performance. Classification-based models attempt to identify and separate the different gases using class labels (for example, classifying wine quality from wine odors, as in [30]), whereas regression-based models aim to establish a relationship between the features and the gas concentration as a numeric variable [31].

 
Fig. 1 Block diagram representation of the steps involved in the E-nose operation 

Given the primary focus of this work on data obtained from MoX sensors, Fig. 2 shows a typical MoX sensor response on exposure to a gas. The response depicts the change in conductance over time during gas exposure, with three phases: Phase I (initial phase), Phase II (exposure phase) and Phase III (regeneration phase) [32]. The raw time-series data from a sensor clearly contain numerous data points. Feature extraction and feature selection, which are both forms of dimensionality reduction, therefore focus on extracting or selecting representative features from the sensor data [33]. The extracted features could represent information such as the slope of the signal in each phase, the rise time of the signal, its peak value, its trough and its overall average value. Dimensionality reduction techniques are thus used to provide characteristic features that summarize a major portion of the information contained in the original raw data.
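The kinds of features listed above can be computed from a single sensor trace in a few lines. The sketch below is a minimal illustration, not the feature set used in this study: it assumes the sample indices where exposure starts (`t_on`) and ends (`t_off`) are known, uses a least-squares slope per phase, and defines rise time as the 10%-90% rise within the exposure phase, which is one common convention rather than a definition taken from the source. All function names are hypothetical.

```python
from statistics import mean


def slope(ys, dt=1.0):
    """Least-squares slope of ys sampled every dt seconds (needs at least 2 samples)."""
    xs = [i * dt for i in range(len(ys))]
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def rise_time(ys, dt=1.0, lo=0.1, hi=0.9):
    """Time to go from lo to hi fraction of the total rise (assumed 10%-90% convention)."""
    base, peak = ys[0], max(ys)
    lo_level = base + lo * (peak - base)
    hi_level = base + hi * (peak - base)
    t_lo = next(i for i, y in enumerate(ys) if y >= lo_level)
    t_hi = next(i for i, y in enumerate(ys) if y >= hi_level)
    return (t_hi - t_lo) * dt


def extract_features(signal, t_on, t_off, dt=1.0):
    """Summarize one trace split into Phase I, II and III by the assumed indices t_on, t_off."""
    phase1, phase2, phase3 = signal[:t_on], signal[t_on:t_off], signal[t_off:]
    return {
        "slope_I": slope(phase1, dt),
        "slope_II": slope(phase2, dt),
        "slope_III": slope(phase3, dt),
        "rise_time": rise_time(phase2, dt),
        "peak": max(signal),
        "trough": min(signal),
        "mean": mean(signal),
    }
```

Applied to each of the sensors in an array, such a function turns every raw trace into a short, fixed-length feature vector, which is what the downstream models consume.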

 
Fig. 2 Typical response of MoX gas sensor on exposure to ethylene. Adapted from [32] 



 
3. ASSESSMENT OF MACHINE LEARNING TECHNIQUES  

The adopted machine learning-based E-nose framework is applied to three MoX gas sensor datasets to assess the performance of various machine learning models. Whenever possible, the results of the adopted framework are compared with previously published work. Each dataset was obtained through a different experimental protocol for detecting different gases and their concentrations [18,19,25,34,37]. The experimental protocol, assessment and results for each dataset are discussed in the following sections.

3.1. Dataset 1 - Experimental dataset of six gases with varying concentrations

This dataset contains measurements collected over three years for the detection of six analytes, namely ammonia, acetaldehyde, acetone, toluene, ethanol and ethylene, using 16 commercial MoX sensors in a GSA [19,25,34]. The experimental setup, fully controlled to provide high reliability and measurement reproducibility, consisted of four different types of MoX sensors with four sensors of each type, giving 16 MoX sensors in total. The GSA was exposed to each gas at its respective concentrations in no particular order, yielding a complete dataset of 13,910 sensor signals split into ten batches. Further details regarding the experimental procedure, the gas concentrations tested and the different batches can be found in the work of Vergara et al. [19].

3.1.1. Methodology 

Following signal acquisition, Fonollosa et al. extracted two types of features from the sensor response [19,34]. Let r[k] denote the sensor resistance at time step k. The steady-state feature ∆R is the difference between the maximum resistance and the baseline resistance value, and its normalized value ‖∆R‖ is the ratio of this difference to the baseline resistance value. The transient features represent the sensors' dynamic response by capturing the rising and decaying portions of the signal using the exponential moving average (ema) for different smoothing parameter values. The complete feature dataset consists of 128 features (8 main features × 16 sensors) and can be found in [25]. Prior to implementing the machine learning models, the features of each sensor were labeled and numbered accordingly.
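The steady-state and transient features described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of [19,34]: the baseline is taken to be the first sample r[0], the gas is assumed to be switched off at index `t_off`, and the transient features use the recursion ema[k] = (1 - α)·ema[k-1] + α·(r[k] - r[k-1]) with α ∈ {0.1, 0.01, 0.001}, keeping the maximum of the ema over the rising portion and its minimum over the decaying portion. Two steady-state and six transient values per sensor give 8 features per sensor and 128 for the 16-sensor array, matching the count above. Function names are hypothetical.

```python
def ema_transform(r, alpha):
    """Exponential moving average of the first difference of r (assumed form), starting at 0."""
    y = [0.0]
    for k in range(1, len(r)):
        y.append((1 - alpha) * y[-1] + alpha * (r[k] - r[k - 1]))
    return y


def sensor_features(r, t_off, alphas=(0.1, 0.01, 0.001)):
    """Eight features for one sensor: dR, normalized dR, then (rising max, decaying min) per alpha."""
    baseline = r[0]                          # assumption: first sample is the baseline
    delta_r = max(r) - baseline              # steady-state feature
    feats = [delta_r, delta_r / baseline]    # normalized by the baseline resistance
    for alpha in alphas:
        y = ema_transform(r, alpha)
        feats.append(max(y[:t_off]))         # rising portion (gas on)
        feats.append(min(y[t_off:]))         # decaying portion (gas off)
    return feats


def array_features(responses, t_off):
    """Concatenate per-sensor features; 16 sensors x 8 features = 128 values."""
    out = []
    for r in responses:
        out.extend(sensor_features(r, t_off))
    return out
```

Smaller α values smooth the differenced signal more heavily, so the three α settings capture the transient at different time scales.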

3.1.2. Part 1: Machine learning – Regression  

The regression performed on this dataset follows the approach of Rehman and Bermak [25], to allow a fair comparison of the implemented models with those presented in [25], and is outlined in Fig. 3. For this analysis, data from batch 1 were used to train the machine learning algorithms and develop the models, whereas batches 2-5 were used to test the models for the final prediction results. The final results were quantified with the mean absolute percentage error (MAPE) metric. This process was repeated for each of the six gases to estimate their concentrations separately. During the cross-validation phase, the model hyperparameters were tuned to avoid overfitting or underfitting.
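For reference, the MAPE used to score these predictions is MAPE = (100/n) Σ |y_i - ŷ_i| / |y_i|, where y_i is the true concentration and ŷ_i the prediction. The sketch below implements it together with the batch-based split (train on batch 1, test on batches 2-5). The record layout, a dict with a `"batch"` key, is a hypothetical choice for illustration. Note that MAPE is undefined when a true concentration is zero.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if any true value is 0."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero true values")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def split_by_batch(samples, train_batches, test_batches):
    """Split records (dicts with a hypothetical 'batch' key) into train and test lists."""
    train = [s for s in samples if s["batch"] in train_batches]
    test = [s for s in samples if s["batch"] in test_batches]
    return train, test


# Usage for the protocol above: train, test = split_by_batch(samples, {1}, {2, 3, 4, 5})
```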



 
Fig. 3 Methodology outline for regression - part 1 

Table 1 lists the algorithms applied in the previous study [25] and those applied in this work. The kNN algorithm was initially applied to the training dataset with k = 5 and the Euclidean distance metric; these hyperparameters were then tuned during the training phase to optimize model performance. Prior to training the kNN algorithm, the training dataset was normalized to minimize the impact of the different feature scales. For both the Decision Tree (DT) and Random Forest (RF) models, regression was performed using the least-squares splitting criterion. Regression results of the techniques used in this work for all six analytes are shown in Table 2.
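A minimal sketch of the kNN regression step, assuming min-max scaling to [0, 1] fitted on the training data only (the text does not state which normalization was used for regression). Each test sample is predicted as the mean concentration of its k nearest training samples under the Euclidean distance, with k = 5 as the default. Library implementations such as scikit-learn's `KNeighborsRegressor(n_neighbors=5, metric="euclidean")` provide the same behavior; the function names below are hypothetical.

```python
import math


def fit_minmax(X):
    """Per-feature minimum and maximum computed on the training data only."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(X, lo, hi):
    """Scale features to [0, 1] with training statistics; constant features map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def knn_regress(X_train, y_train, X_test, k=5):
    """Predict each test target as the mean target of its k nearest (Euclidean) neighbours."""
    lo, hi = fit_minmax(X_train)
    xtr = apply_minmax(X_train, lo, hi)
    xte = apply_minmax(X_test, lo, hi)
    preds = []
    for x in xte:
        nearest = sorted((math.dist(x, xt), yt) for xt, yt in zip(xtr, y_train))[:k]
        preds.append(sum(y for _, y in nearest) / len(nearest))
    return preds
```

Fitting the scaling on the training batch alone keeps information from the test batches out of the model, which matters here because later batches are affected by drift.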

Table 1 List of machine learning models used in previous work and present work

Previous work [25] | Present work
Levenberg Marquardt (LMNN) | Bagging using Random Forests: Bagging (RF)
Scaled Conjugate Gradient (SCGNN) | kNN
Bayesian regularization (BRNN) | Vote using Gradient Boosted Trees and RF: Vote (GBT,RF)
Average Ensemble (AvgEnsemble) | RF(r)*
Support Vector Machines (SVM) | GBT
Gaussian Processes (GP) | Decision Trees (DT)
Random Forests (RF) | Vote using kNN, RF and DT: Vote (kNN,RF,DT)
Heuristic Random Forests (HRF) |

*Since RF is implemented in both the previous and the present work, the RF of this research is referred to as RF(r).
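Table 1 includes two kinds of ensembles. The sketch below shows how such ensembles combine regressors, assuming that "Vote" averages the predictions of its member models (as scikit-learn's `VotingRegressor` does for regression) and that Bagging trains each member on a bootstrap resample of the training set. Models are represented as plain callables, and `train_mean` is a hypothetical toy base learner; this is not the toolchain used in the study.

```python
import random
from statistics import mean


def vote(models):
    """Average the predictions of several fitted regressors (assumed 'Vote' behaviour)."""
    def predict(x):
        return mean(m(x) for m in models)
    return predict


def bagging(train_fn, X, y, n_estimators=10, seed=0):
    """Fit n_estimators models on bootstrap resamples of (X, y) and average them."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        models.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    return vote(models)


def train_mean(X, y):
    """Hypothetical toy base learner: always predicts the mean training target."""
    c = mean(y)
    return lambda x: c
```

With a tree learner as `train_fn`, `bagging` approximates Bagging (RF)-style averaging, and passing fitted kNN, RF and DT predictors to `vote` mirrors the Vote (kNN,RF,DT) combination.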

As shown in Table 2, for ethanol tested on batch 2, kNN (MAPE of 35.35%) achieved the best performance compared with all the models reported in the previous study [25]. Despite the lack of high concentrations in the training dataset, the models implemented here obtained relatively low errors: DT (MAPE of 45.42%), Vote using kNN,RF,DT (MAPE of 51.05%) and RF(r) (MAPE of 55.02%) all had lower MAPEs than the HRF model [25]. For testing batches 2-5, kNN provided the lowest MAPE of 39.23%, comparable with the MAPE of HRF. Other models that performed reasonably well include Vote using kNN,RF,DT (MAPE of 52.26%) and RF(r) (MAPE of 59.18%). For batch 2 of ethylene, kNN (MAPE of 15.66%) performed best among both the implemented models and those presented in the previous study [25]. In addition to kNN, Bagging using RF (MAPE of 21.82%) and RF(r) (MAPE of 23.47%) also performed well. For batches 2-5, the lowest MAPE obtained was 58.22%, from Bagging using RF; these high MAPEs can be attributed to the sensor drift experienced over the course of the experiment. Nevertheless, Bagging using RF (MAPE of 58.22%), kNN (MAPE of 64.97%), DT (MAPE of 71.44%) and RF(r) (MAPE of 63.03%) performed better than SCGNN and SVM [25].

Table 2 MAPEs for six gases using different machine learning models - present work (all values are MAPE, %)

Gas | Test batch | Bagging (RF) | kNN | Vote (GBT,RF) | RF(r) | GBT | DT | Vote (kNN,RF,DT)
1 | Batch 2 | 61.94 | 35.35 | 87.39 | 55.02 | 124.39 | 45.42 | 51.05
1 | Batches 2-5 | 60.35 | 39.23 | 72.08 | 59.18 | 89.74 | 72.35 | 52.26
2 | Batch 2 | 21.82 | 15.66 | 61.29 | 23.47 | 102.55 | 33.16 | 32.98
2 | Batches 2-5 | 58.22 | 64.97 | 73.83 | 63.03 | 88.34 | 71.44 | 89.55
3 | Batch 2 | 35.06 | 11.46 | 59.44 | 37.93 | 81.47 | 18.79 | 19.93
3 | Batches 2-5 | 54.23 | 38.22 | 81.15 | 56.15 | 106.31 | 39.79 | 41.69
4 | Batch 2 | 22.71 | 55.05 | 25.99 | 20.47 | 39.74 | 38.73 | 37.65
4 | Batches 2-5 | 22.14 | 59.77 | 26.55 | 19.8 | 41.49 | 49.07 | 41.81
5 | Batch 2 | 30.74 | 25.39 | 40.26 | 26.12 | 57.46 | 85.76 | 59.64
5 | Batches 2-5 | 33.17 | 32.54 | 53.75 | 32.09 | 77.3 | 86.25 | 58.54
6 | Batch 2 | 38.2 | 28.28 | 41.74 | 40.02 | 43.46 | 52.22 | 59.00
6 | Batches 2-6 | 38.6 | 64.02 | 50.66 | 41.63 | 59.68 | 84.83 | 65.98

Gas 1 – ethanol, Gas 2 – ethylene, Gas 3 – ammonia, Gas 4 – acetone, Gas 5 – acetaldehyde, Gas 6 – toluene

  
For ammonia, kNN achieved the lowest MAPE of 11.46% for batch 2 (Table 2), lower than the error achieved by any of the models presented in the previous study [25]. It was closely followed by DT and Vote using kNN,RF,DT, with MAPEs of 18.79% and 19.93%, respectively. kNN, DT and Vote using kNN,RF,DT thus all outperformed LMNN, SCGNN, SVM, GP and RF [25]. For testing batches 2-5, the lowest MAPEs were achieved by kNN (MAPE of 38.22%) and DT (MAPE of 39.79%). Tuning the kNN hyperparameters (setting k = 7) helped kNN achieve the lowest MAPE compared with the other models tested in this work and those in the previous study [25]. For acetone, RF(r) provided the lowest MAPE of 20.47% among all the implemented models. Although the best performance was attributed to SVM (MAPE of 10.16%) in the previous study [25], the performance of RF(r) was comparable to that of HRF (MAPE of 20.44%), which had the second-best performance. The training dataset contained few samples at high concentrations, whereas the testing dataset contained many more. Despite this discrepancy, RF(r) achieved the lowest MAPE in this work, 19.8%, for testing batches 2-5. Besides RF(r), Bagging using RF (MAPE of 22.14%) and Vote using GBT,RF (MAPE of 26.55%) also performed well.

For testing batch 2 of acetaldehyde, kNN performed best with a low MAPE of 25.39%, followed by RF(r) (MAPE of 26.12%) and Bagging using RF (MAPE of 30.74%) (see Table 2). For testing batches 2-5, the best performance was observed for RF(r), with the lowest MAPE of 32.09%, closely followed by kNN (MAPE of 32.54%) and Bagging using RF (MAPE of 33.17%). For toluene, batches 2-6 were tested instead of batches 2-5, since no toluene measurements were recorded in batches 3, 4 and 5. Based on the regression results, the best performing model for testing batch 2 was kNN (MAPE of 28.28%), with a performance comparable to HRF (MAPE of 21.48%) and GP (MAPE of 26.58%) from the previous study [25]. Other models that performed well included Bagging using RF (MAPE of 38.2%) and RF(r) (MAPE of 40.02%), which had lower MAPEs than the remaining models used in the previous study [25]. For batches 2-6, the best results were obtained with Bagging using RF (MAPE of 38.6%), RF(r) (MAPE of 41.63%) and Vote using GBT,RF (MAPE of 50.66%), which all had lower errors than every model from the previous study [25] except HRF.

Going a step further, the feature selection techniques of Sequential Forward Selection (SFS) and Sequential Backward Elimination (SBE), and the dimensionality reduction technique of Principal Component Analysis (PCA), were applied to investigate any potential change in regression performance. Since the dataset contains 128 features, PCA yields 128 principal components (PCs). To minimize dimensionality while retaining as much of the data variance as possible, the variance limit was set to 95%; the number of independent PCs retained was 4, which together explained 94.3% of the variance in the data. A summary of the best performing models for each gas and the associated MAPEs is shown in Table 3. Feature selection or PCA helped to reduce the redundancy in the feature datasets and thereby improve the performance of the implemented models. Moreover, some models that had not performed well earlier benefited considerably.
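The number of retained PCs follows from the cumulative share of variance explained by the leading components. The helper below is a sketch that assumes the eigenvalues of the feature covariance matrix (equivalently, the per-component explained variances) have already been computed by a PCA routine; it returns the smallest number of leading components whose cumulative share reaches a given threshold. The function name is hypothetical.

```python
def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose cumulative explained-variance share reaches threshold."""
    vals = sorted(eigenvalues, reverse=True)   # leading (largest-variance) components first
    total = sum(vals)
    cumulative = 0.0
    for m, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return m
    return len(vals)
```

In scikit-learn, the per-component shares are exposed as `PCA.explained_variance_ratio_`, and passing a float such as `n_components=0.95` applies the same kind of cut-off.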

In the case of SFS, 82.61% of the selected features represented the transient response of the sensors, while the remainder pertained to their steady-state response. Similarly, in the case of SBE, transient features formed 74.76% of the remaining features. Although more importance is attributed to the transient features, our results indicate that both steady-state and transient features matter for estimating the gas concentrations.
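SFS and SBE are greedy wrapper methods. SFS starts from an empty feature set and repeatedly adds the feature that most improves a validation score, while SBE starts from the full set and repeatedly removes the feature whose removal helps most. The sketch below assumes a user-supplied `score` function over feature-index sets (lower is better, for instance the validation MAPE of a model trained on that subset) and stops when no single addition or removal improves the score; this stopping rule and the function names are assumptions, not the configuration used in this work.

```python
def sfs(n_features, score, max_features=None):
    """Sequential Forward Selection: greedily add the feature that lowers score most."""
    selected, best = frozenset(), float("inf")
    limit = n_features if max_features is None else max_features
    while len(selected) < limit:
        cand_score, j = min((score(selected | {f}), f)
                            for f in range(n_features) if f not in selected)
        if cand_score >= best:
            break                      # no single addition improves the score
        selected, best = selected | {j}, cand_score
    return sorted(selected), best


def sbe(n_features, score, min_features=1):
    """Sequential Backward Elimination: greedily drop the feature whose removal lowers score most."""
    selected = frozenset(range(n_features))
    best = score(selected)
    while len(selected) > min_features:
        cand_score, j = min((score(selected - {f}), f) for f in selected)
        if cand_score > best:
            break                      # every removal makes the score worse
        selected, best = selected - {j}, cand_score
    return sorted(selected), best
```

Each step retrains the model once per candidate feature, so with 128 features these searches are far more expensive than PCA, which is one practical reason to compare them side by side.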

Table 3 Summary of best performing models and associated MAPEs - present work

Gas | Test batch | Machine learning model | Technique | MAPE (%)
1 | Batch 2 | RF(r) | SFS | 29.75
1 | Batches 2-5 | DT | PCA | 36.59
2 | Batch 2 | Vote (kNN,RF,DT) | SBE | 15.31
2 | Batches 2-5 | RF(r) | SFS | 47.69
3 | Batch 2 | kNN | SBE | 8.19
3 | Batches 2-5 | Bagging (RF) | SFS | 11.38
4 | Batch 2 | Vote (GBT,RF) | PCA | 13.45
4 | Batches 2-5 | Bagging (RF) | SBE | 29.59
5 | Batch 2 | RF(r) | SBE | 12.94
5 | Batches 2-5 | kNN | PCA | 10.39
6 | Batch 2 | Vote (kNN,RF,DT) | PCA | 39.9
6 | Batches 2-6 | Bagging (RF) | SFS | 38.48

Gas 1 – ethanol, Gas 2 – ethylene, Gas 3 – ammonia, Gas 4 – acetone, Gas 5 – acetaldehyde, Gas 6 – toluene



 
3.1.3. Part 2: Machine learning – Regression 

So far, we have focused on estimating the concentrations of the six gases for batch 2, batches 2-5 (ethanol, ethylene, ammonia, acetone, acetaldehyde) and batches 2-6 (toluene). For the other batches, a different approach was followed: the machine learning algorithms were trained on batches 1 to (m-1) and tested on the mth batch, where m ∈ {6,7,8,9,10} for ethanol, ethylene, ammonia, acetone and acetaldehyde, and m ∈ {7,8,9,10} for toluene. Since batch 6 had already been tested for toluene in the previous analysis, it was not included in this section.
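The expanding-window protocol just described can be written as a small split generator: for each test batch m, every earlier batch is used for training. The sketch below encodes the per-gas test batches stated above; the dictionary and function names are hypothetical.

```python
# Test batches per gas for the expanding-window protocol described above.
TEST_BATCHES = {
    "ethanol": range(6, 11), "ethylene": range(6, 11), "ammonia": range(6, 11),
    "acetone": range(6, 11), "acetaldehyde": range(6, 11), "toluene": range(7, 11),
}


def expanding_window_splits(test_batches):
    """Yield (training batches 1..m-1, test batch m) for each test batch m."""
    for m in test_batches:
        yield list(range(1, m)), m
```

For ethanol, the first split trains on batches 1-5 and tests on batch 6; for toluene, the first split trains on batches 1-6 and tests on batch 7.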

One of the main reasons for following this approach for batches 6-10 was sensor drift. The sensor data were collected over 36 months, with batch 6 measurements beginning in month 17. Training on batch 1, whose measurements were taken in months 1 and 2, and testing on batches measured much later could lead to inaccurate predictions because the sensor response changes over time. Therefore, all batches preceding the test batch were included in model training to estimate the gas concentrations. Results of the best performing regression models in the present work are shown in Table 4. As in the previous analysis, the kNN model was initially used with k = 5 and the Euclidean distance metric; the reported MAPEs result from this setting, although in some cases the hyperparameters were tuned. As seen in Table 4, some models achieved remarkably low MAPEs, such as kNN (MAPE of 22.17%) for testing batch 6 of ethanol, DT (MAPE of 20.1%) for testing batch 6 of ethylene, kNN (MAPE of 5.98%) for testing batch 7 of ammonia, Bagging using RF (MAPE of 8.89%) for testing batch 9 of ammonia, RF (MAPE of 4.01%) for testing batch 6 of acetone, RF (MAPE of 6.35%) for testing batch 9 of acetaldehyde and Bagging using RF (MAPE of 24.6%) for testing batch 10 of toluene. The low MAPEs in most cases, with batch 10 as the main exception, indicate that using several batches of data for model training helped to address sensor drift. Using more recent data samples resulted in good predictions, with RF repeatedly delivering the best MAPEs, which reinforces the ability of ensemble methods to deliver good prediction results.

Table 4 MAPEs for gases on testing batches 6-10 - present work

Gas | Training batches | Test batch | Machine learning model | MAPE (%)
1 | Batches 1-5 | Batch 6 | kNN | 22.17
1 | Batches 1-6 | Batch 7 | RF | 22.22
1 | Batches 1-7 | Batch 8 | RF | 23.73
1 | Batches 1-8 | Batch 9 | DT | 38.14
1 | Batches 1-9 | Batch 10 | DT | 54.02
2 | Batches 1-5 | Batch 6 | DT | 20.1
2 | Batches 1-6 | Batch 7 | RF | 54.01
2 | Batches 1-7 | Batch 8 | RF | 29.57
2 | Batches 1-8 | Batch 9 | RF | 29.46
2 | Batches 1-9 | Batch 10 | RF | 85.45
3 | Batches 1-5 | Batch 6 | DT | 14.58
3 | Batches 1-6 | Batch 7 | kNN | 5.98
3 | Batches 1-7 | Batch 8 | kNN | 58.9
3 | Batches 1-8 | Batch 9 | Bagging (RF) | 8.89
3 | Batches 1-9 | Batch 10 | Bagging (RF) | 52.21
4 | Batches 1-5 | Batch 6 | RF | 4.01
4 | Batches 1-6 | Batch 7 | RF | 37.26
4 | Batches 1-7 | Batch 8 | RF | 7.59
4 | Batches 1-8 | Batch 9 | Bagging (RF) | 17.96
4 | Batches 1-9 | Batch 10 | DT | 44.87
5 | Batches 1-5 | Batch 6 | DT | 19.87
5 | Batches 1-6 | Batch 7 | RF | 18.55
5 | Batches 1-7 | Batch 8 | DT | 30.82
5 | Batches 1-8 | Batch 9 | RF | 6.35
5 | Batches 1-9 | Batch 10 | DT | 59.61
6 | Batches 1-6 | Batch 7 | Bagging (RF) | 46.72
6 | Batches 1-7 | Batch 8 | Bagging (RF) | 45.62
6 | Batches 1-8 | Batch 9 | DT | 40.9
6 | Batches 1-9 | Batch 10 | Bagging (RF) | 24.6

Gas 1 – ethanol, Gas 2 – ethylene, Gas 3 – ammonia, Gas 4 – acetone, Gas 5 – acetaldehyde, Gas 6 – toluene

Furthermore, SFS, SBE and PCA using 4 PCs were applied along with the machine learning models to improve their performance. For some models, the MAPEs remained unchanged despite the use of feature selection or dimensionality reduction, for example, when testing batch 10 of ethanol using DT and SFS (MAPE of 54.02%), batch 7 of ammonia using kNN and SBE (MAPE of 5.98%) and batch 10 of acetaldehyde using DT and SBE (MAPE of 59.61%). In other cases, such as testing batch 6 of ethanol using kNN, SFS reduced the MAPE drastically from 22.17% to 8.15%, and for testing batch 10 of ethylene using RF, SBE reduced the MAPE from 85.45% to 80.03%. For testing batch 8 of acetone with RF, PCA reduced the MAPE from 7.59% to 6.24%. Overall, the use of either wrapper-based feature selection techniques or PCA helped to improve the performance of the implemented models.

3.1.4. Machine learning – Classification 

In addition to regression, classification was performed to predict the gas type using two different approaches:

▪ Approach 1: Training the algorithm on batch m and testing the model on all subsequent batches, where m ∈ {1,2,3,4,5,6,7,8,9}.
▪ Approach 2: Training the algorithm on batch (m-1) and testing the model on the next batch m, where m ∈ {2,3,4,5,6,7,8,9,10}.
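The two evaluation approaches listed above differ only in which batches are held out. A minimal sketch that enumerates the (training batch, test batches) pairs for each approach over the ten batches; the function names are hypothetical.

```python
def approach1_splits(n_batches=10):
    """Approach 1: train on batch m, test on every later batch (m = 1..n_batches-1)."""
    for m in range(1, n_batches):
        yield m, list(range(m + 1, n_batches + 1))


def approach2_splits(n_batches=10):
    """Approach 2: train on batch m-1, test only on batch m (m = 2..n_batches)."""
    for m in range(2, n_batches + 1):
        yield m - 1, m
```

Both approaches produce nine splits, and for training batch 9 they coincide (test on batch 10 only).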
The two approaches mirror those used in the corresponding previously published studies [19,24], which enables a fair comparison of the results. We implemented machine learning models including RF, ANN, kNN, SVM, DT, Bagging using RF and Vote using GBT,RF for both approaches. For ANN, a back-propagation multi-layer perceptron (BP-MLP) model with 2 hidden layers and 15 nodes was used. Moreover, because SVM, ANN and kNN are sensitive to feature scales, the training datasets were normalized to lie between -1 and 1 for these models.
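Scaling to [-1, 1] can be sketched as a linear map fitted on the training batch and then reused unchanged for the test batches. This is a minimal illustration under stated assumptions: test values outside the training range are not clipped, and constant training features are mapped to 0 (the midpoint of the range). The function name is hypothetical.

```python
def fit_scaler(X_train, lo=-1.0, hi=1.0):
    """Return a function mapping each feature to [lo, hi] using training min/max."""
    mins = [min(c) for c in zip(*X_train)]
    maxs = [max(c) for c in zip(*X_train)]

    def transform(X):
        # Test samples outside the training range map outside [lo, hi]; no clipping is applied.
        # Constant training features map to the midpoint of [lo, hi].
        return [[lo + (hi - lo) * (v - a) / (b - a) if b > a else (lo + hi) / 2
                 for v, a, b in zip(row, mins, maxs)] for row in X]

    return transform
```

Because drift shifts the feature distribution of later batches, scaled test features can fall outside [-1, 1], which is itself a simple indicator of drift between training and test batches.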

Using 10-fold, leave-one-out cross-validation, the classification accuracies of the machine learning models for Approach 1 were obtained and are shown in Table 5. When training on batch 1 and testing on all subsequent batches, the highest average classification accuracy, 60.75%, is obtained with SVM. Training on batch 2 (accuracy of 60.31%) and batch 3 (accuracy of 63.59%) yields accuracies in the same range, but with kNN. For kNN, the value of k was initially set to 5; after tuning the model hyperparameters, k = 7 yielded the best classification results. Irrespective of the technique that achieved the highest accuracy, the classification accuracy generally decreased from training batch 1 to batch 9 because of sensor drift. Despite the drift, the classification accuracy of 60.75% was higher than the accuracies obtained by ANN (42.43%) and kNN (45.77%) as reported in [24].

Our results were also compared with those of Vergara et al. [19], who used an SVM-based machine learning model for classification. The implemented SVM classifier (accuracy of 60.75%) performed better than the SVM classifier in [19]. Besides SVM, kNN also provided a higher accuracy (60.31%) when trained on batch 2 and tested on batches 3-10, compared with SVM (accuracy of 58.35%) in [19]. The same holds for training batch 4 (accuracy of 44.46% with Bagging using RF), batch 5 (47.23% with SVM) and batch 8 (56.44% with RF). As a result, the classifiers in this work outperformed the SVM classifier in [19] in 50% of the cases.

In the second approach, each classifier is trained only on the previous batch and tested on the immediately following batch, with all classifiers kept at their initial parameters. Results of Approach 2 for the different models are shown in Table 6. Compared with Approach 1, the accuracy for each test batch is generally higher when the model is trained on the immediately preceding batch. Moreover, classifiers such as Vote (GBT,RF), RF, SVM and Bagging (RF) achieved high accuracies for 6 out of the

Table 5 Average classification accuracies (%) for training with different batches – approach 1

| Training batch | Test batches | ANN | kNN | SVM | RF | DT | Bagging (RF) | Vote (GBT,RF) |
|---|---|---|---|---|---|---|---|---|
| Batch 1 | Batches 2-10 | 53.19 | 58.72 | 60.75 | 49.25 | 50.8 | 44.11 | 45.32 |
| Batch 2 | Batches 3-10 | 56.47 | 60.31 | 50.97 | 54.07 | 49.99 | 56.36 | 49.46 |
| Batch 3 | Batches 4-10 | 54.78 | 63.59 | 60.74 | 58.98 | 45.5 | 57.87 | 51.6 |
| Batch 4 | Batches 5-10 | 35.03 | 36.98 | 40.77 | 43.88 | 37.25 | 44.46 | 43.65 |
| Batch 5 | Batches 6-10 | 32.19 | 32.07 | 47.23 | 27.06 | 19.28 | 25.1 | 25.76 |
| Batch 6 | Batches 7-10 | 48.82 | 57.53 | 54.13 | 54.13 | 48.7 | 53.49 | 53.61 |
| Batch 7 | Batches 8-10 | 47.82 | 52.66 | 64.54 | 58.77 | 56.48 | 57.93 | 62.4 |
| Batch 8 | Batches 9-10 | 39.06 | 36.42 | 47.76 | 56.44 | 47.84 | 52.44 | 50.79 |
| Batch 9 | Batch 10 | 17.22 | 15.36 | 15.25 | 13.83 | 16.83 | 14.19 | 11.83 |

Table 6 Classification accuracies (%) when trained on a previous batch and tested only on the subsequent batch – approach 2

| Training batch | Test batch | ANN | kNN | SVM | RF | Bagging (RF) | Vote (GBT,RF) | DT |
|---|---|---|---|---|---|---|---|---|
| Batch 1 | Batch 2 | 45.42 | 70.58 | 53.62 | 77.25 | 74.36 | 77.33 | 76.29 |
| Batch 2 | Batch 3 | 85.75 | 91.55 | 79.07 | 91.42 | 94.96 | 88.91 | 77.8 |
| Batch 3 | Batch 4 | 61.49 | 70.19 | 67.7 | 83.85 | 81.99 | 78.88 | 80.12 |
| Batch 4 | Batch 5 | 58.88 | 50.25 | 54.82 | 89.34 | 89.85 | 93.4 | 88.32 |
| Batch 5 | Batch 6 | 57.3 | 37.48 | 67.96 | 39.74 | 38.57 | 37.96 | 37 |
| Batch 6 | Batch 7 | 69.28 | 74.4 | 78.19 | 74.31 | 74.62 | 71.52 | 65.15 |
| Batch 7 | Batch 8 | 89.12 | 70.07 | 92.86 | 86.73 | 83.33 | 80.27 | 66.67 |
| Batch 8 | Batch 9 | 54.04 | 48.51 | 72.98 | 92.98 | 91.91 | 86.38 | 73.62 |
| Batch 9 | Batch 10 | 17.22 | 15.36 | 15.25 | 13.83 | 14.19 | 11.83 | 16.83 |


9 cases, which reflects their superior performance compared with the SVM classifier, the SVM-based ensembles and the PCA models implemented in [19]. Approach 2 therefore indicates that the most recent data samples are better predictors of the subsequent samples, which helps reduce the impact of drift. As the test batch number increases, the impact of drift grows and further reduces the classification accuracy.
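The two evaluation protocols can be summarised in code. The sketch below assumes `batches` is a list of `(X, y)` pairs ordered in time and that the model factory returns a fresh scikit-learn classifier; both are hypothetical names. It also assumes that the "average classification accuracy" of Approach 1 is the unweighted mean of the per-batch accuracies, which the text does not state explicitly. Synthetic data with an artificial, growing offset stand in for the drifting sensor batches.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def approach1(batches, make_model, train_idx):
    """Train on one batch and return the mean accuracy over all later batches."""
    X_tr, y_tr = batches[train_idx]
    model = make_model().fit(X_tr, y_tr)
    return float(np.mean([model.score(X, y) for X, y in batches[train_idx + 1:]]))


def approach2(batches, make_model, train_idx):
    """Train on one batch and return the accuracy on the next batch only."""
    X_tr, y_tr = batches[train_idx]
    X_te, y_te = batches[train_idx + 1]
    return make_model().fit(X_tr, y_tr).score(X_te, y_te)


# Synthetic stand-in: 10 batches of the same 6-class problem, each shifted by a
# growing offset to mimic drift (illustrative only, not the real sensor data).
X, y = make_classification(n_samples=2000, n_features=16, n_informative=8,
                           n_classes=6, random_state=0)
batches = [(X[i::10] + 0.3 * i, y[i::10]) for i in range(10)]

for i in range(9):
    a1 = approach1(batches, SVC, i)
    a2 = approach2(batches, SVC, i)
    print(f"train batch {i + 1}: approach 1 = {a1:.3f}, approach 2 = {a2:.3f}")
```

Passing the class itself (`SVC`) as the factory ensures every evaluation starts from an untrained model.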

With feature selection, or with PCA using 4 principal components (PCs), the average classification accuracy of the machine learning models either improved slightly or remained the same. Many accuracies, especially those for test batches 9-10 and batch 10, were unchanged, suggesting that these models had already reached their best performance before feature selection or PCA was applied. As in the regression case, SFS selected mostly transient features (70%), and an even higher proportion of transient features remained after SBE (76%). A similar observation was made with filter-based feature selection, in which feature importance is expressed as weights. Fig. 4 shows the Gini index weights of the top 30 features in decreasing order of importance; 27% of these features are steady-state features and 73% are transient features. The greater weight given to transient features indicates that they capture more information from the sensor signals and therefore support good predictions.
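The filter-based Gini weighting can be sketched as follows. The text does not specify the exact weighting formula of the tool that was used, so this sketch assumes one common definition: the weight of a feature is the largest decrease in Gini impurity obtainable by splitting the samples on a single threshold of that feature. The function names are illustrative.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def gini_weight(x, y):
    """Largest Gini impurity decrease from a single threshold split on feature x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n, base, best = len(ys), gini(ys), 0.0
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a threshold can only fall between distinct feature values
        split = (i * gini(ys[:i]) + (n - i) * gini(ys[i:])) / n
        best = max(best, base - split)
    return best


def rank_features(X, y):
    """Return feature indices sorted by decreasing Gini weight, plus the weights."""
    X = np.asarray(X, dtype=float)
    weights = np.array([gini_weight(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(weights)[::-1], weights


# Illustrative use on synthetic data: column 0 carries class information, column 1 is noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 6, 300)
X_demo = np.column_stack([labels + rng.normal(0, 0.5, 300), rng.normal(size=300)])
order, weights = rank_features(X_demo, labels)
print(order, weights.round(3))  # the informative column is expected to rank first
```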

 
Fig. 4 Feature weights of the top 30 features – dataset 1

Fig. 5 shows how the number of features affects the average classification accuracy of the best-performing models when training on batch 1 and testing on batches 2-10. RF and Bagging using RF reach their peak accuracies with the top 40 features, SVM with the top 35 features and kNN with the top 50 features. Although the models peak at different numbers of features, each needs at least the top 30 highest-weighted features to approach its peak; beyond the peak, the accuracy declines slightly and then remains nearly constant. Although this filter-based technique identifies the most relevant features differently, its results are consistent with those obtained earlier using SFS and SBE.
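A sweep such as the one in Fig. 5 can be reproduced by ranking the features once and retraining a model on the top-k features for several values of k. The sketch below assumes scikit-learn, uses the impurity-based (Gini) importances of a random forest as a stand-in for the filter weights, and runs on synthetic data with 128 features; the values of k are illustrative. The ranking is computed on the training split only, so that no information leaks from the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 128-feature, 6-class gas-sensor dataset (not the real data).
X, y = make_classification(n_samples=1200, n_features=128, n_informative=20,
                           n_classes=6, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Rank features by Gini (mean decrease in impurity) importance, on training data only.
ranker = RandomForestClassifier(n_estimators=200, criterion="gini", random_state=1)
order = np.argsort(ranker.fit(X_tr, y_tr).feature_importances_)[::-1]

for k in (10, 20, 30, 40, 50):
    top = order[:k]  # indices of the k highest-weighted features
    model = RandomForestClassifier(n_estimators=100, random_state=1)
    acc = model.fit(X_tr[:, top], y_tr).score(X_te[:, top], y_te)
    print(f"top {k:2d} features: accuracy = {acc:.3f}")
```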


Fig. 5 Average classification accuracy using different number of features – dataset 1 

3.2. Dataset 2 – Experimental dataset of four gases with varying concentrations 

This dataset was used to detect four gases: ethylene (Ey), ethanol (Ea), carbon monoxide (CO) and methane (Me). Ten different concentrations of each analyte were tested over 22 days, giving a total of 40 gas-concentration combinations and 640 measurement samples. Each GSA, referred to as a chemical detection platform (board), comprised eight MoX sensors. The responses to each gas and concentration were recorded with five identical boards, all of which used the same configuration to minimize board-to-board variability. Further details of the experimental procedure can be found in the work of Fonollosa et al. [18].

Fig. 6 shows the signals acquired from the eight sensors of a given board on different days, as reported in [18]. All sensors follow a similar trend during the initial phase (I, 50 s), the exposure phase (E, 100 s) and the recovery phase (R, 450 s). The exposure phase is easily identified by the dip in the sensor response when the sensor is exposed to the target gas, and the sensors clearly give slightly different responses on different days: in Fig. 6, the stable response of each sensor on Board 1 to ethylene changes from Day 4 to Day 21. Since the same board was tested on different days, this variability in the sensor responses can be attributed to drift, which may be caused by external factors.

Fig. 6 Sensor signals acquired for detecting 62.5 ppm of ethylene on board 1, on day 4 and day 21

3.2.1. Methodology 

As an initial step, the signals were pre-processed with a weighted moving average to reduce noise. The weighted moving average is one of the most common filters for signal noise processing [35,36]; despite its simplicity, it offers strong attenuation, reducing noise while retaining the shape of the signal. Two window lengths, 5 and 10, were used for noise removal on the sensor responses from the five boards. Fig. 7 shows the effect of a weighted moving average with a window length of 5 on sensors 1 and 3 of board 1 for the detection of 12.5 ppm of ethanol.

 
Fig. 7 Implementation of weighted moving average on sensor signals from sensors 1 and 3 on board 2 and day 1 for 12.5 ppm of ethanol
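A weighted moving average of this kind can be sketched in a few lines of NumPy. The weighting scheme is not specified in the text, so the sketch assumes the common linearly increasing weights (1, 2, ..., w), which give the most recent sample the largest weight; the function name is illustrative. At the start of the signal, where fewer than w samples are available, the available weights are renormalised.

```python
import numpy as np


def weighted_moving_average(signal, window=5):
    """Causal weighted moving average with linearly increasing weights 1..window.

    The newest sample in each window gets the largest weight; near the start of
    the signal, where fewer than `window` samples exist, the weights are renormalised.
    """
    x = np.asarray(signal, dtype=float)
    weights = np.arange(1, window + 1, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        start = max(0, i - window + 1)
        w = weights[window - (i - start + 1):]  # largest weights, aligned to the newest samples
        out[i] = np.dot(x[start:i + 1], w) / w.sum()
    return out


# Example: smooth a noisy step with the two window lengths used in the text (5 and 10).
rng = np.random.default_rng(0)
raw = np.r_[np.full(50, 10.0), np.full(100, 4.0)] + rng.normal(0, 0.3, 150)
smoothed = {w: weighted_moving_average(raw, w) for w in (5, 10)}
```

A longer window suppresses more noise but also smooths the transitions between phases more strongly, which matters for the transition-based features that follow.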

After pre-processing, relevant signal characteristics were extracted from $r[k+\Delta x]$, the sensor resistance, where $k \in [0, 600]$ is the discrete time-step and $\Delta x$ denotes an incremental change in the time-step. The signal characteristics extracted from $r[k+\Delta x]$ in this work are as follows:

The drop value, as the signal transitions from the initial to the exposure phase, is given as

$$\text{drop} = \operatorname*{avg}_{k=0,1,\dots,50} r[k+\Delta x] - \operatorname*{avg}_{k=51,52,\dots,150} r[k+\Delta x] \tag{1}$$

whereas the transition rate is given by the slope,

$$\text{slope} = \frac{r[k_1+\Delta x_1] - r[k_2+\Delta x_2]}{\Delta x_2 - \Delta x_1}, \qquad k_1 = 50,\; k_2 = 51 \tag{2}$$

The average exposure value, attained while the sensor is exposed to the target gas, is expressed as

$$\text{average exposure} = \operatorname*{avg}_{k=51,52,\dots,150} r[k+\Delta x] \tag{3}$$
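The fall time (denoted by $ft$) from 10% to 90% of the signal during the transition from the initial to the exposure phase is given as

$$r[k+\Delta x]_{10\%} = \operatorname*{avg}_{k=0,1,\dots,50} r[k+\Delta x] - 0.1 \times \Big( \operatorname*{avg}_{k=0,1,\dots,50} r[k+\Delta x] - \operatorname*{avg}_{k=51,52,\dots,150} r[k+\Delta x] \Big) \tag{4}$$

$$r[k+\Delta x]_{90\%} = \operatorname*{avg}_{k=0,1,\dots,50} r[k+\Delta x] - 0.9 \times \Big( \operatorname*{avg}_{k=0,1,\dots,50} r[k+\Delta x] - \operatorname*{avg}_{k=51,52,\dots,150} r[k+\Delta x] \Big) \tag{5}$$

$$ft = \text{time}\big|_{r[k+\Delta x] = r[k+\Delta x]_{90\%}} - \text{time}\big|_{r[k+\Delta x] = r[k+\Delta x]_{10\%}} \tag{6}$$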

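The rise time (denoted by $rt$) from 10% to 90% of the signal during the transition from the exposure to the recovery phase is expressed as

$$r[k+\Delta x]_{10\%} = \operatorname*{avg}_{k=151,152,\dots,600} r[k+\Delta x] - 0.1 \times \Big( \operatorname*{avg}_{k=151,152,\dots,600} r[k+\Delta x] - \operatorname*{avg}_{k=51,52,\dots,150} r[k+\Delta x] \Big) \tag{7}$$

$$r[k+\Delta x]_{90\%} = \operatorname*{avg}_{k=151,152,\dots,600} r[k+\Delta x] - 0.9 \times \Big( \operatorname*{avg}_{k=151,152,\dots,600} r[k+\Delta x] - \operatorname*{avg}_{k=51,52,\dots,150} r[k+\Delta x] \Big) \tag{8}$$

$$rt = \text{time}\big|_{r[k+\Delta x] = r[k+\Delta x]_{10\%}} - \text{time}\big|_{r[k+\Delta x] = r[k+\Delta x]_{90\%}} \tag{9}$$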
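The five features of Eqs. (1)-(9) can be computed from a single pre-processed response with NumPy, as sketched below. Several details are assumptions not stated in the text: one sample per time-step, so the phase windows are k = 0..50, 51..150 and 151..600; Eq. (2) is interpreted as the difference between the consecutive samples at $k_1 = 50$ and $k_2 = 51$ divided by their unit time separation; and the time at which the signal "equals" a threshold in Eqs. (6) and (9) is taken as the first sample that reaches or passes it, searching the falling edge from k = 50 and the rising edge from k = 150. Function names are illustrative.

```python
import numpy as np

# Phase windows of a 601-sample response (k = 0..600), following Eqs. (1)-(9).
INITIAL = slice(0, 51)       # k = 0..50
EXPOSURE = slice(51, 151)    # k = 51..150
RECOVERY = slice(151, 601)   # k = 151..600


def first_crossing(r, level, start, stop, falling):
    """First index in [start, stop) where r reaches `level` (assumed meaning of 'time at which r = level')."""
    seg = r[start:stop]
    hits = np.flatnonzero(seg <= level) if falling else np.flatnonzero(seg >= level)
    return start + int(hits[0]) if hits.size else None


def extract_features(r):
    """Drop, slope, average exposure, fall time and rise time of one response r[k]."""
    r = np.asarray(r, dtype=float)
    avg_i, avg_e, avg_r = r[INITIAL].mean(), r[EXPOSURE].mean(), r[RECOVERY].mean()

    drop = avg_i - avg_e                    # Eq. (1)
    slope = (r[50] - r[51]) / (51 - 50)     # Eq. (2), consecutive samples k1 = 50, k2 = 51
    avg_exposure = avg_e                    # Eq. (3)

    # Fall time, Eqs. (4)-(6): falling edge searched from k = 50 to the end of exposure.
    t10 = first_crossing(r, avg_i - 0.1 * (avg_i - avg_e), 50, 151, falling=True)
    t90 = first_crossing(r, avg_i - 0.9 * (avg_i - avg_e), 50, 151, falling=True)
    fall_time = float(t90 - t10) if None not in (t10, t90) else float("nan")

    # Rise time, Eqs. (7)-(9): rising edge searched from k = 150 to the end of recovery.
    t10 = first_crossing(r, avg_r - 0.1 * (avg_r - avg_e), 150, 601, falling=False)
    t90 = first_crossing(r, avg_r - 0.9 * (avg_r - avg_e), 150, 601, falling=False)
    rise_time = float(t10 - t90) if None not in (t10, t90) else float("nan")

    return {"drop": float(drop), "slope": float(slope), "avg_exposure": float(avg_exposure),
            "fall_time": fall_time, "rise_time": rise_time}


# Idealised step response: baseline 10, exposure level 2, recovery back to 10.
step = np.r_[np.full(51, 10.0), np.full(100, 2.0), np.full(450, 10.0)]
print(extract_features(step))
```

For the idealised step, the drop and slope are both 8 and the fall and rise times are 0; a gradual recovery ramp yields a positive rise time.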

Fig. 8 illustrates the five features of Eqs. (1)-(3), (6) and (9) extracted from sensor 4 of Board 1 for the detection of 12.5 ppm of ethanol. Together, the five features cover all three phases of the signal and extract the relevant information from the raw signal. Pre-processing and feature extraction were performed for the eight sensors on all five boards and for all four gases, resulting in 40 features per measurement (5 features × 8 sensors) to be used for classification and regression. The complete feature set is summarized in Table 7.

Fig. 8 Visual depiction of the five features extracted from the signal of sensor 4 on board 1 and day 1 for 12.5 ppm of ethanol

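Table 7 Summary of the complete feature set - dataset 2

| Sensor no. | Drop value | Average exposure value | Slope value in the IE | Fall time from 10% to 90% in the IE | Rise time from 10% to 90% in the ER |
|---|---|---|---|---|---|
| 1 | f1 | f2 | f3 | f4 | f5 |
| 2 | f6 | f7 | f8 | f9 | f10 |
| 3 | f11 | f12 | f13 | f14 | f15 |
| 4 | f16 | f17 | f18 | f19 | f20 |
| 5 | f21 | f22 | f23 | f24 | f25 |
| 6 | f26 | f27 | f28 | f29 | f30 |
| 7 | f31 | f32 | f33 | f34 | f35 |
| 8 | f36 | f37 | f38 | f39 | f40 |

IE: transition from the initial to the exposure phase; ER: transition from the exposure to the recovery phase.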
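Because the features in Table 7 are numbered sensor by sensor in blocks of five, a feature number can be mapped back to its sensor and feature type with integer arithmetic. The helper below is an illustrative sketch (its name and the short type labels are our own); it can be used, for instance, to compute the share of each feature type among a list of top-ranked features such as those in Fig. 10.

```python
from collections import Counter

# Column order of Table 7 (dataset 2): five features per sensor, eight sensors.
FEATURE_TYPES = ["drop", "average exposure", "slope", "fall time", "rise time"]


def describe_feature(n):
    """Map feature number f<n> (1-based) to (sensor number, feature type)."""
    sensor, col = divmod(n - 1, len(FEATURE_TYPES))
    return sensor + 1, FEATURE_TYPES[col]


def type_shares(feature_numbers):
    """Fraction of each feature type among the given feature numbers."""
    counts = Counter(describe_feature(n)[1] for n in feature_numbers)
    total = len(feature_numbers)
    return {t: counts[t] / total for t in FEATURE_TYPES}


print(describe_feature(18))  # (4, 'slope')
```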

3.2.2. Machine Learning - Classification 

The complete set of 40 features was then used for classification. Of the measurements, 480 (75% of the dataset) from boards 1, 2 and 3 were used to train the machine learning models, and 160 (25% of the dataset) from boards 4 and 5 were used for testing. The training data were subjected to 10-fold, leave-one-out cross-validation to reduce the risk of model underfitting or overfitting. Classification models including kNN, SVM, DT and RF were implemented, and the corresponding results are shown in Fig. 9. With k = 5 and the Manhattan distance metric, kNN achieved a classification accuracy of 96.25%. Values of k above 5 or below 3 reduced the accuracy, so the tested values of k were limited to 3, 4 and 5. For both kNN and SVM, the features were normalized to a common scale; SVM then achieved an accuracy of 99.38%. RF with 100 decision trees, a tree depth of 10 and the information gain splitting criterion performed equally well, also reaching 99.38%.
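The classification setup described above can be sketched with scikit-learn. The sketch assumes a board-wise split (boards 1-3 for training, boards 4-5 for testing), min-max normalisation for kNN and SVM, and `criterion="entropy"` as scikit-learn's counterpart of the information gain splitting criterion. The data are synthetic, the per-board sample counts are hypothetical values chosen only to reproduce the 480/160 split, and the SVM hyperparameters are left at scikit-learn defaults because the text does not report them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 640 x 40 feature matrix with 4 gas classes (not the real data).
X, y = make_classification(n_samples=640, n_features=40, n_informative=12,
                           n_classes=4, random_state=0)
# Hypothetical board labels chosen so that boards 1-3 hold 480 samples and boards 4-5 hold 160.
board = np.repeat([1, 2, 3, 4, 5], [160, 160, 160, 80, 80])
train, test = board <= 3, board >= 4

models = {
    # kNN on min-max scaled features, Manhattan distance, k tuned over {3, 4, 5} with 10-fold CV.
    "kNN": GridSearchCV(
        make_pipeline(MinMaxScaler(), KNeighborsClassifier(metric="manhattan")),
        {"kneighborsclassifier__n_neighbors": [3, 4, 5]}, cv=10),
    # SVM on min-max scaled features, scikit-learn default hyperparameters.
    "SVM": make_pipeline(MinMaxScaler(), SVC()),
    # RF: 100 trees, maximum depth 10, entropy (information gain) splitting criterion.
    "RF": RandomForestClassifier(n_estimators=100, max_depth=10,
                                 criterion="entropy", random_state=0),
}
for name, model in models.items():
    model.fit(X[train], y[train])
    print(f"{name}: test accuracy = {model.score(X[test], y[test]):.4f}")
```

Keeping the scaler inside each pipeline ensures it is fitted on the training boards only.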

 
Fig. 9 Classification accuracies of machine learning models – dataset 2 


For kNN, SBE improved the accuracy from 96.25% to 98.75% using 38 features, whereas PCA with 15 PCs improved the accuracy of DT from 77.5% to 90%. Among the top 20 features shown in Fig. 10, drop value features account for 35% and slope value features for 30%. Thus, although drop value features form the largest share, substantial importance is also given to features from the other phases of the signal response.
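The two improvements reported above, SBE for kNN and PCA before a decision tree, can be sketched with scikit-learn. Here SBE is implemented with `SequentialFeatureSelector(direction="backward")`, which removes one feature at a time based on cross-validated accuracy. The target of 38 features, the 15 components, k = 5 and the Manhattan distance come from the text; the synthetic data, the 5-fold CV inside the selector and the simple first-480/last-160 split are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 40-feature, 4-class dataset 2 (not the real data).
X, y = make_classification(n_samples=640, n_features=40, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = X[:480], X[480:], y[:480], y[480:]

# SBE: sequential backward elimination down to 38 features for a scaled kNN (k = 5, Manhattan).
knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5, metric="manhattan"))
sbe = SequentialFeatureSelector(knn, n_features_to_select=38, direction="backward", cv=5)
sbe.fit(X_tr, y_tr)
keep = sbe.get_support()  # boolean mask of the retained features
knn_acc = knn.fit(X_tr[:, keep], y_tr).score(X_te[:, keep], y_te)

# PCA with 15 principal components followed by a decision tree.
pca_dt = make_pipeline(PCA(n_components=15), DecisionTreeClassifier(random_state=0))
dt_acc = pca_dt.fit(X_tr, y_tr).score(X_te, y_te)
print(f"kNN + SBE ({keep.sum()} features): {knn_acc:.3f}; PCA(15) + DT: {dt_acc:.3f}")
```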

 
Fig. 10 Feature weights of the top 20 features – dataset 2

When the machine learning models were trained with different numbers of features ranked by their Gini index weights, the accuracy of all four models increased steadily up to the top 20 highest-weighted features and then remained nearly constant (see Fig. 11). Hence, at least the top 20 features are required for the accuracy to reach a stable value.

 
Fig. 11 Classification accuracy using different number of features – dataset 2 


3.3. Dataset 3 – Experimental dataset of CO with varying concentrations 

This dataset was collected by detecting different levels of carbon monoxide (CO) with fourteen MoX sensors under fluctuating humidity conditions [37]. The experiment employed seven different types of MoX sensor (two sensors of each type), giving a total of 14 sensors, to detect 10 different concentrations. Each concentration was tested 10 times, yielding a complete dataset of 1,300 measurement samples. More details of the experimental procedure can be found in Burgués et al. [37]. Each CO sample was measured for 900 s, during which the sensor heater voltage was varied to produce high and low sensor temperatures in cycles of 25 s. Each 900-s measurement therefore contains 36 complete heater cycles (900 s / 25 s), as shown in Fig. 12.

 
Fig. 12 Sensor response from sensors 1 and 2 to 20 ppm of CO

3.3.1. Methodology  

Three main features were extracted from the sensor responses $r[k+\Delta x]$, i.e., the sensor resistance, where $k \in [0, 900]$ is the discrete time-step and $\Delta x$ denotes an incremental change in the time-step. The maximum value of the signal over all 36 cycles of a given measurement is given as

$$\text{maximum peak} = \max_{k=0,1,\dots,900} \big| r[k+\Delta x] \big| \tag{10}$$

and the time-step at which the maximum peak occurs is defined as

$$\text{time}_{\text{max. peak}} = k + \Delta x \tag{11}$$

The drop value, i.e., the difference between the maximum peak and the lowest value in that cycle, is given as

$$\text{drop} = \text{maximum peak} - \operatorname*{local\,min}_{k=0,1,\dots,900} r[k+\Delta x] \tag{12}$$

where $k < \text{time}_{\text{max. peak}}$. The transition rate, in turn, is determined through the slope. The minimum value after the peak is

$$\text{minimum value} = \operatorname*{local\,min}_{k=0,1,\dots,900} r[k+\Delta x] \tag{13}$$

where $k > \text{time}_{\text{max. peak}}$, and the time-step at which the minimum value occurs is defined as

$$\text{time}_{\text{min. value}} = k + \Delta x \tag{14}$$

Consequently,

$$\text{slope} = \frac{\text{maximum peak} - \text{minimum value}}{\text{time}_{\text{max. peak}} - \text{time}_{\text{min. value}}} \tag{15}$$

Fig. 13 illustrates the three features of Eqs. (10), (12) and (15) extracted from sensor 1 for the detection of 20 ppm of CO. The same features were extracted from the responses of all 14 sensors for each CO concentration, resulting in 42 features (3 features × 14 sensors) to be used for regression. The feature numbers associated with each sensor are listed in Table 8.

 
Fig. 13 Visual depiction of the three features extracted from the signal of sensor 1 for 20 ppm of CO
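The three dataset-3 features can be sketched in the same way. The meaning of "local min" in Eqs. (12) and (13) is not fully specified, so the sketch assumes it is the minimum within `window` samples before the peak (for the drop) or after it (for the minimum value). `window` is a hypothetical parameter; 25 samples would correspond to one 25 s heater cycle only if the signal is sampled once per second, which is also an assumption. As in Eq. (15), the slope is negative because the minimum occurs after the peak.

```python
import numpy as np


def co_features(r, window=25):
    """Maximum peak, drop and slope of one response r[k] (Eqs. (10)-(15)).

    `window` (hypothetical) limits the 'local min' search to that many samples
    before/after the peak.
    """
    r = np.asarray(r, dtype=float)
    t_peak = int(np.argmax(np.abs(r)))              # Eq. (11)
    peak = abs(r[t_peak])                           # Eq. (10)

    before = r[max(0, t_peak - window):t_peak]      # samples with k < time_max.peak
    drop = peak - before.min() if before.size else float("nan")  # Eq. (12)

    slope = float("nan")
    after = r[t_peak + 1:t_peak + 1 + window]       # samples with k > time_max.peak
    if after.size:
        t_min = t_peak + 1 + int(np.argmin(after))  # Eq. (14)
        slope = (peak - r[t_min]) / (t_peak - t_min)  # Eqs. (13) and (15)
    return float(peak), float(drop), float(slope)


print(co_features([2.0, 1.0, 3.0, 6.0, 4.0, 0.5, 2.0]))  # (6.0, 5.0, -2.75)
```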

Table 8 Summary of the complete feature set - dataset 3

| Sensor no. | Maximum peak value | Drop value | Slope value |
|---|---|---|---|
| 1 | f1 | f2 | f3 |
| 2 | f4 | f5 | f6 |
| 3 | f7 | f8 | f9 |
| 4 | f10 | f11 | f12 |
| 5 | f13 | f14 | f15 |
| 6 | f16 | f17 | f18 |
| 7 | f19 | f20 | f21 |
| 8 | f22 | f23 | f24 |
| 9 | f25 | f26 | f27 |
| 10 | f28 | f29 | f30 |
| 11 | f31 | f32 | f33 |
| 12 | f34 | f35 | f36 |
| 13 | f37 | f38 | f39 |
| 14 | f40 | f41 | f42 |


3.3.2. Machine learning – Regression 

The complete dataset of 42 features and 1,300 measurement samples was used for regression. It was split in a 70:30 ratio, with 70% of the data reserved for training and the remaining 30% for testing the machine learning models. Both classical models (DT, kNN, SVM and ANN) and ensemble models (RF, Voting and Bagging) were used for regression, and their performance was compared using the symmetric mean absolute percentage error (sMAPE), which, unlike the ordinary MAPE, does not divide by the true value alone and therefore accommodates the 0 ppm concentrations present in the dataset. The regression results are shown in Fig. 14: the sMAPEs of all models were around 60%, with the lowest values attained by RF (58.82%), followed by Bagging using RF (60.28%). Hyperparameter tuning gave an optimal value of 48 trees for RF and 2 hidden layers of 2 neurons each for ANN; for kNN, the optimal value of k was 7 with the Manhattan distance. The dataset contains 10 repetitions of each CO concentration, with the humidity level varying across repetitions. The high sMAPEs can therefore be attributed to the lack of repeatability in the sensor responses caused by the fluctuating humidity.
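sMAPE has several variants in the literature, and the text does not state which one was used. The sketch below assumes the common form $\text{sMAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \frac{2\,|F_i - A_i|}{|A_i| + |F_i|}$, with true values $A_i$ and predictions $F_i$, and treats a term as zero when both the true and the predicted concentration are 0 ppm; the function name is illustrative.

```python
import numpy as np


def smape(actual, predicted):
    """Symmetric MAPE in percent; a term with actual == predicted == 0 counts as 0."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(predicted, dtype=float)
    denom = np.abs(a) + np.abs(f)
    num = 2.0 * np.abs(f - a)
    # Avoid 0/0 for samples where both the true and predicted concentration are 0 ppm.
    terms = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return 100.0 * float(terms.mean())


print(smape([0.0, 10.0], [0.0, 5.0]))  # about 33.33
```

Note that any nonzero prediction for a 0 ppm sample contributes the maximum term of 200%, which is one reason sMAPE values on datasets with many zero concentrations can be high.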

When the feature weights were determined using correlation, each feature type (drop value, slope value and peak value) accounted for an equal share of the top 21 features (33.3% each), highlighting their equal importance for regression. To reduce the high sMAPEs, SFS and SBE were then applied. Feature selection reduced the sMAPE for all models, although only marginally for some, such as SVM, whose sMAPE remained almost unchanged. For other models the reduction was substantial: from 60.28% to 55.73% for Bagging using RF and from 58.82% to 52.71% for RF. In all cases, the best-performing models were RF, Bagging using RF, kNN and ANN. With SFS, the three feature types were equally represented in most cases, and with SBE, equal numbers of features of each type were dropped.

 
Fig. 14 Regression performance of machine learning models – dataset 3 
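SFS and SBE with sMAPE as the selection criterion can be sketched with scikit-learn's `SequentialFeatureSelector`, using `direction="forward"` for SFS and `direction="backward"` for SBE. The sketch uses the kNN configuration reported above (k = 7, Manhattan distance) because it is fast to refit; the synthetic data, the 12-feature size (the real dataset has 42), the target of 4 selected features and the 5-fold CV are assumptions. The sMAPE helper repeats the common definition with 0/0 terms counted as zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import make_scorer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor


def smape(actual, predicted):
    """Symmetric MAPE in percent; terms with actual == predicted == 0 count as 0."""
    a, f = np.asarray(actual, float), np.asarray(predicted, float)
    denom = np.abs(a) + np.abs(f)
    terms = np.divide(2 * np.abs(f - a), denom, out=np.zeros_like(denom), where=denom > 0)
    return 100 * float(terms.mean())


# Synthetic stand-in with 12 features (not the real dataset 3).
X, y = make_regression(n_samples=300, n_features=12, n_informative=4, noise=5.0, random_state=0)
y = np.abs(y)  # concentrations are non-negative
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)  # 70:30 split

knn = KNeighborsRegressor(n_neighbors=7, metric="manhattan")
scorer = make_scorer(smape, greater_is_better=False)  # lower sMAPE is better

for direction in ("forward", "backward"):  # SFS and SBE
    sel = SequentialFeatureSelector(knn, n_features_to_select=4, direction=direction,
                                    scoring=scorer, cv=5).fit(X_tr, y_tr)
    cols = sel.get_support()
    err = smape(y_te, knn.fit(X_tr[:, cols], y_tr).predict(X_te[:, cols]))
    print(direction, np.flatnonzero(cols), f"sMAPE = {err:.2f}%")
```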


With PCA, 19 PCs were retained to explain 95% of the variance, and the lowest sMAPEs were obtained with RF (51.82%), kNN (56.81%) and Bagging using RF (54.12%). For the other models, the error remained the same or decreased slightly. For RF, kNN and Bagging using RF, linear combinations of the features obtained through PCA therefore yield better performance, whereas the remaining models rely on the original feature set to estimate the CO concentrations. Overall, the results show that the implemented models can be applied to regression tasks, and they highlight the role of feature selection and PCA in reducing computational time and improving model performance. Another factor that may have played a crucial role is the sensor data itself: the inability of sensors to reproduce the same response because of external factors such as humidity can easily compromise the performance of a machine learning model for classification, regression or both.
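Retaining enough components to explain 95% of the variance can be expressed directly in scikit-learn by passing a fraction to `PCA(n_components=...)`, which keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below pairs this with an RF regressor using the 48 trees reported above; standardising the features before PCA, the 70:30 split seed and the synthetic 42-feature data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 42 correlated features (not the real dataset 3).
X, y = make_regression(n_samples=400, n_features=42, n_informative=8,
                       effective_rank=10, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# n_components=0.95 keeps the fewest PCs that together explain at least 95% of the variance.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95),
                      RandomForestRegressor(n_estimators=48, random_state=0))
model.fit(X_tr, y_tr)
pca = model.named_steps["pca"]
print(pca.n_components_, np.round(pca.explained_variance_ratio_.sum(), 3))
```

Because the PCA step sits inside the pipeline, the components are learned from the training split only and then applied unchanged to the test split.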

4. CONCLUSION  

E-noses are widely used in smart gas sensing applications such as medical diagnosis, environmental monitoring and food quality control. Their ability to identify different gases and estimate their concentrations makes them well suited to gas sensing. In this work, we provided a detailed assessment of different machine learning techniques used in E-noses for classification and regression, using three experimental GSA datasets. The analysis covers signal pre-processing, feature extraction and feature selection, followed by classification and regression using machine learning techniques. Feature selection with the wrapper-based techniques SFS and SBE, and with filter-based techniques that use the Gini index and correlation to determine feature importance, was studied in detail to investigate the predictive capability of the developed models. In addition, the machine learning techniques employed were not limited to classical models such as DT, kNN and SVM but also included the ensemble techniques Voting, Bagging and RF, to offer a more comprehensive discussion.
The performance of the models was assessed using classification accuracies and regression errors, and the results were compared with those of models presented in previously published studies. In most cases, classical models such as kNN, DT and SVM showed strong classification capability, as demonstrated by the high accuracies obtained. Ensemble techniques such as Bagging (RF) and RF outperformed their classical counterparts presented in this work and in previous works. The good performance of the models demonstrates the pivotal role of feature extraction. Moreover, the models implemented in this work gave more accurate predictions than most of the results reported for the same datasets.

In terms of dimensionality reduction, PCA improved the performance of the machine learning algorithms. Linear Discriminant Analysis, a technique widely used for feature extraction and classification, also reduces data dimensionality and could likewise be used with GSA datasets. In addition, the regression results revealed the impact of several factors that come into play when applying machine learning techniques. One is the size of the training dataset, which must be chosen carefully to avoid model underfitting or overfitting and the resulting loss of performance. Moreover, a large number of features, non-representative features and poor-quality training data can be a major challenge for both classification and regression.


Despite the lack of publicly available GSA datasets, this work identified three MoX sensor-based GSA datasets and used them to provide a detailed assessment of different machine learning models for classification and regression. Machine learning algorithms with different working principles were selected to provide a holistic discussion of machine learning in gas sensing. In all three case studies, feature selection and dimensionality reduction improved the predictive capability of the machine learning models, and in many cases the approach produced good results despite sensor drift. The models in this work therefore gave promising results with MoX sensor datasets and show potential for application to datasets from other gas sensors, such as carbon nanotube-based, polymer-based or acoustic sensors.

REFERENCES 

1. Miller, D.R., Akbar, S.A., Morris, P.A., 2014, Nanoscale metal oxide-based heterojunctions for gas 
sensing: A review, Sensors and Actuators B: Chemical, 204, pp. 250-272. 

2. Chen, X., Wong, C.K.Y., Yuan, C.A., Zhang, G., 2013, Nanowire-based gas sensors, Sensors and 
Actuators B: Chemical, 177, pp. 178–195. 

3. Park, H.J., Kim, W., Lee, H., Lee, D., Shin, J., Jun, Y., Yun, Y., 2018, Highly flexible, mechanically stable, and sensitive NO2 gas sensors based on reduced graphene oxide nanofibrous mesh fabric for flexible electronics, Sensors and Actuators B: Chemical, 257, pp. 846-852.

4. Fois, M., Cox, T., Ratcliffe, N., Costello, B., 2021, Rare earth doped metal oxide sensor for the multimodal detection of volatile organic compounds (VOCs), Sensors and Actuators B: Chemical, 330, 129264.

5. Kazemi, E., Zadeh, D.S., Moshiri, B., 2021, Metal-oxide-semiconductor Sensors Modeling Using Ordered 
Weighted Averaging (OWA) Operators in Electronic Nose, Measurement, 184, 109932. 

6. El-Shamy, A.G., 2021, New nano-composite based on carbon dots (CDots) decorated magnesium oxide 
(MgO) nano-particles (CDots@MgO) sensor for high H2S gas sensitivity performance, Sensors and 
Actuators B: Chemical, 329, 129154. 

7. Yoo, R., Kim, J., Song, M.J., Lee, W., Noh, J.S., 2015, Nano-composite sensors composed of single-walled carbon nanotubes and polyaniline for the detection of a nerve agent simulant gas, Sensors and Actuators B: Chemical, 209, pp. 444-448.

8. Matindoust, S., Farzi, G., Nejad, M.B., Shahrokhabadi, M.H., 2021, Polymer-based gas sensors to detect 
meat spoilage: A review, Reactive and Functional Polymers, 165, 104962. 

9. Zhou, Z., Xu, Y., Qiao, C., Liu, L., Jia, Y., 2021, A novel low-cost gas sensor for CO2 detection using 
polymer-coated fiber Bragg grating, Sensors and Actuators B: Chemical, 332, 129482. 

10. Jakubik, W.P., 2011, Surface acoustic wave-based gas sensors, Thin Solid Films, 520, pp. 986-993.

11. Nitzsche, L., Goldschmidt, J., Lambrecht, A., Wöllenstein, J., 2021, Two-component gas sensing with MIR dual comb spectroscopy, tm - Technisches Messen, 89, pp. 50-59.

12. Blanco-Novoa, O., Fernández-Caramés, T.M., Fraga-Lamas, P., Castedo, L., 2018, A cost-effective IoT system for monitoring indoor radon gas concentration, Sensors (Switzerland), 18, 2198.

13. Chen, J., Gu, J., Zhang, R., Mao, Y., Tian, S., 2019, Freshness evaluation of three kinds of meats based on 
the electronic nose, Sensors (Switzerland), 19, 605. 

14. Ashari, I.A., Widodo, A.P., Suryono, S., 2019, The Monitoring System for Ammonia Gas (NH3) Hazard 
Detection in the Livestock Environment uses Inverse Distance Weight Method, 2019 Fourth International 
Conference on Informatics and Computing (ICIC), pp. 1-6.  

15. Kao, K.A., Cheng, C., Gwo, S., Yeh, J.A., 2015, A Semiconductor Gas System of Healthcare for Liver 
Disease Detection Using Ultrathin InN-Based Sensor, ECS Transactions, 66, pp. 151-157.  

16. Chen, Z., Chen, Z., Song, Z., Ye, W., Fan, Z., 2019, Smart gas sensor arrays powered by artificial 
intelligence, Journal of Semiconductors, 40, 111601. 

17. Hunter, G.W., Akbar, S., Bhansali, S., Daniele, M., Erb, P.D., Johnson, K., Liu, C., Miller, D., Oralkan, O., Hesketh, P.J., 2020, Editors’ Choice—Critical Review—A Critical Review of Solid State Gas Sensors, Journal of The Electrochemical Society, 167, 037570.


18. Fonollosa, J., Fernández, L., Gutiérrez-Gálvez, A., Huerta, R., Marco, S., 2016, Calibration transfer and drift counteraction in chemical sensor arrays using Direct Standardization, Sensors and Actuators B: Chemical, 236, pp. 1044-1053.

19. Vergara, A., Vembu, S., Ayhan, T., Ryan, M.A., Homer, M.L., Huerta, R., 2012, Chemical gas sensor drift 
compensation using classifier ensembles, Sensors and Actuators B: Chemical, 166-167, pp. 320-329. 

20. Deng, C., Lv, K., Shi, D., Yang, B., Yu, S., He, Z., Yan, J., 2018, Enhancing the discrimination ability of 
a gas sensor array based on a novel feature selection and fusion framework, Sensors (Switzerland), 18, 1909. 

21. Hira, Z.M., Gillies, D.F., 2015, A review of feature selection and feature extraction methods applied on 
microarray data, Advances in Bioinformatics, 2015, 198363.  

22. Geng, A., Moghiseh, A., Redenbach, C., Schladitz, K., 2021, Comparing optimization methods for deep 
learning in image processing applications, tm - Technisches Messen, 88, pp. 443-453. 

23. Hoffmann, L., Fortmeier, I., Elster, C., 2021, Deep learning for tilted-wave interferometry, tm - 
Technisches Messen, 89, pp. 33-42. 

24. Adhikari, S., Saha, S., 2014, Multiple classifier combination technique for sensor drift compensation using 
ANN & KNN, 2014 IEEE International Advance Computing Conference (IACC), pp. 1184-1189. 

25. Rehman, A.U., Bermak, A., 2019, Heuristic random forests (HRF) for drift compensation in electronic 
nose applications, IEEE Sensors Journal, 19, pp. 1443–1453. 

26. Ma, D., Gao, J., Zhang, Z., Zhao, H., 2021, Gas recognition method based on the deep learning model of 
sensor array response map, Sensors and Actuators B: Chemical, 330, 129349. 

27. Fu, X., Wang, L., 2003, Data dimensionality reduction with application to simplifying RBF network structure and improving classification performance, IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, 33, pp. 399–409.

28. Xue, X., Zhang, M., Browne, W.N., Yao, X., 2016, A Survey on Evolutionary Computation Approaches to Feature Selection, IEEE Transactions on Evolutionary Computation, 20, pp. 606–626.

29. Sánchez-Maroño, N., Alonso-Betanzos, A., Tombilla-Sanromán, M., 2007, Filter methods for feature selection - A comparative study, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4881, pp. 178–187.

30. Borowik, P., Adamowicz, L., Tarakowski, R., Siwek, K., Grzywacz, T., 2020, Odor detection using an e-nose with a reduced sensor array, Sensors (Switzerland), 20, 3542.

31. Vito, S.D., Massera, E., Piga, M., Martinotto, L., Francia, G.D., 2008, On field calibration of an electronic 
nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: 
Chemical, 129, pp. 750-757. 

32. Pashami, S., Lilienthal, A.J., Schaffernicht, E., Trincavelli, M., 2013, TREFEX: Trend estimation and 
change detection in the response of MOX gas sensors, Sensors (Switzerland), 13, pp. 7323-7344.  

33. Zhang, S., Xie, C., Hu, M., Li, H., Bai, Z., Zeng, D., 2008, An entire feature extraction method of metal 
oxide gas sensors, Sensors and Actuators B: Chemical, 132, pp. 81–89. 

34. Fonollosa, J., Rodríguez-Luján, I., Huerta, R., 2015, Chemical gas sensor array dataset, Data in Brief, 3, 
pp. 85–89. 

35. Destro, R., Matakas, L., Komatsu, W., Ama, N.R.N., 2013, Implementation aspects of adaptive window moving average filter applied to PLLs - Comparative study, 2013 Brazilian Power Electronics Conference, COBEP 2013 – Proceedings, pp. 730–736.

36. Zhao, Y., He, X., Pecht, M.G., Zhang, J., Zhou, D., 2020, Detection and detectability of intermittent faults based on moving average T2 control charts with multiple window lengths, Journal of Process Control, 92, pp. 296–309.

37. Burgués, J., Jiménez-Soto, J.M., Marco, S., 2018, Estimation of the limit of detection in semiconductor gas 
sensors through linearized calibration models, Analytica Chimica Acta, 1013, pp. 13–25.