IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 FEATURE EXTRACTION AND SUPERVISED LEARNING FOR VOLATILE ORGANIC COMPOUNDS GAS RECOGNITION NOR SYAHIRA MOHD TOMBEL1, HASAN FIRDAUS MOHD ZAKI2,3* AND HANNA FARIHIN BINTI MOHD FADGLULLAH3 1 Dept. of Science (Computational and Theoretical), Kulliyyah of Science, International Islamic University of Malaysia, Kuantan, Pahang 2 Centre for Unmanned Technologies (CUTe), Kulliyyah of Engineering, International Islamic University of Malaysia, Gombak, Kuala Lumpur 3 Dept. of Mechatronic, Kulliyyah of Engineering, International Islamic University of Malaysia, Gombak, Kuala Lumpur *Corresponding author: hasanzaki@iium.edu.my (Received: 11 April 2023; Accepted: 13 June 2023; Published on-line: 4 July 2023) ABSTRACT: The emergence of advanced technologies, particularly in the field of artificial intelligence (AI), has sparked significant interest in exploring their potential benefits for various industries, including healthcare. In the medical sector, the utilization of sensing systems has proven valuable for diagnosing pulmonary diseases by detecting volatile organic compounds (VOCs) in exhaled breath. However, the identification of the most informative and discriminating features from VOC sensor arrays remains an unresolved challenge, essential for achieving robust VOC class recognition. This research project aims to investigate effective feature extraction techniques that can be employed as discriminative features for machine learning algorithms. A preliminary dataset was used to predict VOC classification through the application of five supervised machine learning algorithms: k-Nearest Neighbors (kNN), Random Forest (RF), Support Vector Machines (SVM), Logistic Regression (LR), and Artificial Neural Networks (ANN). Ten feature extraction methods were proposed based on changes in sensor response as inputs to classify three types of gases in the dataset. The performance of each model was evaluated and compared using k-Fold cross-validation (k=10) and metrics derived from the confusion matrix. The results demonstrate that the RF model achieved the highest mean accuracy and standard deviation, with values of 0.813 ± 0.035, followed closely by kNN with 0.803 ± 0.033. Conversely, LR, SVM (kernel=Polynomial), and ANN exhibited poor performances when applied to the VOC dataset, with accuracies of 0.447 ± 0.035, 0.403 ± 0.041, and 0.419 ± 0.035, respectively. Therefore, this paper provides evidence that classifying VOC gases based on sensor responses is feasible and emphasizes the need for further research to explore sensor array analysis to enhance feature extraction techniques. ABSTRAK: Perkembangan teknologi canggih, khususnya dalam bidang kecerdasan buatan (AI), telah mencetuskan minat yang ketara dalam menerokai manfaatnya untuk pelbagai industri, termasuk bidang kesihatan. Dalam sektor perubatan, penggunaan sistem penderiaan telah terbukti bernilai untuk mendiagnosis penyakit paru-paru dengan mengesan sebatian organik meruap (VOC) dalam nafas yang dihembus manusia. Walau bagaimanapun, pengenalpastian ciri yang paling bermaklumat dan mendiskriminasi daripada penderia VOC kekal sebagai cabaran yang tidak dapat diselesaikan, penting untuk mencapai pengiktirafan kelas VOC yang kukuh. Projek penyelidikan ini bertujuan untuk menyiasat teknik pengekstrakan ciri yang berkesan yang boleh digunakan sebagai ciri diskriminatif untuk 407 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 algoritma pembelajaran mesin. Set data awal digunakan untuk meramalkan klasifikasi VOC melalui aplikasi lima algoritma pembelajaran mesin yang diselia: k-Nearest Neighbors (kNN), Random Forest (RF), Support Vector Machines (SVM), Logistic Regression (LR), dan Artificial Neural Networks (ANN). Sepuluh kaedah pengekstrakan ciri telah dicadangkan berdasarkan perubahan dalam tindak balas penderia sebagai input untuk mengklasifikasikan tiga jenis gas dalam set data. Prestasi setiap model telah dinilai dan dibandingkan menggunakan pengesahan silang k-Fold (k=10) dan metrik yang diperoleh daripada confusion matriks . Keputusan menunjukkan bahawa model RF mencapai ketepatan minima tertinggi dan sisihan piawai, dengan nilai 0.813 ± 0.035, diikuti oleh kNN dengan 0.803 ± 0.033. Sebaliknya, LR, SVM (kernel=Polinomial), dan ANN mempamerkan prestasi yang lemah apabila digunakan pada dataset VOC, dengan ketepatan masing-masing 0.447 ± 0.035, 0.403 ± 0.041 dan 0.419 ± 0.035. Oleh itu, kertas kerja ini memberikan bukti bahawa mengklasifikasikan gas VOC berdasarkan tindak balas penderia adalah boleh dilaksanakan dan menekankan keperluan untuk penyelidikan lanjut untuk meneroka analisis tatasusunan penderia untuk meningkatkan teknik pengekstrakan ciri. KEYWORDS: Supervised machine learning; Volatile Organic Compound; VOC Sensor; Gas classification; feature extraction 1. INTRODUCTION Volatile organic compounds (VOC) have been used as preclinical biomarkers in breath analysis to monitor health and diagnose various pulmonary diseases such as asthma and lung cancer [1] [2] [3][4]. An array of sensors, or electronic nose (e-nose) is known as the alternative for a non-invasive method of detecting volatile organic compounds (VOC). E-nose is a device inspired by the olfactory system of humans or mammals (sense of smell), composed of a collection of an array of gas sensors with a pattern recognition system designed to detect and differentiate a wide variety of gas compounds [5]. The advancement of nanosensor arrays with pattern recognition involving pre-processing, feature extraction and machine learning algorithms makes it a powerful tool for the detection and recognition of gas samples with concentration estimation. Feature extraction is an essential technique used to extract significant information from the sensor response signal [6] [7] to optimize the performance of pattern recognition algorithms for gas classification [6] [8]. However, the detection of VOC using nanosensor technologies still has some constraints in its detection system. The VOC sensor as a sensing unit faced a few limitations such as lack of sensitivity and selectivity [9] [10]. Besides, it is still not clear which type of features from VOC sensor arrays are the most descriptive and discriminative leading to a robust recognition of the VOC classes. Data collection from a gas sensor array can also be cumbersome and time- consuming which poses a nuisance in employing data-hungry machine learning algorithms. Therefore, this paper proposes employing supervised machine learning algorithms to classify the preliminary data of the individual sensors for VOC recognition. The VOC detection was performed on a chemiresistive sensor from various functionalised reduced Graphene Oxide (rGO) as a sensing layer. The targeted VOC gases used are acetone, toluene and isoprene which have been suggested as pulmonary disease-related biomarkers [11] with concentration levels ranging from 1 to 6 ppm. Concretely, we explore 10 feature sets that were extracted from the sensor’s original response curve. Then, we analyse the effect of these features towards VOC classification with five benchmark machine learning algorithms including K-Nearest Neighbours (kNN), Random 408 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM) and Artificial Neural Network (ANN). The recognition models were then put into comparison to determine the one which provides the best evaluation and high accuracy in performing the classification of the targeted VOC gases using k-Fold Cross Validation (k=10) and Confusion Matrix. 2. GAS SENSING MECHANISM Sensing mechanism of the sensor is first studied, to understand the VOC detection on the sensor. The thin film is comprised of rGO which is one of the most promising materials for detecting low VOC concentrations at room temperature [12]. Graphene is a two-dimensional building block made up of a one-atom-thick sheet of a carbon atom. Graphene can work well at room temperature because it has enormously high mobility [13]. Researchers are interested in modifying graphene into reduced Graphene Oxide as a sensing element because of its excellent electrical, high thermal conductivity, and mechanical properties [14] [15]. The functionalisation of rGO with nanoparticles and plasma treatment can improve sensor functionality and selectivity in distinguishing different vapours [11]. Different functionalisation of sensing elements is a good technique to improve the gas sensor's sensitivity and characteristics. In this research, the sensing layer was deposited on Ti/Pt Interdigitated Electrode (IDE). The electrode was used to supply current flow from the power source to the device, which improved the sensing material's catalytic properties towards a specific gas [16]. Furthermore, the VOC sensor employed is a resistive type, which produces a signal based on a change in resistance in response to gas exposure. In general, VOC gas detection on a sensor is caused by the adsorption and desorption processes that occur between analytes and the sensor surface [17]. Oxygen ion species were absorbed on the sensor surface in the presence of air (humidity) and lowered the electron from the conduction band [18]. The electron density is falling off and forming an electron depletion layer and barrier potential on the surface. Electron removal causes an increase in the depletion layer. The related equation for chemisorbed oxygen at temperatures less than 100°C [19] is as follows: O2 (gas)→O2 (adsorption) (1) O2 (gas)+ e - (surface)↔ O2 - (adsorption) (<100℃) (2) When VOC gas was introduced into the chamber, the gas molecules started to react with the absorbed oxygen ions and released electrons back into the conduction band. The predominant carrier in the sensors was modified by the reaction of the VOC gas (oxidizing or reducing agent) with the molecules in the sensing layer, resulting in an increasing or decreasing in the resistance measurement as the output [19]. Reduced Graphene Oxide has been reported to exhibit p-type behaviour [20]. However, the functionalised sensor was shown to be an n-type semiconductor in this VOC test, and the VOC analytes acted as reducing gases [18]. The sensor experienced an electron carrier majority, causing a decrease in depletion width and potential barrier. As a result, the sensor resistance decreased in the presence of VOC gas [21]. 409 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 3. RELATED WORKS 3.1. Feature Extraction Feature extraction is a technique that is used to extract significant information from the sensor response graph [6] to ensure better performance of machine learning algorithms in pattern recognition [8]. The information is deemed relevant when the derived value extracted from the measured data is non-redundant, not correlated with other features and projects the decisive features [22]. Other than that, feature selection is also related to the dimensionality reduction process of transforming high dimensional data into a low dimensional feature [8]. Detection of VOCs using gas sensors commonly used real-time analysis and discrimination of “breath prints” to perform the gas classification process [2]. In 2012, Vergara and his team applied 8 feature extraction from the time-series sensor, which are the change of the maximal resistance change (ΔR), the normalized resistance change (||ΔR||), minimum and maximum exponential moving average(ema) with a value of 𝛼 = 0.001,0.01,0.1 each [10]. On the other hand, many features can be extracted from raw signals and applied in electronic nose applications. Commonly extracted features from gas original response curves such as maximum response, the response of special time, time of special response, area, integral, derivative, difference and second derivative [6]. Table 1 shows a few lists of feature extraction from electronic nose sensor data for wound detection [23]. Table 1: List of Feature Extraction for Wound Detection based Electronic Nose [23] Feature Description Normalization Preprocessing the sensor data for features from the steady-state response, eliminate the effect of a concentration difference on recognition. Integral and derivative methods Integrals may represent the accumulative total of the reaction degree change and derivatives may represent the rate at which the sensor reacts to the odour. The Root Mean Square Error (RMSE) of curve fitting Depends on the type of model and the number of parameters in the model. Fourier transform and wavelet transform Fourier transform decomposes the original response curve into a superposition of the DC component and different harmonic components. 3.2. Supervised Machine Learning There were few studies which implemented the detection of different gases by Supervised learning models such as k-Nearest Neighbour (kNN), Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF) and Logistic Regression. The findings were summarised in Table 2. There are two gaseous flows in the system: for carrier gas and VOC gas. Clean Dry Air (CDA) was used as carrier gas, while isoprene, toluene, and acetone as the targeted VOC gas. The temperature of the gas and temperature chuck in the sensor chamber were controlled using a Cellkraft Humidifier P-10 and a Nextron Temperature controller module (Nextron Microprobe Station, with platier heater and 4 probe needles). Agilent SMU 34410A was used to drive the voltage and input current. A data acquisition (DAQ) system that is used to convert the output/measured signal from the sensor system into the computer is via a user interface software that is programmed using the LabVIEW program, provided by MIMOS. 410 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 Table 2: Summary of Implementation of Supervised Learning Algorithms on Different Types of Gas Detection. Model Description References k-Nearest Neighbour (Knn) kNN is widely used in the classification of mixed gas and for gas discrimination systems. The kNN model is advantageous because it is comprehensible, insensitive to noise, low cost for retaining and good combination with other algorithms. However, this model is sensitive to sample distribution, it has a slow speed for recognition, high spatial complexity, heavy calculation burden and poor interpretability. [24] Support Vector Machines (SVM) SVM of classifiers can cope well with gas sensor drift and perform better than the baseline competing methods on the extensive dataset. However, the SVM model requires a long learning time and poor application for larger data. Choice of kernel function is important as it is the key for feature space in SVM. [10],[24] Artificial Neural Network (ANN) ANN is the frequently used method in predicting and analysing complex gas (Hashoul & Haick, 2019). It has good learning ability, good parallel processing capability and detecting compatibility error. However, this model has poor interpretability for output, long time learning and is easy to overfit. Therefore, weight, activation function and the number of hidden layers are important to develop an ANN algorithm in performing the classification of targeted output. [24],[25] Random Forest (RF) RF model is used in a lot of feature datasets as it can prevent overfitting from a decision tree algorithm. In the Random Forest algorithm, the number of trees affected the accuracy of the model, as each tree has a classification result and the final result is based on the majority decision trees vote [26], [36] Logistic Regression LR is a classification algorithm that calculates linear output and statistical function through the regression output. Logistic regression can perform multiclass classification problems by using one-vs-rest or one-vs-one wrapper models. The algorithm can be applied to a non-linear classification problem with a proper feature selection. LR model can produce high accuracy as it is a good signal to noise ratio. [27] 4. EXPERIMENTAL SETUP As illustrated in Fig 1, the gas sensing system for this study comprises a gas supply system, a sensor chamber, a temperature and humidity controller module, and data collection system [28]. Fig 2 showed the DAQ board of the LabVIEW that contain all the controller variables for the test measurement such as flow of the CDA, flow of the VOC gas, input current, input voltage, temperature inside the chamber, temperature of the sensor's heater, relative humidity, and system ramp rate. Then, the sensor was tested individually with the targeted VOC gas and the sensor’s responses were recorded to study performance of the individual sensor. 411 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 Fig. 1. Schematic Diagram of the Experimental Setup for Gas Detection System. Fig. 2. Manual control of LabVIEW Gas System. 4.1. VOC Sensor The gas sensor used in this study is called a VOC sensor, which is prepared, fabricated and functionalised by the engineering team at MIMOS Bhd. Reduced Graphene Oxide (rGO) as a sensing membrane was deposited on the Platinum-titanium interdigitated electrode (Pt/Ti IDE) on a silicon and silicon dioxide (Si/SiO2) substrate. The rGO was functionalised with nanoparticles such as; gold (Au), silver (Ag) and platinum (Pt) and plasma treatment such as; hydrogen (H2) and Octafluorocyclobutane (C4F8). The sensor was fabricated using a standard semiconductor process using Chemical Vapor Deposition (CVD) by a standard lithography process for the functionalisation with different recipes. The rGO was functionalised with nanoparticles at a different duration of sputtering and Relative Frequency (RF) power, while functionalisation with plasma treatment at a variety of plasma power and temperature. Therefore, there are 21 individual VOC gas sensors used in this study and the details are according to Table 3. Next, the pre-processed signal proceeded with a feature extraction method to extract pertinent information to be input for supervised machine learning at classifying the gas components into targeted gas output. The features were 412 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 decided to extract from the original gas response involving measured resistance in the absence and presence of the VOC gas. Table 3: List of Functionalisation and Recipe of VOC Sensors Sample no. Nanoparticles Power [WRF] Time [sec] Remarks 1 Reference rGO film Bare Rgo 2 Au 30 15 rGO/Au (30W 15s) 3 Au 30 75 rGO/Au (30W 75s) 4 Au 70 15 rGO/Au (70W 15s) 5 Au 70 75 rGO/Au (70W 75s) 6 Pt 30 15 rGO/Pt (30W 15s) 7 Pt 30 75 rGO/Pt (30W 75s) 8 Pt 70 15 rGO/Pt (70W 15s) 9 Pt 70 75 rGO/Pt (70W 75s) 10 Ag 30 15 rGO/Ag (30W 15s) 11 Ag 30 75 rGO/Ag (30W 75s) 12 Ag 70 15 rGO/Ag (70W 15s) 13 Ag 70 75 rGO/Ag (70W 75s) Sample no. Plasma Treatment Power (WRF) Temperature (℃) Remarks 14 H2 - RT rGO/H2 RT ℃ 15 H2 - 200 rGO/H2 200 ℃ 16 H2 - 400 rGO/H2 400 ℃ 17 H2 - 700 rGO/H2 700 ℃ 18 C4F8 150 - rGO/C4F8 150 ℃ 19 C4F8 200 - rGO/C4F8 200 ℃ 20 C4F8 250 - rGO/C4F8 250 ℃ 21 C4F8 300 - rGO/C4F8 300 ℃ 4.2. Data Collection The sensors were tested individually with each of the selected VOC gas. The sensor was placed in a chamber with 30℃ of temperature and presence of 40% relative humidity (RH). The voltage and current input were set at 1V and 1.2A respectively. CDA was maintained at 1 L/min for 5 minutes to stabilize the baseline reading. Then, the VOC gas was purged into the chamber with a gradual increase of concentrations, from 1 to 6 ppm in 12 minutes (2 minutes for each concentration). The sensor responses were analysed from the resistance changes of individual sensors that undergo the VOC test. 5. DATA PRE-PROCESSING AND FEATURE EXTRACTION In this phase, the analytes of the VOC gas were reacting with the sensing element, thus leading to a change in resistance. The sensor response was determined by analysing the measured resistance as a signal output from each sensor. However, the parameter setup was not in optimal condition and the output signal contained unexpected noise from the SMU system. A typical sensor response could not be seen clearly from the graph of resistance versus time. As a result, the signal was pre-processed by applying filter and smoothing methods to denoising the signal and reduce the influence of random variation caused by instrumental conditions and atmospheric effects [29]. The data was filtered using a moving average (MA 413 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 length = 3) and smoothed with Minitab software using a single exponential method with a constant = 0.02 value. The sensor response was determined by using the formula [30][31]: ∆𝑅 = 𝑅𝑔 − 𝑅𝑎 (3) 𝑆 = ∆𝑅 𝑅𝑎 = 𝑅𝑔− 𝑅𝑎 𝑅𝑎 × 100(%) (4) Where, 𝑅𝑎 = resistance in clean dry air, without VOC gas 𝑅𝑔 = resistance with the exposure of VOC gas Next, the pre-processed signal proceeded with a feature extraction method to extract pertinent information as input for supervised machine learning at classifying the gas components into targeted gas output. The features were decided to extract from the original gas response involving measured resistance in the absence and presence of the VOC gas. The 10 selected features as listed in Table 4. Table 4: List of Features and Formula for the 10 Selected Features No. Feature Formula 1 Rgas Resistance at steady-state phase 2 R0 Baseline resistance 3 Sensor Response, S ∆𝑅 = 𝑅𝑔𝑎𝑠 − 𝑅𝑎𝑖𝑟 𝑅𝑎𝑖𝑟 4 Difference, ∆𝑅 ∆𝑅 = 𝑅𝑔𝑎𝑠 − 𝑅𝑎𝑖𝑟 5 Relative difference 𝑅𝑔𝑎𝑠 𝑅𝑎𝑖𝑟 6 Log relative resistance value 𝑙𝑜𝑔( 𝑅𝑔𝑎𝑠 𝑅𝑎𝑖𝑟 ) 7 Normalisation | 𝑅𝑔𝑎𝑠 − 𝑅𝑎𝑖𝑟 𝑅𝑎𝑖𝑟 | 8 G Rgas 1 𝑅𝑔𝑎𝑠 9 G R0 1 𝑅𝑎𝑖𝑟 10 Conductance difference 𝐺𝑔𝑎𝑠 - 𝐺𝑎𝑖𝑟 The VOC dataset comprises ten feature extraction values and is organised into three categories. In summary, there are 918 total samples, with 252, 324, and 342 each for acetone, toluene, and isoprene gas, respectively. 6. SUPERVISED LEARNING FOR VOC GAS CLASSIFICATION Five supervised learning models including k-Nearest Neighbour (kNN), Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were benchmarked for the VOC gas classification. The model was implemented using the Python and Scikit-learn library. Each model's parameter settings are described in Table 5. To avoid bias in the analysis, the dataset was first standardised in the range 0 to 1 to uniform the values with different scales by using the min-max normalisation technique [32]. 414 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 Following that, the dataset was divided into 70% for the training set and 30% for the testing set. The performances of each model are then evaluated from confusion matrix-based measures in terms of accuracy, precision and using k-fold cross-validation technique, where k =10. K- fold Cross Validation is a cross validation technique used to evaluate the performance of a machine learning model by the resampling procedure. The training of the models proceeds using the k-1 parts and validation or testing errors from the remaining part [33]. Table 5: Parameter Setting of the Approached Supervised Machine Learning Model Parameter K-nearest Neighbour The K-value is decided as one, (k=1) and distance between two points is calculated by applying the distance metric formula (2), (mentioned in chapter 2), with p = 2, to manipulate the generalised distance to Euclidean Distance. Random Forest Grid search for the setting parameter, with n-estimator:100. Artificial Neural Network A shallow Neural Network was implemented, with a standard three-layer feed-forward network. For the hidden layer, the size was set up to 50 and used the ReLU activation function. While Softmax activation function for output layer with learning rate 0.001 and 50 epochs. Support Vector Machine (RBF kernel) Radial basis function (RBF) was selected as a kernel function for this SVM model, as defined in equation (6) (in Chapter 2) The kernel function, σ and regularisation, C used GridSearchCV from sci- kit- learn library to perform grid search for parameter setting. Logistic Regression The model used ‘l2’ for regularisation (penalty) and solver ‘lbfgs’. 7. RESULT Figures 3 a) to e) showed the results of the confusion matrices for each proposed supervised learning method. Meanwhile, Table 6 below showed the accuracies from the 10- fold cross validation from each model. The diagonal values in the confusion matrix denoted the accuracy values of the gas classification to the targeted output [34]. Figure 3 shows that the kNN and RF models performed well in classifying each of the targeted VOCs. The kNN model correctly predicted all three gases with greater than 80% accuracy, while the Random Forest model predicted them with greater than 70% accuracy. On the other hand, Logistic Regression, Support Vector Machine (kernel = RBF) and Artificial Neural Network performed poor classification on the 3 VOCs gases. The SVM and ANN models misclassified the gas more to isoprene gas while LR model misclassified the toluene and isoprene gas. It is noticeable in Table 6, the RF model showed the highest mean accuracy with 0.813 ±0.035, followed by the kNN model with 0.803 ±0.033. RF model are known to have advantages in the process of random sampling which can ensure randomness and avoid overfitting. Besides, this model is also robust to noise [5] and it is good at handling missing data and imbalance classes [4]. Whereas, the kNN model has limitations in understanding the relationship between the features and the class (output) thus easily producing the wrong classification for a multiclass problem [4]. Therefore, the highest accuracy achieved by kNN in this study showed that the features were related well to the output class. 415 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 a) b) c) d) e) Fig. 3. Comparison of Normalised Confusion Matrix of a) kNN, b) RF, c) LR, d) SVM (kernel= RBF) and e) ANN. However, the ANN, LR and SVM (Polynomial kernel) models had poor performance compared with the RF and kNN model with accuracies of 0.447 ± 0.035, 0.403 ± 0.041 and 0.419 ± 0.035 respectively. Meanwhile, the SVM model with Grid search parameters showed 416 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 high accuracy by using a Polynomial kernel at degree = 3, compared with other kernels such as Linear and RBF. Table 6: Model Performance based on 10-Fold Cross-Validation Technique. Model 10-Fold Cross-Validation (Mean Accuracy ± Standard Deviation) Random Forest 0.813 ± 0.035 K-Nearest Neighbors 0.803 ± 0.033 Logistic Regression 0.403 ± 0.041 Support Vector Machine (Kernel = Polynomial) 0.419 ± 0.035 Support Vector Machine (Kernel = Linear) 0.401 ± 0.039 Support Vector Machine (Kernel = RBF) 0. 408 ± 0.055 Artificial Neural Network 0.447 ± 0.035 The poor performance from the ANN, LR and SVM (Polynomial kernel) models might be due to their weakness, in which they are very prone to overfitting training data [33] and they required testing with various kernels and model parameters [4]. The ANN model also is a learning-based algorithm and is more complex in architecture. Thus, it has more hyperparameters required to be tuned [33] and it needs enough samples for training. Other than that, the ROC curve and AUC value are another way of visualising the output performances from the computed confusion matrix. Evaluation of the Receiver Operating Characteristics (ROC) and Area Under Curve (AUC) was done to analyse the performance of the classifiers. The highest value of the AUC showed good value prediction of the model to assign a larger probability to a random positive example than a random negative example [35]. The AUC value should be between 0.5 and 1.0. The ROC curves for each classifier are illustrated in Figure 4. As the minimum is 0.541, it can be said that the SVM classifier does not predict our dataset very well and could not differentiate the classes, while the highest value of AUC goes to the KNN classifier which is equal to 0.886 and 0.885 for the RF model. As a result, in this research, the kNN and RF are two models that can deal with the selected features in the VOC dataset as they obtained the highest accuracy for the gas classification. Fig. 4. ROC Curve of the 5 Supervised Learning Model 417 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 8. CONCLUSION The gas sensor data was collected at a preliminary stage and has been used for the machine learning part, which involved pre-processing, feature extraction and classification algorithm. Each sensor was performing well at low operating temperatures and in the presence of humidity. The sensors response on different targeted VOC gas from 1 to 6 ppm were collected. Then, feature extracted were performed on the resistance-based data. Then, feature extracted were performed on the resistance-based data. Ten featured were proposed as inputs to five supervised learning algorithms to accurately recognise and classify the selected VOC gas based on the labelled output. The confusion matrix and 10-Fold Cross Validation were used to evaluate each model's performance. As a result, the RF and kNN models have higher accuracy with 0.813 ± 0.035and 0.803 ± 0.033, compared with LR, SVM and ANN with the accuracy of 0.447 ± 0.035, 0.403 ± 0.041 and 0.419 ± 0.035 respectively. The two highest accuracies achieved by RF and kNN models demonstrated that they distinguished the gas well from the VOC dataset. Despite the gas sensor's shortcomings, such as low sensitivity, selectivity, and noise in the sensor signal output, the findings of this study can be utilised as a guide for selecting the optimum algorithm for dealing with a gas sensor array. The performance of the kNN and RF models is the Proof of Concept that the algorithm can perform gas classification tasks from the simplest feature selected from the steady-state phase. The feature extraction approach, on the other hand, can be discovered more from the raw signal to build a dataset with more significant features and relevant information to improve the algorithm's performance. ACKNOWLEDGEMENT This project is a collaboration between the International Islamic University Malaysia (IIUM) with the Centre of Unmanned Technologies, Kulliyyah of Engineering and Department of R&D, MIMOS Bhd. This research was partially sponsored by the Fundamental Research Grant Scheme (FRGS19159-0768) and MIMOS Berhad (SPG21-015-0015). Great thanks to research team at Department of Research & Development, MIMOS Berhad; Dr. Ismahadi Syono, Firzalaila Syarina Md Yakin and Siti Aishah Mohamad Badaruddin for their guidance and supervision during data collection and for device preparation. REFERENCES [1] Krisher S, Riley A, Mehta K. (2014) Designing breathalyser technology for the developing world: How a single breath can fight the double disease burden. Journal of Medical Engineering and Technology, 38(3), 156–163. doi:10.3109/03091902.2014.890678. [2] Dragonieri S, Pennazza G, Carratu P, and Resta O. (2017) Electronic Nose Technology in Respiratory Diseases. Lung, 195 (2):157–165. doi:10.1007/s00408-017-9987-3. [3] Thriumani R, Zakaria A, Hashim YZH, Jeffree AI, Helmy KM, Kamarudin LM, Omar MI, Shakaff AYM, Adom AH, Persaud KC. (2018) A study on volatile organic compounds emitted by in-vitro lung cancer cultured cells using gas sensor array and SPME-GCMS. BMC Cancer, 18:362 doi:10.1186/s12885-018-4235-7. [4] Chen C, Lin W, Yang H. (2020) Diagnosis of ventilator-associated pneumonia using electronic nose sensor array signals: solutions to improve the application of machine learning in respiratory research. Respiratory Research, 21:45. doi:10.1186/s12931-020-1285-6. [5] Hu W, Wan L, Jian Y, Ren C, Jin K, Su X, Bai X, Haick H, Yao M, Wu W. (2018) Electronic Noses: From Advanced Materials to Sensors Aided with Data Processing. Advanced Materials Technologies, 4(2),1-38. doi: 10.1002/admt.201800488. 418 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 [6] Yan J, Guo X, Duan S, Jia P, Wang L, Peng C, Zhang S. (2015) Electronic Nose Feature Extraction Methods: A Review. Sensors, 15, 27804-27831. doi:10.3390/s151127804 27804–27831. [7] Zulkhairi MA, Mustafah YM, Abidin ZZ, Zaki HFM, Rahman HA. (2019) Car Detection Using Cascade Classifier on Embedded Platform. 7th International Conference on Mechatronics Engineering (ICOM), Putrajaya, Malaysia, pp. 1-3, doi: 10.1109/ICOM47790.2019.8952064. [8] Ansari AQ, Khusro A, Ansari MR. (2016) Performance evaluation of classifier techniques to discriminate odors with an E-Nose. 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control: (E3-C3), INDICON, 2325-9418. doi:10.1109/INDICON.2015.7443838. [9] Yi Z, Li C. (2019) Anti-Drift in Electronic Nose via Dimensionality Reduction: A Discriminative Subspace Projection Projection Approach. IEEE Access, 7, 170087–170095. doi:10.1109/ACCESS.2019.2955712. [10] Vergara A, Vembu S, Ayhan T, Ryan MA, Homer ML, Huerta R. (2012) Chemical gas sensor drift compensation using classifier ensembles. Sensors and Actuators, B: Chemical, 166–167, 320–329. doi: 10.1016/j.snb.2012.01.074. [11] Liu B, Huang Y, Kam KW, Cheung WF, Zhao N, Zheng B. (2019) Functionalized graphene-based chemiresistive electronic nose for discrimination of disease-related volatile organic compounds. Biosensors and Bioelectronics: X, 1, 100016. doi: 10.1016/j.biosx.2019.100016. [12] Gargiulo V, Alfano B, Capua RD, Alfè M, Vorokhta M, Polichetti T, Massera E, Miglietta ML, Schiattarella C, Francia GD. (2018) Graphene-like layers as promising chemiresistive sensing material for detection of alcohols at low concentration. Journal of Applied Physics, 123, 024503. doi:10.1063/1.5000914. [13] Lu G, Ocola LE, Chen J. (2009) Reduced graphene oxide for room-temperature gas sensors. Nanotechnology, 20, 445502, 1-9. doi:10.1088/0957-4484/20/44/445502. [14] Lee K, Yoo YK, Chae MS, Hwang KS, Lee J, Kim H, Hur D, Jeong HL. (2019) Highly selective reduced graphene oxide (rGO) sensor based on a peptide aptamer receptor for detecting explosives. Sci Rep, 9, 10297. Scientific Reports, 9. doi:10.1038/s41598-019-45936-z. [15] Tian W, Liu X, Yu W. (2018) Research progress of gas sensor based on graphene and its derivatives: A review. Applied Sciences (Switzerland), 8(7). doi:10.3390/app8071118. [16] Lee SP. (2017) Electrodes for Semiconductor Gas Sensors. Sensors, 17, 683; doi:10.3390/s17040683. [17] Wang C, Yin L, Zhang L, Xiang D, Gao R. (2010) Metal Oxide Gas Sensors: Sensitivity and Influencing Factors. Sensors, 10, 2088-2106. doi:10.3390/s100302088. [18] Baharuddin AA, Ang BC, Hoong Y, Haseeb ASMA, Wong YC. (2019) Materials Science in Semiconductor Processing Advances in chemiresistive sensors for acetone gas detection. Materials Science in Semiconductor Processing, 103, 104616. doi: 10.1016/j.mssp.2019.104616. [19] Amiri V, Roshan H, Mirzaei A, Neri G, Ayesh AI. (2020) Nanostructured Metal Oxide-Based Acetone Gas Sensors: A Review. Sensors, 20, 3096. doi:10.3390/s20113096. [20] Norizan MN, Zulaikha S, Demon N, Halim NA. (2021) The frontiers of functionalized graphene- based nanocomposites as chemical sensors. Nanotechnology Reviews, 10: 330-369. doi:10.1515/ntrev-2021-0030. [21] James F, Fiorido T, Bendahan M, Aguir K. (2017) Comparison between MOX sensors for low VOCs concentrations with interfering gases ALLSENSORS, pp.39-40. [22] Phillips CO, Syed Y, Parthaláin NM, Zwiggelaar R, Claypole TC, Lewis KE. (2012) Machine learning methods on exhaled volatile organic compounds for distinguishing COPD patients from healthy controls. Journal of Breath Research, 6(3). doi: 10.1088/1752-7155/6/3/036003. [23] Yan J, Tian F, He Q, Shen, Y. (2012) Feature Extraction from Sensor Data for Detection of found Pathogen Based on Electronic Nose. Sensors and Materials, 24(2), 57–73. [24] Feng S, Farha F, Li Q, Wan Y, Xu Y, Zhang T, Ning H. (2019) Review on smart gas sensing technology. Sensors (Switzerland), 19(17), 1–22. doi:10.3390/s19173760. [25] Hashoul D, Haick H. (2019) Sensors for detecting pulmonary diseases from exhaled breath. European Respiratory Review, 28(152). doi:10.1183/16000617.0011-2019. 419 IIUM Engineering Journal, Vol. 24, No. 2, 2023 Mohd Tombel et al. https://doi.org/10.31436/iiumej.v24i2.2832 [26] Xu Y, Zhao X, Chen Y, Yang Z. (2019) Research on a mixed gas classification algorithm based on extreme random trees. Applied Sciences (Switzerland), 9(9). doi:10.3390/app9091728. [27] Thorson J, Collier-Oxandale A, Hannigan M. (2019) Using A Low-Cost Sensor Array and Machine Mixtures and Identify Likely Sources. Sensors, 19, 3723. doi:10.3390/s19173723. [28] Tombel NSM, Badaruddin SAM, Yakin FSM, Zaki HFM, Syono MI (2021) Detection of low PPM of volatile organic compounds using nanomaterial functionalized reduced graphene oxide sensor. AIP Conference Proceedings, 2368 vol. 020004. doi.org/10.1063/5.0057775. [29] Smolinska A. (2014) Current breathomics - A review on data pre-processing techniques and machine learning in metabolomics breath analysis. Journal of Breath Research, 8, 027105: 22. doi:10.1088/1752-7155/8/2/027105. [30] Das S, Jayaraman V. (2014) Progress in Materials Science SnO2: A comprehensive review in structures and gas sensors. Progress in Materials Science, 66, pp. 112-255. doi: 10.1016/j.pmatsci.2014.06.003. [31] Maity A, Raychaudhuri AK, Ghosh B. (2019) High sensitivity NH3 gas sensor with electrical readout made on paper with prevoskite halide as sensor material. Scientific Report, 9, 7777. doi: 10.1038/s41598-019-43961-6. [32] Webb AR (2002) Statistical Pattern Recognition Statistical Pattern Recognition Second Edition. John Wiley & Sons,Ltd, vol. 9. [33] Bastuck M. (2019) Improving the Performance of Gas Sensor Systems with Advanced Data Evaluation, Operation and Calibration Methods. Linköping University Electronic Press, 298. [34] Kong C, Zhao S, Weng X, Liu C, Guan R, Chang Z. (2019) Weighted Summation: Feature Extraction of Farm Pigsty Data for Electronic Nose. IEEE Access, vol. 7, pp. 96732-96742. doi:10.1109/access.2019.2929526. [35] Allwright S. (2022) How to interpret AUC score. Retrieved from https://stephenallwright.com/interpret-auc-score/. [36] Nyssa SSC, XueVZ, Justin TN, V. R. Saran KC, Raia CF, Michael JL, Thomas LW, Mike F, Gregory JS, Philippe B, Christopher JH, Steven JK. (2022) Machine Learning-Based Rapid Detection of Volatile Organic Compounds in a Graphene Electronic Nose. ACS Nano 2022 16 (11), 19567-19583. doi: 10.1021/acsnano.2c10240. 420 https://doi.org/10.1016/j.pmatsci.2014.06.003