CHEMICAL ENGINEERING TRANSACTIONS VOL. 95, 2022
A publication of The Italian Association of Chemical Engineering
Online at www.cetjournal.it
Guest Editors: Selena Sironi, Laura Capelli
Copyright © 2022, AIDIC Servizi S.r.l.
ISBN 978-88-95608-94-5; ISSN 2283-9216
DOI: 10.3303/CET2295013

Paper Received: 15 April 2022; Revised: 15 June 2022; Accepted: 27 May 2022
Please cite this article as: Cruz C., Aleixandre M., Matatagui D., Horrillo M.C., 2022, An Artificial Olfactory System for Toxic Compounds Classification Using Machine Learning Techniques, Chemical Engineering Transactions, 95, 73-78 DOI:10.3303/CET2295013

An Artificial Olfactory System for Toxic Compounds Classification using Machine Learning Techniques

Carlos Cruz (a), Manuel Aleixandre (b), Daniel Matatagui (a), Mari Carmen Horrillo (a,*)
(a) SENSAVAN, Instituto de Tecnologías Físicas y de la Información (ITEFI), CSIC, 28006 Madrid, Spain
(b) Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
carmen.horrillo.guemes@csic.es

Long-term exposure to nitrogen dioxide is harmful to humans and other living beings. Security applications therefore require sensor arrays that can detect nitrogen dioxide and distinguish it from interfering gases. In this work, a compact and intelligent electronic nose (e-nose) based on a Shear-Horizontal Surface Acoustic Wave (SH-SAW) sensor array is proposed for sensing, classifying, and calibrating toxic chemicals. Different carbon-based nanostructured materials are deposited as sensitive layers, which respond through mass and elastic changes in this type of sensor. The SH-SAW sensors achieve high sensitivity, fast response, and reproducibility for different toxic gases such as nitrogen dioxide, carbon monoxide, ammonia, benzene and acetone. The gas flows were controlled by an automated system consisting of four mass flow controllers to obtain the desired concentrations.
The e-nose performs efficiently with supervised machine learning techniques. Linear Discriminant Analysis (LDA) achieves a discrimination accuracy of 90% on the test dataset and clearly separates NO2 from the interfering toxic compounds. K-Nearest Neighbors (KNN) and Logistic Regression (LR) achieve classification scores of 95% and 79%, respectively. Decision surfaces for the toxic compounds were also computed with different classification algorithms, with good classification results. The prediction methods Partial Least Squares (PLS), Artificial Neural Networks (ANNs) and a cascade of ANNs are evaluated and compared. The ANN cascade results show that this technique is an excellent candidate for accurate prediction and classification of NO2. Therefore, the designed and validated e-nose is a promising on-line analysis tool for environmental applications.

1. Introduction

The performance of a gas sensor depends mainly on the proper choice of sensing materials and on the low noise and high accuracy of the signal acquisition system (Matatagui et al., 2019; Santos et al., 2012). In this work, the Shear-Horizontal Surface Acoustic Wave (SH-SAW) sensors based on carbon sensing materials exhibit excellent sensitivity, response/recovery time, reproducibility, and long-term stability (Jha and Yadava, 2009; de la O-Cuevas et al., 2021). In addition, data processing is an important factor, as the success of the Machine Learning (ML) process relies on it. Data processing aims at extracting robust feature information from the dynamic response of the sensors, which can represent the unique "fingerprint" pattern of a particular gas. To ensure the effectiveness of the subsequent pattern recognition algorithm, ML techniques such as K-nearest neighbors (KNN), Partial Least Squares (PLS) or Artificial Neural Networks (ANN) (Aleixandre et al., 2014; Yaqoob and Younis, 2021; Gutierrez-Osuna, 2002; Covington et al., 2021) have been widely used to achieve highly selective gas sensors. It is therefore of great importance that sensor signal processing (e.g., algorithms) is integrated into electronic noses for realistic applications. In this study, we develop a complete system comprising a carbon-based SH-SAW sensor array, signal processing units and ML algorithms in a smart and compact gas sensor system. The modular architecture provides a self-contained and versatile platform that also incorporates the ML capabilities for gas sensing.

2. Experimental setup

Figure 1 shows the experimental design and the processing capabilities employed in this study. The experimental setup consisted of the e-nose (Figure 2a), the automatic gas line (Figure 2b) and the signal processing system used to evaluate the discrimination and prediction of the gases tested.

Figure 1: Experimental setup

2.1 Electronic nose

The system was designed using 1) a carbon-based sensor array, 2) a data acquisition stage, 3) a signal conditioning stage and 4) data transmission and software application stages (Figure 2a). The sensor array was developed with four carbon-based nanostructured materials: mesoporous carbon (MC), reduced graphene oxide (rGO), graphene oxide (GO) and polydopamine/reduced graphene oxide (PDA/rGO). The sensors and the signal conditioning modules were mounted so that they can be easily accessed for changes and manipulation. The signal conditioning module places each sensor in a feedback loop that consists of two amplifier stages and a directional coupler. The output of the coupler was used to sample the oscillator frequency. A multiplexer selected one of the sensor-oscillator signals as a single output, which was mixed with the reference oscillator signal.
This mixing reduces the operating frequency and compensates for temperature and noise disturbances. In this way, a difference signal is obtained. The filtered and amplified output signals are processed by the analogue-to-digital converter port of the Teensy microcontroller, and the Teensy module is used as a frequency counter (Matatagui et al., 2019).

2.2 Automatic gas line

An automated flow system controls the flowmeters and selects the gases that enter the gas cell and the tested concentrations (Figure 2b). Synthetic air was used as carrier gas. More specifically, the gas control was performed by three Bronkhorst flowmeters, which were controlled and read by two acquisition cards, ADAM-4017 and ADAM-4024. The configuration and reading of the flowmeters were implemented in a LabVIEW acquisition system that also performs the pre-treatment and feature extraction of the gas measurements.

Figure 2: SH-SAW e-nose (a) and implementation of the automatic gas line (b).

2.3 Processing system

The e-nose communicated by cable through UART/FIFO controllers and wirelessly using the XBee protocol. The latter method was also employed for the control of the gas line. The results were processed with ML techniques on a PC using LabVIEW and Matlab software.

3. Measurements

The sensor responses were obtained by exposing the array to different concentrations of toxic gases: ammonia (NH3), benzene (C6H6) and acetone (C3H6O) from 10 to 40 ppm; nitrogen dioxide (NO2) from 0 to 1 ppm; and carbon monoxide (CO) from 1 to 6 ppm. The exposure time to the gases was 2 minutes. The recovery time in air between exposures was 20 minutes. Figure 3 shows the responses of the different sensors to 20 ppm of C3H6O, C6H6 and NH3, 0.2 ppm of NO2 and 2 ppm of CO. Only mixtures of two gas components were measured.
NO2 was mixed with each interfering gas at varying humidity (20 % and 40 % relative humidity), giving more than 150 measurements.

Figure 3: Response and recovery times of the SH-SAW sensor array (GO, rGO, MC and PDA/rGO) for specific concentrations of benzene, acetone, ammonia, carbon monoxide and nitrogen dioxide.

4. Data Analysis

Six supervised ML techniques were implemented: one to discriminate among the gas clusters and five to validate classification and prediction, in order to determine the most efficient method.

4.1 Linear Discriminant Analysis (LDA)

LDA projects the data onto the directions that maximise the separation between gas classes while minimising the variance within each class. It seeks the feature subspace that offers the most discrimination between the classes. Because it reduces dimensionality, LDA also lowers the degree of over-fitting in non-regularised models. Figure 4 shows the results obtained for NO2 and the interfering gases. The discrimination accuracy was 90%.

Figure 4: LDA reached an accuracy of around 90% for NO2 and interfering gases on the given test data and classes.

4.2 K-nearest neighbor (KNN)

KNN is an instance-based ML technique. It takes many labelled points and uses them to label new ones: a new sample is assigned the majority class among its k nearest labelled neighbours in the space of sensor responses. Figure 5 shows the resulting decision surface over a grid of points spanning the space of sensor responses within some bounds. KNN classified the gases with an accuracy of 95%. However, a 3-nearest-neighbour classifier does not clearly separate acetone and benzene. Figure 5 also shows the decision surfaces for Naive Bayes and a classification tree, which are compared with the LDA and KNN classification algorithms.
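The idea of using labelled points to label new ones, with the 3-nearest-neighbour setting mentioned above, can be sketched in a few lines of Python. This is a minimal illustration only: the four-element response vectors (one value per sensor), their numerical values, the Euclidean distance and the function name `knn_predict` are assumptions made for the example, not the measured data or the implementation used in this work.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    # Euclidean distance from the sample to every labelled training point
    # (the distance metric is an assumption; the paper does not state one).
    distances = [
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    ]
    # Keep the k closest points, comparing by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized responses of the four sensors (GO, rGO, MC, PDA/rGO).
train_x = [
    (0.90, 0.80, 0.70, 0.95), (0.85, 0.82, 0.72, 0.90), (0.92, 0.78, 0.69, 0.93),
    (0.20, 0.30, 0.10, 0.25), (0.22, 0.28, 0.12, 0.20), (0.18, 0.33, 0.09, 0.27),
    (0.50, 0.10, 0.40, 0.05), (0.52, 0.12, 0.38, 0.07), (0.48, 0.09, 0.43, 0.04),
]
train_y = ["NO2"] * 3 + ["NH3"] * 3 + ["CO"] * 3

print(knn_predict(train_x, train_y, (0.88, 0.79, 0.71, 0.92)))
print(knn_predict(train_x, train_y, (0.21, 0.31, 0.11, 0.24)))
```

With well-separated clusters such as these invented ones, every test point falls inside one class. Overlapping clusters, as reported for acetone and benzene, would produce mixed votes among the three neighbours.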
Figure 5: Classification algorithms generate a decision-making rule visualized in the form of a decision surface. Three repeated measurements are taken from each of six classes.

4.3 Logistic regression (LR)

LR performs very well with two linearly separable classes. An accuracy of 79% was achieved in determining whether a new sample belongs to the interfering-gas category.

4.4 Partial least squares (PLS)

PLS regression is a quick and efficient method based on the standard mathematical approach to fitting a linear regression. The algorithm returns the relative mean absolute error (RMAE) metric for its evaluation and the response loadings of the gas responses.

4.5 Artificial neural network (ANN)

ANN is a highly capable ML technique for nonlinear and complex tasks with a high degree of accuracy. The ANNs were structured into ensembles consisting of 10 feed-forward networks evaluated with the same training dataset of normalized sensor responses. The output of each ensemble was the robust mean of its networks (the mean after removing the lowest and highest predictions). The ANN ensembles make deviations less likely and the prediction more robust.

4.6 Cascade of ANN

A cascade of ANNs is a useful tool to improve learning, since new information is added to an already-trained network. We built and trained three ANN ensembles (NO2, interfering gases and humidity). Initially, we trained two ensembles for NO2 and the interfering gases. The process was repeated with a third ensemble for NO2 only. Two additional inputs were introduced in this step, corresponding to the results previously obtained for the two gases. Next, we trained a new ensemble with the humidity responses. This ensemble gave the lowest concentration prediction error. RMAE is a valuable precision metric when large errors have serious consequences.
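The robust mean used for the ensemble output (Section 4.5) and the RMAE metric can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not give the normalisation used for the RMAE, so the sketch divides the mean absolute error by the mean absolute observed value. The function names `robust_mean` and `rmae_percent` and all numerical values are hypothetical.

```python
def robust_mean(predictions):
    """Mean of the ensemble outputs after dropping the lowest and the highest."""
    if len(predictions) < 3:
        raise ValueError("need at least three predictions to trim both extremes")
    trimmed = sorted(predictions)[1:-1]
    return sum(trimmed) / len(trimmed)


def rmae_percent(predicted, observed):
    """Relative mean absolute error, in percent.

    Assumption: the mean absolute error is normalised by the mean of the
    absolute observed values; the paper does not state its normalisation.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
    scale = sum(abs(o) for o in observed) / len(observed)
    if scale == 0:
        raise ValueError("observed values are all zero; RMAE is undefined")
    return 100.0 * mae / scale


# Hypothetical outputs (ppm of NO2) of a 10-network ensemble for one sample;
# the two outliers (0.95 and 0.02) are discarded by the robust mean.
ensemble_outputs = [0.21, 0.19, 0.20, 0.22, 0.18, 0.20, 0.95, 0.21, 0.02, 0.19]
print(robust_mean(ensemble_outputs))

# Hypothetical observed and predicted concentrations (ppm).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.22, 0.38, 0.63, 0.76]
print(rmae_percent(predicted, observed))
```

Trimming only one value at each end keeps eight of the ten network outputs, so a single diverging network at either extreme cannot shift the ensemble prediction.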
The RMAE describes the average of the differences between predicted and observed values. Table 1 summarizes this error for the three prediction techniques used. The ANN technique showed a clear improvement over PLS. The ANN cascade showed better accuracy than both PLS and ANNs. Figure 6 shows the results of the ANN cascade. However, the training and processing times increase considerably (179 s) compared with PLS (9 s) and ANN (16 s), because the need to train and evaluate several networks increases the complexity with the number of interfering gases.

Table 1: Prediction error (RMAE) and processing time obtained for the different gases with PLS, ANNs, and the ANN cascade.

Method            RMAE for NO2 (%)   RMAE for interfering gases (%)   RMAE for humidity (%)   Time (s)
PLS               13.12              17.90                            2.01                    9
ANN               9.92               15.15                            0.30                    16
Cascade of ANN    7.62               11.16                            0.04                    188

Figure 6: Prediction of the interfering gases and NO2 using the cascade of ANN ensembles.

We have presented an innovative neural network structure that uses the parts that are easiest to regress to assist the prediction of the other parts. Since the measurement space contains no a priori information about the combination of gases, the improvement must be due to the information added by the prediction at each step. Increasing the number of neurons does not improve the result. In addition, increasing the number of steps in the ANN cascade would increase the complexity of the system and make training more difficult by increasing the processing time required. By contrast, a larger set of measurements would help to reduce the prediction error.

5. Conclusions

In this work, an electronic nose has been presented to classify and discriminate NO2 with respect to interfering gases and humidity. After treatment of the sensor responses, the final results showed that LDA and KNN are excellent candidates, with accuracies of 90% and 95%, respectively.
In addition, three different methods have been used for the prediction of NO2, interfering gases and humidity: PLS, ANN and cascade of ANN. All the ML techniques provided excellent results. For instance, the error is reduced in the ANN implementation compared with the PLS technique. The cascade of ANN ensembles reduces the RMAE further, from 9.92% to 7.62% for NO2 and from 15.15% to 11.16% for the interfering gases, compared with a simple ANN ensemble (Table 1). This is probably because the networks optimize their results more efficiently with only one class at a time, owing to a smaller number of possible combinations. Once gases exhibit a well-differentiated response and their prediction has been calculated, this information is better incorporated into the network as an independent input. In summary, LDA is an excellent method for discriminating NO2, and the cascade of ANN ensembles is a novel method for gas prediction with a portable electronic nose.

Acknowledgments

This work was funded by the Spanish Ministry of Science and Innovation through project RTI2018-095856-B-C22 (AEI/FEDER).

References

Aleixandre M., Matatagui D., Santos J.P., Horrillo M.C., 2014, Cascade of Artificial Neural Network committees for the calibration of small gas commercial sensors for NO2, NH3 and CO, SENSORS, IEEE, 1803-1806, doi: 10.1109/ICSENS.2014.6985376.
Covington A., Marco S., Persaud K.C., Schiffman S.S., Nagle H.T., 2021, Artificial Olfaction in the 21st Century, IEEE Sensors Journal, 21(11), 12969-12990, doi: 10.1109/JSEN.2021.307641.
de la O-Cuevas E., Alvarez-Venicio V., Badillo-Ramírez I., Islas S.R., del Pilar Carreón-Castro M., Saniger J.M., 2021, Graphenic substrates as modifiers of the emission and vibrational responses of interacting molecules: The case of BODIPY dyes, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 246, 119020.
Gutierrez-Osuna R., 2002, Pattern analysis for machine olfaction: A review, IEEE Sensors Journal, 2(3), 189-202.
Jha S.K., Yadava R.D.S., 2009, Preprocessing of SAW Sensor Array Data and Pattern Recognition, IEEE Sensors Journal, 9(10), 1202-1208.
Matatagui D., Bahos F.A., Gràcia I., Horrillo M.C., 2019, Portable low-cost electronic nose based on surface acoustic wave sensors for the detection of BTX vapors in air, Sensors, 19(24), 5406.
Santos J.P., Aleixandre M., Cruz C., 2012, Hand held electronic nose for VOC detection, Chemical Engineering, 30.
Yaqoob U., Younis M.I., 2021, Chemical gas sensors: Recent developments, challenges, and the potential of machine learning - a review, Sensors, 21(8), 2877.