Acta Polytechnica 59(4):322–351, 2019, DOI: 10.14311/AP.2019.59.0322
© Czech Technical University in Prague, 2019, available online at https://ojs.cvut.cz/ojs/index.php/ap

A COMPARATIVE STUDY OF DATA-DRIVEN MODELING METHODS FOR SOFT-SENSING IN UNDERGROUND COAL GASIFICATION

Ján Kačur∗, Milan Durdán, Marek Laciak, Patrik Flegner
Technical University of Košice, Faculty BERG, Institute of Control and Informatization of Production Processes, Němcovej 3, 040 01 Košice, Slovak Republic
∗ corresponding author: jan.kacur@tuke.sk

Abstract. Underground coal gasification (UCG) is a technological process that converts solid coal into gas in the underground, using injected gasification agents. In the UCG process, many process variables can be measured with common measuring devices, but there are variables that cannot be measured so easily, e.g., the temperature deep underground. It is also necessary to know the future impact of different control variables on the syngas calorific value in order to support a predictive control. This paper examines the possibility of utilizing Neural Networks, Multivariate Adaptive Regression Splines, and Support Vector Regression in order to estimate the UCG process data, i.e., the syngas calorific value and the underground temperature. It was found that, during the training with the UCG data, the SVR with a Gaussian kernel achieved the best results, but, during the prediction, the best result was obtained by the piecewise-cubic type of the MARS model. The analysis was performed on data obtained during an experimental UCG with an ex-situ reactor.

Keywords: Underground coal gasification, syngas calorific value, underground temperature, time series prediction, machine learning, soft-sensing.

1. Introduction
1.1. Understanding UCG Technology
Underground coal gasification (UCG) represents an in-situ controlled combustion of coal in which valuable gases (i.e., syngas) are produced. The UCG represents an alternative to traditional coal mining methods. The UCG makes it possible to exploit coal from deep coal seams, seams affected by tectonic disturbances, seams of a low grade, or seams that have a thin stratum profile. Various coal types can be gasified, e.g., lignite or bituminous coal. The UCG offers a low surface damage, a low solid waste discharge, and lower emissions of SO2 and NOx to the air than the traditional coal mining. For an industrial gasification, at least two boreholes should be drilled (i.e., inlet and outlet). The inlet borehole serves as a supply well for the gasification agents (i.e., air, oxygen, and steam), and the outlet borehole as the exhaust of the produced syngas. The inlet and outlet boreholes are usually linked by various methods in order to create a gasification channel [1].

The main chemical processes that occur during the UCG are drying, pyrolysis, combustion, and gasification of solid hydrocarbons. For an improvement of the UCG, it must be ensured that the combustion reactions produce sufficient energy for the heating of the reactants. It is also necessary to overcome the heat losses from the georeactor and to support the rate of the endothermic gasification reactions [2]. The UCG is performed as an autothermic process where the heat in the coalbed is generated by an injection of oxygen from the injection well and by means of combustion reactions with carbon. The UCG essentially represents the acquisition of a spatially and thermally decomposed reaction zone in the coalbed, which overlaps the regions of coal oxidation, coal reduction, and coal pyrolysis.
The incoming air causes the coal to burn; this exothermic process releases heat and consumes oxygen. When the coal is heated and CO is produced, the Boudouard reaction (i.e., CO2 + C ⇒ 2CO) is one of the most important chemical reactions. The raw gas from the UCG consists predominantly of H2, CO, CO2, CH4, higher hydrocarbons, tar, impurities, and small quantities of SOx, NOx, and H2S [3]. In terms of the calorific value, gases such as CO, H2, and CH4 are valuable, but the higher hydrocarbons also contribute to the calorific value. The syngas can be used for generating electricity or to produce synthetic natural gas or various chemical products.

1.2. Measurement and Monitoring in UCG
The efficiency of the coal-to-gas transformation depends on the UCG monitoring and control and on the various coal seam parameters. The main reasons for the UCG monitoring are operating the technology more efficiently, increasing the quality of the produced gas, reducing costs, and meeting regulatory requirements. Monitoring also informs about the effects of control decisions, injection rates, syngas composition, temperatures, pressures, cavity size, fractures, and when to stop the gasification.

In the UCG, various process variables can be monitored. These variables can be used for the data-driven modeling of the process behaviour.

Figure 1. Scheme of measurement and control in UCG.

In terms of the process control, it is necessary to monitor the volume flows and pressures (i.e., overpressure) of the injected oxidizers (i.e., air, oxygen, water vapor). On the outlet, the volume flow of the produced syngas and the regulated underpressure can be monitored. Of course, it is necessary to monitor the concentrations of the syngas components that affect the calorific value (e.g., CO, CO2, CH4, and H2). The volume flow, pressure, and composition of the injected gasification agents can substantially affect the composition of the produced syngas [2].

The measurement of the temperature inside the oxidizing and reducing zones is the most problematic. There are two methods of an indirect measurement of the underground temperature: from the current syngas composition [4] or based on the rules of heat transfer [5, 6]. Recently, methods of monitoring the underground temperature by measuring carbon isotopes and by measuring emissions of radon to the surface have appeared [7]. Figure 1 presents the basic scheme of the process variables measurement.

1.3. Modeling and Prediction in UCG
In the last years, an increased demand for an online and accurate measurement of some process variables that cannot be measured by conventional methods has occurred. This concerns the measurement of process variables in an aggressive (e.g., high-temperature) or physically inaccessible (e.g., underground) environment. Similarly, in the UCG, modelling and prediction methods need to be applied in order to determine the process parameters deep underground. These variables are decisive in increasing the efficiency and quality of the production. For this reason, different predictors and models, which can calculate the desired process variables based on other observations, are developed and applied. These models often serve as support systems for the control of the technological process.
Predictive modeling usually uses statistics to predict the future behaviour of the process and is often associated with machine learning. The most popular are the methods of a regression analysis, where the output is a regression model. Almost every regression model can serve for the prediction. Some early regression analyses of the UCG were performed in [8].

The time series prediction is a challenging research area with broad application prospects. Soft-sensing methods for the data estimation and prediction are widely used in the industry. Various approaches to modelling and data prediction have been explored worldwide. Unfortunately, there is only scarce evidence of UCG models oriented towards process control and soft-sensing.

Soft sensors based on data-driven predictive modelling are very useful in the industry, especially in operations where important process variables cannot be measured directly by a conventional hardware. Soft sensors use various models that enable a real-time estimation of process variables without a hardware sensor. They can provide less expensive and quicker process data than slow and costly hardware devices. However, the soft sensors can also run in parallel with the hardware measurement devices [9].

Well-known software algorithms that can be seen as soft sensors include Kalman filters. More recent implementations of soft sensors in the UCG use Neural Networks (NN) or Fuzzy Computing. Unfortunately, there is only scarce evidence of using machine learning methods for a prediction of the underground temperature, syngas calorific value, or syngas composition in the UCG.

For example, Ji and Shi [10] have used a hybrid radial basis function (RBF) NN as a learning scheme for the temperature prediction of a Texaco gasifier. In order to increase the performance of the NN, the number of hidden neurons was determined by a fuzzy C-means algorithm and a particle swarm optimization algorithm. Recently, Uppal et al. [11–13] have proposed a control-oriented, one-dimensional packed-bed model of the UCG for an estimation of the syngas composition. This model is coupled with a sliding mode controller to maintain the desired syngas calorific value. Learning schemes for the coal gasification to support the process control can also be found in [14]. A Multiple Neural Network (MNN) for the syngas composition prediction, combined with a dynamic principal component analysis, was proposed in [15]. Other researchers, e.g., Guo et al. [16], have modeled the coal gasification with a hybrid NN. A model of a coal gasification was developed, combining a first-principles model with an NN parameter estimator. The hybrid NN was trained with experimental data for two coals and gave a good performance in the process modeling.

Other effective methods have also been applied to the gasification. Liu et al. [17] have proposed a data-driven modeling for fixed-bed intermittent gasification processes inside UGI gasifiers, using an enhanced lazy learning combined with a relevance vector machine. The authors have used the Bayesian learning framework for the modeling of the gasifier's temperature. The effectiveness of the enhanced lazy learning approach combined with the relevance vector machine for the modelling of the UGI gasification processes has been verified by a series of experiments based on the data collected from practical fields.
Similarly, for the same problem of the data-driven modeling of the UGI gasification process, a variable structure of a genetic BP NN was used in [17]. UGI refers to a gasification process named after the UGI Company. The UGI gasifier is an atmospheric fixed-bed, solid-state slag coal gasification equipment. The prediction of the syngas composition based on a thermodynamic model can be found in [18, 19]. In the past, the application of a one-dimensional, time-dependent numerical computational model of the UCG in a packed bed has also been investigated, with a verification on laboratory measurements [20]. The model, based on nonlinear partial differential equations, was capable of estimating the syngas composition and temperature distribution. A novel dynamic soft-sensing method based on an impulse response template for the Shell coal gasification process was developed in [21]. The proposed model can predict the syngas composition during the coal gasification.

The application and comparison of the efficiency of various learning methods, i.e., NNs, Multivariate Adaptive Regression Splines (MARS), or Support Vector Machines (SVMs), in the UCG data prediction has not yet been the subject of an extensive study, but similar applications in steel-making processes and biomass gasification have been reported (e.g., [22, 23]).

The purpose of the UCG monitoring is to provide a better understanding of how the syngas is produced. For this reason, it is necessary to know what temperature is reached in the underground oxidizing zone. This work examines potential learning methods that can be implemented in the proposal of a soft sensor for the data prediction in the UCG. An underground geo-reactor differs from other industrial plants because the coal seam was created by nature, and it is not possible to see what is in the underground while the UCG is in progress. For the UCG, new research and technologies aim to make the measurements of process variables faster and non-destructive, which would allow having a smart, non-intrusive quality sensor at hand. In this paper, a data set from an experimental UCG was used in order to train the data-driven models.

In the data-driven (i.e., black-box) modelling, input and output data are used in order to create a statistical model. In order to find a prediction apparatus for the UCG data prediction, the machine learning approach has been examined. One of the interesting advantages of the machine learning is that a system, randomly initialized and trained on some data sets, will eventually learn good feature representations for a given task.

In the following sections, three learning methods, Back-Propagation NN (BPNN), MARS, and Support Vector Regression (SVR), are examined in order to support soft-sensing in the UCG. The predictive methods are evaluated using statistical approaches and by calculating a performance index. The methods were applied to the experimental data obtained from the experimental trial of the UCG. The results of the three methods are compared to each other to determine which method is the most suitable for the UCG.

2. Analysis of Selected Modeling Methods
2.1. Multilayer Feed-Forward Neural Networks
The inherent non-linear structure of NNs is well suited for solving many real-world problems. In recent years, several models of NNs have been designed and optimized to solve specific problems.
NN models have an excellent ability to learn from experience and are also suitable non-parametric methods that do not require many limiting factors. Multilayer feed-forward neural networks are most commonly used as a universal means for classification and prediction. They consist of sensoric units, so-called input nodes, that form an input layer, one or more hidden layers with computing nodes, and an output layer, also with computing nodes. The signal passes through the network forwards across the individual layers. In a multilayer feed-forward NN, all neurons of the previous layer are linked to each neuron of the following layer. However, there are no interconnections between the neurons at the level of the same layer, nor direct interconnections of the input layer neurons with the neurons that are two or more layers further.

In this paper, the back-propagation algorithm was used for the NN modeling of the UCG. This simple gradient algorithm was proposed by [24, 25]. There are several approaches to explaining the principle of NNs and the back-propagation method, e.g., using the projection pursuit regression (PPR) [26]. In this paper, a graph-oriented approach with an extensive description that can be found in [27] has been used. The input and output scheme considered for the NN for the UCG data prediction is shown in Section 4.1 (see Figure 7).

Formally, the NN is defined as an oriented graph G = (V, E), where V = {v_1, v_2, ..., v_N} is the set of vertices and E = {e_1, e_2, ..., e_M} is the set of edges, i.e., a non-empty vertex set and edge set of the graph G, containing N nodes (neurons) and M connections. The set V of neurons is divided into disjunctive subsets V = V_I ∪ V_H ∪ V_O, where V_I contains N_I input neurons, which are adjacent only to outgoing edges; V_H contains N_H hidden neurons, which are adjacent to outgoing edges as well as to incoming ones; and V_O contains N_O output neurons, which are adjacent only to incoming edges. For an acyclic NN, the neurons can be arranged into layers, where L_1 = V_I is the input layer (i.e., it contains only input neurons), L_2, L_3, ..., L_{t−1} are the hidden layers, and L_t is the output layer. The NN determined by the acyclic graph is usually chosen so that the neurons from two adjacent layers are joined together by all possible connections.

Neurons and connections are rated by real numbers. Each neuron v_i is rated by a threshold ϑ_i and an activity x_i. Similarly, each connection (v_j, v_i) is rated by a weighting coefficient (or simply, a weight) w_ij. The activities of the hidden and output neurons are determined by the following equations [27]:

x_i = t(ξ_i)   (1)

ξ_i = Σ_{j ∈ Γ_i^{−1}} w_ij x_j + ϑ_i   (2)

where the summation runs through the neurons that are the predecessors of the neuron v_i, and the variable ξ_i is called the potential of the neuron v_i. For the oriented graph G, we use the map Γ that assigns to each vertex v ∈ V a subset Γ(v) ⊂ V containing those neurons that are the endpoints of the connections going out from the vertex v [28]. The neurons of the subset Γ(v) are called the descendants of the vertex v in the graph G. The "inverse" map Γ^{−1} assigns to each vertex v ∈ V the subset Γ^{−1}(v) ⊂ V composed of the predecessors of the vertex v in the graph G. The neuronal activities form a vector x = (x_1, x_2, ..., x_N).
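As an illustrative aside (Python, not part of the original paper), equations (1) and (2) translate directly into code; a logistic activation t(ξ) is assumed here, and all names are illustrative:

```python
import numpy as np

def sigmoid(xi):
    """Logistic activation t(xi) = 1 / (1 + exp(-xi)), assumed here."""
    return 1.0 / (1.0 + np.exp(-xi))

def neuron_activity(x_prev, w_i, theta_i):
    """Activity of one neuron v_i per equations (1)-(2):
    potential xi_i = sum_j w_ij * x_j + theta_i, activity x_i = t(xi_i)."""
    xi = np.dot(w_i, x_prev) + theta_i   # potential, equation (2)
    return sigmoid(xi)                   # activity, equation (1)

def forward_pass(x_input, layers):
    """Layer-by-layer evaluation of an acyclic feed-forward NN.
    layers: list of (W, theta) pairs, one per non-input layer."""
    x = x_input
    for W, theta in layers:
        x = sigmoid(W @ x + theta)
    return x
```

The layer-by-layer loop mirrors the recursive evaluation described below: the activities of a layer depend only on the already computed activities of the lower layers.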
This vector can be formally decomposed into three subsets containing the input, hidden, and output activities:

x = x_I ⊕ x_H ⊕ x_O   (3)

The hidden activities are not explicitly mentioned; they only play the role of intermediate results. In general, to calculate the activities of the layer L_i (where i > 1), it is only necessary to know the activities of the lower layers L_1, L_2, ..., L_{i−1}. In this recursive manner, the activities of all neurons can be gradually calculated; the activities of the output neurons are calculated last. For this reason, the NNs represented by an acyclic graph are called "feed-forward NNs".

The adaptation of the NN is based on searching for such threshold and weighting coefficients that, for a given pair of an input vector of activities x_I and a desired output vector of activities x̂_O (i.e., x_I/x̂_O) and the calculated output vector x_O, minimize the difference between the output activities x_O and x̂_O. The vector x̂_O represents the desired measured (i.e., experimental) data. The aim of the adaptation process is to find such thresholds and weighting coefficients that minimize the objective function E. For more pairs of input and output vectors

x_I^(1)/x̂_O^(1), x_I^(2)/x̂_O^(2), ..., x_I^(r)/x̂_O^(r),   (4)

which represent the training set, the objective function has the following form:

E = Σ_{i=1}^{r} E_i = Σ_{i=1}^{r} (1/2)(x_O^(i) − x̂_O^(i))²   (5)

where x_O^(i) is the output vector of the NN as a response to the input vector x_I^(i), and x̂_O^(i) is the desired output vector assigned to the input x_I^(i).

This minimization of the non-linear objective function can be performed by many optimization methods known in numerical mathematics. The most effective are the so-called gradient methods, based on the use of a gradient of the objective function for the iterative construction of an optimal solution. When the objective function contains more than one pair of input-output vectors x_I/x̂_O (see equation (5)), the overall gradient of the objective function is simply determined as the sum of the gradients over all pairs x_I/x̂_O of the training set (4):

grad E = Σ_{i=1}^{r} grad E^(i)   (6)

where the objective function E^(i) is defined for the i-th pair x_I/x̂_O of the training set. Formally, the adapted NN is described by the coefficients determined as

(w, ϑ) = argmin_{(w,ϑ)} E(w, ϑ)   (7)

The settings of the applied BPNN and the evaluation of its performance on the UCG data prediction are discussed in Section 4.1.

2.2. Multivariate Adaptive Regression Splines
Multivariate Adaptive Regression Splines (MARS) is a regression method that was developed by [29]. Many works that discuss the MARS method have been published [26, 30–34]. It is a non-parametric regression technique that can be seen as an extension of linear models. This technique automatically models the non-linearities and interactions between variables. It is also more flexible than linear models, is suitable for processing large data series, and can serve for a quick prediction of time series. MARS is similar to recursive partitioning, where the input data are divided into discontinuous regions of varying size. A local model is then created for each region. The size of each region is set by MARS as required: the regions are smaller when the relationship between the input and output is more complex. MARS, like the recursive partitioning technique, performs an automatic selection of variables, so the model includes the important (useful) variables and excludes the non-essential ones (as opposed to NNs).
The MARS model is adapted based on the input training data, and a cross-validation is used to validate the resulting model. The resulting model may not only be stored in a PC but is also portable, and it is easy to see the impact of each predictor (the model is easier to understand by humans). In order to create the MARS model, the training data vectors, i.e., the inputs (observations) and outputs (targets), are needed. The training data are split into several splines on an equivalent interval basis [29].

The data are, in each spline, split into many subgroups, and several knots are created that can be placed between different input variables or different intervals in the same input variable to separate the subgroups [31]. In MARS, the regression function, called a basis function (BF), is approximated by smoothing splines for a general representation of the data in each subgroup. Between any two knots, the model can characterize the data either globally or by using a linear regression. The BF is unique between any two knots and is shifted to another BF at each knot [29, 35]. The two BFs in two adjacent domains of data intersect at the knot to make the model outputs continuous. MARS creates a curved regression line to fit the data from subgroup to subgroup and from one spline to another. To avoid over-fitting and over-regressing, the shortest distance between two neighboring knots is predetermined to prevent having too few data in a subgroup.

In the MARS method, the goal is to find the dependency of a variable y_i on one or more independent variables x_i. The following regression sample is considered:

D = {x_i, y_i}_{i=1}^{N} = {x_{1i}, ..., x_{ni}, y_i}_{i=1}^{N}   (8)

where x_i ∈ R^p is the i-th vector of the independent variables, y_i (i = 1, ..., N) is the dependent variable, N is the number of observations, and n is the number of components in x_i. The relationship between y_i and x_i (i = 1, ..., N) can be represented as:

y_i = f(x_i^1, x_i^2, ..., x_i^p) + ε = f(x_i) + ε,   (9)

where f is an unknown function and ε is an error (ε ∼ N(0, σ²)). The single-valued deterministic function f captures the joint predictive relationship of y_i on (x_i^1, x_i^2, ..., x_i^p). The additive stochastic component ε, whose expected value is zero, usually reflects the dependence of y_i on values other than (x_i^1, x_i^2, ..., x_i^p), which are neither controlled nor observed.

In the one-dimensional case, splines are expressed in terms of the piecewise-linear basis functions (x − t)_+ and (t − x)_+ with the knot at t. The "+" means the positive part. These functions are truncated linear functions, for x ∈ R:

(x − t)_+ = x − t, if x > t; 0, otherwise
(t − x)_+ = t − x, if x < t; 0, otherwise   (10)

Each function (i.e., (x − t)_+ and (t − x)_+) is piecewise linear, with a knot at the value t. They are called linear splines, and the two functions are named a reflected pair.

In the multidimensional case, the idea is to form reflected pairs for each input component x^j of the vector x = (x^1, ..., x^j, ..., x^p)^T, with knots at each observed value x_i^j of that input (i = 1, 2, ..., N; j = 1, 2, ..., p). Thus, the set of constructed basis functions can be represented in the form:

C = {(x^j − t)_+, (t − x^j)_+ | t ∈ {x_1^j, x_2^j, ..., x_N^j}, j ∈ {1, 2, ..., p}}   (11)

If all input data are different, then, in the set of 2Np basis functions, each of them depends on only one variable x^j. For example, B(x) = (x^j − t)_+ is regarded as a function over the entire input space R^p.
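As a small illustration (not from the paper), the reflected pair (10) and the candidate set C of (11) can be written in Python; the function names are illustrative:

```python
import numpy as np

def hinge_pair(x, t):
    """Reflected pair of truncated linear basis functions with knot t,
    i.e., (x - t)_+ and (t - x)_+ from equation (10)."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

def candidate_set(X):
    """Candidate set C from equation (11): one reflected pair per input
    component x^j and per observed value used as a knot.
    X: (N, p) array of observations; returns (j, t, sign) triples."""
    N, p = X.shape
    return [(j, t, s)
            for j in range(p)
            for t in np.unique(X[:, j])
            for s in (+1, -1)]
```

With distinct data, candidate_set enumerates the 2Np candidates from which the forward phase of MARS picks its basis functions.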
The basis functions used for the approximation are as follows:

B_m(x) = Π_{k=1}^{K_m} [s_{k,m} · (x_{v(k,m)} − t_{km})]_+   (12)

where K_m is the total number of truncated linear functions in the m-th basis function (i.e., it is the number of "splits" that gave rise to B_m), x_{v(k,m)} is the component of the vector x related to the k-th truncated linear function in the m-th basis function, t_{km} is the corresponding knot, and s_{k,m} ∈ {±1}. K_m is the user-defined degree (order) of the interaction term, and s_{k,m} represents the direction of the univariate term, which can be positive or negative.

The model-building strategy is like a forward stepwise linear regression, but instead of using the original inputs, it is allowed to use functions from the set C and their products. Therefore, the MARS model can be expressed by the following equation [26]:

y = f̂(x) + ε = c_0 + Σ_{m=1}^{M} c_m B_m(x) + ε   (13)

where y is the output variable, x is the vector of input variables, M is the number of basis functions in the model (i.e., the number of spline functions), c_0 is the coefficient of the constant basis function B_0, and the sum runs over the basis functions B_m produced by the algorithm that implements the stepwise forward part of the MARS strategy by incorporating a modification of recursive partitioning. The coefficients c_m are estimated by minimizing the residual sum of squares (i.e., by a standard linear regression). B_m(x) is the m-th function in C, or a product of two or more such functions.

The most important thing in this model is the choice of the basis functions. In the beginning, the model contains a single constant function B_0(x) = 1, and all functions from the set C are possible candidates for an inclusion in the model. As in the linear regression, given the functions B_m, the coefficients c_m can be found by the method of least squares.

Another subroutine of MARS performs the backward deletion strategy, wherein each iteration deletes one unnecessary (i.e., redundant) basis function. The inner loop of the algorithm selects one function to be deleted: a function whose removal either improves the fit the most or degrades it the least. However, the constant basis function B_0(x) = 1 is never removed. The settings of MARS, its result forms, and the evaluation of its performance are discussed in Section 4.2.

2.3. Support Vector Regression
Back-propagation NNs are capable of representing general non-linear functions, but their disadvantage is an often very difficult training, because, practically, there is always a risk of getting stuck in a local minimum of the error function; in addition, the learning is highly complicated by the search for a high number of weights in the multidimensional space. An alternative and relatively new approach are the so-called Support Vector Machines (SVMs). The SVMs are used for time series prediction and classification tasks. These methods represent the field of the so-called kernel machines and exploit the benefits provided by effective algorithms for finding a linear boundary while being able to represent highly complex non-linear functions. Kernel function methods try to find an optimal linear separator. The optimal linear separator in an SVM algorithm is searched for using the quadratic programming method. In Support Vector Regression (SVR), the data x ∈ X are mapped into a high-dimensional feature space F via a nonlinear mapping Φ, and a linear regression is done in this space [36, 37].
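As a brief, hedged preview of how such a regression is used in practice, the following sketch fits scikit-learn's ε-insensitive SVR with a Gaussian kernel to synthetic stand-in data; the paper's measured UCG series are not reproduced here, and all parameter values are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in data: 3 operating variables -> one target variable.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))          # e.g., air flow, O2 flow, outlet pressure
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.standard_normal(200)

# epsilon-insensitive SVR with a Gaussian (RBF) kernel, cf. Table 1 below.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=1.0)
model.fit(X[:180], y[:180])             # train on the first 90 % (chronological)
y_hat = model.predict(X[180:])          # predict the held-out 10 % tail
```

The formal treatment of the mapping Φ, the kernel trick, and the ε-insensitive loss follows.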
The input data (i.e., observations) are represented by a vector x = (x_1, x_2, ..., x_l), where l denotes the size of the sample. Considering a regression with one target variable y, the observations of the examined process can be written as a sequence of pairs (x_1, y_1), ..., (x_i, y_i), ..., (x_l, y_l), x_i ∈ R^n, y_i ∈ R. The vector x_i represents one pattern of the input observations, x_i = (x_{i1}, x_{i2}, ..., x_{in}). In the case of observing the process variables during the UCG, this vector may contain the measured data from the database. Thus, a linear regression in a high-dimensional (feature) space corresponds to a non-linear regression in the low-dimensional input space R^n. The whole problem of the SVR can be rewritten in terms of dot products in the low-dimensional input space [38]:

f(x) = Σ_{i=1}^{l} (α_i − α_i*)(Φ(x_i) · Φ(x)) + b = Σ_{i=1}^{l} (α_i − α_i*) k(x_i, x) + b   (14)

Given two points x_i, x_j ∈ X, the function that returns the inner product between their images in the space F is known as the kernel function. In equation (14), a kernel function k(x_i, x_j) = (Φ(x_i) · Φ(x_j)) is introduced.

Kernel type | Kernel function
Gaussian (RBF) kernel | k(x_i, x_j) = exp(−γ ||x_i − x_j||²)
Linear kernel | k(x_i, x_j) = x_i^T x_j
Polynomial kernel | k(x_i, x_j) = (γ(x_i^T x_j + 1))^d
Sigmoid kernel | k(x_i, x_j) = tanh(γ x_i^T x_j + d)

Table 1. Overview of common kernels used by SVR (γ is a kernel parameter controlling the sensitivity of the kernel function, and d is an integer).

The parameters α_i, α_i* are the solutions of the quadratic programming problem [37]. These parameters have an intuitive interpretation as forces pushing and pulling the estimate f(x_i) towards the measurements y_i. The parameter b is a threshold. The common kernels are summarized in Table 1. In this paper, the ε-SVM regression with a linear epsilon-insensitive loss has been used.

For this special cost function, the Lagrange multipliers α_i, α_i* are often sparse, i.e., they result in non-zero values after the optimization only if they are on or outside the boundary, which means that they fulfill the Karush-Kuhn-Tucker (KKT) conditions. The ε-insensitive cost function is given by

C(f(x) − y) = |f(x) − y| − ε, for |f(x) − y| ≥ ε; 0, otherwise   (15)

In the ε-SVM regression, the set of training data includes the predictor variables and the observed response values. The goal is to find a function f(x) that deviates from y by a value no higher than ε for each training point x and, at the same time, is as flat as possible.

In SVR, the kernel matrix K = (k(x_i, x_j))_{i,j=1}^{l} (x_i, x_j ∈ X) is introduced. It is a symmetric positive definite matrix of the inner products between all pairs of points {x_i}_{i=1}^{l}. Each element represents the inner product of the predictors transformed by Φ. However, it is not necessary to know Φ, because the kernel function can generate the kernel matrix directly. Using this approach, the non-linear SVR finds the optimal function f(x) in the transformed predictor space. The prediction of new values is based on a function that depends only on the support vectors:

f(x) = Σ_{i=1}^{l} (α_i − α_i*) K(x_i, x) + b   (16)

where α and α* are the non-negative Lagrange multipliers for each observation x. The threshold b can be determined from the Lagrange multipliers. The Lagrange coefficients can be found by a minimization of the following function [39]:

L(α) = (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) + ε Σ_{i=1}^{l} (α_i + α_i*) − Σ_{i=1}^{l} y_i (α_i* − α_i)   (17)
subject to the constraints

Σ_{i=1}^{l} (α_i − α_i*) = 0; ∀i: 0 ≤ α_i ≤ C; ∀i: 0 ≤ α_i* ≤ C   (18)

The KKT complementarity conditions are

∀i: α_i (ε + ξ_i − y_i + f(x_i)) = 0
∀i: α_i* (ε + ξ_i* + y_i − f(x_i)) = 0
∀i: ξ_i (C − α_i) = 0
∀i: ξ_i* (C − α_i*) = 0   (19)

where the slack variables ξ and ξ* for each point ensure that the regression errors are up to the values of ξ and ξ* and meet the desired conditions. The KKT complementarity conditions are the optimization constraints required to obtain the optimum. These conditions indicate that all observations strictly inside the epsilon tube have the Lagrange multipliers α_i = 0 and α_i* = 0. Observations with non-zero Lagrange multipliers are called support vectors. The constant C is the box constraint, a positive numeric value that controls the penalty imposed on observations that lie outside the epsilon margin (ε) and helps to prevent overfitting, i.e., it acts as a regularization [40].

The minimization problem can be solved by common quadratic programming techniques, e.g., the chunking and working set method, the Sequential Minimal Optimization (SMO), or the Iterative Single Data Algorithm (ISDA). For the modeling of the UCG process, the epsilon-insensitive SVM (ε-SVM) regression has been used. To solve the optimization problem, the SMO algorithm has been used.

3. Experimental UCG in Ex-Situ Reactor
For the purpose of verifying the UCG, a laboratory equipment has been created. The base is an experimental coal gasifier, i.e., an ex-situ reactor or so-called syngas generator (see Figure 2), and a set of devices for the measurement and control. The ex-situ reactor was constructed so that the bedding of coal with the overburden and under-burden layers can simulate a real coal seam. Several experiments, as trials of a real UCG, were performed with the ex-situ reactor. This laboratory gasification equipment was well described in [41, 42]. Similar trials of the UCG on a laboratory ex-situ reactor can be found in [43, 44]. Various gasification agents (i.e., oxidizers), ways of bedding the coal, and the monitoring of the UCG process were tested there.

The gasification in the ex-situ reactor is based on the control of the flow of the inlet gasification agents (i.e., air and oxygen) and the pressure at the outlet. A lignite coal from a Slovak mine that is suitable for the UCG was gasified. The composition of the coal that was gasified and the factors that affect the UCG can be found in [41].

Figure 2. Experimental coal gasifier (ex-situ reactor).

The influence of the various gasification agents (i.e., their flows and pressures) on the syngas quality was discussed in [2]. The bedding of coal in the ex-situ reactor was made on the basis of the rules of the similarity theory. The goal was to obtain a similarity with the real coal seam. Blocks of coal merged into one coal unit were used when preparing a physical model of the coalbed. In order to make the physical model airtight, the layers of the over-burden and under-burden contained sand mixed with water glass. In addition, the reactor was tilted at 10°, in order to get as close as possible to the real coal seam. The coal used in the experiment was extracted from an underground coalbed with the same inclination. This coalbed (i.e., in the Cigel mine, Slovakia, overburden bed) has the potential to be mined in the future by the UCG.

Air, as the primary oxidant, was blown into the pressure vessel by two compressors. The pressure of the air injected into the ex-situ reactor was adjusted by a reducing valve.
The air flow was controlled by a servo valve and measured by a differential pressure sensor with a centric orifice that was installed in a pipeline. Similarly, the flow of the produced syngas was measured, but a segment orifice was used. The oxygen flow and pressure were controlled by two reducing valves. As a source of the technical oxygen, pressure cylinders were used. The technical oxygen was injected as an auxiliary oxidant into the mixing chamber, where it was mixed with air, and the mixture was then injected into the ex-situ reactor. The pressures of the oxidants were measured by a set of pressure transducers. K-type thermocouples were used to measure the temperatures in the coal model; these thermocouples allow measuring temperatures up to 1300 °C. The thermocouples were placed in ceramic tubes (for a protection in an oxidation-reducing atmosphere) and then inserted into holes drilled in the physical model. The temperature of the coal along the gasification channel, in the overburden, and in the under-burden was measured. At the outlet of the reactor, a sample of the syngas was captured to be analysed by two stationary analysers. The concentrations of CO, CO2, O2, CH4, and H2 in the syngas were constantly measured. The syngas calorific value was continually calculated from the composition of the syngas. The pressure at the outlet was controlled by a vacuum fan. The power of the fan was controlled by a frequency inverter. The outlet pressure was measured by one pressure transducer. The produced syngas was finally burned in the combustion chamber. The UCG was tried in two different reactors. Figure 3 shows the complete technological scheme of the experimental gasification with one generator.

All devices for the measurement and control (i.e., pressure transducers, differential pressure transducers, servo valve, thermocouples, switching relays, frequency converter with fan, compressors, and gas analysers) were connected to a PLC that provided the basic gasification control loops (i.e., on-off control of the compressors, air flow stabilization, temperature stabilization, oxygen concentration stabilization). The PLC was connected to a PC that performed the data storage, optimal control [45], and process monitoring [46]. The SCADA/HMI system Promotic was used for the monitoring of the process and the setting of the controllers [46]. A detailed description of all devices that were used for the measurement and control was presented in [41]. The measured process variables (i.e., flows of the gasification agents, pressures, temperatures, syngas composition, and calorific value) were recorded in a database and may later be processed as a set of time series. A PC with a CPU Intel® Core™ i5-4300U (2.9 GHz) and 8 GB RAM was used for all calculations.

4. Results and Discussion
The flowchart of the proposed soft-sensing in the UCG is shown in Figure 4. This paper focuses on the evaluation of the potential predictive methods that could be used in the soft-sensing. Machine learning models generalize to data similar to those on which they were trained. Although static models, which are time-independent, i.e., they work on a single data set, have been used, their application to the dynamic process should be improved by a continual updating of the training set with the online data. The development of practical on-line prediction soft sensors consists of two stages: training and on-line prediction.
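A minimal sketch of these two stages is given below; the SVR model choice and the read_observation() interface are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np
from sklearn.svm import SVR

def train_soft_sensor(X_hist, y_hist):
    """Stage 1 (off-line): fit a data-driven model on historical records,
    e.g., syngas concentrations -> underground temperature."""
    model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
    model.fit(X_hist, y_hist)
    return model

def soft_sensor_loop(model, read_observation):
    """Stage 2 (on-line): estimate the unmeasured variable from each new
    observation; read_observation() stands in for the PLC/database query."""
    while (x_new := read_observation()) is not None:
        yield float(model.predict(np.asarray(x_new).reshape(1, -1))[0])
```

In an industrial deployment, the loop would run in parallel with the hardware measurement chain, and the training set would be periodically refreshed with new on-line data, as noted above.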
A data set from the experimental UCG was used in order to train the data-driven modeling algorithms. The three prediction methods analysed in the previous section have been applied in order to predict the underground temperature in the oxidizing zone of the ex-situ reactor and the syngas calorific value. The data obtained during the experimental UCG with the laboratory equipment have been used in the analyses. As the underground temperature in the oxidizing zone, the highest temperature along the gasification channel during the experiment is considered. The observations and target data measured during one well-running experiment were used in this paper for the analysis.

When a learning method is applied, it is convenient to divide the observed data into a training set A_train and a test set A_test (i.e., a validation set). When choosing the training and test sets, it is recommended to ensure that the data used for the model testing cover a significant range of the variations that are expected to be encountered during the use of the soft sensor. Given that there is no exact rule in the literature on how to divide the data into a training and test set for specific learning methods (there are various recommendations and instructions for experimentation), the models were tested on data from the last 10 % and 20 % of the experiment. In general, a higher performance of the selected methods was obtained with more data for the training (i.e., with the ratio between the training set and the test set of 90:10). To compare the performance of the three different methods, this paper presents only the results where 10 % of the experiment was used for the test. The used test set consisted of 10 % of the data from the end of the experiment. The whole experiment lasted for 70 hours. In the simulations, there were 4201 patterns in total, and 3781 patterns were used for the training (a sketch of this chronological split is given below). Table 2 gives an overview of all regarded observations and targets. The pressure on the outlet is the relative pressure measured on the output pipe from the gasifier. This pressure can also be negative when the power of the exhaust fan is increased. The behaviour of the measured data from the UCG experiment is shown in Figure 5 and Figure 6.

Because the temperature in the oxidizing zone (i.e., the highest temperature along the gasification channel) only weakly correlates with the operating variables (i.e., flows of the gasification agents and pressure) and has some inertia, the decision was made to estimate it from the composition of the syngas measured at the outlet. Since the composition of the syngas depends on the temperature in the oxidizing zone, there is an inverse way to determine the temperature that corresponds to the measured concentrations. This decision was also supported by the existence of a large number of uncertainties that occur in the UCG. The propagation of the temperature in the underground is not uniform, i.e., there are different temperatures in the coal, along the gasification channel, and in the underburden and overburden. In addition, there is a continual shift of the combustion front. Due to the changing conditions in the underground gasifier (e.g., groundwater, cracks, fractures, gas leaks, the shift of the combustion front, and surface subsidence), the process is controlled under conditions of uncertainty.
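The chronological split described above can be sketched as follows (the 10 % test fraction and the pattern counts come from the text; the helper itself is illustrative):

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.10):
    """Hold out the last part of the time-ordered records as the test set;
    with 4201 patterns and test_fraction=0.10 this leaves 3781 for training."""
    n_train = len(X) - int(round(len(X) * test_fraction))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])
```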
Figure 3. Scheme of gasification equipment with one ex-situ reactor (modified after [41]).

Figure 4. The principle of soft-sensing in UCG.

Figure 5. Time series of measured syngas composition divided into training and measured data set.

Figure 6. Time series of measured control variables divided into training and measured data set.

Target (output process variable y) | Observations (input process variables x)
Calorific value of syngas (MJ/Nm³) | x1 - Air flow (Nm³/h); x2 - Oxygen flow (Nm³/h); x3 - Pressure on outlet (i.e., underpressure/overpressure) (Pa)
Underground temperature (°C) | x1 - Concentration of CO in syngas (%); x2 - Concentration of CO2 in syngas (%); x3 - Concentration of H2 in syngas (%); x4 - Concentration of CH4 in syngas (%); x5 - Concentration of O2 in syngas (%)

Table 2. Observations and targets used in modeling.

In such conditions, the process of measuring the process variables, the identification, and, finally, the automated control is more complicated. These uncertainties can partly be reduced by a more detailed geological survey. But even this does not guarantee the elimination of such uncertainties, as evidenced by the long-term experience in the traditional coal mining technology.

The predictive methods were evaluated using statistical approaches and by calculating a performance index. The following indicators were used to compare the performance of the individual prediction methods. The variable y represents the measured target, and Y represents its prediction. ȳ is the average of the target values y_i, and Ȳ is the average of the predicted values Y_i (i = 1, ..., N). N represents the number of patterns in the training or testing set.

• Coefficient of correlation (r_yY) - The coefficient expresses the strength of the linear relationship between two variables. It determines the degree of dependence of two variables and acquires values from the interval (-1, 1). Its definition is based on the consideration of the sum of the deviations of the individual values of two correlated characters from their averages. Several equations are used to calculate the correlation coefficient, but the following one is used in this work:

r_yY = Σ_{i=1}^{N} (Y_i − Ȳ)(y_i − ȳ) / (√(Σ_{i=1}^{N} (Y_i − Ȳ)²) · √(Σ_{i=1}^{N} (y_i − ȳ)²))   (20)

If r_yY = 1, the dependence is completely direct; if r_yY = −1, the correlation is completely indirect; if r_yY = 0, the variables are independent. More precisely: r_yY < 0.3 - low tightness; 0.3 ≤ r_yY < 0.5 - slight tightness; 0.5 ≤ r_yY < 0.7 - significant tightness; 0.7 ≤ r_yY < 0.9 - high tightness; 0.9 ≤ r_yY - very high tightness.
• Coefficient of determination (r²_yY) - It expresses the degree of the causal dependence of two variables. It is a statistic that gives some information about the goodness of fit of a model. The correlation coefficient is the square root of the determination coefficient. The degrees of tightness depending on the coefficient of determination are as follows: r²_yY < 0.1 - low tightness; 0.1 ≤ r²_yY < 0.25 - slight tightness; 0.25 ≤ r²_yY < 0.50 - significant tightness; 0.5 ≤ r²_yY < 0.80 - high tightness; 0.8 ≤ r²_yY - very high tightness. r²_yY = 1 indicates that the model perfectly fits the measured target data.

r²_yY = 1 − ((1/N) Σ_{i=1}^{N} (Y_i − Ȳ)²) / ((1/N) Σ_{i=1}^{N} (y_i − ȳ)²)   (21)

• Relative root mean squared error (RRMSE) - This error can be calculated by dividing the root mean squared error RMSE by the average of the actual values y_i. RMSE represents the square root of the mean squared error (MSE), calculated as follows:

RMSE = √MSE = √((1/N) Σ_{i=1}^{N} (Y_i − y_i)²)   (22)

The MSE is a useful statistical measure for assessing the accuracy of the prediction. The RRMSE can be calculated by the following equation [34, 47]:

RRMSE = RMSE / ((1/N) Σ_{i=1}^{N} y_i) × 100 = √((1/N) Σ_{i=1}^{N} (Y_i − y_i)²) / ((1/N) Σ_{i=1}^{N} y_i) × 100 (%)   (23)

• Mean absolute percentage error (MAPE) - This statistical indicator expresses the percentage prediction error. It can be calculated as follows:

MAPE = (1/N) Σ_{i=1}^{N} (|y_i − Y_i| / |y_i|) × 100 (%)   (24)

This error has certain disadvantages. At zero values of y_i, a division by zero can occur, and MAPE produces undefined values. At very low values of y_i, MAPE can extremely exceed 100 %. When the actual values y_i are very high (i.e., above Y_i), MAPE will not exceed 100 %.

• Performance index (PI) - It indicates the overall performance of the given prediction method. The PI value ranges from 0 to +∞; smaller values of PI indicate a better performance. The PI is calculated as follows [47, 48]:

PI = RRMSE / (r_yY + 1)   (25)

The above statistical indicators were calculated individually for the training and testing data (a short computational sketch of these indicators is given below). The time needed to train the predictive model was also measured.

4.1. Prediction by the Back-Propagation NN
The back-propagation algorithm (or gradient algorithm) is based on an "error-correction" learning rule, i.e., learning by the network error. This algorithm performs the steepest descent procedure. The learning consists of two pass-overs across the different layers of the network, i.e., the forward pass (i.e., the forward activation flow of the outputs) and the backward pass (i.e., the backward error propagation of the weight adjustments). Before the training, it is appropriate to standardize all inputs of the NN for an effective scaling of the weights. The learning takes place in cycles (i.e., epochs), always with new input patterns. It is based on gradually adjusting the weights so that the error E (5) is reduced. This error is minimized iteratively over the epochs so that the required accuracy ε is achieved. After the learning, it is desirable that the output from the NN is equal to the required output, or as close to it as possible, considering all input patterns from the set A_train. The ability of the NN to determine the output for inputs outside the A_train set is called the generalization. This is also the main role of the NN in the prediction. The weights are modified using the set A_train and, using A_test, the generalization error is detected.
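Returning to the indicators defined in (20)-(25), the following minimal sketch (Python; y and Y are the measured and predicted series as NumPy arrays) shows how they can be evaluated; the function names are illustrative:

```python
import numpy as np

def correlation(y, Y):
    """Coefficient of correlation r_yY, equation (20)."""
    dy, dY = y - y.mean(), Y - Y.mean()
    return np.sum(dY * dy) / np.sqrt(np.sum(dY ** 2) * np.sum(dy ** 2))

def determination(y, Y):
    """Coefficient of determination r2_yY, as printed in equation (21)."""
    return 1.0 - np.mean((Y - Y.mean()) ** 2) / np.mean((y - y.mean()) ** 2)

def rrmse(y, Y):
    """Relative root mean squared error in %, equations (22)-(23)."""
    return np.sqrt(np.mean((Y - y) ** 2)) / np.mean(y) * 100.0

def mape(y, Y):
    """Mean absolute percentage error in %, equation (24)."""
    return np.mean(np.abs(y - Y) / np.abs(y)) * 100.0

def performance_index(y, Y):
    """Performance index PI = RRMSE / (r_yY + 1), equation (25)."""
    return rrmse(y, Y) / (correlation(y, Y) + 1.0)
```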
Five input variables, i.e., the concentrations of O2 (x1), CO2 (x2), CO (x3), H2 (x4), and CH4 (x5) (measured in vol. %), have been regarded for the initialization of the input neurons and the prediction of the underground temperature in the UCG. Two stationary analysers that measured the concentrations of only these five gases have been used during the UCG experiment. These concentrations have been considered the most significant. In the prediction of the syngas calorific value, three input process-relevant variables were used, i.e., the injected air flow (x1), the flow of supplementary oxygen (Nm³/h) (x2), and the pressure on the outlet (Pa) (x3). These variables are adjustable by the automated control system. The general scheme of the NN considered for the UCG data prediction is shown in Figure 7.

The methods of Batch Gradient Descent with Momentum and Gradient Descent with Variable Learning Rate have been applied. In this approach, the weights and biases are updated according to the gradient descent momentum and an adaptive learning rate. It is the most widely used way to realize the minimization of the error (5) within the gradient optimization methods, in which the weighting and threshold coefficients are recurrently updated according to the following equations [27]:

w_ij^(k+1) = w_ij^(k) − λ ∂E/∂w_ij + µ∆w_ij^(k)
ϑ_j^(k+1) = ϑ_j^(k) − λ ∂E/∂ϑ_j + µ∆ϑ_j^(k)   (26)

where the parameter λ > 0 represents the learning rate and must be small enough to ensure the monotone convergence of the optimization algorithm and, at the same time, large enough to provide a sufficiently high convergence rate. The calculation of the partial derivatives ∂E/∂w_ij and ∂E/∂ϑ_j for the entire NN runs recurrently from the highest to the lowest layer, i.e., against the direction of the dissemination of information in the NN, which runs from the lowest to the highest layer. The initial values of the threshold and weighting coefficients ϑ_j^(0) and w_ij^(0) are randomly generated from a small interval centered on zero, e.g., from the open interval (-1, 1). The last member µ in (26) represents the so-called momentum member that is determined by the difference of the coefficients from the last two iterations, ∆w_ij^(k) = w_ij^(k) − w_ij^(k−1) and ∆ϑ_j^(k) = ϑ_j^(k) − ϑ_j^(k−1). The momentum is important for the "skip" of the local minima in the initial optimization phase. The value of the parameter µ is usually chosen from the interval 0.5 ≤ µ ≤ 0.7 (a minimal sketch of this update rule is given below). The adaptive learning rate tries to maintain a stable learning and the largest possible size of the learning step. The mean squared error (MSE) was used as the error function during the training.

The number of hidden layers and neurons is usually determined on the basis of experimentation, where the NN model for which E_test is minimal is selected. However, a small number of hidden layers in the NN may not model the non-linearities in the data well. It is, therefore, necessary to look for an optimal number of hidden neurons. Two variants of the NN, i.e., with one and two hidden layers, were used, and the number of neurons was estimated in a previous experimentation. It has also been tried to set the number of neurons in the hidden layer to 2m + 1, where m is the number of input neurons. In all variants that were tried, only one neuron in the output layer was used. The momentum constant was set to 0.9. Within the results, the goal was also to show what impact the number of hidden neurons has on the quality of the prediction. The results of the training and testing are shown in Table 3.
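One iteration of the update rule (26) can be sketched as follows (Python; array shapes and hyper-parameter values are illustrative assumptions, not the experiment's settings):

```python
import numpy as np

def momentum_update(w, grad_w, delta_w_prev, lam=0.01, mu=0.9):
    """One iteration of the update rule (26) for a weight matrix:
    w(k+1) = w(k) - lambda * dE/dw + mu * delta_w(k).
    The same rule applies to the thresholds (biases) theta."""
    delta_w = -lam * grad_w + mu * delta_w_prev
    return w + delta_w, delta_w
```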
| Target | Layers | Neurons (L1:L2) | Inputs | r_yY tr | r²_yY tr | RRMSE % tr | PI tr | MAPE % tr | MSE tr | RMSE tr | Time (s) | r_yY te | r²_yY te | RRMSE % te | PI te | MAPE % te | MSE te | RMSE te |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | 2 | 5000:15 | CO, CO2 | 0.2707 | 0.0733 | 8.0062 | 6.3008 | 6.4108 | 6484.2537 | 80.5249 | 44.8312 | 0.1985 | 0.0394 | 8.6818 | 7.2440 | 7.2707 | 6938.2153 | 83.2960 |
| Temperature | 2 | 5000:15 | CO, CO2, H2, CH4, O2 | 0.0567 | 0.0032 | 11.7869 | 11.1542 | 9.1223 | 14054.2843 | 118.5508 | 50.8077 | 0.0611 | 0.0037 | 18.2237 | 17.1749 | 10.8640 | 30570.4682 | 174.8441 |
| Temperature | 2 | 800:8 | CO, CO2 | 0.2043 | 0.0417 | 8.0649 | 6.6967 | 6.6452 | 6579.6878 | 81.1153 | 6.8560 | 0.6755 | 0.4563 | 8.5075 | 5.0775 | 7.4526 | 6662.3759 | 81.6234 |
| Temperature | 2 | 800:8 | CO, CO2, H2, CH4, O2 | 0.1820 | 0.0331 | 8.8428 | 7.4810 | 6.9526 | 7910.2629 | 88.9397 | 7.1921 | 0.0810 | 0.0066 | 9.9893 | 9.2407 | 6.8042 | 9185.3462 | 95.8402 |
| Temperature | 1 | 5000 | CO, CO2 | 0.0587 | 0.0034 | 69.5635 | 65.7078 | 53.1500 | 489523.6382 | 699.6597 | 38.4061 | 0.1021 | 0.0104 | 95.0738 | 105.8881 | 73.4974 | 832048.0252 | 912.1667 |
| Temperature | 1 | 5000 | CO, CO2, H2, CH4, O2 | 0.1865 | 0.0348 | 42.8462 | 36.1105 | 31.0099 | 185709.9328 | 430.9408 | 43.1599 | 0.1611 | 0.0260 | 70.0982 | 83.5614 | 48.0393 | 452313.8233 | 672.5428 |
| Temperature | 1 | 800 | CO, CO2 | 0.0428 | 0.0018 | 23.6400 | 22.6700 | 17.8908 | 56533.4172 | 237.7676 | 1.7451 | 0.1623 | 0.0263 | 48.7514 | 41.9446 | 35.9242 | 218776.5470 | 467.7356 |
| Temperature | 1 | 800 | CO, CO2, H2, CH4, O2 | 0.0778 | 0.0061 | 16.0174 | 14.8608 | 11.9061 | 25953.3424 | 161.1004 | 1.8475 | 0.6381 | 0.4072 | 28.3447 | 17.3032 | 22.0818 | 73955.5871 | 271.9478 |
| Temperature | 1 | 5 | CO, CO2 | 0.1495 | 0.0223 | 8.1083 | 7.0538 | 6.7466 | 6650.7120 | 81.5519 | 0.6702 | 0.3373 | 0.1138 | 8.4245 | 6.2996 | 7.1668 | 6533.0962 | 80.8276 |
| Temperature | 1 | 11 | CO, CO2, H2, CH4, O2 | 0.4210 | 0.1772 | 7.4552 | 5.2466 | 6.1409 | 5622.4662 | 74.9831 | 0.7461 | 0.6787 | 0.4606 | 7.4301 | 4.4261 | 5.8129 | 5081.7732 | 71.2866 |
| Calorific value | 2 | 5000:15 | Air, O2 | 0.5588 | 0.3122 | 33.3750 | 21.4108 | 37.6195 | 9.6441 | 3.1055 | 55.2753 | 0.0366 | 0.0013 | 30.4626 | 29.3861 | 30.6660 | 12.4353 | 3.5264 |
| Calorific value | 2 | 5000:15 | Air, O2, Outlet pressure | 0.1311 | 0.0172 | 119.9483 | 106.0431 | 126.1949 | 124.5676 | 11.1610 | 73.1432 | 0.0270 | 0.0007 | 86.6737 | 84.3988 | 70.8208 | 100.6693 | 10.0334 |
| Calorific value | 2 | 800:8 | Air, O2 | 0.7383 | 0.5451 | 22.6638 | 13.0378 | 28.3859 | 4.4472 | 2.1088 | 7.1023 | 0.0999 | 0.0100 | 24.8010 | 22.5486 | 22.6788 | 8.2425 | 2.8710 |
| Calorific value | 2 | 800:8 | Air, O2, Outlet pressure | 0.7392 | 0.5464 | 22.6569 | 13.0271 | 28.0108 | 4.4444 | 2.1082 | 7.5802 | 0.6906 | 0.4769 | 21.5027 | 12.7190 | 19.5456 | 4.5877 | 2.1419 |
| Calorific value | 1 | 5000 | Air, O2 | 0.0398 | 0.0016 | 159.1270 | 153.0306 | 183.3015 | 219.2321 | 14.8065 | 44.2146 | 0.1474 | 0.0217 | 173.3974 | 151.1207 | 148.0262 | 402.9098 | 20.0726 |
| Calorific value | 1 | 5000 | Air, O2, Outlet pressure | 0.2339 | 0.0547 | 133.3327 | 108.0600 | 135.6404 | 153.9182 | 12.4064 | 44.6760 | 0.4976 | 0.2476 | 139.3664 | 93.0571 | 113.2722 | 260.2789 | 16.1332 |
| Calorific value | 1 | 800 | Air, O2 | 0.4106 | 0.1686 | 55.1188 | 39.0761 | 61.2308 | 26.3036 | 5.1287 | 7.7348 | 0.1819 | 0.0331 | 87.2729 | 73.8429 | 75.1442 | 102.0661 | 10.1028 |
| Calorific value | 1 | 800 | Air, O2, Outlet pressure | 0.4590 | 0.2107 | 51.0458 | 34.9874 | 51.7013 | 22.5599 | 4.7497 | 7.9496 | 0.6693 | 0.4479 | 57.1718 | 34.2496 | 47.0393 | 43.8013 | 6.6183 |
| Calorific value | 1 | 5 | Air, O2 | 0.6714 | 0.4507 | 24.9076 | 14.9025 | 32.9435 | 5.3713 | 2.3176 | 0.9649 | 0.0401 | 0.0016 | 19.8082 | 19.0445 | 22.9766 | 5.2579 | 2.2930 |
| Calorific value | 1 | 7 | Air, O2, Outlet pressure | 0.7281 | 0.5301 | 23.0707 | 13.3503 | 28.6974 | 4.6083 | 2.1467 | 1.0306 | 0.7187 | 0.5166 | 15.8219 | 9.2055 | 16.8804 | 3.3546 | 1.8316 |

Table 3. Results of simulations with NNs where 10 % of the experiment was used to test (tr = training, te = testing).

Figure 7. Proposal of the Neural network considered for the UCG data prediction.

When training and testing the prediction of the underground temperature, it can be seen that the lowest values of the statistical indicators (i.e., RRMSE, MSE, MAPE, and RMSE) were obtained in the case of one hidden layer with 11 neurons (i.e., 2m + 1, according to a general recommendation, where m is the number of input neurons). In this case, the RRMSE and the performance index had the lowest values (PI = 4.42 for testing and PI = 5.24 for training), and the coefficient of determination was the highest (r²_yY = 0.46 for testing and r²_yY = 0.17 for training). This result was obtained when five input observations were used for training the NN model and for testing the prediction; this variant thus predicts the target best. The second interesting result was obtained with an NN with two hidden layers and the structure of 800:8 neurons; it was the case when two observation inputs were used. The worst results in the prediction of the underground temperature were achieved in the case of one hidden layer with 5000 neurons.

When training and testing the NN for the calorific value, the lowest values of the statistical indicators were also obtained in the case of one hidden layer with 7 neurons (i.e., 2m + 1). In this case, the RRMSE and the performance index were the lowest (e.g., PI = 9.20 for testing). This result was obtained when three input variables were used for training the NN model and for testing the prediction. The value of the coefficient of determination was the highest in this case (i.e., r²_yY = 0.51 for testing). Similarly, as in the case of the temperature prediction, the second interesting result was obtained with an NN with two hidden layers and the structure of 800:8 neurons; this is the case where three observation inputs were used. The worst results in the prediction of the calorific value were achieved in the case of one hidden layer with 5000 neurons (see Table 3).

It can be stated that the use of the NN model for the temperature prediction achieved better results in terms of the performance index than in the case of the calorific value. The best prediction of the calorific value and the underground temperature by the NN, where 10 % of the experiment was used for the test, is shown in Figure 8 and Figure 9. The black vertical line in the figures divides the prediction into training and testing.

Figure 8. Measured and predicted calorific value of syngas by NN, where three inputs were used in the test.

Figure 9. Measured and predicted underground temperature by NN, where five inputs were used in the test.
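For readers who wish to reproduce a comparable setup, the sketch below trains a one-hidden-layer network with the 2m + 1 rule using scikit-learn's MLPRegressor. SGD with momentum 0.9 and an adaptive learning rate is a close stand-in for the training scheme described above, but the synthetic data and the remaining hyperparameters are assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins for the five gas concentrations (vol. %) and the
# underground temperature (deg C); the real UCG series are not reproduced.
rng = np.random.default_rng(0)
X = rng.uniform(0, 25, size=(1000, 5))
y = 800 + 12 * X[:, 2] - 4 * X[:, 0] + rng.normal(0, 15, size=1000)

split = int(0.9 * len(X))               # last 10 % of the run kept for testing
m = X.shape[1]
net = MLPRegressor(hidden_layer_sizes=(2 * m + 1,),   # 2m + 1 hidden neurons
                   solver="sgd", momentum=0.9,
                   learning_rate="adaptive", learning_rate_init=0.01,
                   max_iter=5000, random_state=0)
net.fit(X[:split], y[:split])
mse = np.mean((net.predict(X[split:]) - y[split:]) ** 2)
print("test MSE:", round(float(mse), 2))
```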
4.2. Prediction by the MARS
The algorithm of a regression model creation with MARS runs in two phases, as analysed in detail in Section 2.2. In the forward phase, the algorithm begins with a model that contains only an intercept term. Then, in a cycle, reflected pairs of BFs are added so that the training error is reduced as much as possible. This continues until, for example, the maximum number of BFs is reached. In the backward phase, the model is simplified by deleting the least important BFs one at a time, i.e., those whose removal degrades the training fit the least. In this way, several “best” models of different sizes are obtained. At the end of this phase, the one model with the lowest GCV is selected from these best models (excluding models larger than the maximal number of final BFs).

Several different variants of simulations with MARS have been performed. The maximum number of BFs included in the model in the forward phase was changed experimentally. The initial number of BFs in the forward phase was determined according to the formula min(200, max(20, 2d)) + 1, where d represents the number of input variables; the initial number of BFs was therefore set to 21 in all simulations. In the modelling, we have considered a maximal interaction between the input variables without self-interactions. Since the data were not smooth, both the piecewise-cubic and the piecewise-linear types of modelling were analysed in order to assess the prediction performance. By default, all MARS models are created as piecewise-linear and are transformed into piecewise-cubic models after the backward phase.

The best (optimal) number of maximal BFs in the final MARS model was estimated by the GCV criterion and by a 10-fold Cross-Validation. In each Cross-Validation iteration, a new MARS model is created and reduced using the GCV on the in-fold (training) data. In addition, the MSE criterion on the out-of-fold (test) data (MSEoof) is calculated for each model in the reduction phase. Figure 10 shows a comparison of the behaviour of the GCV and MSEoof criteria calculated for each new model after the 10-fold iteration. In this simulation, the model for predicting the syngas calorific value was considered. The figure shows two vertical dashed lines at the minima of the two solid lines, i.e., the numbers of optimum BFs estimated by the GCV (cyan) and by the Cross-Validation (magenta). Ideally, these two lines would coincide. Similarly, Figure 11 shows the determination of the optimal number of basis functions for the MARS model of the underground temperature prediction. The figures show simulations of the piecewise-linear type of MARS models with all inputs considered for a given target variable. The results from the training of the model and from testing the prediction are shown in Table 4.

The original MARS approximation method uses a cubic function to smooth the truncated piecewise-linear functions. The cubic function has the following general form:

$$C(x \mid s = +1, t_-, t, t_+) = \begin{cases} 0, & x \le t_- \\ \alpha_+ (x - t_-)^2 + \beta_+ (x - t_-)^3, & t_- < x < t_+ \\ x - t, & x \ge t_+ \end{cases} \tag{27}$$

where

$$\alpha_+ = \frac{2t_+ - 3t + t_-}{(t_+ - t_-)^2}, \qquad \beta_+ = \frac{2t - t_+ - t_-}{(t_+ - t_-)^3}, \tag{28}$$

and

$$C(x \mid s = -1, t_-, t, t_+) = \begin{cases} t - x, & x \le t_- \\ \alpha_- (x - t_+)^2 + \beta_- (x - t_+)^3, & t_- < x < t_+ \\ 0, & x \ge t_+ \end{cases} \tag{29}$$

where

$$\alpha_- = \frac{3t - 2t_- - t_+}{(t_- - t_+)^2}, \qquad \beta_- = \frac{t_+ + t_- - 2t}{(t_- - t_+)^3}. \tag{30}$$

Here, t represents a univariate knot, which is selected for each of the factor variables x.
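The truncated cubic basis (27)–(30) can be evaluated directly; the following sketch is a plain transcription of those equations (the function name and the sample knot values, taken from BF1 of Table 5, are for illustration only):

```python
def mars_cubic_basis(x, s, t_minus, t, t_plus):
    """Evaluate the MARS cubic basis C(x | s, t-, t, t+) of Eqs. (27)-(30)."""
    h = t_plus - t_minus
    if s == +1:                                           # Eq. (27)
        if x <= t_minus:
            return 0.0
        if x >= t_plus:
            return x - t
        alpha = (2 * t_plus - 3 * t + t_minus) / h ** 2   # alpha_+, Eq. (28)
        beta = (2 * t - t_plus - t_minus) / h ** 3        # beta_+, Eq. (28)
        return alpha * (x - t_minus) ** 2 + beta * (x - t_minus) ** 3
    else:                                                 # s = -1, Eq. (29)
        if x <= t_minus:
            return t - x
        if x >= t_plus:
            return 0.0
        alpha = (3 * t - 2 * t_minus - t_plus) / h ** 2   # alpha_-, Eq. (30)
        # beta_- of Eq. (30), simplified using (t_- - t_+)^3 = -h^3
        beta = (2 * t - t_plus - t_minus) / h ** 3
        return alpha * (x - t_plus) ** 2 + beta * (x - t_plus) ** 3

# BF1 of Table 5: C(x4 | +1, 20.571, 20.579, 20.607) evaluated at x4 = 21.0,
# which lies above t+, so the basis returns x4 - t = 0.421.
print(mars_cubic_basis(21.0, +1, 20.571, 20.579, 20.607))
```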
The piecewise-linear type of MARS model fits the training data better but, in the prediction on untrained data, better results are obtained with the piecewise-cubic type of the model (see Table 4). Equation (31) represents the resulting piecewise-cubic type of MARS model for the prediction of the underground temperature. The basis functions in the equation are calculated according to Table 5. This MARS model takes into account five inputs, i.e., the concentrations of the measured gases: x1 (CO), x2 (CO2), x3 (H2), x4 (CH4), x5 (O2). Similarly, equation (32) represents the piecewise-cubic type of MARS model for the prediction of the calorific value of the syngas. The basis functions in the equation are calculated according to Table 6. The MARS model, in this case, has three inputs: x1 (air flow), x2 (oxygen flow), and x3 (controlled pressure on the outlet). With these models, the best results for the prediction were obtained in terms of all statistical indicators. The lowest performance index in the testing of the underground temperature prediction on untrained data, PI = 4.01, was obtained with the piecewise-cubic type of the MARS model. When testing the prediction of the calorific value on untrained data with the piecewise-cubic type of the MARS model, the performance index PI = 12.1382 was obtained.

Temperature (°C) = 855.36 − 2108.8 × BF1 + 8.7366 × BF2 + 14.266 × BF3 + 20.014 × BF4 − 13.465 × BF5 − 18.233 × BF6 + 1.2743 × BF7 − 3.764 × BF8 − 1.3594 × BF9 − 6.7334 × BF10 + 1.1376 × BF11 + 0.61503 × BF12 + 0.79404 × BF13 − 1.766 × BF14 + 0.20451 × BF15 − 1.8976 × BF16 − 5.8421 × BF17 + 0.72146 × BF18 + 0.86513 × BF19 (31)

Calorific value (MJ/Nm³) = 13.528 − 0.0044217 × BF1 − 0.13132 × BF2 − 0.14469 × BF3 − 1.1482 × BF4 + 0.015322 × BF5 − 0.0034767 × BF6 − 0.019788 × BF7 + 0.11264 × BF8 + 0.55397 × BF9 − 0.21132 × BF10 − 0.84369 × BF11 + 0.032498 × BF12 − 0.16806 × BF13 + 0.12228 × BF14 + 0.084482 × BF15 + 0.00096036 × BF16 + 0.00075854 × BF17 (32)

Because the piecewise-linear type of the model gives better results during the training in terms of all indicators, these variants of the model are also presented. These are models with all considered inputs for the prediction of the given target. The piecewise-linear type of MARS model uses the max(0, x − t) function, where t is the knot. The max() function represents the positive part of (x − t), which can be formally expressed as follows:

$$\max(0, x - t) = \begin{cases} x - t, & \text{if } x \ge t \\ 0, & \text{otherwise} \end{cases} \tag{33}$$

Figure 10. Example of the estimation of the “best” number of BFs of the calorific value MARS model with three inputs by GCV and 10-fold Cross-Validation (i.e., MSEoof).

Figure 11. Example of the estimation of the “best” number of BFs in the MARS model of the underground temperature with five inputs by GCV and 10-fold Cross-Validation (i.e., MSEoof).

| Target | MARS model type | BFs in final model (incl. BF0) | Inputs | r_yY tr | r²_yY tr | RRMSE % tr | PI tr | MAPE % tr | MSE tr | RMSE tr | Time (s) | r_yY te | r²_yY te | RRMSE % te | PI te | MAPE % te | MSE te | RMSE te |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | piecewise-cubic | 16 | CO, CO2 | 0.4707 | 0.2216 | 7.2342 | 4.9189 | 5.7432 | 5294.1481 | 72.7609 | 10.5619 | 0.2312 | 0.0534 | 10.5865 | 8.5986 | 8.8779 | 10316.4417 | 101.5699 |
| Temperature | piecewise-cubic | 20 | CO, CO2, H2, CH4, O2 | 0.5859 | 0.3433 | 6.6445 | 4.1897 | 5.2014 | 4466.1544 | 66.8293 | 17.8538 | 0.6031 | 0.3637 | 6.4275 | 4.0094 | 5.1623 | 3802.8747 | 61.6675 |
| Temperature | piecewise-linear | 16 | CO, CO2 | 0.4985 | 0.2485 | 7.1080 | 4.7434 | 5.6262 | 5110.9554 | 71.4909 | 12.1525 | 0.3282 | 0.1077 | 10.6824 | 8.0427 | 9.1629 | 10504.3010 | 102.4905 |
| Temperature | piecewise-linear | 20 | CO, CO2, H2, CH4, O2 | 0.6666 | 0.4444 | 6.1119 | 3.6673 | 4.7285 | 3778.9200 | 61.4729 | 17.8482 | 0.5788 | 0.3350 | 8.9462 | 5.6665 | 7.5283 | 7367.2269 | 85.8326 |
| Calorific value | piecewise-cubic | 15 | Air, O2 | 0.7863 | 0.6183 | 20.7582 | 11.6208 | 24.0946 | 3.7307 | 1.9315 | 8.0520 | 0.0419 | 0.0018 | 83.9179 | 80.5431 | 32.5398 | 94.3695 | 9.7144 |
| Calorific value | piecewise-cubic | 18 | Air, O2, Outlet pressure | 0.8661 | 0.7501 | 16.7964 | 9.0010 | 17.3057 | 2.4426 | 1.5629 | 13.5414 | 0.4386 | 0.1924 | 17.4620 | 12.1382 | 17.8493 | 4.0861 | 2.0214 |
| Calorific value | piecewise-linear | 16 | Air, O2 | 0.7991 | 0.6386 | 20.1965 | 11.2256 | 23.0084 | 3.5316 | 1.8792 | 9.1635 | 0.0419 | 0.0018 | 80.7365 | 77.4897 | 30.3193 | 87.3500 | 9.3461 |
| Calorific value | piecewise-linear | 17 | Air, O2, Outlet pressure | 0.8730 | 0.7621 | 16.3866 | 8.7489 | 16.5883 | 2.3249 | 1.5247 | 13.4410 | 0.4238 | 0.1796 | 17.8945 | 12.5681 | 18.3341 | 4.2910 | 2.0715 |

Table 4. Results of simulations with MARS models where 10 % of the experiment was used to test (tr = training, te = testing).

BF1 = C(x4 | +1, 20.571, 20.579, 20.607)
BF2 = C(x4 | −1, 20.571, 20.579, 20.607)
BF3 = C(x2 | +1, 7.967, 11.636, 19.073)
BF4 = C(x2 | −1, 7.967, 11.636, 19.073)
BF5 = C(x1 | −1, 3.6356, 6.3881, 7.6506)
BF6 = C(x3 | +1, 15.747, 26.344, 30.018)
BF7 = C(x3 | −1, 15.747, 26.344, 30.018)
BF8 = BF3 × C(x4 | +1, 11.785, 20.563, 20.571)
BF9 = BF3 × C(x4 | −1, 11.785, 20.563, 20.571)
BF10 = C(x1 | +1, 3.6356, 6.3881, 7.6506) × C(x2 | +1, 19.073, 26.51, 28.665)
BF11 = C(x1 | +1, 3.6356, 6.3881, 7.6506) × C(x2 | −1, 19.073, 26.51, 28.665)
BF12 = BF8 × C(x1 | +1, 9.2894, 9.6658, 14.265)
BF13 = BF8 × C(x1 | −1, 9.2894, 9.6658, 14.265)
BF14 = BF9 × C(x1 | +1, 7.6506, 8.9131, 9.2894)
BF15 = BF9 × C(x1 | −1, 7.6506, 8.9131, 9.2894)
BF16 = BF5 × C(x3 | +1, 2.5749, 5.1498, 15.747)
BF17 = BF5 × C(x3 | −1, 2.5749, 5.1498, 15.747)
BF18 = BF14 × C(x5 | +1, 2.1322, 3.9609, 12.268)
BF19 = BF14 × C(x5 | −1, 2.1322, 3.9609, 12.268)

Table 5. Basis functions of the piecewise-cubic type of MARS model of the underground temperature (five inputs).

BF1 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x3 | +1, 8.357, 16.442, 32.428)
BF2 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x3 | −1, 8.357, 16.442, 32.428)
BF3 = C(x2 | +1, 7.763, 9.83, 25.869)
BF4 = C(x2 | −1, 7.763, 9.83, 25.869)
BF5 = BF4 × C(x3 | +1, 32.428, 48.414, 74.478)
BF6 = BF4 × C(x3 | −1, 32.428, 48.414, 74.478)
BF7 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x2 | +1, 5.114, 5.696, 7.763)
BF8 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x2 | −1, 5.114, 5.696, 7.763)
BF9 = C(x1 | −1, 17.503, 23.274, 31.139)
BF10 = C(x1 | −1, 5.866, 11.732, 17.503) × C(x2 | +1, 0.239, 0.478, 2.505)
BF11 = C(x1 | −1, 5.866, 11.732, 17.503) × C(x2 | −1, 0.239, 0.478, 2.505)
BF12 = C(x1 | +1, 17.503, 23.274, 31.139) × C(x2 | +1, 2.505, 4.532, 5.114)
BF13 = C(x1 | +1, 17.503, 23.274, 31.139) × C(x2 | −1, 2.505, 4.532, 5.114)
BF14 = BF11 × C(x3 | +1, −0.155, 0.272, 8.357)
BF15 = BF11 × C(x3 | −1, −0.155, 0.272, 8.357)
BF16 = BF5 × C(x1 | +1, 31.139, 39.004, 48.437)
BF17 = BF5 × C(x1 | −1, 31.139, 39.004, 48.437)

Table 6. Basis functions of the piecewise-cubic type of MARS model of the syngas calorific value (three inputs).
Temperature (°C) = 851.31 + 3036.4 × BF1 + 11.023 × BF2 + 21.043 × BF3 + 7.8072 × BF4 − 7.1445 × BF5 − 28.58 × BF6 + 1.264 × BF7 − 483.77 × BF8 − 2.2149 × BF9 − 6.5355 × BF10 + 0.25511 × BF11 + 33.098 × BF12 + 57.057 × BF13 − 1.0868 × BF14 + 0.2452 × BF15 − 4.2728 × BF16 − 8.3842 × BF17 + 0.56517 × BF18 + 0.74386 × BF19 (34)

Calorific value (MJ/Nm³) = 10.969 − 0.003224 × BF1 − 0.13121 × BF2 − 0.63223 × BF3 − 0.51773 × BF4 + 0.008499 × BF5 − 0.0075648 × BF6 + 0.04656 × BF7 + 0.050259 × BF8 + 0.43584 × BF9 − 0.13449 × BF10 − 1.2256 × BF11 − 0.043978 × BF12 − 0.08991 × BF13 + 0.22099 × BF14 + 1.4463 × BF15 + 0.00078717 × BF16 + 0.0011077 × BF17 (35)

The piecewise-linear type of MARS model for the underground temperature prediction is represented by equation (34); the corresponding basis functions are shown in Table 7. The performance index obtained during the training of this MARS model was PI = 3.66, and the coefficient of determination was the highest in this case (r²_yY = 0.44). The piecewise-linear type of the MARS model for the syngas calorific value prediction is represented by equation (35); the corresponding basis functions are shown in Table 8. The performance index obtained during the training of this MARS model was PI = 8.74, and the coefficient of determination was the highest in this case (r²_yY = 0.76).

For the comparison with the other methods, this paper presents the behaviour of the prediction only for the piecewise-cubic type of MARS models, because the best results on untrained data were achieved with them. The best prediction of the calorific value and the underground temperature by the piecewise-cubic type of the MARS model, where 10 % of the experiment was used for the test of the prediction, is shown in Figure 12 and Figure 13. The black vertical line divides the prediction into the training and testing parts. It can be said that better predictions with the MARS model were achieved in the case of the underground temperature. The experimental results demonstrate that the piecewise-cubic type of MARS model is better than the piecewise-linear type in both the temperature and the calorific value prediction.

4.3. Prediction by the Support Vector Regression
The SVR model has been trained on the predictor data similarly to the previous method. The predictor data were mapped using three kernel functions, and the SMO method was used for the objective-function minimization. A training data table was used, where one row of the table represented one observation and the individual columns were the predictors x; the table contains one additional column for the response variable y. The standardized predictor matrix has been used for the training. The standardization was performed using the corresponding weighted means of the predictors and the weighted standard deviations.

BF1 = max(0, x4 − 20.579)
BF2 = max(0, 20.579 − x4)
BF3 = max(0, x2 − 11.636)
BF4 = max(0, 11.636 − x2)
BF5 = max(0, 6.3881 − x1)
BF6 = max(0, x3 − 26.344)
BF7 = max(0, 26.344 − x3)
BF8 = BF3 × max(0, x4 − 20.563)
BF9 = BF3 × max(0, 20.563 − x4)
BF10 = max(0, x1 − 6.3881) × max(0, x2 − 26.51)
BF11 = max(0, x1 − 6.3881) × max(0, 26.51 − x2)
BF12 = BF8 × max(0, x1 − 9.6658)
BF13 = BF8 × max(0, 9.6658 − x1)
BF14 = BF9 × max(0, x1 − 8.9131)
BF15 = BF9 × max(0, 8.9131 − x1)
BF16 = BF5 × max(0, x3 − 5.1498)
BF17 = BF5 × max(0, 5.1498 − x3)
BF18 = BF14 × max(0, x5 − 3.9609)
BF19 = BF14 × max(0, 3.9609 − x5)

Table 7. Basis functions of the piecewise-linear type of MARS model of the underground temperature (five inputs).
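A piecewise-linear MARS model is thus just a weighted sum of hinge terms (33). The sketch below evaluates the first five basis functions of Table 7 and the corresponding part of model (34); it is deliberately truncated for brevity (the full model uses all 19 BFs), with x1...x5 denoting the CO, CO2, H2, CH4, and O2 concentrations in vol. %.

```python
def hinge(u):
    """Positive part max(0, u) from Eq. (33)."""
    return u if u > 0.0 else 0.0

def temperature_first_terms(x1, x2, x3, x4, x5):
    """Intercept plus the BF1-BF5 terms of model (34); basis per Table 7.
    x3 and x5 only enter through later BFs, omitted in this truncated sketch."""
    bf1 = hinge(x4 - 20.579)        # BF1 = max(0, x4 - 20.579)
    bf2 = hinge(20.579 - x4)        # BF2 = max(0, 20.579 - x4)
    bf3 = hinge(x2 - 11.636)        # BF3 = max(0, x2 - 11.636)
    bf4 = hinge(11.636 - x2)        # BF4 = max(0, 11.636 - x2)
    bf5 = hinge(6.3881 - x1)        # BF5 = max(0, 6.3881 - x1)
    return (851.31 + 3036.4 * bf1 + 11.023 * bf2
            + 21.043 * bf3 + 7.8072 * bf4 - 7.1445 * bf5)

print(temperature_first_terms(x1=8.0, x2=15.0, x3=20.0, x4=18.0, x5=3.0))
```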
BF1 = max(0, x1 − 11.732) × max(0, x3 − 16.442)
BF2 = max(0, x1 − 11.732) × max(0, 16.442 − x3)
BF3 = max(0, x2 − 9.83)
BF4 = max(0, 9.83 − x2)
BF5 = BF4 × max(0, x3 − 48.414)
BF6 = BF4 × max(0, 48.414 − x3)
BF7 = max(0, x1 − 11.732) × max(0, x2 − 5.696)
BF8 = max(0, x1 − 11.732) × max(0, 5.696 − x2)
BF9 = max(0, 23.274 − x1)
BF10 = max(0, 11.732 − x1) × max(0, x2 − 0.478)
BF11 = max(0, 11.732 − x1) × max(0, 0.478 − x2)
BF12 = max(0, x1 − 23.274) × max(0, x2 − 4.532)
BF13 = max(0, x1 − 23.274) × max(0, 4.532 − x2)
BF14 = BF11 × max(0, x3 − 0.272)
BF15 = BF11 × max(0, 0.272 − x3)
BF16 = BF5 × max(0, x1 − 39.004)
BF17 = BF5 × max(0, 39.004 − x1)

Table 8. Basis functions of the piecewise-linear type of MARS model of the syngas calorific value (three inputs).

Figure 12. Measured and predicted calorific value of syngas by the piecewise-cubic type of MARS model, where 10 % of the experiment and three inputs were used for the test.

Figure 13. Measured and predicted underground temperature by the piecewise-cubic type of MARS model, where 10 % of the experiment and five inputs were used for the test.

Predictors are less sensitive to the scale on which they are measured when the standardization is used. Similarly, a table for the test of the model on untrained data was prepared. As kernel functions, the Linear, Gaussian, and Polynomial kernels were used (see Table 1). The final values of α were stored in the memory of the computer. Table 9 shows the results of applying the various types of kernel function in the SVR, where 10 % of the UCG experiment was used for the test.

| Target | Kernel | Inputs | r_yY tr | r²_yY tr | RRMSE % tr | PI tr | MAPE % tr | MSE tr | RMSE tr | Time (s) | r_yY te | r²_yY te | RRMSE % te | PI te | MAPE % te | MSE te | RMSE te |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | Linear | CO, CO2 | 0.2794 | 0.0781 | 8.2694 | 6.4635 | 6.8399 | 6917.6854 | 83.1726 | 0.4813 | 0.3650 | 0.1332 | 8.4538 | 6.1933 | 7.2103 | 6578.6165 | 81.1087 |
| Temperature | Linear | CO, CO2, H2, CH4, O2 | 0.2945 | 0.0867 | 7.8480 | 6.0625 | 6.4595 | 6230.6605 | 78.9345 | 0.4821 | 0.4754 | 0.2260 | 8.4061 | 5.6975 | 6.8881 | 6504.5925 | 80.6511 |
| Temperature | Gaussian | CO, CO2 | 0.5888 | 0.3466 | 6.6362 | 4.1770 | 4.8036 | 4455.0438 | 66.7461 | 0.6605 | 0.2053 | 0.0421 | 12.4967 | 10.3683 | 10.4411 | 14375.3449 | 119.8972 |
| Temperature | Gaussian | CO, CO2, H2, CH4, O2 | 0.9096 | 0.8274 | 3.4414 | 1.8022 | 1.9718 | 1198.0793 | 34.6133 | 0.6688 | 0.6913 | 0.4779 | 7.9051 | 4.6740 | 6.9661 | 5752.3317 | 75.8441 |
| Temperature | Polynomial | CO, CO2 | 0.2518 | 0.0634 | 8.0087 | 6.3977 | 6.2560 | 6488.3474 | 80.5503 | 2.1646 | 0.1704 | 0.0290 | 12.7474 | 10.8914 | 10.2197 | 14957.7685 | 122.3020 |
| Temperature | Polynomial | CO, CO2, H2, CH4, O2 | 0.5531 | 0.3059 | 6.9284 | 4.4609 | 4.8213 | 4855.9229 | 69.6845 | 3.4230 | 0.0772 | 0.0060 | 16.7335 | 15.5342 | 14.1215 | 25774.9745 | 160.5459 |
| Calorific value | Linear | Air, O2 | 0.6927 | 0.4798 | 24.5832 | 14.5231 | 31.2777 | 5.2323 | 2.2874 | 1.5294 | 0.3222 | 0.1038 | 19.2421 | 14.5531 | 20.0508 | 4.9617 | 2.2275 |
| Calorific value | Linear | Air, O2, Outlet pressure | 0.7061 | 0.4986 | 24.2518 | 14.2145 | 31.6478 | 5.0922 | 2.2566 | 1.5539 | 0.3850 | 0.1482 | 16.1073 | 11.6297 | 16.1572 | 3.4767 | 1.8646 |
| Calorific value | Gaussian | Air, O2 | 0.7868 | 0.6190 | 21.0848 | 11.8006 | 22.3990 | 3.8491 | 1.9619 | 0.5178 | 0.4171 | 0.1740 | 18.9023 | 13.3387 | 19.4816 | 4.7880 | 2.1881 |
| Calorific value | Gaussian | Air, O2, Outlet pressure | 0.9130 | 0.8335 | 13.7896 | 7.2085 | 9.9955 | 1.6463 | 1.2831 | 0.5446 | 0.5997 | 0.3596 | 13.1471 | 8.2184 | 11.3808 | 2.3162 | 1.5219 |
| Calorific value | Polynomial | Air, O2 | 0.7042 | 0.4959 | 25.9054 | 15.2012 | 32.7633 | 5.8103 | 2.4105 | 109.8509 | 0.3785 | 0.1433 | 20.8256 | 15.1075 | 23.5170 | 5.8119 | 2.4108 |
| Calorific value | Polynomial | Air, O2, Outlet pressure | 0.6500 | 0.4225 | 35.7762 | 21.6826 | 44.4540 | 11.0817 | 3.3289 | 121.1258 | 0.2066 | 0.0427 | 28.6155 | 23.7158 | 32.3065 | 7.9859 | 3.3126 |

Table 9. Results of simulations with SVR where 10 % of the experiment was used to test (tr = training, te = testing).

The table shows that the best results were obtained with the Gaussian kernel function, regarding all input variables. This result was obtained for the temperature prediction as well as for the syngas calorific value prediction. It can be seen that the temperature model with five input variables gives the best tightness with the real measured data (r²_yY = 0.82 for training and r²_yY = 0.47 for the test). Also, the performance index calculated for the training and testing reached the lowest value in this case (PI = 1.80 for training and PI = 4.67 for the test). The best performance of the prediction is also indicated by the other statistical parameters. Similarly, the best result for the SVR model of the syngas calorific value was obtained when three inputs and the Gaussian kernel were used. The calorific value model with three inputs gives the best tightness with the real measured data (r²_yY = 0.83 for training and r²_yY = 0.35 for the test). Also, the performance index calculated during the training and testing reached the lowest value in this case (PI = 7.20 for training and PI = 8.21 for the test). The best quality of the prediction is also indicated by the other statistical parameters. The worst results for the prediction on untrained data were obtained using the polynomial kernel, both for the temperature model and for the calorific value model. Figure 14 and Figure 15 show the best prediction of the calorific value and the underground temperature by the SVR on untrained data, where 10 % of the experiment was used for the test. The black vertical line in the figures divides the prediction into the training and testing phases. Even with this method, the results were better for the temperature prediction.

Figure 14. Measured and predicted calorific value of syngas by SVR with the Gaussian kernel function, where 10 % of the experiment and three inputs were used for the test.

Figure 15. Measured and predicted underground temperature by SVR with the Gaussian kernel function, where 10 % of the experiment and five inputs were used for the test.
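A minimal sketch of the ε-SVR setup described above follows: standardized predictors and a Gaussian (RBF) kernel, here using scikit-learn, whose SVR solves the same convex problem with an SMO-type algorithm. The synthetic data and the hyperparameters C and ε are assumptions; the models in this paper were tuned on the real UCG measurements.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Stand-ins for air flow, O2 flow, and outlet pressure vs. calorific value.
rng = np.random.default_rng(1)
X = rng.uniform(0, 30, size=(1000, 3))
y = 5.0 + 0.25 * X[:, 1] - 0.05 * X[:, 2] + rng.normal(0, 0.4, size=1000)

split = int(0.9 * len(X))                     # last 10 % kept for testing
model = make_pipeline(StandardScaler(),       # standardize the predictors
                      SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
print("test RMSE:", round(rmse, 3))
```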
4.4. Overall Results
A ranking of the results from the three evaluated methods is presented in Table 10 and Table 11. These tables show the comparison of the best results when two variants of observations (i.e., input variables) of each predicted target were used. In the training phase, the model was verified on the training data in order to predict the target variable. The results from the training phase of each method are shown in Table 10. It can be seen that the SVR model with the Gaussian kernel fits the measured target data best in the case of modelling the temperature with five observations. The SVR model also achieved a better performance when using two input variables. The other interesting results were obtained with the piecewise-linear type of the MARS model, both in the case of two and of five observations. However, the MARS models consumed more time than the BPNN and the SVR.

When fitting the calorific value, the SVR with the Gaussian kernel also reached the best performance; this was the case with three input variables. In the case of model fitting with two observations, slightly better results for training were achieved when the piecewise-linear type of the MARS model was used; the MARS model also had the highest time consumption during the training in this case. Fitting the model of the calorific value reached, on average, a worse performance than in the case of the temperature. The higher performance index in the training phase is due to the higher variability of the inputs and the low correlation between the inputs and the target. The results of fitting with the BPNN take the third place because of the worst performance index (PI), both in the case of the calorific value and of the underground temperature. In general, to improve the performance index in the training phase, it is suggested to use more input variables.
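For reference, the statistical indicators reported in Tables 3–11 can be computed as in the sketch below. The definitions are assumed from their usage in this paper: RRMSE as the RMSE relative to the mean of the measured target, and the performance index PI = RRMSE/(1 + r) of Gandomi and Roke [48]; e.g., RRMSE = 3.4414 % and r = 0.9096 reproduce the training PI = 1.8022 of the SVR temperature model.

```python
import numpy as np

def indicators(y, y_hat):
    """Indicators used in the result tables (assumed definitions, see text)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mse = float(np.mean((y - y_hat) ** 2))
    rmse = mse ** 0.5
    rrmse = 100.0 * rmse / float(np.mean(y))      # relative RMSE in %
    r = float(np.corrcoef(y, y_hat)[0, 1])        # correlation r_yY
    return {"r": r, "r2": r * r, "RRMSE": rrmse,
            "PI": rrmse / (1.0 + r),              # Gandomi-Roke index
            "MAPE": 100.0 * float(np.mean(np.abs((y - y_hat) / y))),
            "MSE": mse, "RMSE": rmse}
```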
| Predicted variable | Method | Observations | r_yY | r²_yY | RRMSE (%) | PI | MAPE (%) | MSE | RMSE | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | CO, CO2 | 0.2043 | 0.0417 | 8.0649 | 6.6967 | 6.6452 | 6579.6878 | 81.1153 | 6.8560 |
| Temperature | BPNN, Layers: 1, Neurons (L1): 11 | CO, CO2, H2, CH4, O2 | 0.4210 | 0.1772 | 7.4552 | 5.2466 | 6.1409 | 5622.4662 | 74.9831 | 0.7461 |
| Temperature | MARS, piecewise-linear, 16 BFs | CO, CO2 | 0.4985 | 0.2485 | 7.1080 | 4.7434 | 5.6262 | 5110.9554 | 71.4909 | 12.1525 |
| Temperature | MARS, piecewise-linear, 20 BFs | CO, CO2, H2, CH4, O2 | 0.6666 | 0.4444 | 6.1119 | 3.6673 | 4.7285 | 3778.9200 | 61.4729 | 17.8482 |
| Temperature | SVR, Gaussian kernel | CO, CO2 | 0.5888 | 0.3466 | 6.6362 | 4.1770 | 4.8036 | 4455.0438 | 66.7461 | 0.6605 |
| Temperature | SVR, Gaussian kernel | CO, CO2, H2, CH4, O2 | 0.9096 | 0.8274 | 3.4414 | 1.8022 | 1.9718 | 1198.0793 | 34.6133 | 0.6688 |
| Calorific value | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | Air, O2 | 0.7383 | 0.5451 | 22.6638 | 13.0378 | 28.3859 | 4.4472 | 2.1088 | 7.1023 |
| Calorific value | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | Air, O2, Outlet pressure | 0.7392 | 0.5464 | 22.6569 | 13.0271 | 28.0108 | 4.4444 | 2.1082 | 7.5802 |
| Calorific value | MARS, piecewise-linear, 15 BFs | Air, O2 | 0.7991 | 0.6386 | 20.1965 | 11.2256 | 23.0084 | 3.5316 | 1.8792 | 9.1635 |
| Calorific value | MARS, piecewise-linear, 17 BFs | Air, O2, Outlet pressure | 0.8730 | 0.7621 | 16.3866 | 8.7489 | 16.5883 | 2.3249 | 1.5247 | 13.4410 |
| Calorific value | SVR, Gaussian kernel | Air, O2 | 0.7868 | 0.6190 | 21.0848 | 11.8006 | 22.3990 | 3.8491 | 1.9619 | 0.5178 |
| Calorific value | SVR, Gaussian kernel | Air, O2, Outlet pressure | 0.9130 | 0.8335 | 13.7896 | 7.2085 | 9.9955 | 1.6463 | 1.2831 | 0.5446 |

Table 10. Overall results from the training phase.

| Predicted variable | Method | Observations | r_yY | r²_yY | RRMSE (%) | PI | MAPE (%) | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Temperature | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | CO, CO2 | 0.6755 | 0.4563 | 8.5075 | 5.0775 | 7.4526 | 6662.3759 | 81.6234 |
| Temperature | BPNN, Layers: 1, Neurons (L1): 11 | CO, CO2, H2, CH4, O2 | 0.6787 | 0.4606 | 7.4301 | 4.4261 | 5.8129 | 5081.7732 | 71.2866 |
| Temperature | MARS, piecewise-linear, 16 BFs | CO, CO2 | 0.3282 | 0.1077 | 10.6824 | 8.0427 | 9.1629 | 10504.3010 | 102.4905 |
| Temperature | MARS, piecewise-cubic, 20 BFs | CO, CO2, H2, CH4, O2 | 0.6031 | 0.3637 | 6.4275 | 4.0094 | 5.1623 | 3802.8747 | 61.6675 |
| Temperature | SVR, Linear kernel | CO, CO2 | 0.3650 | 0.1332 | 8.4538 | 6.1933 | 7.2103 | 6578.6165 | 81.1087 |
| Temperature | SVR, Gaussian kernel | CO, CO2, H2, CH4, O2 | 0.6913 | 0.4779 | 7.9051 | 4.6740 | 6.9661 | 5752.3317 | 75.8441 |
| Calorific value | BPNN, Layers: 1, Neurons (L1): 5 | Air, O2 | 0.0401 | 0.0016 | 19.8082 | 19.0445 | 22.9766 | 5.2579 | 2.2930 |
| Calorific value | BPNN, Layers: 1, Neurons (L1): 7 | Air, O2, Outlet pressure | 0.7187 | 0.5166 | 15.8219 | 9.2055 | 16.8804 | 3.3546 | 1.8316 |
| Calorific value | MARS, piecewise-linear, 15 BFs | Air, O2 | 0.0419 | 0.0018 | 80.7365 | 77.4897 | 30.3193 | 87.3500 | 9.3461 |
| Calorific value | MARS, piecewise-cubic, 18 BFs | Air, O2, Outlet pressure | 0.4386 | 0.1924 | 17.4620 | 12.1382 | 17.8493 | 4.0861 | 2.0214 |
| Calorific value | SVR, Gaussian kernel | Air, O2 | 0.4171 | 0.1740 | 18.9023 | 13.3387 | 19.4816 | 4.7880 | 2.1881 |
| Calorific value | SVR, Gaussian kernel | Air, O2, Outlet pressure | 0.5997 | 0.3596 | 13.1471 | 8.2184 | 11.3808 | 2.3162 | 1.5219 |

Table 11. Overall results from the testing phase.
The results from the test phase are shown in Table 11. These results are not as consistent as in the training phase, especially for the temperature prediction, where the best results are scattered over the individual methods. In the testing phase, the model was verified on untrained data in order to predict the target variable. The time consumption was not evaluated in the testing phase.

For the temperature prediction with five input variables, the best result in terms of the performance index was obtained with the piecewise-cubic type of the MARS model with 20 BFs. The worst result in terms of the performance index was obtained by the SVR with the Gaussian kernel and five input variables. When the SVR with five inputs was used, the predicted temperature value correlated the best with the measured one. With two input variables, the best results were obtained by the BPNN and the worst by the piecewise-linear type of the MARS model.

For the calorific value prediction with three input variables, the results with the lowest performance index were obtained with the SVR and the Gaussian kernel. The results obtained with the BPNN take the second place; when the BPNN with three inputs was used, the predicted calorific value correlated the best with the measured one. The worst results in terms of the performance index were obtained by the piecewise-cubic type of the MARS model with 18 BFs and three inputs. With two input variables, the best results were obtained by the SVR with the Gaussian kernel and the worst by the piecewise-linear type of the MARS model. The prediction of the calorific value reached, on average, a higher performance index than the prediction of the underground temperature.

5. Summary and Conclusions
In this paper, three approaches were examined in order to find the best prediction method for the UCG data soft-sensing. A comparison of methods suitable for predicting the UCG data has not been published yet.
In the UCG process, it is complicated to predict some process variables, because it is not possible to see the state of a process that runs in an inaccessible environment. The goal was to find a valid data-driven learning method that allows estimating the underground temperature or the syngas calorific value from other measurable process variables. Predicting these variables will make it possible to control the UCG process more efficiently. In this paper, only a small number of measurable input variables from one UCG experiment were used to obtain a comparison of the learning methods. All applied methods considered only one output variable. The resulting MARS model can be stored in a PC and is even portable as an analytic equation, and the impact of each predictor can clearly be seen (i.e., the model is easier for humans to understand). In MARS, the prediction is based on a simple and quick calculation of the MARS model formula. In SVR, each variable is multiplied by the corresponding element of each support vector, which can be a slow process if there are many variables and a large number of support vectors. The individual SVR models have been obtained by using the ε-SVR. Applying the kernel trick in the SVR allows modelling expert knowledge of the UCG process. The SVR is defined as a convex optimization problem with no local minima, and therefore, effective optimization methods, such as SMO, can be used. In the case of NNs, the training is often complicated because there is always a risk of getting stuck in a local minimum of the error function. The learning of NNs is also highly complicated by the search for a high number of weights in a multidimensional space. In MARS and SVR, it is necessary to package program code that provides the prediction with the optimized weights or support vectors. It can be said that all three methods achieved satisfactory results in terms of the underground temperature and syngas calorific value prediction. Regarding the training, the SVR with the Gaussian kernel was the winner; this model matched the measured data best, both in the case of the temperature and of the calorific value. Regarding the prediction, the best result was obtained by the piecewise-cubic type of the MARS model. In these cases, the better results were achieved when all considered input variables of the target variable were used. The results show that a higher number of input variables increases the predictive performance. The obtained results can be applied in the model predictive control of the UCG process.

Acknowledgements
This work was supported by the EC Research Programme of the Research Fund for Coal and Steel (Grant No. RFCR-CT-2013-00002), by the Slovak Grant Agency for Science under grant VEGA 1/0273/17, and by the Slovak Research and Development Agency under the contract No. APVV-14-0892.

References
[1] G. Ökten, V. Didari. Underground gasification of coal. In: Kural, O. (ed.), Coal. Istanbul Technical University, Istanbul, Turkey, pp. 371–378, 1994.
[2] J. Kačur, M. Durdán, M. Laciak, P. Flegner. Impact analysis of the oxidant in the process of underground coal gasification. Measurement 51:147–155, 2014. doi:10.1016/j.measurement.2014.01.036.
[3] M. Sury, M. White, J. Kirton, et al. Review of Environmental Issues of Underground Coal Gasification, Technical Report COAL R272, DTI/Pub URN 04/1880. WS Atkins Consultants Ltd,
Department of Trade and Industry, 2010.
[4] J. Kačur, M. Durdán, G. Bogdanovská. Monitoring and measurement of the process variable in UCG. In SGEM 2016: 16th International Multidisciplinary Scientific GeoConference, Sofia, Bulgaria: STEF92 Technology, pp. 295–302, 2016. doi:10.5593/SGEM2016/B21/S07.038.
[5] M. Durdán, K. Kostúr. Modeling of temperatures by using the algorithm of queue burning movement in the UCG process. Acta Montanistica Slovaca 20(3):181–191, 2015.
[6] K. Kostúr. Mathematical modeling temperature's fields in overburden during underground coal gasification. In ICCC 2014: Proceedings of the 15th International Carpathian Control Conference (ICCC), Velke Karlovice, May 28–30, pp. 248–253. Danvers: IEEE, 2014. doi:10.1109/CarpathianCC.2014.6843606.
[7] M. Koenen, F. Bergen, P. David. Isotope measurements as a proxy for optimising future hydrogen production in underground coal gasification, News in Depth, 2015.
[8] M. Benková, M. Durdán. Statistical analyzes of the underground coal gasification process realized in the laboratory conditions. In SGEM 2016: 16th International Multidisciplinary Scientific GeoConference, Sofia, Bulgaria: STEF92 Technology, pp. 405–412, 2016. doi:10.5593/SGEM2016/B21/S07.052.
[9] L. Fortuna, S. Graziani, A. Rizzo, M. G. Xibilia. Soft Sensors for Monitoring and Control of Industrial Processes. Springer London, 2007. doi:10.1007/978-1-84628-480-9.
[10] T. Ji, H. Shi. Soft sensor modeling for temperature measurement of Texaco gasifier based on an improved RBF neural network. In 2006 IEEE International Conference on Information Acquisition, pp. 1147–1151. IEEE, 2006. doi:10.1109/icia.2006.305907.
[11] A. A. Uppal, A. I. Bhatti, E. Aamir, et al. Control oriented modeling and optimization of one dimensional packed bed model of underground coal gasification. Journal of Process Control 24:269–277, 2014. doi:10.1016/j.jprocont.2013.12.001.
[12] A. A. Uppal, A. I. Bhatti, E. Aamir, et al. Optimization and control of one dimensional packed bed model of underground coal gasification. Journal of Process Control 35:11–20, 2015. doi:10.1016/j.jprocont.2015.08.002.
[13] A. A. Uppal, Y. M. Alsmadi, V. I. Utkin, et al. Sliding mode control of underground coal gasification energy conversion process. IEEE Transactions on Control Systems Technology 26(2):587–598, 2018. doi:10.1109/tcst.2017.2692718.
[14] Q. Wei, D. Liu. Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification. IEEE Transactions on Automation Science and Engineering 11(4):1020–1036, 2014. doi:10.1109/TASE.2013.2284545.
[15] R. Guo, G. X. Cheng, Y. Wang. Texaco coal gasification quality prediction by neural estimator based on dynamic PCA. In Proceedings of the 2006 IEEE International Conference on Mechatronics and Automation, pp. 1298–1302, 2006. doi:10.1109/ICMA.2006.257660.
[16] B. Guo, Y. Shen, F. Zhao. Modelling coal gasification with a hybrid neural network. Fuel 76(12):1159–1164, 1997. doi:10.1016/s0016-2361(97)00122-1.
[17] S. Liu, Z. Hou, C. Yin. Data-driven modeling for fixed-bed intermittent gasification processes by enhanced lazy learning incorporated with relevance vector machine. In 11th IEEE International Conference on Control & Automation (ICCA), pp. 1019–1024. IEEE, 2014. doi:10.1109/icca.2014.6871060.
[18] M. Laciak, J. Kačur, K. Kostúr. The verification of thermodynamic model for UCG process. In ICCC 2016: 17th International Carpathian Control Conference, pp. 424–428, 2016.
doi:10.1109/CarpathianCC.2016.7501135.
[19] M. Laciak, D. Ráškayová. The using of thermodynamic model for the optimal setting of input parameters in the UCG process. In ICCC 2016: 17th International Carpathian Control Conference, pp. 418–423, 2016. doi:10.1109/CarpathianCC.2016.7501134.
[20] A. M. Winslow. Numerical model of coal gasification in a packed bed. Symposium (International) on Combustion 16(1):503–513, 1977. doi:10.1016/s0082-0784(77)80347-0.
[21] P. Ji, X. Gao, D. Huang, Y. Yang. Prediction of syngas compositions in shell coal gasification process via dynamic soft-sensing method. In Proceedings of the 10th IEEE International Conference on Control and Automation (ICCA), pp. 244–249, 2013. doi:10.1109/ICCA.2013.6565140.
[22] I. H. AL-Qinani. Multivariate adaptive regression splines (MARS) heuristic model: Application of heavy metal prediction. International Journal of Modern Trends in Engineering & Research 3(8):223–229, 2016. doi:10.21884/ijmter.2016.3027.7nuqv.
[23] A. Aryafar, R. Gholami, R. Rooki, F. D. Ardejani. Heavy metal pollution assessment using support vector machine in the Shur river, Sarcheshmeh copper mine, Iran. Environmental Earth Sciences 67(4):1191–1199, 2012. doi:10.1007/s12665-012-1565-7.
[24] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representation by error propagation. In: D. E. Rumelhart, J. L. McClelland, and PDP Research Group, Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Vol 1: Foundation, 1987.
[25] G. Sampson, D. E. Rumelhart, J. L. McClelland, T. P. R. Group. Parallel distributed processing: Explorations in the microstructures of cognition. Language 63(4):871, 1987. doi:10.2307/415721.
[26] T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning – Data Mining, Inference, and Prediction, Second Edition. Springer New York, 2009. doi:10.1007/b94608.
[27] V. Kvasnička, Ľ. Beňušková, J. Pospíchal, et al. Úvod do teórie neurónových sietí. IRIS, Bratislava, 1997.
[28] J. Sedláček. Úvod do teorie grafů. Academia, Praha, 1981.
[29] J. H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics 19(1):1–67, 1991. doi:10.1214/aos/1176347963.
[30] P. Sephton. Forecasting recessions: can we do better on MARS?, 2001.
[31] M. Chugh, S. S. Thumsi, V. Keshri. A comparative study between least square support vector machine (LSSVM) and multivariate adaptive regression spline (MARS) methods for the measurement of load storing capacity of driven piles in cohesionless soil.
International Journal of Structural and Civil Engineering Research, 2015. doi:10.18178/ijscer.4.2.189-194.
[32] V. R. Tselykh. Multivariate adaptive regression splines. Machine Learning and Data Analysis 1(3):272–278, 2012. doi:10.21469/22233792.
[33] P. Samui, D. P. Kothari. A multivariate adaptive regression spline approach for prediction of maximum shear modulus and minimum damping ratio. Engineering Journal 16(5):69–78, 2012. doi:10.4186/ej.2012.16.5.69.
[34] W. Zhang, A. T. C. Goh. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geoscience Frontiers 7(1):45–52, 2016. doi:10.1016/j.gsf.2014.10.003.
[35] A. Abraham, D. Steinberg. MARS: Still an alien planet in soft computing? In International Conference on Computational Science – ICCS (Proceedings), Part II, vol. 2, pp. 235–244. Springer Berlin Heidelberg, 2001. doi:10.1007/3-540-45718-6_27.
[36] B. E. Boser, I. M. Guyon, V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory – COLT'92, Pittsburgh, PA, pp. 144–152. ACM Press, 1992. doi:10.1145/130385.130401.
[37] V. N. Vapnik. Constructing learning algorithms. In The Nature of Statistical Learning Theory, pp. 119–166. Springer Verlag, New York, 1995. doi:10.1007/978-1-4757-2440-0_6.
[38] K. R. Müller, A. J. Smola, G. Rätsch, et al. Predicting time series with support vector machines. In Lecture Notes in Computer Science, pp. 999–1004. Springer Berlin Heidelberg, 1997. doi:10.1007/bfb0020283.
[39] J. Kačur, M. Laciak, M. Durdán, P. Flegner. Utilization of Machine Learning Method in Prediction of UCG Data. In ICCC 2017: 18th International Carpathian Control Conference, pp. 1–6. IEEE, 2017. doi:10.1109/carpathiancc.2017.7970411.
[40] MathWorks. Understanding Support Vector Machine Regression. In: Statistics and Machine Learning Toolbox User's Guide (R2018ab). regression.html, 2018.
[41] J. Kačur, K. Kostúr. Approaches to the Gas Control in UCG. Acta Polytechnica 57(3), 2017. doi:10.14311/ap.2017.57.0182.
[42] M. Laciak, K. Kostúr, M. Durdán, et al. The analysis of the underground coal gasification in experimental equipment. Energy 114:332–343, 2016. doi:10.1016/j.energy.2016.08.004.
[43] R. L. I. Dobbs, W. B. Krantz. Combustion front propagation in underground coal gasification, Final Report, Work Performed under Grant No. DE-FG22-86PC90512. University of Colorado, Boulder, Department of Chemical Engineering, 1990. doi:10.2172/6035494.
[44] K. Stańczyk, A. Smoliński, K. Kapusta, et al. Dynamic experimental simulation of hydrogen oriented underground gasification of lignite. Fuel 89(11):3307–3314, 2010. doi:10.1016/j.fuel.2010.03.004.
[45] K. Kostúr, J. Kačur. Developing of optimal control system for UCG. In Proceedings of the 13th International Carpathian Control Conference (ICCC), pp. 347–352. IEEE, 2012. doi:10.1109/carpathiancc.2012.6228666.
[46] K. Kostúr, J. Kačur. Development of control and monitoring system of UCG by Promotic. In 2011 12th International Carpathian Control Conference (ICCC), pp. 215–219. IEEE, 2011. doi:10.1109/carpathiancc.2011.5945850.
[47] A. H. Gandomi, D. A. Roke. Intelligent formulation of structural engineering systems. In Seventh MIT Conference on Computational Fluid and Solid Mechanics – Focus: Multiphysics and Multiscale, Cambridge, USA, 2013.
[48] A. H. Gandomi, D. A. Roke. Assessment of artificial neural network and genetic programming as predictive tools.
Advances in Engineering Software 88:63–72, 2015. doi:10.1016/j.advengsoft.2015.05.007.