Acta Polytechnica 59(4):322–351, 2019, DOI: 10.14311/AP.2019.59.0322
© Czech Technical University in Prague, 2019, available online at https://ojs.cvut.cz/ojs/index.php/ap

A COMPARATIVE STUDY OF DATA-DRIVEN MODELING METHODS FOR SOFT-SENSING IN UNDERGROUND COAL GASIFICATION

Ján Kačur∗, Milan Durdán, Marek Laciak, Patrik Flegner
Technical University of Košice, Faculty BERG, Institute of Control and Informatization of Production Processes, Němcovej 3, 040 01 Košice, Slovak Republic
∗ corresponding author: jan.kacur@tuke.sk

Abstract. Underground coal gasification (UCG) is a technological process that converts solid coal into gas in the underground, using injected gasification agents. In the UCG process, many process variables can be measured with common measuring devices, but there are variables that cannot be measured so easily, e.g., the temperature deep underground. It is also necessary to know the future impact of different control variables on the syngas calorific value in order to support a predictive control. This paper examines the possibility of utilizing Neural Networks, Multivariate Adaptive Regression Splines, and Support Vector Regression in order to estimate the UCG process data, i.e., the syngas calorific value and the underground temperature. It was found that, during the training with the UCG data, the SVR with a Gaussian kernel achieved the best results, but, during the prediction, the best result was obtained by the piecewise-cubic type of the MARS model. The analysis was performed on data obtained during an experimental UCG with an ex-situ reactor.

Keywords: Underground coal gasification, syngas calorific value, underground temperature, time series prediction, machine learning, soft-sensing.

1. Introduction
1.1. Understanding UCG Technology
Underground coal gasification (UCG) represents an in-situ controlled combustion of coal in which valuable gases (i.e., syngas) are produced. The UCG represents an alternative to traditional coal mining methods. The UCG makes it possible to exploit coal from deep coal seams, seams affected by tectonic disturbances, seams of a low grade, or seams that have a thin stratum profile. Various coal types can be gasified, e.g., lignite or bituminous coal. The UCG offers a low surface damage, a low solid waste discharge, and lower emissions of SO2 and NOx to the air than the traditional coal mining. For an industrial gasification, at least two boreholes should be drilled (i.e., inlet and outlet). The inlet borehole serves as a supply well for the gasification agents (i.e., air, oxygen, and steam), and the outlet borehole as the exhaust of the produced syngas. The inlet and outlet boreholes are usually linked by various methods in order to create a gasification channel [1].

The main chemical processes that occur during the UCG are drying, pyrolysis, combustion, and gasification of solid hydrocarbons. For an improvement of the UCG, it must be ensured that the combustion reactions produce sufficient energy for the heating of the reactants. It is also necessary to overcome the heat losses from the georeactor and to support the rate of the endothermic gasification reactions [2]. The UCG is performed as an autothermic process where the heat in the coalbed is generated by an injection of oxygen from the injection well and by means of combustion reactions with carbon. The UCG essentially represents the acquisition of a spatially and thermally decomposed reaction zone in the coalbed, which overlaps the regions of coal oxidation, coal reduction, and coal pyrolysis.
The incoming air causes the coal to burn; this exothermic process releases heat and consumes oxygen. When the coal is heated and CO is produced, the Boudouard reaction (i.e., CO2 + C ⇒ 2CO) is one of the most important chemical reactions. The raw gas from the UCG consists predominantly of H2, CO, CO2, CH4, higher hydrocarbons, tar, impurities, and small quantities of SOx, NOx, and H2S [3]. In terms of the calorific value, gases such as CO, H2, and CH4 are valuable, but the higher hydrocarbons also contribute to the calorific value. The syngas can be used for generating electricity or to produce synthetic natural gas or various chemical products.

1.2. Measurement and Monitoring in UCG
The efficiency of the coal-to-gas transformation depends on the UCG monitoring and control and on the various coal seam parameters. The main reasons for the UCG monitoring are operating the technology more efficiently, increasing the quality of the produced gas, reducing costs, and meeting regulatory requirements. Monitoring also informs about the effects of control decisions, injection rates, syngas composition, temperatures, pressures, cavity size, fractures, and when to stop the gasification.

In the UCG, various process variables can be monitored. These variables can be used for the data-driven modeling of the process behaviour.

Figure 1. Scheme of measurement and control in UCG.

In terms of the process control, it is necessary to monitor the volume flows and pressures (i.e., overpressure) of the injected oxidizers (i.e., air, oxygen, water vapor). On the outlet, the volume flow of the produced syngas and the regulated underpressure can be monitored. Of course, it is necessary to monitor the concentrations of the syngas components that affect the calorific value (e.g., CO, CO2, CH4, and H2). The volume flow, pressure, and composition of the injected gasification agents can substantially affect the composition of the produced syngas [2].

The measurement of the temperature inside the oxidizing and reducing zones is the most problematic. There are two methods of an indirect measurement of the underground temperature: from the current syngas composition [4] or based on the rules of heat transfer [5, 6]. Recently, methods of monitoring the underground temperature by measuring carbon isotopes and by measuring emissions of radon to the surface have appeared [7]. Figure 1 presents the basic scheme of the process variables measurement.

1.3. Modeling and Prediction in UCG
In the last years, an increased demand for an online and accurate measurement of some process variables that cannot be measured by conventional methods has occurred. This concerns the measurement of process variables in an aggressive (e.g., high-temperature) or physically inaccessible (e.g., underground) environment. Similarly, in the UCG, modelling and prediction methods need to be applied in order to determine the process parameters deep underground. These variables are decisive in increasing the efficiency and quality of the production. For this reason, different predictors and models, which can calculate the desired process variables based on other observations, are developed and applied. These models often serve as support systems for the control of the technological process.
Predictive modeling usually uses statistics to predict the future behaviour of the process and is often associated with machine learning. The most popular are the methods of a regression analysis, where the output is a regression model. Almost every regression model can serve for the prediction. Some early regression analyses of the UCG were performed in [8].

The time series prediction is a challenging research area with broad application prospects. Soft-sensing methods for the data estimation and prediction are widely used in the industry. Various approaches to modelling and data prediction have been explored worldwide. Unfortunately, there is only scarce evidence of UCG models oriented towards process control and soft-sensing.

Soft sensors based on data-driven predictive modelling are very useful in the industry, especially in operations where important process variables cannot be measured directly by a conventional hardware. Soft sensors use various models that enable a real-time estimation of process variables without a hardware sensor. They can provide less expensive and quicker process data than slow and costly hardware devices. However, the soft sensors can also run in parallel with the hardware measurement devices [9].

Well-known software algorithms that can be seen as soft sensors include Kalman filters. More recent implementations of soft sensors in the UCG use Neural Networks (NN) or Fuzzy Computing. Unfortunately, there is only scarce evidence of using machine learning methods for a prediction of the underground temperature, syngas calorific value, or syngas composition in the UCG.

For example, Ji and Shi [10] have used a hybrid radial basis function (RBF) NN as a learning scheme for the temperature prediction of a Texaco gasifier. In order to increase the performance of the NN, the number of hidden neurons was determined by a fuzzy C-means algorithm and a particle swarm optimization algorithm. Recently, Uppal et al. [11–13] have proposed a control-oriented, one-dimensional packed-bed model of the UCG for an estimation of the syngas composition. This model is coupled with a sliding mode controller to maintain the desired syngas calorific value. Learning schemes for the coal gasification to support the process control can also be found in [14]. A Multiple Neural Network (MNN) for the syngas composition prediction, combined with a dynamic principal component analysis, was proposed in [15]. Other researchers, e.g., Guo et al. [16], have modeled the coal gasification with a hybrid NN. A model of a coal gasification was developed, combining a first-principles model with an NN parameter estimator. The hybrid NN was trained with experimental data for two coals and gave a good performance in the process modeling.

Other effective methods have also been applied to the gasification. Liu et al. [17] have proposed a data-driven modeling for fixed-bed intermittent gasification processes inside UGI gasifiers, using an enhanced lazy learning combined with a relevance vector machine. The authors have used the Bayesian learning framework for the modeling of the gasifier's temperature. The effectiveness of the enhanced lazy learning approach combined with the relevance vector machine for the modelling of the UGI gasification processes has been verified by a series of experiments based on the data collected from practical fields.
Similarly, for the same problem of the data-driven modeling of the UGI gasification process, a variable structure of a genetic BP NN was used in [17]. UGI refers to a gasification process named after the UGI Company. The UGI gasifier is an atmospheric fixed-bed, solid-state slag coal gasification equipment. The prediction of the syngas composition based on a thermodynamic model can be found in [18, 19]. In the past, the application of a one-dimensional, time-dependent numerical computational model of the UCG in a packed bed has also been investigated, with a verification on laboratory measurements [20]. The model, based on nonlinear partial differential equations, was capable of estimating the syngas composition and temperature distribution. A novel dynamic soft-sensing method based on an impulse response template for the Shell coal gasification process was developed in [21]. The proposed model can predict the syngas composition during the coal gasification.

The application and comparison of the efficiency of various learning methods, i.e., NNs, Multivariate Adaptive Regression Splines (MARS), or Support Vector Machines (SVMs), in the UCG data prediction has not yet been the subject of an extensive study, but similar applications in steel-making processes and biomass gasification have been reported (e.g., [22, 23]).

The purpose of the UCG monitoring is to provide a better understanding of how the syngas is produced. For this reason, it is necessary to know what temperature is reached in the underground oxidizing zone. This work examines potential learning methods that can be implemented in the proposal of a soft sensor for the data prediction in the UCG. An underground geo-reactor differs from other industrial plants because the coal seam was created by nature, and it is not possible to see what is in the underground while the UCG is in progress. For the UCG, new research and technologies aim to make the measurements of process variables faster and non-destructive, which would allow having a smart, non-intrusive quality sensor at hand. In this paper, a data set from an experimental UCG was used in order to train the data-driven models.

In the data-driven (i.e., black-box) modelling, input and output data are used in order to create a statistical model. In order to find a prediction apparatus for the UCG data prediction, the machine learning approach has been examined. One of the interesting advantages of the machine learning is that a system, randomly initialized and trained on some data sets, will eventually learn good feature representations for a given task.

In the following sections, three learning methods, Back-Propagation NN (BPNN), MARS, and Support Vector Regression (SVR), are examined in order to support soft-sensing in the UCG. The predictive methods are evaluated using statistical approaches and by calculating a performance index. The methods were applied to the experimental data obtained from the experimental trial of the UCG. The results of the three methods are compared to each other to determine which method is the most suitable for the UCG.

2. Analysis of Selected Modeling Methods
2.1. Multilayer Feed-Forward Neural Networks
The inherent non-linear structure of NNs is well suited for solving many real-world problems. In recent years, several models of NNs have been designed and optimized to solve specific problems.
NN models have an excellent ability to learn from experience and are also suitable non-parametric methods that do not require many limiting factors. Multilayer feed-forward neural networks are most commonly used as a universal means for classification and prediction. They consist of sensoric units, so-called input nodes, that form an input layer, one or more hidden layers with computing nodes, and an output layer, also with computing nodes. The signal passes through the network forwards across the individual layers. In a multilayer feed-forward NN, all neurons of the previous layer are linked to each neuron of the following layer. However, there are no interconnections between the neurons at the level of the same layer, nor direct interconnections of the input layer neurons with the neurons that are two or more layers further.

In this paper, the back-propagation algorithm was used for the NN modeling of the UCG. This simple gradient algorithm was proposed by [24, 25]. There are several approaches to explaining the principle of NNs and the back-propagation method, e.g., using the projection pursuit regression (PPR) [26]. In this paper, a graph-oriented approach with an extensive description that can be found in [27] has been used. The input and output scheme considered for the NN for the UCG data prediction is shown in Section 4.1 (see Figure 7).

Formally, the NN is defined as an oriented graph G = (V, E), where V = {v_1, v_2, ..., v_N} is the set of vertices and E = {e_1, e_2, ..., e_M} is the set of edges, i.e., a non-empty vertex set and edge set of the graph G, containing N nodes (neurons) and M connections. The set V of neurons is divided into disjunctive subsets V = V_I ∪ V_H ∪ V_O, where V_I contains N_I input neurons, which are adjacent only to outgoing edges; V_H contains N_H hidden neurons, which are adjacent to outgoing edges as well as to incoming ones; and V_O contains N_O output neurons, which are adjacent only to incoming edges. For an acyclic NN, the neurons can be arranged into layers, where L_1 = V_I is the input layer (i.e., it contains only input neurons), L_2, L_3, ..., L_{t−1} are the hidden layers, and L_t is the output layer. The NN determined by the acyclic graph is usually chosen so that the neurons from two adjacent layers are joined together by all possible connections.

Neurons and connections are rated by real numbers. Each neuron v_i is rated by a threshold ϑ_i and an activity x_i. Similarly, each connection (v_j, v_i) is rated by a weighting coefficient (or simply, a weight) w_ij. The activities of the hidden and output neurons are determined by the following equations [27]:

x_i = t(ξ_i)   (1)

ξ_i = Σ_{j ∈ Γ_i^{−1}} w_ij x_j + ϑ_i   (2)

where the summation runs through the neurons that are the predecessors of the neuron v_i, and the variable ξ_i is called the potential of the neuron v_i. For the oriented graph G, we use the map Γ that assigns to each vertex v ∈ V a subset Γ(v) ⊂ V containing those neurons that are the endpoints of the connections going out from the vertex v [28]. The neurons of the subset Γ(v) are called the descendants of the vertex v in the graph G. The "inverse" map Γ^{−1} assigns to each vertex v ∈ V the subset Γ^{−1}(v) ⊂ V composed of the predecessors of the vertex v in the graph G. The neuronal activities form a vector x = (x_1, x_2, ..., x_N).
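As an illustrative aside (Python, not part of the original paper), equations (1) and (2) translate directly into code; a logistic activation t(ξ) is assumed here, and all names are illustrative:

```python
import numpy as np

def sigmoid(xi):
    """Logistic activation t(xi) = 1 / (1 + exp(-xi)), assumed here."""
    return 1.0 / (1.0 + np.exp(-xi))

def neuron_activity(x_prev, w_i, theta_i):
    """Activity of one neuron v_i per equations (1)-(2):
    potential xi_i = sum_j w_ij * x_j + theta_i, activity x_i = t(xi_i)."""
    xi = np.dot(w_i, x_prev) + theta_i   # potential, equation (2)
    return sigmoid(xi)                   # activity, equation (1)

def forward_pass(x_input, layers):
    """Layer-by-layer evaluation of an acyclic feed-forward NN.
    layers: list of (W, theta) pairs, one per non-input layer."""
    x = x_input
    for W, theta in layers:
        x = sigmoid(W @ x + theta)
    return x
```

The layer-by-layer loop mirrors the recursive evaluation described below: the activities of a layer depend only on the already computed activities of the lower layers.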
This vector can be formally decomposed into three subsets containing the input, hidden, and output activities:

x = x_I ⊕ x_H ⊕ x_O   (3)

The hidden activities are not explicitly mentioned; they only play the role of intermediate results. In general, to calculate the activities of the layer L_i (where i > 1), it is only necessary to know the activities of the lower layers L_1, L_2, ..., L_{i−1}. In this recursive manner, the activities of all neurons can be gradually calculated; the activities of the output neurons are calculated last. For this reason, the NNs represented by an acyclic graph are called "feed-forward NNs".

The adaptation of the NN is based on searching for such threshold and weighting coefficients that, for a given pair of an input vector of activities x_I and a desired output vector of activities x̂_O (i.e., x_I/x̂_O) and the calculated output vector x_O, minimize the difference between the output activities x_O and x̂_O. The vector x̂_O represents the desired measured (i.e., experimental) data. The aim of the adaptation process is to find such thresholds and weighting coefficients that minimize the objective function E. For more pairs of input and output vectors

x_I^(1)/x̂_O^(1), x_I^(2)/x̂_O^(2), ..., x_I^(r)/x̂_O^(r),   (4)

which represent the training set, the objective function has the following form:

E = Σ_{i=1}^{r} E_i = Σ_{i=1}^{r} (1/2)(x_O^(i) − x̂_O^(i))²   (5)

where x_O^(i) is the output vector of the NN as a response to the input vector x_I^(i), and x̂_O^(i) is the desired output vector assigned to the input x_I^(i).

This minimization of the non-linear objective function can be performed by many optimization methods known in numerical mathematics. The most effective are the so-called gradient methods, based on the use of a gradient of the objective function for the iterative construction of an optimal solution. When the objective function contains more than one pair of input-output vectors x_I/x̂_O (see equation (5)), the overall gradient of the objective function is simply determined as the sum of the gradients over all pairs x_I/x̂_O of the training set (4):

grad E = Σ_{i=1}^{r} grad E^(i)   (6)

where the objective function E^(i) is defined for the i-th pair x_I/x̂_O of the training set. Formally, the adapted NN is described by the coefficients determined as

(w, ϑ) = argmin_{(w,ϑ)} E(w, ϑ)   (7)

The settings of the applied BPNN and the evaluation of its performance on the UCG data prediction are discussed in Section 4.1.

2.2. Multivariate Adaptive Regression Splines
Multivariate Adaptive Regression Splines (MARS) is a regression method that was developed by [29]. Many works that discuss the MARS method have been published [26, 30–34]. It is a non-parametric regression technique that can be seen as an extension of linear models. This technique automatically models the non-linearities and interactions between variables. It is also more flexible than linear models, is suitable for processing large data series, and can serve for a quick prediction of time series. MARS is similar to recursive partitioning, where the input data are divided into discontinuous regions of varying size. A local model is then created for each region. The size of each region is set by MARS as required: the regions are smaller when the relationship between the input and output is more complex. MARS, like the recursive partitioning technique, performs an automatic selection of variables, so the model includes the important (useful) variables and excludes the non-essential ones (as opposed to NNs).
The MARS model is adapted based on the input training data, and a cross-validation is used to validate the resulting model. The resulting model may not only be stored in a PC but is also portable, and it is easy to see the impact of each predictor (the model is easier to understand by humans). In order to create the MARS model, the training data vectors, i.e., the inputs (observations) and outputs (targets), are needed. The training data are split into several splines on an equivalent interval basis [29].

The data are, in each spline, split into many subgroups, and several knots are created that can be placed between different input variables or different intervals in the same input variable to separate the subgroups [31]. In MARS, the regression function, called a basis function (BF), is approximated by smoothing splines for a general representation of the data in each subgroup. Between any two knots, the model can characterize the data either globally or by using a linear regression. The BF is unique between any two knots and is shifted to another BF at each knot [29, 35]. The two BFs in two adjacent domains of data intersect at the knot to make the model outputs continuous. MARS creates a curved regression line to fit the data from subgroup to subgroup and from one spline to another. To avoid over-fitting and over-regressing, the shortest distance between two neighboring knots is predetermined to prevent having too few data in a subgroup.

In the MARS method, the goal is to find the dependency of a variable y_i on one or more independent variables x_i. The following regression sample is considered:

D = {x_i, y_i}_{i=1}^{N} = {x_{1i}, ..., x_{ni}, y_i}_{i=1}^{N}   (8)

where x_i ∈ R^p is the i-th vector of the independent variables, y_i (i = 1, ..., N) is the dependent variable, N is the number of observations, and n is the number of components in x_i. The relationship between y_i and x_i (i = 1, ..., N) can be represented as:

y_i = f(x_i^1, x_i^2, ..., x_i^p) + ε = f(x_i) + ε,   (9)

where f is an unknown function and ε is an error (ε ∼ N(0, σ²)). The single-valued deterministic function f captures the joint predictive relationship of y_i on (x_i^1, x_i^2, ..., x_i^p). The additive stochastic component ε, whose expected value is zero, usually reflects the dependence of y_i on values other than (x_i^1, x_i^2, ..., x_i^p), which are neither controlled nor observed.

In the one-dimensional case, splines are expressed in terms of the piecewise-linear basis functions (x − t)_+ and (t − x)_+ with the knot at t. The "+" means the positive part. These functions are truncated linear functions, for x ∈ R:

(x − t)_+ = x − t, if x > t; 0, otherwise
(t − x)_+ = t − x, if x < t; 0, otherwise   (10)

Each function (i.e., (x − t)_+ and (t − x)_+) is piecewise linear, with a knot at the value t. They are called linear splines, and the two functions are named a reflected pair.

In the multidimensional case, the idea is to form reflected pairs for each input component x^j of the vector x = (x^1, ..., x^j, ..., x^p)^T, with knots at each observed value x_i^j of that input (i = 1, 2, ..., N; j = 1, 2, ..., p). Thus, the set of constructed basis functions can be represented in the form:

C = {(x^j − t)_+, (t − x^j)_+ | t ∈ {x_1^j, x_2^j, ..., x_N^j}, j ∈ {1, 2, ..., p}}   (11)

If all input data are different, then, in the set of 2Np basis functions, each of them depends on only one variable x^j. For example, B(x) = (x^j − t)_+ is regarded as a function over the entire input space R^p.
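As a small illustration (not from the paper), the reflected pair (10) and the candidate set C of (11) can be written in Python; the function names are illustrative:

```python
import numpy as np

def hinge_pair(x, t):
    """Reflected pair of truncated linear basis functions with knot t,
    i.e., (x - t)_+ and (t - x)_+ from equation (10)."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

def candidate_set(X):
    """Candidate set C from equation (11): one reflected pair per input
    component x^j and per observed value used as a knot.
    X: (N, p) array of observations; returns (j, t, sign) triples."""
    N, p = X.shape
    return [(j, t, s)
            for j in range(p)
            for t in np.unique(X[:, j])
            for s in (+1, -1)]
```

With distinct data, candidate_set enumerates the 2Np candidates from which the forward phase of MARS picks its basis functions.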
The basis functions used for the approximation are as follows:

B_m(x) = Π_{k=1}^{K_m} [s_{k,m} · (x_{v(k,m)} − t_{km})]_+   (12)

where K_m is the total number of truncated linear functions in the m-th basis function (i.e., it is the number of "splits" that gave rise to B_m), x_{v(k,m)} is the component of the vector x related to the k-th truncated linear function in the m-th basis function, t_{km} is the corresponding knot, and s_{k,m} ∈ {±1}. K_m is the user-defined degree (order) of the interaction term, and s_{k,m} represents the direction of the univariate term, which can be positive or negative.

The model-building strategy is like a forward stepwise linear regression, but instead of using the original inputs, it is allowed to use functions from the set C and their products. Therefore, the MARS model can be expressed by the following equation [26]:

y = f̂(x) + ε = c_0 + Σ_{m=1}^{M} c_m B_m(x) + ε   (13)

where y is the output variable, x is the vector of input variables, M is the number of basis functions in the model (i.e., the number of spline functions), c_0 is the coefficient of the constant basis function B_0, and the sum runs over the basis functions B_m produced by the algorithm that implements the stepwise forward part of the MARS strategy by incorporating a modification of recursive partitioning. The coefficients c_m are estimated by minimizing the residual sum of squares (i.e., by a standard linear regression). B_m(x) is the m-th function in C, or a product of two or more such functions.

The most important thing in this model is the choice of the basis functions. In the beginning, the model contains a single constant function B_0(x) = 1, and all functions from the set C are possible candidates for an inclusion in the model. As in the linear regression, given the functions B_m, the coefficients c_m can be found by the method of least squares.

Another subroutine of MARS performs the backward deletion strategy, wherein each iteration deletes one unnecessary (i.e., redundant) basis function. The inner loop of the algorithm selects one function to be deleted: a function whose removal either improves the fit the most or degrades it the least. However, the constant basis function B_0(x) = 1 is never removed. The settings of MARS, its result forms, and the evaluation of its performance are discussed in Section 4.2.

2.3. Support Vector Regression
Back-propagation NNs are capable of representing general non-linear functions, but their disadvantage is an often very difficult training, because, practically, there is always a risk of getting stuck in a local minimum of the error function; in addition, the learning is highly complicated by the search for a high number of weights in the multidimensional space. An alternative and relatively new approach are the so-called Support Vector Machines (SVMs). The SVMs are used for time series prediction and classification tasks. These methods represent the field of the so-called kernel machines and exploit the benefits provided by effective algorithms for finding a linear boundary while being able to represent highly complex non-linear functions. Kernel function methods try to find an optimal linear separator. The optimal linear separator in an SVM algorithm is searched for using the quadratic programming method. In Support Vector Regression (SVR), the data x ∈ X are mapped into a high-dimensional feature space F via a nonlinear mapping Φ, and a linear regression is done in this space [36, 37].
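As a brief, hedged preview of how such a regression is used in practice, the following sketch fits scikit-learn's ε-insensitive SVR with a Gaussian kernel to synthetic stand-in data; the paper's measured UCG series are not reproduced here, and all parameter values are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in data: 3 operating variables -> one target variable.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))          # e.g., air flow, O2 flow, outlet pressure
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.standard_normal(200)

# epsilon-insensitive SVR with a Gaussian (RBF) kernel, cf. Table 1 below.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=1.0)
model.fit(X[:180], y[:180])             # train on the first 90 % (chronological)
y_hat = model.predict(X[180:])          # predict the held-out 10 % tail
```

The formal treatment of the mapping Φ, the kernel trick, and the ε-insensitive loss follows.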
The input data (i.e., observations) are represented by a vector x = (x_1, x_2, ..., x_l), where l denotes the size of the sample. Considering a regression with one target variable y, the observations of the examined process can be written as a sequence of pairs (x_1, y_1), ..., (x_i, y_i), ..., (x_l, y_l), x_i ∈ R^n, y_i ∈ R. The vector x_i represents one pattern of the input observations, x_i = (x_{i1}, x_{i2}, ..., x_{in}). In the case of observing the process variables during the UCG, this vector may contain the measured data from the database. Thus, a linear regression in a high-dimensional (feature) space corresponds to a non-linear regression in the low-dimensional input space R^n. The whole problem of the SVR can be rewritten in terms of dot products in the low-dimensional input space [38]:

f(x) = Σ_{i=1}^{l} (α_i − α_i*)(Φ(x_i) · Φ(x)) + b = Σ_{i=1}^{l} (α_i − α_i*) k(x_i, x) + b   (14)

Given two points x_i, x_j ∈ X, the function that returns the inner product between their images in the space F is known as the kernel function. In equation (14), a kernel function k(x_i, x_j) = (Φ(x_i) · Φ(x_j)) is introduced.

Kernel type | Kernel function
Gaussian (RBF) kernel | k(x_i, x_j) = exp(−γ ||x_i − x_j||²)
Linear kernel | k(x_i, x_j) = x_i^T x_j
Polynomial kernel | k(x_i, x_j) = (γ(x_i^T x_j + 1))^d
Sigmoid kernel | k(x_i, x_j) = tanh(γ x_i^T x_j + d)

Table 1. Overview of common kernels used by SVR (γ is a kernel parameter controlling the sensitivity of the kernel function, and d is an integer).

The parameters α_i, α_i* are the solutions of the quadratic programming problem [37]. These parameters have an intuitive interpretation as forces pushing and pulling the estimate f(x_i) towards the measurements y_i. The parameter b is a threshold. The common kernels are summarized in Table 1. In this paper, the ε-SVM regression with a linear epsilon-insensitive loss has been used.

For this special cost function, the Lagrange multipliers α_i, α_i* are often sparse, i.e., they result in non-zero values after the optimization only if they are on or outside the boundary, which means that they fulfill the Karush-Kuhn-Tucker (KKT) conditions. The ε-insensitive cost function is given by

C(f(x) − y) = |f(x) − y| − ε, for |f(x) − y| ≥ ε; 0, otherwise   (15)

In the ε-SVM regression, the set of training data includes the predictor variables and the observed response values. The goal is to find a function f(x) that deviates from y by a value no higher than ε for each training point x and, at the same time, is as flat as possible.

In SVR, the kernel matrix K = (k(x_i, x_j))_{i,j=1}^{l} (x_i, x_j ∈ X) is introduced. It is a symmetric positive definite matrix of the inner products between all pairs of points {x_i}_{i=1}^{l}. Each element represents the inner product of the predictors transformed by Φ. However, it is not necessary to know Φ, because the kernel function can generate the kernel matrix directly. Using this approach, the non-linear SVR finds the optimal function f(x) in the transformed predictor space. The prediction of new values is based on a function that depends only on the support vectors:

f(x) = Σ_{i=1}^{l} (α_i − α_i*) K(x_i, x) + b   (16)

where α and α* are the non-negative Lagrange multipliers for each observation x. The threshold b can be determined from the Lagrange multipliers. The Lagrange coefficients can be found by a minimization of the following function [39]:

L(α) = (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) + ε Σ_{i=1}^{l} (α_i + α_i*) − Σ_{i=1}^{l} y_i (α_i* − α_i)   (17)
subject to the constraints

Σ_{i=1}^{l} (α_i − α_i*) = 0; ∀i: 0 ≤ α_i ≤ C; ∀i: 0 ≤ α_i* ≤ C   (18)

The KKT complementarity conditions are

∀i: α_i (ε + ξ_i − y_i + f(x_i)) = 0
∀i: α_i* (ε + ξ_i* + y_i − f(x_i)) = 0
∀i: ξ_i (C − α_i) = 0
∀i: ξ_i* (C − α_i*) = 0   (19)

where the slack variables ξ and ξ* for each point ensure that the regression errors are up to the values of ξ and ξ* and meet the desired conditions. The KKT complementarity conditions are the optimization constraints required to obtain the optimum. These conditions indicate that all observations strictly inside the epsilon tube have the Lagrange multipliers α_i = 0 and α_i* = 0. Observations with non-zero Lagrange multipliers are called support vectors. The constant C is the box constraint, a positive numeric value that controls the penalty imposed on observations that lie outside the epsilon margin (ε) and helps to prevent overfitting, i.e., it acts as a regularization [40].

The minimization problem can be solved by common quadratic programming techniques, e.g., the chunking and working set method, the Sequential Minimal Optimization (SMO), or the Iterative Single Data Algorithm (ISDA). For the modeling of the UCG process, the epsilon-insensitive SVM (ε-SVM) regression has been used. To solve the optimization problem, the SMO algorithm has been used.

3. Experimental UCG in Ex-Situ Reactor
For the purpose of verifying the UCG, a laboratory equipment has been created. The base is an experimental coal gasifier, i.e., an ex-situ reactor or so-called syngas generator (see Figure 2), and a set of devices for the measurement and control. The ex-situ reactor was constructed so that the bedding of coal with the overburden and under-burden layers can simulate a real coal seam. Several experiments, as trials of a real UCG, were performed with the ex-situ reactor. This laboratory gasification equipment was well described in [41, 42]. Similar trials of the UCG on a laboratory ex-situ reactor can be found in [43, 44]. Various gasification agents (i.e., oxidizers), ways of bedding the coal, and the monitoring of the UCG process were tested there.

The gasification in the ex-situ reactor is based on the control of the flow of the inlet gasification agents (i.e., air and oxygen) and the pressure at the outlet. A lignite coal from a Slovak mine that is suitable for the UCG was gasified. The composition of the coal that was gasified and the factors that affect the UCG can be found in [41].

Figure 2. Experimental coal gasifier (ex-situ reactor).

The influence of the various gasification agents (i.e., their flows and pressures) on the syngas quality was discussed in [2]. The bedding of coal in the ex-situ reactor was made on the basis of the rules of the similarity theory. The goal was to obtain a similarity with the real coal seam. Blocks of coal merged into one coal unit were used when preparing a physical model of the coalbed. In order to make the physical model airtight, the layers of the over-burden and under-burden contained sand mixed with water glass. In addition, the reactor was tilted at 10°, in order to get as close as possible to the real coal seam. The coal used in the experiment was extracted from an underground coalbed with the same inclination. This coalbed (i.e., in the Cigel mine, Slovakia, overburden bed) has the potential to be mined in the future by the UCG.

Air, as the primary oxidant, was blown into the pressure vessel by two compressors. The pressure of the air injected into the ex-situ reactor was adjusted by a reducing valve.
The air flow was controlled by a servo valve and measured by a differential pressure sensor with a centric orifice that was installed in a pipeline. Similarly, the flow of the produced syngas was measured, but a segment orifice was used. The oxygen flow and pressure were controlled by two reducing valves. As a source of the technical oxygen, pressure cylinders were used. The technical oxygen was injected as an auxiliary oxidant into the mixing chamber, where it was mixed with air, and the mixture was then injected into the ex-situ reactor. The pressures of the oxidants were measured by a set of pressure transducers. K-type thermocouples were used to measure the temperatures in the coal model; these thermocouples allow measuring temperatures up to 1300 °C. The thermocouples were placed in ceramic tubes (for a protection in an oxidation-reducing atmosphere) and then inserted into holes drilled in the physical model. The temperature of the coal along the gasification channel, in the overburden, and in the under-burden was measured. At the outlet of the reactor, a sample of the syngas was captured to be analysed by two stationary analysers. The concentrations of CO, CO2, O2, CH4, and H2 in the syngas were constantly measured. The syngas calorific value was continually calculated from the composition of the syngas. The pressure at the outlet was controlled by a vacuum fan. The power of the fan was controlled by a frequency inverter. The outlet pressure was measured by one pressure transducer. The produced syngas was finally burned in the combustion chamber. The UCG was tried in two different reactors. Figure 3 shows the complete technological scheme of the experimental gasification with one generator.

All devices for the measurement and control (i.e., pressure transducers, differential pressure transducers, servo valve, thermocouples, switching relays, frequency converter with fan, compressors, and gas analysers) were connected to a PLC that provided the basic gasification control loops (i.e., on-off control of the compressors, air flow stabilization, temperature stabilization, oxygen concentration stabilization). The PLC was connected to a PC that performed the data storage, optimal control [45], and process monitoring [46]. The SCADA/HMI system Promotic was used for the monitoring of the process and the setting of the controllers [46]. A detailed description of all devices that were used for the measurement and control was presented in [41]. The measured process variables (i.e., flows of the gasification agents, pressures, temperatures, syngas composition, and calorific value) were recorded in a database and may later be processed as a set of time series. A PC with a CPU Intel® Core™ i5-4300U (2.9 GHz) and 8 GB RAM was used for all calculations.

4. Results and Discussion
The flowchart of the proposed soft-sensing in the UCG is shown in Figure 4. This paper focuses on the evaluation of the potential predictive methods that could be used in the soft-sensing. Machine learning models generalize to data similar to those on which they were trained. Although static models, which are time-independent, i.e., they work on a single data set, have been used, their application to the dynamic process should be improved by a continual updating of the training set with the online data. The development of practical on-line prediction soft sensors consists of two stages: training and on-line prediction.
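A minimal sketch of these two stages is given below; the SVR model choice and the read_observation() interface are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np
from sklearn.svm import SVR

def train_soft_sensor(X_hist, y_hist):
    """Stage 1 (off-line): fit a data-driven model on historical records,
    e.g., syngas concentrations -> underground temperature."""
    model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
    model.fit(X_hist, y_hist)
    return model

def soft_sensor_loop(model, read_observation):
    """Stage 2 (on-line): estimate the unmeasured variable from each new
    observation; read_observation() stands in for the PLC/database query."""
    while (x_new := read_observation()) is not None:
        yield float(model.predict(np.asarray(x_new).reshape(1, -1))[0])
```

In an industrial deployment, the loop would run in parallel with the hardware measurement chain, and the training set would be periodically refreshed with new on-line data, as noted above.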
A data set from the experimental UCG was used in order to train the data-driven modeling algorithms. The three prediction methods analysed in the previous section have been applied in order to predict the underground temperature in the oxidizing zone of the ex-situ reactor and the syngas calorific value. The data obtained during the experimental UCG with the laboratory equipment have been used in the analyses. As the underground temperature in the oxidizing zone, the highest temperature along the gasification channel during the experiment is considered. The observations and target data measured during one well-running experiment were used in this paper for the analysis.

When a learning method is applied, it is convenient to divide the observed data into a training set A_train and a test set A_test (i.e., a validation set). When choosing the training and test sets, it is recommended to ensure that the data used for the model testing cover a significant range of the variations that are expected to be encountered during the use of the soft sensor. Given that there is no exact rule in the literature on how to divide the data into a training and test set for specific learning methods (there are various recommendations and instructions for experimentation), the models were tested on data from the last 10 % and 20 % of the experiment. In general, a higher performance of the selected methods was obtained with more data for the training (i.e., with the ratio between the training set and the test set of 90:10). To compare the performance of the three different methods, this paper presents only the results where 10 % of the experiment was used for the test. The used test set consisted of 10 % of the data from the end of the experiment. The whole experiment lasted for 70 hours. In the simulations, there were 4201 patterns in total, and 3781 patterns were used for the training (a sketch of this chronological split is given below). Table 2 gives an overview of all regarded observations and targets. The pressure on the outlet is the relative pressure measured on the output pipe from the gasifier. This pressure can also be negative when the power of the exhaust fan is increased. The behaviour of the measured data from the UCG experiment is shown in Figure 5 and Figure 6.

Because the temperature in the oxidizing zone (i.e., the highest temperature along the gasification channel) only weakly correlates with the operating variables (i.e., flows of the gasification agents and pressure) and has some inertia, the decision was made to estimate it from the composition of the syngas measured at the outlet. Since the composition of the syngas depends on the temperature in the oxidizing zone, there is an inverse way to determine the temperature that corresponds to the measured concentrations. This decision was also supported by the existence of a large number of uncertainties that occur in the UCG. The propagation of the temperature in the underground is not uniform, i.e., there are different temperatures in the coal, along the gasification channel, and in the underburden and overburden. In addition, there is a continual shift of the combustion front. Due to the changing conditions in the underground gasifier (e.g., groundwater, cracks, fractures, gas leaks, the shift of the combustion front, and surface subsidence), the process is controlled under conditions of uncertainty.
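The chronological split described above can be sketched as follows (the 10 % test fraction and the pattern counts come from the text; the helper itself is illustrative):

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.10):
    """Hold out the last part of the time-ordered records as the test set;
    with 4201 patterns and test_fraction=0.10 this leaves 3781 for training."""
    n_train = len(X) - int(round(len(X) * test_fraction))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])
```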
Figure 3. Scheme of gasification equipment with one ex-situ reactor (modified after [41]).

Figure 4. The principle of soft-sensing in UCG.

Figure 5. Time series of measured syngas composition divided into training and measured data set.

Figure 6. Time series of measured control variables divided into training and measured data set.

Target (output process variable y) | Observations (input process variables x)
Calorific value of syngas (MJ/Nm³) | x1 - Air flow (Nm³/h); x2 - Oxygen flow (Nm³/h); x3 - Pressure on outlet (i.e., underpressure/overpressure) (Pa)
Underground temperature (°C) | x1 - Concentration of CO in syngas (%); x2 - Concentration of CO2 in syngas (%); x3 - Concentration of H2 in syngas (%); x4 - Concentration of CH4 in syngas (%); x5 - Concentration of O2 in syngas (%)

Table 2. Observations and targets used in modeling.

In such conditions, the process of measuring the process variables, the identification, and, finally, the automated control is more complicated. These uncertainties can partly be reduced by a more detailed geological survey. But even this does not guarantee the elimination of such uncertainties, as evidenced by the long-term experience in the traditional coal mining technology.

The predictive methods were evaluated using statistical approaches and by calculating a performance index. The following indicators were used to compare the performance of the individual prediction methods. The variable y represents the measured target, and Y represents its prediction. ȳ is the average of the target values y_i, and Ȳ is the average of the predicted values Y_i (i = 1, ..., N). N represents the number of patterns in the training or testing set.

• Coefficient of correlation (r_yY) - The coefficient expresses the strength of the linear relationship between two variables. It determines the degree of dependence of two variables and acquires values from the interval (-1, 1). Its definition is based on the consideration of the sum of the deviations of the individual values of two correlated characters from their averages. Several equations are used to calculate the correlation coefficient, but the following one is used in this work:

r_yY = Σ_{i=1}^{N} (Y_i − Ȳ)(y_i − ȳ) / (√(Σ_{i=1}^{N} (Y_i − Ȳ)²) · √(Σ_{i=1}^{N} (y_i − ȳ)²))   (20)

If r_yY = 1, the dependence is completely direct; if r_yY = −1, the correlation is completely indirect; if r_yY = 0, the variables are independent. More precisely: r_yY < 0.3 - low tightness; 0.3 ≤ r_yY < 0.5 - slight tightness; 0.5 ≤ r_yY < 0.7 - significant tightness; 0.7 ≤ r_yY < 0.9 - high tightness; 0.9 ≤ r_yY - very high tightness.
• Coefficient of determination (r²_yY) - It expresses the degree of the causal dependence of two variables. It is a statistic that gives some information about the goodness of fit of a model. The correlation coefficient is the square root of the determination coefficient. The degrees of tightness depending on the coefficient of determination are as follows: r²_yY < 0.1 - low tightness; 0.1 ≤ r²_yY < 0.25 - slight tightness; 0.25 ≤ r²_yY < 0.50 - significant tightness; 0.5 ≤ r²_yY < 0.80 - high tightness; 0.8 ≤ r²_yY - very high tightness. r²_yY = 1 indicates that the model perfectly fits the measured target data.

r²_yY = 1 − ((1/N) Σ_{i=1}^{N} (Y_i − Ȳ)²) / ((1/N) Σ_{i=1}^{N} (y_i − ȳ)²)   (21)

• Relative root mean squared error (RRMSE) - This error can be calculated by dividing the root mean squared error RMSE by the average of the actual values y_i. RMSE represents the square root of the mean squared error (MSE), calculated as follows:

RMSE = √MSE = √((1/N) Σ_{i=1}^{N} (Y_i − y_i)²)   (22)

The MSE is a useful statistical measure for assessing the accuracy of the prediction. The RRMSE can be calculated by the following equation [34, 47]:

RRMSE = RMSE / ((1/N) Σ_{i=1}^{N} y_i) × 100 = √((1/N) Σ_{i=1}^{N} (Y_i − y_i)²) / ((1/N) Σ_{i=1}^{N} y_i) × 100 (%)   (23)

• Mean absolute percentage error (MAPE) - This statistical indicator expresses the percentage prediction error. It can be calculated as follows:

MAPE = (1/N) Σ_{i=1}^{N} (|y_i − Y_i| / |y_i|) × 100 (%)   (24)

This error has certain disadvantages. At zero values of y_i, a division by zero can occur, and MAPE produces undefined values. At very low values of y_i, MAPE can extremely exceed 100 %. When the actual values y_i are very high (i.e., above Y_i), MAPE will not exceed 100 %.

• Performance index (PI) - It indicates the overall performance of the given prediction method. The PI value ranges from 0 to +∞; smaller values of PI indicate a better performance. The PI is calculated as follows [47, 48]:

PI = RRMSE / (r_yY + 1)   (25)

The above statistical indicators were calculated individually for the training and testing data (a short computational sketch of these indicators is given below). The time needed to train the predictive model was also measured.

4.1. Prediction by the Back-Propagation NN
The back-propagation algorithm (or gradient algorithm) is based on an "error-correction" learning rule, i.e., learning by the network error. This algorithm performs the steepest descent procedure. The learning consists of two pass-overs across the different layers of the network, i.e., the forward pass (i.e., the forward activation flow of the outputs) and the backward pass (i.e., the backward error propagation of the weight adjustments). Before the training, it is appropriate to standardize all inputs of the NN for an effective scaling of the weights. The learning takes place in cycles (i.e., epochs), always with new input patterns. It is based on gradually adjusting the weights so that the error E (5) is reduced. This error is minimized iteratively over the epochs so that the required accuracy ε is achieved. After the learning, it is desirable that the output from the NN is equal to the required output, or as close to it as possible, considering all input patterns from the set A_train. The ability of the NN to determine the output for inputs outside the A_train set is called the generalization. This is also the main role of the NN in the prediction. The weights are modified using the set A_train and, using A_test, the generalization error is detected.
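Returning to the indicators defined in (20)-(25), the following minimal sketch (Python; y and Y are the measured and predicted series as NumPy arrays) shows how they can be evaluated; the function names are illustrative:

```python
import numpy as np

def correlation(y, Y):
    """Coefficient of correlation r_yY, equation (20)."""
    dy, dY = y - y.mean(), Y - Y.mean()
    return np.sum(dY * dy) / np.sqrt(np.sum(dY ** 2) * np.sum(dy ** 2))

def determination(y, Y):
    """Coefficient of determination r2_yY, as printed in equation (21)."""
    return 1.0 - np.mean((Y - Y.mean()) ** 2) / np.mean((y - y.mean()) ** 2)

def rrmse(y, Y):
    """Relative root mean squared error in %, equations (22)-(23)."""
    return np.sqrt(np.mean((Y - y) ** 2)) / np.mean(y) * 100.0

def mape(y, Y):
    """Mean absolute percentage error in %, equation (24)."""
    return np.mean(np.abs(y - Y) / np.abs(y)) * 100.0

def performance_index(y, Y):
    """Performance index PI = RRMSE / (r_yY + 1), equation (25)."""
    return rrmse(y, Y) / (correlation(y, Y) + 1.0)
```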
Five input variables, i.e., the concentrations of O2 (x1), CO2 (x2), CO (x3), H2 (x4), and CH4 (x5) (measured in vol. %), have been regarded for the initialization of the input neurons and the prediction of the underground temperature in the UCG. Two stationary analysers that measured the concentrations of only these five gases have been used during the UCG experiment. These concentrations have been considered the most significant. In the prediction of the syngas calorific value, three input process-relevant variables were used, i.e., the injected air flow (x1), the flow of supplementary oxygen (Nm³/h) (x2), and the pressure on the outlet (Pa) (x3). These variables are adjustable by the automated control system. The general scheme of the NN considered for the UCG data prediction is shown in Figure 7.

The methods of Batch Gradient Descent with Momentum and Gradient Descent with Variable Learning Rate have been applied. In this approach, the weights and biases are updated according to the gradient descent momentum and an adaptive learning rate. It is the most widely used way to realize the minimization of the error (5) within the gradient optimization methods, in which the weighting and threshold coefficients are recurrently updated according to the following equations [27]:

w_ij^(k+1) = w_ij^(k) − λ ∂E/∂w_ij + µ∆w_ij^(k)
ϑ_j^(k+1) = ϑ_j^(k) − λ ∂E/∂ϑ_j + µ∆ϑ_j^(k)   (26)

where the parameter λ > 0 represents the learning rate and must be small enough to ensure the monotone convergence of the optimization algorithm and, at the same time, large enough to provide a sufficiently high convergence rate. The calculation of the partial derivatives ∂E/∂w_ij and ∂E/∂ϑ_j for the entire NN runs recurrently from the highest to the lowest layer, i.e., against the direction of the dissemination of information in the NN, which runs from the lowest to the highest layer. The initial values of the threshold and weighting coefficients ϑ_j^(0) and w_ij^(0) are randomly generated from a small interval centered on zero, e.g., from the open interval (-1, 1). The last member µ in (26) represents the so-called momentum member that is determined by the difference of the coefficients from the last two iterations, ∆w_ij^(k) = w_ij^(k) − w_ij^(k−1) and ∆ϑ_j^(k) = ϑ_j^(k) − ϑ_j^(k−1). The momentum is important for the "skip" of the local minima in the initial optimization phase. The value of the parameter µ is usually chosen from the interval 0.5 ≤ µ ≤ 0.7 (a minimal sketch of this update rule is given below). The adaptive learning rate tries to maintain a stable learning and the largest possible size of the learning step. The mean squared error (MSE) was used as the error function during the training.

The number of hidden layers and neurons is usually determined on the basis of experimentation, where the NN model for which E_test is minimal is selected. However, a small number of hidden layers in the NN may not model the non-linearities in the data well. It is, therefore, necessary to look for an optimal number of hidden neurons. Two variants of the NN, i.e., with one and two hidden layers, were used, and the number of neurons was estimated in a previous experimentation. It has also been tried to set the number of neurons in the hidden layer to 2m + 1, where m is the number of input neurons. In all variants that were tried, only one neuron in the output layer was used. The momentum constant was set to 0.9. Within the results, the goal was also to show what impact the number of hidden neurons has on the quality of the prediction. The results of the training and testing are shown in Table 3.
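One iteration of the update rule (26) can be sketched as follows (Python; array shapes and hyper-parameter values are illustrative assumptions, not the experiment's settings):

```python
import numpy as np

def momentum_update(w, grad_w, delta_w_prev, lam=0.01, mu=0.9):
    """One iteration of the update rule (26) for a weight matrix:
    w(k+1) = w(k) - lambda * dE/dw + mu * delta_w(k).
    The same rule applies to the thresholds (biases) theta."""
    delta_w = -lam * grad_w + mu * delta_w_prev
    return w + delta_w, delta_w
```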
| Target | Layers | Neurons (L1:L2) | Inputs | r_yY tr | r²_yY tr | RRMSE % tr | PI tr | MAPE % tr | MSE tr | RMSE tr | Time (s) | r_yY te | r²_yY te | RRMSE % te | PI te | MAPE % te | MSE te | RMSE te |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | 2 | 5000:15 | CO, CO2 | 0.2707 | 0.0733 | 8.0062 | 6.3008 | 6.4108 | 6484.2537 | 80.5249 | 44.8312 | 0.1985 | 0.0394 | 8.6818 | 7.2440 | 7.2707 | 6938.2153 | 83.2960 |
| Temperature | 2 | 5000:15 | CO, CO2, H2, CH4, O2 | 0.0567 | 0.0032 | 11.7869 | 11.1542 | 9.1223 | 14054.2843 | 118.5508 | 50.8077 | 0.0611 | 0.0037 | 18.2237 | 17.1749 | 10.8640 | 30570.4682 | 174.8441 |
| Temperature | 2 | 800:8 | CO, CO2 | 0.2043 | 0.0417 | 8.0649 | 6.6967 | 6.6452 | 6579.6878 | 81.1153 | 6.8560 | 0.6755 | 0.4563 | 8.5075 | 5.0775 | 7.4526 | 6662.3759 | 81.6234 |
| Temperature | 2 | 800:8 | CO, CO2, H2, CH4, O2 | 0.1820 | 0.0331 | 8.8428 | 7.4810 | 6.9526 | 7910.2629 | 88.9397 | 7.1921 | 0.0810 | 0.0066 | 9.9893 | 9.2407 | 6.8042 | 9185.3462 | 95.8402 |
| Temperature | 1 | 5000 | CO, CO2 | 0.0587 | 0.0034 | 69.5635 | 65.7078 | 53.1500 | 489523.6382 | 699.6597 | 38.4061 | 0.1021 | 0.0104 | 95.0738 | 105.8881 | 73.4974 | 832048.0252 | 912.1667 |
| Temperature | 1 | 5000 | CO, CO2, H2, CH4, O2 | 0.1865 | 0.0348 | 42.8462 | 36.1105 | 31.0099 | 185709.9328 | 430.9408 | 43.1599 | 0.1611 | 0.0260 | 70.0982 | 83.5614 | 48.0393 | 452313.8233 | 672.5428 |
| Temperature | 1 | 800 | CO, CO2 | 0.0428 | 0.0018 | 23.6400 | 22.6700 | 17.8908 | 56533.4172 | 237.7676 | 1.7451 | 0.1623 | 0.0263 | 48.7514 | 41.9446 | 35.9242 | 218776.5470 | 467.7356 |
| Temperature | 1 | 800 | CO, CO2, H2, CH4, O2 | 0.0778 | 0.0061 | 16.0174 | 14.8608 | 11.9061 | 25953.3424 | 161.1004 | 1.8475 | 0.6381 | 0.4072 | 28.3447 | 17.3032 | 22.0818 | 73955.5871 | 271.9478 |
| Temperature | 1 | 5 | CO, CO2 | 0.1495 | 0.0223 | 8.1083 | 7.0538 | 6.7466 | 6650.7120 | 81.5519 | 0.6702 | 0.3373 | 0.1138 | 8.4245 | 6.2996 | 7.1668 | 6533.0962 | 80.8276 |
| Temperature | 1 | 11 | CO, CO2, H2, CH4, O2 | 0.4210 | 0.1772 | 7.4552 | 5.2466 | 6.1409 | 5622.4662 | 74.9831 | 0.7461 | 0.6787 | 0.4606 | 7.4301 | 4.4261 | 5.8129 | 5081.7732 | 71.2866 |
| Calorific value | 2 | 5000:15 | Air, O2 | 0.5588 | 0.3122 | 33.3750 | 21.4108 | 37.6195 | 9.6441 | 3.1055 | 55.2753 | 0.0366 | 0.0013 | 30.4626 | 29.3861 | 30.6660 | 12.4353 | 3.5264 |
| Calorific value | 2 | 5000:15 | Air, O2, Outlet pressure | 0.1311 | 0.0172 | 119.9483 | 106.0431 | 126.1949 | 124.5676 | 11.1610 | 73.1432 | 0.0270 | 0.0007 | 86.6737 | 84.3988 | 70.8208 | 100.6693 | 10.0334 |
| Calorific value | 2 | 800:8 | Air, O2 | 0.7383 | 0.5451 | 22.6638 | 13.0378 | 28.3859 | 4.4472 | 2.1088 | 7.1023 | 0.0999 | 0.0100 | 24.8010 | 22.5486 | 22.6788 | 8.2425 | 2.8710 |
| Calorific value | 2 | 800:8 | Air, O2, Outlet pressure | 0.7392 | 0.5464 | 22.6569 | 13.0271 | 28.0108 | 4.4444 | 2.1082 | 7.5802 | 0.6906 | 0.4769 | 21.5027 | 12.7190 | 19.5456 | 4.5877 | 2.1419 |
| Calorific value | 1 | 5000 | Air, O2 | 0.0398 | 0.0016 | 159.1270 | 153.0306 | 183.3015 | 219.2321 | 14.8065 | 44.2146 | 0.1474 | 0.0217 | 173.3974 | 151.1207 | 148.0262 | 402.9098 | 20.0726 |
| Calorific value | 1 | 5000 | Air, O2, Outlet pressure | 0.2339 | 0.0547 | 133.3327 | 108.0600 | 135.6404 | 153.9182 | 12.4064 | 44.6760 | 0.4976 | 0.2476 | 139.3664 | 93.0571 | 113.2722 | 260.2789 | 16.1332 |
| Calorific value | 1 | 800 | Air, O2 | 0.4106 | 0.1686 | 55.1188 | 39.0761 | 61.2308 | 26.3036 | 5.1287 | 7.7348 | 0.1819 | 0.0331 | 87.2729 | 73.8429 | 75.1442 | 102.0661 | 10.1028 |
| Calorific value | 1 | 800 | Air, O2, Outlet pressure | 0.4590 | 0.2107 | 51.0458 | 34.9874 | 51.7013 | 22.5599 | 4.7497 | 7.9496 | 0.6693 | 0.4479 | 57.1718 | 34.2496 | 47.0393 | 43.8013 | 6.6183 |
| Calorific value | 1 | 5 | Air, O2 | 0.6714 | 0.4507 | 24.9076 | 14.9025 | 32.9435 | 5.3713 | 2.3176 | 0.9649 | 0.0401 | 0.0016 | 19.8082 | 19.0445 | 22.9766 | 5.2579 | 2.2930 |
| Calorific value | 1 | 7 | Air, O2, Outlet pressure | 0.7281 | 0.5301 | 23.0707 | 13.3503 | 28.6974 | 4.6083 | 2.1467 | 1.0306 | 0.7187 | 0.5166 | 15.8219 | 9.2055 | 16.8804 | 3.3546 | 1.8316 |

Table 3. Results of simulations with NNs where 10 % of the experiment was used to test (tr = training, te = testing).

Figure 7. Proposal of the Neural network considered for the UCG data prediction.

When training and testing the prediction of the underground temperature, it can be seen that the lowest values of the statistical indicators (i.e., RRMSE, MSE, MAPE, and RMSE) were obtained in the case of one hidden layer with 11 neurons (i.e., 2m + 1, according to a general recommendation, where m is the number of input neurons). In this case, the RRMSE and the performance index had the lowest values (PI = 4.42 for testing and PI = 5.24 for training), and the coefficient of determination was the highest (r²_yY = 0.46 for testing and r²_yY = 0.17 for training). This result was obtained when five input observations were used for training the NN model and for testing the prediction; this variant thus predicts the target best. The second interesting result was obtained with an NN with two hidden layers and the structure of 800:8 neurons; it was the case when two observation inputs were used. The worst results in the prediction of the underground temperature were achieved in the case of one hidden layer with 5000 neurons.

When training and testing the NN for the calorific value, the lowest values of the statistical indicators were also obtained in the case of one hidden layer with 7 neurons (i.e., 2m + 1). In this case, the RRMSE and the performance index were the lowest (e.g., PI = 9.20 for testing). This result was obtained when three input variables were used for training the NN model and for testing the prediction. The value of the coefficient of determination was the highest in this case (i.e., r²_yY = 0.51 for testing). Similarly, as in the case of the temperature prediction, the second interesting result was obtained with an NN with two hidden layers and the structure of 800:8 neurons; this is the case where three observation inputs were used. The worst results in the prediction of the calorific value were achieved in the case of one hidden layer with 5000 neurons (see Table 3).

It can be stated that the use of the NN model for the temperature prediction achieved better results in terms of the performance index than in the case of the calorific value. The best prediction of the calorific value and the underground temperature by the NN, where 10 % of the experiment was used for the test, is shown in Figure 8 and Figure 9. The black vertical line in the figures divides the prediction into training and testing.

Figure 8. Measured and predicted calorific value of syngas by NN, where three inputs were used in the test.

Figure 9. Measured and predicted underground temperature by NN, where five inputs were used in the test.
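For readers who wish to reproduce a comparable setup, the sketch below trains a one-hidden-layer network with the 2m + 1 rule using scikit-learn's MLPRegressor. SGD with momentum 0.9 and an adaptive learning rate is a close stand-in for the training scheme described above, but the synthetic data and the remaining hyperparameters are assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins for the five gas concentrations (vol. %) and the
# underground temperature (deg C); the real UCG series are not reproduced.
rng = np.random.default_rng(0)
X = rng.uniform(0, 25, size=(1000, 5))
y = 800 + 12 * X[:, 2] - 4 * X[:, 0] + rng.normal(0, 15, size=1000)

split = int(0.9 * len(X))               # last 10 % of the run kept for testing
m = X.shape[1]
net = MLPRegressor(hidden_layer_sizes=(2 * m + 1,),   # 2m + 1 hidden neurons
                   solver="sgd", momentum=0.9,
                   learning_rate="adaptive", learning_rate_init=0.01,
                   max_iter=5000, random_state=0)
net.fit(X[:split], y[:split])
mse = np.mean((net.predict(X[split:]) - y[split:]) ** 2)
print("test MSE:", round(float(mse), 2))
```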
4.2. Prediction by the MARS
The algorithm of a regression model creation with MARS runs in two phases, as analysed in detail in Section 2.2. In the forward phase, the algorithm begins with a model that contains only an intercept term. Then, in a cycle, reflected pairs of BFs are added so that the training error is reduced as much as possible. This continues until, for example, the maximum number of BFs is reached. In the backward phase, the model is simplified by deleting the least important BFs one at a time, i.e., those whose removal degrades the training fit the least. In this way, several “best” models of different sizes are obtained. At the end of this phase, the one model with the lowest GCV is selected from these best models (excluding models larger than the maximal number of final BFs).

Several different variants of simulations with MARS have been performed. The maximum number of BFs included in the model in the forward phase was changed experimentally. The initial number of BFs in the forward phase was determined according to the formula min(200, max(20, 2d)) + 1, where d represents the number of input variables; the initial number of BFs was therefore set to 21 in all simulations. In the modelling, we have considered a maximal interaction between the input variables without self-interactions. Since the data were not smooth, both the piecewise-cubic and the piecewise-linear types of modelling were analysed in order to assess the prediction performance. By default, all MARS models are created as piecewise-linear and are transformed into piecewise-cubic models after the backward phase.

The best (optimal) number of maximal BFs in the final MARS model was estimated by the GCV criterion and by a 10-fold Cross-Validation. In each Cross-Validation iteration, a new MARS model is created and reduced using the GCV on the in-fold (training) data. In addition, the MSE criterion on the out-of-fold (test) data (MSEoof) is calculated for each model in the reduction phase. Figure 10 shows a comparison of the behaviour of the GCV and MSEoof criteria calculated for each new model after the 10-fold iteration. In this simulation, the model for predicting the syngas calorific value was considered. The figure shows two vertical dashed lines at the minima of the two solid lines, i.e., the numbers of optimum BFs estimated by the GCV (cyan) and by the Cross-Validation (magenta). Ideally, these two lines would coincide. Similarly, Figure 11 shows the determination of the optimal number of basis functions for the MARS model of the underground temperature prediction. The figures show simulations of the piecewise-linear type of MARS models with all inputs considered for a given target variable. The results from the training of the model and from testing the prediction are shown in Table 4.

The original MARS approximation method uses a cubic function to smooth the truncated piecewise-linear functions. The cubic function has the following general form:

$$C(x \mid s = +1, t_-, t, t_+) = \begin{cases} 0, & x \le t_- \\ \alpha_+ (x - t_-)^2 + \beta_+ (x - t_-)^3, & t_- < x < t_+ \\ x - t, & x \ge t_+ \end{cases} \tag{27}$$

where

$$\alpha_+ = \frac{2t_+ - 3t + t_-}{(t_+ - t_-)^2}, \qquad \beta_+ = \frac{2t - t_+ - t_-}{(t_+ - t_-)^3}, \tag{28}$$

and

$$C(x \mid s = -1, t_-, t, t_+) = \begin{cases} t - x, & x \le t_- \\ \alpha_- (x - t_+)^2 + \beta_- (x - t_+)^3, & t_- < x < t_+ \\ 0, & x \ge t_+ \end{cases} \tag{29}$$

where

$$\alpha_- = \frac{3t - 2t_- - t_+}{(t_- - t_+)^2}, \qquad \beta_- = \frac{t_+ + t_- - 2t}{(t_- - t_+)^3}. \tag{30}$$

Here, t represents a univariate knot, which is selected for each of the factor variables x.
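The truncated cubic basis (27)–(30) can be evaluated directly; the following sketch is a plain transcription of those equations (the function name and the sample knot values, taken from BF1 of Table 5, are for illustration only):

```python
def mars_cubic_basis(x, s, t_minus, t, t_plus):
    """Evaluate the MARS cubic basis C(x | s, t-, t, t+) of Eqs. (27)-(30)."""
    h = t_plus - t_minus
    if s == +1:                                           # Eq. (27)
        if x <= t_minus:
            return 0.0
        if x >= t_plus:
            return x - t
        alpha = (2 * t_plus - 3 * t + t_minus) / h ** 2   # alpha_+, Eq. (28)
        beta = (2 * t - t_plus - t_minus) / h ** 3        # beta_+, Eq. (28)
        return alpha * (x - t_minus) ** 2 + beta * (x - t_minus) ** 3
    else:                                                 # s = -1, Eq. (29)
        if x <= t_minus:
            return t - x
        if x >= t_plus:
            return 0.0
        alpha = (3 * t - 2 * t_minus - t_plus) / h ** 2   # alpha_-, Eq. (30)
        # beta_- of Eq. (30), simplified using (t_- - t_+)^3 = -h^3
        beta = (2 * t - t_plus - t_minus) / h ** 3
        return alpha * (x - t_plus) ** 2 + beta * (x - t_plus) ** 3

# BF1 of Table 5: C(x4 | +1, 20.571, 20.579, 20.607) evaluated at x4 = 21.0,
# which lies above t+, so the basis returns x4 - t = 0.421.
print(mars_cubic_basis(21.0, +1, 20.571, 20.579, 20.607))
```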
The piecewise-linear type of MARS model fits the training data better but, in the prediction on untrained data, better results are obtained with the piecewise-cubic type of the model (see Table 4). Equation (31) represents the resulting piecewise-cubic type of MARS model for the prediction of the underground temperature. The basis functions in the equation are calculated according to Table 5. This MARS model takes into account five inputs, i.e., the concentrations of the measured gases: x1 (CO), x2 (CO2), x3 (H2), x4 (CH4), x5 (O2). Similarly, equation (32) represents the piecewise-cubic type of MARS model for the prediction of the calorific value of the syngas. The basis functions in the equation are calculated according to Table 6. The MARS model, in this case, has three inputs: x1 (air flow), x2 (oxygen flow), and x3 (controlled pressure on the outlet). With these models, the best results for the prediction were obtained in terms of all statistical indicators. The lowest performance index in the testing of the underground temperature prediction on untrained data, PI = 4.01, was obtained with the piecewise-cubic type of the MARS model. When testing the prediction of the calorific value on untrained data with the piecewise-cubic type of the MARS model, the performance index PI = 12.1382 was obtained.

Temperature (°C) = 855.36 − 2108.8 × BF1 + 8.7366 × BF2 + 14.266 × BF3 + 20.014 × BF4 − 13.465 × BF5 − 18.233 × BF6 + 1.2743 × BF7 − 3.764 × BF8 − 1.3594 × BF9 − 6.7334 × BF10 + 1.1376 × BF11 + 0.61503 × BF12 + 0.79404 × BF13 − 1.766 × BF14 + 0.20451 × BF15 − 1.8976 × BF16 − 5.8421 × BF17 + 0.72146 × BF18 + 0.86513 × BF19 (31)

Calorific value (MJ/Nm³) = 13.528 − 0.0044217 × BF1 − 0.13132 × BF2 − 0.14469 × BF3 − 1.1482 × BF4 + 0.015322 × BF5 − 0.0034767 × BF6 − 0.019788 × BF7 + 0.11264 × BF8 + 0.55397 × BF9 − 0.21132 × BF10 − 0.84369 × BF11 + 0.032498 × BF12 − 0.16806 × BF13 + 0.12228 × BF14 + 0.084482 × BF15 + 0.00096036 × BF16 + 0.00075854 × BF17 (32)

Because the piecewise-linear type of the model gives better results during the training in terms of all indicators, these variants of the model are also presented. These are models with all considered inputs for the prediction of the given target. The piecewise-linear type of MARS model uses the max(0, x − t) function, where t is the knot. The max() function represents the positive part of (x − t), which can be formally expressed as follows:

$$\max(0, x - t) = \begin{cases} x - t, & \text{if } x \ge t \\ 0, & \text{otherwise} \end{cases} \tag{33}$$

Figure 10. Example of the estimation of the “best” number of BFs of the calorific value MARS model with three inputs by GCV and 10-fold Cross-Validation (i.e., MSEoof).

Figure 11. Example of the estimation of the “best” number of BFs in the MARS model of the underground temperature with five inputs by GCV and 10-fold Cross-Validation (i.e., MSEoof).

| Target | MARS model type | BFs in final model (incl. BF0) | Inputs | r_yY tr | r²_yY tr | RRMSE % tr | PI tr | MAPE % tr | MSE tr | RMSE tr | Time (s) | r_yY te | r²_yY te | RRMSE % te | PI te | MAPE % te | MSE te | RMSE te |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | piecewise-cubic | 16 | CO, CO2 | 0.4707 | 0.2216 | 7.2342 | 4.9189 | 5.7432 | 5294.1481 | 72.7609 | 10.5619 | 0.2312 | 0.0534 | 10.5865 | 8.5986 | 8.8779 | 10316.4417 | 101.5699 |
| Temperature | piecewise-cubic | 20 | CO, CO2, H2, CH4, O2 | 0.5859 | 0.3433 | 6.6445 | 4.1897 | 5.2014 | 4466.1544 | 66.8293 | 17.8538 | 0.6031 | 0.3637 | 6.4275 | 4.0094 | 5.1623 | 3802.8747 | 61.6675 |
| Temperature | piecewise-linear | 16 | CO, CO2 | 0.4985 | 0.2485 | 7.1080 | 4.7434 | 5.6262 | 5110.9554 | 71.4909 | 12.1525 | 0.3282 | 0.1077 | 10.6824 | 8.0427 | 9.1629 | 10504.3010 | 102.4905 |
| Temperature | piecewise-linear | 20 | CO, CO2, H2, CH4, O2 | 0.6666 | 0.4444 | 6.1119 | 3.6673 | 4.7285 | 3778.9200 | 61.4729 | 17.8482 | 0.5788 | 0.3350 | 8.9462 | 5.6665 | 7.5283 | 7367.2269 | 85.8326 |
| Calorific value | piecewise-cubic | 15 | Air, O2 | 0.7863 | 0.6183 | 20.7582 | 11.6208 | 24.0946 | 3.7307 | 1.9315 | 8.0520 | 0.0419 | 0.0018 | 83.9179 | 80.5431 | 32.5398 | 94.3695 | 9.7144 |
| Calorific value | piecewise-cubic | 18 | Air, O2, Outlet pressure | 0.8661 | 0.7501 | 16.7964 | 9.0010 | 17.3057 | 2.4426 | 1.5629 | 13.5414 | 0.4386 | 0.1924 | 17.4620 | 12.1382 | 17.8493 | 4.0861 | 2.0214 |
| Calorific value | piecewise-linear | 16 | Air, O2 | 0.7991 | 0.6386 | 20.1965 | 11.2256 | 23.0084 | 3.5316 | 1.8792 | 9.1635 | 0.0419 | 0.0018 | 80.7365 | 77.4897 | 30.3193 | 87.3500 | 9.3461 |
| Calorific value | piecewise-linear | 17 | Air, O2, Outlet pressure | 0.8730 | 0.7621 | 16.3866 | 8.7489 | 16.5883 | 2.3249 | 1.5247 | 13.4410 | 0.4238 | 0.1796 | 17.8945 | 12.5681 | 18.3341 | 4.2910 | 2.0715 |

Table 4. Results of simulations with MARS models where 10 % of the experiment was used to test (tr = training, te = testing).

BF1 = C(x4 | +1, 20.571, 20.579, 20.607)
BF2 = C(x4 | −1, 20.571, 20.579, 20.607)
BF3 = C(x2 | +1, 7.967, 11.636, 19.073)
BF4 = C(x2 | −1, 7.967, 11.636, 19.073)
BF5 = C(x1 | −1, 3.6356, 6.3881, 7.6506)
BF6 = C(x3 | +1, 15.747, 26.344, 30.018)
BF7 = C(x3 | −1, 15.747, 26.344, 30.018)
BF8 = BF3 × C(x4 | +1, 11.785, 20.563, 20.571)
BF9 = BF3 × C(x4 | −1, 11.785, 20.563, 20.571)
BF10 = C(x1 | +1, 3.6356, 6.3881, 7.6506) × C(x2 | +1, 19.073, 26.51, 28.665)
BF11 = C(x1 | +1, 3.6356, 6.3881, 7.6506) × C(x2 | −1, 19.073, 26.51, 28.665)
BF12 = BF8 × C(x1 | +1, 9.2894, 9.6658, 14.265)
BF13 = BF8 × C(x1 | −1, 9.2894, 9.6658, 14.265)
BF14 = BF9 × C(x1 | +1, 7.6506, 8.9131, 9.2894)
BF15 = BF9 × C(x1 | −1, 7.6506, 8.9131, 9.2894)
BF16 = BF5 × C(x3 | +1, 2.5749, 5.1498, 15.747)
BF17 = BF5 × C(x3 | −1, 2.5749, 5.1498, 15.747)
BF18 = BF14 × C(x5 | +1, 2.1322, 3.9609, 12.268)
BF19 = BF14 × C(x5 | −1, 2.1322, 3.9609, 12.268)

Table 5. Basis functions of the piecewise-cubic type of MARS model of the underground temperature (five inputs).

BF1 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x3 | +1, 8.357, 16.442, 32.428)
BF2 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x3 | −1, 8.357, 16.442, 32.428)
BF3 = C(x2 | +1, 7.763, 9.83, 25.869)
BF4 = C(x2 | −1, 7.763, 9.83, 25.869)
BF5 = BF4 × C(x3 | +1, 32.428, 48.414, 74.478)
BF6 = BF4 × C(x3 | −1, 32.428, 48.414, 74.478)
BF7 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x2 | +1, 5.114, 5.696, 7.763)
BF8 = C(x1 | +1, 5.866, 11.732, 17.503) × C(x2 | −1, 5.114, 5.696, 7.763)
BF9 = C(x1 | −1, 17.503, 23.274, 31.139)
BF10 = C(x1 | −1, 5.866, 11.732, 17.503) × C(x2 | +1, 0.239, 0.478, 2.505)
BF11 = C(x1 | −1, 5.866, 11.732, 17.503) × C(x2 | −1, 0.239, 0.478, 2.505)
BF12 = C(x1 | +1, 17.503, 23.274, 31.139) × C(x2 | +1, 2.505, 4.532, 5.114)
BF13 = C(x1 | +1, 17.503, 23.274, 31.139) × C(x2 | −1, 2.505, 4.532, 5.114)
BF14 = BF11 × C(x3 | +1, −0.155, 0.272, 8.357)
BF15 = BF11 × C(x3 | −1, −0.155, 0.272, 8.357)
BF16 = BF5 × C(x1 | +1, 31.139, 39.004, 48.437)
BF17 = BF5 × C(x1 | −1, 31.139, 39.004, 48.437)

Table 6. Basis functions of the piecewise-cubic type of MARS model of the syngas calorific value (three inputs).
Temperature (°C) = 851.31 + 3036.4 × BF1 + 11.023 × BF2 + 21.043 × BF3 + 7.8072 × BF4 − 7.1445 × BF5 − 28.58 × BF6 + 1.264 × BF7 − 483.77 × BF8 − 2.2149 × BF9 − 6.5355 × BF10 + 0.25511 × BF11 + 33.098 × BF12 + 57.057 × BF13 − 1.0868 × BF14 + 0.2452 × BF15 − 4.2728 × BF16 − 8.3842 × BF17 + 0.56517 × BF18 + 0.74386 × BF19 (34)

Calorific value (MJ/Nm³) = 10.969 − 0.003224 × BF1 − 0.13121 × BF2 − 0.63223 × BF3 − 0.51773 × BF4 + 0.008499 × BF5 − 0.0075648 × BF6 + 0.04656 × BF7 + 0.050259 × BF8 + 0.43584 × BF9 − 0.13449 × BF10 − 1.2256 × BF11 − 0.043978 × BF12 − 0.08991 × BF13 + 0.22099 × BF14 + 1.4463 × BF15 + 0.00078717 × BF16 + 0.0011077 × BF17 (35)

The piecewise-linear type of MARS model for the underground temperature prediction is represented by equation (34); the corresponding basis functions are shown in Table 7. The performance index obtained during the training of this MARS model was PI = 3.66, and the coefficient of determination was the highest in this case (r²_yY = 0.44). The piecewise-linear type of the MARS model for the syngas calorific value prediction is represented by equation (35); the corresponding basis functions are shown in Table 8. The performance index obtained during the training of this MARS model was PI = 8.74, and the coefficient of determination was the highest in this case (r²_yY = 0.76).

For the comparison with the other methods, this paper presents the behaviour of the prediction only for the piecewise-cubic type of MARS models, because the best results on untrained data were achieved with them. The best prediction of the calorific value and the underground temperature by the piecewise-cubic type of the MARS model, where 10 % of the experiment was used for the test of the prediction, is shown in Figure 12 and Figure 13. The black vertical line divides the prediction into the training and testing parts. It can be said that better predictions with the MARS model were achieved in the case of the underground temperature. The experimental results demonstrate that the piecewise-cubic type of MARS model is better than the piecewise-linear type in both the temperature and the calorific value prediction.

4.3. Prediction by the Support Vector Regression
The SVR model has been trained on the predictor data similarly to the previous method. The predictor data were mapped using three kernel functions, and the SMO method was used for the objective-function minimization. A training data table was used, where one row of the table represented one observation and the individual columns were the predictors x; the table contains one additional column for the response variable y. The standardized predictor matrix has been used for the training. The standardization was performed using the corresponding weighted means of the predictors and the weighted standard deviations.

BF1 = max(0, x4 − 20.579)
BF2 = max(0, 20.579 − x4)
BF3 = max(0, x2 − 11.636)
BF4 = max(0, 11.636 − x2)
BF5 = max(0, 6.3881 − x1)
BF6 = max(0, x3 − 26.344)
BF7 = max(0, 26.344 − x3)
BF8 = BF3 × max(0, x4 − 20.563)
BF9 = BF3 × max(0, 20.563 − x4)
BF10 = max(0, x1 − 6.3881) × max(0, x2 − 26.51)
BF11 = max(0, x1 − 6.3881) × max(0, 26.51 − x2)
BF12 = BF8 × max(0, x1 − 9.6658)
BF13 = BF8 × max(0, 9.6658 − x1)
BF14 = BF9 × max(0, x1 − 8.9131)
BF15 = BF9 × max(0, 8.9131 − x1)
BF16 = BF5 × max(0, x3 − 5.1498)
BF17 = BF5 × max(0, 5.1498 − x3)
BF18 = BF14 × max(0, x5 − 3.9609)
BF19 = BF14 × max(0, 3.9609 − x5)

Table 7. Basis functions of the piecewise-linear type of MARS model of the underground temperature (five inputs).
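A piecewise-linear MARS model is thus just a weighted sum of hinge terms (33). The sketch below evaluates the first five basis functions of Table 7 and the corresponding part of model (34); it is deliberately truncated for brevity (the full model uses all 19 BFs), with x1...x5 denoting the CO, CO2, H2, CH4, and O2 concentrations in vol. %.

```python
def hinge(u):
    """Positive part max(0, u) from Eq. (33)."""
    return u if u > 0.0 else 0.0

def temperature_first_terms(x1, x2, x3, x4, x5):
    """Intercept plus the BF1-BF5 terms of model (34); basis per Table 7.
    x3 and x5 only enter through later BFs, omitted in this truncated sketch."""
    bf1 = hinge(x4 - 20.579)        # BF1 = max(0, x4 - 20.579)
    bf2 = hinge(20.579 - x4)        # BF2 = max(0, 20.579 - x4)
    bf3 = hinge(x2 - 11.636)        # BF3 = max(0, x2 - 11.636)
    bf4 = hinge(11.636 - x2)        # BF4 = max(0, 11.636 - x2)
    bf5 = hinge(6.3881 - x1)        # BF5 = max(0, 6.3881 - x1)
    return (851.31 + 3036.4 * bf1 + 11.023 * bf2
            + 21.043 * bf3 + 7.8072 * bf4 - 7.1445 * bf5)

print(temperature_first_terms(x1=8.0, x2=15.0, x3=20.0, x4=18.0, x5=3.0))
```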
BF1 = max(0, x1 − 11.732) × max(0, x3 − 16.442)
BF2 = max(0, x1 − 11.732) × max(0, 16.442 − x3)
BF3 = max(0, x2 − 9.83)
BF4 = max(0, 9.83 − x2)
BF5 = BF4 × max(0, x3 − 48.414)
BF6 = BF4 × max(0, 48.414 − x3)
BF7 = max(0, x1 − 11.732) × max(0, x2 − 5.696)
BF8 = max(0, x1 − 11.732) × max(0, 5.696 − x2)
BF9 = max(0, 23.274 − x1)
BF10 = max(0, 11.732 − x1) × max(0, x2 − 0.478)
BF11 = max(0, 11.732 − x1) × max(0, 0.478 − x2)
BF12 = max(0, x1 − 23.274) × max(0, x2 − 4.532)
BF13 = max(0, x1 − 23.274) × max(0, 4.532 − x2)
BF14 = BF11 × max(0, x3 − 0.272)
BF15 = BF11 × max(0, 0.272 − x3)
BF16 = BF5 × max(0, x1 − 39.004)
BF17 = BF5 × max(0, 39.004 − x1)

Table 8. Basis functions of the piecewise-linear type of MARS model of the syngas calorific value (three inputs).

Figure 12. Measured and predicted calorific value of syngas by the piecewise-cubic type of MARS model, where 10 % of the experiment and three inputs were used for the test.

Figure 13. Measured and predicted underground temperature by the piecewise-cubic type of MARS model, where 10 % of the experiment and five inputs were used for the test.

Predictors are less sensitive to the scale on which they are measured when the standardization is used. Similarly, a table for the test of the model on untrained data was prepared. As kernel functions, the Linear, Gaussian, and Polynomial kernels were used (see Table 1). The final values of α were stored in the memory of the computer. Table 9 shows the results of applying the various types of kernel function in the SVR, where 10 % of the UCG experiment was used for the test.

| Target | Kernel | Inputs | r_yY tr | r²_yY tr | RRMSE % tr | PI tr | MAPE % tr | MSE tr | RMSE tr | Time (s) | r_yY te | r²_yY te | RRMSE % te | PI te | MAPE % te | MSE te | RMSE te |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | Linear | CO, CO2 | 0.2794 | 0.0781 | 8.2694 | 6.4635 | 6.8399 | 6917.6854 | 83.1726 | 0.4813 | 0.3650 | 0.1332 | 8.4538 | 6.1933 | 7.2103 | 6578.6165 | 81.1087 |
| Temperature | Linear | CO, CO2, H2, CH4, O2 | 0.2945 | 0.0867 | 7.8480 | 6.0625 | 6.4595 | 6230.6605 | 78.9345 | 0.4821 | 0.4754 | 0.2260 | 8.4061 | 5.6975 | 6.8881 | 6504.5925 | 80.6511 |
| Temperature | Gaussian | CO, CO2 | 0.5888 | 0.3466 | 6.6362 | 4.1770 | 4.8036 | 4455.0438 | 66.7461 | 0.6605 | 0.2053 | 0.0421 | 12.4967 | 10.3683 | 10.4411 | 14375.3449 | 119.8972 |
| Temperature | Gaussian | CO, CO2, H2, CH4, O2 | 0.9096 | 0.8274 | 3.4414 | 1.8022 | 1.9718 | 1198.0793 | 34.6133 | 0.6688 | 0.6913 | 0.4779 | 7.9051 | 4.6740 | 6.9661 | 5752.3317 | 75.8441 |
| Temperature | Polynomial | CO, CO2 | 0.2518 | 0.0634 | 8.0087 | 6.3977 | 6.2560 | 6488.3474 | 80.5503 | 2.1646 | 0.1704 | 0.0290 | 12.7474 | 10.8914 | 10.2197 | 14957.7685 | 122.3020 |
| Temperature | Polynomial | CO, CO2, H2, CH4, O2 | 0.5531 | 0.3059 | 6.9284 | 4.4609 | 4.8213 | 4855.9229 | 69.6845 | 3.4230 | 0.0772 | 0.0060 | 16.7335 | 15.5342 | 14.1215 | 25774.9745 | 160.5459 |
| Calorific value | Linear | Air, O2 | 0.6927 | 0.4798 | 24.5832 | 14.5231 | 31.2777 | 5.2323 | 2.2874 | 1.5294 | 0.3222 | 0.1038 | 19.2421 | 14.5531 | 20.0508 | 4.9617 | 2.2275 |
| Calorific value | Linear | Air, O2, Outlet pressure | 0.7061 | 0.4986 | 24.2518 | 14.2145 | 31.6478 | 5.0922 | 2.2566 | 1.5539 | 0.3850 | 0.1482 | 16.1073 | 11.6297 | 16.1572 | 3.4767 | 1.8646 |
| Calorific value | Gaussian | Air, O2 | 0.7868 | 0.6190 | 21.0848 | 11.8006 | 22.3990 | 3.8491 | 1.9619 | 0.5178 | 0.4171 | 0.1740 | 18.9023 | 13.3387 | 19.4816 | 4.7880 | 2.1881 |
| Calorific value | Gaussian | Air, O2, Outlet pressure | 0.9130 | 0.8335 | 13.7896 | 7.2085 | 9.9955 | 1.6463 | 1.2831 | 0.5446 | 0.5997 | 0.3596 | 13.1471 | 8.2184 | 11.3808 | 2.3162 | 1.5219 |
| Calorific value | Polynomial | Air, O2 | 0.7042 | 0.4959 | 25.9054 | 15.2012 | 32.7633 | 5.8103 | 2.4105 | 109.8509 | 0.3785 | 0.1433 | 20.8256 | 15.1075 | 23.5170 | 5.8119 | 2.4108 |
| Calorific value | Polynomial | Air, O2, Outlet pressure | 0.6500 | 0.4225 | 35.7762 | 21.6826 | 44.4540 | 11.0817 | 3.3289 | 121.1258 | 0.2066 | 0.0427 | 28.6155 | 23.7158 | 32.3065 | 7.9859 | 3.3126 |

Table 9. Results of simulations with SVR where 10 % of the experiment was used to test (tr = training, te = testing).

The table shows that the best results were obtained with the Gaussian kernel function, regarding all input variables. This result was obtained for the temperature prediction as well as for the syngas calorific value prediction. It can be seen that the temperature model with five input variables gives the best tightness with the real measured data (r²_yY = 0.82 for training and r²_yY = 0.47 for the test). Also, the performance index calculated for the training and testing reached the lowest value in this case (PI = 1.80 for training and PI = 4.67 for the test). The best performance of the prediction is also indicated by the other statistical parameters. Similarly, the best result for the SVR model of the syngas calorific value was obtained when three inputs and the Gaussian kernel were used. The calorific value model with three inputs gives the best tightness with the real measured data (r²_yY = 0.83 for training and r²_yY = 0.35 for the test). Also, the performance index calculated during the training and testing reached the lowest value in this case (PI = 7.20 for training and PI = 8.21 for the test). The best quality of the prediction is also indicated by the other statistical parameters. The worst results for the prediction on untrained data were obtained using the polynomial kernel, both for the temperature model and for the calorific value model. Figure 14 and Figure 15 show the best prediction of the calorific value and the underground temperature by the SVR on untrained data, where 10 % of the experiment was used for the test. The black vertical line in the figures divides the prediction into the training and testing phases. Even with this method, the results were better for the temperature prediction.

Figure 14. Measured and predicted calorific value of syngas by SVR with the Gaussian kernel function, where 10 % of the experiment and three inputs were used for the test.

Figure 15. Measured and predicted underground temperature by SVR with the Gaussian kernel function, where 10 % of the experiment and five inputs were used for the test.
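A minimal sketch of the ε-SVR setup described above follows: standardized predictors and a Gaussian (RBF) kernel, here using scikit-learn, whose SVR solves the same convex problem with an SMO-type algorithm. The synthetic data and the hyperparameters C and ε are assumptions; the models in this paper were tuned on the real UCG measurements.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Stand-ins for air flow, O2 flow, and outlet pressure vs. calorific value.
rng = np.random.default_rng(1)
X = rng.uniform(0, 30, size=(1000, 3))
y = 5.0 + 0.25 * X[:, 1] - 0.05 * X[:, 2] + rng.normal(0, 0.4, size=1000)

split = int(0.9 * len(X))                     # last 10 % kept for testing
model = make_pipeline(StandardScaler(),       # standardize the predictors
                      SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
print("test RMSE:", round(rmse, 3))
```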
4.4. Overall Results
A ranking of the results from the three evaluated methods is presented in Table 10 and Table 11. These tables show the comparison of the best results when two variants of observations (i.e., input variables) of each predicted target were used. In the training phase, the model was verified on the training data in order to predict the target variable. The results from the training phase of each method are shown in Table 10. It can be seen that the SVR model with the Gaussian kernel fits the measured target data best in the case of modelling the temperature with five observations. The SVR model also achieved a better performance when using two input variables. The other interesting results were obtained with the piecewise-linear type of the MARS model, both in the case of two and of five observations. However, the MARS models consumed more time than the BPNN and the SVR.

When fitting the calorific value, the SVR with the Gaussian kernel also reached the best performance; this was the case with three input variables. In the case of model fitting with two observations, slightly better results for training were achieved when the piecewise-linear type of the MARS model was used; the MARS model also had the highest time consumption during the training in this case. Fitting the model of the calorific value reached, on average, a worse performance than in the case of the temperature. The higher performance index in the training phase is due to the higher variability of the inputs and the low correlation between the inputs and the target. The results of fitting with the BPNN take the third place because of the worst performance index (PI), both in the case of the calorific value and of the underground temperature. In general, to improve the performance index in the training phase, it is suggested to use more input variables.
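For reference, the statistical indicators reported in Tables 3–11 can be computed as in the sketch below. The definitions are assumed from their usage in this paper: RRMSE as the RMSE relative to the mean of the measured target, and the performance index PI = RRMSE/(1 + r) of Gandomi and Roke [48]; e.g., RRMSE = 3.4414 % and r = 0.9096 reproduce the training PI = 1.8022 of the SVR temperature model.

```python
import numpy as np

def indicators(y, y_hat):
    """Indicators used in the result tables (assumed definitions, see text)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mse = float(np.mean((y - y_hat) ** 2))
    rmse = mse ** 0.5
    rrmse = 100.0 * rmse / float(np.mean(y))      # relative RMSE in %
    r = float(np.corrcoef(y, y_hat)[0, 1])        # correlation r_yY
    return {"r": r, "r2": r * r, "RRMSE": rrmse,
            "PI": rrmse / (1.0 + r),              # Gandomi-Roke index
            "MAPE": 100.0 * float(np.mean(np.abs((y - y_hat) / y))),
            "MSE": mse, "RMSE": rmse}
```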
| Predicted variable | Method | Observations | r_yY | r²_yY | RRMSE (%) | PI | MAPE (%) | MSE | RMSE | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | CO, CO2 | 0.2043 | 0.0417 | 8.0649 | 6.6967 | 6.6452 | 6579.6878 | 81.1153 | 6.8560 |
| Temperature | BPNN, Layers: 1, Neurons (L1): 11 | CO, CO2, H2, CH4, O2 | 0.4210 | 0.1772 | 7.4552 | 5.2466 | 6.1409 | 5622.4662 | 74.9831 | 0.7461 |
| Temperature | MARS, piecewise-linear, 16 BFs | CO, CO2 | 0.4985 | 0.2485 | 7.1080 | 4.7434 | 5.6262 | 5110.9554 | 71.4909 | 12.1525 |
| Temperature | MARS, piecewise-linear, 20 BFs | CO, CO2, H2, CH4, O2 | 0.6666 | 0.4444 | 6.1119 | 3.6673 | 4.7285 | 3778.9200 | 61.4729 | 17.8482 |
| Temperature | SVR, Gaussian kernel | CO, CO2 | 0.5888 | 0.3466 | 6.6362 | 4.1770 | 4.8036 | 4455.0438 | 66.7461 | 0.6605 |
| Temperature | SVR, Gaussian kernel | CO, CO2, H2, CH4, O2 | 0.9096 | 0.8274 | 3.4414 | 1.8022 | 1.9718 | 1198.0793 | 34.6133 | 0.6688 |
| Calorific value | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | Air, O2 | 0.7383 | 0.5451 | 22.6638 | 13.0378 | 28.3859 | 4.4472 | 2.1088 | 7.1023 |
| Calorific value | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | Air, O2, Outlet pressure | 0.7392 | 0.5464 | 22.6569 | 13.0271 | 28.0108 | 4.4444 | 2.1082 | 7.5802 |
| Calorific value | MARS, piecewise-linear, 15 BFs | Air, O2 | 0.7991 | 0.6386 | 20.1965 | 11.2256 | 23.0084 | 3.5316 | 1.8792 | 9.1635 |
| Calorific value | MARS, piecewise-linear, 17 BFs | Air, O2, Outlet pressure | 0.8730 | 0.7621 | 16.3866 | 8.7489 | 16.5883 | 2.3249 | 1.5247 | 13.4410 |
| Calorific value | SVR, Gaussian kernel | Air, O2 | 0.7868 | 0.6190 | 21.0848 | 11.8006 | 22.3990 | 3.8491 | 1.9619 | 0.5178 |
| Calorific value | SVR, Gaussian kernel | Air, O2, Outlet pressure | 0.9130 | 0.8335 | 13.7896 | 7.2085 | 9.9955 | 1.6463 | 1.2831 | 0.5446 |

Table 10. Overall results from the training phase.

| Predicted variable | Method | Observations | r_yY | r²_yY | RRMSE (%) | PI | MAPE (%) | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Temperature | BPNN, Layers: 2, Neurons (L1:L2): 800:8 | CO, CO2 | 0.6755 | 0.4563 | 8.5075 | 5.0775 | 7.4526 | 6662.3759 | 81.6234 |
| Temperature | BPNN, Layers: 1, Neurons (L1): 11 | CO, CO2, H2, CH4, O2 | 0.6787 | 0.4606 | 7.4301 | 4.4261 | 5.8129 | 5081.7732 | 71.2866 |
| Temperature | MARS, piecewise-linear, 16 BFs | CO, CO2 | 0.3282 | 0.1077 | 10.6824 | 8.0427 | 9.1629 | 10504.3010 | 102.4905 |
| Temperature | MARS, piecewise-cubic, 20 BFs | CO, CO2, H2, CH4, O2 | 0.6031 | 0.3637 | 6.4275 | 4.0094 | 5.1623 | 3802.8747 | 61.6675 |
| Temperature | SVR, Linear kernel | CO, CO2 | 0.3650 | 0.1332 | 8.4538 | 6.1933 | 7.2103 | 6578.6165 | 81.1087 |
| Temperature | SVR, Gaussian kernel | CO, CO2, H2, CH4, O2 | 0.6913 | 0.4779 | 7.9051 | 4.6740 | 6.9661 | 5752.3317 | 75.8441 |
| Calorific value | BPNN, Layers: 1, Neurons (L1): 5 | Air, O2 | 0.0401 | 0.0016 | 19.8082 | 19.0445 | 22.9766 | 5.2579 | 2.2930 |
| Calorific value | BPNN, Layers: 1, Neurons (L1): 7 | Air, O2, Outlet pressure | 0.7187 | 0.5166 | 15.8219 | 9.2055 | 16.8804 | 3.3546 | 1.8316 |
| Calorific value | MARS, piecewise-linear, 15 BFs | Air, O2 | 0.0419 | 0.0018 | 80.7365 | 77.4897 | 30.3193 | 87.3500 | 9.3461 |
| Calorific value | MARS, piecewise-cubic, 18 BFs | Air, O2, Outlet pressure | 0.4386 | 0.1924 | 17.4620 | 12.1382 | 17.8493 | 4.0861 | 2.0214 |
| Calorific value | SVR, Gaussian kernel | Air, O2 | 0.4171 | 0.1740 | 18.9023 | 13.3387 | 19.4816 | 4.7880 | 2.1881 |
| Calorific value | SVR, Gaussian kernel | Air, O2, Outlet pressure | 0.5997 | 0.3596 | 13.1471 | 8.2184 | 11.3808 | 2.3162 | 1.5219 |

Table 11. Overall results from the testing phase.
The results from the test phase are shown in Table 11. These results are not as consistent as in the training phase, especially for the temperature prediction, where the best results are scattered over the individual methods. In the testing phase, the model was verified on untrained data in order to predict the target variable. The time consumption was not evaluated in the testing phase.

For the temperature prediction with five input variables, the best result in terms of the performance index was obtained with the piecewise-cubic type of the MARS model with 20 BFs. The worst result in terms of the performance index was obtained by the SVR with the Gaussian kernel and five input variables. When the SVR with five inputs was used, the predicted temperature value correlated the best with the measured one. With two input variables, the best results were obtained by the BPNN and the worst by the piecewise-linear type of the MARS model.

For the calorific value prediction with three input variables, the results with the lowest performance index were obtained with the SVR and the Gaussian kernel. The results obtained with the BPNN take the second place; when the BPNN with three inputs was used, the predicted calorific value correlated the best with the measured one. The worst results in terms of the performance index were obtained by the piecewise-cubic type of the MARS model with 18 BFs and three inputs. With two input variables, the best results were obtained by the SVR with the Gaussian kernel and the worst by the piecewise-linear type of the MARS model. The prediction of the calorific value reached, on average, a higher performance index than the prediction of the underground temperature.

5. Summary and Conclusions
In this paper, three approaches were examined in order to find the best prediction method for the UCG data soft-sensing. A comparison of methods suitable for predicting the UCG data has not been published yet.
In the UCG process, it is complicated to predict some process variables, because it is not possible to see the state of a process that runs in an inaccessible environment. The goal was to find a valid data-driven learning method that allows estimating the underground temperature or the syngas calorific value from other measurable process variables. Predicting these variables will make it possible to control the UCG process more efficiently. In this paper, only a small number of measurable input variables from one UCG experiment were used to obtain a comparison of the learning methods. All applied methods considered only one output variable. The resulting MARS model can be stored in a PC and is even portable as an analytic equation, and the impact of each predictor can clearly be seen (i.e., the model is easier for humans to understand). In MARS, the prediction is based on a simple and quick calculation of the MARS model formula. In SVR, each variable is multiplied by the corresponding element of each support vector, which can be a slow process if there are many variables and a large number of support vectors. The individual SVR models have been obtained by using the ε-SVR. Applying the kernel trick in the SVR allows modelling expert knowledge of the UCG process. The SVR is defined as a convex optimization problem with no local minima, and therefore, effective optimization methods, such as SMO, can be used. In the case of NNs, the training is often complicated because there is always a risk of getting stuck in a local minimum of the error function. The learning of NNs is also highly complicated by the search for a high number of weights in a multidimensional space. In MARS and SVR, it is necessary to package program code that provides the prediction with the optimized weights or support vectors. It can be said that all three methods achieved satisfactory results in terms of the underground temperature and syngas calorific value prediction. Regarding the training, the SVR with the Gaussian kernel was the winner; this model matched the measured data best, both in the case of the temperature and of the calorific value. Regarding the prediction, the best result was obtained by the piecewise-cubic type of the MARS model. In these cases, the better results were achieved when all considered input variables of the target variable were used. The results show that a higher number of input variables increases the predictive performance. The obtained results can be applied in the model predictive control of the UCG process.

Acknowledgements
This work was supported by the EC Research Programme of the Research Fund for Coal and Steel (Grant No. RFCR-CT-2013-00002), by the Slovak Grant Agency for Science under grant VEGA 1/0273/17, and by the Slovak Research and Development Agency under the contract No. APVV-14-0892.

References
[1] G. Ökten, V. Didari. Underground gasification of coal. In: Kural, O. (ed.), Coal. Istanbul Technical University, Istanbul, Turkey, pp. 371–378, 1994.
[2] J. Kačur, M. Durdán, M. Laciak, P. Flegner. Impact analysis of the oxidant in the process of underground coal gasification. Measurement 51:147–155, 2014. doi:10.1016/j.measurement.2014.01.036.
[3] M. Sury, M. White, J. Kirton, et al. Review of Environmental Issues of Underground Coal Gasification, Technical Report COAL R272, DTI/Pub URN 04/1880. WS Atkins Consultants Ltd,
Department of Trade and Industry, 2010.
[4] J. Kačur, M. Durdán, G. Bogdanovská. Monitoring and measurement of the process variable in UCG. In SGEM 2016: 16th International Multidisciplinary Scientific GeoConference, Sofia, Bulgaria: STEF92 Technology, pp. 295–302, 2016. doi:10.5593/SGEM2016/B21/S07.038.
[5] M. Durdán, K. Kostúr. Modeling of temperatures by using the algorithm of queue burning movement in the UCG process. Acta Montanistica Slovaca 20(3):181–191, 2015.
[6] K. Kostúr. Mathematical modeling temperature's fields in overburden during underground coal gasification. In ICCC 2014: Proceedings of the 15th International Carpathian Control Conference (ICCC), Velke Karlovice, May 28–30, pp. 248–253. Danvers: IEEE, 2014. doi:10.1109/CarpathianCC.2014.6843606.
[7] M. Koenen, F. Bergen, P. David. Isotope measurements as a proxy for optimising future hydrogen production in underground coal gasification, News in Depth, 2015.
[8] M. Benková, M. Durdán. Statistical analyzes of the underground coal gasification process realized in the laboratory conditions. In SGEM 2016: 16th International Multidisciplinary Scientific GeoConference, Sofia, Bulgaria: STEF92 Technology, pp. 405–412, 2016. doi:10.5593/SGEM2016/B21/S07.052.
[9] L. Fortuna, S. Graziani, A. Rizzo, M. G. Xibilia. Soft Sensors for Monitoring and Control of Industrial Processes. Springer London, 2007. doi:10.1007/978-1-84628-480-9.
[10] T. Ji, H. Shi. Soft sensor modeling for temperature measurement of Texaco gasifier based on an improved RBF neural network. In 2006 IEEE International Conference on Information Acquisition, pp. 1147–1151. IEEE, 2006. doi:10.1109/icia.2006.305907.
[11] A. A. Uppal, A. I. Bhatti, E. Aamir, et al. Control oriented modeling and optimization of one dimensional packed bed model of underground coal gasification. Journal of Process Control 24:269–277, 2014. doi:10.1016/j.jprocont.2013.12.001.
[12] A. A. Uppal, A. I. Bhatti, E. Aamir, et al. Optimization and control of one dimensional packed bed model of underground coal gasification. Journal of Process Control 35:11–20, 2015. doi:10.1016/j.jprocont.2015.08.002.
[13] A. A. Uppal, Y. M. Alsmadi, V. I. Utkin, et al. Sliding mode control of underground coal gasification energy conversion process. IEEE Transactions on Control Systems Technology 26(2):587–598, 2018. doi:10.1109/tcst.2017.2692718.
[14] Q. Wei, D. Liu. Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification. IEEE Transactions on Automation Science and Engineering 11(4):1020–1036, 2014. doi:10.1109/TASE.2013.2284545.
[15] R. Guo, G. X. Cheng, Y. Wang. Texaco coal gasification quality prediction by neural estimator based on dynamic PCA. In Proceedings of the 2006 IEEE International Conference on Mechatronics and Automation, pp. 1298–1302, 2006. doi:10.1109/ICMA.2006.257660.
[16] B. Guo, Y. Shen, F. Zhao. Modelling coal gasification with a hybrid neural network. Fuel 76(12):1159–1164, 1997. doi:10.1016/s0016-2361(97)00122-1.
[17] S. Liu, Z. Hou, C. Yin. Data-driven modeling for fixed-bed intermittent gasification processes by enhanced lazy learning incorporated with relevance vector machine. In 11th IEEE International Conference on Control & Automation (ICCA), pp. 1019–1024. IEEE, 2014. doi:10.1109/icca.2014.6871060.
[18] M. Laciak, J. Kačur, K. Kostúr. The verification of thermodynamic model for UCG process. In ICCC 2016: 17th International Carpathian Control Conference, pp. 424–428, 2016.
doi:10.1109/CarpathianCC.2016.7501135.
[19] M. Laciak, D. Ráškayová. The using of thermodynamic model for the optimal setting of input parameters in the UCG process. In ICCC 2016: 17th International Carpathian Control Conference, pp. 418–423, 2016. doi:10.1109/CarpathianCC.2016.7501134.
[20] A. M. Winslow. Numerical model of coal gasification in a packed bed. Symposium (International) on Combustion 16(1):503–513, 1977. doi:10.1016/s0082-0784(77)80347-0.
[21] P. Ji, X. Gao, D. Huang, Y. Yang. Prediction of syngas compositions in shell coal gasification process via dynamic soft-sensing method. In Proceedings of the 10th IEEE International Conference on Control and Automation (ICCA), pp. 244–249, 2013. doi:10.1109/ICCA.2013.6565140.
[22] I. H. AL-Qinani. Multivariate adaptive regression splines (MARS) heuristic model: Application of heavy metal prediction. International Journal of Modern Trends in Engineering & Research 3(8):223–229, 2016. doi:10.21884/ijmter.2016.3027.7nuqv.
[23] A. Aryafar, R. Gholami, R. Rooki, F. D. Ardejani. Heavy metal pollution assessment using support vector machine in the Shur river, Sarcheshmeh copper mine, Iran. Environmental Earth Sciences 67(4):1191–1199, 2012. doi:10.1007/s12665-012-1565-7.
[24] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representation by error propagation. In: D. E. Rumelhart, J. L. McClelland, and PDP Research Group, Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Vol 1: Foundation, 1987.
[25] G. Sampson, D. E. Rumelhart, J. L. McClelland, T. P. R. Group. Parallel distributed processing: Explorations in the microstructures of cognition. Language 63(4):871, 1987. doi:10.2307/415721.
[26] T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning – Data Mining, Inference, and Prediction, Second Edition. Springer New York, 2009. doi:10.1007/b94608.
[27] V. Kvasnička, Ľ. Beňušková, J. Pospíchal, et al. Úvod do teórie neurónových sietí. IRIS, Bratislava, 1997.
[28] J. Sedláček. Úvod do teorie grafů. Academia, Praha, 1981.
[29] J. H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics 19(1):1–67, 1991. doi:10.1214/aos/1176347963.
[30] P. Sephton. Forecasting recessions: can we do better on MARS?, 2001.
[31] M. Chugh, S. S. Thumsi, V. Keshri. A comparative study between least square support vector machine (LSSVM) and multivariate adaptive regression spline (MARS) methods for the measurement of load storing capacity of driven piles in cohesionless soil.
International Journal of Structural and Civil Engineering Research, 2015. doi:10.18178/ijscer.4.2.189-194.
[32] V. R. Tselykh. Multivariate adaptive regression splines. Machine Learning and Data Analysis 1(3):272–278, 2012. doi:10.21469/22233792.
[33] P. Samui, D. P. Kothari. A multivariate adaptive regression spline approach for prediction of maximum shear modulus and minimum damping ratio. Engineering Journal 16(5):69–78, 2012. doi:10.4186/ej.2012.16.5.69.
[34] W. Zhang, A. T. C. Goh. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geoscience Frontiers 7(1):45–52, 2016. doi:10.1016/j.gsf.2014.10.003.
[35] A. Abraham, D. Steinberg. MARS: Still an alien planet in soft computing? In International Conference on Computational Science – ICCS (Proceedings), Part II, vol. 2, pp. 235–244. Springer Berlin Heidelberg, 2001. doi:10.1007/3-540-45718-6_27.
[36] B. E. Boser, I. M. Guyon, V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory – COLT'92, Pittsburgh, PA, pp. 144–152. ACM Press, 1992. doi:10.1145/130385.130401.
[37] V. N. Vapnik. Constructing learning algorithms. In The Nature of Statistical Learning Theory, pp. 119–166. Springer Verlag, New York, 1995. doi:10.1007/978-1-4757-2440-0_6.
[38] K. R. Müller, A. J. Smola, G. Rätsch, et al. Predicting time series with support vector machines. In Lecture Notes in Computer Science, pp. 999–1004. Springer Berlin Heidelberg, 1997. doi:10.1007/bfb0020283.
[39] J. Kačur, M. Laciak, M. Durdán, P. Flegner. Utilization of Machine Learning Method in Prediction of UCG Data. In ICCC 2017: 18th International Carpathian Control Conference, pp. 1–6. IEEE, 2017. doi:10.1109/carpathiancc.2017.7970411.
[40] MathWorks. Understanding Support Vector Machine Regression. In: Statistics and Machine Learning Toolbox User's Guide (R2018ab). regression.html, 2018.
[41] J. Kačur, K. Kostúr. Approaches to the Gas Control in UCG. Acta Polytechnica 57(3), 2017. doi:10.14311/ap.2017.57.0182.
[42] M. Laciak, K. Kostúr, M. Durdán, et al. The analysis of the underground coal gasification in experimental equipment. Energy 114:332–343, 2016. doi:10.1016/j.energy.2016.08.004.
[43] R. L. I. Dobbs, W. B. Krantz. Combustion front propagation in underground coal gasification, Final Report, Work Performed under Grant No. DE-FG22-86PC90512. University of Colorado, Boulder, Department of Chemical Engineering, 1990. doi:10.2172/6035494.
[44] K. Stańczyk, A. Smoliński, K. Kapusta, et al. Dynamic experimental simulation of hydrogen oriented underground gasification of lignite. Fuel 89(11):3307–3314, 2010. doi:10.1016/j.fuel.2010.03.004.
[45] K. Kostúr, J. Kačur. Developing of optimal control system for UCG. In Proceedings of the 13th International Carpathian Control Conference (ICCC), pp. 347–352. IEEE, 2012. doi:10.1109/carpathiancc.2012.6228666.
[46] K. Kostúr, J. Kačur. Development of control and monitoring system of UCG by Promotic. In 2011 12th International Carpathian Control Conference (ICCC), pp. 215–219. IEEE, 2011. doi:10.1109/carpathiancc.2011.5945850.
[47] A. H. Gandomi, D. A. Roke. Intelligent formulation of structural engineering systems. In Seventh MIT Conference on Computational Fluid and Solid Mechanics – Focus: Multiphysics and Multiscale, Cambridge, USA, 2013.
[48] A. H. Gandomi, D. A. Roke. Assessment of artificial neural network and genetic programming as predictive tools.
Advances in Engineering Software 88:63–72, 2015. doi:10.1016/j.advengsoft.2015.05.007.