Acta Polytechnica Vol. 52 No. 3/2012

A Neural Network Model for Predicting NOx at the Mělník 1 Coal-powder Power Plant

Ivo Bukovský1, Michal Kolovratník2

1 Czech Technical University in Prague, Faculty of Mechanical Engineering, Department of Instrumentation and Control Engineering, Technická 4, 166 07 Prague, Czech Republic

2 Czech Technical University in Prague, Faculty of Mechanical Engineering, Department of Energy Engineering, Technická 4, 166 07 Prague, Czech Republic

Correspondence to: ivo.bukovsky@fs.cvut.cz

Abstract

This paper presents a non-conventional dynamic neural network that was designed for real-time prediction of NOx at the coal powder power plant Mělník 1, and results on real data are shown and discussed. The paper also presents the signal preprocessing techniques, the input-reconfigurable architecture, and the learning algorithm of the proposed neural network, which was designed to handle the non-stationarity of the burning process as well as individual failures of the measured variables. The advantages of our designed neural network over conventional neural networks are discussed.

Keywords: dynamic neural networks, prediction, NOx emissions, signal processing.

1 Introduction

Neural networks (NNs) are a popular and widely studied tool for data-driven nonlinear modeling of complicated systems for which a model cannot be derived by mathematical-physical analysis. Unlike analytical or linear models, NNs are black-box models, or sometimes gray-box models, that require a proper design of their mathematical architecture and an efficient learning algorithm. For the fundamental principles of neural networks we refer, e.g., to [1], and for studies of NNs in energetic processes we refer to the older reviews [2, 3]. More recent works, including studies of conventional NNs in energetic processes, are [5–8], which deal with computational intelligence tools (neural networks, genetic algorithms) focused on biomass combustion. A study of non-conventional neural architectures for modeling steady-state hot steam turbine data and for modeling a large-scale energetic boiler can be found in [9], where the advantages of a static quadratic neural unit (QNU, [1, 4]) and a special quadratic neural network [9] over conventional multilayer perceptron (MLP) neural networks are demonstrated with reference to overfitting and the local minima problem, which are typical drawbacks of MLPs (even those with a single hidden layer). The advantage of the QNU is that its input-output mapping is nonlinear while the model is linear in its parameters [10] (unlike an MLP). This allows us to monitor and maintain adaptation stability by a comprehensible evaluation of the eigenvalues of the weight update system [10], which offers promising opportunities for adaptive monitoring, modeling, and process optimization by adaptive nonlinear control. The QNU can be seen as a component of a higher-order neural network (HONN), sometimes also referred to as a polynomial neural network (PNN). The origins of these neural networks can be traced back to [11–14], while the concept of standalone higher-order neural units (HONUs) as building components of HONNs can be found in [1] and [4]. The fundamental gradient-based learning rules for training dynamic neural networks are Real-Time Recurrent Learning (RTRL) [15] and Back-Propagation Through Time (BPTT) [16, 17]. These algorithms can be made comprehensible and practically useful for real-time computations.
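
To illustrate what "nonlinear mapping, yet linear in its parameters" means for a static QNU, the following minimal sketch builds the upper-triangular quadratic terms of an input vector augmented with a unit and fits the weights by ordinary least squares. It is not the authors' code: the function names `quadratic_features`, `fit_static_qnu`, and `qnu_output`, as well as the synthetic data, are assumptions made only for this illustration.

```python
import numpy as np


def quadratic_features(x):
    """Return all products x_i * x_j (j >= i) of x augmented with a leading 1."""
    xa = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    i, j = np.triu_indices(xa.size)
    return xa[i] * xa[j]


def fit_static_qnu(X, y):
    """Least-squares fit of the QNU weights; possible because y is linear in them."""
    Phi = np.array([quadratic_features(x) for x in X])
    v, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return v


def qnu_output(v, x):
    """Static QNU output: weighted sum of the quadratic terms."""
    return float(v @ quadratic_features(x))


# Synthetic example (hypothetical data): recover a known quadratic mapping.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 0] * X[:, 1] + 0.3 * X[:, 1] ** 2
v = fit_static_qnu(X, y)
max_error = max(abs(qnu_output(v, x) - t) for x, t in zip(X, y))
```

Because the weights enter linearly, the fit is a convex problem with no local minima, in contrast to the weights of an MLP hidden layer.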

In this paper, we present the resulting neural network architecture that has been designed and tested for NOx prediction for a pulverized coal firing boiler at the power plant "Elektrárna Mělník 1 (EME 1)"; the nominal steam load of this boiler is 250 tons per hour. The goal is to design and test a model that does not involve measured O2 or CO in its input and that can potentially be used for optimizing the energetic process with regard to the NOx and CO emissions of the pulverized firing boiler at EME 1. The resulting discrete-time dynamic (recurrent) neural network merges the concept of a conventional recurrent (MLP) neural network with a QNU [4, 9]. The data preprocessing and retraining strategy is described in the next section, followed by the mathematical notation of the neural architecture that led to the results shown in the section on results and discussion.

Figure 1: The data preprocessing before each reconfiguration and retraining of the neural network

2 Data preprocessing and network training

The NOx dynamics of the pulverized firing boiler are highly nonstationary due to the varying technical condition of the boiler, the varying quality of the coal powder, and measurement outages, which often occur on an hourly basis. It was therefore not possible to obtain a neural network model that would reliably predict the NOx emissions from long-term data. To handle the nonstationary nature of the boiler at EME 1, we arrived at the data preprocessing technique sketched in Figure 1, where U(k) is the matrix of the recent history (a retraining window) of all measured input variables (excluding measured O2, NOx, and CO) at a reference time k, given as follows

$$
U(k) =
\begin{bmatrix}
u_1(k - N_{train} + 1) & \dots & u_1(k - 1) & u_1(k) \\
u_2(k - N_{train} + 1) & \dots & u_2(k - 1) & u_2(k) \\
\vdots & & \vdots & \vdots \\
u_n(k - N_{train} + 1) & \dots & u_n(k - 1) & u_n(k)
\end{bmatrix}. \tag{1}
$$
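
As a minimal sketch of how the window in (1) could be cut from a longer record, assume the measurements are stored in an array with one row per variable and one column per sample. The function name `retraining_window` and the toy data are hypothetical and serve only to make the indexing explicit.

```python
import numpy as np


def retraining_window(data, k, n_train):
    """Return U(k): columns k - n_train + 1 .. k of an (n x N) data array."""
    start = k - n_train + 1
    if start < 0 or k >= data.shape[1]:
        raise ValueError("window does not fit inside the recorded data")
    return data[:, start:k + 1]


# Hypothetical record: n = 3 variables, 10 samples.
data = np.arange(30, dtype=float).reshape(3, 10)
U = retraining_window(data, k=7, n_train=4)
```

The last column of `U` holds the samples at the reference time k, and the first column those at k − Ntrain + 1, as in (1).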

The measured input variables in U(k) are the primary, secondary, and tertiary air valves, and optionally also the steam load or the air flow before the ventilator (in total n = 18, 19, or 20 variables), excluding O2, NOx, and CO. The principal component analysis (PCA) block applies PCA to the linearly correlated variables, so the number of input variables in Upca(k) is m < n, which significantly decreases the computational load while preserving the information in the measured input data (note that Figure 1 shows only a simplified sketch; the detailed implementation of PCA, which benefits from process knowledge of the pulverized firing boiler at EME 1, may be provided on the basis of an official request for [19]). The structure of the resulting input data matrix Upca(k) that is used as the neural network input (after the preprocessing shown in Figure 1) is as follows

$$
U_{pca}(k) =
\begin{bmatrix}
u_{pca,1}(k - N_{train} + 1) & \dots & u_{pca,1}(k - 1) & u_{pca,1}(k) \\
\vdots & & \vdots & \vdots \\
u_{pca,m}(k - N_{train} + 1) & \dots & u_{pca,m}(k - 1) & u_{pca,m}(k)
\end{bmatrix}. \tag{2}
$$

The presented data preprocessing technique removes variables with measurement outages. Principal component analysis lowers the computational load because it reduces the number of external inputs to the neural network (m < n). PCA also has a filtering effect, and by reducing redundant and linearly correlated data it contributes to more accurate matrix inversion in the BPTT training technique.
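
The two preprocessing steps described above (dropping variables with outages, then reducing the rest by PCA) can be sketched generically as follows. This is a plain textbook PCA via singular value decomposition, not the process-knowledge-based implementation referred to in [19]. The function name `preprocess_window`, the marking of outages by NaN, and the retained-variance threshold `var_kept` are all assumptions.

```python
import numpy as np


def preprocess_window(U, var_kept=0.99):
    """Drop variables with outages, then project the rest onto principal components.

    U is an (n x N_train) window; an outage is assumed to be marked by NaN.
    Returns the (m x N_train) reduced window and the indices of the kept rows.
    """
    kept = np.flatnonzero(~np.isnan(U).any(axis=1))
    if kept.size == 0:
        raise ValueError("every variable has an outage in this window")
    Z = U[kept]
    Z = Z - Z.mean(axis=1, keepdims=True)        # center each variable
    scale = Z.std(axis=1, keepdims=True)
    Z = Z / np.where(scale > 0, scale, 1.0)      # unit variance where possible
    # SVD of the standardized data: rows of S * Vt are the component scores.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    total = np.sum(S**2)
    if total == 0.0:
        raise ValueError("all remaining variables are constant in this window")
    explained = np.cumsum(S**2) / total
    m = min(int(np.searchsorted(explained, var_kept)) + 1, S.size)
    return S[:m, None] * Vt[:m], kept


# Hypothetical window: five correlated variables driven by two sources.
rng = np.random.default_rng(1)
t = rng.standard_normal((2, 300))
U = np.vstack([t[0], 2.0 * t[0], t[1], -t[1] + 3.0, t[0] + t[1]])
U[3, 100:115] = np.nan                           # simulated measurement outage
U_pca, kept = preprocess_window(U)
```

In this toy window the fourth variable is discarded because of its outage, and the remaining four linearly dependent variables collapse to m = 2 components.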

3 Neural network for NOx prediction

This section describes the mathematical notation of the neural network designed for NOx prediction. The network is a discrete-time recurrent architecture, i.e., a nonlinear difference equation system, composed of a recurrent hidden layer of conventional sigmoid neurons and an output quadratic neural unit, with feedback also from the output to the network input. In particular, the neural network predictive model is given as follows. The window of external inputs for retraining the network at reference time k consists of the preprocessed measured variables Upca(k), as given in (2) and in Figure 1. The external inputs that enter the neural network for the ns-samples-ahead prediction at time k are in the last column of Upca(k), as follows

$$
u_{pca}(k) =
\begin{bmatrix}
u_{pca,1}(k) & u_{pca,2}(k) & \dots & u_{pca,r}(k)
\end{bmatrix}^T, \tag{3}
$$

where k is a reference sample time index and r is the dimension of the vector of measured external inputs after reduction by the PCA method (since (3) is the last column of Upca(k), r equals the number of rows m in (2)). The input vector to the hidden layer of the neural network is given in (4) as

$$
x(k) =
\begin{bmatrix}
y_n(k + n_{ya}) & \dots & y_n(k - n_{yb}) & u_{pca}(k + n_{ua})^T & \dots & u_{pca}(k - n_{ub})^T & \xi(k)
\end{bmatrix}^T, \tag{4}
$$

where yn(.) are step-delayed neural outputs; nya, nyb, nua, and nub are input configuration parameters; and ξ(k) is the step-delayed feedback of the hidden layer outputs. The output of the hidden sigmoidal layer ξ(k + 1) in (6) is calculated using the hidden layer weight matrix W and the classical sigmoid function (5), as follows

$$
\varphi(\nu) = \frac{2}{1 + e^{-\nu}} - 1, \tag{5}
$$

where ξ(k + 1) is augmented with a unit as

$$
\xi(k + 1) =
\begin{bmatrix}
1 \\
\varphi(W \cdot x(k))
\end{bmatrix}, \tag{6}
$$

where the unit provides a bias both for the hidden layer (first column of W) and for the QNU (v0,0 in (7)), so the neural output is calculated by a quadratic neural unit [1, 4, 9, 10], using (3)–(6), as follows

$$
y_n(k + n_s) = \sum_{i=0} \sum_{j=i} v_{i,j} \cdot \xi_i(k + 1) \cdot \xi_j(k + 1). \tag{7}
$$
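
One evaluation of (4)–(7) can be sketched as below. The sketch only mirrors the equations and is not the authors' implementation. The layer sizes, the random weights, the storage of the QNU weights vi,j in an upper-triangular matrix `V`, and all function names are assumptions. The ordering of x(k) follows (4) literally: output history, then input history, then ξ(k).

```python
import numpy as np


def phi(nu):
    """Bipolar sigmoid of (5); algebraically equal to tanh(nu / 2)."""
    return 2.0 / (1.0 + np.exp(-nu)) - 1.0


def hidden_layer(W, x):
    """Equation (6): hidden outputs augmented with a leading unit."""
    return np.concatenate(([1.0], phi(W @ x)))


def qnu_output(V, xi):
    """Equation (7): sum over j >= i of v_ij * xi_i * xi_j."""
    return float(xi @ np.triu(V) @ xi)


def network_step(W, V, y_hist, u_hist, xi_prev):
    """Assemble x(k) as in (4), then evaluate (6) and (7) once.

    u_hist has one row per time step, so ravel() concatenates the input
    vectors in time order, as in (4).
    """
    x = np.concatenate((y_hist, u_hist.ravel(), xi_prev))
    xi_next = hidden_layer(W, x)
    return qnu_output(V, xi_next), xi_next


# Hypothetical sizes: 4 hidden neurons, 3 output lags, 3 input lags, r = 2.
rng = np.random.default_rng(2)
n_hidden, n_y, n_u, r = 4, 3, 3, 2
n_x = n_y + n_u * r + (n_hidden + 1)
W = 0.1 * rng.standard_normal((n_hidden, n_x))
V = 0.1 * rng.standard_normal((n_hidden + 1, n_hidden + 1))
xi = np.concatenate(([1.0], np.zeros(n_hidden)))
y, xi = network_step(W, V, np.zeros(n_y), rng.standard_normal((n_u, r)), xi)
```

In a prediction run, the returned `xi` would be fed back as `xi_prev` at the next sample, and the predicted outputs would fill `y_hist`.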

The proposed dynamic neural network has purposely designed properties that are worth explaining. The hidden recurrent layer of neurons, calculated in (6) as φ(W · x(k)), reduces (by training) the number of the already PCA-preprocessed neural inputs; thus (6) yields the augmented vector of state variables ξ(k + 1), which is fed both forward to the QNU and back to the network input x(k), as in (4). Without the hidden layer, the number of variables fed directly into the QNU would still be too large for the given 1-minute sampling period, because we feed an approximately twelve-minute history of each PCA-preprocessed variable into the network input, i.e., nya − nyb = 12 and nua − nub = 12 (experts have estimated the time constant of this pulverized firing boiler at approximately 12 minutes). The hidden layer (6) also plays a filtering role due to its step-delayed feedback to the network input (4), and this recurrent feedback naturally calls for training by the Back-Propagation Through Time (BPTT) method [15–17], which is a powerful, efficient, and yet practical optimization method, as it can be implemented by a combination of a gradient descent rule and the Levenberg-Marquardt algorithm [18].

The sigmoid function φ(.), which is usually considered the main nonlinearity of conventional neural networks, plays a different role in this dynamic network, because the major nonlinearity is provided by the QNU [9, 10, 18]. The sigmoid function φ(.) limits the outputs of the hidden layer to the range (−1, +1), which assures the stability of the hidden layer (as a discrete-time dynamic system; this could not be assured so simply for continuous-time NNs). Consequently, only bounded values enter the QNU, so its output is also bounded; thus, the stability of the state variables and of the output of the proposed neural network is assured by preserving the sigmoid function in the hidden neurons. As regards the stability of the learning algorithm, and thus its convergence, we proposed a novel approach to weight-update stability for gradient descent training of a QNU in [10]; this approach is applicable to both static and dynamic QNUs, and also to the hidden-layer weight system of this network for NOx prediction.
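
The boundedness argument can be written out explicitly. With ξ0(k + 1) = 1 and |ξi(k + 1)| < 1 for i ≥ 1 by (5) and (6), the triangle inequality applied to (7) gives, for fixed finite weights,

$$
|y_n(k + n_s)| \le \sum_{i=0} \sum_{j=i} |v_{i,j}| \cdot |\xi_i(k + 1)| \cdot |\xi_j(k + 1)| \le \sum_{i=0} \sum_{j=i} |v_{i,j}|,
$$

so the network output remains bounded for any input sequence. This bound concerns only the state variables and the output for given weights; it says nothing about the convergence of the weight updates during training.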

4 Results and discussion

This section shows the results of 3-minute predictions of NOx emissions (in fact, of their 3-minute floating averages) of the pulverized firing boiler at EME 1 by the proposed neural network (ns = 3, 1-minute sampling). The recurrent network does not include measured O2, NOx, or CO in its input; introducing NOx as a measured external input resulted in a prediction failure, because the model learned to blindly follow the previously measured NOx, which is a typical problem with the improper use of neural networks for complicated systems. The continuous prediction run over 24 days, with one-minute sampling and retraining every 30 minutes, is shown in Figure 2.

Figure 2: NOx prediction by the neural network (1)–(7) with re-configurations and re-training (Figure 1)

Figure 3: Detail from Figure 2, a good prediction

Figure 4: Detail from Figure 2, a bad prediction due to outliers in the training data

Figure 2 shows the superimposed 24-day recordings of measured NOx (thick line) and of the three-minute prediction (ns = 3) of NOx (bold line); the three-minute floating average of NOx is predicted. The network (1)–(7) was retrained every 30 minutes by the back-propagation through time algorithm [18] with the blindly selected most recent history of 298 samples (about 5 hours) of the measured process variables (one-minute sampling; the model input excludes measured O2, NOx, and CO). Each retraining took less than 3 minutes of real computation time in Matlab on a PC (Windows 7, i7 processor), which is practical for a real-time retraining implementation.
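
The retraining timing described above can be sketched as index bookkeeping: a new retraining every 30 samples, each on the most recent 298 samples, with a 3-sample floating average as the predicted quantity. The function names, and the choice of a trailing (rather than centered) average, are assumptions, since the text does not specify how the average is aligned.

```python
import numpy as np


def floating_average(y, width=3):
    """Moving average; entry i is the mean of y[i .. i + width - 1]."""
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="valid")


def retraining_schedule(n_samples, n_train=298, period=30):
    """Yield (k, window) pairs: retrain at sample k on samples k - n_train + 1 .. k."""
    for k in range(n_train - 1, n_samples, period):
        yield k, slice(k - n_train + 1, k + 1)


# One day of one-minute samples (hypothetical record length).
schedule = list(retraining_schedule(n_samples=24 * 60))
smoothed = floating_average([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each `window` slice can be applied to the columns of the measured data to obtain the retraining window at sample k. The first retraining is possible only once 298 samples have been recorded.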

The good performance of the NOx prediction is apparent from the detail in Figure 3. Figure 3 also shows a temporary measurement outage (about 15 minutes) after sample k = 2.82E+4; the output of the dynamic neural network substitutes for the missing measurement, and the good prediction in this period depends on the availability of good retraining data. However, this kind of NOx outage affects retraining, see Figure 4. The accuracy and reliability of the NOx prediction depend significantly on the retraining data (here, the last 298 samples before each predicted value). The impact of NOx outliers is apparent if we compare the prediction details in Figure 3 and Figure 4, and it is clear that another signal processing technique for selecting the retraining data needs to be involved in order to avoid NOx outliers in the retraining data; the neural network fails in prediction after k = 3.122E+4 because of the poor retraining data and also because of the outliers at k = 3.12E+4. The prediction becomes correct again for k > 3.133E+4, because the related retraining data no longer include the outliers.
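
The text leaves the technique for selecting outlier-free retraining data open. Purely as an illustration of one generic option, and not as the authors' method, the sketch below flags samples of the NOx target that lie far from the window median in units of the median absolute deviation (MAD). The function name `flag_outliers`, the threshold value, and the marking of outages by NaN are assumptions.

```python
import numpy as np


def flag_outliers(y, threshold=5.0):
    """Flag outages (NaN) and samples far from the window median in MAD units."""
    y = np.asarray(y, dtype=float)
    bad = np.isnan(y)                        # outages are assumed to be NaN
    if bad.all():
        return bad
    med = np.median(y[~bad])
    mad = np.median(np.abs(y[~bad] - med))
    # 1.4826 * MAD estimates the standard deviation for Gaussian data;
    # fall back to a unit scale if the window is (almost) constant.
    scale = 1.4826 * mad if mad > 0 else 1.0
    deviation = np.abs(np.where(bad, med, y) - med)
    return bad | (deviation > threshold * scale)


# Hypothetical 298-sample NOx window with one spike and one outage.
k = np.arange(298)
nox = 200.0 + 10.0 * np.sin(2.0 * np.pi * k / 60.0)
nox[50] = 900.0
nox[120] = np.nan
bad = flag_outliers(nox)
```

Samples flagged in `bad` could be excluded from the retraining window, or the retraining could be skipped when too many samples are flagged; whether this would suffice for the EME 1 data is not established here.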

5 Conclusions

We designed and tested a non-conventional recurrent neural network for predicting the NOx emissions of a pulverized firing boiler without using O2, NOx, or CO in the model input. The proposed method handles process non-stationarity by frequent retraining, and it handles outages of the input process variables by input data preprocessing. It does not yet handle outages of the predicted NOx itself; we assume that this can be resolved by automatically supervised selection of training data in which NOx outliers do not appear, and by avoiding unnecessary retraining.

Acknowledgement

This work has been supported by grant MPO FR-TI1/538, and in part by grant SGS10/252/OHK2/3T/12.

References

[1] Gupta, M. M., Liang, J., Homma, N.: Static and Dynamic Neural Networks: From Fundamentals to
Advanced Theory. IEEE Press and Wiley-Interscience, John Wiley & Sons, Inc., 2003.

[2] Kalogirou, S. A.: Artificial intelligence for the modeling and control of combustion processes: a review,
Progress in Energy and Combustion Science, 29, 2003, p. 515–566, Elsevier. ISSN 0360-1285.

[3] Mellit, A., Kalogirou, S. A.: Artificial intelligence techniques for photovoltaic applications: A review,
Progress in Energy and Combustion Science, 34, 2008, p. 574–632, Elsevier. ISSN 0360-1285.

[4] Bukovsky, I., Bila, J., Gupta, M. M., Hou, Z.-G., Homma, N.: Foundation and Classification of Nonconventional Neural Units and Paradigm of Nonsynaptic Neural Interaction, in Discoveries and Breakthroughs in Cognitive Informatics and Natural Intelligence, in the ACINI book series, ed. by Yingxu Wang, University of Calgary, Canada: IGI Publishing, Hershey PA, USA, 2009. ISBN 978-1-60566-902-1.

[5] Pitel’, J., Mižák, J.: Approximation of CO/Lambda Biomass Combustion Dependence by Artificial Intelli-
gence Techniques. In Annals of DAAAM for 2011 & Proceedings of the 22nd International DAAAM Sym-
posium, Vienna, Austria, 23–26th November 2011. Vienna : DAAAM International, 2011, p. 0143–0144.
ISBN 978-3-901509-83-4, ISSN 1726-9679.

[6] Mižák, J., Pitel’, J.: Using Artificial Neural Networks for Biomass Combustion Process Control. In
Proceedings of 2nd International Seminar “System Analysis, Control and Information Processing”, Di-
vnomorskoje, Russia. [CD-ROM]. Rostov on Don: Don State Technical University, 2011, p. 343–348. ISBN
978-5-7890-0666-5.

[7] Pitel’, J., Boržíková, J., Mižák, J.: Biomass Combustion Process Control Using Artificial Intelligence
Techniques. In Proceedings of XXXVth Seminar ASR’2010 “Instruments and Control”. Ostrava : VŠB-TU
Ostrava, 2010, p. 317–321. ISBN 978-80-248-2191-7.

[8] Hošovský, A.: Genetic Optimization of Neural Networks Structure for Modeling of Biomass-Fired Boiler
Emissions. Journal of Applied Science in Thermodynamics and Fluid Mechanics, Vol. 9, No. 2, 2011, p. 1–6.
ISSN 1802-9388.

[9] Bukovský, I., Lepold, M., Bílá, J.: Quadratic Neural Unit and its Network in Validation of Process Data
of Steam Turbine Loop and Energetic Boiler, WCCI 2010, IEEE Int. Joint. Conf. on Neural Networks
IJCNN, Barcelona, Spain, 2010.


[10] Bukovský, I., Bílá, J., Noriasu, H., Rodriguez, R.: Prospects of Gradient Methods for Nonlinear Control,
Automatizácia a riadenie v teórii a praxi ARTEP 2012, Slovakia 2012. ISBN 978-80-553-0835-7.

[11] Ivakhnenko, A. G.: Polynomial Theory of Complex Systems, IEEE Tran. on Systems. Man and Cybernetics.
Vol. SMC-1, 4, 1971, p. 364–378.

[12] Nikolaev, N. Y., Iba, H.: Learning Polynomial Feedforward Neural Network by Genetic Programming and
Backpropagation, IEEE Trans. on Neural Networks, Vol. 14, No. 2, March, 2003, p. 337–350.

[13] Taylor, J. G., Coombes, S.: Learning higher order correlations, Neural Networks, 6, 1993, p. 423–428.

[14] Kosmatopoulos, E., Polycarpou, M., Christodoulou, M., Ioannou, P.: High-Order Neural Network Struc-
tures for Identification of Dynamical Systems, IEEE Trans. on Neural Networks, Vol. 6, No. 2, March 1995,
p. 422–431.

[15] Williams, R. J., Zipser, D.: A learning algorithm for continually running fully recurrent neural networks,
Neural Comput., Vol. 1, 1989, p. 270–280.

[16] Werbos, P. J.: Backpropagation through time: What it is and how to do it, Proc. IEEE, Vol. 78, No. 10,
Oct. 1990, p. 1550–1560. ISSN 0018-9219.

[17] Pearlmutter, B. A.: Gradient calculation for dynamic recurrent neural networks: a survey. IEEE Transactions
on Neural Networks, 6, 5, 1995, p. 1212–1228. doi: 10.1109/72.410363.

[18] Gupta, M. M., Bukovský, I., Noriasu, H., Solo, M. G., Hou, Z.-G.: Fundamentals of Higher Order Neural
Networks for Modeling and Simulation, In Artificial Higher Order Neural Networks for Modeling and
Simulation, ed. M. Zhang, IGI Global, 2012, (accepted, to appear in 2012).

[19] Bukovský, I., Křehlík, K.: Testy neuronového modelu kotle elektrárny Mělník I. Výzkumná zpráva č. 6 –
ZI00069/E06. Ústav přístrojové a řídící techniky, Fakulta Strojní, ČVUT v Praze, 2011.
