Decision Making: Applications in Management and Engineering
Vol. 5, Issue 1, 2022, pp. 154-168.
ISSN: 2560-6018, eISSN: 2620-0104
DOI: https://doi.org/10.31181/dmame0313052022d

LONG SHORT-TERM MEMORY NEURAL NETWORKS FOR CLOGGING DETECTION IN THE SUBMERGED ENTRY NOZZLE

Ana P. M. Diniz1, Patrick M. Ciarelli1, Evandro O. T. Salles1, and Klaus F. Coco1

1 Universidade Federal do Espírito Santo, Vitória, Espírito Santo, Brazil

* Corresponding author. E-mail addresses: ana.diniz@edu.ufes.br (A. P. M. Diniz), patrick.ciarelli@ufes.br (P. M. Ciarelli), evandro@ufes.br (E. O. T. Salles), klaus.coco@ufes.br (K. F. Coco)

Received: 2 April 2022; Accepted: 8 May 2022; Available online: 13 May 2022.

Original scientific paper

Abstract: Clogging of the Submerged Entry Nozzle (SEN), which controls the steel flow in continuous casting, is one of the main problems faced by the steelmaking process, since it increases the frequency of interruptions in the operation for the maintenance and/or exchange of equipment. Although it is a problem inherent to the process, failing to identify clogging can result in losses in process yield and compromise product quality. In order to detect occurrences of clogging in a real steel plant from historical data of process variables, this paper tests and discusses different models of Long Short-Term Memory (LSTM) neural networks. The overall performance of the classifiers developed here showed very promising results on real data with class imbalance.

Key words: Continuous Casting, Submerged Entry Nozzle, Clogging, LSTM, Deep Learning.

1. Introduction

One of the problems faced by the industry concerning the continuous casting process is the accumulation of steel impurities that forms on the Submerged Entry Nozzle (SEN) of the tundish, causing its obstruction, known as clogging (Ikaheimonen, 2002).
As evidenced by Ikaheimonen (2002), clogging can be caused by several factors, including metallurgical, hydrodynamic and thermodynamic factors, as well as the nozzle material, unpredictable disturbances and operational failures. According to Rackers (1995), the consequences of clogging include reduced productivity, increased production costs and decreased product quality. Clogging events increase the frequency of interruptions in the operation for the exchange and/or maintenance of nozzles and tundishes, which can reduce their useful lifetime by up to half (Schmidt, Russo & Bederka, 1991). In addition, small solid inclusions can break off and enter the steel flow, causing unacceptable defects in the product (Bessho et al., 1991, Wang et al., 2021, Wu et al., 2021). Scientific studies show that the clogging phenomenon lasts around 250 seconds and becomes perceptible only after the first 80 seconds, which leaves a margin of 170 seconds between possible detection and total obstruction (Barati et al., 2018). Thus, the detection model must act quickly to allow corrective action in less than 2 minutes. Although detecting the onset of clogging is of fundamental importance so that control actions can be applied and system operation prolonged, in practice the steel industry still does not have effective tools for this detection (Rout et al., 2013, Pellegrini et al., 2019, Wang et al., 2021). Due to hostile working conditions, variations in production, and process and sensor failures, the data are generally noisy, with outliers and missing values (Wang, Mao & Huang, 2018). Despite these adversities, studies have been developed and, usually, they correlate clogging occurrences with the gate opening and the casting speed.
Few studies are found in the literature, mainly due to the difficulty of obtaining the data, given their confidentiality. In addition, these studies apply techniques associated with the physical parameterization of the plant, which restricts their applicability (Ikäheimonen et al., 2002, Vannucci & Colla, 2011, Rout et al., 2013, Pellegrini et al., 2019). Regarding the models developed in the literature, in addition to being heavily influenced by physical equations, the small number of clogging occurrences relative to normal operating data ends up compromising the classifiers' ability during the learning phase. As a result, the predictive accuracy desired by the industry can hardly be guaranteed (Vannucci & Colla, 2011, Rout et al., 2013, Pellegrini et al., 2019). Several works in this area were evaluated, and the best success rates for clogging prediction were found by Vannucci & Colla (2011) and Pellegrini et al. (2019). The first authors combined neural networks with fuzzy logic to classify clogging events, achieving 76.9% recall and 80.2% accuracy. Pellegrini et al. (2019) applied an online model for predictive estimation of the probability of clogging using about 50 process variables. Although the authors suggest that the model presented good classification performance on series identified as possibly subject to clogging, with an overall area under the curve (AUC) equal to 0.8, when tested on series with a lower probability of clogging incidence, the model presented an accuracy of 75% and a precision of 62%. In Ikäheimonen et al. (2002), neural networks were applied to data from a real plant in a problem similar to the one addressed in this study; however, results satisfactory to the industry were not obtained. Another aspect is related to the amount of noise inherent to the input signals, such that a multilayer perceptron neural network did not perform well in the initial clogging prediction task.
The idea is to apply as few pre-processing techniques as possible, in order to be able to use real data in an online application. As discussed by Goodfellow, Bengio & Courville (2016), the use of deep learning is motivated by the difficulty of traditional algorithms in generalizing problems involving, above all, high-dimensional and highly complex data. Deep learning, then, provides a very powerful framework for supervised learning (Wang et al., 2021, Wu et al., 2021). In order to identify clogging in the SEN, this article evaluates the general performance of classifiers based on Long Short-Term Memory (LSTM) neural networks. This type of algorithm has been used as an important tool in several studies for extracting temporal features from sequential data (Yildirim et al., 2019, Essien & Giannetti, 2020, Wang et al., 2021, Wu et al., 2021). These factors motivated us to apply deep neural networks, such as the LSTM, which are capable of extracting information relevant to clogging detection even from a high-noise signal. Therefore, this study is motivated by the contribution of applying techniques based exclusively on data for the detection of the initial occurrence of obstruction, since recent studies are based on idealized systems and lack sufficient precision in complex tasks or dynamic environments. In general, the performance of the classifiers developed here showed very promising results on real data, obtaining precision and recall levels above 85%. The correct classification of clogging occurrences can contribute to reducing process interruptions and costs associated with production, as well as improving the quality of the final product (Vannucci & Colla, 2011, Pellegrini et al., 2019, Wang et al., 2021). This article is divided as follows: Section 2 discusses the causes and effects of the occurrence of clogging in the continuous casting process.
Section 3 presents the LSTM neural network. In Section 4, the dataset is presented together with the steps of the proposed methodology, as well as the performance metrics used in the classification task. In Section 5, the results are presented, comparing the performance of the classifiers, followed by Section 6, which provides the final considerations and conclusions.

2. Clogging in the Continuous Casting Process

The continuous casting process is based on the vertical casting of liquid steel from a ladle positioned on a tundish. In Figure 1, a typical schematic of the steel flow from the tundish to the mold is presented. The steel flows through the Tundish Nozzle, regulated by the Slide Gate, and is introduced into the Copper Mold cavity through the SEN and Nozzle Port. There, the steel begins to solidify (Rackers, 1995, Mourão et al., 2011). The SEN plays a fundamental role in the stability of the process and the quality of the final product, being essential in the production of special steels (Rackers, 1995). However, throughout the process, an accumulation of impurities from the steel builds up on the nozzle wall, developing the clog. As the obstruction increases, the Slide Gate must be opened further in order to maintain the desired flow. However, when its opening reaches 100%, production must stop and the SEN, or even the Tundish Set (composed of the Tundish, Slide Gate and SEN), must be replaced ahead of schedule (Thomas & Bai, 2001). From a prototype model used to simulate the casting speed variation caused by solid deposition over time, Barati et al. (2018) established three stages for the formation of clogging. Throughout the process, deposition of particles occurs in the SEN. When the clogging event starts, during the first 80 seconds, some regions of the middle section of the SEN are covered by a smooth layer of clogging (coverage stage). Then comes the bulging phase, in which the deposition of particles occurs more quickly and visible particles emerge.
This phase lasts up to about 150 seconds and is followed by the branching step, in which a branched structure develops and grows continuously until the SEN cross-section is completely blocked, at around 250 seconds. In general, since this phenomenon occurs only after some heats, the SEN does not need to be repaired or cleaned at such a high frequency.

Figure 1. Schematic summary of the flow of steel from the tundish to the mold with flow control performed by the Slide Gate

In their research, Mourão et al. (2011) found that clogging can be formed not only by solidified steel and the transport of oxides present in it, but also by the aspiration of air into the SEN and by chemical reactions. However, they emphasize that the exact causes of clogging can be difficult to identify.

3. Long Short-Term Memory (LSTM)

The LSTM has a set of recurrently connected memory blocks. Each block contains one or more interconnected cells and three multiplicative units, called the forget gate f(t), input gate i(t) and output gate o(t) (Haykin, 2011, Goodfellow, Bengio & Courville, 2016, Buduma & Locascio, 2017). The basic architecture of an LSTM cell is shown in Figure 2, where x(t) corresponds to the input signal, C(t) and C(t-1) are, respectively, the current state of the memory cell and its state at the previous instant, and h(t) and h(t-1) represent its current and previous hidden states, respectively. The signals are sent to the three gates that control the information. The function of the forget gate f(t) is to control which parts of the long-term state should be forgotten. The input gate i(t), in turn, controls which parts should be added to the long-term state. The output gate o(t) is responsible for controlling the output information h(t) at the current time step. The gate outputs are calculated using:
Figure 2. The structure of an LSTM cell

f(t) = φ(w_fx · x(t) + w_fh · h(t-1) + b_f)    (1)
i(t) = φ(w_ix · x(t) + w_ih · h(t-1) + b_i)    (2)
o(t) = φ(w_ox · x(t) + w_oh · h(t-1) + b_o)    (3)

where φ(·) is a nonlinear activation function, in general the sigmoid function. Thus, the updates of the memory cell state C(t) and of the hidden state h(t) are given, respectively, by

C(t) = f(t) ⊙ C(t-1) + i(t) ⊙ tanh(w_cx · x(t) + w_ch · h(t-1) + b_c)    (4)
h(t) = o(t) ⊙ tanh(C(t))    (5)

where tanh(·) represents the hyperbolic tangent activation function and ⊙ denotes the element-wise (Hadamard) product between two vectors. The terms w_fx, w_ix, w_ox and w_cx correspond to the input weights of each gate, while w_fh, w_ih, w_oh and w_ch refer to their respective recurrent weights, and the terms b_f, b_i, b_o and b_c represent the biases. The LSTM avoids the vanishing gradient problem through the switching of its gates, which develop a kind of temporal memory. During the training phase, samples from each batch are passed through the cells iteratively via the states. The hidden state represents short-term memory, while the cell state is long-term memory. It is in this unit that information is propagated through the network, with the gates providing the ability to remove or add information to the cell. As a result, the gates are able to identify which temporal information should be transmitted or discarded by the network. After processing each batch, the internal states of each cell are reset (Goodfellow, Bengio & Courville, 2016, Buduma & Locascio, 2017). The great complexity of networks based on deep learning can lead to a problem known as overfitting; thus, it is common to use a regularization technique called dropout. With dropout, at each training iteration, a pre-established percentage of neurons is randomly removed from a given layer and added back in the next iteration.
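Equations (1)-(5) can be sketched as a single LSTM time step. This is a minimal NumPy illustration (the function name `lstm_cell_step` and the weight-dictionary layout are our own choices, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, w, b):
    """One LSTM time step following Eqs. (1)-(5).

    w[g] = (w_gx, w_gh) holds the input and recurrent weight matrices
    of gate g, and b[g] the corresponding bias, for g in {'f','i','o','c'}.
    """
    f = sigmoid(w['f'][0] @ x + w['f'][1] @ h_prev + b['f'])  # Eq. (1), forget gate
    i = sigmoid(w['i'][0] @ x + w['i'][1] @ h_prev + b['i'])  # Eq. (2), input gate
    o = sigmoid(w['o'][0] @ x + w['o'][1] @ h_prev + b['o'])  # Eq. (3), output gate
    # Eq. (4): keep part of the old cell state, add new candidate content
    c = f * c_prev + i * np.tanh(w['c'][0] @ x + w['c'][1] @ h_prev + b['c'])
    h = o * np.tanh(c)                                        # Eq. (5), hidden state
    return h, c
```

Unrolling this step over the 120 samples of a window, with `h` and `c` carried forward, reproduces the recurrence described above.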
Considering that a given neuron will not depend on the specific presence of the others, dropout enables the network to learn more robust attributes (Goodfellow, Bengio & Courville, 2016, Buduma & Locascio, 2017).

4. Dataset and Methodology

4.1. Dataset

In this paper, historical data from 6 months of measurements made in a continuous casting steelmaking process were used. The variables were collected from two tundishes operating on a mold at a rate of one sample per second. It is important to clarify that this article does not present the data, specific characteristics of the industrial process, or the company name, due to the data confidentiality protocol. Ideally, process specialists classify as clogging an event in which the gate opens without an increase in the mold level, with or without a variation in casting speed. In most cases, the gate opening occurs after increasing oscillations, which may or may not be reflected in level oscillations. The nature of the data, however, does not allow the use of simple rules to classify clogging occurrences. For example, there are clogging events where the casting speed is increasing while the gate opening rises at a much higher rate than expected. Figure 3 illustrates an example of a clogging event in the process. Starting from sample 10,601, there is a gradual increase in the gate opening, at constant casting speed, without a significant increase in the mold level. However, due to the reduction of the SEN section caused by the obstruction, a small increase in level is observed just before the casting speed is reduced. The reduction of the casting speed leads to a gate closure and, when the casting speed reaches a level below 0.6 m/s, the level starts to rise.
It is also observed that during this period there is no exchange of tundishes, since the variable indicating the tundish in operation remains constant.

Figure 3. Example of clogging occurrence observed from selected process variables

Thus, based on experiments reported in the literature (Ikaheimonen, 2002, Vannucci & Colla, 2011, Pellegrini et al., 2019), four process variables were selected: the percentage of the slide gate's total opening (gate opening), mold level, casting speed and the tundish that is operating (tundish operating). Researchers also suggest the use of variables related to temperature, pressure and argon flow; however, we did not observe in our dataset a significant correlation between these variables and the studied phenomenon. This may have happened because of the high number of outliers in the dataset. Furthermore, it was observed that these variables were not effectively measured in the period considered, possibly due to sensor failures. After selecting the variables, pre-processing was applied to treat outliers and missing data, preserving the relationships between the attributes. The occurrences of outliers were regarded as measurement errors, since they are isolated cases, and were therefore treated to make them consistent. For this purpose, the maximum and minimum theoretical values assumed by each of the variables during operation without anomalies were specified. Samples with values outside the theoretical range were considered outliers: if a sample had a value below the theoretical minimum, it was adjusted to the theoretical minimum value assumed by the variable. Analogously, if the value was greater than the theoretical maximum, it was set to the theoretical maximum value. In relation to missing data, no substantial occurrences were identified in the period and variables analyzed.
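The outlier treatment described above amounts to clamping each variable to its theoretical operating range. A minimal sketch, in which the bounds and variable names are purely illustrative (the actual ranges are process-specific and not disclosed in the paper):

```python
import numpy as np

# Hypothetical theoretical operating ranges per variable; the real
# bounds are confidential and process-specific.
BOUNDS = {
    "gate_opening": (0.0, 100.0),  # percentage of total opening
    "mold_level": (0.0, 100.0),
    "casting_speed": (0.0, 2.0),
}

def treat_outliers(series, var_name):
    """Clamp samples outside the theoretical range to the nearest bound,
    as described in the text (below-minimum -> minimum, above-maximum -> maximum)."""
    lo, hi = BOUNDS[var_name]
    return np.clip(series, lo, hi)
```

For example, a gate-opening reading of -5% would be set to 0% and one of 120% to 100%, keeping the series consistent without discarding samples.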
Still, the few absence periods longer than one sample were treated with a moving median filter. Furthermore, a certain imbalance between the clogging and non-clogging classes was verified, which was expected for this type of problem, since clogging is a failure event. It was observed that a little more than 7% of the samples indicated the occurrence of the event. Given the number of available samples, the data were selected in order to balance the classes of the training data only, without losing information of great importance for the modeling. Barati et al. (2018) observed that clogging is only truly perceived in the SEN section between approximately 100 and 150 seconds. Therefore, as this is an autoregressive system, the input matrices were composed using a sliding window covering 120 samples, with a step of 1 sample per iteration. In this way, the past behavior of each variable is used to classify whether clogging is present now. Figure 4 illustrates a time-based sliding window process. Following this analysis, a given output y(t), corresponding to one of the binary classes at time t, is related to the input matrix X(4×120) = {x1, x2, x3, x4}T, containing the 4 selected variables with samples delayed by 120 time steps, i.e., from (t-119) to t. Thus, each matrix formed is labeled with one of the classes, that is, the event (clogging or non-clogging) happening at instant t. The sliding window moves along the series so that, at each step, a new input matrix and its respective class are generated. The dataset after pre-processing is composed of 422,026 matrices of dimensions 4×120, i.e., 120 samples of each of the four input variables (gate opening, casting speed, mold level and tundish operating).
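The window composition above can be sketched as follows; `sliding_windows` is our own helper name, and the 4×120 matrix orientation follows the description in the text:

```python
import numpy as np

def sliding_windows(data, labels, window=120, step=1):
    """Compose input matrices with a sliding window and step of 1 sample.

    data: array of shape (T, n_vars), one row per second;
    labels: array of shape (T,) with the class at each instant.
    The matrix ending at time t holds samples (t-119)..t of each
    variable (shape n_vars x window, i.e., 4 x 120 in the paper) and
    is labeled with the class observed at instant t.
    """
    X, y = [], []
    for t in range(window - 1, len(data), step):
        X.append(data[t - window + 1 : t + 1].T)  # n_vars x window
        y.append(labels[t])
    return np.array(X), np.array(y)
```

Each call thus yields one labeled matrix per time step once the first 119 samples have been seen, matching the one-sample stride described above.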
The dataset was divided into training, validation and testing, with 70% of the data applied in training (Ntr = 292,170 matrices), 17% for validation (Nval = 73,044 matrices) and 13% reserved for the testing stage (Ntst = 56,812 matrices).

Figure 4. Sliding Window Process

As shown in Table 1, each training matrix has a corresponding class, with 50% representing the clogging event and the others representing the non-clogging class. The validation and test datasets, on the other hand, keep the original proportion between classes. Data standardization (z-score) was performed for each variable of each set according to Eq. (6), so that each variable has zero mean and unit standard deviation. In the equation, xr(t) represents the variable to be standardized, r (where r ∈ {1,2,3,4}) is the index identifying each variable, μr is the mean of xr(t) and σr is the respective standard deviation (both μr and σr were computed using training data). Standardization was chosen because it better handles possible outliers present in the series (Skiena, 2017).

x_z-score(t) = (xr(t) - μr) / σr    (6)

Table 1. Dataset split and proportion of each class.

Dataset    | Number of Matrices | Split Proportion (%) | Clogging Class (%) | Normal Class (%)
Training   | 292,170            | 70                   | 50                 | 50
Validation | 73,044             | 17                   | 7                  | 93
Tests      | 56,812             | 13                   | 7                  | 93

4.2. Methodology

In structural terms, different parameter configurations, defined by trial and error, were applied to the LSTM classifiers. For this purpose, only the number of cells in the two LSTM layers was varied, in increments of eight, using all their states. In turn, a Fully Connected (FC) layer with 200 neurons was always kept at the output of the last LSTM layer. The four input variables, lagged by 120 samples each, were applied to the network using a batch size of 1,200 samples. A limit of 200 epochs for training was chosen,
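The standardization of Eq. (6), with μr and σr fitted on training data only, can be sketched as below (the function names are ours):

```python
import numpy as np

def zscore_fit(train):
    """Per-variable mean and standard deviation, computed on the
    training set only, as required by Eq. (6)."""
    return train.mean(axis=0), train.std(axis=0)

def zscore_apply(x, mu, sigma):
    """Standardize so each variable has zero mean and unit standard
    deviation (with respect to the training statistics)."""
    return (x - mu) / sigma
```

Applying the training-set μr and σr to the validation and test sets, rather than refitting, avoids leaking information from those sets into the model.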
which can be interrupted by early stopping with a patience of 5. The weights were updated using the Adam algorithm with a learning rate of 0.001 and cross-entropy as the loss function (Kingma & Ba, 2014, Goodfellow, Bengio & Courville, 2016). Due to the stochastic nature of the models used here, k-fold cross-validation was applied, with k = 5, for validation and comparison of the classifiers. From this, the eight classifier configurations with the best performances on the validation set during the training phase were selected. Table 2 shows the configuration of each classifier and the number of trained parameters.

Table 2. The best-found LSTM models and the main parameters associated with their architectures.

LSTM Model | LSTM Layer 1 Cells | LSTM Layer 2 Cells | Number of Parameters
1          | 256                | 128                | 490,586
2          | 256                | 64                 | 362,842
3          | 256                | 32                 | 311,258
4          | 256                | 16                 | 288,538
5          | 256                | 8                  | 277,946
6          | 128                | 64                 | 130,906
7          | 128                | 32                 | 95,706
8          | 64                 | 32                 | 37,082

Figure 5 represents a schematic diagram of the structure of the networks used in this paper. The LSTM networks were implemented with two layers, each containing 32 to 256 cells, with the hyperbolic tangent activation function. It is worth mentioning that, although network configurations containing 32 cells in the first LSTM layer were tested, relevant results were not obtained. Dropout was applied to the second LSTM layer with a value of 0.3. In classification tasks, the number of output layer neurons is equal to the number of classes to be predicted. In this layer, it is common to use the Softmax activation function, which maps the output of a neural network to a set of categories by transforming the responses into probabilities that add up to 1. The Softmax function is calculated as follows, where J is the number of classes:

softmax(y_j) = exp(y_j) / Σ_{i=1}^{J} exp(y_i)    (7)

where yi is the i-th neural network output (i = 1, …, J) and yj is the output of the class whose categorical probability is calculated.
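Eq. (7) is straightforward to implement; the max-subtraction below is a standard numerical-stability trick not mentioned in the paper, which leaves the result unchanged:

```python
import numpy as np

def softmax(y):
    """Eq. (7): map raw network outputs to probabilities summing to 1.
    Subtracting the maximum does not change the result but avoids
    overflow in exp() for large outputs."""
    e = np.exp(y - np.max(y))
    return e / e.sum()
```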
In this way, in all models the signal was transferred to an FC layer containing 200 neurons, with the Rectified Linear Unit (ReLU) activation function and a dropout of 0.5, followed by a Softmax layer with 2 neurons to generate the probability of classifying the input time series as clogging or non-clogging.

Figure 5. Generic structure of the LSTM classifier used in this article

The simulations were implemented on a computer in a Python 3.6 environment, with a 64-bit operating system, 16 GB of RAM, an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz and an NVIDIA GeForce Titan XP GPU.

4.3. Performance Criteria in Classification Tasks

The confusion matrix is usually used in the analysis of performance in classification tasks. As seen in Figure 6, it is the result of comparing the correct class of each sample in the test set with the class obtained by the classifier.

Figure 6. Confusion Matrix

In binary classification tasks, the confusion matrix is composed of positive and negative class observations (Fawcett, 2016). In this work, the occurrence of clogging is associated with the positive class and its absence with the negative class. Thus, after classification, the values can belong to four possible categories:
- True Positive (TP): samples belonging to the positive class that were correctly classified;
- False Positive (FP): samples belonging to the negative class that were incorrectly classified as belonging to the positive class;
- True Negative (TN): samples belonging to the negative class that were correctly classified;
- False Negative (FN): samples belonging to the positive class that were incorrectly classified as belonging to the negative class.

From these categories, some performance measures can be calculated, such as accuracy, precision and recall, respectively, according to Eq.
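The architecture described above (two LSTM layers, FC layer with 200 ReLU neurons and 0.5 dropout, 2-neuron Softmax output, Adam with learning rate 0.001 and cross-entropy loss) can be sketched with a deep learning library. The paper states only that the simulations were implemented in Python, so the choice of tf.keras here, the function name, and the use of Keras's input-dropout argument to approximate "dropout on the second LSTM layer" are all our assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm_classifier(cells1=256, cells2=128, window=120, n_vars=4):
    """Sketch of the classifier structure described in the text
    (defaults correspond to the LSTM 1 configuration in Table 2)."""
    model = models.Sequential([
        layers.Input(shape=(window, n_vars)),  # 120 time steps x 4 variables
        layers.LSTM(cells1, activation="tanh", return_sequences=True),
        # The paper applies dropout of 0.3 to the second LSTM layer;
        # Keras's `dropout` argument (applied to the layer inputs) is
        # used here as an approximation.
        layers.LSTM(cells2, activation="tanh", dropout=0.3),
        layers.Dense(200, activation="relu"),  # FC layer
        layers.Dropout(0.5),
        layers.Dense(2, activation="softmax"),  # clogging / non-clogging
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then use a batch size of 1,200, up to 200 epochs, and early stopping with a patience of 5, as stated in the methodology.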
(8), Eq. (9) and Eq. (10).

accuracy = (TP + TN) / (TP + FN + TN + FP)    (8)
precision = TP / (TP + FP)    (9)
recall = TP / (TP + FN)    (10)

Accuracy measures the overall performance of the model, considering the proportion of correct classifications of both positive and negative cases. Precision measures the rate of positive examples classified correctly among all those predicted as positive. Recall corresponds to the rate of true positives in relation to the total number of positive examples (Fawcett, 2016). From the combination of precision and recall, it is possible to obtain an indicator of the overall quality of the model, called the F1-Score, calculated as in Eq. (11).

F1 = 2 × precision × recall / (precision + recall)    (11)

In some cases, it is also interesting to evaluate the Matthews Correlation Coefficient (MCC). The MCC is a more reliable and balanced measure of quality, as it only produces a high score if the classifier correctly predicted most of the positive and negative cases. Its equation is presented in Eq. (12) (Boughorbel, Jarray & El-Anbari, 2017).

MCC = (TP × TN - FP × FN) / √((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN))    (12)

The MCC varies in the range between -1 and 1, where the extreme values +1 and -1 indicate, respectively, a perfect classification and a totally incorrect classification, while the value 0 indicates a classification equivalent to random guessing (Boughorbel, Jarray & El-Anbari, 2017).

5. Results

The performances of the eight LSTM models are shown in Table 3 in terms of the mean and standard deviation of accuracy, precision, recall, F1-Score and MCC. As can be seen, the LSTM 1, LSTM 2 and LSTM 6 models reached levels of accuracy and precision above 80%. However, these were also the models that showed the greatest standard deviations.
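All five metrics follow directly from the four confusion-matrix counts; a minimal sketch (the function name is ours):

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1-Score and MCC from the
    confusion-matrix counts, following Eqs. (8)-(12)."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)           # Eq. (8)
    precision = tp / (tp + fp)                           # Eq. (9)
    recall = tp / (tp + fn)                              # Eq. (10)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (11)
    mcc = (tp * tn - fp * fn) / math.sqrt(               # Eq. (12)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```

For instance, with TP = 40, FP = 10, TN = 40, FN = 10, precision and recall are both 0.8 while the MCC is 0.6, illustrating that the MCC penalizes errors in both classes more strongly than the other metrics.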
In particular, the LSTM 1 model stands out for its 85.30% recall, that is, around 5% more than the other two models. The same can be observed for the MCC metric, where the LSTM 1 model also stands out, reaching an average of 0.723 with a standard deviation of around 0.145. The expected MCC for a dummy classifier, which always predicts the majority class, is close to zero. Therefore, the results obtained are superior to those of a dummy classifier.

Table 3. Comparison among the performances of different LSTM models.

LSTM Model | Accuracy (%) | Precision (%) | Recall (%)  | F1-Score (%) | MCC
1          | 86.10±3.87   | 85.61±3.42    | 85.30±5.51  | 85.45±4.22   | 0.723±0.145
2          | 82.40±5.11   | 83.09±5.52    | 79.71±5.15  | 81.36±5.33   | 0.631±0.161
3          | 75.81±0.31   | 79.55±0.65    | 69.49±1.08  | 74.18±0.81   | 0.520±0.006
4          | 74.31±0.68   | 77.69±2.67    | 68.52±3.22  | 72.82±2.92   | 0.491±0.016
5          | 75.20±0.41   | 81.84±1.02    | 64.79±0.56  | 72.32±0.72   | 0.515±0.010
6          | 82.02±4.33   | 82.54±7.41    | 80.89±7.47  | 81.32±7.44   | 0.612±0.085
7          | 75.83±0.19   | 79.79±0.80    | 69.22±1.47  | 74.13±1.04   | 0.521±0.013
8          | 75.76±0.38   | 79.15±0.81    | 69.95±0.87  | 74.27±0.84   | 0.518±0.080

Due to the nature of the process, noise and, mainly, control actions by operators may be present in the measurements, making it difficult to apply simple rules assertively. Furthermore, due to the imbalance between classes, although a dummy strategy could achieve a high global accuracy, it would fail to identify examples belonging to the rare class which, in the problem at hand, is the class of interest. Although precision and recall had close values in the three main models (LSTM 1, LSTM 2 and LSTM 6), it is interesting to note that the average recall did not exceed the precision. In accordance with Eq. (9) and Eq. (10), this result indicates a number of FN higher than the number of FP. In general, it is possible to observe that the reduction in the number of parameters of the second layer compromised the generalizability of the models, mainly in terms of recall and, consequently, the F1-Score.
For example, the LSTM 1 and LSTM 2 models differ only in the number of cells in the second layer and, as shown in Table 3, the LSTM 1 model, which has the highest degree of complexity among the models, showed the greatest generalization capacity in the classification task. In practice, model analysis must be based on a trade-off between the values of the performance metrics and the number of parameters involved. Comparing the parameters used by the LSTM 1 and LSTM 2 models with those of the LSTM 6 model, which also achieved promising results, the first model has about 3.74 times more parameters and the second about 2.77 times more. In particular, it is observed that the overall performance of the LSTM 2 and LSTM 6 models does not differ considerably, which makes the LSTM 6 model more attractive. Furthermore, although a small reduction in training time was observed with the decrease in the number of parameters, no significant differences in processing time were observed during the tests of the models. Since this is a problem involving real process data, methodologies capable of correctly detecting the presence of clogging with the lowest possible error rate are sought. In this context, the LSTM 1 model appears to be the best choice, since it presents the highest performance averages and the smallest deviations from these averages (among the LSTM 1, LSTM 2 and LSTM 6 models). Even taking into account the complexities of the classifiers, the models with the highest number of parameters still seem to be the most attractive, since their training and processing times did not increase significantly compared to the others. In addition, the LSTM 1 model exceeded the 75% accuracy and 62% precision of Pellegrini et al. (2019), as well as the 76.9% recall of Vannucci & Colla (2011).
Despite the significant differences in methodologies and datasets, it can be said that the LSTM 1 model achieved promising levels of performance in the clogging classification task compared to those found in the literature.

6. Conclusion

The steel industry does not have effective tools that can correctly detect the occurrences of clogging in the SEN. Clogging can increase the frequency of interruptions in the operation, resulting in increased operating costs, decreased productivity and adverse effects on product quality. To address this problem, this paper evaluated the general performance of classifiers using LSTM neural networks for detecting clogging in the SEN of a steel production process. Based on the evidence observed in the literature and the analyses made in this work, four variables were selected to compose the input dataset: gate opening, mold level, casting speed and tundish operating. The dataset was then pre-processed to deal with the presence of outliers and missing values, preserving the relationships between the attributes. Furthermore, due to the inherent class imbalance, a careful balancing step was necessary for the training set, so that information of great importance for the modeling was not disregarded. With balanced classes, accuracy can be considered in the evaluation of the models, alongside other metrics: precision, recall, F1-Score and MCC. Industrial problems require high hit rates and, within the task of classifying clogging occurrences, the model with the highest number of parameters obtained a remarkably superior performance in relation to the other evaluated models, presenting the highest performance averages. In particular, the best model achieved precision and recall averages above 85%.
However, even with high values of mean precision and recall, a higher number of False Negatives was found relative to the number of False Positives, since recall did not exceed precision. This characteristic shows that the best model found is more likely to classify a sample corresponding to the occurrence of clogging (positive) as normal operation (negative) than the reverse. Nevertheless, given the significant differences in methodologies and datasets, it can be said that the best model achieved promising levels of the performance criteria in the task of classifying clogging compared to those found in the literature.

Through the proposed methodology, the possibility of applying the models in a real system was verified. However, the model presented is limited in terms of recall and precision. The recall values did not exceed the precision values, which indicates the need to develop other techniques to reduce false negative rates. For future work, it is desirable to implement other types of classifiers with deep learning architectures, such as ConvLSTM, so that even higher hit rates can be achieved while guaranteeing a reduction in FN rates relative to FP rates. It is also intended to apply the models online in a real plant. The results of this work may favor productivity gains and reduced production costs, due to the increase in the SEN's useful life, for example. It is estimated that the application of a system that allows the identification of the initial occurrence of clogging can contribute to increasing the useful life of the SEN, which may result in savings of over US$ 15 million per year.

Author Contributions: All authors actively participated in all stages of this research.

Funding: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Data Availability Statement: This article does not present the data, specific characteristics of the industrial process, or the company name, due to the data confidentiality protocol.

Acknowledgments: The authors would like to thank the financial support provided by CAPES, as well as the support of the Programa de Pós-Graduação em Engenharia Elétrica (PPGEE). The authors would also like to thank NVIDIA Corporation for the donation of the Titan XP GPU used for this research.

Conflicts of Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Barati, H., Wu, M., Kharicha, A. & Ludwig, A. (2018). A transient model for nozzle clogging. Powder Technology, 329, 181-198.

Bessho, N., Yoda, R., Yamasaki, H., Fujii, T., Nozaki, T. & Takatori, S. (1991). Numerical analysis of fluid flow in continuous casting mold by bubble dispersion model. ISIJ International, 31(1), 40-45.

Boughorbel, S., Jarray, F. & El-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS ONE, 12(6), e0177678.

Buduma, N. & Locascio, N. (2017). Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms. Sebastopol: O'Reilly Media, Inc.

Essien, A. & Giannetti, C. (2020). A deep learning model for smart manufacturing using convolutional LSTM neural network autoencoders. IEEE Transactions on Industrial Informatics, 16(9), 6069-6078.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.

Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning. Cambridge: MIT Press.

Haykin, S. O. (2011). Neural Networks and Learning Machines. New Jersey: Pearson Education.

Ikaheimonen, J., Leiviska, K., Ruuska, J. & Matkala, J. (2002). Nozzle clogging prediction in continuous casting of steel. IFAC Proceedings Volumes, 35(1), 143-147.

Kingma, D. P. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Mourão, M. B., Yokoji, A., Malynowskyj, A., Leandro, C. A. S., Takano, C., Quites, E. E. C., Gentile, E. F., Silva, G. F. B. L., Bolota, J. R. & Gonçalves, M. (2011). Introdução à Siderurgia. São Paulo: Associação Brasileira de Metalurgia, Materiais e Mineração.

Pellegrini, G., Sandri, M., Villagrossi, E., Challapalli, S., Cestari, L., Polo, A. & Ometto, M. (2019). Successful use case applications of artificial intelligence in the steel industry. In: Iron & Steel Technology (AIST), Digital Transformations, 44-53.

Rackers, K. G. (1995). Mechanism and mitigation of clogging in continuous casting nozzles. Master's thesis, University of Illinois at Urbana-Champaign.

Rout, B. K., Singh, R. K., Choudhary, S. K. & Das, C. L. (2013). Development and application of nozzle clogging index to improve the castability in continuous slab casting. In: International Conference on Advances in Refractories and Clean Steel Making, Ranchi, India.

Schmidt, M., Russo, T. J. & Bederka, D. J. (1991). Steel shrouding and tundish flow control to improve cleanliness and reduce plugging. Iron and Steel Society, Tundish Metallurgy, 2, 3-12.

Skiena, S. S. (2017). The Data Science Design Manual. New York: Springer.

Thomas, B. G. & Bai, H. (2011). Tundish nozzle clogging - application of computational models. In: Steelmaking Conference Proceedings, 84, 895-912.

Vannucci, M. & Colla, V. (2011). Novel classification method for sensitive problems and uneven datasets based on neural networks and fuzzy logic. Applied Soft Computing, 11(2), 2383-2390.

Wang, B., Mao, Z. & Huang, K. (2018). A prediction and outlier detection scheme of molten steel temperature in ladle furnace. Chemical Engineering Research and Design, 138, 229-247.

Wang, R., Li, H., Guerra, F., Cathcart, C. & Chattopadhyay, K. (2021). Development of quantitative indices and machine learning-based predictive models for SEN clogging. In: AISTech 2021, Proceedings of the Iron & Steel Technology Conference, 1892-190.

Wu, X., Jin, H., Ye, X., Wang, J., Lei, Z., Liu, Y. & Guo, Y. (2021). Multiscale convolutional and recurrent neural network for quality prediction of continuous casting slabs. Processes, 9(1), 33.

Yildirim, O., Baloglu, U. B., Tan, R. S., Ciaccio, E. J. & Acharya, U. R. (2019). A new approach for arrhythmia classification using deep coded features and LSTM networks. Computer Methods and Programs in Biomedicine, 176, 121-133.

© 2022 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).