INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 17, Issue: 1, Month: February, Year: 2022
Article Number: 4714, https://doi.org/10.15837/ijccc.2022.1.4714

CCC Publications 

Fault Detection in Nuclear Power Plants using Deep Learning based
Image Classification with Imaged Time-series Data

Y. Shi, X. Xue, J. Xue, Y. Qu

Yong Shi
1. School of Economics and Management, University of Chinese Academy of Sciences
Beijing 100190, China
2. Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences
Beijing 100190, China
3. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences
Beijing 100190, China
4. College of Information Science and Technology, University of Nebraska at Omaha
Omaha, NE 68182, USA

Xiaodong Xue
1. School of Economics and Management, University of Chinese Academy of Sciences
Beijing 100190, China
2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences
Beijing 100190, China

Jiayu Xue*
1. School of Computer Science and Technology, University of Chinese Academy of Sciences
Beijing 101408, China
2. Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences
Beijing 100190, China
3. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences
Beijing 100190, China
*Corresponding author: xuejiayu18@mails.ucas.ac.cn

Yi Qu*
1. School of Economics and Management, University of Chinese Academy of Sciences
Beijing 100190, China
2. Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences
Beijing 100190, China
3. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences
Beijing 100190, China
*Corresponding author: quyi17@mails.ucas.ac.cn



Abstract

Fault detection is critical to ensuring safe routine operation in nuclear power plants (NPPs) and
requires very high accuracy and efficiency. Meanwhile, the rapid development of modern information
technologies has profoundly changed and advanced many sectors, including the nuclear industry.
Inspired by the progress and promising performance of deep learning based image classification in
recent years, this paper proposes a two-stage fault detection methodology for NPPs. First, the
time-series data describing the operating status of NPPs are transformed into two-dimensional images
by four methods, which preserves the time-series information in the images and converts the fault
detection problem into a supervised image classification task. Then, four specific image
classification models based on three primary deep learning architectures are evaluated separately on
the imaged time-series data, achieving excellent accuracy. Furthermore, the performance of different
combinations of transformation methods and classification models is compared and discussed through
extensive experiments, together with a detailed analysis of the throughput of the four transformation
methods. By reshaping the data format and structure so that image classification models become
applicable, the proposed methodology obtains remarkable results: it not only detects and warns of
possible faults in NPPs efficiently but also enhances the capability for safety management in nuclear
power systems.

Keywords: fault detection, nuclear power plants, deep learning, image classification, imaged
time-series data.

1 Introduction
Nuclear energy has become a strategic source of global power supply and is essential to national
power security. As a well-known green energy source, it can provide a large and stable supply of
electricity with advantages such as no pollution, low operating cost and low carbon emissions.
According to the annual report of the World Nuclear Association, nuclear power accounts for more than
10% of global electricity supply and is still growing, making it a significant component of that
supply. A common view is that nuclear energy is probably the only option that can secure adequate
energy supply without harming the planet, given its strengths in addressing growing energy demand,
global carbon reduction, climate change and environmental protection. However, nuclear power systems
have one particular characteristic: even a small accident may lead to a huge disaster, which is
precisely why safety management has always been the top priority in the daily operation of NPPs. It
is therefore necessary to detect faults (or failures) in NPPs accurately and effectively and to issue
warnings in advance, so that these problems can be addressed and eliminated in a timely manner. Fault
detection and diagnosis has consequently become a common focus for both academia and industry.

Early fault detection in NPPs was simple and easy to implement, normally using "pre-set values" as
the standard criteria. Once a parameter on a meter moves beyond its pre-set value or range, a fault
is assumed to have occurred, with a voting scheme sometimes introduced to avoid false alarms [18].
However, with the rapid development of information technology, especially artificial intelligence
(AI), advanced and intelligent tools represented by machine learning have spread into many areas in
recent years. This has led to more intelligent transformations in real industries, and data-driven
fault detection and diagnosis has become a hot research topic in intelligent cybernetics in the big
data era [53]. Previous studies of fault detection and diagnosis can be divided chronologically into
two groups: "model-driven" methods such as statistical models, and "data-driven" methods such as
machine learning models. The transition from "model-driven" to "data-driven" methods has been driven
by the tremendous expansion in the volume and complexity of data, together with the great increase in
computational capacity [18] [42]. These two factors have also dramatically accelerated the progress
of intelligent techniques, especially machine learning and deep learning models, generating a large
variety of intelligent approaches and enabling deep learning to make a significant impact in fields
such as image classification, voice recognition and natural language processing.

Motivated by the outstanding performance of deep learning in image processing, we propose a
two-stage fault detection methodology for NPPs, in which fault detection is treated as a supervised
classification task. Because real images from NPPs are lacking, we transform the numeric time-series
data into images with four major methods so that deep learning based image classification models
become applicable. By reshaping the data format and structure, the proposed methodology obtains
remarkable results: it can efficiently detect and warn of possible faults in NPPs and further enhance
the capability of safety management in power systems. The major highlights of this research are as
follows:

• The time-series data describing the operating status of NPPs have been transformed into two-
dimensional images, preserving the time-series information as pixel information in images and
making deep learning based models applicable.

• Four specific image classification models based on three primary deep learning architectures have
been evaluated separately on the imaged time-series data, achieving excellent accuracy.

• Performances of different combinations of transforming means and classifying models have been
compared and discussed through extensive experiments, with detailed analysis of throughput for
four methods.

The rest of this paper is organized as follows. Section 2 briefly reviews the most closely related
prior work, focusing on fault detection in NPPs, data transformation and deep learning based image
classification. Section 3 describes the proposed two-stage fault detection methodology for NPPs.
Section 4 presents the data, experiments and results with detailed analysis. Finally, Section 5
presents the conclusion together with limitations and future directions.

2 Related work
In this section, related work is briefly reviewed in three major parts: fault detection in NPPs,
which is the subject of this paper, and data transformation and deep learning based image
classification, which underlie the two-stage fault detection methodology proposed later. For each
of these three subjects, this section outlines the most closely related previous studies.

2.1 Fault detection in NPPs

Fault detection and diagnosis is a multidisciplinary task involving fields such as engineering,
management science and information technology, and it can also be regarded as an information
technology & decision making problem. Early research and practice in fault detection in NPPs are not
the focus here; instead, we discuss recent intelligent "data-driven" techniques, which fall into two
primary groups: machine learning methods, including the well-known artificial neural networks (ANN),
support vector machines (SVM) and ensemble methods, and deep learning applications such as
convolutional neural networks (CNN). These applications are mostly supervised learning tasks, and
they indicate that applying deep learning based image classification techniques is feasible and
promising.

Machine learning. To support strategic decision-making in nuclear fuel management, a regression
model has been used to estimate the composition of spent nuclear fuel [16]. Logistic regression (LR)
has been applied to fault diagnosis in NPPs [2], and Naive Bayes (NB) has been introduced into fault
diagnosis of thermocouples [4] and into an artificial cognitive system for recognizing the current
status of nuclear reactors [34]. ANN has shown great adaptability, having been applied to a control
rod drive system and accident prevention system [1] and to a conceptual accident prevention system
for light water reactors [35], while SVM has been used to design more accurate identification and
diagnosis of faults [52] and to detect loose components in the primary circuit cooling system of NPPs
[30]. The classical decision tree (DT) model has been used to assess the probabilistic risk of
precursor events, with the aim of reducing operational risk in NPPs [44], and to identify possible
root causes of faults accurately and quickly [33]. As a widely used ensemble learning approach,
XGBoost with extracted features has performed effective fault detection for screen cleaners in NPPs
[6], and another representative ensemble learning algorithm, random forests (RF), has been integrated
as a learner for fault diagnosis together with SVM, kNN and multilayer perceptron (MLP) [23].

Deep learning. Based on different architectures, deep learning has developed many variants such as
CNN, recurrent neural networks (RNN), restricted Boltzmann machines (RBM), deep belief networks
(DBN) and MLP. These can be summarized under a general architecture called deep neural networks
(DNN), while the variants can be constructed with different numbers of hidden layers. Owing to its
convolutional operations and parameter sharing, CNN is well suited to processing high-dimensional
data such as images or videos, and it has been used to construct a fast and accurate system for
detecting cracks in underwater metal surfaces from inspection videos in NPPs [4]. It has also been
applied to detect scratches on the surfaces of nuclear fuel assemblies [11] and to diagnose faults
with a specific reactor dataset [55]. In contrast to CNN, RNN is suited to sequential data,
especially in natural language understanding and time-series signal processing, and it has been
integrated with PPCA and multi-resolution wavelet analysis to build a fault detection model for
nuclear power machinery [25]. To improve the reliability of engineering systems in NPPs, RBM has been
used to detect anomalies and conduct diagnosis & prognosis by analyzing massive data [38]. DBN has
also been applied to indicate online faults of thermocouples in NPPs [29], while MLP has been
introduced into transient detection [31] and identification of fault scenarios [41] in NPPs.

2.2 Data transformation: from numeric to imaged

Data transformation plays a vital role in solving some difficult problems, as it changes the nature
of a problem by reshaping the data format or structure. The use of the renowned CNN for image
classification is a representative example. Large quantities of temporal data generated in NPPs have
been mapped into images, and anomaly detection has then been performed using CNN [12] [22], taking
advantage of its strength in image processing. This kind of data transformation from numeric to
imaged has also been observed in other areas such as bankruptcy prediction. By dividing the numerical
data, the original numeric indicators from financial statements have been transformed into grey-scale
images [17], so that predicting the bankruptcy of firms is treated as an image classification task
using CNN, with better and more robust performance than conventional machine learning classification
models. The improvement in precision brought by data transformation has been experimentally verified
in many cases, suggesting that this approach could be adopted in the field of fault detection in
NPPs.

2.3 Deep learning based image classification

Image classification has been a longstanding research topic in the field of computer vision (CV).
As a critical and fundamental problem, it aims to identify the class label of a given image.
Traditional image classification using machine learning methods relies on two separate steps: first,
hand-designed operations are applied to extract features (e.g., SIFT, HOG, SURF, LBP) from input
images; then, a classifier (e.g., SVM, kNN, AdaBoost) is trained to recognize the class of the input.
These methods handle tasks with relatively small datasets quite well, but their performance is not
satisfactory on more complicated tasks. With the availability of huge datasets (e.g., ImageNet [7])
and increasing computational capacity, models with better performance were needed, which led to the
emergence of deep learning based image classification techniques.

AlexNet [21], which achieved record-breaking classification results in the ImageNet Large Scale
Visual Recognition Challenge, led to a surge of deep convolutional neural networks (CNNs) [14] [43]
[45]. Instead of using hand-crafted features, CNNs apply flexible and trainable architectures to
automatically extract and integrate low/mid/high-level features in an end-to-end, multilayer
fashion. The inductive biases inherent to CNNs, such as translation invariance and local
connectivity, are of great help in image feature extraction. In addition to classification, deep CNN
models have achieved state-of-the-art (SOTA) performance on a wide range of computer vision
applications, such as object detection [24] [26] [37], image segmentation [5] [13] [40] and
super-resolution [8] [20]. While CNNs have become the de facto standard for computer vision, two
other major architectures, namely Transformer and MLP, have achieved competitive results compared
with CNN-based models and attracted significant research interest from the CV community.

Transformer [50] is a model architecture based solely on the self-attention mechanism to draw global
dependencies between input and output, dispensing with recurrence and convolutions. Inspired by the
great success of Transformer in natural language processing (NLP), numerous studies have started to
apply the Transformer to visual recognition tasks. Compared with CNN-based models, which focus mainly
on local features, Transformer is able to learn long-range dependencies, which means it can easily
derive global information. It also learns visual features with fewer priors and further reduces
human intervention. However, it incurs larger computational costs and does not generalize well when
trained on an insufficient amount of data. To tackle this problem, a series of works [3] [27] [49]
have been proposed.

Multilayer perceptron (MLP) is not a new tool in the computer vision community: it was one of the
first classifiers tested on MNIST and became the most widely used model and benchmark in computer
vision tasks at that time. Nevertheless, limited by its large amount of computation and its tendency
to over-fit when data are insufficient, MLP lay dormant for decades. In recent years, inspired by
Transformer's great success in vision tasks, researchers have proposed deep MLP models such as
MLP-Mixer [47] and ResMLP [48] for improved performance. Compared with conventional MLP, these models
have more hidden layers and have changed the input from full flattening to patch flattening,
indicating that a purely MLP-based architecture, which drops the hand-designed attention mechanism
and learns from the raw data, can be fairly competitive with CNNs or Transformer. However, it is also
data-hungry, and furthermore the complexity of MLP is quadratic in the size of images, making
MLP-like models intractable on high-resolution images.

3 Methodology
In this section, the two-stage fault detection methodology for NPPs is described in detail. For
ease of understanding, the overall process of the proposed methodology is shown in Figure 1. In
short, after dimension reduction, the time-series numeric data of the operating status collected
from sensors in NPPs are first encoded into images by different transformation methods. These
transformed images are then concatenated and fed into different deep classification models, which
perform image classification and output the fault class as the detection result. This section
focuses on the methods for encoding time-series numeric data into images and on the deep learning
based image classification models, while experimental specifics such as dimension reduction and image
concatenation are described in Section 4.

[Figure 1 pipeline: Time Series → Dimension Reduction → Encoding Time Series to Images
(Un-Thresholded Recurrence Plot (UTRP), Markov Transition Field (MTF), Gramian Angular Summation
Field (GASF), Gramian Angular Difference Field (GADF)) → Image Concatenation → Deep Learning
Architectures (CNN-based, Transformer-based and MLP-based models) → Output: Fault Class]

Figure 1: The precise process of the two-stage fault detection methodology in NPPs

3.1 Data processing & transformation

The transformed images used as experimental data will clearly affect the results and performance of
fault detection. A suitable transformation method should be able to preserve temporal relations in
the images. Thus, four major methods for converting 1-D time-series data into different types of 2-D
images, all of which have been applied or tested in other studies, have been adopted in our
methodology: Gramian Angular Summation Field (GASF) [46][54], Gramian Angular Difference Field (GADF)
[46][54], Markov Transition Field (MTF) [12][46][54] and Un-Thresholded Recurrence Plot (UTRP) [46].
Their basic principles and detailed processes are introduced below.

Gramian Angular Field. Gramian Angular Field [51] is a method that encodes time-series,
which are represented in a polar coordinate system rather than the Cartesian coordinates, into im-
ages. According to different calculation methods, it can be divided into two types: Gramian Angular
Summation Field (GASF) and Gramian Angular Difference Field (GADF).

Given a time series $X = \{x_1, x_2, \cdots, x_n\}$ composed of $n$ real-valued monitoring values
obtained from a certain sensor, it is first normalized to the interval $[-1, 1]$ according to:

\[
\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}
\]

Then the rescaled series $\tilde{X}$ is represented in polar coordinates by encoding the value
$\tilde{x}_i$ as the angular cosine and the time stamp $t_i$ as the radius:

\[
\begin{cases}
\phi = \arccos(\tilde{x}_i), & -1 \le \tilde{x}_i \le 1,\ \tilde{x}_i \in \tilde{X} \\
r = \dfrac{t_i}{N}, & t_i \in \mathbb{N}
\end{cases}
\]

where $t_i$ is the time stamp corresponding to $x_i$ and $N$ is the total length of the time stamps,
used to normalize the span of the polar coordinates. Encoding a time series into a polar coordinate
system has two important properties. First, the encoding is bijective, because $\cos(\phi)$ is
monotonic when $\phi \in [0, \pi]$, so it produces one and only one result with a unique inverse map.
Second, compared with Cartesian coordinates, it preserves absolute temporal relations.

The final step is to use the transformed polar coordinate system to generate 2-D images which
are represented as a Gramian-like matrix. Each element in the Gramian-like matrix is actually the
trigonometric sum/difference between each point, so it can be used to identify the temporal correlation
within different time intervals. The GASF and GADF are defined as follows:

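\[
\mathrm{GASF} =
\begin{bmatrix}
\cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_n) \\
\cos(\phi_2 + \phi_1) & \cdots & \cos(\phi_2 + \phi_n) \\
\vdots & \ddots & \vdots \\
\cos(\phi_n + \phi_1) & \cdots & \cos(\phi_n + \phi_n)
\end{bmatrix}
= \tilde{X}' \cdot \tilde{X} - \left(\sqrt{I - \tilde{X}^2}\right)' \cdot \sqrt{I - \tilde{X}^2}
\]

\[
\mathrm{GADF} =
\begin{bmatrix}
\sin(\phi_1 - \phi_1) & \cdots & \sin(\phi_1 - \phi_n) \\
\sin(\phi_2 - \phi_1) & \cdots & \sin(\phi_2 - \phi_n) \\
\vdots & \ddots & \vdots \\
\sin(\phi_n - \phi_1) & \cdots & \sin(\phi_n - \phi_n)
\end{bmatrix}
= \left(\sqrt{I - \tilde{X}^2}\right)' \cdot \tilde{X} - \tilde{X}' \cdot \sqrt{I - \tilde{X}^2}
\]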

where I is the unit row vector [1, 1, · · · , 1]. In GAF, time increases as the position moves from top-left
to bottom-right, making it possible to retain the temporal dependency. The main diagonal of GAF
contains the original value information.
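The GASF and GADF construction above can be illustrated with a minimal sketch. It uses only the Python standard library so that it stays self-contained; the function names `rescale` and `gramian_angular_fields` are illustrative assumptions, not part of the original study, and a practical implementation would more likely rely on an array library.

```python
import math


def rescale(series):
    """Rescale a series to [-1, 1] using the min-max formula given above."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [((x - hi) + (x - lo)) / (hi - lo) for x in series]


def gramian_angular_fields(series):
    """Return (GASF, GADF) as n x n nested lists (hypothetical helper)."""
    # Clamp to [-1, 1] to guard against floating-point overshoot before arccos.
    scaled = [max(-1.0, min(1.0, v)) for v in rescale(series)]
    phi = [math.acos(v) for v in scaled]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


gasf, gadf = gramian_angular_fields([1.0, 2.0, 4.0, 3.0])
print(len(gasf), len(gasf[0]))  # 4 4
```

Each diagonal entry of the GASF equals $\cos(2\phi_i) = 2\tilde{x}_i^2 - 1$, which is how the main diagonal retains the original value information, while the GADF is antisymmetric with a zero diagonal.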

Markov Transition Field. Markov Transition Field [51] is a method which expands the dynamic
transition statistical information by sequentially representing the Markov transition probability.

Given a time series $X = \{x_1, x_2, \cdots, x_n\}$, first determine its $Q$ quantile bins and
assign each $x_i$ to the corresponding bin $q_j$ ($j \in [1, Q]$). Then construct a $Q \times Q$
weighted adjacency matrix $W$ by computing the Markov transition probabilities among the quantile
bins. For example, $w_{ij}$ is the probability of $x_t \in q_i$ and $x_{t+1} \in q_j$. After
normalization by $\sum_j w_{ij} = 1$, the Markov transition matrix $W$ is defined as follows:

\[
W =
\begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,Q} \\
w_{2,1} & w_{2,2} & \cdots & w_{2,Q} \\
\vdots & \vdots & \ddots & \vdots \\
w_{Q,1} & w_{Q,2} & \cdots & w_{Q,Q}
\end{bmatrix}
\]


W is not sensitive to the distribution of X as well as the temporal dependency on time stamp ti,
thus it can not reflect the time characteristic. The final step is to construct MTF based on W .

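\[
\mathrm{MTF} =
\begin{bmatrix}
w_{ij \mid x_1 \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_1 \in q_i,\, x_n \in q_j} \\
w_{ij \mid x_2 \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_2 \in q_i,\, x_n \in q_j} \\
\vdots & \ddots & \vdots \\
w_{ij \mid x_n \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_n \in q_i,\, x_n \in q_j}
\end{bmatrix}
\]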

Each element in the MTF denotes the probability that a point in qi at time stamp i moves to qj at
time stamp j. The information and time characteristics contained in the MTF vary with the number of
quantile bins Q.
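As a rough illustration of the two steps above (building $W$, then spreading it along the time axis), the following standard-library sketch may help. The helper names and the rank-based binning rule are assumptions made for this example; in particular, tied values may fall into different bins here, which a production implementation would handle more carefully.

```python
def quantile_bins(series, n_bins):
    """Assign each value a 0-based quantile bin by rank (simplified, assumed rule)."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    bins = [0] * len(series)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(series)
    return bins


def markov_transition_field(series, n_bins):
    """Return (W, MTF): the Q x Q transition matrix and the n x n field."""
    bins = quantile_bins(series, n_bins)
    # Count first-order transitions q_i -> q_j between consecutive time stamps.
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1.0
    # Row-normalize so that sum_j w_ij = 1; rows without transitions stay zero.
    w = []
    for row in counts:
        total = sum(row)
        w.append([c / total if total else 0.0 for c in row])
    # MTF[k][l] looks up w for the bins that contain x_k and x_l.
    mtf = [[w[bk][bl] for bl in bins] for bk in bins]
    return w, mtf


w, mtf = markov_transition_field([1.0, 2.0, 3.0, 4.0, 1.5, 3.5], n_bins=2)
print(len(mtf), len(mtf[0]))  # 6 6
```

The output image always has the side length of the series, whereas the choice of `n_bins` controls how coarse the transition statistics are.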

Un-Thresholded Recurrence Plot. Recurrence Plot (RP) [10] is a powerful tool for analyzing the
periodicity, chaos and non-stationarity of time-series data and for identifying hidden regularities
in scalar time series. Depending on whether a threshold is applied, it can be divided into two
methods: Thresholded Recurrence Plot (TRP) and Un-Thresholded Recurrence Plot (UTRP). Since UTRP
preserves the distance between recurrence points, it often contains more information than TRP for the
same time series. Thus, we choose to apply UTRP to encode the time-series data.

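Given a time series $X = \{x_1, x_2, \cdots, x_n\}$, the first step is to form a reconstructed
phase space by extending the one-dimensional time series into a higher-dimensional space, which is
represented as:

\[
\tilde{X} =
\begin{bmatrix}
\vec{x}_0 \\ \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_n
\end{bmatrix}
=
\begin{bmatrix}
x_0 & x_{\tau} & \cdots & x_{(m-1)\tau} \\
x_1 & x_{1+\tau} & \cdots & x_{1+(m-1)\tau} \\
x_2 & x_{2+\tau} & \cdots & x_{2+(m-1)\tau} \\
\vdots & \vdots & \ddots & \vdots \\
x_n & x_{n+\tau} & \cdots & x_{n+(m-1)\tau}
\end{bmatrix}
\]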

where $m \ge 2$ is the embedding dimension and $\tau \ge 1$ is the time delay. The recurrence plot
is constructed by calculating:

\[
R_{i,j} = \lVert \vec{x}_i - \vec{x}_j \rVert, \quad i, j = 1, 2, \cdots, n
\]

where $\lVert \cdot \rVert$ is the usual Euclidean norm. The pixel value at each position $(i, j)$ of
the un-thresholded recurrence plot reflects the correlation between vector $\vec{x}_i$ and vector
$\vec{x}_j$, and can therefore reveal the dynamic characteristics of the signal in time.
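The phase-space reconstruction and distance computation above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch keeps only the delay vectors that fit entirely inside a finite series, so the resulting image has side length n - (m - 1)τ rather than n.

```python
import math


def embed(series, m, tau):
    """Delay-embed: vector i is (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for this (m, tau)")
    return [[series[i + k * tau] for k in range(m)] for i in range(n_vectors)]


def unthresholded_recurrence_plot(series, m=2, tau=1):
    """Pairwise Euclidean distances between delay vectors (no thresholding)."""
    vectors = embed(series, m, tau)
    return [[math.dist(u, v) for v in vectors] for u in vectors]


rp = unthresholded_recurrence_plot([0.0, 1.0, 3.0, 6.0], m=2, tau=1)
print(len(rp), len(rp[0]))  # 3 3
```

Applying a threshold to `rp` would give the binary TRP; keeping the raw distances is what distinguishes the UTRP.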

3.2 Deep learning based architectures for image classification

In image classification, CNNs have dominated the field over the last decade, since the powerful
AlexNet of 2012. With the introduction of Vision Transformer and MLP-Mixer, a new paradigm shift
appears to be under way. Therefore, a series of image classification models, including CNN-based,
Transformer-based and MLP-based models, have been incorporated into the proposed methodology, with
extensive experiments carried out in a unified framework to compare these three major deep learning
architectures.

CNN-based models. A convolutional neural network is a special case of neural network that makes
use of the 2-D structure of images and the fact that pixels within a neighborhood are usually highly
correlated. A standard CNN-based model consists of an input layer, alternating convolutional and
pooling layers, a fully connected layer and an output layer. Figure 2 shows the architecture of a
standard CNN-based model. The convolutional layer performs convolutional operations on the input
image matrix to extract features, and the pooling layer then compresses the feature map obtained by
the convolutional layer. Pooling layers not only reduce the number of parameters to prevent
over-fitting but also help maintain model invariance (e.g., to rotation, translation and scale). As
the depth of the network increases, the receptive field continues to expand, enabling the network to
learn more abstract characteristics. Finally, the fully connected layer acts as a classifier to
obtain classification results based on the feature maps produced by multiple convolutional and
pooling layers. Over the past decade, a series of networks have been designed to improve the
performance of CNNs. For instance, ResNet [14] proposed residual structures, which significantly
alleviate the vanishing gradient problem and make it possible to train deeper models, and
EfficientNet [45] achieved better accuracy with fewer parameters by carefully balancing network
width, depth and resolution. Therefore, ResNet and EfficientNet have been chosen as representatives
of CNN-based models.
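To make the convolution and pooling steps described above concrete, here is a minimal single-channel sketch in plain Python. It is not the ResNet or EfficientNet implementation used in the experiments; the function names, the stride-1 "valid" convolution and the non-overlapping max pooling are assumptions chosen for brevity.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # toy 5 x 5 input
features = conv2d(image, [[0, 0], [0, 1]])                 # 4 x 4 feature map
pooled = max_pool(features, size=2)                        # 2 x 2 after pooling
print(pooled)  # [[12, 14], [22, 24]]
```

The example shows how a 5 x 5 input shrinks to a 4 x 4 feature map and then to a 2 x 2 map, which is the compression role attributed to pooling in the text.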

[Figure 2 diagram: stacked Convolution and Pooling layers followed by a Fully-connected layer that
outputs the Class]

Figure 2: Architecture of a standard CNN-based model

Transformer-based models. ViT [9] is the first full-transformer model that can achieve state-
of-the-art performance on image classification task. The overview of the ViT architecture is depicted
in Figure 3.

[Figure 3 diagram: Image Partition → Linear Projection of Flattened Patches → Patch + Position
Embedding (with an extra Class token) → Transformer Encoder → MLP Head → Class. Each of the L
encoder blocks applies Norm, Multi-Head Attention and a residual addition, then Norm, MLP and a
second residual addition to the Embedded Patches.]

Figure 3: Architecture of ViT

Since the original Transformer receives a 1-D sequence of token embeddings as input, the input 2-D
image first has to be split into a sequence of non-overlapping patches, analogous to words in NLP
applications. The patches are then projected into patch embeddings by a trainable linear projection.
The patch embeddings, together with an additional class token, are fed into the Transformer encoder
after position embeddings are added to retain positional information. In classification tasks, only
the class token is passed to an MLP head to predict the image class. The Transformer encoder in ViT is
composed of a stack of L identical blocks. Each block consists of alternating layers of multi-head
self-attention (MSA) and multilayer perceptron (MLP). Residual connections and layer normalization
are employed to enhance scalability.
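The patch-splitting step described above can be illustrated with a short sketch. It handles a single-channel image stored as nested lists, and the function name `split_into_patches` is an assumption for this example; the linear projection, class token and position embeddings are deliberately left out.

```python
def split_into_patches(image, patch):
    """Split an H x W image into flattened, non-overlapping patch vectors."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten the patch row by row into one token-like vector.
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 input
patches = split_into_patches(image, patch=2)
print(len(patches), patches[0])  # 4 [0, 1, 4, 5]
```

Each flattened patch then plays the role of one token in the input sequence of the Transformer encoder.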

As the core component of Transformer, MSA is an extension of standard self-attention. The
self-attention mechanism estimates the relevance of each position to all positions and aggregates
global information from the complete input sequence to update the output vector. Specifically, the
input sequence $X$ is mapped into three different sequences of vectors, namely query vectors $Q$, key
vectors $K$ and value vectors $V$, according to:

\[
Q = X W^Q, \quad K = X W^K, \quad V = X W^V
\]

where $W^Q \in \mathbb{R}^{d \times d_k}$, $W^K \in \mathbb{R}^{d \times d_k}$ and
$W^V \in \mathbb{R}^{d \times d_v}$ are linear projection matrices; $d$ is the dimension of the input
embeddings, $d_k$ is the dimension of the queries and keys, and $d_v$ is the dimension of the values.

Compute the dot products of the query with all keys, then scale them and use the softmax function
to obtain the weights on the values. This process can be formulated as:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V
\]
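The scaled dot-product attention above can be sketched in a few lines of plain Python. Matrices are nested lists, and the helper names `matmul`, `softmax` and `attention` are illustrative assumptions; real models would use a tensor library and batch the computation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention([[0.0, 0.0]], keys, values)
print(weights, out)  # [[0.5, 0.5]] [[2.0, 3.0]]
```

A zero query scores every key equally, so the output is simply the average of the value vectors; a query aligned with one key shifts the weight towards the corresponding value.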

In order to capture multiple complex relationships and extract richer information, the MSA
mechanism has been introduced [50]. It comprises $h$ parallel self-attention layers, whose outputs
are concatenated and projected to the final output:

\[
\begin{aligned}
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \cdots, \mathrm{head}_h)\, W^O \\
\mathrm{head}_i &= \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V), \quad i = 1, 2, \cdots, h
\end{aligned}
\]

where $W^O \in \mathbb{R}^{h d_v \times d}$. Furthermore, the Data-efficient image Transformer (DeiT)
[49] has been proposed to improve the applicability of vision transformers when training on
ImageNet-1k. DeiT has therefore been chosen as the representative of Transformer-based models.

MLP-based models. MLP-Mixer [47], the pioneering work among deep MLP models, consists of three
modules, as shown in Figure 4. The per-patch fully-connected layer and the classifier layer play the
same roles as in ViT, mapping image patches into embedding vectors and generating the classification
results, respectively. In the second module, MLP-Mixer stacks N Mixer layers instead of
self-attention layers. Each Mixer layer contains one token-mixing MLP and one channel-mixing MLP: the
token-mixing MLP communicates between different patches, and the channel-mixing MLP communicates
between different channels. Every MLP includes two fully-connected layers and a GELU [15] activation
function. The Mixer layers can be written as follows:

\[
\begin{aligned}
U_{*,i} &= X_{*,i} + W_2\, \sigma\big(W_1\, \mathrm{LayerNorm}(X)_{*,i}\big), \quad \text{for } i = 1 \cdots C, \\
Y_{j,*} &= U_{j,*} + W_4\, \sigma\big(W_3\, \mathrm{LayerNorm}(U)_{j,*}\big), \quad \text{for } j = 1 \cdots S,
\end{aligned}
\]

where $X \in \mathbb{R}^{S \times C}$ is the input of each Mixer layer, $S$ and $C$ are the numbers
of patches and channels, $\sigma$ is the GELU function, and $W_1$ to $W_4$ are the weights of the
different fully-connected layers. In addition, ResMLP [48] replaces the layer normalization in
MLP-Mixer with an affine transform layer, aiming to stabilize the training process, so ResMLP has
been chosen as the representative of MLP-based models.
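The two Mixer equations above can be traced with a small plain-Python sketch. It is only an illustration under several assumptions: the helper names are hypothetical, biases and the learnable scale and shift of layer normalization are omitted, and the hidden widths are whatever the supplied weight matrices imply.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(x, eps=1e-6):
    """Normalize each row (patch) over its channels; scale/shift omitted."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def matvec(w, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in w]


def mlp(w_in, w_out, vec):
    """Two fully-connected layers with a GELU in between (no biases)."""
    return matvec(w_out, [gelu(h) for h in matvec(w_in, vec)])


def transpose(x):
    return [list(col) for col in zip(*x)]


def mixer_layer(x, w1, w2, w3, w4):
    """x is S x C; w1 is D_S x S, w2 is S x D_S, w3 is D_C x C, w4 is C x D_C."""
    # Token mixing: the same MLP acts on every column (channel) of X.
    normed_cols = transpose(layer_norm(x))
    u_cols = [[a + b for a, b in zip(col, mlp(w1, w2, ncol))]
              for col, ncol in zip(transpose(x), normed_cols)]
    u = transpose(u_cols)
    # Channel mixing: the same MLP acts on every row (patch) of U.
    return [[a + b for a, b in zip(row, mlp(w3, w4, nrow))]
            for row, nrow in zip(u, layer_norm(u))]
```

With all weights set to zero, both MLP branches vanish and the layer returns its input unchanged, which makes the role of the two residual connections easy to check.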

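[Figure 4 diagram: Image Partition → Per-patch Fully-connected → N × Mixer Layer → Global Average
Pooling → Fully-connected → Class. Each Mixer layer applies Layer Norm, a token-mixing MLP and a
residual addition, then Layer Norm, a channel-mixing MLP and a second residual addition; each MLP
consists of Fully-connected, GELU and Fully-connected layers.]

Figure 4: Architecture of MLP-Mixer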



4 Experiment & results
This section describes the experiments and results. First, data acquisition and processing are
summarized in detail, followed by the experimental settings. Finally, the results and analysis are
presented together with a discussion of the experiments.

4.1 Data acquisition & processing

Data acquisition. Our study requires time-series data of operating-status instances in NPPs, with a
label for each instance indicating the current status, such as the normal condition or an abnormal
condition. Because of the special nature of NPPs and the difficulty of collecting real plant data, a
practical and realistic way for researchers is to obtain suitable data from simulation software [12]
[22]. For example, the most widely used nuclear reactor transient analyzer, the Personal Computer
Transient Analyzer (PCTRAN) [36], simulates a variety of accidents and transient conditions
occurring in NPPs, outputs the status of key parameters, and allows operators' actions to be
simulated through interactive control. We therefore chose PCTRAN for experimental data acquisition.

The simulation produced a large number of numeric instances describing the operating status of NPPs.
Each instance has 70 dimensions, since 70 sensors monitor pressure, temperature, flow rate and many
other indicators at different locations in the plant, such as reactor cores, steam generators,
pipelines and heat exchangers. In addition, each instance carries one of 5 status labels: one normal
condition (Label 0) and the four most common types of fault conditions, namely loss of coolant
accident (Label 1), loss of feedwater accident (Label 2), steam line break (Label 3) and steam
generator tube rupture (Label 4). This completes the data acquisition process.

Data processing. The simulation generated tens of thousands of numeric instances indicating the
operating status of an NPP, recorded at a resolution of seconds. The data were then transformed using
the four methods introduced in Section 3.1. The 1-D time series collected from each sensor is
transformed into one channel of an image, so that, before dimension reduction, a single image
consists of 70 channels, each representing the temporal trend of one dimension (sensor). To better
characterize the time-series information, the data are sampled with a sliding window of 224 seconds
and an interval of 5 seconds. If an instance with a fault label (Labels 1-4) falls inside the sliding
window, the corresponding transformed image is assigned the fault label. In addition, for more
efficient fault detection and quick diagnosis, it is necessary to remove noisy and redundant
information by reducing the 70 dimensions of the data before the transformation. To do this, the
most prevalent dimension reduction algorithm, Principal Component Analysis (PCA) [19], was applied
to the simulated data with the number of principal components set to 32 based on actual conditions.
In this way, we obtained 6300 images with 32 channels each as the experimental dataset. The
distribution of classes of the transformed images is shown in Table 1; the task is clearly an
imbalanced supervised classification problem.
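The sliding-window sampling and labeling rule described above can be sketched in plain Python as follows. This is a hedged illustration: the function name is hypothetical, the series is assumed to hold one instance per second, and the rule that a window takes the label of the first fault instance it contains is an assumption, since the text only states that such a window receives a fault label.

```python
def sliding_windows(labels, window=224, stride=5):
    """Return (start, end, window_label) for every full window over per-second labels."""
    result = []
    for start in range(0, len(labels) - window + 1, stride):
        segment = labels[start:start + window]
        faults = [lab for lab in segment if lab != 0]
        # Assumption: the window takes the label of the first fault instance it contains.
        window_label = faults[0] if faults else 0
        result.append((start, start + window, window_label))
    return result

# Toy label sequence: 300 s of normal operation followed by 100 s of steam line break (Label 3).
toy_labels = [0] * 300 + [3] * 100
windows = sliding_windows(toy_labels)
normal_windows = [w for w in windows if w[2] == 0]
fault_windows = [w for w in windows if w[2] == 3]
```

With this toy sequence, every window that reaches second 300 is labeled as a fault, so a fault becomes visible in the labels as soon as its first instance enters the window.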

Table 1: Distribution of classes of transformed images

Classes (Labels)                            Number of images
Normal (Label 0)                            4200
Loss of coolant accident (Label 1)          600
Loss of feedwater accident (Label 2)        300
Steam line break (Label 3)                  600
Steam generator tube rupture (Label 4)      600
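As a quick arithmetic check of the class imbalance in Table 1, the short sketch below computes the share of each class and the ratio between the largest and smallest classes. The counts are taken from the table; the variable names are illustrative only.

```python
counts = {
    "Normal": 4200,
    "Loss of coolant accident": 600,
    "Loss of feedwater accident": 300,
    "Steam line break": 600,
    "Steam generator tube rupture": 600,
}

total = sum(counts.values())
# Fraction of the dataset taken by each class.
shares = {name: n / total for name, n in counts.items()}
# Ratio between the most and least frequent classes.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The normal class accounts for two thirds of the images and is 14 times larger than the smallest fault class, which is why overall accuracy alone should be read with some care.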

The four types of transformed images have been generated, and examples are shown in Figure 5. The
key transformation parameters are set as follows: MTF quantile bins = 8, UTRP delay time = 1 and
UTRP embedding dimension = 1. In Figure 5, the rows correspond to the four transforming methods
(GASF, GADF, MTF and UTRP) and the columns to the classes of images: Label 0 is the normal
condition, while Labels 1-4 represent the different types of faults. The classes are visibly distinct
from each other, which provides a strong foundation for image classification models and for
effective fault detection with the methodology proposed in this research.
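For readers who want to reproduce images of this kind, the sketch below implements the four transformations for a single 1-D series with NumPy. It follows the commonly used definitions from [51] (GASF, GADF, MTF) and [10] (recurrence plots) and is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the series is assumed to be non-constant, and normalization details may differ from Section 3.1. With embedding dimension 1, the unthresholded recurrence plot reduces to pairwise absolute differences.

```python
import numpy as np

def rescale(x):
    # Min-max rescale a non-constant 1-D series to [-1, 1].
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

def gasf(x):
    # Gramian angular summation field: cos(phi_i + phi_j).
    phi = np.arccos(np.clip(rescale(x), -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

def gadf(x):
    # Gramian angular difference field: sin(phi_i - phi_j).
    phi = np.arccos(np.clip(rescale(x), -1.0, 1.0))
    return np.sin(phi[:, None] - phi[None, :])

def mtf(x, n_bins=8):
    # Markov transition field built from quantile bins.
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)                      # bin index in 0..n_bins-1
    w = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):                # count first-order transitions
        w[a, b] += 1.0
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    return w[q[:, None], q[None, :]]               # spread probabilities over time pairs

def utrp(x):
    # Unthresholded recurrence plot with embedding dimension 1: |x_i - x_j|.
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :])

series = np.sin(np.linspace(0.0, 6.0, 224))        # synthetic stand-in for one sensor window
images = {name: f(series) for name, f in
          [("GASF", gasf), ("GADF", gadf), ("MTF", mtf), ("UTRP", utrp)]}
```

Each function maps a window of 224 samples to a 224×224 array, which matches the input resolution used by the classification models.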

Figure 5: The transformed images of normal data and fault data. Rows: GASF, GADF, MTF, UTRP.
Columns: (a) Label 0, (b) Label 1, (c) Label 2, (d) Label 3, (e) Label 4.

4.2 Experimental settings

Following the three primary deep learning architectures described in Section 3.2, we have chosen
currently popular models for the experiments: ResNet (CNN) [14], EfficientNet (CNN) [45], DeiT
(Transformer) [49] and ResMLP (MLP) [48]. Furthermore, to better support real-time fault diagnosis
and lower the computational cost, we have chosen the variant of each model with the fewest
parameters, namely ResNet18, EfficientNet-B0, DeiT-Tiny and ResMLP-S12. These models, with weights
pre-trained on ImageNet, are used in the subsequent experiments. The details of the network designs
are summarized in Table 3.

For the supervised classification task, the images are split into a training set and a testing set
at a ratio of 8:2. We train the models described above for 50 epochs with a batch size of 32 on one
GPU; all models take 224×224 images as input and use the cross-entropy loss. Most of our training
settings are inherited from the original models. ResNet18 and EfficientNet-B0 share a similar
setting: SGD [39] as the optimizer with weight decay 0.0001 and momentum 0.9, and a learning rate
that starts from 0.1 and is divided by 10 after 30 epochs. For DeiT-Tiny and ResMLP-S12, we employ
the AdamW [28] optimizer with weight decay 0.05 and an initial learning rate of
$0.0005 \times \frac{\text{batch size}}{512}$ with cosine learning rate decay.
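The two learning rate schedules can be written out as plain functions of the epoch index. This sketch rests on several assumptions that the text does not state: the AdamW rule is read as 0.0005 × batch size / 512 (the scaling used in the original DeiT recipe), epochs are counted from 0, there is no warm-up, and the cosine schedule decays to zero at the end of training. All names are hypothetical.

```python
import math

EPOCHS = 50
BATCH_SIZE = 32

def step_lr(epoch, base_lr=0.1, drop_epoch=30, factor=0.1):
    # SGD setting: start from 0.1 and divide by 10 once 30 epochs have passed.
    return base_lr * (factor if epoch >= drop_epoch else 1.0)

def adamw_base_lr(batch_size=BATCH_SIZE):
    # Assumed reading of the scaling rule: 0.0005 * batch_size / 512.
    return 0.0005 * batch_size / 512

def cosine_lr(epoch, base_lr, total_epochs=EPOCHS):
    # Cosine decay from base_lr at epoch 0 to zero at total_epochs (assumed, no warm-up).
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

base = adamw_base_lr()
sgd_schedule = [step_lr(e) for e in range(EPOCHS)]
adamw_schedule = [cosine_lr(e, base) for e in range(EPOCHS)]
```

Under this reading, a batch size of 32 gives an initial AdamW learning rate of 3.125e-5, several orders of magnitude below the SGD starting value of 0.1.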

4.3 Analysis of results

Comparison of different transforming methods. We have compared the throughput and image size of
each transformation method; details are shown in Table 2. The throughput is measured as the number
of images that can be processed per second on a CPU. The results show that transformation with MTF
is about 3.6 to 3.8 times faster than with the other three methods and, at the same time, the images
converted by MTF are also the smallest.

Table 2: Comparison of throughput and image size of different transformation methods

Transformation method    GASF     GADF     MTF       UTRP
Throughput (im/s)        30.48    29.12    109.57    30.14
Size (KB per image)      78.52    92.70    2.40      89.76
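The relative advantage of MTF can be read directly from Table 2. The short sketch below computes the speed-up and the size ratio of MTF against each of the other methods, using only the values reported in the table; the variable names are illustrative.

```python
throughput = {"GASF": 30.48, "GADF": 29.12, "MTF": 109.57, "UTRP": 30.14}   # images per second
size = {"GASF": 78.52, "GADF": 92.70, "MTF": 2.40, "UTRP": 89.76}           # size per image

# How many times faster MTF is than each other method.
speedup = {m: throughput["MTF"] / v for m, v in throughput.items() if m != "MTF"}
# How many times larger each other method's image is than an MTF image.
size_ratio = {m: v / size["MTF"] for m, v in size.items() if m != "MTF"}

for m in speedup:
    print(f"MTF vs {m}: {speedup[m]:.2f}x faster, {size_ratio[m]:.1f}x smaller")
```

The speed-up lies between roughly 3.6 and 3.8, and MTF images are more than 30 times smaller than those of the other three methods.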

Before the transformation, the numeric 1-D time-series data are processed by PCA for dimension
reduction as described in Section 4.1, which produces 32 key dimensions (channels). The 32
transformed channels are then concatenated into a single 32-channel image, analogous to a 3-channel
RGB image, which is fed into the deep learning models.
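The pipeline from 70 sensor readings to one 32-channel image can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: PCA is implemented through a singular value decomposition of the centered data, the per-channel transform shown is the simple pairwise-difference recurrence plot, and all names are hypothetical.

```python
import numpy as np

def pca_reduce(data, n_components=32):
    """Project (T, 70) data onto its first n_components principal directions."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def recurrence_image(x):
    # Illustrative per-channel transform: pairwise absolute differences.
    return np.abs(x[:, None] - x[None, :])

def window_to_image(window, transform):
    """Turn a (T, K) window into a (K, T, T) multi-channel image."""
    return np.stack([transform(window[:, k]) for k in range(window.shape[1])], axis=0)

rng = np.random.default_rng(0)
raw = rng.normal(size=(1000, 70))        # synthetic stand-in for simulator output
reduced = pca_reduce(raw)                # (1000, 32)
image = window_to_image(reduced[:224], recurrence_image)   # (32, 224, 224)
```

The resulting array has the 32×224×224 layout used as model input; how the first layer of each pre-trained network is adapted to 32 input channels is not covered by this sketch.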

Performances of different combinations: accuracy vs. efficiency. We have trained the
selected four models, ResNet18, EfficientNet-B0, DeiT-Tiny and ResMLP-S12 on the transformed
image dataset, where the input size is 32×224×224. Figure 6 shows the training process of different
transformation methods with different deep learning models. It can be seen that, on the whole, all
models and methods converge within 50 epochs and have achieved very high accuracy eventually.

Specifically, from the perspective of the transformation methods, the training process with MTF is
the most stable, while the other methods fluctuate greatly. MTF and GADF converge within 20 epochs,
whereas GASF and UTRP need 30 epochs. However, UTRP achieves the highest accuracy of the four
transforming methods, while the classification performance of MTF is inferior to that of the other
methods (see Figure 7). From the perspective of the classification models, EfficientNet-B0 achieves
the best overall performance across the transforming methods. DeiT-Tiny is slightly worse than
EfficientNet-B0 overall, but its training process is more stable. In terms of both convergence and
accuracy, ResMLP-S12 is generally inferior to the other models, especially with GASF.

Figure 6: Training process of different transforming methods with different deep learning models.
Panels: (a) GASF, (b) GADF, (c) MTF, (d) UTRP.

In order to evaluate different classifying means more accurately, the classification models are
usually compared based on the trade-off between accuracy, number of parameters and throughput.



Figure 7: Comparison of accuracy, throughput and params of transforming methods

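Table 3: Details of network designs of different models and classification accuracy of combinations
in experiments

Architecture              Exact model      Params (M)  Layers  Throughput (im/s)  Method  Accuracy (%)
CNN-based models          ResNet18         11.27       18      1268.75            GASF    99.44
CNN-based models          ResNet18         11.27       18      1268.75            GADF    99.68
CNN-based models          ResNet18         11.27       18      1268.75            MTF     99.21
CNN-based models          ResNet18         11.27       18      1268.75            UTRP    99.68
CNN-based models          EfficientNet-B0  4.02        18      1132.28            GASF    99.84
CNN-based models          EfficientNet-B0  4.02        18      1132.28            GADF    99.92
CNN-based models          EfficientNet-B0  4.02        18      1132.28            MTF     99.84
CNN-based models          EfficientNet-B0  4.02        18      1132.28            UTRP    100.00
Transformer-based models  DeiT-Tiny        6.90        12      1013.83            GASF    100.00
Transformer-based models  DeiT-Tiny        6.90        12      1013.83            GADF    99.44
Transformer-based models  DeiT-Tiny        6.90        12      1013.83            MTF     99.60
Transformer-based models  DeiT-Tiny        6.90        12      1013.83            UTRP    100.00
MLP-based models          ResMLP-S12       17.79       12      919.81             GASF    98.37
MLP-based models          ResMLP-S12       17.79       12      919.81             GADF    99.60
MLP-based models          ResMLP-S12       17.79       12      919.81             MTF     98.97
MLP-based models          ResMLP-S12       17.79       12      919.81             UTRP    99.60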

Table 3 reports the classification accuracies of the CNN-, Transformer- and MLP-based architectures
together with their throughput and number of parameters. Because fault diagnosis in NPPs requires a
very quick response, we base the assessment on the trade-off between accuracy and throughput. Here,
the throughput is measured as the number of images that can be processed per second on one NVIDIA
RTX 3060 GPU with the largest possible batch size. A larger throughput means that the model
processes images faster under the same hardware conditions. Among the four models, ResNet18 has the
highest throughput, followed by EfficientNet-B0, while ResMLP-S12 has the lowest.
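To make the trade-off easier to inspect, the sketch below aggregates the accuracies in Table 3 by model and by transforming method. Averaging over the four combinations is an illustrative summary chosen here, not a metric used in the paper; the numbers are copied from Table 3 and the variable names are hypothetical.

```python
accuracy = {
    "ResNet18":        {"GASF": 99.44,  "GADF": 99.68, "MTF": 99.21, "UTRP": 99.68},
    "EfficientNet-B0": {"GASF": 99.84,  "GADF": 99.92, "MTF": 99.84, "UTRP": 100.00},
    "DeiT-Tiny":       {"GASF": 100.00, "GADF": 99.44, "MTF": 99.60, "UTRP": 100.00},
    "ResMLP-S12":      {"GASF": 98.37,  "GADF": 99.60, "MTF": 98.97, "UTRP": 99.60},
}
throughput = {"ResNet18": 1268.75, "EfficientNet-B0": 1132.28,
              "DeiT-Tiny": 1013.83, "ResMLP-S12": 919.81}      # images per second
methods = ["GASF", "GADF", "MTF", "UTRP"]

# Mean accuracy of each model over the four transforming methods.
model_mean = {m: sum(acc.values()) / len(acc) for m, acc in accuracy.items()}
# Mean accuracy of each transforming method over the four models.
method_mean = {t: sum(accuracy[m][t] for m in accuracy) / len(accuracy) for t in methods}

best_model = max(model_mean, key=model_mean.get)
best_method = max(method_mean, key=method_mean.get)
fastest_model = max(throughput, key=throughput.get)

for m in accuracy:
    print(f"{m}: mean accuracy {model_mean[m]:.2f}%, throughput {throughput[m]:.2f} im/s")
```

On these averages, UTRP and EfficientNet-B0 come out on top in accuracy, while ResNet18 keeps the highest throughput.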

4.4 Summary

This section has presented extensive experiments comparing the performance of different deep
learning based models (CNN-based, Transformer-based and MLP-based) in fault detection with images
generated by different transforming methods (GASF, GADF, MTF, UTRP). MTF has the fastest converting
speed and its converted images occupy the least memory, yet it is not the optimal choice because its
classification accuracy is inferior to that of almost every other method on each model. In contrast,
UTRP achieves excellent performance no matter which classification model is used. In terms of the
trade-off between throughput and accuracy, EfficientNet-B0 outperforms the other deep learning
classification approaches in our experiments because it achieves high accuracy with high throughput,
which can meet the requirement of quick response in fault diagnosis of NPPs.



5 Conclusion & future directions
This research has proposed a two-stage fault detection method for NPPs based on imaged data and deep
learning image classification. Experimental results have demonstrated that the proposed methodology
is capable of detecting and warning of possible faults in NPPs in advance. Furthermore, it could be
adapted to promote safety management in many different power systems with high accuracy, fast
response and great efficiency. The primary contributions of our research are as follows:

• The time-series data of the operating status of NPPs have been transformed into imaged data
by four major methods, preserving the time-series information in images, which enables deep
learning models to treat the fault detection problem as a supervised image classification task.

• Extensive experiments have been carried out to compare and analyze the performance of
different combinations of data transforming methods and image classification models. All
combinations achieve high accuracy, and UTRP and EfficientNet-B0 (a CNN-based model)
outperform the others in the trade-off between accuracy and throughput.

The proposed methodology has improved the precision of fault detection and diagnosis in NPPs,
providing effective and scientific support for decision-making. However, our research still has many
limitations and shortcomings. One is the data used in the experiments. Although data transformation
is considered a promising direction for solving many tough tasks, it also raises doubts about the
interpretability of artificial changes to the structure or format of the data. Moreover, only one
kind of data (numeric data) is used in our research, and it is collected by software simulation
rather than from the real operating status of an NPP, while it has been confirmed that involving more
data in experiments helps achieve better performance, for example through the fusion of
multiple-source heterogeneous data or the integration of different forms of data. Another issue that
deserves attention is the new models, represented by deep learning based approaches, that have
evolved rapidly in recent years. These emerging methodologies deal with data from different
perspectives and could outperform conventional ones to a certain extent, as long as appropriate data
with compatible structures are available. The diversification of data and the significant growth of
computing capacity not only require more data-driven intelligent analysis tools, but also make it
possible to implement these advanced techniques in both academic research and practice. Hence, to
better promote the theoretical and practical development of fault detection in NPPs, it is necessary
to further explore possible innovations in the two directions described above: more types of data
and more intelligent analysis tools.

Acknowledgements
All authors hereby express sincere condolences for our loss of Ioan Dzitac (Founder & former
Editor-in-Chief of IJCCC), and great appreciation to editors and reviewers for their careful work and
valuable comments, especially Florin G. Filip and Simona Dzitac for their unwavering patience,
understanding and support. The APC was funded by R&D center "Cercetare Dezvoltare Agora" of
Agora University and this article was funded by the Key Project (grant number 71932008) of National
Natural Science Foundation of China.

Author contributions
Y. Shi: Conceptualization; Formal analysis; Methodology; Funding acquisition; Project
administration; Supervision; Writing - review & editing. X. Xue: Conceptualization; Data curation;
Formal analysis; Methodology; Software; Writing - original draft; Writing - review & editing.
J. Xue: Conceptualization; Data curation; Formal analysis; Methodology; Software; Writing - review
& editing. Y. Qu: Conceptualization; Data curation; Formal analysis; Methodology; Writing - review
& editing.



Conflict of interest
All authors declare no conflict of interest.

References
[1] Aizpurua, J. I.; Mcarthur, S.; Stewart, B. G.; Lambert, B.; Cross, J. G.; Catterson, V. M. (2019).
Adaptive Power Transformer Lifetime Predictions Through Machine Learning and Uncertainty
Modeling in Nuclear Power Plants, IEEE Transactions on Industrial Electronics, 66, 4726–4737,
2019.

[2] Ayodeji, A.; Liu, Y. (2019). PWR heat exchanger tube defects: Trends, signatures and diagnostic
techniques, Progress in Nuclear Energy, 112, 171–184, 2019.

[3] Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. (2021).
Emerging Properties in Self-Supervised Vision Transformers, ArXiv, abs/2104.14294, Available:
https://arxiv.org/abs/2104.14294, 2021.

[4] Chen, F.; Jahanshahi, M. R. (2018). NB-CNN: Deep Learning-Based Crack Detection Using
Convolutional Neural Network and Naïve Bayes Data Fusion, IEEE Transactions on Industrial
Electronics, 65, 4392–4400, 2018.

[5] Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K. P.; Yuille, A. L. (2018). DeepLab: Semantic
Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected
CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834–848, 2018.

[6] Deleplace, A.; Atamuradov, V.; Allali, A.; Pelle, J. T.; Plana, R.; Alleaume, G. (2020). Ensemble
Learning-based Fault Detection in Nuclear Power Plant Screen Cleaners, IFAC-PapersOnLine,
53, 10354–10359, 2020.

[7] Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. (2009). ImageNet: A large-scale
hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 248–255, 2009.

[8] Dong, C.; Loy, C. C.; He, K.; Tang, X. (2016). Image Super-Resolution Using Deep Convolutional
Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 295–307, 2016.

[9] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani,
M.; Minderer, M.; Heigold, G.; Gelly, S.; Uszkoreit, J.; Houlsby, N. (2021). An Image is Worth
16x16 Words: Transformers for Image Recognition at Scale, 9th International Conference on
Learning Representations (ICLR), 2021.

[10] Eckmann, J.; Kamphorst, S. O.; Ruelle, D. (1987). Recurrence Plots of Dynamical Systems,
Europhysics Letters, 4, 973–977, 1987.

[11] Guo, Z.; Wu, Z.; Liu, S.; Ma, X.; Wang, C.; Yan, D.; Niu, F. (2020). Defect detection of nuclear
fuel assembly based on deep neural network, Annals of Nuclear Energy, 137, 107078, 2020.

[12] He, C.; Ge, D.; Yang, M.; Yong, N.; Wang, J.; Yu, J. (2021). A data-driven adaptive fault
diagnosis methodology for nuclear power systems based on NSGAII-CNN, Annals of Nuclear
Energy, 159, 108326, 2021.

[13] He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. B. (2020). Mask R-CNN, IEEE Transactions on
Pattern Analysis and Machine Intelligence, 42, 386–397, 2020.

[14] He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep Residual Learning for Image Recognition, 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778, 2016.



[15] Hendrycks, D.; Gimpel, K. (2016). Gaussian Error Linear Units (GELUs), ArXiv,
abs/1606.08415, Available: http://arxiv.org/abs/1606.08415, 2016.

[16] Holdsworth, A. F.; George, K. E.; Adams, S. J.; Sharrad, C. A. (2021). An accessible statistical
regression approach for the estimation of spent nuclear fuel compositions and decay heats to
support the development of nuclear fuel management strategies, Progress in Nuclear Energy, 141,
103935, 2021.

[17] Hosaka, T. (2019). Bankruptcy prediction using imaged financial ratios and convolutional neural
networks, Expert Systems with Applications, 117, 287–299, 2019.

[18] Hu, G.; Zhou, T.; Liu, Q. (2021). Data-Driven Machine Learning for Fault Detection and Diag-
nosis in Nuclear Power Plants: A Review, Frontiers in Energy Research, 9, 1–12, 2021.

[19] Jolliffe, I. T. (1986). Principal Component Analysis, Springer, 1986.

[20] Kim, J.; Lee, J. K.; Lee, K. M. (2016). Accurate Image Super-Resolution Using Very Deep
Convolutional Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 1646–1654, 2016.

[21] Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). ImageNet classification with deep convolu-
tional neural networks, Advances in Neural Information Processing Systems (NIPS), 25, 2012.

[22] Lee, G.; Lee, S. J.; Lee, C. (2021). A convolutional neural network model for abnormality diagnosis
in a nuclear power plant, Applied Soft Computing, 99, 106874, 2021.

[23] Li, J.; Lin, M. (2021). Ensemble learning with diversified base models for fault diagnosis in nuclear
power plants, Annals of Nuclear Energy, 158, 108265, 2021.

[24] Lin, T.; Dollár, P.; Girshick, R. B.; He, K.; Hariharan, B.; Belongie, S. J. (2017). Feature
Pyramid Networks for Object Detection, 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 936–944, 2017.

[25] Ling, J.; Liu, G.; Li, J.; Shen, X.; You, D. (2020). Fault prediction method for nuclear power
machinery based on Bayesian PPCA recurrent neural network model, Nuclear Science and Tech-
niques, 31, 1–11, 2020.

[26] Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. E.; Fu, C.; Berg, A. C. (2016). SSD:
Single Shot MultiBox Detector, 14th European Conference on Computer Vision (ECCV), 21–37,
2016.

[27] Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. (2021). Swin Trans-
former: Hierarchical Vision Transformer using Shifted Windows, ArXiv, abs/2103.14030, Avail-
able: http://arxiv.org/abs/2103.14030, 2021.

[28] Loshchilov, I.; Hutter, F. (2019). Decoupled Weight Decay Regularization, 7th International
Conference on Learning Representations (ICLR), 2019.

[29] Mandal, S.; Santhi, B.; Sridhar, S.; Vinolia, K.; Swaminathan, P. (2017). Nuclear Power Plant
Thermocouple Sensor-Fault Detection and Classification Using Deep Learning and Generalized
Likelihood Ratio Test, IEEE Transactions on Nuclear Science, 64, 1526–1534, 2017.

[30] Meng, J.; Su, Y.; Xie, S. (2020). Loose parts detection method combining blind deconvolution
with support vector machine, Annals of Nuclear Energy, 149, 107782, 2020.

[31] Mo, K.; Lee, S. J.; Seong, P. H. (2007). A dynamic neural network aggregation model for transient
diagnosis in nuclear power plants, Progress in Nuclear Energy, 49, 262–272, 2007.

[32] Mohapatra, D.; Subudhi, B.; Daniel, R. (2020). Real-time sensor fault detection in Tokamak
using different machine learning algorithms, Fusion Engineering and Design, 151, 111401, 2020.



[33] Nicolau, A. D.; Augusto, J. P.; Schirru, R. (2017). Accident diagnosis system based on real-time
decision tree expert system, AIP Conference Proceedings, 1836, 020017, 2017.

[34] Oh, C.; Lee, J. I. (2020). Real time nuclear power plant operating state cognitive algorithm
development using dynamic Bayesian network, Reliability Engineering and System Safety, 198,
106879, 2020.

[35] Po, L. (2019). Conceptual Design of an Accident Prevention System for Light Water Reactors
Using Artificial Neural Network and High-Speed Simulator, Nuclear Technology, 206, 505–513,
2019.

[36] Racheal, S.; Liu, Y.; Ernest, M.; Ayodeji, A. (2021). A Systematic Review of PCTRAN-Based
Pressurized Water Reactor Transient Analysis, Proceedings of the 2021 28th International Con-
ference on Nuclear Engineering, 2021.

[37] Redmon, J.; Divvala, S. K.; Girshick, R. B.; Farhadi, A. (2016). You Only Look Once: Unified,
Real-Time Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 779–788, 2016.

[38] Rezaeianjouybari, B.; Shang, Y. (2020). Deep learning for prognostics and health management:
State of the art, challenges, and opportunities, Measurement, 163, 107929, 2020.

[39] Robbins, H.; Monro, S. (1951). A Stochastic Approximation Method, Annals of Mathematical
Statistics, 22, 400–407, 1951.

[40] Ronneberger, O.; Fischer, P.; Brox, T. (2015). U-Net: Convolutional Networks for Biomedical
Image Segmentation, 18th International Conference on Medical Image Computing and Computer
Assisted Intervention (MICCAI), 234–241, 2015.

[41] Saeed, H. A.; Peng, M.; Wang, H.; Zhang, B. (2020). Novel fault diagnosis scheme utilizing deep
learning networks, Progress in Nuclear Energy, 118, 103066, 2020.

[42] Shi, Y.; Xue, X.; Qu, Y.; Xue, J.; Zhang, L. (2021). Machine Learning and Deep Learning
Methods used in Safety Management of Nuclear Power Plants: A Survey, 2021 International
Conference on Data Mining Workshops (ICDMW), 917–924, 2021.

[43] Simonyan, K.; Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image
Recognition, 3rd International Conference on Learning Representations (ICLR), 2015.

[44] Smith, C. L.; Borgonovo, E. (2007). Decision making during nuclear power plant incidents: a new
approach to the evaluation of precursor events, Risk Analysis, 27, 1027–1042, 2007.

[45] Tan, M.; Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural
Networks, Proceedings of the 36th International Conference on Machine Learning (ICML), 6105–
6114, 2019.

[46] Tian, W.; Wu, J.; Cui, H.; Hu, T. (2021). Drought Prediction Based on Feature-Based Transfer
Learning and Time Series Imaging, IEEE Access, 9, 101454–101468, 2021.

[47] Tolstikhin, I. O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Key-
sers, D.; Uszkoreit, J.; Lucic, M.; Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP Architecture
for Vision, ArXiv, abs/2105.01601, Available: http://arxiv.org/abs/2105.01601, 2021.

[48] Touvron, H.; Bojanowski, P.; Caron, M.; Cord, M.; El-Nouby, A.; Grave, E.; Izacard,
G.; Joulin, A.; Synnaeve, G.; Verbeek, J.; Jégou, H. (2021). ResMLP: Feedforward networks
for image classification with data-efficient training, ArXiv, abs/2105.03404, Available:
http://arxiv.org/abs/2105.03404, 2021.



[49] Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. (2021). Training
data-efficient image transformers & distillation through attention, Proceedings of the 38th
International Conference on Machine Learning (ICML), 10347–10357, 2021.

[50] Vaswani, A.; Shazeer, N. M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.;
Polosukhin, I. (2017). Attention is All you Need, Advances in Neural Information Processing
Systems (NIPS), 30, 2017.

[51] Wang, Z.; Oates, T. (2015). Imaging Time-Series to Improve Classification and Imputation,
Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 3939–
3945, 2015.

[52] Wang, H.; Peng, M.; Wesley Hines, J.; Zheng, G.; Liu, Y.; Upadhyaya, B. R. (2019). A hybrid fault
diagnosis methodology with support vector machine and improved particle swarm optimization
for nuclear power plants, ISA Transactions, 95, 358–371, 2019.

[53] Wang, X.; Xu, Z. S.; Dzitac, I. (2019). Bibliometric Analysis on Research Trends of Interna-
tional Journal of Computers Communications & Control, International Journal of Computers
Communications & Control, 14, 711–732, 2019.

[54] Yang, C.; Chen, Z.; Yang, C. (2020). Sensor Classification Using Convolutional Neural Network
by Encoding Multivariate Time Series as Two-Dimensional Colored Images, Sensors, 20, 1, 168,
2020.

[55] Yao, Y.; Wang, J.; Long, P.; Xie, M.; Wang, J. (2020). Small-batch-size convolutional neural net-
work based fault diagnosis system for nuclear energy production safety with big-data environment,
International Journal of Energy Research, 44, 5841–5855, 2020.

Copyright ©2022 by the authors. Licensee Agora University, Oradea, Romania.
This is an open access article distributed under the terms and conditions of the Creative Commons
Attribution-NonCommercial 4.0 International License.
Journal’s webpage: http://univagora.ro/jour/index.php/ijccc/

This journal is a member of, and subscribes to the principles of,
the Committee on Publication Ethics (COPE).

https://publicationethics.org/members/international-journal-computers-communications-and-control

Cite this paper as:

Shi, Y.; Xue, X.; Xue, J.; Qu, Y. (2022). Fault Detection in Nuclear Power Plants using Deep
Leaning based Image Classification with Imaged Time-series Data, International Journal of Computers
Communications & Control, 17 (1), 4714, 2022. https://doi.org/10.15837/ijccc.2022.1.4714

