CHEMICAL ENGINEERING TRANSACTIONS
VOL. 51, 2016
A publication of The Italian Association of Chemical Engineering
Online at www.aidic.it/cet
Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian
Copyright © 2016, AIDIC Servizi S.r.l.,
ISBN 978-88-95608-43-3; ISSN 2283-9216

Application Research on Data Mining and Artificial 
Intelligence Theory in Short-Term Power Load Forecasting 

Honghai Wang 
School of Electronic and Electrical Engineering, Anhui Sanlian University, Hefei, 230601, China  
sanlian_whh@163.com 

Data mining technology provides an effective research tool for processing uncertain, noisy and implicit 
information. Rough set theory, a typical data mining method, provides an effective tool for analysing and 
generalizing from inaccurate data, mining relations among data and discovering potential knowledge. This 
paper establishes a short-term load prediction model based on rough set theory. It uses rough sets to carry 
out attribute reduction on the historical data related to load, removes the attributes that are irrelevant to the 
decision information and simplifies the input variables, so as to shrink the search space of the neural 
network and improve prediction performance. 

1. Introduction 

The way short-term load changes over time is complicated. It is difficult to describe with an accurate 
mathematical model, and many uncertain factors are involved in short-term load prediction, such as the various 
kinds of information related to the actual load (climate, humidity, precipitation, special events, etc.) (Wu and 
Niu, 2009). The relation between these factors and power load is non-linear. In earlier load prediction, these 
relations were supplied by experienced dispatch personnel, but such knowledge is imprecise, and in the current 
open power market it is difficult to judge the various random fluctuations by experience alone 
(Chiu and Kao, 1997). 
In addition, the artificial intelligence forecasting methods widely applied at present still need a large amount 
of statistical information and prior knowledge. However, this information may become deficient or incomplete 
as the operating environment changes. Moreover, in short-term power load prediction many factors affect the 
load, and to different degrees. Not all of these conditions are necessary: some are correlated and some are 
independent, so only some, or a few, conditions are needed to reach the conclusion. A new intelligent method 
is therefore needed to analyse historical data and mine the relevant information from it automatically, so as to 
enhance the precision and efficiency of load prediction (Paarmann and Najar, 1994). 
Data mining technology provides an effective research tool for processing uncertain, noisy and implicit 
information. Rough set theory (Jia and Tian, 2008), a typical data mining method, provides an effective tool 
for analysing and generalizing from inaccurate data, mining relations among data and discovering potential 
knowledge. This paper establishes a short-term load prediction model based on rough set theory. It uses rough 
sets to carry out attribute reduction on the historical data related to load, removes the attributes that are 
irrelevant to the decision information and simplifies the input variables, so as to shrink the search space of 
the neural network and improve prediction performance (Chen and Huang, 2014). 

2. Basis of rough set theory 

Rough set theory was proposed by the Polish scholar Z. Pawlak in 1982. It provides a new mathematical tool for 
processing imprecise and incomplete information. Rough set theory is built on a classification mechanism: it 
interprets classification as an equivalence relation on a specific space, and the equivalence relation in turn 
partitions that space. The theory construes knowledge as a partition of data, and each block of the partition 
is called a concept (Li and Huang, 2014). The main idea of rough set theory is, while keeping the classification 
capacity of the information system unchanged, to use the known knowledge base to describe inaccurate or 
uncertain knowledge in terms of the knowledge already in the knowledge base, and to derive decision or 
classification rules for the problem through knowledge supplement and reduction. 
The most prominent difference between rough set theory and other theories for processing uncertain and 
inaccurate problems is that rough set theory needs no prior information beyond the data set to be processed, 
so its description and processing of uncertainty is objective. Moreover, the theory does not include a 
mechanism for processing inaccurate or uncertain original data (Huang et al., 2002), so it is strongly 
complementary to other theories that handle inaccurate or uncertain problems, such as probability theory, 
fuzzy mathematics and evidence theory. 

Please cite this article as: Wang H.H., 2016, Application research on data mining and artificial intelligence theory in short-term power load 
forecasting, Chemical Engineering Transactions, 51, 415-420, DOI: 10.3303/CET1651070

2.1 Information system 
The information system is the object of study of rough set theory. It is a data set, usually expressed as a data 
table. Each row of the table represents an object (a case, an event, etc.), while each column is an attribute of 
the objects (a feature, a measurement, etc.). An information system can be formally expressed as 
$S = \langle U, Q, V, f \rangle$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a finite non-empty set of objects, 
usually called the universe of discourse, and $Q$ is a finite non-empty set of attributes. If $Q = A \cup D$, 
where $A$ is the condition attribute set and $D$ is the decision attribute set, the information system is also 
called a decision system (or decision table). $V = \bigcup_{q \in Q} V_q$ is the union of the value domains 
$V_q$ of all attributes $q$, and $f: U \times Q \to V$ is the information function specifying the attribute 
value of each object. 
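
The definitions above can be made concrete with a small sketch. The table below is a toy example invented for illustration (its attribute names and values are assumptions, not data from the paper); `f` plays the role of the information function and `partition` is a hypothetical helper that groups objects that are indiscernible on a chosen attribute set.

```python
from collections import defaultdict

# Toy decision table (hypothetical values, not taken from the paper).
# Condition attributes A = {Tmax, Hu, Rf}; decision attribute D = {LD}.
U = {
    "x1": {"Tmax": "high", "Hu": "low",  "Rf": "no",  "LD": "H"},
    "x2": {"Tmax": "high", "Hu": "low",  "Rf": "no",  "LD": "H"},
    "x3": {"Tmax": "low",  "Hu": "high", "Rf": "yes", "LD": "L"},
    "x4": {"Tmax": "low",  "Hu": "low",  "Rf": "no",  "LD": "NM"},
}
A = ["Tmax", "Hu", "Rf"]
D = ["LD"]


def f(x, q):
    """Information function f: U x Q -> V."""
    return U[x][q]


def partition(attrs):
    """Blocks of objects that share the same values on every attribute in attrs."""
    blocks = defaultdict(set)
    for x in U:
        blocks[tuple(f(x, q) for q in attrs)].add(x)
    # Sort the blocks so the result does not depend on dictionary order.
    return sorted(blocks.values(), key=sorted)


print(partition(A))       # blocks induced by the condition attributes
print(partition(["Hu"]))  # a coarser partition using humidity only
```

In this toy table the objects x1 and x2 cannot be told apart by the condition attributes, so they fall into the same block.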

2.2 Relative reduction of attributes 

For attribute sets $A, D \subseteq Q$, the element of the relative discernibility matrix is: 

$$m_{ij} = \{ a \in A : f(x_i, a) \neq f(x_j, a) \} \tag{1}$$

The relative discernibility function is: 

$$f_D = \bigwedge \{ \bigvee m_{ij} : 1 \le i, j \le n,\ j < i,\ m_{ij} \neq \emptyset \} \tag{2}$$

When $a_{i1} \wedge a_{i2} \wedge \cdots \wedge a_{ip}$ is a prime implicant of the discernibility function $f_D$, 
the attribute set $\{a_{i1}, \ldots, a_{ip}\}$ is a relative reduction of $D$. Thus, relative reductions can be 
found by finding the prime implicants of $f_D$. 
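
As a rough illustration of this section, the sketch below builds the non-empty matrix entries for a toy table and enumerates the minimal attribute sets that intersect every entry, which is what the prime implicants of the discernibility function correspond to. The data are invented, the function names are hypothetical, and the code assumes the usual decision-relative convention that only pairs of objects with different decision values contribute an entry. The brute-force search is only practical for a handful of attributes.

```python
from itertools import combinations

# Toy decision table (hypothetical values): the first three columns are the
# condition attributes A, the last column is the decision attribute D.
A = ["Tmax", "Hu", "Rf"]
rows = [
    ("high", "low",  "no",  "H"),
    ("high", "high", "no",  "H"),
    ("low",  "high", "yes", "L"),
    ("low",  "low",  "no",  "NM"),
]


def discernibility_entries(rows, attrs):
    """Non-empty entries m_ij, taking each unordered pair of objects once."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][-1] == rows[j][-1]:
            continue  # same decision value: nothing to discern (assumed convention)
        m = frozenset(a for k, a in enumerate(attrs) if rows[i][k] != rows[j][k])
        if m:
            entries.append(m)
    return entries


def relative_reducts(rows, attrs):
    """Minimal attribute sets that intersect every entry of the matrix."""
    entries = discernibility_entries(rows, attrs)
    found = []
    for size in range(len(attrs) + 1):
        for cand in combinations(attrs, size):
            c = set(cand)
            if any(r <= c for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if all(c & m for m in entries):
                found.append(c)
    return found


print(relative_reducts(rows, A))
```

For this table the entry {Tmax} is a singleton, so Tmax must appear in every reduct, and either Hu or Rf is needed to separate the last two objects.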

2.3 Dependence degree and importance of attributes 
If all values of the attributes in $D$ are totally determined by the values of the attributes in $A$, this is 
written $A \Rightarrow D$. The dependence degree of attribute set $D$ on $A$ is expressed as $\gamma(A, D)$: 

$$\gamma(A, D) = \frac{\mathrm{card}(POS_A(D))}{\mathrm{card}(U)} \tag{3}$$

where $POS_A(D)$ is the $A$-positive region of $D$ and card denotes cardinality. According to the definition of 
dependence degree, the importance degree of an attribute $a \in A$ with respect to the equivalence classes of 
$U/IND(D)$ is defined as: 

$$SGF(a, A, D) = \frac{\gamma(A, D) - \gamma(A - \{a\}, D)}{\gamma(A, D)} \tag{4}$$
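
A minimal sketch of the dependence degree and the attribute importance, under the standard definition of the positive region (the union of the condition blocks that lie entirely inside one decision block). The table is the same kind of invented toy data as before and the function names are hypothetical; attributes are referred to by column index.

```python
from collections import defaultdict

# Toy decision table (hypothetical values): columns 0-2 are the condition
# attributes (Tmax, Hu, Rf), column 3 is the decision attribute (LD).
rows = [
    ("high", "low",  "no",  "H"),
    ("high", "high", "no",  "H"),
    ("low",  "high", "yes", "L"),
    ("low",  "low",  "no",  "NM"),
]
COND = [0, 1, 2]
DEC = [3]


def blocks(rows, idx):
    """Equivalence classes (as sets of row numbers) under the columns idx."""
    groups = defaultdict(set)
    for n, r in enumerate(rows):
        groups[tuple(r[k] for k in idx)].add(n)
    return list(groups.values())


def gamma(rows, cond_idx, dec_idx):
    """Dependence degree: card(POS) / card(U)."""
    dec_blocks = blocks(rows, dec_idx)
    pos = set()
    for b in blocks(rows, cond_idx):
        if any(b <= d for d in dec_blocks):
            pos |= b  # block is consistent, so it belongs to the positive region
    return len(pos) / len(rows)


def sgf(rows, a, cond_idx, dec_idx):
    """Importance of column a: relative drop in gamma when a is removed."""
    full = gamma(rows, cond_idx, dec_idx)  # assumed non-zero in this sketch
    reduced = gamma(rows, [k for k in cond_idx if k != a], dec_idx)
    return (full - reduced) / full


for a, name in zip(COND, ["Tmax", "Hu", "Rf"]):
    print(name, sgf(rows, a, COND, DEC))
```

In this example removing Tmax halves the dependence degree, while removing Hu or Rf alone leaves it unchanged, so only Tmax has non-zero importance.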

 

3. Load forecasting model based on rough set theory 

An accurate load prediction model should be able to describe the various factors related to load directly. From 
the basic concepts of rough set theory, we can see that rough sets can reduce and remove unnecessary 
information, which benefits information classification and reduction. In order to find the conditions directly 
related to load and to improve prediction accuracy and computation speed, the paper uses rough set theory to 
reduce the attributes of the various factors affecting load, finds the necessary conditions directly related to 
power load, takes them as the input vector of a fuzzy neural network and applies the decision rules established 
from the optimal reduction set to the design of the neural network structure. 
As shown in Figure 1, the steps for establishing the neural network load prediction model based on rough set 
theory are as follows: 
a) Establish the initial information table from historical load data and the related historical information. 
b) Discretize the original data and establish the real-valued decision table. 
c) Perform attribute reduction on the decision table to obtain the optimal condition attribute set and the 
kernel related to load prediction. 
d) Establish the neural network model according to the decision rule set and determine the initial weights of 
the neural network. 
e) Train the neural network. 
f) If the fitting error of the neural network meets the requirement, stop. 
 

[Figure 1 (flowchart): the initial samples → continuous attribute discretization → form the real-valued 
decision table → attribute reduction → the best reduction set → neural network model → network training 
(training sample set, test sample set) → output analysis]

Figure 1: Neural network optimization process based on rough sets 

3.1 Determination of relevant factors 
The paper mainly considers the impact of changing climatic conditions on power load. The climatic conditions 
(total cloud cover cl, wind speed Ws, maximum temperature Tmax, minimum temperature Tmin, humidity Hu, air 
pressure and precipitation Rf) are fuzzified according to their features; the total number of condition 
attributes is then N. The membership functions, shown in Figure 3, give the division standard for each climatic 
condition. The predicted daily load LD is clustered into £ decision categories. In this paper £ = 5, and the 
categories indicate that the load is very low (1-VL), low (2-L), ordinary (3-NM), high (4-H) or very high 
(5-VH). 
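
The following sketch shows one way the fuzzification and labelling could look in code. The paper does not state the shape or break points of its membership functions, so the triangular sets and the temperature ranges below are assumptions made purely for illustration, and equal-width binning stands in for the clustering of daily load into the five decision categories.

```python
def triangular(x, left, peak, right):
    """Triangular membership value of x, between 0 and 1."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for maximum temperature Tmax, in degrees Celsius.
TMAX_SETS = {
    "low": (-10.0, 0.0, 15.0),
    "medium": (5.0, 17.5, 30.0),
    "high": (20.0, 35.0, 50.0),
}


def fuzzify(x, fuzzy_sets):
    """Return the label with the largest membership, plus all memberships."""
    memberships = {name: triangular(x, *abc) for name, abc in fuzzy_sets.items()}
    return max(memberships, key=memberships.get), memberships


LOAD_LABELS = ["1-VL", "2-L", "3-NM", "4-H", "5-VH"]


def load_category(load, lo, hi):
    """Equal-width binning of daily load into the five decision categories.

    This is a simple stand-in for the clustering step, not the paper's method.
    """
    width = (hi - lo) / len(LOAD_LABELS)
    k = int((load - lo) / width)
    return LOAD_LABELS[min(max(k, 0), len(LOAD_LABELS) - 1)]


label, memberships = fuzzify(25.0, TMAX_SETS)
print(label, memberships)
print(load_category(720.0, lo=500.0, hi=1000.0))
```

Each object in the decision table would then carry one symbolic label per climatic attribute and one load category as its decision value.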

3.2 Determination of fitness function 

The discernibility matrix $M_D(S)$ contains all the information needed to distinguish any two objects $x_i$ and 
$x_j$. Since any reduction set can replace the whole condition attribute set without changing the original 
dependence relation and resolving power, the reduction set should represent as much of the information in the 
discernibility matrix as possible. If attribute set $B$ is a reduction set of the condition attribute set $C$, 
the importance measure can be expressed as Formula (5): 

 
   

 

 

 

$$SGF(B, C, D) = \frac{\gamma(C, D) - \gamma(B, D)}{\gamma(C, D)} \tag{5}$$

 

Formula (5) indicates the degree of approximation of attribute set $B$ to $C$, and is called the approximation 
error of the reduction. If attribute set $R$ is the optimal reduction set of $C$, then the dependence degrees 
are equal, $\gamma(C, D) = \gamma(R, D)$, namely $SGF(R, C, D) = 0$. In that case the reduction set can no 
longer make the correct decision if any one of its elements is missing. In addition, a clear and simple 
decision rule should have few antecedent combinations, that is, the reduction set should contain as few 
condition attributes as possible. The fitness function is therefore constructed according to the above 
analysis, as shown in Formula (6). The first term rewards a reduction set that contains the fewest condition 
attributes, the second term rewards the largest coverage of the discernibility matrix by the reduction set, and 
the third term rewards the greatest degree of approximation of the reduction set's condition attributes to the 
full set of condition attributes. 

$$F = \frac{1}{L(R)} + a_1 \frac{\sum_{i=1}^{M} S_i}{kl} + a_2 \, sig(R, C, D) \tag{6}$$

 
Here, $L(R)$ is the number of condition attributes contained in the reduction set. $S_i$ is 0 or 1: it is 1 when 
the reduction set has a non-empty intersection with the $i$-th element of the discernibility matrix $M_D(S)$, 
and 0 otherwise. $kl$ is the number of elements in the discernibility matrix, $sig$ is the dependence degree of 
reduction set $R$ on condition attribute set $C$, and $a_1$ and $a_2$ are arbitrary non-negative weights. Since 
the goal is the simplest rule that covers the most knowledge, greater weight is given to the first two terms of 
the fitness function and smaller weight to the last term. 
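
To make the three terms tangible, here is a small sketch that assumes the fitness has the form F = 1/L(R) + a1 * (sum of S_i) / kl + a2 * sig, which is one reading of Formula (6) and of the term descriptions above. The function name, the default weights and the toy matrix entries are assumptions; `sig` is simply passed in as a number rather than computed.

```python
def fitness(reduct, entries, sig, a1=5.0, a2=0.1):
    """Assumed fitness: 1/L(R) + a1 * coverage + a2 * sig.

    reduct  : set of attribute names in the candidate reduction set R
    entries : non-empty elements of the discernibility matrix (sets of names)
    sig     : dependence degree term, supplied by the caller
    """
    if not reduct:
        return 0.0  # an empty attribute set cannot discern anything
    covered = sum(1 for m in entries if reduct & m)  # sum of the S_i
    return 1.0 / len(reduct) + a1 * covered / len(entries) + a2 * sig


# Hypothetical discernibility matrix entries for three climatic attributes.
entries = [
    frozenset({"Tmax"}),
    frozenset({"Tmax", "Rf"}),
    frozenset({"Tmax", "Hu"}),
    frozenset({"Hu", "Rf"}),
]

print(fitness({"Tmax"}, entries, sig=0.5))        # short, but misses one entry
print(fitness({"Tmax", "Hu"}, entries, sig=1.0))  # longer, covers every entry
```

With a small coverage weight the shortest candidate would score best even though it misses an entry of the matrix, which is one reason the weights need to be chosen so that coverage is not traded away for brevity.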

[Figure 2 (diagram): database → cleaning and integration → data warehouse → selection and transformation → 
particular data set → data mining → model → assessment and presentation → knowledge]

Figure 2: Schematic of the knowledge mining process 

3.3 Basic process of attribute reduction 

If an entry of the discernibility matrix contains exactly one attribute, that attribute is a kernel attribute 
of the decision table. It indicates that, apart from this attribute, the remaining condition attributes cannot 
distinguish two records with different decision values in the information table; the kernel attributes, alone 
or combined with other attributes, may constitute a reduction set. Therefore, when using the discernibility 
matrix to search for a reduction set, the kernel attributes are, for simplicity, treated as feature attributes 
of the data set and regarded as good genes from which new attribute combinations are constructed in the genetic 
algorithm. The remaining useful attributes are obtained by analysing the matrix entries that contain more than 
one attribute. The paper takes the attributes other than the kernel attributes as the gene pool, draws an 
attribute from it at random each time, adds it to the kernel attributes and judges whether the newly produced 
attribute combination meets the requirement. This reduces the search space of the genetic algorithm and 
improves computation efficiency. 
The computation steps of the attribute reduction algorithm are as follows: 
a) Compute the core CORE from the discernibility matrix, remove all attributes included in CORE from the 
original condition attributes and take the remaining attributes as candidate genes. 
b) Select attributes outside CORE at random and add them to CORE, generating an initial population of n 
individuals. Each generated attribute set is expressed as a binary string in which each bit represents a 
condition attribute: 1 means the attribute belongs to the reduction set, and 0 means it does not. 
c) Compute the fitness function of each individual according to Formula (6). Produce a new population through 
selection, crossover and mutation so that the fitness function gradually approaches its maximum. In order to 
obtain a simple reduction set, define two mutation probabilities pm1 and pm2, where pm1 is the probability of a 
bit changing from 1 to 0 and pm2 is the probability of a bit changing from 0 to 1. 
d) When the computation reaches the fixed number of generations, stop; otherwise return to c). 
Defining the mutation probabilities pm1 and pm2 in two opposite directions makes individuals change, with the 
greater probability pm1, towards a steadily decreasing number of attributes, which helps to find the minimum 
attribute combination. During the evolution, the optimal individual is retained. These individuals jointly 
constitute the optimal reduction set R, which has fewer attributes and better resolving power. The reduction 
set obtained is then used to optimize the neural network structure, so as to get a simple and transparent 
neural network. 
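
The steps above can be sketched as a small genetic algorithm. This is an illustrative outline under several assumptions, not the paper's implementation: the function name, tournament selection, one-point crossover, elitism, the parameter defaults and the toy matrix entries are all choices made here, and the fitness keeps only the first two terms of the fitness function (attribute count and coverage), leaving out the dependence degree term for brevity.

```python
import random


def ga_reduct(attrs, entries, generations=50, pop_size=20,
              pm1=0.2, pm2=0.05, a1=5.0, seed=0):
    """Search for a small attribute set that covers the discernibility matrix."""
    rng = random.Random(seed)
    # a) Core: attributes that appear as singleton entries; the rest are genes.
    core = {next(iter(m)) for m in entries if len(m) == 1}
    genes = [a for a in attrs if a not in core]

    def decode(bits):
        return core | {g for g, b in zip(genes, bits) if b}

    def fitness(bits):
        reduct = decode(bits)
        if not reduct:
            return 0.0
        covered = sum(1 for m in entries if reduct & m)
        return 1.0 / len(reduct) + a1 * covered / len(entries)

    def mutate(bits):
        # Asymmetric mutation: 1 -> 0 with probability pm1, 0 -> 1 with pm2.
        return [
            (0 if rng.random() < pm1 else 1) if b else (1 if rng.random() < pm2 else 0)
            for b in bits
        ]

    # b) Initial population: core plus randomly chosen non-core attributes.
    pop = [[rng.randint(0, 1) for _ in genes] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # d) stop after a fixed number of generations
        nxt = [best[:]]  # keep the best individual found so far
        while len(nxt) < pop_size:
            # c) tournament selection, one-point crossover, then mutation
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(0, len(genes))
            nxt.append(mutate(p1[:cut] + p2[cut:]))
        pop = nxt
        best = max(pop, key=fitness)
    return decode(best)


# Hypothetical discernibility matrix entries for three climatic attributes.
ATTRS = ["Tmax", "Hu", "Rf"]
ENTRIES = [
    frozenset({"Tmax", "Hu", "Rf"}),
    frozenset({"Tmax"}),
    frozenset({"Tmax", "Rf"}),
    frozenset({"Tmax", "Hu"}),
    frozenset({"Hu", "Rf"}),
]

print(sorted(ga_reduct(ATTRS, ENTRIES)))
```

Because the core attributes are fixed and only the non-core attributes are encoded as bits, the search space shrinks from all attribute subsets to the subsets of the gene pool, which is the efficiency gain the text describes.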

 

Figure 3: Membership function of each feature vector 

4. Conclusion 

Effective classification of load pattern types provides a better load classification model for power load 
prediction, and so improves the fitting performance of the neural network. However, a large number of factors 
affect short-term power load, and excessive input vectors not only increase the scale of the neural network 
model and reduce its training efficiency, but also degrade its fitting. How to use the information properly and 
effectively has become the key problem in improving load prediction precision. To this end, the paper studies 
the factors affecting load prediction and proposes a power load prediction algorithm based on rough set theory 
that focuses on climatic conditions. It first pre-processes the historical data and removes redundant 
attributes through attribute reduction, which reduces the number of neural network input vectors. It then 
simplifies the neural network model and improves its transparency by applying the rough set decision rules to 
the design of the network connections and initial connection weights. In this way, the paper achieves its aim 
of improving the prediction performance of the model and the efficiency and precision of load prediction. 
The research results indicate that the knowledge-based prediction model and the load classification model 
obtained through data mining are able to describe the complicated relations in the load prediction model and 
track changes in load pattern in time, so as to improve short-term power load prediction. 




Acknowledgment  

This work was supported by the Nature Science Foundation of Anhui Province Education Department (No. 
KJ2016A251), the Nature Science Foundation of Anhui Sanlian University (No.2014Z020), Quality Project of 
Anhui Province Education Department (No. 2014jxtd046) and the revitalization plan project of higher 
education of Anhui Province Grant (No. 2013zytz082). 

References 

Chiu C.C., Kao L.J., 1997, Combining a neural network with a rule-based expert system approach for short-term power load forecasting in Taiwan, Expert Systems with Applications, 13, 299-30, DOI: 10.1016/S0957-4174(97)00048-1. 

Chen Q.S., Huang W., 2014, Short-term power load forecasting with least squares support vector machines and wavelet transform, Applied Mechanics and Materials, 494, 1647-1650, DOI: 10.4028/www.scientific.net/AMM.494-495.1647. 

Huang H.C., Hwang R.C., Hsieh J.G., 2002, Short-term power load forecasting by non-fixed neural network model with fuzzy BP learning algorithm, International Journal of Power and Energy Systems, 22, 50-57, DOI: 10.1007/978-3-319-09330-7_47. 

Jia Z.Y., Tian L., 2008, Short-term power load forecasting based on fuzzy-RBF neutral network, Proceedings of International Conference on Risk Management and Engineering Management, 6, 349-352, DOI: 10.1109/ICRMEM.2008.41. 

Li L.J., Huang W., 2014, A short-term power load forecasting method based on BP neural network, Applied Mechanics and Materials, 4945, 1647-1650, DOI: 10.4028/www.scientific.net/AMM.494-495.1647. 

Paarmann L.D., Najar M.D., 1994, Short-term power load forecasting based on autocorrelation function optimization, Midwest Symposium on Circuits and Systems, 2, 1475-1478, DOI: 10.1016/j.epsr.2016.04.003. 

Wu J., Niu D.X., 2009, Short-term power load forecasting using Least Squares Support Vector Machines (LS-SVM), 2nd International Workshop on Computer Science and Engineering, WCSE 2009, 1, 246-250, DOI: 10.1109/WCSE.2009.663. 
