CHEMICAL ENGINEERING TRANSACTIONS, VOL. 51, 2016
A publication of The Italian Association of Chemical Engineering. Online at www.aidic.it/cet
Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian
Copyright © 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-43-3; ISSN 2283-9216

Application Research on Data Mining and Artificial Intelligence Theory in Short-Term Power Load Forecasting

Honghai Wang
School of Electronic and Electrical Engineering, Anhui Sanlian University, Hefei, 230601, China
sanlian_whh@163.com

Data mining technology provides an effective tool for processing uncertain, noisy and implicit information. Rough set theory, a typical data mining method, is an effective tool for analysing and generalising from imprecise data, mining relations among data and discovering potential knowledge. This paper establishes a short-term load prediction model based on rough set theory. It uses rough sets to carry out attribute reduction on the historical data related to load, removes the attributes that are irrelevant to the decision information and simplifies the input variables, so as to shrink the search space of the neural network and improve prediction performance.

1. Introduction

The way short-term load changes over time is complicated. It is difficult to describe with an accurate mathematical model, and short-term load prediction involves many uncertain factors, such as the various kinds of information related to actual load (climate, humidity, precipitation, special events, etc.) (Wu and Niu, 2009). The relation between these factors and power load is non-linear. In earlier load prediction practice, these relations were supplied by experienced dispatch personnel, but such knowledge is imprecise for load prediction, and in the current open power market it is difficult to judge the various random fluctuations by experience alone (Chiu and Kao, 1997).
In addition, the artificial intelligence forecasting methods widely applied at present still need a large amount of statistical information and prior knowledge. However, this information may become deficient or incomplete as the operating environment changes. Moreover, in short-term power load prediction many factors affect the load, and they do so to different degrees. Not all of these conditions are necessary. Some of them are correlated and some are independent, so only some, or even a few, of the conditions are needed to infer the conclusion. A new intelligent method is therefore needed that analyses historical data and mines the relevant information from it automatically, so as to enhance the precision and efficiency of load prediction (Paarmann and Najar, 1994).

Data mining technology provides an effective tool for processing uncertain, noisy and implicit information. Rough set theory (Jia and Tian, 2008), a typical data mining method, is an effective tool for analysing and generalising from imprecise data, mining relations among data and discovering potential knowledge. This paper establishes a short-term load prediction model based on rough set theory. It uses rough sets to carry out attribute reduction on the historical data related to load, removes the attributes that are irrelevant to the decision information and simplifies the input variables, so as to shrink the search space of the neural network and improve prediction performance (Chen and Huang, 2014).

2. Basis of rough set theory

Rough set theory was proposed by the Polish scholar Z. Pawlak in 1982. It provides a new mathematical tool for processing imprecise and incomplete information. Rough set theory is built on a classification mechanism. It interprets classification as an equivalence relation on a specific space, and the equivalence relation forms a partition of that space.
The theory construes knowledge as a partition of the data, and each block of the partition is called a concept (Li and Huang, 2014). The main idea of rough set theory is, on the premise of keeping the classification capacity of the information system unchanged, to describe imprecise or uncertain knowledge in terms of the knowledge already in the knowledge base, and to derive decision or classification rules for the problem through knowledge supplement and reduction. The most prominent difference between rough set theory and other theories that handle uncertain and imprecise problems is that rough set theory needs no prior information other than the data set to be processed, so its description and treatment of uncertainty is objective. Moreover, the theory does not include a mechanism for processing imprecise or uncertain original data (Huang et al., 2002), so it is strongly complementary to other theories that handle imprecise or uncertain problems, such as probability theory, fuzzy mathematics and evidence theory.

DOI: 10.3303/CET1651070. Please cite this article as: Wang H.H., 2016, Application research on data mining and artificial intelligence theory in short-term power load forecasting, Chemical Engineering Transactions, 51, 415-420.

2.1 Information system

The information system is the object that rough set theory works on. It is a data set, usually expressed as a data table. Each row of the table represents an object, which can be a case, an event, etc., while each column is an attribute of the objects, which can be a feature, a measurement, etc. An information system can be formally expressed as $S = \langle U, Q, V, f \rangle$, where $U = \{x_1, x_2, \dots, x_n\}$ is a finite non-empty set of objects, usually called the universe of discourse, and $Q$ is a finite non-empty set of attributes.
If the attribute set of information system $S$ is $Q = A \cup D$, where $A$ is the condition attribute set and $D$ is the decision attribute set, the information system is also called a decision system (or decision table). $V = \bigcup_{q \in Q} V_q$ is the union of the value domains $V_q$ of all attributes $q$, and $f$ is the information function specifying the attribute value of each object, namely $f: U \times Q \to V$.

2.2 Relative reduction of attribute

For attribute sets $A, D \subseteq Q$, the element of the relative discernibility matrix is:

$$ m_{ij} = \{ a \in A : f(x_i, a) \neq f(x_j, a) \} \tag{1} $$

The relative discernibility function is:

$$ f_D = \bigwedge \{ \bigvee m_{ij} : 1 \le j < i \le n,\ m_{ij} \neq \emptyset \} \tag{2} $$

When $a_{i1} \wedge a_{i2} \wedge \dots \wedge a_{ip}$ is a prime implicant of the discernibility function $f_D$, the attribute set $\{a_{i1}, \dots, a_{ip}\}$ is a relative reduction with respect to $D$. Relative reductions can therefore be found by finding the prime implicants of $f_D$.

2.3 Dependence degree and importance of attribute

If all attribute values of attribute set $D$ are totally determined by the attribute values of attribute set $A$, this is expressed as $A \Rightarrow D$. The dependence degree of $D$ on $A$ is expressed as $\gamma(A, D)$:

$$ \gamma(A, D) = \mathrm{card}(POS_A(D)) / \mathrm{card}(U) \tag{3} $$

According to the definition of dependence degree, the importance degree of an attribute $a \in A$ with respect to the equivalence classes of $U/IND(D)$ is defined as:

$$ SGF(a, A, D) = \frac{\gamma(A, D) - \gamma(A - \{a\}, D)}{\gamma(A, D)} \tag{4} $$

3. Load forecasting model based on rough set theory

An accurate load prediction model should be able to describe directly the various factors related to load. From the basic concepts of rough set theory, rough sets can reduce and remove unnecessary information, which benefits information classification and reduction. To find the conditions directly related to load and to improve prediction accuracy and computation speed, this paper uses rough set theory to reduce the attributes of the various factors affecting load, finds the necessary conditions directly related to power load, takes them as the input vector of a fuzzy neural network and applies the decision rules established from the optimal reduction set to the design of the neural network structure.
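Before turning to the modelling steps, the dependence degree of Formula (3) and the importance degree of Formula (4) in Section 2.3 can be illustrated with a minimal sketch. The toy decision table, the attribute names and the function names below are assumptions made for illustration only; they do not come from the paper's data.

```python
from collections import defaultdict

# Toy decision table (hypothetical values): three condition attributes
# and one decision attribute "load".
TABLE = [
    {"temp": "high", "humidity": "high", "wind": "low",  "load": "H"},
    {"temp": "high", "humidity": "low",  "wind": "low",  "load": "H"},
    {"temp": "low",  "humidity": "high", "wind": "low",  "load": "L"},
    {"temp": "low",  "humidity": "low",  "wind": "high", "load": "L"},
    {"temp": "mid",  "humidity": "high", "wind": "high", "load": "H"},
    {"temp": "mid",  "humidity": "low",  "wind": "high", "load": "L"},
]
CONDITIONS = ["temp", "humidity", "wind"]
DECISION = ["load"]


def partition(table, attrs):
    """Equivalence classes of IND(attrs), returned as sets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def dependency(table, cond, dec):
    """Dependence degree: card(POS_cond(dec)) / card(U), as in Formula (3)."""
    dec_blocks = partition(table, dec)
    positive = set()
    for block in partition(table, cond):
        # A condition block is in the positive region when it lies
        # entirely inside one decision class.
        if any(block <= d for d in dec_blocks):
            positive |= block
    return len(positive) / len(table)


def significance(table, attr, cond, dec):
    """Importance degree of attr: relative drop in dependence, Formula (4)."""
    full = dependency(table, cond, dec)
    reduced = dependency(table, [c for c in cond if c != attr], dec)
    return (full - reduced) / full if full else 0.0


gamma_all = dependency(TABLE, CONDITIONS, DECISION)
scores = {a: significance(TABLE, a, CONDITIONS, DECISION) for a in CONDITIONS}
```

On this toy table the attribute `wind` has zero importance and could be dropped, while removing `temp` or `humidity` lowers the dependence degree from 1 to 2/3.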
As shown in Figure 1, the steps for building the neural network load prediction model based on rough set theory are as follows:
a) Establish the initial information table according to historical load data, relevant information and historical data.
b) Discretize the original data and establish the real-valued decision table.
c) Carry out attribute reduction on the decision table to obtain the optimal condition attribute set and the core related to load prediction.
d) Establish the neural network model according to the decision rule set and determine the initial weights of the neural network.
e) Train the neural network.
f) If the fitting error of the neural network meets the requirement, stop.

Figure 1: Neural network optimization process based on rough sets (flowchart stages: initial samples; continuous attribute discretization; formation of the real-valued decision table; attribute reduction; best reduction set; neural network model; network training; output analysis; with a training sample set and a test sample set)

3.1 Determination of relevant factors

This paper mainly considers the impact of changing climatic conditions on power load and fuzzifies the climatic conditions (total cloud cover Cl, wind speed Ws, maximum temperature Tmax, minimum temperature Tmin, humidity Hu, air pressure and precipitation Rf) according to their features. The total number of condition attributes is N. The membership functions, which give the division standard for each climatic condition, are shown in Figure 3. The predicted daily load LD is clustered into £ decision categories. In this paper £ = 5, indicating that the load is very low (1-VL), low (2-L), normal (3-NM), high (4-H) or very high (5-VH).
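To make the five decision categories concrete, the following is a minimal sketch of turning a continuous daily load into one of the labels 1-VL to 5-VH. It uses plain equal-frequency (quantile) cut points, which is a stand-in assumption and not the fuzzy membership and clustering procedure the paper describes; the load values and function names are hypothetical.

```python
LABELS = ["1-VL", "2-L", "3-NM", "4-H", "5-VH"]


def quantile_cuts(values, k=5):
    """Return k-1 cut points that split the sorted values into k groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def load_class(value, cuts):
    """Map a load value to a label by counting the cut points it reaches."""
    index = sum(1 for c in cuts if value >= c)
    return LABELS[index]


# Hypothetical daily loads in MW, for illustration only.
daily_load = [310, 295, 420, 505, 388, 610, 452, 340, 575, 530]
cuts = quantile_cuts(daily_load)
classes = [load_class(v, cuts) for v in daily_load]
```

The same two functions could be applied to each continuous climatic attribute to build a discrete decision table, although the paper's own division standard comes from its membership functions.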
3.2 Determination of fitness function

The discernibility matrix $M_D(S)$ contains all the information that distinguishes $x_i$ from $x_j$, and a reduction set can substitute for the whole condition attribute set without changing the original dependence relation and discerning power, so the reduction set should represent as much of the information in the discernibility matrix as possible. If attribute set $B$ is a reduction set of attribute set $C$, Formula (5) can be written as:

$$ SGF(B, C, D) = \frac{\gamma(C, D) - \gamma(B, D)}{\gamma(C, D)} \tag{5} $$

It indicates the degree to which attribute set $B$ approximates $C$ and is called the approximation error of the reduction. If attribute set $R$ is the optimal reduction set of $C$, then $\gamma(C, D) = \gamma(R, D)$, namely $SGF(R, C, D) = 0$. In this case the reduction set can no longer make correct decisions if any one of its elements is removed. In addition, a clear and simple decision rule should have few antecedents, i.e. the number of condition attributes in the reduction set should be as small as possible. The fitness function is therefore constructed as shown in Formula (6). The first term expresses that the number of condition attributes in the reduction set should be smallest, the second that the coverage of the discernibility matrix by the reduction set should be largest, and the third that the degree to which the reduction set approximates the full condition attribute set should be greatest.

$$ F = \frac{1}{L(R)} + a_1 \frac{\sum_{i=1}^{M} s_i}{kl} + a_2 \, sig(R, C, D) \tag{6} $$

Here $L(R)$ is the number of condition attributes in the reduction set; $s_i$ is 0 or 1: it is 1 when the reduction set intersects the corresponding element of the discernibility matrix $M_D(S)$ and 0 otherwise; $kl$ is the number of elements in the discernibility matrix; $sig$ is the dependence degree of reduction set $R$ on condition attribute set $C$; and $a_1$ and $a_2$ are arbitrary non-negative weights.
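The three terms of Formula (6) might be evaluated as in the minimal sketch below. Several points are assumptions rather than details given in the paper: the toy table and attribute names are hypothetical, discernibility entries are built only for object pairs whose decisions differ (the usual decision-relative convention), the value of sig is computed separately and passed in, and the weights a1 = 5.0 and a2 = 0.5 are arbitrary illustrative choices.

```python
# Toy condition attributes and decisions (hypothetical values).
ROWS = [
    {"temp": "high", "humidity": "high", "wind": "low"},
    {"temp": "high", "humidity": "low",  "wind": "low"},
    {"temp": "low",  "humidity": "high", "wind": "low"},
    {"temp": "low",  "humidity": "low",  "wind": "high"},
    {"temp": "mid",  "humidity": "high", "wind": "high"},
    {"temp": "mid",  "humidity": "low",  "wind": "high"},
]
LOAD = ["H", "H", "L", "L", "H", "L"]


def discernibility_entries(rows, decisions):
    """Non-empty entries m_ij (j < i) for pairs with different decisions."""
    entries = []
    for i in range(len(rows)):
        for j in range(i):
            if decisions[i] != decisions[j]:
                diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
                if diff:
                    entries.append(diff)
    return entries


def fitness(reduct, entries, sig, a1=5.0, a2=0.5):
    """F = 1/L(R) + a1 * sum(s_i)/kl + a2 * sig, following Formula (6)."""
    if not reduct:
        return 0.0  # an empty attribute set cannot be a reduction set
    covered = sum(1 for m in entries if m & reduct)  # s_i = 1 on intersection
    return 1.0 / len(reduct) + a1 * covered / len(entries) + a2 * sig


entries = discernibility_entries(ROWS, LOAD)
# sig values are assumed to be supplied by a separate dependence calculation.
f_pair = fitness({"temp", "humidity"}, entries, sig=1.0)
f_single = fitness({"temp"}, entries, sig=2.0 / 3.0)
f_all = fitness({"temp", "humidity", "wind"}, entries, sig=1.0)
```

With these weights the two-attribute set scores highest of the three candidates: it covers every entry of the matrix with fewer attributes than the full set, whereas the single attribute misses one entry.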
We want the simplest rule that covers the most knowledge, so the first two terms of the fitness function are given greater weight and the last term a smaller weight.

Figure 2: Schematic of the knowledge mining process (stages: cleaning and integration; selection and transformation; data mining; assessment and presentation; data objects: database, data warehouse, a particular data set, model, knowledge)

3.3 Basic process of attribute reduction

If an attribute combination in the discernibility matrix contains exactly one attribute, that attribute is a core attribute of the decision table. It indicates that, apart from this attribute, the remaining condition attributes cannot distinguish two records in the information table that have different decision types; the core attributes alone, or their combination with other attributes, may constitute a reduction set. Therefore, when the discernibility matrix is used to search for a reduction set, the core attributes are taken as feature attributes of the data set and treated as good genes for constructing new attribute combinations in the genetic algorithm. The remaining useful attributes are obtained by analysing the matrix elements whose attribute combinations contain more than one attribute. This paper treats the attributes other than the core attributes as the gene pool, draws an attribute from it at random each time, adds it to the core attributes and judges whether the new attribute combination meets the requirement. This reduces the search space of the genetic algorithm and improves computational efficiency. The steps of the attribute reduction algorithm are as follows:
a) Compute the core CORE from the discernibility matrix, remove all attributes contained in CORE from the original condition attributes and take the remaining attributes as candidate genes.
b) Select attributes outside CORE at random and add them to CORE, generating an initial population of n individuals. Each attribute set generated is expressed as a binary string in which each bit represents a condition attribute: 1 means the attribute belongs to the reduction set and 0 means it does not.
c) Compute the fitness of each individual according to Formula (6).
d) Produce a new population through selection, crossover and mutation so that the fitness gradually approaches its maximum. To obtain a simple reduction set, define two mutation probabilities pm1 and pm2, where pm1 is the probability of a bit changing from 1 to 0 and pm2 is the probability of a bit changing from 0 to 1.
e) When the computation reaches the fixed number of generations, stop; otherwise go to c).

Here the two mutation probabilities pm1 and pm2 act in opposite directions, so that with the greater probability pm1 individuals move toward ever smaller attribute combinations, which helps find the minimum attribute combination. During the evolution the best individual is retained. These individuals jointly constitute the optimal rule set R, which has fewer attributes and better discerning power. The reduction set obtained is subsequently used to optimise the neural network structure, so as to obtain a simple and transparent neural network.
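The genetic search over attribute subsets described in Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: binary tournament selection and one-point crossover are assumed because the paper does not name its operators, the population size, generation count and mutation rates are arbitrary, and the toy discernibility entries and fitness (the first two terms of the fitness function only) are hypothetical.

```python
import random


def mutate(bits, core, pm1, pm2, rng):
    """Asymmetric mutation: 1 -> 0 with probability pm1, 0 -> 1 with pm2."""
    child = []
    for k, bit in enumerate(bits):
        if k in core:
            child.append(1)  # core attributes are never dropped
        elif bit == 1:
            child.append(0 if rng.random() < pm1 else 1)
        else:
            child.append(1 if rng.random() < pm2 else 0)
    return child


def crossover(p, q, rng):
    """One-point crossover; core bits are 1 in both parents, so they survive."""
    cut = rng.randrange(1, len(p))
    return p[:cut] + q[cut:]


def tournament(population, fitness, rng):
    """Binary tournament selection (an assumed selection scheme)."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_reduce(n_attrs, core, fitness, pop_size=20, generations=50,
              pm1=0.10, pm2=0.02, seed=0):
    rng = random.Random(seed)
    # Initial population: CORE plus randomly chosen non-core attributes.
    population = [[1 if k in core or rng.random() < 0.5 else 0
                   for k in range(n_attrs)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [mutate(crossover(tournament(population, fitness, rng),
                                     tournament(population, fitness, rng), rng),
                           core, pm1, pm2, rng)
                    for _ in range(pop_size - 1)]
        population = children + [best]  # retain the best individual
        best = max(population, key=fitness)
    return best


# Toy discernibility-matrix entries over 5 attributes (hypothetical).
ENTRIES = [{0}, {1, 2}, {2, 3}, {1, 3, 4}, {0, 4}]
CORE = {0}  # attribute 0 appears alone in an entry, so it is a core attribute


def toy_fitness(bits, a1=5.0):
    """1/L(R) plus weighted coverage of the discernibility entries."""
    chosen = {k for k, b in enumerate(bits) if b}
    covered = sum(1 for m in ENTRIES if m & chosen)
    return 1.0 / len(chosen) + a1 * covered / len(ENTRIES)


best = ga_reduce(5, CORE, toy_fitness)
```

Choosing pm1 larger than pm2 biases mutation toward removing attributes, which matches the stated aim of driving individuals toward smaller attribute combinations, while fixing the core bits keeps the search inside the space of candidate reduction sets.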
Figure 3: Membership function of each feature vector

4. Conclusion

Effective classification of load pattern types provides a better load classification model for power load prediction and so improves the fitting performance of the neural network. However, a large number of factors affect short-term power load, and too many input vectors not only enlarge the neural network model and reduce its training efficiency but also degrade its fitting. How to use the information properly and effectively has become the key problem in improving load prediction precision. To this end, the paper studies the factors affecting load prediction and proposes a power load prediction algorithm based on rough set theory that is specific to climatic conditions. It starts with pre-processing of the historical data, removing redundant attributes and reducing the number of neural network input vectors through attribute reduction; then, to simplify the neural network model and improve its transparency, it applies rough set decision rules to the design of the neural network connections and initial connection weights. In this way the paper improves the prediction performance of the model and the efficiency and precision of load prediction. The research result indicates that the knowledge-based prediction model and the load classification model obtained through data mining are able to describe the complicated relations in the load prediction model and track changes in load pattern in time, so as to improve short-term power load prediction.
Acknowledgment

This work was supported by the Nature Science Foundation of Anhui Province Education Department (No. KJ2016A251), the Nature Science Foundation of Anhui Sanlian University (No. 2014Z020), the Quality Project of Anhui Province Education Department (No. 2014jxtd046) and the revitalization plan project of higher education of Anhui Province Grant (No. 2013zytz082).

References

Chiu C.C., Kao L.J., 1997, Combining a neural network with a rule-based expert system approach for short-term power load forecasting in Taiwan, Expert Systems with Applications, 13, 299-30, DOI: 10.1016/S0957-4174(97)00048-1.
Chen Q.S., Huang W., 2014, Short-term power load forecasting with least squares support vector machines and wavelet transform, Applied Mechanics and Materials, 494, 1647-1650, DOI: 10.4028/www.scientific.net/AMM.494-495.1647.
Huang H.C., Hwang R.C., Hsieh J.G., 2002, Short-term power load forecasting by non-fixed neural network model with fuzzy BP learning algorithm, International Journal of Power and Energy Systems, 22, 50-57, DOI: 10.1007/978-3-319-09330-7_47.
Jia Z.Y., Tian L., 2008, Short-term power load forecasting based on fuzzy-RBF neutral network, Proceedings of International Conference on Risk Management and Engineering Management, 6, 349-352, DOI: 10.1109/ICRMEM.2008.41.
Li L.J., Huang W., 2014, A short-term power load forecasting method based on BP neural network, Applied Mechanics and Materials, 4945, 1647-1650, DOI: 10.4028/www.scientific.net/AMM.494-495.1647.
Paarmann L.D., Najar M.D., 1994, Short-term power load forecasting based on autocorrelation function optimization, Midwest Symposium on Circuits and Systems, 2, 1475-1478, DOI: 10.1016/j.epsr.2016.04.003.
Wu J., Niu D.X., 2009, Short-term power load forecasting using Least Squares Support Vector Machines (LS-SVM), 2nd International Workshop on Computer Science and Engineering, WCSE 2009, 1, 246-250, DOI: 10.1109/WCSE.2009.663.