HUNGARIAN JOURNAL OF INDUSTRIAL CHEMISTRY VESZPRÉM Vol. 33(1-2). pp. 57-67. (2005)

FUZZY ASSOCIATION RULE MINING FOR DATA DRIVEN ANALYSIS OF DYNAMICAL SYSTEMS

F. P. PACH, F. SZEIFERT, S. NEMETH, P. ARVA and J. ABONYI*

Department of Process Engineering, University of Veszprém, Veszprém, Egyetem u. 10, H-8200, HUNGARY, www.fmt.vein.hu/softcomp, abonyij@fmt.vein.hu

In system identification a key step is to find a suitable model structure. The utilization of prior knowledge and physical insight about the system is very important when selecting the model structure. In nonlinear black-box modeling, however, no physical insight is available: we have "only" observed inputs and outputs from the dynamical system. Association rule mining is one of the widely used data mining tools; it finds interesting association or correlation relationships among the items of a large data set. The aim of this paper is to demonstrate that this data mining tool can be effectively applied to the data-driven modeling and analysis of dynamical systems. The detected association rules can be interpreted as simple local input-output models of the modeled process. Hence, the analysis of the mined association rules (models) can provide useful information about the structure and the order of the model that can adequately describe the dynamical behavior of the process. In this paper a fuzzy association rule mining algorithm is introduced and a rule-base simplification algorithm is presented for the generation of a set of "rule-based models" that can be directly used as a qualitative model of the system. The general applicability of the developed tool is illustrated by the analysis of the input-output data of a continuously stirred styrene polymerization reactor. The detected association rules are used to select the structure of a linear and a nonlinear (neural network) model of this process and to determine the most relevant process variables.

Keywords: process modeling, model structure selection, association rules, rule base systems, polymerization

Introduction

In process modeling, a priori knowledge, experimental data and experiments are crucial. The process of modeling from experimental data is known as system identification (Ljung [1]). The main steps of the system identification process are summarized well by Petrick and Wigdorowitz [2]:

1. Design an experiment to obtain the physical process input/output experimental data sets pertinent to the model application.
2. Examine the measured data. Remove trends and outliers. Apply filtering to remove measurement and process noise.
3. Construct a set of candidate models based on information from the experimental data sets. This step is the model structure identification.
4. Select a particular model from the set of candidate models in step 3 and estimate the model parameter values using the experimental data sets.
5. Evaluate how good the model is, using an objective function. If the model is not satisfactory then repeat step 4 until all the candidate models have been evaluated.
6. If a satisfactory model is still not obtained in step 5 then repeat the procedure either from step 1 or step 3, depending on the problem.

A key step (step 3) is to find a suitable model structure which is capable of representing the dynamical behavior of the system. Therefore effective methods for structure selection are necessary. Consider the main aspects influencing the choice of a model structure:
- What type of model is needed: nonlinear or linear, static or dynamic, distributed or lumped?
*Correspondence concerning this article should be addressed to J. Abonyi (abonyij@fmt.vein.hu)

- How large must the model set be? This question includes the issue of expected model orders and types of nonlinearities.
- How must the model be parameterized? This involves selecting a criterion to enable measuring the closeness of the model dynamic behavior to the physical process dynamic behavior as model parameters are varied.

A large number of model structure selection methods have been introduced. For linear models, correlation analysis and multivariate structure selection techniques [3] such as principal component analysis (PCA) have been proposed. Several information-theoretic criteria have also been proposed for the structure selection of linear dynamic input-output models. These methods are based on the minimization of a criterion function which involves the estimate of the one-step-prediction error plus some penalty function. The classical criteria are the Final Prediction Error (FPE), the Akaike Information Criterion (AIC) [4], the Minimum Description Length (MDL) criterion [5], the Schwarz criterion (BIC) [6] and the Hannan-Quinn criterion (HIC) [7]. They differ only in the employed penalty function; in [8], however, a new criterion function is introduced based on the decomposition of the variance of the innovations of the model in terms of their frequency components. The information criteria have been used in the context of regression models [9, 10], in distributed lag regression models [11], and in the selection of the order of autoregressive and autoregressive moving average models [12, 13, 14]. The effects of the model selection problem are studied in [15], and the variable selection problem in [16].

Determining the structure of linear systems is a rather straightforward task with these tools, but for nonlinear systems other structure selection methods are needed. Aguirre and Billings [17] defined the concepts of term clusters and cluster coefficients and used them in the context of system identification. This approach is used for the structure selection of polynomial models in the paper of Aguirre and Mendes [18]. In [19] an alternative solution is introduced: a forward search through the many possible candidate model terms is conducted first, followed by an exhaustive all-subset model selection on the resulting model. A backward search approach based on orthogonal parameter estimation has also been applied to structure selection [20, 21]. The paper [22] discusses several model structure selection methods and nonlinear input-output models that are suitable for implementation as feed-forward neural networks. A systematic method for the selection of model order and time delay is presented in [23]; the method is applied to the neural network modeling of a multivariable chemical process rig. A deterministic suitability measure is introduced in [24] that quantifies the capability of a particular model class to capture the control-relevant I/O behavior of a nonlinear system. This suitability measure can be used for the purpose of model structure selection prior to the actual parameter identification. The fast bootstrap (FB) methodology to select the best model structure is presented in [25]; the methodology is applied to a regression task. In [26] a methodology for model structure selection based on a genetic algorithm was introduced and applied to nonlinear discrete-time dynamic systems. A modified genetic programming approach for model structure selection is introduced in [27].
It is combined with a classical technique for parameter estimation. Hong and Harris [28] introduced a learning algorithm for model subset selection based on a new composite cost function that simultaneously optimizes the model approximation ability and model adequacy. In [29] a cost functional is evaluated for each identified model and the model with minimum cost is preferred; suboptimal search strategies are adopted, and forward and stepwise strategies are considered.

In this paper we introduce a new data-driven structure selection method. The new method is based on fuzzy association rule mining and is called MOSSFARM (Model Structure Selection by Fuzzy Association Rule Mining). Association rule mining finds interesting association or correlation relationships among the items of a large data set. The problem of mining association rules was introduced over supermarket basket data in [30]; it helps to learn more about the buying habits of customers and provides answers to market questions. But market basket analysis is just one application of association rule mining; this paper presents a new application area, model structure selection.

This paper is organized as follows. The first section introduces the system identification problem in nonlinear black-box modeling. Association rule mining theory is presented in the second section. In the third section fuzzy association rule mining is detailed. Our new method based on fuzzy association rule mining is introduced in the fourth section. The last, fifth section illustrates how the MOSSFARM method selects the most important model structures of a linear (least squares) and a nonlinear (neural network) model of a styrene polymerization CSTR.

System Identification in Nonlinear Black Box Modeling

To be successful, the entire modeling process should be given as much information about the system as is practical. The utilization of prior knowledge and physical insight about the system is very important, but in the nonlinear black-box situation no physical insight is available; we have "only" observed inputs and outputs from the system. This paper concentrates on the structure selection task in the case of black-box modeling.

In a system identification problem in the case of black-box modeling [31] we have only input and output data from the process (system),

u = [u_1, u_2, ..., u_k]    (1)

y = [y_1, y_2, ..., y_k]    (2)

We are looking for a relationship between past observations [u^{k-1}, y^{k-1}] and future outputs,

y_k = f(u^{k-1}, y^{k-1}) + e_k    (3)

where e_k represents an error value, because y_k will not be an exact function of past data. However, a goal must be that e_k is small, so that we may think of f(.) as a good predictor of y_k.

Eq. 3 models general discrete-time dynamic systems, but nonlinear static processes can also be represented by the following regression model:

y_k = f(x_k)    (4)

where f(.) is a nonlinear function, x_k represents its input vector and k = 1, ..., N indexes the input-output data. The regression vector of a NARX (Nonlinear AutoRegressive model with eXogenous inputs) model contains the past values of the process outputs y_k and the process inputs u_k as regressors:

x_k = [y_{k-1}, y_{k-2}, ..., y_{k-m}, u_{k-1}, u_{k-2}, ..., u_{k-n}]^T    (5)

where m determines the number of past outputs and n the number of past inputs (the model order). The output of the regression model is the one-step-ahead prediction of the process. This SISO form of the NARX model can be extended to the MIMO case.
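As an illustration of Eq. 5, the following sketch (plain NumPy; the function name and the toy data are ours, not part of the paper) builds the regressor matrix and the one-step-ahead targets from SISO input-output records:

```python
import numpy as np

def narx_regressors(u, y, m, n):
    """Build the NARX regressor matrix of Eq. 5 from SISO I/O data.

    Each row k is x_k = [y_{k-1},...,y_{k-m}, u_{k-1},...,u_{k-n}],
    and the corresponding one-step-ahead target is y_k.
    """
    start = max(m, n)                       # first index with a full history
    rows, targets = [], []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, m + 1)]
        past_u = [u[k - i] for i in range(1, n + 1)]
        rows.append(past_y + past_u)
        targets.append(y[k])
    return np.array(rows), np.array(targets)

# Example: second-order model (m = n = 2) from synthetic data
u = np.random.rand(100)
y = np.convolve(u, [0.5, 0.3], mode="full")[:100]   # toy linear response
X, t = narx_regressors(u, y, m=2, n=2)
print(X.shape, t.shape)                             # (98, 4) (98,)
```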
Association Rule Mining

One of the widely used research tasks in data mining is the discovery of frequent item sets and association rules. The problem originates in market basket analysis, which aims at understanding the behavior of retail customers, or in other words, finding associations among the items purchased together. A famous example of an association rule in such a database is "Diapers => Beer", i.e. young fathers sent off to the store to buy diapers reward themselves for their trouble. Because of the practical usefulness of association rule discovery, the approach can be applied in various research areas.

How can we search for association rules? Association rule mining is based on frequent item set searching. An item could be, for example, one product in the supermarket example, e.g. {beer}, and an item set is a set of items (products), e.g. {milk, beer, diapers}. The relative frequency of occurrence of an item (item set) in a data set is called its support. The support value of an item (item set) can be seen as a probability value: it gives the percentage of transactions that contain the item (all the items of the item set together). Let X be an item set; the support value of X is calculated as follows:

supp(X) = P(X) = (# of transactions with X) / (# of transactions)    (6)

An item x (or item set X) is called frequent if its support is higher than a given (user-defined) threshold, the minimal support (σ). Table 1 shows an example supermarket transaction data set, where each row represents a transaction. The first column contains the transaction number (Tid, transaction identifier) and the second column lists the products purchased in the transaction.

Table 1 Example transaction data set

Tid | Items
1   | Bread, Milk
2   | Beer, Bread, Diaper, Eggs
3   | Beer, Coke, Diaper, Milk
4   | Beer, Bread, Diaper, Milk

Frequent item set searching is a very easy task for this example data set, because the number of transactions is only four. If the minimum support is equal to 50 percent (σ = two occurrences), we can find all the frequent items and item sets, e.g. {Milk} is a frequent item (with 75% support) and {Diaper, Beer} is a frequent item set (with 75% support). But if we have a large data set (database) with many transactions and several items, frequent item set searching demands an efficient algorithm. A widely used frequent item set searching algorithm is the Apriori algorithm (introduced in [30]). The name of the algorithm reflects the fact that Apriori uses prior knowledge of the frequent item sets already determined. It is an iterative, breadth-first search algorithm, based on generating stepwise longer candidate item sets and cleverly pruning non-frequent item sets. Pruning takes advantage of the so-called apriori (or upward closure) property of frequent item sets: all subsets of a frequent item set must also be frequent. Each candidate generation step is followed by a counting step in which the supports of the candidates are checked and non-frequent ones are deleted. Generation and counting alternate until at some step all generated candidates turn out to be non-frequent.

Once all the frequent item sets have been found by the Apriori algorithm, association rules can be generated from them. An association rule has two parts, the rule antecedent (denoted by X) and the rule consequent (denoted by Y), and both of them contain items. Therefore an association rule is represented in the form X => Y.
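A minimal sketch of the Apriori generate-and-count loop over the transactions of Table 1 (plain Python with a simplified join step; this is an illustration of ours, not the paper's implementation):

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Beer", "Bread", "Diaper", "Eggs"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
]

def support(itemset):
    """Eq. 6: fraction of transactions containing all items of the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(min_support=0.5):
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s) >= min_support]
    frequent = []
    while level:
        frequent += level
        # simplified join step: merge k-sets whose union has k+1 items,
        # then count and keep only the frequent candidates
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]
    return frequent

for s in apriori():
    print(set(s), support(s))   # e.g. {'Diaper', 'Beer'} 0.75
```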
From a frequent item set we can generate all the possible rules; each item and sub-item set can be placed in either part of a rule. In the previous example, we can generate six rules from the frequent item set {Beer, Bread, Diaper} (see the possible rules in Figure 1).

Fig. 1 Example of association rule generation

It is very important to know which association rules are really usable, namely which give the most information about the data. Two basic measures are used to calculate how "important" an association rule is. The first is the previously defined support measure: the support of an association rule is the support of the set of its items. For example, the support of the rule "{Bread, Diaper} => {Beer}" is equal to the support of the item set {Beer, Bread, Diaper}. The second, the confidence measure of a rule, is calculated as follows:

conf(X => Y) = supp(X ∪ Y) / supp(X)    (7)

Because the confidence measure is a conditional probability (the quotient of the rule support and the antecedent support), it provides information about the relationship between the antecedent and consequent parts of a rule. A rule X => Y is called an important, or strong, rule if its support and confidence are higher than the minimum support (σ) and the minimum confidence (γ) thresholds. The confidence is a basic rule interestingness measure, but many other measures can also be used to determine the importance and the ordering of the mined rules (e.g. the RI (Rule Interest), Lift, Correlation, Jaccard and Piatetsky-Shapiro measures).

The number of possible association rules is very high in the case of an item set with many elements (e.g. if ten items are in an item set, the number of possible rules is 2^10 − 2). Therefore it is well worth using the anti-monotonic property: when generating rules, the rules selected in the previous step can be reused. For example, let Z be a frequent item set and consider the following two association rules generated from Z: 1) X => Z \ X and 2) x => Z \ x, where x ∈ X. If the first rule does not fulfil the confidence criterion, the second rule cannot be strong either (both rules have the same support, supp(Z), but the antecedent of the second rule is smaller, so its confidence is lower).

The importance of an association rule can be determined not only in an objective way; the determination can also be subjective. The user can specify the "right" form of the rules. Suppose that a user wants to search only for rules with a distinguished item in the consequent part, for example a product family such as books. In this case, only the rules where a book is placed in the consequent part will be strong rules.

To increase the usability of association rules, fuzzy association rules have been proposed. In the next section, the basic definitions of fuzzy association rule theory are presented.

Fuzzy Association Rule Mining

Fuzzy association rules are discovered in two steps, as in the crisp case: 1) mining frequent item sets, and 2) generating fuzzy association rules from the discovered set of frequent item sets. A dataset (database) consists of records (data rows) over data fields (columns); the fields are frequently called attributes. Because in fuzzy association rule theory the items are fuzzy sets, a partition method is necessary which transforms the crisp data set into a fuzzy data set for each attribute. For the numerical attributes of the data set (such as temperature, pressure, etc.), fuzzy sets can be defined with Gaussian, sigmoid, or piecewise linear fuzzy dichotomies; here, triangular and trapezoidal fuzzy sets (intervals) are used for the data partition.
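For illustration, a trapezoidal membership function of the kind used for such partitions can be written as follows (the breakpoints a ≤ b ≤ c ≤ d and the two-set partition are illustrative assumptions of ours):

```python
import numpy as np

def trapmf(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0) if b > a else (x >= a).astype(float)
    right = np.clip((d - x) / (d - c), 0.0, 1.0) if d > c else (x <= d).astype(float)
    return np.minimum(left, right)

# Two fuzzy sets ("low", "high") partitioning a normalized attribute z in [0, 1]
z = np.linspace(0, 1, 5)
print(trapmf(z, 0.0, 0.0, 0.3, 0.6))   # "low":  [1. 1. 0.33 0. 0.]
print(trapmf(z, 0.3, 0.6, 1.0, 1.0))   # "high": [0. 0. 0.67 1. 1.]
```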
See Figure 2 for an example, where the attributes z_1 and z_2 are each partitioned by two trapezoidal fuzzy sets.

Fig. 2 A fuzzy partition of the data space (z_1, z_2)

Let D = {t_1, t_2, ..., t_N} be a transformed (partitioned) fuzzy dataset of N tuples (data records ~ data points) with a set of attributes Ζ = {z_1, z_2, ..., z_q}, and let c_{i,j} be an arbitrary fuzzy interval (fuzzy set) associated with attribute z_i, where q denotes the number of attributes. From this point on, we use the notation z_i : c_{i,j} for an attribute-fuzzy interval pair, or simply a fuzzy item; an example could be Age : young. For fuzzy item sets, we use expressions like Z : C to denote an ordered set of attributes Z ⊆ Ζ (Ζ denotes the set of all possible attributes) and a corresponding set C of fuzzy intervals, one per attribute, i.e.

Z : C = [z_{i1} : c_{i1,j1} ∪ z_{i2} : c_{i2,j2} ∪ ... ∪ z_{iq} : c_{iq,jq}]

In the literature, the fuzzy support value has been defined in different ways: some researchers suggest the minimum operator, as in fuzzy intersection, others prefer the product operator. If t_k(z_i) denotes the membership value of record t_k for attribute z_i, then the fuzzy support of Z : C with respect to D is defined as

FS(Z : C) = (1/N) Σ_{k=1}^{N} Λ_{z_i : c_{i,j} ∈ Z : C} t_k(z_i)    (8)

where Λ ∈ {min, Π} is the aggregation operator; in this paper we prefer the product form. The fuzzy support reflects how strongly the records of the identification data set support the item set. A fuzzy item set Z : C is called frequent if its fuzzy support value is higher than or equal to a user-defined minimum support (σ).

The following example illustrates the calculation of the fuzzy support value. Let X : A = [Balance : medium ∪ Income : high] be a fuzzy item set and let the example dataset be the one shown in Table 2.

Table 2 Example database containing memberships

Balance: medium | Credit: high | Income: high
0.5 | 0.6 | 0.4
0.8 | 0.9 | 0.4
0.7 | 0.8 | 0.7

The fuzzy support of X : A is calculated as follows:

FS(X : A) = (0.5·0.4 + 0.8·0.4 + 0.7·0.7) / 3 = 0.3367

Since the rules are generated from the frequent item sets, the generation of fuzzy association rules is relatively straightforward. More precisely, each frequent item set Z : C is divided into an antecedent X : A and a consequent Y : B, where X ⊂ Z, Y = Z − X, A ⊂ C and B = C − A. With this notation a fuzzy association rule can be represented in the form

If X is A, then Y is B,    (9)

or in more compact form,

X : A => Y : B.    (10)

A fuzzy association rule is considered strong if its support and confidence exceed the given minimum support (σ) and minimum confidence (γ). Since the rules are generated from frequent item sets, they satisfy the minimum support automatically. The fuzzy confidence of a fuzzy association rule X : A => Y : B is defined as

FC(X : A => Y : B) = FS(Z : C) / FS(X : A)    (11)

and it is the conditional probability of the parts of the rule, P(Y : B | X : A).

Fuzzy association rules mined using the above fuzzy support-confidence framework are useful for many applications. However, a rule might be identified as interesting when, in fact, the occurrence of X : A does not imply the occurrence of Y : B. The occurrence of a fuzzy item set X : A is independent of the item set Y : B if FS(Z : C) = FS(X : A) · FS(Y : B); otherwise the item sets X : A and Y : B are dependent and correlated as events. The correlation between the occurrences of X : A and Y : B can be measured by computing the interestingness of a given rule:

Fcorr(X : A, Y : B) = FS(Z : C) / (FS(X : A) · FS(Y : B))    (12)
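The three measures of Eqs. 8, 11 and 12 can be combined in a few lines. The sketch below reproduces the worked fuzzy support example of Table 2 with the product operator; the rule at the end, with Credit : high as consequent, is a hypothetical example of ours:

```python
import numpy as np

# Membership values from Table 2 (array index: record, key: fuzzy item)
data = {
    "Balance:medium": np.array([0.5, 0.8, 0.7]),
    "Credit:high":    np.array([0.6, 0.9, 0.8]),
    "Income:high":    np.array([0.4, 0.4, 0.7]),
}

def fuzzy_support(items):
    """Eq. 8 with the product operator: mean over records of the product of memberships."""
    prod = np.ones(3)
    for it in items:
        prod *= data[it]
    return prod.mean()

antecedent = ["Balance:medium", "Income:high"]
FS = fuzzy_support(antecedent)
print(round(FS, 4))                     # 0.3367, as in the text

# Fuzzy confidence (Eq. 11) and correlation (Eq. 12) of a hypothetical rule
rule = antecedent + ["Credit:high"]
FC = fuzzy_support(rule) / FS
Fcorr = fuzzy_support(rule) / (FS * fuzzy_support(["Credit:high"]))
print(round(FC, 4), round(Fcorr, 4))
```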
If the resulting value of Eq. 12 is less than one, the occurrence of X : A is negatively correlated with the occurrence of Y : B. If the resulting value is greater than one, X : A and Y : B are positively correlated, which means that the occurrence of one implies the occurrence of the other. If the resulting value is near one, then X : A and Y : B are independent and there is no correlation between them.

After this review of the basics of association rule mining, the next section presents how to use fuzzy association rules for model structure selection.

MOSSFARM - Model Structure Selection by Fuzzy Association Rule Mining

Since the previous section covered all the definitions and methods necessary to mine fuzzy association rules, this section focuses on the main steps of our method that are needed to solve the studied model structure (order) selection problem in the case of a NARX model.

Suppose that we have only measured input-output data from a SISO process. In a NARX model of the process, from this I/O data set we can construct a regression vector from each input-output data pair, as in Eq. 5:

x_k = [y_{k-1}, y_{k-2}, ..., y_{k-m}, u_{k-1}, u_{k-2}, ..., u_{k-n}]^T,    k = 1, ..., N

where the past values of the process outputs y_k and the process inputs u_k are the regressors. The numbers of past inputs (n) and past outputs (m) are often referred to as the model order.

The question is how to select the right model order. An answer is provided by our method, MOSSFARM. The method consists of the following five steps:

1) Generate a fuzzy database
2) Mine frequent fuzzy item sets
3) Generate fuzzy association rules
4) Prune the fuzzy rule base
5) Aggregate the mined rules, select the model structure

Step 1) Observed (measured) input-output data are in general crisp values. In the first step the "attributes" (regressors in the regression vector) need to be partitioned to obtain a fuzzy-valued data set. The fuzzy Gustafson-Kessel (GK) [32] clustering algorithm partitions the initial data on every attribute (dimension of the data ~ all the candidate regressors). The resulting membership functions are transformed into trapezoidal membership functions (see the example in Fig. 4, where the two attributes are partitioned into four and three fuzzy sets, respectively).

Fig. 4 Trapezoidal membership functions

Step 2) The resulting fuzzy data set includes the membership function values of each data point on each attribute, and the index of the fuzzy set which gives the highest membership value for the data point on a given attribute. These indices are the items, and sets of them are the item sets. The frequent item set search is based on a fuzzy implementation of the Apriori algorithm; the fuzzy support values are calculated as Eq. 8 shows.

Step 3) The mined frequent item sets are the basis of the fuzzy rule generation step. All fuzzy rules are generated, but only the rules with high support (FS > σ) and confidence (FC > γ) values are relevant. The fuzzy support and confidence values are calculated as Eqs. 8 and 11 show, and the interestingness of the rules is determined by the correlation factor (Fcorr, calculated by Eq. 12).

Step 4) The advantage of applying the correlation measure to the analysis of rule quality is that it is upward closed. Based on this, a rule-base pruning algorithm has been developed that removes unnecessarily complex rules, i.e. rules containing input variables that do not significantly improve the correlation of the rule.
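A minimal sketch of the pruning idea of Step 4, under our assumption that a rule is dropped when a simpler rule (antecedent subset, same consequent) reaches at least the same correlation; the paper's exact pruning criterion may differ in its details:

```python
from itertools import combinations

def prune_rules(rules):
    """Keep a rule only if no simpler rule (antecedent subset, same consequent)
    reaches at least the same correlation value.

    rules: list of (antecedent frozenset, consequent, Fcorr) tuples.
    """
    by_key = {(a, c): corr for a, c, corr in rules}
    kept = []
    for a, c, corr in rules:
        dominated = any(
            by_key.get((frozenset(s), c), 0.0) >= corr
            for r in range(1, len(a))
            for s in combinations(a, r)
        )
        if not dominated:
            kept.append((a, c, corr))
    return kept

# Hypothetical rules: adding y_{k-1} does not improve the correlation
rules = [
    (frozenset({"y_k", "y_k-3", "u_k"}), "y_k+1:high", 8.2),
    (frozenset({"y_k", "y_k-1", "y_k-3", "u_k"}), "y_k+1:high", 8.1),
]
print(prune_rules(rules))   # only the simpler rule survives
```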
Step 5) The mined association rules have to be analyzed to determine model structures. Rules whose antecedent parts contain fuzzy sets with the same first (attribute) indices give identical model structures. Therefore it is necessary to aggregate the support, confidence and correlation measures of these individual rules. After the aggregation, the resulting model structures are ordered by the correlation measure (calculated by Eq. 12; other rule interestingness measures could also be used), and accordingly the first structures will be the most interesting model structures.

In the next section an application study is presented to illustrate how this method determines the structures of the models of a dynamic system and selects the most relevant process variables.

Application Study

Model of the styrene polymerization CSTR

The data-driven modeling of styrene polymerization in a continuously stirred tank reactor is considered as a case study to demonstrate the applicability of the proposed method. The schematic diagram of the polymerization process is shown in Fig. 5.

Fig. 5 Scheme of the styrene polymerization CSTR with a cooling jacket

In Figure 5, Q_m, C_mf, T_f and C_m denote the monomer flowrate, the monomer feed concentration, the feed temperature and the monomer concentration, respectively. The solvent flowrate is represented by Q_s, while Q_i, C_if and C_i are the initiator flowrate, the initiator feed concentration and the concentration of the initiator in the reactor, respectively. T_cf, Q_c and T_c are the coolant feed temperature, the coolant flowrate and the coolant temperature. The total flowrate is denoted by Q_t and T is the reactor temperature. The initiator is azobisisobutyronitrile (AIBN) dissolved in benzene, the monomer is styrene and the solvent is benzene.

For the simulation of this system the model of Hidalgo and Brosilow [33] is applied:

dC_i/dt = [Q_i C_if − (Q_i + Q_s + Q_m) C_i] / V − k_d C_i    (13)

dC_m/dt = [Q_m C_mf − (Q_i + Q_s + Q_m) C_m] / V − k_p C_m C_gp    (14)

dT/dt = (Q_i + Q_s + Q_m)(T_f − T) / V + [(−ΔH_r) / (ρ C_p)] k_p C_m C_gp − [U A / (ρ C_p V)] (T − T_c)    (15)

dT_c/dt = Q_c (T_cf − T_c) / V_c + [U A / (ρ_c C_pc V_c)] (T − T_c)    (16)

C_gp = [2 f k_d C_i / k_t]^{1/2}    (17)

where C_gp represents the concentration of the growing polymer.

The dimensionless model and its nominal parameter values are detailed in [34]. The dynamical behavior of this dimensionless model is illustrated in Figure 6, where the coolant flowrate (Q_c) is considered as the input variable. The steady-state input-output relationship between Q_c and the reactor temperature (T) confirms the nonlinear behaviour of the model (see Figure 7).

Suppose we have only input-output data simulated from the above CSTR model, and we want to identify a linear ARX or a neural network based NARX model with some model structure. To determine (select) the structure of these models, the proposed fuzzy association rule based method is followed in the next section.

Fig. 6 Dynamical behavior of the dimensionless model of the styrene polymerization reactor (panels: C_i, C_m, T_r, T_c and Q_c versus time)

Fig. 7 Steady-state input-output relationship between the coolant flowrate (Q_c) and the reactor temperature (T)
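For readers who wish to generate similar input-output data, the sketch below integrates Eqs. 13-17 with SciPy. All parameter and feed values are placeholders of ours (the actual dimensionless parameters are listed in [34]), so the sketch illustrates the simulation setup, not the exact case-study data:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder parameters -- illustrative only; see [34] for the real values
p = dict(V=1.0, Vc=0.2, kd=1e-3, kp=1.0, kt=1.0, f=0.6,
         dHr=1.0, rho_cp=1.0, rhoc_cpc=1.0, UA=0.5,
         Qi=0.1, Qs=0.2, Qm=0.3, Qc=1.0,
         Cif=0.5, Cmf=5.0, Tf=1.0, Tcf=0.8)

def cstr(t, x, p):
    Ci, Cm, T, Tc = x
    Q = p["Qi"] + p["Qs"] + p["Qm"]                       # total reactive feed
    Cgp = np.sqrt(2 * p["f"] * p["kd"] * Ci / p["kt"])    # Eq. 17
    dCi = (p["Qi"] * p["Cif"] - Q * Ci) / p["V"] - p["kd"] * Ci          # Eq. 13
    dCm = (p["Qm"] * p["Cmf"] - Q * Cm) / p["V"] - p["kp"] * Cm * Cgp    # Eq. 14
    dT = (Q * (p["Tf"] - T) / p["V"]                                     # Eq. 15
          + p["dHr"] / p["rho_cp"] * p["kp"] * Cm * Cgp
          - p["UA"] / (p["rho_cp"] * p["V"]) * (T - Tc))
    dTc = (p["Qc"] * (p["Tcf"] - Tc) / p["Vc"]                           # Eq. 16
           + p["UA"] / (p["rhoc_cpc"] * p["Vc"]) * (T - Tc))
    return [dCi, dCm, dT, dTc]

sol = solve_ivp(cstr, (0, 300), [0.05, 1.0, 1.0, 0.9], args=(p,), dense_output=True)
print(sol.y[:, -1])   # final state [Ci, Cm, T, Tc]
```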
Model structure selection by MOSSFARM

A SISO NARX model of the previous system is considered, where the output of the model (y) is the reactor temperature and the input (u) is the coolant flowrate. The maximal number of lagged outputs and inputs is four-four, therefore the most complex model is:

y_{k+1} = f([y_k, y_{k-1}, ..., y_{k-3}, u_k, u_{k-1}, ..., u_{k-3}])    (18)

If the original (full, see Eq. 18) model structure is used for a linear and a neural network (NN) based model, the mean square error (MSE) values are 0.00047 and 0.00008, respectively. For the linear model the least squares method is used. For the NN model, the number of regressors in the structure sets the number of neurons in the input, hidden and output layers (e.g. for the structure [y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_k]: input: 5, output: 1, hidden: 3); the applied learning method was back-propagation. The structures with the highest correlation factor (selected by MOSSFARM with σ = 1%, γ = 95%) are listed in Table 3.

Table 3 The selected model structures

#  | Structure                               | FS  | FC   | Fcorr
1. | y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_k     | 8.1 | 96.3 | 822
2. | y_k, y_{k-1}, y_{k-3}, u_k              | 8.2 | 95.9 | 819
3. | y_k, y_{k-2}, y_{k-3}, u_k              | 8.2 | 95.7 | 817
4. | y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_{k-1} | 8.7 | 95.3 | 814
5. | y_k, y_{k-3}, u_k, u_{k-2}, u_{k-3}     | 8.3 | 95   | 811

Table 4 shows the selected model structures for several minimal support and confidence conditions.

Table 4 The selected model structures for several search conditions (support σ [%], confidence γ [%])

σ | γ  | Best structure
1 | 95 | y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_k
1 | 80 | y_{k-3}, u_k
1 | 70 | y_k, y_{k-1}, y_{k-3}, u_k, u_{k-2}
1 | 60 | y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_k, u_{k-1}
1 | 50 | y_{k-1}, y_{k-3}, u_k, u_{k-1}, u_{k-2}, u_{k-3}
5 | 50 | y_k, y_{k-3}, u_k
8 | 60 | y_k, y_{k-3}, u_k
3 | 70 | y_k, y_{k-3}, u_k

The selected model structures (e.g. those in Table 3) can be used to identify the linear and the NN models. The results are shown in Figures 8-11. The NN model gives lower MSE values in all cases, and the first (highest-ranked) structures give lower MSE values than the other structures for both models (linear and NN). The resulting MSE values are higher than those of the original (full) structure, but the differences are not considerable and these structures are smaller than the original model structure. Therefore MOSSFARM can be an efficient method for determining the structure of input-output models.

Fig. 8 The MSE values for the linear model with the selected model structures

Fig. 9 The MSE values for the NN model with the selected model structures

Fig. 10 The linear model output values, ym(k), as a function of the simulated output values, y(k)

Fig. 11 The NN model output values, ym(k), as a function of the simulated output values, y(k)
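Once a structure has been selected, the linear model can be identified by ordinary least squares. The sketch below uses the top-ranked structure of Table 3, [y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_k]; the toy data generator merely stands in for the simulated CSTR records:

```python
import numpy as np

def regression_data(u, y, y_lags, u_lags):
    """Rows: [y_{k-l} for l in y_lags] + [u_{k-l} for l in u_lags]; target: y_{k+1}."""
    start = max(y_lags + u_lags)
    X = np.array([[y[k - l] for l in y_lags] + [u[k - l] for l in u_lags]
                  for k in range(start, len(y) - 1)])
    t = y[start + 1:]
    return X, t

# Top-ranked structure of Table 3: y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_k
y_lags, u_lags = [0, 1, 2, 3], [0]

# Toy I/O data standing in for the simulated CSTR records
rng = np.random.default_rng(0)
u = rng.random(500)
y = np.zeros(500)
for k in range(3, 499):
    y[k + 1] = 0.6 * y[k] + 0.2 * y[k - 3] + 0.1 * u[k]

X, t = regression_data(u, y, y_lags, u_lags)
theta, *_ = np.linalg.lstsq(X, t, rcond=None)   # least squares parameter estimate
mse = np.mean((X @ theta - t) ** 2)             # one-step-ahead MSE
print(theta.round(3), mse)                      # theta ~ [0.6, 0, 0, 0.2, 0.1]
```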
Table 5 shows the results of the rule base complexity analysis. The search conditions were σ = 3%, γ = 90% on the original simulated data (with the regressors of Eq. 18).

Table 5 The effects of the rule pruning step

                 | Original rule base                   | Pruned rule base
First structure  | y_k, u_k, u_{k-1}, u_{k-2}, u_{k-3}  | y_k, y_{k-3}, u_k
Rules            | 714        | 27
Conditions       | 3157       | 77
MSE of Lin.      | 0.001      | 0.0008025
MSE of Lin. free | 0.0352     | 0.0141
MSE of NN        | 0.00039197 | 0.00030913
MSE of NN free   | 0.03       | 0.0093

The rule pruning step gives smaller but still well usable model structures for both the linear and the NN models: the pruned rule base has 97.56 percent less complexity than the original rule base and yields lower MSE values. If the structures are ordered by the number of aggregated rules, the first structures are y_k, y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2} (with the original rule base) and y_k, y_{k-3}, u_k (with the pruned rule base; all other search parameters are as in the case of Table 5). For the structure y_k, y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2} the free-run MSE values of the linear and the NN models are 0.0223 and 0.0192, respectively (the results are depicted in Fig. 12 and Fig. 13). The structure y_k, y_{k-3}, u_k yields lower MSE values: 0.0141 for the free run of the linear model and 0.0174 for the free run of the NN model.

Fig. 12 Results of the free run of the linear model for the y_k, y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2} structure (y: output, ym: model output)

Fig. 13 Results of the free run of the neural network model for the y_k, y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2} structure (y: output, ym: model output)

Feature (variable) selection by MOSSFARM

The proposed method can also be used to select the most relevant variables which determine the output variable. The four-state dimensionless form of the Hidalgo and Brosilow model can be extended with the dimensionless moment equations. A new dataset is based on the simulation of this extended dimensionless model with six state variables. The dimensionless number-average molecular weight (NAMW) is the ratio of the last two state variables of the model [35]. The variables relevant to the value of the NAMW are selected by the MOSSFARM method. The new initial model structure is:

y_{k+1} = f([x_{1,k}, x_{2,k}, ..., x_{7,k}])    (19)

where the dimensionless variables are the following: y is the model output (NAMW), x_1 the initiator concentration, x_2 the monomer concentration, x_3 the reactor temperature, x_4 the jacket temperature, x_5 the first variable of the moment equations, x_6 the second variable of the moment equations and x_7 the cooling jacket flowrate.

With the following search conditions: number of partitions (~clusters) for the output variable: 5, number of partitions for the regressor variables: 3, σ = 10%, γ = 50%, MOSSFARM selects the structure x_{1,k}, x_{3,k} ("aggregated" from one rule) as first (ordered by the correlation). This result says that the initiator concentration and the reactor temperature determine the number-average molecular weight. If the ordering is based on the number of aggregated rules, the first selected structure is x_{6,k}, x_{7,k} (aggregated from two rules). The results are summarized in Table 6.

Table 6 Linear and NN models for estimating the number-average molecular weight

                 | Ordering by correlation | Ordering by rule number
First structure  | x_{1,k}, x_{3,k} | x_{6,k}, x_{7,k}
MSE of Lin.      | 0.0103    | 0.0049
MSE of Lin. free | 0.0106    | 0.576
MSE of NN        | 0.0001723 | 0.000416
MSE of NN free   | 0.2080    | 0.576

Conclusions

This paper presented a new model-free, fuzzy association rule mining based method for model structure selection for input-output data-driven models. The results show that the developed tool provides an efficient method for determining the model structure of both linear and neural network based input-output models.
Moreover, the method can also be used to select the most relevant process variables (the feature selection problem). The proposed approach has been implemented as a MATLAB program called MOSSFARM (Model Structure Selection by Fuzzy Association Rule Mining); it will be freely available from www.fmt.vein.hu/softcomp.

Acknowledgement

This project has been financially supported in part by the Hungarian National Science Foundation OTKA (No. T037600, No. T049534).

REFERENCES

1. LJUNG L.: System Identification. Prentice Hall, 1987
2. PETRICK M. H., WIGDOROWITZ B.: A priori nonlinear model structure selection for system identification. Control Eng. Practice, 1997, 5(8), 1053-1062
3. WINKLER P.: Optimized multivariate lag structure selection. Computational Economics, Springer, 2000, 16(1/2), 87-103
4. AKAIKE H.: A new look at the statistical model identification. IEEE Trans. Autom. Control, 1974, 19, 716-723
5. LIANG G., WILKES D. and CADZOW J.: ARMA model order estimation based on the eigenvalues of the covariance matrix. IEEE Trans. Signal Process., 1993, 41(10), 3003-3009
6. SCHWARZ G.: Estimating the dimension of a model. Annals of Statistics, 1978, 6, 461-464
7. HANNAN E. J., QUINN B. G.: The determination of the order of an autoregression. Journal of the Royal Statistical Society Series B, 1979, 41, 190-195
8. HIDALGO J.: Consistent order selection with strongly dependent data and its application to efficient estimation. Journal of Econometrics, 2002, 110, 213-239
9. SHIBATA R.: An optimal selection of regression variables. Biometrika, 1981, 68, 45-54
10. PÖTSCHER B. M.: Model selection under nonstationarity: autoregressive models and stochastic linear regression models. Annals of Statistics, 1989, 17, 1257-1274
11. GEWEKE J., MEESE R.: Estimating regression models of finite but unknown order. International Economic Review, 1981, 22, 55-70
12. SHIBATA R.: Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika, 1976, 63, 117-126
13. SHIBATA R.: Asymptotic efficiency selection of the order of the model for estimating parameters of a linear process. Annals of Statistics, 1980, 8, 147-164
14. HANNAN E. J.: The estimation of the order of an ARMA process. Annals of Statistics, 1980, 8, 1071-1081
15. PÖTSCHER B. M.: Effects of model selection on inference. Econometric Theory, 1991, 7, 163-185
16. GEORGE E. I.: The variable selection problem. Journal of the American Statistical Association, 2000, 95, 1304-1308
17. AGUIRRE L. A., BILLINGS S. A.: Improved structure selection for nonlinear models based on term clustering. Int. J. Control, 1995, 62, 569-587
18. AGUIRRE L. A., MENDES E. M. A. M.: Global nonlinear polynomial models: structure, term clusters and fixed points. Int. J. Bifurcation Chaos, 1996, 6, 279-294
19. MENDES E. M. A. M., BILLINGS S. A.: An alternative solution to the model structure selection problem. IEEE Trans. Syst. Man Cybernetics, Part A: Syst. Humans, 2001, 31(6), 597-608
20. KORENBERG M., BILLINGS S. A., LIU Y., MCILROY P.: Orthogonal parameter estimation algorithm for nonlinear stochastic systems. Int. J. Control, 1988, 48, 193-210
21. ABONYI J.: Fuzzy Model Identification for Control. Birkhauser, Boston, 2001
22. PETROVIC I., BAOTIC M., PERIC N.: Model structure selection for nonlinear system identification using feedforward neural networks. International Joint Conference on Neural Networks (IJCNN'00), 2000, 1, 53-57
23. YU D. L., GOMM J. B., WILLIAMS D.: Neural model input selection for a MIMO chemical process. Eng. Appl. of Artificial Intelligence, 2000, 13, 15-23
24. MENOLD P. H., ALLGÖWER F., PEARSON R. K.: Nonlinear structure identification of chemical processes. Computers Chem. Engng., 1997, 21, 137-147
25. LENDASSE A., SIMON G., WERTZ V., VERLEYSEN M.: Fast bootstrap methodology for regression model selection. Neurocomputing, 2005, 64, 161-181
26. AHMAD R., JAMALUDDIN H., HUSSAIN M. A.: Model structure selection for a discrete-time non-linear system using a genetic algorithm. Proceedings of the I MECH E Part I: Journal of Systems & Control Engineering, 2004, 85-98
27. METENIDIS M. F., WITCZAK M., KORBICZ J.: A novel genetic programming approach to nonlinear system modelling: application to the DAMADICS benchmark problem. Eng. Appl. of Artificial Intelligence, 2004, 17, 363-370
28. HONG X., HARRIS C. J.: Nonlinear model structure detection using optimum experimental design and orthogonal least squares. IEEE Trans. Neural Networks, 2001, 12(2), 435-439
29. BASSO M., GIARRÉ L., GROPPI S., ZAPPA G.: NARX models of an industrial power plant gas turbine. IEEE Trans. on Control Systems Technology, 2005, 13(4)
30. AGRAWAL R., IMIELINSKI T. and SWAMI A.: Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, December 1993, 5(6), 914-925, Special Issue on Learning and Discovery in Knowledge-Based Databases
31. SJÖBERG J., ZHANG Q., LJUNG L., BENVENISTE A., DELYON B., GLORENNEC P.-Y., HJALMARSSON H., JUDITSKY A.: Nonlinear black-box modeling in system identification: a unified overview. Automatica, 1995, 31(12), 1691-1724
32. GUSTAFSON D. E. and KESSEL W. C.: Fuzzy clustering with a fuzzy covariance matrix. Proceedings of the IEEE CDC, San Diego, 1979, 761-766
33. HIDALGO P. M. and BROSILOW C. B.: Nonlinear model predictive control of styrene polymerization at unstable operating points. Comp. Chem. Eng., 1990, 14, 481-494
34. RUSSO L. P. and BEQUETTE B. W.: Operability of chemical reactors: multiplicity behavior of a jacketed styrene polymerization reactor. Chem. Eng. Science, 1998, 53(1), 27-45
35. [Reference entry lost in the source document.]