Advances in Technology Innovation, vol. 1, no. 2, 2016, pp. 53-57

Improved SVM Classifier Incorporating Adaptive Condensed Instances Based on Hybrid Continuous-Discrete Particle Swarm Optimization

Chun-Liang Lu1,*, Tsun-Chen Lin2

1 Department of Applied Information and Multimedia, Ching Kuo Institute of Management and Health, Keelung, Taiwan.
2 Department of Computer and Communication Engineering, Dahan Institute of Technology, Hualien, Taiwan.
* Corresponding author. Email: leucl@ems.cku.edu.tw

Received 20 Sep 2016; received in revised form 24 Sep 2016; accepted 28 Sep 2016

Abstract

The support vector machine (SVM), a supervised learning model grounded in structural risk minimization, has been used successfully for classification and regression. The standard soft-margin SVM trains a classifier by solving an optimization problem that decides which instances of the training data set become support vectors. In many real applications, however, it is also imperative to perform feature selection to detect which features are actually relevant. To further improve performance, we propose an adaptive condensed instances (ACI) strategy based on a hybrid particle swarm optimization (HPSO) algorithm for SVM classifier design. The basic idea of the proposed method is to adopt HPSO to simultaneously optimize the ACI and the SVM kernel parameters to enhance classification accuracy. Numerical experiments on several UCI benchmark datasets are conducted to find the optimal parameters for building the SVM model. The results show that the proposed framework achieves better performance than other published methods in the literature and provides a simple but effective strategy for improving the classification accuracy of the SVM classifier.

Keywords: hybrid particle swarm optimization (HPSO), adaptive condensed instances (ACI), support vector machine (SVM)

1. Introduction

The support vector machine (SVM), originally proposed by Vapnik [1], is a powerful classification method with state-of-the-art performance in machine learning, and it has drawn considerable attention due to its high generalization ability across a wide range of applications, including speaker recognition [2], bioinformatics [3], and text categorization [4]. In many pattern classification tasks we are confronted with a high-dimensional input space, and finding the combination of original input features that contributes most to the classification is crucial. The computational cost of classification grows heavily with the data dimension, which makes feature selection an important issue for the SVM. Feature selection mechanisms fall into three categories: filter, wrapper, and embedded methods [5]. Filters generally involve a non-iterative computation on the original features, which executes very fast, but the result is not usually optimal since the learning algorithm is not taken into account. Wrapper methods usually achieve better results than filters because they are tuned to the specific interactions between the classifier and the original feature set, but they are very computationally intensive. Finally, unlike filters and wrappers, embedded techniques determine the features and the classifier simultaneously during the training process, with a computational cost smaller than that of wrapper methods. Feature selection is a challenging problem because there can be complex interactions among features. An exhaustive search is therefore practically impossible, and an efficient global search technique is needed.
Evolutionary computation (EC) techniques are well-known heuristic approaches with global search ability, such as simulated annealing (SA) [6], the genetic algorithm (GA) [7], and particle swarm optimization (PSO) [8], and they have gained a lot of attention from researchers in this area. Compared with other EC algorithms such as SA and GA, PSO is computationally less expensive and can converge more quickly. A GA-based feature selection method that optimizes both the feature subset and the parameters for the SVM was proposed by Huang and Wang [9], and the authors pointed out that the algorithm can work better than the conventional grid search method. However, the treatment of redundant or irrelevant instances is not taken into account in that classification procedure. In this paper, the effective adaptive condensed instances (ACI) strategy that we previously published [7] is applied to decide which instances of the training data set are support vectors, in order to cope with the problem mentioned above. Short communications on the early stages of this work have appeared in [7]. Here we significantly extend our approach to account for relevant instance selection during the SVM training process. To further improve the classification performance, the ACI strategy based on the hybrid particle swarm optimization (HPSO) algorithm is proposed for the SVM classifier design. Experiments on several UCI benchmark datasets are conducted to validate its effectiveness, and the results show that the proposed framework achieves better performance than other existing GA-based methods in the literature.

The remainder of this paper is organized as follows. Section 2 describes related work, including the basic PSO algorithm and the SVM classifier. Section 3 illustrates the particle representation, the hybrid PSO with disturbance operation, the ACI scheme, and the proposed framework for the SVM classifier. Section 4 provides the experiment results, and conclusions are drawn in Section 5.

2. Related work

2.1. Particle swarm optimization (PSO) algorithm

The PSO algorithm, originally developed by Kennedy and Eberhart [10], is a search algorithm that models the social behavior of birds within a flock. In the PSO algorithm, individuals, referred to as particles, are flown through a hyper-dimensional search space. PSO is easy to implement, has few parameters to adjust, and usually converges faster than other evolutionary algorithms. During the optimization procedure, particles communicate good positions to each other and adjust their own positions according to their history and their neighboring particles. The basic concept of the PSO algorithm is given by:

$v_{id}^{k+1} = w\, v_{id}^{k} + c_1 r_1 \left( pb_{id}^{k} - x_{id}^{k} \right) + c_2 r_2 \left( gb_{id}^{k} - x_{id}^{k} \right)$    (1)

$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$    (2)

where $d = 1, 2, \ldots, D$, $i = 1, 2, \ldots, N$, $D$ is the dimension of the search space, $N$ is the population size, and $k$ is the iteration index; $v_{id}^{k}$ is the velocity of the i-th particle and $x_{id}^{k}$ is its current position, which is updated by Eq. (2); $pb_{id}^{k}$ is the best solution ($p_{best}$) achieved so far by the i-th particle; $gb_{id}^{k}$ is the global best ($g_{best}$) solution obtained by any particle in the population; $r_1$ and $r_2$ are random values in the range [0, 1] that model remembrance ability; $c_1$ and $c_2$ are learning factors, and $w$ is the inertia weight.
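As an illustration, the update of Eqs. (1)-(2) for a whole swarm can be sketched as follows. This is a minimal sketch in NumPy; the array shapes, default coefficients, and the clamping constant are assumptions for illustration, not the authors' implementation (the clamping to $[-V_{max}, V_{max}]$ mentioned below is included).

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, v_max=6.0):
    """One velocity/position update for a swarm, following Eqs. (1)-(2).

    x, v, pbest : arrays of shape (N, D) -- positions, velocities, personal bests
    gbest       : array of shape (D,)    -- global best position
    """
    r1 = np.random.rand(*x.shape)        # r1, r2 ~ U[0, 1], drawn per dimension
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (1)
    v = np.clip(v, -v_max, v_max)        # clamp velocities to [-Vmax, Vmax]
    x = x + v                            # Eq. (2)
    return x, v
```

In the hybrid algorithm of Section 3, this continuous update is applied to the particle dimensions holding $\sigma_{cond}$, $C$, and $\gamma$, while the discrete feature-mask dimensions follow the sigmoid rule of Eq. (7).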
A large inertia weight facilitates global exploration, while a small one tends toward local search. Generally, the value of each velocity component can be clamped to the range $[-V_{max}, V_{max}]$ to control excessive roaming of the particles outside the search space. The PSO procedure is organized in the following sequence of steps.

Step 1: Initialize: Randomly generate the initial particles.
Step 2: Fitness evaluation: Calculate the fitness value of each particle in the population.
Step 3: Update: Compare the fitness values of the particles and update the velocities and positions using Eqs. (1) and (2).
Step 4: Termination criterion: Repeat Steps 2 and 3 until the number of iterations reaches the pre-defined maximum or a termination criterion is satisfied, and report the best solution $g_{best}$.

2.2. Support vector machine (SVM) classifier

The support vector machine (SVM) has drawn much attention due to its good performance and solid theoretical foundations [11]. The main idea of the SVM is to first transform the input data into a higher-dimensional space by means of a kernel function and then to find an optimal separating hyper-plane between the two data sets. A practical difficulty in using the SVM is the selection of parameters, such as the penalty parameter $C$ of the error term and the kernel parameter $\gamma$ of the RBF kernel function; an appropriate choice of parameters is needed to obtain good generalization performance. The SVM can be described as follows. Given a set of training data $(x_1, \ldots, x_p)$ with corresponding class labels $(y_1, \ldots, y_p)$, $y_i \in \{-1, +1\}$, the SVM attempts to find a decision surface $f(x)$ that jointly maximizes the margin between the two classes and minimizes the classification error on the training set:

$f(x) = \sum_{i=1}^{p} \alpha_i y_i K(x, x_i) + b$    (3)

where $K$ is a kernel function, $\alpha_i$ is the Lagrange multiplier corresponding to the i-th training point $x_i$, and $b$ is the bias. The simplest kernel is the inner product $K(x, \hat{x}) = \langle x, \hat{x} \rangle$, which produces linear decision boundaries. Nonlinear kernel functions map the data points to a high-dimensional feature space in which the decision surface is linear. Two commonly used kernels are the polynomial kernel $K_p(x, \hat{x})$ and the radial basis function (RBF) kernel $K_r(x, \hat{x})$. The integer polynomial order $\rho$ in $K_p$ and the width factor $\gamma$ in $K_r$ are hyper-parameters that are tuned to the specific classification problem.

$K_p(x, \hat{x}) = \left( 1 + \langle x, \hat{x} \rangle \right)^{\rho}$    (4)

$K_r(x, \hat{x}) = e^{-\gamma \left\| x - \hat{x} \right\|^{2}}$    (5)

3. Method

Since the optimal hyper-plane obtained by the SVM depends on only a small part of the data points (the support vectors), it may be sensitive to noise or outliers in the training set. In this section, the ACI scheme based on the hybrid PSO (HPSO) is proposed to tackle feature selection, condensed instance extraction, and parameter setting simultaneously for the SVM. The particle representation, fitness definition, disturbance strategy for the PSO operation, ACI scheme, and proposed hybrid framework for the SVM are described below.

3.1. Particle representation

This study uses the RBF kernel function for the SVM classifier to implement the proposed method. The RBF kernel requires the two parameters $C$ and $\gamma$ to be set. Together with the adaptive condensed instance rate of the ACI scheme, the three parameters $\sigma_{cond}$, $C$, and $\gamma$, as well as the features used as input attributes, must be optimized simultaneously in the proposed hybrid system.
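As a concrete illustration of this mixed continuous-discrete encoding, one particle could be represented as below. This is a hypothetical sketch: the class and function names are introduced here for illustration and are not taken from the paper, although the search ranges follow those reported in Section 4.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HybridParticle:
    """One particle of the hybrid continuous-discrete swarm."""
    feature_mask: np.ndarray   # n_f binary entries: 1 = feature selected, 0 = not selected
    sigma_cond: float          # adaptive condensed-instance rate, in [0, 1]
    C: float                   # SVM penalty parameter
    gamma: float               # RBF kernel width

def random_particle(n_features: int) -> HybridParticle:
    """Draw a particle from the Section 4 search ranges (log-uniform here for illustration)."""
    return HybridParticle(
        feature_mask=np.random.randint(0, 2, size=n_features),
        sigma_cond=np.random.uniform(0.0, 1.0),
        C=10 ** np.random.uniform(-2, 4),       # C in [1e-2, 1e4]
        gamma=10 ** np.random.uniform(-4, 4),   # gamma in [1e-4, 1e4]
    )
```

The continuous entries ($\sigma_{cond}$, $C$, $\gamma$) are moved with Eqs. (1)-(2), while the binary feature-mask entries follow the sigmoid rule of Eq. (7) given in Section 3.2.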
The particle is therefore comprised of four parts: $\sigma_{cond}$, $C$, and $\gamma$ are continuous variables, and the feature-mask entries are discrete variables. Table 1 shows the particle representation of our design.

Table 1  The hybrid particle representation

As shown in Table 1, particle $i$ has dimension $3 + n_f$, where $n_f$ is the number of features and varies across datasets; $x_{i,1} \sim x_{i,n_f}$ form the feature mask, $x_{i,n_f+1}$ holds the parameter value $\sigma_{cond}$, $x_{i,n_f+2}$ represents the parameter value $C$, and $x_{i,n_f+3}$ denotes the parameter value $\gamma$.

The fitness function $F$ guides the HPSO operation to maximize the classification accuracy and minimize the number of selected features:

$F = w_1 \cdot A_{cc} + w_2 \cdot \left( \sum_{i=1}^{n_f} f_i \right)^{-1}$    (6)

where $A_{cc}$ is the SVM classification accuracy and $f_i$ denotes the i-th feature-mask bit, with 1 meaning feature $i$ is selected and 0 meaning it is not. Thus, particles with high classification accuracy and a small number of features produce a high fitness value and influence the particles' positions in the next iteration. Considering the trade-off between classification accuracy and the number of selected features, the two weights $w_1$ and $w_2$ can be adjusted according to the designer's preference for the SVM classifier.

3.2. PSO with disturbance operation

In the discrete PSO, the particle's personal best and global best are updated as in the continuous version. The major difference between the discrete PSO and the continuous version is that the velocities of the particles are instead defined as the probability that a bit changes to one. With this definition, a velocity must be restricted to the range $[V_{min}, V_{max}]$. This is accomplished by a sigmoid function $S(v)$, and the new particle position is calculated using the following rule: if $v_{id}^{k+1} \notin (V_{min}, V_{max})$, then $v_{id}^{k+1} = \max(\min(v_{id}^{k+1}, V_{max}), V_{min})$, where

$S(v_{id}^{k+1}) = \dfrac{1}{1 + e^{-v_{id}^{k+1}}}$    (7)

If $rand() < S(v_{id}^{k+1})$, then $x_{id}^{k+1} = 1$; else $x_{id}^{k+1} = 0$.

The function $S(v_{id})$ is a sigmoid limiting transformation, and $rand()$ is a random number drawn from a uniform distribution on [0, 1]. Note that the discrete PSO is susceptible to saturation of the sigmoid function $S(v)$, which occurs when velocity values are either too large or too small. For a velocity of zero, the bit flips with a probability of 50%.

According to the search behavior of PSO, the $g_{best}$ value is an important clue leading the particles toward the global optimum. It is, however, possible for the solution to fall into a local minimum while the particles search for better solutions. In order to allow further exploration around the current solutions and produce more potential solutions, a mutation-like disturbance operation is inserted between Eq. (1) and Eq. (2). The disturbance operation randomly selects $k$ dimensions ($1 \leq k \leq$ number of problem dimensions) of $m$ particles ($1 \leq m \leq$ number of particles) and adds Gaussian noise to their moving vectors (velocities). The disturbance operation moves particles toward unexpected directions in the selected dimensions rather than following previous experience. It lets particles jump out of local search and explore more diverse regions of the search space.
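The fitness of Eq. (6), the bit-update rule of Eq. (7), and the disturbance operation can be sketched as follows. This is an illustrative outline; the variable names, the noise scale, and the use of NumPy are assumptions, not the authors' code.

```python
import numpy as np

def fitness(acc, feature_mask, w1=0.8, w2=0.2):
    """Eq. (6): reward accuracy, penalize large feature subsets."""
    return w1 * acc + w2 / max(int(feature_mask.sum()), 1)

def update_binary_dims(v_bin, v_min=-6.0, v_max=6.0):
    """Discrete PSO update: map velocities to bit-flip probabilities, Eq. (7)."""
    v_bin = np.clip(v_bin, v_min, v_max)          # keep velocities in [Vmin, Vmax]
    prob_one = 1.0 / (1.0 + np.exp(-v_bin))       # sigmoid S(v)
    return (np.random.rand(*v_bin.shape) < prob_one).astype(int)

def disturbance(v, m=5, k=3, noise_std=1.0):
    """Mutation-like disturbance: add Gaussian noise to k dimensions of m particles."""
    n_particles, n_dims = v.shape
    rows = np.random.choice(n_particles, size=m, replace=False)
    for i in rows:
        cols = np.random.choice(n_dims, size=k, replace=False)
        v[i, cols] += np.random.normal(0.0, noise_std, size=k)
    return v
```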
3.3. Adaptive condensed instances (ACI) scheme

The effective ACI scheme that we previously published in [7] is extended to decide which instances of the training data set are support vectors. The adaptive condensed instance coefficient used in the data reduction process makes it flexible to edit out noisy samples, reduce superfluous data points, and make the SVM less sensitive to noise and outliers. The adaptive condensed instance coefficient $\sigma_{cond} \in [0, 1]$ is defined as the ratio of the number of selected condensed instances to the size of the overall dataset. The ACI scheme is described as follows.

Step 1: Initialize: Randomly generate the initial instance set.
Step 2: Condense: Decide whether the samples have reached the user-defined threshold $\sigma_{cond}$. If so, terminate the process; otherwise, go to Step 3.
Step 3: Extend: If there are any un-condensed instances, use nearest-neighbor voting to renew the condensed instance set; otherwise, no more new data is added. Go to Step 2.

3.4. The proposed framework for the SVM classifier

Based on the particle representation, fitness definition, PSO with disturbance operation, and ACI scheme described above, the proposed hybrid framework for the SVM proceeds through the following Steps 1 to 9 (a code-level sketch is given after the list).

Step 1: Data preparation
Given a dataset D, 10-fold cross-validation is used to split the data into 10 groups. Each group contains a training and a testing set, denoted $D_{train}$ and $D_{test}$, respectively.

Step 2: Hybrid PSO initialization and parameter setting
Set the PSO parameters, including the number of iterations, number of particles, velocity range, particle dimension, disturbance rate, and fitness weights. Generate the initial hybrid particles comprised of the feature mask, $\sigma_{cond}$, $C$, and $\gamma$.

Step 3: Condensed instance selection via the ACI scheme
According to the condensed instance coefficient $\sigma_{cond}$ represented in the particle from Step 2, the condensed instances are computed, and nearest-neighbor voting is used to renew the condensed instance set.

Step 4: Feature scaling
Feature scaling properly reveals the interactions between features and prevents attributes with greater numeric ranges from dominating those with smaller ranges. With the normalization of Eq. (8), each attribute is linearly scaled to [0, 1] (a scaling to [-1, +1] can be used analogously), where $A_j(x_i)$ is the original value of attribute $A_j$ for sample $x_i$, $A'_j(x_i)$ is the scaled value, and $\max_j$ and $\min_j$ are the maximum and minimum values of $A_j$ over all samples.

$A'_j(x_i) = \dfrac{A_j(x_i) - \min_j}{\max_j - \min_j}, \quad \forall i$    (8)

Step 5: Feature selection
According to the feature mask represented in the particle from Step 2, input features are selected for the training set $D_{train}$ and the testing set $D_{test}$. The selected feature subsets are denoted $D_{f\_train}$ and $D_{f\_test}$, respectively.

Step 6: Train and test the SVM classifier
With the parameters $\sigma_{cond}$, $C$, and $\gamma$ represented in the particle, the SVM classifier is trained on the training dataset $D_{f\_train}$, and the classification accuracy $A_{cc}$ is then evaluated on the testing dataset $D_{f\_test}$.

Step 7: Fitness evaluation
For each particle, the fitness value is calculated by Eq. (6). The best fitness value found so far is stored during the PSO evolution and guides the search for better particles in the next iteration.

Step 8: Termination criteria
When the number of iterations reaches the pre-defined maximum or a termination criterion is satisfied, the best solution $g_{best}$ is obtained and the program ends; otherwise, go to the next step.

Step 9: Hybrid PSO operation
In the evolution process, the discrete-valued and continuous-valued dimensions of the HPSO, together with the disturbance operator, continue to be updated to search for better particle solutions.
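The per-particle portion of this procedure (Steps 3-7) can be sketched as follows for one cross-validation fold. This is a high-level sketch only: it uses scikit-learn's SVC as a stand-in for the LIBSVM implementation used in the paper, `p` is a `HybridParticle` from the earlier sketch, and `condense_instances` is a hypothetical placeholder for the ACI scheme of Section 3.3.

```python
import numpy as np
from sklearn.svm import SVC

def evaluate_particle(p, X_train, y_train, X_test, y_test, w1=0.8, w2=0.2):
    """Steps 3-7 for a single particle (sketch; ACI condensation is a placeholder)."""
    # Step 3: ACI - keep roughly a sigma_cond fraction of the training instances.
    # condense_instances is hypothetical and should return row indices to keep.
    keep = condense_instances(X_train, y_train, p.sigma_cond)
    # Step 4: scale features to [0, 1] per Eq. (8), using training min/max
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    scale = lambda X: (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Step 5: apply the particle's feature mask
    mask = p.feature_mask.astype(bool)
    Xtr = scale(X_train[keep])[:, mask]
    Xte = scale(X_test)[:, mask]
    # Step 6: train and test the SVM with the particle's C and gamma
    clf = SVC(C=p.C, gamma=p.gamma, kernel="rbf").fit(Xtr, y_train[keep])
    acc = clf.score(Xte, y_test)
    # Step 7: fitness of Eq. (6)
    return w1 * acc + w2 / max(int(mask.sum()), 1)
```

Steps 8-9 wrap this evaluation in the HPSO loop of Section 3.2, updating the continuous entries with Eqs. (1)-(2) and the feature-mask bits with Eq. (7).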
4. Experiment results

4.1. Dataset and system description

To measure the performance of the developed hybrid framework, experiments on several real benchmark datasets, listed in Table 2, are conducted to verify its effectiveness. These datasets are partitioned using 10-fold cross-validation.

Table 2  The UCI benchmark datasets [7]

Our implementation platform is Matlab 2013, extending LIBSVM, which was originally designed by Chang and Lin [12]. Through initial experiments, the PSO parameter values were set as follows. The swarm size is 100 particles. The search ranges of the continuous-type dimensions are $\sigma_{cond} \in [0, 1]$, $C \in [10^{-2}, 10^{4}]$, and $\gamma \in [10^{-4}, 10^{4}]$. For the discrete-type dimensions (the feature mask), we set $[V_{min}, V_{max}] = [-6, 6]$, which yields probabilities in the range [0.0025, 0.9975] under the sigmoid limiting transformation of Eq. (7). Both the cognitive learning factor $c_1$ and the social learning factor $c_2$ are set to 2. The disturbance rate is 0.05, and the number of generations is 500. The inertia weight factors are $w_{min} = 0.4$ and $w_{max} = 0.9$, with the linearly decreasing inertia weight given by Eq. (9), where $i_{now}$ is the current iteration and $i_{max}$ is the pre-defined maximum number of iterations. For the fitness function of Eq. (6), the accuracy weight is $w_1 = 0.8$ and the feature weight is $w_2 = 0.2$.

$w = w_{max} - \dfrac{(w_{max} - w_{min})\, i_{now}}{i_{max}}$    (9)

Table 3  The parameter settings for PSO and GA

In [9], the authors presented a GA-based method without the ACI mechanism for searching for the best $C$, $\gamma$, and feature subset. This existing GA-SVM method deals solely with feature selection and parameter optimization by means of a genetic algorithm, and the treatment of redundant or noisy instances in the classification process is not taken into account. Our proposed hybrid framework has been tested fairly extensively and compared with these approaches, including both GA-SVM and PSO-SVM without the ACI mechanism. Furthermore, a comparison of the PSO and GA techniques under the ACI scheme for the SVM is also presented. When applying the GA, a number of parameters must be specified. The two algorithms (PSO and GA) are run for the same number of fitness function evaluations. Table 3 summarizes the parameters used for the PSO and GA techniques. The empirical results are reported in Section 4.2.

4.2. Results and comparison

The results obtained by the developed HPSO-ACI-SVM approach are compared with those of the GA-SVM proposed by Huang and Wang [9] without the ACI scheme. Taking the Heart disease dataset as an example, the classification accuracy $A_{cc}$, the number of selected features $n_f$, and the best parameters $\sigma_{cond}$, $C$, and $\gamma$ for each fold are shown in Table 4. For the HPSO-ACI-SVM method, the average classification accuracy is 96.87% and the average number of selected features is 4.9.
For the GA-SVM approach without ACI, the average classification accuracy is only 94.81% and the average number of selected features is 5.6.

Table 4  Comparison of HPSO-ACI-SVM and GA-SVM

Table 5  Comparison of HPSO-ACI-SVM, PSO-SVM and GA-SVM

Table 5 shows the summary results for the average classification accuracy of the HPSO-ACI-SVM hybrid framework and of PSO-SVM and GA-SVM without ACI on six UCI datasets. In Table 5, the classification accuracy is reported in the form "average ± standard deviation". To highlight the advantage, we used the non-parametric Wilcoxon signed rank test on all of the datasets. In Table 5, the p-values of HPSO-ACI-SVM versus PSO-SVM and GA-SVM are smaller than the statistical significance level of 0.05, except for the Iris dataset. That is to say, the developed HPSO-ACI-SVM yields a higher classification accuracy across different datasets and enhances the performance of the SVM.

Further, under our proposed hybrid framework with the ACI mechanism, the performance of the PSO and GA optimization techniques is compared in terms of the average classification accuracy $Avg\_A_{cc}$ and the average number of selected features $Avg\_n_f$. In Table 6, the HPSO exhibits slightly higher classification accuracy and fewer selected features than the GA, so from an evolutionary point of view the HPSO performs better. Nevertheless, the results indicate that both the PSO and GA algorithms can be used to optimize the parameters under our developed hybrid framework and effectively improve the classification accuracy of the SVM classifier design.

Table 6  Comparison of HPSO-ACI-SVM and GA-ACI-SVM

5. Conclusion

For the SVM classifier, it is imperative to perform feature selection to detect which features are actually relevant. In this study, the effective adaptive condensed instances (ACI) strategy is applied to decide which instances of the training data set are support vectors. Furthermore, we propose the ACI scheme based on the hybrid particle swarm optimization (HPSO) algorithm to simultaneously optimize the condensed instances and the SVM kernel parameters for classification accuracy enhancement. Experiments on several UCI benchmark datasets validate the effectiveness of the proposed method, and the results show that the proposed hybrid framework achieves better performance than other existing methods in the literature. Investigating larger-scale datasets and combining other heuristic algorithms into the hybrid system are interesting directions for future work.

References

[1] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, pp. 988-999, 1999.
[2] I. J. Ding, C. T. Yen, and D. C. Ou, "A method to integrate GMM, SVM and DTW for speaker recognition," International Journal of Engineering and Technology Innovation, vol. 4, pp. 38-47, 2014.
[3] R. Kothandan and S. Biswas, "Identifying microRNAs involved in cancer pathway using support vector machines," Computational Biology and Chemistry, vol. 55, pp. 31-36, Apr. 2015.
[4] B. Ramesh and J. G. R. Sathiaseelan, "An advanced multi class instance selection based support vector machine for text classification," Procedia Computer Science, vol. 57, pp. 1124-1130, 2015.
[5] X. Zhang, G. Wu, Z. Dong, and C. Crawford, "Embedded feature-selection support vector machine for driving pattern recognition," Journal of the Franklin Institute, vol. 352, pp. 669-685, 2015.
[6] S. W. Lin, Z. J. Lee, S. C. Chen, and T. Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, pp. 1505-1512, 2008.
[7] C. L. Lu, I. F. Chung, and T. C. Lin, "The hybrid dynamic prototype construction and parameter optimization with genetic algorithm for support vector machine," International Journal of Engineering and Technology Innovation, vol. 5, pp. 220-232, 2015.
[8] B. Sahu and D. Mishra, "A novel feature selection algorithm using particle swarm optimization for cancer microarray data," Procedia Engineering, vol. 38, pp. 27-31, 2012.
[9] C. L. Huang and C. J. Wang, "A GA-based feature selection and parameters optimization for support vector machines," Expert Systems with Applications, vol. 31, pp. 231-240, 2006.
[10] J. Kennedy and R. C. Eberhart, "A discrete binary version of the particle swarm algorithm," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pp. 4104-4109, 1997.
[11] V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.
[12] C. C. Chang and C. J. Lin, "Training nu-support vector regression: theory and algorithms," Neural Computation, vol. 14, pp. 1959-1977, 2002.