Advances in Technology Innovation, vol. 1, no. 2, 2016, pp. 53-57

Improved SVM Classifier Incorporating Adaptive Condensed Instances Based on Hybrid Continuous-Discrete Particle Swarm Optimization

Chun-Liang Lu1,*, Tsun-Chen Lin2

1 Department of Applied Information and Multimedia, Ching Kuo Institute of Management and Health, Keelung, Taiwan.
2 Department of Computer and Communication Engineering, Dahan Institute of Technology, Hualien, Taiwan.
* Corresponding author. Email: leucl@ems.cku.edu.tw

Received 20 Sep 2016; received in revised form 24 Sep 2016; accepted 28 Sep 2016

Abstract

The support vector machine (SVM), a supervised learning model grounded in structural risk minimization, has been used successfully for classification and regression. The standard soft-margin SVM trains a classifier by solving an optimization problem that decides which instances of the training data set become support vectors. In many real applications, however, it is also imperative to perform feature selection to detect which features are actually relevant. To further improve performance, we propose an adaptive condensed instances (ACI) strategy based on a hybrid particle swarm optimization (HPSO) algorithm for SVM classifier design. The basic idea of the proposed method is to adopt HPSO to simultaneously optimize the ACI and the SVM kernel parameters to enhance classification accuracy. Numerical experiments on several UCI benchmark datasets are conducted to find the optimal parameters for building the SVM model. The results show that the proposed framework achieves better performance than other published methods in the literature and provides a simple but effective strategy for improving the classification accuracy of the SVM classifier.

Keywords: hybrid particle swarm optimization (HPSO), adaptive condensed instances (ACI), support vector machine (SVM)

1. Introduction

The support vector machine (SVM), originally proposed by Vapnik [1], is a powerful classification method with state-of-the-art performance in machine learning, and it has drawn considerable attention due to its high generalization ability across a wide range of applications, including speaker recognition [2], bioinformatics [3], and text categorization [4]. In many pattern classification tasks we are confronted with a high-dimensional input space, and finding the combination of original input features that contributes most to the classification is crucial. The computational cost of classification grows heavily with the data dimension, which makes feature selection an important issue for the SVM. Feature selection mechanisms fall into three categories: filter, wrapper, and embedded methods [5]. Filters generally involve a non-iterative computation on the original features, which executes very fast, but the result is not usually optimal since the learning algorithm is not taken into account. Wrapper methods usually achieve better results than filters because they are tuned to the specific interactions between the classifier and the original feature set, but they are very computationally intensive. Finally, unlike filters and wrappers, embedded techniques determine the features and the classifier simultaneously during the training process, with a computational cost smaller than that of wrapper methods. Feature selection is a challenging problem because there can be complex interactions among features. An exhaustive search is therefore practically impossible, and an efficient global search technique is needed.
Evolutionary computation (EC) techniques are well-known heuristic approaches with global search ability, such as simulated annealing (SA) [6], the genetic algorithm (GA) [7], and particle swarm optimization (PSO) [8], and they have gained a lot of attention from researchers in this area. Compared with other EC algorithms such as SA and GA, PSO is computationally less expensive and can converge more quickly. A GA-based feature selection method that optimizes both the feature subset and the parameters for the SVM was proposed by Huang and Wang [9], and the authors pointed out that the algorithm can work better than the conventional grid search method. However, the treatment of redundant or irrelevant instances is not taken into account in that classification procedure. In this paper, the effective adaptive condensed instances (ACI) strategy that we previously published [7] is applied to decide which instances of the training data set are support vectors, in order to cope with the problem mentioned above. Short communications on the early stages of this work have appeared in [7]. Here we significantly extend our approach to account for relevant instance selection during the SVM training process. To further improve the classification performance, the ACI strategy based on the hybrid particle swarm optimization (HPSO) algorithm is proposed for the SVM classifier design. Experiments on several UCI benchmark datasets are conducted to validate its effectiveness, and the results show that the proposed framework achieves better performance than other existing GA-based methods in the literature.

The remainder of this paper is organized as follows. Section 2 describes related work, including the basic PSO algorithm and the SVM classifier. Section 3 illustrates the particle representation, the hybrid PSO with disturbance operation, the ACI scheme, and the proposed framework for the SVM classifier. Section 4 provides the experiment results, and conclusions are drawn in Section 5.

2. Related work

2.1. Particle swarm optimization (PSO) algorithm

The PSO algorithm, originally developed by Kennedy and Eberhart [10], is a search algorithm that models the social behavior of birds within a flock. In the PSO algorithm, individuals, referred to as particles, are flown through a hyper-dimensional search space. PSO is easy to implement, has few parameters to adjust, and usually converges faster than other evolutionary algorithms. During the optimization procedure, particles communicate good positions to each other and adjust their own positions according to their history and their neighboring particles. The basic concept of the PSO algorithm is given by:

$v_{id}^{k+1} = w\, v_{id}^{k} + c_1 r_1 \left( pb_{id}^{k} - x_{id}^{k} \right) + c_2 r_2 \left( gb_{id}^{k} - x_{id}^{k} \right)$    (1)

$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$    (2)

where $d = 1, 2, \ldots, D$, $i = 1, 2, \ldots, N$, $D$ is the dimension of the search space, $N$ is the population size, and $k$ is the iteration index; $v_{id}^{k}$ is the velocity of the i-th particle and $x_{id}^{k}$ is its current position, which is updated by Eq. (2); $pb_{id}^{k}$ is the best solution ($p_{best}$) achieved so far by the i-th particle; $gb_{id}^{k}$ is the global best ($g_{best}$) solution obtained by any particle in the population; $r_1$ and $r_2$ are random values in the range [0, 1] that model remembrance ability; $c_1$ and $c_2$ are learning factors, and $w$ is the inertia weight.
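As an illustration, the update of Eqs. (1)-(2) for a whole swarm can be sketched as follows. This is a minimal sketch in NumPy; the array shapes, default coefficients, and the clamping constant are assumptions for illustration, not the authors' implementation (the clamping to $[-V_{max}, V_{max}]$ mentioned below is included).

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, v_max=6.0):
    """One velocity/position update for a swarm, following Eqs. (1)-(2).

    x, v, pbest : arrays of shape (N, D) -- positions, velocities, personal bests
    gbest       : array of shape (D,)    -- global best position
    """
    r1 = np.random.rand(*x.shape)        # r1, r2 ~ U[0, 1], drawn per dimension
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (1)
    v = np.clip(v, -v_max, v_max)        # clamp velocities to [-Vmax, Vmax]
    x = x + v                            # Eq. (2)
    return x, v
```

In the hybrid algorithm of Section 3, this continuous update is applied to the particle dimensions holding $\sigma_{cond}$, $C$, and $\gamma$, while the discrete feature-mask dimensions follow the sigmoid rule of Eq. (7).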
A large inertia weight facilitates global exploration, while a small one tends toward local search. Generally, the value of each velocity component can be clamped to the range $[-V_{max}, V_{max}]$ to control excessive roaming of the particles outside the search space. The PSO procedure is organized in the following sequence of steps.

Step 1: Initialize: Randomly generate the initial particles.
Step 2: Fitness evaluation: Calculate the fitness value of each particle in the population.
Step 3: Update: Compare the fitness values of the particles and update the velocities and positions using Eqs. (1) and (2).
Step 4: Termination criterion: Repeat Steps 2 and 3 until the number of iterations reaches the pre-defined maximum or a termination criterion is satisfied, and report the best solution $g_{best}$.

2.2. Support vector machine (SVM) classifier

The support vector machine (SVM) has drawn much attention due to its good performance and solid theoretical foundations [11]. The main idea of the SVM is to first transform the input data into a higher-dimensional space by means of a kernel function and then to find an optimal separating hyper-plane between the two data sets. A practical difficulty in using the SVM is the selection of parameters, such as the penalty parameter $C$ of the error term and the kernel parameter $\gamma$ of the RBF kernel function; an appropriate choice of parameters is needed to obtain good generalization performance. The SVM can be described as follows. Given a set of training data $(x_1, \ldots, x_p)$ with corresponding class labels $(y_1, \ldots, y_p)$, $y_i \in \{-1, +1\}$, the SVM attempts to find a decision surface $f(x)$ that jointly maximizes the margin between the two classes and minimizes the classification error on the training set:

$f(x) = \sum_{i=1}^{p} \alpha_i y_i K(x, x_i) + b$    (3)

where $K$ is a kernel function, $\alpha_i$ is the Lagrange multiplier corresponding to the i-th training point $x_i$, and $b$ is the bias. The simplest kernel is the inner product $K(x, \hat{x}) = \langle x, \hat{x} \rangle$, which produces linear decision boundaries. Nonlinear kernel functions map the data points to a high-dimensional feature space in which the decision surface is linear. Two commonly used kernels are the polynomial kernel $K_p(x, \hat{x})$ and the radial basis function (RBF) kernel $K_r(x, \hat{x})$. The integer polynomial order $\rho$ in $K_p$ and the width factor $\gamma$ in $K_r$ are hyper-parameters that are tuned to the specific classification problem.

$K_p(x, \hat{x}) = \left( 1 + \langle x, \hat{x} \rangle \right)^{\rho}$    (4)

$K_r(x, \hat{x}) = e^{-\gamma \left\| x - \hat{x} \right\|^{2}}$    (5)

3. Method

Since the optimal hyper-plane obtained by the SVM depends on only a small part of the data points (the support vectors), it may be sensitive to noise or outliers in the training set. In this section, the ACI scheme based on the hybrid PSO (HPSO) is proposed to tackle feature selection, condensed instance extraction, and parameter setting simultaneously for the SVM. The particle representation, fitness definition, disturbance strategy for the PSO operation, ACI scheme, and proposed hybrid framework for the SVM are described below.

3.1. Particle representation

This study uses the RBF kernel function for the SVM classifier to implement the proposed method. The RBF kernel requires the two parameters $C$ and $\gamma$ to be set. Together with the adaptive condensed instance rate of the ACI scheme, the three parameters $\sigma_{cond}$, $C$, and $\gamma$, as well as the features used as input attributes, must be optimized simultaneously in the proposed hybrid system.
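As a concrete illustration of this mixed continuous-discrete encoding, one particle could be represented as below. This is a hypothetical sketch: the class and function names are introduced here for illustration and are not taken from the paper, although the search ranges follow those reported in Section 4.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HybridParticle:
    """One particle of the hybrid continuous-discrete swarm."""
    feature_mask: np.ndarray   # n_f binary entries: 1 = feature selected, 0 = not selected
    sigma_cond: float          # adaptive condensed-instance rate, in [0, 1]
    C: float                   # SVM penalty parameter
    gamma: float               # RBF kernel width

def random_particle(n_features: int) -> HybridParticle:
    """Draw a particle from the Section 4 search ranges (log-uniform here for illustration)."""
    return HybridParticle(
        feature_mask=np.random.randint(0, 2, size=n_features),
        sigma_cond=np.random.uniform(0.0, 1.0),
        C=10 ** np.random.uniform(-2, 4),       # C in [1e-2, 1e4]
        gamma=10 ** np.random.uniform(-4, 4),   # gamma in [1e-4, 1e4]
    )
```

The continuous entries ($\sigma_{cond}$, $C$, $\gamma$) are moved with Eqs. (1)-(2), while the binary feature-mask entries follow the sigmoid rule of Eq. (7) given in Section 3.2.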
The particle is therefore comprised of four parts: $\sigma_{cond}$, $C$, and $\gamma$ are continuous variables, and the feature-mask entries are discrete variables. Table 1 shows the particle representation of our design.

Table 1  The hybrid particle representation

As shown in Table 1, particle $i$ has dimension $3 + n_f$, where $n_f$ is the number of features and varies across datasets; $x_{i,1} \sim x_{i,n_f}$ form the feature mask, $x_{i,n_f+1}$ holds the parameter value $\sigma_{cond}$, $x_{i,n_f+2}$ represents the parameter value $C$, and $x_{i,n_f+3}$ denotes the parameter value $\gamma$.

The fitness function $F$ guides the HPSO operation to maximize the classification accuracy and minimize the number of selected features:

$F = w_1 \cdot A_{cc} + w_2 \cdot \left( \sum_{i=1}^{n_f} f_i \right)^{-1}$    (6)

where $A_{cc}$ is the SVM classification accuracy and $f_i$ denotes the i-th feature-mask bit, with 1 meaning feature $i$ is selected and 0 meaning it is not. Thus, particles with high classification accuracy and a small number of features produce a high fitness value and influence the particles' positions in the next iteration. Considering the trade-off between classification accuracy and the number of selected features, the two weights $w_1$ and $w_2$ can be adjusted according to the designer's preference for the SVM classifier.

3.2. PSO with disturbance operation

In the discrete PSO, the particle's personal best and global best are updated as in the continuous version. The major difference between the discrete PSO and the continuous version is that the velocities of the particles are instead defined as the probability that a bit changes to one. With this definition, a velocity must be restricted to the range $[V_{min}, V_{max}]$. This is accomplished by a sigmoid function $S(v)$, and the new particle position is calculated using the following rule: if $v_{id}^{k+1} \notin (V_{min}, V_{max})$, then $v_{id}^{k+1} = \max(\min(v_{id}^{k+1}, V_{max}), V_{min})$, where

$S(v_{id}^{k+1}) = \dfrac{1}{1 + e^{-v_{id}^{k+1}}}$    (7)

If $rand() < S(v_{id}^{k+1})$, then $x_{id}^{k+1} = 1$; else $x_{id}^{k+1} = 0$.

The function $S(v_{id})$ is a sigmoid limiting transformation, and $rand()$ is a random number drawn from a uniform distribution on [0, 1]. Note that the discrete PSO is susceptible to saturation of the sigmoid function $S(v)$, which occurs when velocity values are either too large or too small. For a velocity of zero, the bit flips with a probability of 50%.

According to the search behavior of PSO, the $g_{best}$ value is an important clue leading the particles toward the global optimum. It is, however, possible for the solution to fall into a local minimum while the particles search for better solutions. In order to allow further exploration around the current solutions and produce more potential solutions, a mutation-like disturbance operation is inserted between Eq. (1) and Eq. (2). The disturbance operation randomly selects $k$ dimensions ($1 \leq k \leq$ number of problem dimensions) of $m$ particles ($1 \leq m \leq$ number of particles) and adds Gaussian noise to their moving vectors (velocities). The disturbance operation moves particles toward unexpected directions in the selected dimensions rather than following previous experience. It lets particles jump out of local search and explore more diverse regions of the search space.
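The fitness of Eq. (6), the bit-update rule of Eq. (7), and the disturbance operation can be sketched as follows. This is an illustrative outline; the variable names, the noise scale, and the use of NumPy are assumptions, not the authors' code.

```python
import numpy as np

def fitness(acc, feature_mask, w1=0.8, w2=0.2):
    """Eq. (6): reward accuracy, penalize large feature subsets."""
    return w1 * acc + w2 / max(int(feature_mask.sum()), 1)

def update_binary_dims(v_bin, v_min=-6.0, v_max=6.0):
    """Discrete PSO update: map velocities to bit-flip probabilities, Eq. (7)."""
    v_bin = np.clip(v_bin, v_min, v_max)          # keep velocities in [Vmin, Vmax]
    prob_one = 1.0 / (1.0 + np.exp(-v_bin))       # sigmoid S(v)
    return (np.random.rand(*v_bin.shape) < prob_one).astype(int)

def disturbance(v, m=5, k=3, noise_std=1.0):
    """Mutation-like disturbance: add Gaussian noise to k dimensions of m particles."""
    n_particles, n_dims = v.shape
    rows = np.random.choice(n_particles, size=m, replace=False)
    for i in rows:
        cols = np.random.choice(n_dims, size=k, replace=False)
        v[i, cols] += np.random.normal(0.0, noise_std, size=k)
    return v
```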
3.3. Adaptive condensed instances (ACI) scheme

The effective ACI scheme that we previously published in [7] is extended to decide which instances of the training data set are support vectors. The adaptive condensed instance coefficient used in the data reduction process makes it flexible to edit out noisy samples, reduce superfluous data points, and make the SVM less sensitive to noise and outliers. The adaptive condensed instance coefficient $\sigma_{cond} \in [0, 1]$ is defined as the ratio of the number of selected condensed instances to the size of the overall dataset. The ACI scheme is described as follows.

Step 1: Initialize: Randomly generate the initial instance set.
Step 2: Condense: Decide whether the samples have reached the user-defined threshold $\sigma_{cond}$. If so, terminate the process; otherwise, go to Step 3.
Step 3: Extend: If there are any un-condensed instances, use nearest-neighbor voting to renew the condensed instance set; otherwise, no more new data is added. Go to Step 2.

3.4. The proposed framework for the SVM classifier

Based on the particle representation, fitness definition, PSO with disturbance operation, and ACI scheme described above, the proposed hybrid framework for the SVM proceeds through the following Steps 1 to 9 (a code-level sketch is given after the list).

Step 1: Data preparation
Given a dataset D, 10-fold cross-validation is used to split the data into 10 groups. Each group contains a training and a testing set, denoted $D_{train}$ and $D_{test}$, respectively.

Step 2: Hybrid PSO initialization and parameter setting
Set the PSO parameters, including the number of iterations, number of particles, velocity range, particle dimension, disturbance rate, and fitness weights. Generate the initial hybrid particles comprised of the feature mask, $\sigma_{cond}$, $C$, and $\gamma$.

Step 3: Condensed instance selection via the ACI scheme
According to the condensed instance coefficient $\sigma_{cond}$ represented in the particle from Step 2, the condensed instances are computed, and nearest-neighbor voting is used to renew the condensed instance set.

Step 4: Feature scaling
Feature scaling properly reveals the interactions between features and prevents attributes with greater numeric ranges from dominating those with smaller ranges. With the normalization of Eq. (8), each attribute is linearly scaled to [0, 1] (a scaling to [-1, +1] can be used analogously), where $A_j(x_i)$ is the original value of attribute $A_j$ for sample $x_i$, $A'_j(x_i)$ is the scaled value, and $\max_j$ and $\min_j$ are the maximum and minimum values of $A_j$ over all samples.

$A'_j(x_i) = \dfrac{A_j(x_i) - \min_j}{\max_j - \min_j}, \quad \forall i$    (8)

Step 5: Feature selection
According to the feature mask represented in the particle from Step 2, input features are selected for the training set $D_{train}$ and the testing set $D_{test}$. The selected feature subsets are denoted $D_{f\_train}$ and $D_{f\_test}$, respectively.

Step 6: Train and test the SVM classifier
With the parameters $\sigma_{cond}$, $C$, and $\gamma$ represented in the particle, the SVM classifier is trained on the training dataset $D_{f\_train}$, and the classification accuracy $A_{cc}$ is then evaluated on the testing dataset $D_{f\_test}$.

Step 7: Fitness evaluation
For each particle, the fitness value is calculated by Eq. (6). The best fitness value found so far is stored during the PSO evolution and guides the search for better particles in the next iteration.

Step 8: Termination criteria
When the number of iterations reaches the pre-defined maximum or a termination criterion is satisfied, the best solution $g_{best}$ is obtained and the program ends; otherwise, go to the next step.

Step 9: Hybrid PSO operation
In the evolution process, the discrete-valued and continuous-valued dimensions of the HPSO, together with the disturbance operator, continue to be updated to search for better particle solutions.
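The per-particle portion of this procedure (Steps 3-7) can be sketched as follows for one cross-validation fold. This is a high-level sketch only: it uses scikit-learn's SVC as a stand-in for the LIBSVM implementation used in the paper, `p` is a `HybridParticle` from the earlier sketch, and `condense_instances` is a hypothetical placeholder for the ACI scheme of Section 3.3.

```python
import numpy as np
from sklearn.svm import SVC

def evaluate_particle(p, X_train, y_train, X_test, y_test, w1=0.8, w2=0.2):
    """Steps 3-7 for a single particle (sketch; ACI condensation is a placeholder)."""
    # Step 3: ACI - keep roughly a sigma_cond fraction of the training instances.
    # condense_instances is hypothetical and should return row indices to keep.
    keep = condense_instances(X_train, y_train, p.sigma_cond)
    # Step 4: scale features to [0, 1] per Eq. (8), using training min/max
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    scale = lambda X: (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Step 5: apply the particle's feature mask
    mask = p.feature_mask.astype(bool)
    Xtr = scale(X_train[keep])[:, mask]
    Xte = scale(X_test)[:, mask]
    # Step 6: train and test the SVM with the particle's C and gamma
    clf = SVC(C=p.C, gamma=p.gamma, kernel="rbf").fit(Xtr, y_train[keep])
    acc = clf.score(Xte, y_test)
    # Step 7: fitness of Eq. (6)
    return w1 * acc + w2 / max(int(mask.sum()), 1)
```

Steps 8-9 wrap this evaluation in the HPSO loop of Section 3.2, updating the continuous entries with Eqs. (1)-(2) and the feature-mask bits with Eq. (7).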
4. Experiment results

4.1. Dataset and system description

To measure the performance of the developed hybrid framework, experiments on several real benchmark datasets, listed in Table 2, are conducted to verify its effectiveness. These datasets are partitioned using 10-fold cross-validation.

Table 2  The UCI benchmark datasets [7]

Our implementation platform is Matlab 2013, extending LIBSVM, which was originally designed by Chang and Lin [12]. Through initial experiments, the PSO parameter values were set as follows. The swarm size is 100 particles. The search ranges of the continuous-type dimensions are $\sigma_{cond} \in [0, 1]$, $C \in [10^{-2}, 10^{4}]$, and $\gamma \in [10^{-4}, 10^{4}]$. For the discrete-type dimensions (the feature mask), we set $[V_{min}, V_{max}] = [-6, 6]$, which yields probabilities in the range [0.0025, 0.9975] under the sigmoid limiting transformation of Eq. (7). Both the cognitive learning factor $c_1$ and the social learning factor $c_2$ are set to 2. The disturbance rate is 0.05, and the number of generations is 500. The inertia weight factors are $w_{min} = 0.4$ and $w_{max} = 0.9$, with the linearly decreasing inertia weight given by Eq. (9), where $i_{now}$ is the current iteration and $i_{max}$ is the pre-defined maximum number of iterations. For the fitness function of Eq. (6), the accuracy weight is $w_1 = 0.8$ and the feature weight is $w_2 = 0.2$.

$w = w_{max} - \dfrac{(w_{max} - w_{min})\, i_{now}}{i_{max}}$    (9)

Table 3  The parameter settings for PSO and GA

In [9], the authors presented a GA-based method without the ACI mechanism for searching for the best $C$, $\gamma$, and feature subset. This existing GA-SVM method deals solely with feature selection and parameter optimization by means of a genetic algorithm, and the treatment of redundant or noisy instances in the classification process is not taken into account. Our proposed hybrid framework has been tested fairly extensively and compared with these approaches, including both GA-SVM and PSO-SVM without the ACI mechanism. Furthermore, a comparison of the PSO and GA techniques under the ACI scheme for the SVM is also presented. When applying the GA, a number of parameters must be specified. The two algorithms (PSO and GA) are run for the same number of fitness function evaluations. Table 3 summarizes the parameters used for the PSO and GA techniques. The empirical results are reported in Section 4.2.

4.2. Results and comparison

The results obtained by the developed HPSO-ACI-SVM approach are compared with those of the GA-SVM proposed by Huang and Wang [9] without the ACI scheme. Taking the Heart disease dataset as an example, the classification accuracy $A_{cc}$, the number of selected features $n_f$, and the best parameters $\sigma_{cond}$, $C$, and $\gamma$ for each fold are shown in Table 4. For the HPSO-ACI-SVM method, the average classification accuracy is 96.87% and the average number of selected features is 4.9.
For the GA-SVM approach without ACI, the average classification accuracy is only 94.81% and the average number of selected features is 5.6.

Table 4  Comparison of HPSO-ACI-SVM and GA-SVM

Table 5  Comparison of HPSO-ACI-SVM, PSO-SVM and GA-SVM

Table 5 shows the summary results for the average classification accuracy of the HPSO-ACI-SVM hybrid framework and of PSO-SVM and GA-SVM without ACI on six UCI datasets. In Table 5, the classification accuracy is reported in the form "average ± standard deviation". To highlight the advantage, we used the non-parametric Wilcoxon signed rank test on all of the datasets. In Table 5, the p-values of HPSO-ACI-SVM versus PSO-SVM and GA-SVM are smaller than the statistical significance level of 0.05, except for the Iris dataset. That is to say, the developed HPSO-ACI-SVM yields a higher classification accuracy across different datasets and enhances the performance of the SVM.

Further, under our proposed hybrid framework with the ACI mechanism, the performance of the PSO and GA optimization techniques is compared in terms of the average classification accuracy $Avg\_A_{cc}$ and the average number of selected features $Avg\_n_f$. In Table 6, the HPSO exhibits slightly higher classification accuracy and fewer selected features than the GA, so from an evolutionary point of view the HPSO performs better. Nevertheless, the results indicate that both the PSO and GA algorithms can be used to optimize the parameters under our developed hybrid framework and effectively improve the classification accuracy of the SVM classifier design.

Table 6  Comparison of HPSO-ACI-SVM and GA-ACI-SVM

5. Conclusion

For the SVM classifier, it is imperative to perform feature selection to detect which features are actually relevant. In this study, the effective adaptive condensed instances (ACI) strategy is applied to decide which instances of the training data set are support vectors. Furthermore, we propose the ACI scheme based on the hybrid particle swarm optimization (HPSO) algorithm to simultaneously optimize the condensed instances and the SVM kernel parameters for classification accuracy enhancement. Experiments on several UCI benchmark datasets validate the effectiveness of the proposed method, and the results show that the proposed hybrid framework achieves better performance than other existing methods in the literature. Investigating larger-scale datasets and combining other heuristic algorithms into the hybrid system are interesting directions for future work.

References

[1] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, pp. 988-999, 1999.
[2] I. J. Ding, C. T. Yen, and D. C. Ou, "A method to integrate GMM, SVM and DTW for speaker recognition," International Journal of Engineering and Technology Innovation, vol. 4, pp. 38-47, 2014.
[3] R. Kothandan and S. Biswas, "Identifying microRNAs involved in cancer pathway using support vector machines," Computational Biology and Chemistry, vol. 55, pp. 31-36, Apr. 2015.
[4] B. Ramesh and J. G. R. Sathiaseelan, "An advanced multi class instance selection based support vector machine for text classification," Procedia Computer Science, vol. 57, pp. 1124-1130, 2015.
[5] X. Zhang, G. Wu, Z. Dong, and C. Crawford, "Embedded feature-selection support vector machine for driving pattern recognition," Journal of the Franklin Institute, vol. 352, pp. 669-685, 2015.
[6] S. W. Lin, Z. J. Lee, S. C. Chen, and T. Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, pp. 1505-1512, 2008.
[7] C. L. Lu, I. F. Chung, and T. C. Lin, "The hybrid dynamic prototype construction and parameter optimization with genetic algorithm for support vector machine," International Journal of Engineering and Technology Innovation, vol. 5, pp. 220-232, 2015.
[8] B. Sahu and D. Mishra, "A novel feature selection algorithm using particle swarm optimization for cancer microarray data," Procedia Engineering, vol. 38, pp. 27-31, 2012.
[9] C. L. Huang and C. J. Wang, "A GA-based feature selection and parameters optimization for support vector machines," Expert Systems with Applications, vol. 31, pp. 231-240, 2006.
[10] J. Kennedy and R. C. Eberhart, "A discrete binary version of the particle swarm algorithm," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pp. 4104-4109, 1997.
[11] V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.
[12] C. C. Chang and C. J. Lin, "Training nu-support vector regression: theory and algorithms," Neural Computation, vol. 14, pp. 1959-1977, 2002.