Engineering, Technology & Applied Science Research, Vol. 8, No. 4, 2018, pp. 3203-3208

Constrained K-Means Classification

Panagiotis N. Smyrlis, Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani, Greece, pan.smyrlis@gmail.com
Dimosthenis C. Tsouros, Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani, Greece, dtsouros@uowm.gr
Markos G. Tsipouras, Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani, Greece, mtsipouras@uowm.gr

Abstract—Classification-via-clustering (CvC) is a widely used method in which a clustering procedure is employed to perform classification tasks. In this paper, a novel K-Means-based CvC algorithm is presented, analysed and evaluated. Two additional techniques are employed to reduce the effects of the limitations of K-Means: a hypercube of constraints is defined for each centroid, and weights are acquired for each attribute of each class, enabling the use of a weighted Euclidean distance as the similarity criterion in the clustering procedure. Experiments are conducted on 42 well-known classification datasets. The experimental results demonstrate that the proposed algorithm outperforms CvC with simple K-Means.

Keywords-classification-via-clustering; k-means; supervised learning

I. INTRODUCTION

Data science and especially data mining [1] is a rapidly evolving field, in which the extraction of valuable knowledge out of large accumulated information is a major challenge. Recent technological progress has led to the creation of large datasets in all scientific areas. Thus, developing methodologies that discover knowledge from raw data is now a common concern of many researchers. Supervised learning methods are an important part of machine learning and data mining, referring to the task of learning from labeled training data, with classification being the most common such task. Many sophisticated classification algorithms have been proposed in the literature, each exploiting the data in a different way, such as Artificial Neural Networks [2], Bayesian Classifiers [3], Rule-Based Classifiers [4, 5], Decision Trees [6, 7] and Ensemble Classifiers [8]. A widely used method is Classification-via-clustering (CvC) [9-14], which involves a clustering technique that groups the data into clusters, which are in turn mapped to the classes. Instances are assigned to the clusters, and thus to classes, according to some criterion that depends on the clustering algorithm used. The best-known clustering algorithm is K-Means [15, 16], which is computationally light; its low complexity makes it suitable for clustering and classification in big data analysis. It is an iterative clustering procedure in which the pattern of each cluster is a centroid. As a similarity criterion, a form of distance is used (most commonly the squared Euclidean). According to that criterion, each data point is assigned to the proper cluster and the centroids are then recalculated given the updated clusters. This algorithm is widely used in CvC, where the clusters are matched to classes. COP-KMeans [17] is a variation of K-Means that uses background knowledge in the form of data-level constraints. It includes two kinds of constraints, must-link and cannot-link, defining whether two data points must or must not be in the same cluster; neither kind of constraint may be violated when assigning a data point to a cluster.
So, if a cluster includes a data point which must be in the same cluster as another, the latter must be assigned to that cluster. Also, if two data points are part of a cannot-link constraint, they cannot be placed in the same cluster. In the same manner, another technique proposes an extended version of the above set of constraints [18]: must-link and cannot-link constraints remain, and two more are examined, based on the maximum distance that two data points in the same cluster may have and the minimum distance that two data points in different clusters must have. Another approach is Constrained K-Means [19], which includes cluster-level constraints; each cluster has a constraint which is a threshold on the minimum number of data points it must contain, so the final centroids are specified when all the clusters contain the desired amount of data. A different method of finding the K centroids is Global K-Means [20], where the problem is solved incrementally: an iterative procedure begins by solving the problem for one centroid, and each iteration adds one centroid to the solution until the K centroids are found. Fast Global K-Means [20] tries to accelerate the above procedure; instead of solving the K-centroids problem given the (K-1)-centroid solution, it estimates the error reduction obtained by inserting the new centroid, and the calculated error is used as a threshold in the next execution of K-Means. A different approach is the Fuzzy C-Means algorithm [21], in which each datum fractionally belongs to all clusters and contributes to the update of all centroids.

In this work, a K-Means based CvC algorithm is presented, introducing the use of constraints for the centroids' movement and a weighted Euclidean distance as a similarity criterion. The proposed algorithm is evaluated on 42 well-known datasets from the UCI Machine Learning Repository [22]. 10-fold cross-validation [23] is employed, and classification accuracy, average sensitivity and average precision [24] are used as the metrics of interest. Experimental results demonstrate that the proposed algorithm considerably outperforms CvC with simple K-Means.

II. CVC WITH SIMPLE K-MEANS

Clustering is a method of finding subgroups within observations. The clustering algorithm most commonly used for CvC is K-Means, which divides the instances of a dataset into clusters. Since the number of clusters to be extracted is predefined, K-Means can be easily used in classification by setting K to be equal to or greater than the number of classes and mapping each cluster to a specific class. In K-Means based CvC, K is usually set equal to the number of classes C. K-Means forms the clusters using the training set. Each cluster is mapped to the appropriate class using the labels of the training data: the majority class of the instances of each cluster is assigned to it. In the classification process, the unlabeled instances are assigned to the closest centroid of the clusters formed in the previous step. The formulation of the clusters using the K-Means algorithm is summarized in the following simple K-Means algorithm:

Input: DT (the training set)
Output: m (the means (centroids))
1: t = 0;
2: initialize randomly the C means m1^t, ..., mC^t;
3: do
4:   C = AssignInstances(DT, m^t);
5:   m^(t+1) = UpdateMeans(C);
6:   t++;
7: while m^t != m^(t-1)
8: return m^t;

K-Means consists of three main steps.
At first, the means are initialized randomly (line 2). Next, the algorithm assigns the training instances to the "nearest" mean's cluster (line 4) and updates each mean to the centroid of the formulated cluster (line 5). To assign each instance $x_i$, $i = 1, \ldots, n$, to a cluster, the squared Euclidean distance $d_{i,j}$, $j = 1, \ldots, C$, from each mean is calculated, as shown in (1):

$d_{i,j} = \sum_{l=1}^{N} (x_{i,l} - m_{j,l})^2$   (1)

where $N$ is the number of attributes. After the calculation of the squared Euclidean distance, the instance is assigned to the cluster $j$ for which $d_{i,j} \le d_{i,p}$ for every $p \in \{1, \ldots, C\}$. The third step of K-Means is the recalculation of each mean to be the centroid of all the instances currently in the cluster:

$m_{j,l} = \left( \sum_{x_i \in c_j} x_{i,l} \right) / |c_j|$   (2)

where $|c_j|$ is the size of the cluster, in other words the number of instances that are assigned to it. In K-Means, if $C$ (the number of clusters) and $N$ (the number of attributes) are fixed, the problem can be exactly solved in time $O(n^{NC+1})$, where $n$ is the number of instances to be clustered [25].

The K-Means algorithm has some limitations and weaknesses [26-28] that negatively influence its classification results. First of all, the random initialization of the centroids significantly affects the formation of the cluster-classes. Furthermore, due to the clustering task, small classes may be dominated by bigger classes, so the grouping may split the instances of a bigger class into two separate clusters and include the instances of the small class in one of the bigger clusters. Moreover, the presence of noise or outliers in the dataset can affect a cluster by pulling its centroid to a "bad" position. Another factor that should be taken into account is the distance criterion for the assignment of each instance to a centroid: in CvC, each class may have a different standard deviation in each attribute than the other classes, and thus each attribute may have a different contribution to the classification.
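As a reference point for the next section, the CvC procedure described in this section can be summarized in a short Python sketch (an illustrative implementation of the random initialization, the assignment of (1), the update of (2) and the majority-vote mapping of clusters to classes; the function and variable names are ours and this is not the C++ code used in the experiments of Section IV):

import numpy as np

def kmeans(X, K, max_iter=100, rng=np.random.default_rng(0)):
    # Plain K-Means: random initialization (line 2 of the pseudocode),
    # squared-Euclidean assignment (1), centroid update (2).
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # distances to every mean
        assign = d.argmin(axis=1)                                     # nearest mean's cluster
        new_means = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                              else means[j] for j in range(K)])       # keep empty clusters in place
        if np.allclose(new_means, means):                             # means no longer change
            break
        means = new_means
    return means, assign

def cvc_fit(X_train, y_train, K):
    # Cluster the training set, then map each cluster to the majority class
    # of its training instances (class labels assumed to be integers 0..C-1).
    means, assign = kmeans(X_train, K)
    cluster_class = np.array([np.bincount(y_train[assign == j]).argmax()
                              if np.any(assign == j) else 0 for j in range(K)])
    return means, cluster_class

def cvc_predict(X_test, means, cluster_class):
    # Assign each unlabeled instance to the closest centroid and return
    # the class mapped to that cluster.
    d = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return cluster_class[d.argmin(axis=1)]

With K set equal to the number of classes, cvc_fit and cvc_predict correspond to the training and classification steps of CvC with simple K-Means as described above.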
III. CONSTRAINED K-MEANS CLASSIFICATION ALGORITHM

Constrained K-Means Classification (C-K-Means), a novel CvC algorithm, is proposed in this work. The algorithm is designed so as to address the limitations and weaknesses mentioned above. As classification is a supervised learning technique, background knowledge can be used efficiently to extract the model. Two main alterations to K-Means are proposed in order to exploit this background knowledge: (i) the application of constraints to the initialization and update of the centroids, and (ii) the employment of a weighted Euclidean distance function. The training data are used to acquire the constraints for each centroid and the weights for each attribute and class. Since each hypercube of constraints is generated based on data from a single class, each centroid inherits the class of the respective hypercube. The clustering procedure and the formulation of the clusters take place on the test data rather than the training set. This helps to better classify the testing instances, using not only the distance of each observation from the centroids, but also the update of the centroids towards the cluster formation of the test set.

A. Description of C-K-Means

The model produced during the training procedure includes a hypercube for each class and the weights for each attribute and each class. The algorithm takes as input a training dataset with $N$ attributes and $C$ classes, and a hypercube is defined for each class by calculating minimum and maximum bounds for each attribute:

$lb_{j,l} = \mu_{j,l} - a \cdot \sigma_{j,l}$   (3)

$ub_{j,l} = \mu_{j,l} + a \cdot \sigma_{j,l}$   (4)

with $\mu_{j,l}$ being the average value of attribute $l$ ($l \in [1, N]$) over the instances belonging to class $j$ ($j \in [1, C]$), and $\sigma_{j,l}$ the respective standard deviation. The minimum and maximum bounds are $lb_{j,l}$ and $ub_{j,l}$, respectively, and $a$ is a parameter defining the relaxation of these bounds. The bounds are used to limit the movement of the centroids and thus the cluster formation. The weights are used to facilitate a weighted Euclidean distance calculation as the similarity criterion during the clustering procedure. The concept includes the estimation of the standard deviation of each class' features in order to define an importance factor to be used in the similarity criterion. The calculation of the weights is given by the following formula:

$w_{j,l} = 1 - \frac{\sigma_{j,l}}{max_{j,l} - min_{j,l}}$   (5)

with $w_{j,l}$ being the calculated weight, $\sigma_{j,l}$ the standard deviation of attribute $l$ ($l \in [1, N]$) of the observations belonging to class $j$ ($j \in [1, C]$), and $max_{j,l}$, $min_{j,l}$ the respective maximum and minimum values of that attribute over the same observations. The weights are then normalized:

$w_{j,l} = \frac{w_{j,l}}{\sum_{p=1}^{N} w_{j,p}}$   (6)

After the acquisition of the hypercube of constraints and the weights, the algorithm is ready for the CvC of the unlabeled instances. The K-Means algorithm is applied as follows:

Initialization step: The initial K centroids are generated randomly within the hypercubes, with one centroid in each hypercube ($K = C$). Thus:

$lb_{j,l} \le m_{j,l} \le ub_{j,l}$   (7)

where $m_{j,l}$ is the centroid value of class $j$ in attribute $l$.

Assignment step: In this step, the weighted Euclidean distance is employed as the similarity criterion, with each observation assigned to the closest centroid. The weighted Euclidean distance is calculated as follows:

$d_{i,j} = \sum_{l=1}^{N} w_{j,l} \cdot (x_{i,l} - m_{j,l})^2$   (8)

Update step: If, during this procedure, a centroid is to be positioned outside the hypercube of constraints of its class (the one in which it was initially created), it is forced back to the bound values, i.e. if a value is less than its class' $lb_{j,l}$ it is set to $lb_{j,l}$, and the same applies if it is greater than the respective $ub_{j,l}$. After the convergence of the clustering procedure, each instance is classified to the class of the cluster whose centroid it was assigned to in the last iteration.
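Under the reading of (3)-(8) given above, the training and clustering procedure of C-K-Means can be sketched in Python as follows (illustrative only: all names are ours, the use of the per-class observed minimum and maximum in (5) and the guard for constant attributes are our assumptions, and this is not the authors' C++ implementation):

import numpy as np

def ckmeans_train(X, y, a=0.1):
    # Per-class hypercube bounds (3)-(4) and normalized attribute weights (5)-(6).
    classes = np.unique(y)
    lb, ub, w = [], [], []
    for c in classes:
        Xc = X[y == c]
        mu, sigma = Xc.mean(axis=0), Xc.std(axis=0)
        lb.append(mu - a * sigma)                                  # (3)
        ub.append(mu + a * sigma)                                  # (4)
        rng_c = Xc.max(axis=0) - Xc.min(axis=0)
        wc = 1.0 - sigma / np.where(rng_c > 0, rng_c, 1.0)         # (5), guarding constant attributes
        w.append(wc / wc.sum())                                    # (6)
    return classes, np.array(lb), np.array(ub), np.array(w)

def ckmeans_classify(X_test, classes, lb, ub, w, max_iter=100,
                     rng=np.random.default_rng(0)):
    # Constrained, weighted K-Means on the unlabeled data; one centroid per class.
    K = len(classes)
    m = lb + rng.random(lb.shape) * (ub - lb)                      # initialization inside each hypercube (7)
    for _ in range(max_iter):
        d = np.stack([(w[j] * (X_test - m[j]) ** 2).sum(axis=1)    # weighted Euclidean distance (8)
                      for j in range(K)], axis=1)
        assign = d.argmin(axis=1)
        new_m = np.array([X_test[assign == j].mean(axis=0) if np.any(assign == j)
                          else m[j] for j in range(K)])
        new_m = np.clip(new_m, lb, ub)                             # update step: force centroids inside the bounds
        if np.allclose(new_m, m):
            break
        m = new_m
    return classes[assign]                                         # class of the assigned centroid

As in the description above, clustering runs on the unlabeled instances, the centroids are clamped back into their class hypercube after every update, and each instance finally receives the class of the centroid it is assigned to.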
IV. EXPERIMENTAL RESULTS AND DISCUSSION

To set up the experimental procedure, the discussed algorithm was implemented using C++. The relaxation parameter $a$ in (3) and (4) is experimentally set to 0.1. Also, all attributes of the datasets are normalized. In order to evaluate the proposed algorithm, a comparative study with CvC with simple K-Means took place, and the classification accuracy, average sensitivity and average precision performance metrics are extracted. For that purpose, the WEKA software [29] was employed, from which the respective results for CvC with simple K-Means were obtained. The number of clusters is set equal to the number of classes of each dataset. Both algorithms are tested on 42 datasets from the UCI Machine Learning Repository, with various characteristics. The datasets, their number of attributes and their number of instances are presented in Table I (the number of attributes does not include the class). Classification accuracy, average sensitivity and average precision are obtained for each dataset; all results are shown in Table II. The differences per dataset are illustrated in Figures 1, 2 and 3 for the classification accuracy, average sensitivity and average precision, respectively. The diagrams were created as follows: the result obtained with simple K-Means was subtracted from the respective C-K-Means result (for each dataset) and the differences were then sorted in descending order.

The experimental study was based on an extremely large selection of classification datasets (42), including all classification datasets reported as "Most Popular" on the repository website, and several other well-known datasets. The main exclusion criterion was that no missing values should exist, since dealing with missing values is outside the scope of this study. The datasets have a number of attributes ranging from 4 to 255 and a number of instances ranging from 83 to 45211, with 8 datasets having fewer than 200 instances, 20 having 200 to 1000 instances, 10 having 1000 to 5000 instances and 4 having more than 5000 instances.

TABLE I. DATASETS INFORMATION

#   Dataset                    Attributes   Instances
1   Abalone                    8            4177
2   Balance scale              4            625
3   Bank Full                  17           45211
4   Banknote                   4            1372
5   Blood Transfusion          4            748
6   Breast cancer (wdbc)       32           569
7   Breast cancer (winscon)    9            699
8   Breast_tissue              9            106
9   Cardiotocography           21           2126
10  Contraceptive              9            1473
11  Dermatology                34           366
12  Ecoli                      7            336
13  Fertility Diagnosis        9            100
14  Firm Teacher               19           10800
15  Glass                      9            214
16  GPS                        6            163
17  Ionosphere                 34           351
18  Iris                       4            150
19  Messidor features          18           1151
20  Occupancy Detection        7            20560
21  Optdigit                   64           1797
22  Parkinson disease          22           195
23  Parkinson speech           26           1239
24  Pendigits                  16           10992
25  Pima indian                8            768
26  Seeds                      7            210
27  Semeion                    255          477
28  Sonar                      60           208
29  Spect heart                22           267
30  Spectf heart               44           267
31  Statlog aust credit        14           690
32  Statlog german credit      24           1000
33  Statlog heart              13           270
34  Statlog img seg            19           810
35  Statlog sat image          36           2000
36  Thyroid disease new        5            215
37  Urban                      147          168
38  Vertebral column 3C        6            310
39  Waveform                   21           5000
40  Wine                       13           178
41  Yeast                      8            1484
42  Zoo                        16           83

For all experiments, the classification accuracy results per fold were obtained and a paired 10-fold cross validated T-test [30] was performed, in order to examine the statistical significance of the differences, at a 95% confidence level. In Table II, all datasets with a statistically significant difference are marked with a bullet (●) next to the dataset name. The results presented in Table II indicate that the proposed approach is superior in terms of classification accuracy: the proposed algorithm outperforms the traditional K-Means based CvC in all 42 datasets, with the difference ranging from 1.14% to 43.22%. In more detail, the difference in 11 datasets is up to 5%, in 13 it ranges from 5% to 15%, in another 10 datasets it is between 15% and 25%, and in the remaining 8 datasets it is over 25%. Accuracy differences for all datasets are illustrated in Figure 1.
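For reference, the paired 10-fold cross-validated T-test used for the per-dataset significance analysis above can be sketched as follows (the standard paired t-test over the 10 per-fold accuracy differences, in the spirit of [30]; the function name, the SciPy call and the example numbers are ours and purely hypothetical):

import numpy as np
from scipy import stats

def paired_cv_ttest(acc_a, acc_b, alpha=0.05):
    # Paired t-test over the per-fold accuracy differences obtained from the
    # same 10-fold cross-validation split (k = 10 folds).
    diffs = np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float)
    k = len(diffs)
    mean_d = diffs.mean()
    sd_d = diffs.std(ddof=1)                      # sample standard deviation of the differences
    t = mean_d / (sd_d / np.sqrt(k))              # t statistic with k-1 degrees of freedom
    p = 2 * stats.t.sf(abs(t), df=k - 1)          # two-sided p-value
    return t, p, p < alpha

# Hypothetical per-fold accuracies of C-K-Means vs. simple K-Means on one dataset
ck = [0.76, 0.78, 0.74, 0.77, 0.75, 0.79, 0.76, 0.78, 0.77, 0.75]
km = [0.62, 0.64, 0.61, 0.63, 0.60, 0.65, 0.62, 0.63, 0.64, 0.61]
print(paired_cv_ttest(ck, km))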
TABLE II. CLASSIFICATION-VIA-CLUSTERING RESULTS (● MARKS A STATISTICALLY SIGNIFICANT DIFFERENCE)

Dataset                        Accuracy            Avg. Sensitivity    Avg. Precision
                               K-Means   C-K-Means K-Means   C-K-Means K-Means   C-K-Means
Abalone                        50.73%    52.21%    50.90%    53.54%    52.57%    50.84%
Balance scale ●                55.52%    68.64%    51.43%    73.93%    55.03%    71.66%
Banknote ●                     57.43%    85.57%    57.70%    84.92%    57.60%    85.85%
Blood Transfusion ●            59.09%    66.44%    49.20%    67.16%    49.35%    62.75%
Breast cancer (Winsconsin)     95.71%    96.85%    94.85%    96.81%    95.60%    96.28%
Breast cancer (wdbc)           92.62%    93.85%    91.05%    93.28%    93.15%    93.53%
Breast_tissue ●                41.51%    60.38%    45.02%    58.62%    50.40%    61.40%
Cardiotocography ●             34.20%    54.99%    38.32%    64.02%    34.30%    53.60%
Contraceptive ●                37.54%    47.39%    37.77%    49.47%    37.60%    48.41%
Dermatology ●                  71.86%    97.81%    75.67%    97.55%    84.45%    97.55%
Ecoli ●                        52.98%    79.76%    49.99%    61.79%    52.19%    57.77%
Fertility Diagnosis ●          44.00%    68.00%    50.40%    60.23%    50.20%    54.96%
Firm Teacher ●                 55.56%    98.78%    65.80%    97.96%    58.80%    97.62%
Glass                          40.19%    45.33%    40.27%    51.96%    29.32%    52.39%
GPS ●                          66.87%    82.82%    65.15%    81.66%    71.05%    87.05%
Ionosphere ●                   70.94%    75.21%    71.40%    76.48%    69.85%    74.46%
Iris                           84.00%    93.33%    87.30%    93.33%    88.70%    93.38%
Messidor Features              52.74%    57.41%    51.85%    57.42%    52.00%    57.39%
Optdigit ●                     74.80%    90.76%    75.81%    90.76%    76.24%    91.13%
Parkinson Disease ●            58.46%    76.41%    70.50%    77.34%    66.25%    71.54%
Parkinson Speech               57.78%    60.02%    57.35%    60.04%    57.25%    59.85%
Pima Indian ●                  64.84%    70.96%    60.55%    72.33%    60.90%    70.33%
Seeds                          88.57%    91.43%    88.57%    91.43%    88.90%    91.38%
Semeion ●                      58.07%    82.18%    58.14%    81.95%    59.35%    82.61%
Sonar ●                        54.33%    73.56%    54.35%    73.08%    54.35%    73.68%
Spect Heart                    62.55%    67.42%    60.65%    67.42%    61.05%    67.81%
Spectf Heart                   64.04%    66.29%    50.00%    78.77%    37.85%    68.97%
Statlog Aust Credit ●          76.81%    85.65%    78.55%    86.30%    76.90%    85.91%
Statlog German Credit ●        59.40%    70.40%    52.50%    69.71%    52.45%    67.19%
Statlog Heart                  77.04%    80.37%    77.15%    80.83%    76.85%    80.46%
Statlog Img Seg ●              57.90%    85.06%    62.81%    85.79%    63.16%    85.72%
Statlog Sat Image ●            67.30%    78.15%    66.55%    77.59%    68.18%    77.36%
Thyroid Disease New ●          85.12%    94.42%    72.80%    87.62%    86.47%    97.53%
Urban ●                        60.12%    73.21%    63.36%    74.44%    64.74%    74.47%
Vertebral Column 3C ●          44.19%    74.52%    41.52%    73.22%    43.17%    72.23%
Waveform ●                     49.94%    81.20%    49.97%    81.11%    50.07%    84.63%
Wine                           94.38%    96.63%    95.30%    97.18%    94.20%    96.43%
Yeast ●                        36.86%    50.27%    37.93%    57.24%    34.00%    48.65%
Zoo ●                          69.88%    92.77%    65.06%    84.85%    60.17%    87.94%
Bank Full ●                    56.78%    71.93%    52.10%    71.64%    50.90%    59.80%
Occupancy Detection ●          62.16%    87.99%    68.60%    89.36%    63.25%    82.48%
Pendigits ●                    70.87%    82.94%    71.51%    82.93%    72.54%    83.20%
Average                        62.28%    76.41%    61.67%    76.44%    61.60%    75.45%

Regarding the average sensitivity results, once again the proposed algorithm outperforms simple K-Means in all datasets. The difference ranges from 1.96% to 32.16%: in 7 datasets it was below 5%, in 17 it was from 5% to 15%, in 12 datasets it was between 15% and 25%, while in 6 it surpassed 25%. Despite the smaller maximum difference, more datasets were observed where C-K-Means was found to outperform simple K-Means with a bigger difference. This indicates that C-K-Means classification managed to classify correctly more instances from each class (on average) than the traditional approach. All sensitivity results are also given in Figure 2.
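The dataset-level figures above are computed from the cross-validation predictions; assuming the usual macro-averaged (per-class) reading of "average sensitivity" and "average precision" [24], which is our interpretation, a minimal sketch with hypothetical names is:

import numpy as np

def cvc_metrics(y_true, y_pred, num_classes):
    # Accuracy, average (per-class) sensitivity and average precision computed
    # from a confusion matrix; cm[i, j] counts true class i predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    accuracy = tp.sum() / cm.sum()
    sensitivity = tp / np.maximum(cm.sum(axis=1), 1)   # recall of each true class
    precision = tp / np.maximum(cm.sum(axis=0), 1)     # precision of each predicted class
    return accuracy, sensitivity.mean(), precision.mean()

# Hypothetical 3-class example
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]
print(cvc_metrics(y_true, y_pred, 3))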
The last metric, average precision, is illustrated in Figure 3 and in detail in Table II. Again, the proposed approach outperforms the traditional CvC. The difference ranges from 0.38% to 38.82%, while one dataset was found where the traditional approach presented slightly better precision (+1.73%). On the other hand, 10 datasets were found where C-K-Means classification was better by up to 5%, 18 where it was better from 5% to 15%, 8 where the difference was from 15% to 25% and 6 where it was above 25%. This shows that the proposed approach is better than the traditional one in terms of whether what is classified in a class actually belongs to that class. Once again, more datasets with higher differences to the benefit of the proposed algorithm are observed according to this metric. The paired 10-fold cross validated T-test revealed that the numerical difference in classification accuracy was statistically significant in 30 out of 42 datasets. Average sensitivity and precision results, alongside the T-test validation in classification accuracy, confirm in a versatile way that C-K-Means classification is superior to K-Means based CvC.

Fig. 1. Accuracy differences for all datasets
Fig. 2. Average sensitivity differences for all datasets
Fig. 3. Average precision differences for all datasets

V. CONCLUSIONS

In this study, a novel CvC algorithm based on K-Means clustering is proposed. The algorithm includes two major modifications compared to K-Means, being (i) the use of a hypercube of constraints for each centroid, extracted from the information of the training data, and (ii) the use of weights for each attribute and each class, along with the weighted Euclidean distance as a similarity criterion for the clustering procedure. An initial effort in this direction can be found in [31]. Since classification is a supervised learning method, the performance can be assessed by the classification rate, i.e. the number of successes and failures of the model. Reducing the number of misclassifications is very important; regardless of which clustering algorithm or which variation of K-Means is used in CvC, an efficient use of the background knowledge is required to succeed. The introduction of constraints in the centroids' initialization and update based on background knowledge, which is the main idea of the proposed algorithm, contributes to this direction. In C-K-Means, both training (mainly associated with classification) and clustering are used, since the class hypercubes and attribute weights are obtained by training, and clustering is used for moving the centroids inside the hypercubes. Thus, the principal idea is to use the background knowledge to limit the solutions inside a predefined space (i.e. the hypercubes) and then follow an unsupervised technique to fine-tune these solutions (by allowing the centroids to be updated but forcing them to remain inside the hypercube). Experimental results were obtained using 42 datasets from the UCI Machine Learning Repository, with various numbers of attributes and instances.
The results clearly demonstrate that the proposed approach dominates CvC with simple K-Means. Differences of up to 43.22% in accuracy, 32.16% in average sensitivity and 38.82% in average precision are presented, while statistical analysis in terms of the paired 10-fold cross validated T-test revealed statistically significant differences in 30 out of 42 datasets. The proposed algorithm is generic; in this realization, K-Means clustering and hard hypercube bounds have been used. Alternative approaches, such as fuzzy c-means or soft/ellipsoid bounds, can be integrated. Moreover, other types of weights or other distance metrics can be used as a similarity criterion. All the above will result in alternative realizations of the proposed algorithm; potential variations of the C-K-Means classification algorithm will be addressed in future communications.

REFERENCES

[1] P. N. Tan, Introduction to Data Mining, Pearson Education India, 2006
[2] R. J. Schalkoff, Artificial Neural Networks, Vol. 1, McGraw-Hill, New York, 1997
[3] P. Langley, W. Iba, K. Thompson, "An analysis of Bayesian classifiers", Tenth National Conference on Artificial Intelligence (AAAI-92), USA, July 12-16, 1992
[4] C. Apte, F. Damerau, S. M. Weiss, "Automated learning of decision rules for text categorization", ACM Transactions on Information Systems (TOIS), Vol. 12, No. 3, pp. 233-251, 1994
[5] S. K. M. Wong, W. Ziarko, On Optimal Decision Rules in Decision Tables, University of Regina, Computer Science Department, 1985
[6] J. R. Quinlan, "Induction of decision trees", Machine Learning, Vol. 1, No. 1, pp. 81-106, 1986
[7] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993
[8] L. Rokach, "Ensemble-based classifiers", Artificial Intelligence Review, Vol. 33, No. 1-2, pp. 1-39, 2010
[9] M. I. Lopez, J. M. Luna, C. Romero, S. Ventura, "Classification via clustering for predicting final marks based on student participation in forums", International Educational Data Mining Society, 2012
[10] P. V. Gorsevski, P. E. Gessler, P. Jankowski, "Integrating a fuzzy k-means classification and a Bayesian approach for spatial prediction of landslide hazard", Journal of Geographical Systems, Vol. 5, No. 3, pp. 223-251, 2003
[11] J. Erman, M. Arlitt, A. Mahanti, "Traffic classification using clustering algorithms", SIGCOMM Workshop on Mining Network Data, Pisa, Italy, pp. 281-286, September 11-16, 2006
[12] M. Panda, M. R. Patra, "A novel classification via clustering method for anomaly based network intrusion detection system", International Journal of Recent Trends in Engineering, Vol. 2, No. 1, pp. 1-6, 2009
[13] P. A. Burrough, P. F. M. van Gaans, R. A. MacMillan, "High-resolution landform classification using fuzzy k-means", Fuzzy Sets and Systems, Vol. 113, No. 1, pp. 37-52, 2000
[14] R. Evans, B. Pfahringer, G. Holmes, "Clustering for classification", 7th International Conference on Information Technology in Asia, Malaysia, July 12-13, 2011
[15] S. Lloyd, "Least squares quantization in PCM", IEEE Transactions on Information Theory, Vol. 28, No. 2, pp. 129-137, 1982
[16] J. MacQueen, "Some methods for classification and analysis of multivariate observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, No. 14, 1967
[17] K. Wagstaff, C. Cardie, S. Rogers, S. Schrodl, "Constrained k-means clustering with background knowledge", Proceedings of the Eighteenth International Conference on Machine Learning, pp. 577-584, Morgan Kaufmann, 2001
[18] I. Davidson, S. S. Ravi, "Clustering with constraints: Feasibility issues and the k-means algorithm", Proceedings of the 2005 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 2005
[19] P. S. Bradley, K. P. Bennett, A. Demiriz, "Constrained k-means clustering", Microsoft Research, Redmond, pp. 1-8, 2000
[20] A. Likas, N. Vlassis, J. J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, Vol. 36, No. 2, pp. 451-461, 2003
[21] J. C. Bezdek, R. Ehrlich, W. Full, "FCM: The fuzzy c-means clustering algorithm", Computers & Geosciences, Vol. 10, No. 2-3, pp. 191-203, 1984
[22] D. Dua, K. E. Taniskidou, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Science, 2017
[23] P. Refaeilzadeh, L. Tang, H. Liu, "Cross-validation", in: Encyclopedia of Database Systems, pp. 532-538, Springer, 2009
[24] G. Tsoumakas, I. Katakis, "Multi-label classification: An overview", International Journal of Data Warehousing and Mining, Vol. 3, No. 3, 2006
[25] M. Inaba, N. Katoh, H. Imai, "Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering", Proceedings of the 10th ACM Annual Symposium on Computational Geometry, pp. 332-339, ACM, 1994
[26] P. S. Bradley, U. M. Fayyad, "Refining initial points for k-means clustering", Proceedings of the 15th International Conference on Machine Learning (ICML 98), pp. 91-99, Morgan Kaufmann, 1998
[27] J. M. Pena, J. A. Lozano, P. Larranaga, "An empirical comparison of four initialization methods for the k-means algorithm", Pattern Recognition Letters, Vol. 20, No. 10, pp. 1027-1040, 1999
[28] K. Singh, D. Malik, N. Sharma, "Evolving limitations in K-means algorithm in data mining and their removal", International Journal of Computational Engineering & Management, Vol. 12, pp. 105-109, 2011
[29] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, "The WEKA data mining software: an update", ACM SIGKDD Explorations Newsletter, Vol. 11, No. 1, pp. 10-18, 2009
[30] T. G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms", Neural Computation, Vol. 10, No. 7, pp. 1895-1923, 1998
[31] D. C. Tsouros, P. N. Smyrlis, M. G. Tsipouras, D. G. Tsalikakis, N. Giannakeas, A. T. Tzallas, P. Manousou, "Automated collagen proportional area extraction in liver biopsy images using a novel classification-via-clustering algorithm", IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), Greece, June 22-24, 2017