Microsoft Word - 07_IS_Albert_Hendro_Analisa Cluster K-means Kemiskinan Revised - KP-- 2


Application of K-Means Algorithm … (Albert V. Dian Sano; Hendro Nindito) 141 

APPLICATION OF K-MEANS ALGORITHM 
FOR CLUSTER ANALYSIS ON POVERTY 

OF PROVINCES IN INDONESIA 
 
 
Albert V. Dian Sano1; Hendro Nindito2 

 
1,2 Information Systems Department, School of Information Systems, Bina Nusantara University 

Jln. K.H. Syahdan No 9, Palmerah, Jakarta Barat, 11480 
1albert_vds@yahoo.com; 2h_nindito@yahoo.com 

 
ABSTRACT 
 
 
The objective of this study was to apply cluster analysis or also known as clustering on poverty data of 
provinces all over Indonesia.The problem was that the decision makers such as central government, local 
government and non-government organizations, which involved in poverty problems, needed a tool to support 
decision-making process related to social welfare problems. The method used in the cluster analysis was k-
means algorithm. The data used in this study were drawn from Badan Pusat Statistik (BPS) or Central Bureau 
of Statistics on 2014.Cluster analysis in this study took  characteristics of data such as absolute poverty of each 
province, relative number or percentage of poverty of each province, and the level of depth index poverty of 
each province in Indonesia. Results of cluster analysis in this study are presented in the form of grouping of 
clusters' members visually. Cluster analysis in the study can be used to identify more quickly and efficiently on 
poverty chart of all provinces all over Indonesia. The results of such identification can be used by policy makers 
who have interests of eradicating the problems associated with poverty and welfare distribution in Indonesia, 
ranging from government organizations, non-governmental organizations, and also private organizations. 

 
Keywords: cluster analysis, k-means, poverty 
 
 
INTRODUCTION 
 
 
Poverty is one of the social problems and also become a challenge for many communities 
around the world to always find a solution. At the global level, data on poverty regarding the number 
of poor people is dominated by developing countries. However, in developed countries like the 
United States as well, there are still poor people. So poverty is everywhere universally to be a 
problem with the community and the world. In the national context, Indonesia, in the New Order era 
that began in the mid or late 1960s to 1996, the poverty rate in Indonesia decreased drastically due to 
strong economic growth with poverty alleviation programs. Unfortunately, the economic crisis in 
1997/1998 hit the Indonesian economy and raised the amount of poverty in Indonesia sharply. 

 
The experience of poverty reduction in the past have shown many weaknesses, such as: (1) 

macro growth orientation without considering aspects of equity, (2) centralized policy, (3) more 
caricature than transformative, (4) positioning communities as objects rather than subjects, (5) 
orientation poverty alleviation tends to caricature and instantaneous than sustainable productivity, and 
(6) perspectives and solutions that are generic to the problems of poverty that exist regardless of 
diversity which exists. Because it is so diverse nature of the challenges that exist, the handling of the 
problem of poverty must touch the bottom of the source and root of the real problem, either directly or 
indirectly (Multifiah, 2011). 

 
142   ComTech Vol. 7 No. 2 June 2016: 141-150 

Inequality or inequality in poverty reduction is not an easy problem. Reducing inequality is 
more complex than just reducing poverty. In addition, it also has the potential imbalances bigger 
problem than the problem of poverty itself. Figures economic inequality is measured using the Gini 
index. Gini index figures are presented in the form of a number between 0 and 1. The higher the 
number of Gini index, the higher the degree of economic inequality. Gini index numbers are high is a 
serious social threat and could lead to social unrest. In some countries that have a Gini index above 0.5 
tends to be social unrest and disintegration. 

 
Table 1 Gini Index Indonesia (BPS, 2015) 
 

INDONESIA 
1996 1999 2002 2005 2007 2008 2009 2010 2011 2012 2013 

0.355 0.308 0.329 0.363 0.364 0.35 0.37 0.38 0.41 0.41 0.413 
 
 
To reduce drawbacks of the problems related to inequality in poverty reduction, then this 
study aims to represent the poverty map based on provinces groupings all over Indonesia. This study 
will apply cluster analysis or commonly called clustering. Results of this study are expected to 
facilitate the stakeholders to see the poverty map visually so that the decision makers associated with 
poverty alleviation programs can more quickly and easily set policies for provinces deserving to get 
higher priorities and to receive major attention in poverty alleviation programs. 

 
The problem is that the decision makers such as central government, local government and 

non-government organizations, which involve in poverty problems, need a tool to support decision-
making process related to social welfare problems. Cluster analysis in this study will try to group 
provinces all over Indonesia into four clusters. This study will consist of two types of cluster analysis. 
The first is to perform cluster analysis based on a variable of the percentage of the number of poor 
people and a variable of depth's level of poverty. The second is to perform cluster analysis based on a 
variable of an absolute number of poor people and a variable of depth's level of poverty. Cluster 
analysis is an analysis aimed at grouping the data objects into clusters based on similar variables or 
characteristics. Data objects having high similarities would be put in the same cluster while those 
having low similarities or a big difference will be put into different clusters. 

 
Poverty has a diverse concept. World Bank defines poverty using purchasing power, i.e., US 

$2 per capita per day. Meanwhile, Badan Pusat Statistik or BPS (BPS-Statistics Indonesia) defines 
poverty based on the poverty line. According to BPS (2015), the poor are people who have an average 
monthly expenditure below the poverty line per capita. The poverty line, according to BPS, is the sum 
of the food poverty line (FPL) and Non-Food Poverty Line (NFPL). FPL is the spending on minimum 
food needs equivalent to 2100 calories per day per capita. Basic needs package of food commodities is 
represented by 52 kinds of commodities, such as grains, tubers, fish, meat, eggs, milk, vegetables, 
legumes, fruits, oils, fats, and others. NFPL is the minimum requirement for housing, clothing, health 
and education. Another definition of poverty by Bappenas (Purwanto, 2007) is a condition in which a 
person or a group of men and women who are unable to meet their basic rights to maintain and 
develop a dignified life. The basic rights of human beings include the fulfillment for food, clothing, 
health, education, employment, housing, clean water, land, natural resources and the environment, safe 
treatment or safe from the threat of violence and the right to participate in political and social life. 
Because this study will use a database of BPS in 2014, then it will automatically refer to the concept of 
poverty by BPS. 

 
Cluster analysis technique in this study will apply to some specific characteristics or attributes 

of data, that is, the absolute number of poverty, the relative number or percentage of poverty, and the 


Application of K-Means Algorithm … (Albert V. Dian Sano; Hendro Nindito) 143 

level of poverty's depth index. The absolute number of poverty is the number of people living below 
the poverty line. The relative number is the percentage of population below the poverty line. The 
poverty's depth index is the average gap of expenditure of the poor to the poverty line. The higher the 
index value, the higher the gap between the average expenditure from the poverty line. 

 
"We are living in the information age" is a popular saying; however, we are living in an era of 

data. The data in terabytes or petabytes poured into our computer network, World Wide 
Web (www), and various data storage devices each day ranging from world business, community, 
science and engineering, medicine, and almost every other aspect of daily life. The explosive growth 
of the volume of existing data is the result of the process of computerization of our society and the 
rapid development of various devices the collection and storage of data which is terrific (Han & 
Kamber, 2012). 

 
The explosive widely available growth of data really makes us aware that we are in the era of 

data. Various reliable and versatile tools are needed to automatically reveal valuable information from 
the large-volume data and transform it into the organized knowledge. This need has led to the birth of 
data mining. The field is still young, dynamic and promising. Data mining has been and will continue 
to make great strides in our journey from the era of data into the information age to come (Han & 
Kamber, 2012). 

 
Data mining is the process of finding previously unknown patterns and trends in databases and 

using that information to build predictive models. Data mining provides a set of tools and techniques 
that can be applied to this processed data to discover hidden patterns and also provides healthcare 
professionals an additional source of knowledge for making decisions (Hossain et al., 2013) 

 
Data mining is a fun way to extract various kinds of patterns, which presents knowledge 

stored in large data sets implicitly and focuses on matters related to its feasibility, usefulness, 
effectiveness and scalability. Data mining can also be seen as a very important step in the process to 
find knowledge. Data is normally done through a pre-process data cleansing, data integration, 
selection and transformation of data and prepared for mining. Data mining can also be done on 
different types of databases and data storage, but the type of pattern is found determined by different 
types of functionality mining data such as descriptions, association, correlation analysis, classification, 
prediction, analysis of clusters, and so on (Tajunisha, 2010). 

 
The concept of data mining involves three steps, i.e., capturing and storing the data, 

converting the raw data into information and converting the information into knowledge. Data in this 
context comprises all the raw material an institution collects via normal operation. Capturing and 
storing the data is the first phase that is the process of applying mathematical and statistical formulas 
to "mine" the data warehouse (Kumar, 2011). 

 
Figure 1 Data mining and Knowledge Discovery Process of Database 
(Sources: Fayyad in Silwattananusarn & Tuamsuk, 2012) 


144  

B
iterative 
Selection
invalid d
the data 
algorithm
/ Evaluat
same pat
ordinary 
 

C
object to
from the
algorithm
(Gothai 

 
C

observat
the objec
set of clu
& Kamb

 
C

2011). A
problems
techniqu
structure

 
O

data poin
the simp
other. T
Euclidia

 
I

the other

Based on Fi
methods su

n: Choosing 
data and inco

into a form
m that match
tion: Interpre
ttern and rep
people. 

Clustering is
ogether in a 
e clustering 
ms, two of w
& Balasubra

Cluster analy
tion) into sev
cts that are si
usters resultin
ber, 2012). 

Cluster analy
Analysis of th
s that are un

ue used to sol
e of the data t

One importa
nts. If a com
ple Euclidean
The distance 
an and City B

In addition t
r measureme

T

Mink

Eucli

City-

Maha

gure 1 above
uch as the fo
yes ng Data 

onsistent data
m suitable to
es the pattern
et various pat
petitive; trans

s an importa
cluster (or c
process. Bu

which most im
amanie, 2012

ysis also calle
veral subsets.
imilar to each
ng from the c

ysis offers a 
he cluster ca
nsupervised 
lve problems
that has not b

ant compone
mponent of th
n distance m

between th
Block or Man

to the similar
ents are show

Table 2 Size of

Measure

kowski distance

idean distance 

-block distance 

alanobis distanc

e, the knowl
ollowing (Fa
relevant to th

a; combine m
o perform da
n in d nature 
tterns into kn
slating a varie

ant method 
clusters) and 
ut there are 
mportant is, c
2). 

ed clustering
 Each of the
h other, but v
cluster analys

useful way t
an be regarde
learning or t

s with techniq
been labeled (

ent of the clu
he vector sam

metric is suffi
he two grou
nhattan. 

rity and dissi
wn in Table 2

f Similarity an

es 

e 

ce 

(Sources

ledge discov
ayyad, Han, 
he task of a d

multiple sour
ata mining. 
of the data; 

nowledge by 
ety of pattern

of data ware
dissimilar o
some specia

clustering pe

g is the proces
ese subsets is 
very different
sis such as cl

to organize a
ed as the mo
the learning 
ques like this
(Tayal & Ra

ustering algo
mple data is i
icient to clas

ups can be m

imilarity of t
2 (Xu & Wun

 
nd Dissimilari
 

, where S i
s:Xu & Wunsc

ComTech

ery process 
in Silwattan

database ana
ces of data. (
(4) Data Mi
extract vario
eliminating 

ns useful in t

ehousing an
object in oth
al requireme
erformance a

ss of dividing
a cluster, su

t from the ob
lustering can 

and present a
ost popular t
process und

s, will certain
aghuwanshi, 

orithm is a m
in the same p
ssify the data
measured by

the two types
nsch, 2005).

ity for Quantit

Form

is the within gro
ch, 2005) 

h Vol. 7 No. 2

consists of s
nanusarn & 
alyst. (2) Prep
(3) Transform
ining: Choos

ous data patte
irrelevant va
terms that co

nd data minin
er cluster (o
ents for sear
and meaningf

g a set of dat
ch that the o

bjects that are
be referred t

a complex da
techniques a
directed or u
nly find a wa
2011). 

measure of t
physical unit
a instants tha
y (Tayal & 

s of measure

tative Variabl

ms 

oup covariance 

2 June 2016:

several seque
Tuamsuk, 2

processing: D
mation: Tran
sing the dat
erns. (5) Inter
arious pattern
ould be under

ng. It group
r clusters) o
rch results c
ful cluster de

ta objects (or 
objects in a c
e in another c
to as a cluster

ataset (Wang
and foremost
unsupervised
ay of dealing

the distance 
t, then it is li
at are simila
Raghuwansh

ement above

les  

matrix 

 141-150 

ential and 
012): (1) 

Delete the 
nsforming 
ta mining 
rpretation 
ns and the 
rstood by 

ps similar 
r remove 

clustering 
escription 

object of 
cluster are 
cluster. A 
ring (Han 

 & Song, 
t to solve 
. So each 

g with the 

between 
ikely that 

ar to each 
hi, 2011) 

, some of 


Application of K-Means Algorithm … (Albert V. Dian Sano; Hendro Nindito) 145 

K-means is one of the learning algorithms undirected/unsupervised learning, the simplest used 
to solve various problems of the grouping. The procedure is by applying a simple and easy way to 
classify data that has been given to some clusters (such as clusters k) predefined (Tayal & 
Raghuwanshi, 2011). 

 
K-means algorithm will define the midpoint of the cluster from the average value of the points 

in the cluster. Steps in k -means algorithm can be explained as follows. Per all, the algorithm will 
select k (central cluster) at random from various objects in D (dataset), which respectively represent 
the center of the cluster at the beginning or the first time. For any other object, each object is assigned 
or grouped into clusters that are most similar or the most closely based on the Euclidean distance 
between the object and the center cluster. 

 
K-means clustering algorithm then iterates to improve or increase the separation distances or 

similarities in the cluster. For each cluster, this algorithm will calculate a new average using the 
objects are grouped into a cluster in the previous iteration. All objects will then be regrouped by using 
the average of the newly updated as the new cluster center. The iterations will continue until it reaches 
a stable grouping, which means that the clusters formed in the latest iteration are the same as 
the clusters formed in the previous iteration. K-means clustering procedure is summarized in Figure 2 
below (Han & Kamber, 2012). 

 
Figure 2 Summary of Procedure Algorithm K-Means 
(Source: Han & Kamber, 2012) 

 
As with any other algorithms, k-means also has some advantages and disadvantages. Here are 
the advantages and disadvantages of k-means algorithm according to Tayal and Raghuwanshi (2011). 
The advantages are (1) k-means is a simple algorithm that has been adapted to many domains. (2) 
More automated than making threshold imply a manual of an image or images. (3) This is an 
algorithm that can be good candidates for use as a continuation of the work relates to vectors that have 
the characteristics feature or vague (fuzzy). 

 
On the other hand, the disadvantages are (1) though it can be demonstrated that the procedure 

will always end, k-means clustering algorithm does not always find the most optimal configuration, 
which is related to the global objective function. (2) This algorithm is also very sensitive to cluster 
centers randomly selected at the beginning. K-means algorithm can be run several times to reduce the 
impact on this problem. 

 
Algorithm: k-means. The k-means algorithm for partitioning, where each cluster’s center is represented 
by the mean value of the objects in the cluster. 

Input: 
 k: the number of clusters, 
 D: a data set containing n objects. 

Output: A set of k clusters. 
Method: 

1) Arbitrarily choose k objects from D as the initial cluster centers; 
2) Repeat  
3) (re)assign each object to the cluster to which the object is the most similar, 
4)  based on the mean value of the objects in the cluster; 
5)  Update the cluster means, that is, calculate the mean value of the objects for 
6)  Each cluster;  
7) Until no changes; 


146   ComTech Vol. 7 No. 2 June 2016: 141-150 

 
Figure 3 Traditional k-means Algorithm  
(Source: Oyelade, et al., 2010) 

 
METHODS 
 
 
The method applied in this study generally includes three main stages: (1) data collection, (2) 
data pre-processing, and (3) data mining. In data collection, data collected in this study was taken from 
BPS-Statistics Indonesia's (Badan Pusat Statistik) website (www.bps.go.id). Next, data pre-processing 
is the most important task in data mining. This stage is often said to take almost 80% of the total time 
or task in data mining. Techniques and methods to be applied in this stage must be precise and correct. 
Data pre-processing used in this study is based on the theory by Jiawei Han and Michelin which 
includes: first, data cleaning: filling in the missing values, repairing data errors, identify or remove 
outliers, and fixing inconsistent data. Second, data integration: merging related data from tables, 
databases, cube, or files. Third, data selection: Selecting data only related to the process of analysis. 
The benefit of this step is to reduce less important or less relevant data in data mining processes. 

 
Fourth, data transformation: Transforming data to support the process of analyzing the data 

that will be used. Fifth, data mining: This stage is the primary stage of the entire task in this study. As 
with the data collection as well as data pre-processing, this stage also applies the theory by Jiawei Han 
and Michelin which include: (1) Data Mining, this stage is the stage of the implementation of the 
modeling used in data mining. In this study, the model applied is k-means cluster analysis. (2) Pattern 
Evaluation, this is an evaluation of the pattern that has been processed. (3) Knowledge Presentation, 
this is a presentation of the results of the data mining process. 

 
RESULTS AND DISCUSSIONS 
 
 
The application of cluster analysis in this study applied four numbers of clusters. The analysis 

applies to two types of measurement of the number of poor people in all provinces all over Indonesia. 
The first one is a measurement based on the percentage of poor people or also known as the relative 

1. MSE = largenumber; 
2. Select initial cluster centroids {m j }j 

K = 1; 
3. Do 
4. OldMSE = MSE; 
5. MSE1 = 0; 
6. For j = 1 to k 
7. mj = 0; nj = 0; 
8. Endfor 
9. For i = 1 to n 
10. For j = 1 tok 
11. Compute squared Euclidean distance d2(xi, mj); 
12. Endfor 
13. Find the closest centroid mj to xi 
14. mj = mj + xi, nj= nj +1; 
15. MSE1 = MSE1 + d2(xi, mj); 
16. Endfor 
17. For j = 1 to k 
18. nj = max ( nj , 1) ; mj = mj / nj ; 
19. Endfor 
20. MSE = MSE1; while (MSE<OldMSE) 


Application of K-Means Algorithm … (Albert V. Dian Sano; Hendro Nindito) 147 

number of poor people. The second is a measurement of the number of poor people as it is or also 
called as the absolute number of poor people. 

 
The source of data for this study was taken from BPS-Statistics Indonesia's (Badan Pusat 

Statistik) website. Data applied in this study is the data that was last updated in September 2014. The 
data has 16 attributes. In this cluster analysis, the software used is RapidMiner Studio. Pre-processing 
will select and determine four relevant attributes to be analyzed, namely (1) Province attribute, (2) 
Number of Poor (Village + City) attribute, (3) Total Percentage of Poor (Village + City) attribute, and 
(4) P1 (Village + City) attribute. Province attribute will act as an identifier whereas P1 attribute 
describes the level of the depth of poverty. The study runs two cluster analysis processes applying k-
means algorithm. The first process is to process Number of Percentage of Poor People vs. Depth 
Poverty Level while the second process is to process Absolute Number of Poor People vs. Poverty 
Depth's Level. The first process will apply attributes (1), (3) and (4) while the second process will 
apply attributes (1), (2), and (4). 

 
Both cluster analysis in this study implements similarities and dissimilarities between data 

objects based on Euclidian distance measurement method. So suppose, i = (χi1, χi2, ..., χip) and j = 
(χj1, χj2, ..., χj1) are two objects described by a numerical attribute p, then to measure the Euclidian 
distance between these objects is (Han, 2012): 
 

, ⋯        (1)

 
The similarity measurement technique using the Euclidean method as above also meets the 

mathematical properties such as the following (Han, 2012): (1) Non-negative: d (i, j) ≥ 0: The distance 
is not possible to be negative. (2) The identity of an indistinguishable: d (i, i) = 0: The distance of an 
object to itself is 0. (3) Symmetrical: d (i, j) = d (j, i): The distance is a function of symmetry. (4) 
Triangle inequality: d (i, j) ≤ d (i, k) + d (k, j): The distance of the object i to j cannot be greater than 
the distance going through k. 

 
The plot of figure 4 below is the result of a cluster analysis based on the relative number of 

poor people and poverty depth's level in all provinces all over Indonesia. 
 
 
Figure 4 Plot of Cluster Analysis of the Percentage 
of Poverty vs. Poverty’s Depth for Each Province in Indonesia 


148   ComTech Vol. 7 No. 2 June 2016: 141-150 

Figure 4 describes that the graph between the percentages of poverty versus poverty depth's 
level tends to be linear. It means that the provinces having a large percentage of poverty also tend to 
have deep level of poverty. Thus, to set policies as to how to prioritize provinces coping up with 
poverty alleviation tends to be easier. The figure also describes that cluster_2 (yellow color) be a 
representation of two provinces having the greatest percentage of poor people and the deepest level of 
poverty. So both two provinces should receive top priority in poverty handling when viewed from the 
percentage and depth of poverty points of view. When viewed from the dataset, the two provinces are 
provinces of Papua and West Papua. Then the next provinces to highlight are provinces in cluster_0 
(blue color). When we look at the dataset, provinces in the list of cluster_0 are the province of Aceh, 
South Sumatra Province, Bengkulu, Lampung, Central Java, Yogyakarta, West Nusa Tenggara, East 
Nusa Tenggara, Central Sulawesi, Gorontalo, and Maluku. 

 
Figure 5 shows the result of a cluster analysis based on the absolute number of poor people 

(not the amount in percentage) and the depth level of poverty of all provinces in Indonesia. 
 
 
Figure 5 Plot of Cluster Analysis of the Percentage  
of Poverty vs. Poverty’s Depth of Each Province in Indonesia 

 
Figure 5 describes that there is a quite outstanding cluster and in a remotely separated area, 
which is cluster_1 (red color). This cluster should also receive top priority in poverty reduction if 
poverty alleviation program refers to the absolute number of poor people. From the dataset, we know 
the provinces in the list of cluster_1 cluster are East Java, Central Java and West Java. The next 
priority to be considered is cluster_3 (green color). From the dataset, it can be identified easily that 
provinces in the list are the provinces of North Sumatra, Lampung, South Sumatra and East Nusa 
Tenggara. However the figure above also shows that the relationship between the numbers of absolute 
poverty versus the depth of poverty of provinces in Indonesia is not linear. This means that cluster_1 
and some members of cluster_3 have a large number of absolute poverty but the depth of poverty is 
not high. If poverty reduction policies are more oriented towards poverty depth's level then some 
members of cluster_2 (yellow color) or cluster_0 (blue color) are the target of the policy. Members of 
the said clusters are the provinces of Papua (cluster_0) and West Papua (cluster_2). 

 
A more comprehensive approach in setting policies for poverty reduction based on priorities 

can combine both types of cluster analysis above. If such a combination is applied then some 
provinces that should be prioritized in poverty reduction are Provinces of Papua, West Papua, East 


Application of K-Means Algorithm … (Albert V. Dian Sano; Hendro Nindito) 149 

Java, Central Java, West Java, Aceh, South Sumatra, Bengkulu, Lampung, Yogyakarta, West Nusa 
Tenggara, East Nusa East, Central Sulawesi, Gorontalo, and Maluku. 

 
As with many other types of studies, this study certainly is not perfect. Some potential 

weakness in this study identified by the researchers is the need for comparing the accuracy of cluster 
analysis if the cluster analysis is run multiple times with a different number of clusters. Another 
potential weakness is the need for comparing accuracy of cluster analysis by using different models of 
cluster analysis such as k-medoids or other. 

 
CONCLUSIONS 
 
 
This cluster analysis study can provide information more quickly and efficiently on the 
distribution picture of the poor provinces all over Indonesia. Results of this cluster analysis of poverty 
in Indonesia provide visual information that is useful to see the map of the poorest provinces which 
should become the target of policies of stakeholders. Provinces that need attention and become the 
target of the policies ordered by priorities are provinces of Papua and West Papua, East Java, Central 
Java, West Java, Aceh, South Sumatra, Bengkulu, Lampung, Yogyakarta, West Nusa Tenggara, East 
Nusa Tenggara, Central Sulawesi, Gorontalo, and Maluku. Results of cluster analysis in this study 
could be used by stakeholders associated with poverty alleviation programs such as government 
agencies, non-governmental organizations and private institutions as part of the decision-making 
processes. 
 
 
REFERENCES 
 
 
BPS. (2015). Gini Ratio Menurut Provinsi Tahun 1996, 1999, 2002, 2005, 2007-2013. Retrieved 

August 3, 2015 from http://www.bps.go.id/linkTabelStatis/view/id/1493 
 
BPS. (2015). Konsep Penduduk Miskin. Retrieved August 3, 2015 from 

http://www.bps.go.id/Subjek/view/id/23#subjekViewTab1|accordion-daftar-subjek1 
 
Gothai, E., Balasubramanie, P. 2012. An efficient way for clustering using alternative decision tree. 

American Journal of Applied Science, 9, 531-534. 
 
Han, J., Kamber, M. (2012). Data Mining: Concepts and Techniques (4th ed.). San Francisco: Morgan 

Kaufmann Publishers. 
 
Hossain, J., Sani,  N. F. M., Mustapha A., & Affendey L. S., 2013. Using feature selection as accuracy 

benchmarking in clinical data mining. Journal of Computer Science, 9, 883-888. 
 
Kumar, S. P., Ramaswami, K. S. (2011). Fuzzy modeled k-cluster quality mining of hidden knowledge 

for decision support. Journal of Computer Science, 7, 1652-1658. 
 
Multifiah. (2011). Telaah Kritis Kebijakan Penanggulan Kemiskinan Dalam Tinjauan Konstitusi. 

Journal of Indonesian Applied Economics, 5(1), 1-27. Retrieved August 4, 2015 from 
http://jiae.ub.ac.id/index.php/jiae/article/view/109 

 
Oyelade, O. J., Oladipupo, O. O., & Obagbuwa, I. C. (2010). Application of k-Means Clustering 

algorithm for prediction of Students’ Academic Performance. International Journal of 
Computer Science and Information Security (IJCSIS), 7(1). Retrieved on August 3rd, 2015 
from http://arxiv.org/ftp/arxiv/papers/1002/1002.2425.pdf 


150   ComTech Vol. 7 No. 2 June 2016: 141-150 

Purwanto, E. A. (2007). Mengkaji Potensi Usaha Kecil dan Menengah (UKM) untuk Pembuatan 
Kebijakan Anti Kemiskinan di Indonesia. Jurnal Ilmu Sosial dan Ilmu Politik, 10(3), 295-
324. 

 
Silwattananusarn, T., & Tuamsuk, K. (2012). Data Mining and Its Applications for Knowledge 

Management: A Literature Review from 2007 to 2012. International Journal of Data Mining 
& Knowledge Management Process (IJDKP), 2(5). Retrieved on August 3, 2015 from 
http://arxiv.org/ftp/arxiv/papers/1210/1210.2872.pdf 

 
Tajunisha, S. (2010). Performance analysis of k-means with different initialization methods for high 

dimensional data. International Journal of Artificial Intelligence & Applications (IJAIA), 
1(4), 44-52. Retrieved on August 3, 2015 from 
https://www.academia.edu/12640770/Performance_analysis_of_k-
means_with_different_initialization_methods_for_high_dimensional_data 

 
Tayal, M. A., & Raghuwanshi, M. M. (2011). Review on Various Clustering Methods for the Image 

Data. Journal of Emerging Trends in Computing and Information Sciences, 2, 34-38, Special 
Issue. 

 
Wang, H., & Song, M. (Desember, 2011). Ckmeans.1d.dp: Optimal k-means clustering in One 

Dimension by Dynamic Programming. The R Journal, 3(2), 29-32. 
 
Xu, R., Wunsch, D. C. (2005). Survey of Clustering Algorithms. IEEE Transactions on Neural 

Networks, 16 (3), 645 - 678.