THE APPLICATION OF K-MEANS ALGORITHM FOR LQ45 INDEX ON INDONESIA STOCK EXCHANGE

A. Raharto Condrobimo1; Albert V. Dian Sano2; Hendro Nindito3
1,2,3 Information Systems Department, School of Information Systems, Bina Nusantara University
Jln. K.H. Syahdan No 9, Palmerah, Jakarta Barat, 11480
1condrobimo@binus.ac.id; 2albert_vds@yahoo.com; 3Hendro.nindito@binus.ac.id

ABSTRACT

The objective of this study is to apply cluster analysis, also known as clustering, to data on the stocks listed in the LQ45 index at the Indonesia Stock Exchange. The problem is that traders need a tool to speed up the decision-making process in buying, selling, and holding their stocks. The method used in this cluster analysis is the k-means algorithm. The data used in this study were taken from the Indonesia Stock Exchange. The cluster analysis used data characteristics such as stock volume and value. The results were presented visually as groupings of cluster members. Therefore, the cluster analysis in this study could be used to identify the members of each cluster of the LQ45 index more quickly and efficiently. The results of such identification can be used by beginner-level investors who have become interested in stock investment to help them make decisions on stock trading.

Keywords: blue chip stock, data mining, k-means, clustering

INTRODUCTION

Stock market development has been the subject of intensive theoretical and empirical studies. More recently, the emphasis has increasingly shifted to stock market indexes and the effect of stock markets on economic development (Athanasios & Antonius, 2012).

A share or a stock represents the ownership relationship between the company and the shareholder or stockholder. Based on classification, there are two types of stocks: preferred stocks and common stocks.
Preferred stocks are stocks that carry special rights in a company (for example, receiving the distribution of company profits before other stock owners), while common stocks carry no rights beyond the general right to a share of profits according to the distribution schedule determined at the Annual General Meeting of Stockholders (AGM). Common stocks (hereinafter referred to as stocks) have the advantage that they can be transferred freely to other parties, so they can be traded in a market called the stock market.

Today Indonesia has only one stock market or stock exchange, the Indonesian Stock Exchange (IDX). IDX provides mechanisms for selling and buying the stocks of publicly owned companies listed on the IDX. A Perseroan Terbatas (PT), or limited liability company, is a legal entity for running a business whose capital consists of shares, with each owner holding a part of the company through the shares they own. A public PT is a limited liability company that also has the status of a public company (Go Public). Shares are the major instrument transacted in the capital market. There are also several derivatives arising from the transactions that occur on the stock exchange.

There are two ways to invest in stocks: the first is buying and holding shares to gain a distribution of profits (dividends), and the second is buying and reselling shares to benefit from the difference between the buying and selling value (capital gain). In general, stocks can be bought in two ways: at the Initial Public Offering (IPO), when a stock is first offered, or through the secondary market, which is familiar to us as the stock market.

ComTech Vol. 7 No. 2 June 2016: 151-159

A few years ago, shares were regarded as an investment only for the upper class.
However, with the rise of online trading, in which transactions can be made over the internet, stock trading has increasingly become an investment option for many people, because the minimum initial deposit is now more affordable. Although it is now easier to start investing in shares in the capital market, investors need not only to prepare funds but also to have the knowledge to analyze the market situation at the time. A transaction in the stock market is actually the same as ordinary trading. It requires skill in analyzing current trends, so that the goods we trade are still in demand and can be sold to buyers at a profit. To be able to analyze the market, sufficient education is needed so that investors can ultimately form their own analysis. Currently, quite a few market participants who lack sufficient knowledge, or have none at all, already take part in market transactions.

In a normal market situation, and in a market environment that tends to strengthen due to the state of the economy and strong corporate fundamentals, all market participants can remain in a safe zone. However, in a downward-moving market such as the one experienced in 2008, the market moved in unpredictable directions. This could drive investors simply to follow the crowd or follow gossip, and to be caught in a loss because the market moved in an unwanted direction. Often a recommendation given by someone works the other way around, because it is tied to the interests and responses of other people. By being able to analyze independently, we are expected to become investors who are not easily affected by misleading information.

To narrow the selection from the approximately 500 stocks listed on the exchange, we concentrate on stocks that are listed in the LQ45 Index.
LQ45 is a list of 45 stocks that are the most heavily traded on the Indonesia Stock Exchange. That is why it is called LQ45 (Liquid 45). What about blue-chip stocks? There is currently no formal definition of blue chips, even though the term has become more common; therefore, we do not provide a list based on LQ45 (IDX, n.d.).

Why only those shares? We regard ourselves as still learning about the state of the market, and the stocks in the LQ45 index are chosen because they are liquid, in the sense of being actively traded. This keeps us from getting stuck in second-tier stocks, which are sometimes very profitable when actively played but then lie dormant for a long period, making them hard to sell. To avoid such situations, we adapt to an index that is relatively safer for transactions.

The LQ45 Index is a market-capitalization index of the 45 most liquid stocks with large capitalization. It is an indicator of liquidity. The 45 stocks are selected on the basis of trading liquidity, and the index is adjusted every six months (at the beginning of February and August). Thus, the stocks contained in the index will always change.

The criteria that determine whether an issuer can be included in the LQ45 index fall into two sets. The first set: (1) being in the top 95% of the total annual average value of share transactions in the regular market; (2) being in the top 90% of the annual average market capitalization. The second set: (1) ranking highest within its sector of the Indonesia Stock Exchange (IDX) industrial classification according to market capitalization; (2) ranking highest based on the frequency of transactions.

The LQ45 index consists of 45 stocks chosen through a variety of selection criteria, so it comprises stocks with high liquidity and high market capitalization.
Shares in the LQ45 index must meet the following key selection criteria: (1) being in the top 60 by total share transactions in the regular market (average transaction value during the last 12 months); (2) ranking by market capitalization (average market capitalization during the last 12 months); (3) having been listed on the JSE for at least 3 months; (4) the financial position of the company and its growth prospects, and the frequency and number of trading days of regular market transactions.

Shares included in LQ45 are monitored continuously and reviewed every six months (at the beginning of February and August). If any shares no longer meet the criteria, they are replaced with other shares that qualify. The selection process for LQ45 shares has to be reasonable; therefore, the Indonesia Stock Exchange has an advisory committee consisting of experts from BAPEPAM, universities, and capital market professionals.

The factors that play a role in the movement of LQ45 are: (1) the Indonesian interest rate as the benchmark of portfolio investment in Indonesia's financial markets; (2) the level of investor tolerance for risk; (3) index-mover stocks, which are in fact the large market capitalization stocks on the IDX. Factors that influence a rise of LQ45 are: (1) the strengthening of global and regional markets following a drop in world crude oil prices, and (2) the strengthening of the Indonesian currency exchange rate, which can lift LQ45 into the positive zone.

The purpose of LQ45 is to complement the Composite Stock Price Index and, in particular, to provide an objective and reliable tool for financial analysts, fund managers, investors, and other capital market observers to monitor the price movements of actively traded stocks.

"We are living in the information age" is a popular saying; however, we are actually living in an era of data.
Terabytes or petabytes of data pour into our computer networks, the World Wide Web (WWW), and various data storage devices each day from business, society, science and engineering, medicine, and almost every other aspect of daily life. The explosive growth of the volume of available data is the result of the computerization of our society and the rapid development of powerful tools for collecting and storing data (Han & Kamber, 2012).

The explosive growth of widely available data makes us aware that we are truly in the era of data. Reliable and versatile tools are needed to automatically uncover valuable information from large volumes of data and transform it into organized knowledge. This need has led to the birth of data mining. The field is still young, dynamic, and promising. Data mining has made and will continue to make great strides in our journey from the era of data into the coming information age (Han & Kamber, 2012).

Data mining is the process of finding previously unknown patterns and trends in databases and using that information to build predictive models. Data mining provides a set of tools and techniques that can be applied to processed data to discover hidden patterns, and it also provides healthcare professionals with an additional source of knowledge for making decisions (Hossain et al., 2013).

Data mining is a way to extract various kinds of interesting patterns that represent knowledge implicitly stored in large datasets, and it focuses on issues of feasibility, usefulness, effectiveness, and scalability. Data mining can also be seen as an essential step in the knowledge discovery process. Data normally go through preprocessing (data cleansing, data integration, selection, and transformation) before they are ready for mining. Data mining can be performed on different types of databases and data stores, but the kind of pattern found is determined by the data mining functionality used, such as description, association, correlation analysis, classification, prediction, cluster analysis, and so on (Tajunisha, 2010).

The concept of data mining involves three steps: capturing and storing the data, converting the raw data into information, and converting the information into knowledge. Data in this context comprise all the raw material that an institution collects via normal operation. Capturing and storing the data is the first phase; it is followed by the process of applying mathematical and statistical formulas to “mine” the data warehouse (Kumar & Ramaswami, 2011).

Figure 1 Data Mining and Knowledge Discovery Process of Database (Source: Fayyad et al. in Silwattananusarn, 2012)

Based on Figure 1, the knowledge discovery process consists of several sequential and iterative steps (Fayyad et al., Han & Kamber, in Silwattananusarn, 2012): (1) Selection: choosing the data in a database that are relevant to the analysis task. (2) Preprocessing: deleting invalid and inconsistent data; combining multiple sources of data. (3) Transformation: transforming the data into a form suitable for data mining. (4) Data mining: choosing the data mining algorithm that matches the nature of the patterns in the data; extracting data patterns. (5) Interpretation/evaluation: interpreting the patterns into knowledge by eliminating irrelevant, duplicate, and repetitive patterns; translating the useful patterns into terms that can be understood by ordinary people.

Clustering is an important method in data warehousing and data mining. It groups similar objects together in a cluster (or clusters) and places dissimilar objects in other clusters or removes them from the clustering process.
However, there are some special requirements for search results clustering algorithms, the two most important of which are clustering performance and meaningful cluster description (Gothai & Balasubramanie, 2012).

Cluster analysis, also called clustering, is the process of dividing a set of data objects (or observations) into several subsets. Each subset is a cluster, such that the objects in a cluster are similar to each other but very different from the objects in other clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering (Han & Kamber, 2012). Cluster analysis offers a useful way to organize and present a complex dataset (Wang & Song, 2011).

Cluster analysis can be regarded as the most popular and foremost technique for solving unsupervised (undirected) learning problems. Like every technique of this kind, it deals with finding structure in data that have not been labeled (Tayal & Raghuwanshi, 2011).

One important component of a clustering algorithm is a measure of the distance between data points. If the components of the sample data vectors are in the same physical unit, then the simple Euclidean distance metric is more likely to be sufficient to classify data instances that are similar to each other. Tayal and Raghuwanshi (2011) stated that the distance between two groups can be measured by (1) the Euclidean distance and (2) the City Block or Manhattan distance.

In addition to the two types of similarity and dissimilarity measurement above, some other measurements are shown in Table 2 (Rui & Donald, 2005).

Table 2 Measures of Similarity and Dissimilarity for Quantitative Variables (Rui & Donald, 2005)
Measures: Minkowski distance; Euclidean distance; City-block distance; Mahalanobis distance (where S is the within-group covariance matrix)

K-means is one of the simplest undirected (unsupervised) learning algorithms used to solve various grouping problems. The procedure applies a simple and easy way to classify a given dataset into a predefined number of clusters (say, k clusters) (Tayal & Raghuwanshi, 2011).

The k-means algorithm defines the center of a cluster as the mean value of the points in the cluster. The steps of the k-means algorithm can be explained as follows. First, the algorithm randomly selects k objects from the dataset D, each of which initially represents the center of a cluster. Each of the remaining objects is then assigned to the cluster to which it is most similar, or closest, based on the Euclidean distance between the object and the cluster center.

The k-means algorithm then iterates to improve the grouping, that is, the similarity of the objects within each cluster. For each cluster, the algorithm computes a new mean using the objects that were assigned to the cluster in the previous iteration. All objects are then reassigned using the updated means as the new cluster centers. The iterations continue until the grouping is stable, which means that the clusters formed in the latest iteration are the same as the clusters formed in the previous iteration. The k-means clustering procedure is summarized in Figure 2 (Han & Kamber, 2012).
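The selection, assignment, and update steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`euclidean`, `manhattan`, `k_means`), the `dist`, `max_iter`, and `seed` parameters, and the sample (volume, value) pairs are all hypothetical, and letting an empty cluster keep its previous center is a design choice that the text does not prescribe.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (Manhattan) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_means(points, k, dist=euclidean, max_iter=100, seed=0):
    """Cluster `points` into `k` groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Pick k objects from the dataset at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every object to the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Stop once the grouping is the same as in the previous iteration.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the mean of the objects assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical (volume, value) pairs, invented purely for illustration.
stocks = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (8.0, 8.5), (8.2, 7.9), (7.8, 8.1)]
centers, labels = k_means(stocks, k=2)
print(labels)
```

Because the initial centers are chosen at random, the cluster numbering (and, on less clearly separated data, the grouping itself) can change with `seed`; running the sketch with several seeds and comparing the results is one simple way to check how stable a grouping is.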
Figure 2 Summary of the K-means Algorithm Procedure (Han & Kamber, 2012)

As with any other algorithm, k-means has advantages and disadvantages. The following are the advantages and disadvantages of the k-means algorithm according to Tayal and Raghuwanshi (2011). The advantages are: (1) k-means is a simple algorithm that has been adapted to many domains. (2) k-means is more automated than manually setting a threshold for an image or images. (3) The algorithm is a good candidate for extension to work with vectors that have vague (fuzzy) feature characteristics. The disadvantages are: (1) although it can be demonstrated that the procedure always terminates, the k-means clustering algorithm does not always find the optimal configuration with respect to the global objective function. (2) The algorithm is also very sensitive to the randomly selected initial cluster centers. The k-means algorithm can be run several times to reduce the impact of this problem.

Figure 3 Traditional k-means Algorithm (Oyelade et al., 2010)

MSE = largenumber;
Select initial cluster centroids {m_j}, j = 1, ..., k;
Do
    OldMSE = MSE;
    MSE1 = 0;
    For j = 1 to k
        m_j = 0; n_j = 0;
    Endfor
    For i = 1 to n
        For j = 1 to k
            Compute squared Euclidean distance d^2(x_i, m_j);
        Endfor
        Find the closest centroid m_j to x_i;
        m_j = m_j + x_i; n_j = n_j + 1;
        MSE1 = MSE1 + d^2(x_i, m_j);
    Endfor
    For j = 1 to k
        n_j = max(n_j, 1); m_j = m_j / n_j;
    Endfor
    MSE = MSE1;
while (MSE