SLA Management For Virtual Machine Live Migration Using Machine Learning with Modified Kernel and Statistical Approach

M. K. Hassan, Future University, Khartoum, Sudan, memo1023@gmail.com
A. Babiker, Neelain University, Khartoum, Sudan, amin.31766@hotmail.com
M. B. M. Amien, University of Gezira, Wad Madani, Sudan, magdy_baker@yahoo.co.uk
M. Hamad, Universiti Teknologi Malaysia, Johor Bahru, Malaysia, Taza1040@gmail.com

Abstract—The application of cloud computing is rising substantially due to its capability to deliver scalable computational power. The system attempts to allocate the maximum number of resources in a manner that ensures that all service level agreements (SLAs) are maintained. Virtualization is considered a core technology of cloud computing. Virtual machine (VM) instances allow cloud providers to utilize datacenter resources more efficiently. Moreover, through dynamic VM consolidation using live migration, VMs can be placed according to their current resource requirements on the minimal number of physical nodes, and SLAs can consequently be maintained. However, non-optimized and inefficient VM consolidation may lead to performance degradation. Therefore, to ensure acceptable quality of service (QoS) and SLA compliance, a machine learning technique with a modified kernel for VM live migration, based on adaptive prediction of utilization thresholds, is presented. The efficiency of the proposed technique is validated with different workload patterns from PlanetLab servers.

Keywords-virtual machine; migration; machine learning; SLA

I. INTRODUCTION

Resource optimization has been improved significantly by virtualization, which introduced isolation between applications and physical resources [1] and allows live virtual machines (VMs) to move seamlessly between physical hosts. This allows service providers to host high availability applications and to better commit to their level of service governed by a service level agreement (SLA). Most applications in the telecommunication industry permit only a small fraction of downtime, or no downtime at all [2]. SLA violations may occur when a server's resources are over-utilized. Therefore, highly available and fault-tolerant systems are crucial in order to maintain such a demanding policy, which is costly in terms of capital and operational expenses. Tightly controlled live migration can provide a solution to this problem by moving VMs with little or no interruption, although such interruptions are often against the agreed SLA. Moreover, these interruptions can cause performance degradation which varies between applications [3, 4]. Thus, predicting live migration at the earliest possible time will significantly contribute to reducing the performance degradation caused by SLA violations or by the duration of any interruption. Our objective is to provide a machine learning and statistics based predictive model that predicts VM migration and consequently maintains the SLA. It is a heuristic based predictive model in which a future SLA violation is predicted and the migration decision is then made by a machine learning classifier. CPU utilization, inter-VM bandwidth utilization and memory utilization are used as the classification features.
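To make the classification setup concrete, the following is a minimal sketch of how the three predictors and a binary migrate/no-migrate label could be encoded, assuming per-interval utilization samples. The threshold values and all identifiers are illustrative placeholders rather than the authors' implementation; the SLA levels actually used in the experiments are given in Section IV.

```python
import numpy as np

# Illustrative encoding of the training data for the migration classifier.
# Each row holds the three predictors used in this work, namely CPU,
# inter-VM bandwidth and memory utilization (fractions of capacity) for
# one measurement interval. The thresholds below are placeholders, not
# the SLA values used in the experiments.
SLA_THRESHOLDS = {"cpu": 0.90, "bw": 0.80, "mem": 0.70}

def label_interval(cpu, bw, mem, thresholds=SLA_THRESHOLDS):
    """Return 1 (migration required) if any utilization breaches its
    SLA threshold, otherwise 0."""
    return int(cpu >= thresholds["cpu"] or
               bw >= thresholds["bw"] or
               mem >= thresholds["mem"])

# Hypothetical samples: columns are CPU, bandwidth and memory utilization.
X = np.array([[0.45, 0.30, 0.52],
              [0.93, 0.41, 0.60],
              [0.71, 0.85, 0.66]])
y = np.array([label_interval(*row) for row in X])   # -> [0, 1, 1]
```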
II. RELATED WORK

Resource optimization in cloud based data centers has been extensively investigated in recent years. Authors in [5] suggested that live migration should be handled by global policies applied to redistribute the VMs, a suggestion based on classifying resource management into local and global policies. Authors in [6] adopted a priority based approach to allocate resources in virtualized clusters. In [9], the dynamic consolidation problem was addressed using a heuristic approach to the bin packing problem. In [8], a threshold-based reactive approach to dynamic workload consolidation was proposed, however it is applicable only to certain types of applications. Popular approaches such as VMware distributed power management [7] have the drawback that they operate on fixed threshold values, which is not suitable for dynamic and unpredictable workloads [9]. In our proposed model, we introduce an approach that sets the threshold values dynamically, based on the predicted historical resource usage of each VM, with machine learning as the decision-making mechanism. Static and dynamic resource assignment policies in virtualized data centers are discussed in [10, 11]. Authors in [12-15] classified VM consolidation approaches as centralized or decentralized. Some suggested that the VM migration trigger point be based on a predefined threshold, whereas other approaches [13, 15] trigger migration after the workload has been analyzed, based on learned, intelligent QoS-based thresholds and predictive heuristic methods [16]. Nevertheless, a few approaches [14, 15] studied workload-independent QoS-based thresholds for the purpose of SLA violation avoidance and efficient migration management. In [16], the VM placement problem with traffic-aware balancing (VMPPTB) was discussed and a longest processing time based placement algorithm (LPTBP) was designed to solve it. In addition, the locality-aware VM placement problem with traffic-aware balancing (LVMPPTB) was proposed. Authors in [17] proposed a VM placement algorithm named ATEA (adaptive three-threshold energy-aware algorithm) to reduce energy consumption and SLA violations. It uses historical VM resource usage data to migrate VMs from heavily loaded to lightly loaded hosts. None of the previous works considered the effects of inter-VM bandwidth and memory utilization on the VM consolidation problem and on the SLA definition, especially for applications that do not tolerate any downtime or performance degradation, such as telecommunication applications [18].

III. SLA VIOLATIONS DETECTION

The VMs experience dynamic, variable workloads, such that the hardware resources consumed by a VM change arbitrarily over time. During this variation the SLA can be breached or the host can become over-utilized, e.g. if all the VMs request their maximum allowed physical resources. In such a case, the algorithm must know the time at which the SLA violation will occur before it actually occurs. Live migration can have a negative impact on the performance of the applications running in a VM during the migration. The length of a live migration depends on the total amount of memory used by the VM and the available network bandwidth. The migration time and the performance degradation experienced by VM_j are expressed in (1) and (2) [15]:

$T_{m_j} = \frac{M_j}{B_j}$   (1)

$U_{d_j} = 0.1 \int_{t_0}^{t_0 + T_{m_j}} U_j(t)\,dt$   (2)

where $U_{d_j}$ is the performance degradation of VM_j during migration, $t_0$ is the time when the migration starts, $T_{m_j}$ is the time taken to complete the migration, $U_j(t)$ is the CPU utilization of VM_j, $M_j$ is the amount of memory used by VM_j and $B_j$ is the available network bandwidth [15].

$S_j = x_{d_j} \cdot U_{d_j}$   (3)

In (3), $x_{d_j}$ is the performance degradation incurred when the allocated resource utilization of VM_j is not aligned with the agreed SLA, and $S_j$ is the total performance degradation of VM_j. Thus, from (3), in order to minimize the total performance degradation, either $U_{d_j}$ or $x_{d_j}$ should be minimized. In this work we concentrate on minimizing $x_{d_j}$, as $U_{d_j}$ is influenced not only by the CPU utilization $U_j$ but also by the amount of memory used and the available network bandwidth. Moreover, VM_j is more likely to experience performance degradation while the host resource utilization is above the agreed SLA than during the actual live migration.
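To make (1) and (2) concrete, the following is a minimal sketch, assuming memory is given in MB, bandwidth in MB/s, and CPU utilization is sampled as a fraction of capacity at a fixed interval; the integral in (2) is approximated by a discrete sum, and all names and values are illustrative rather than taken from the paper.

```python
def migration_time(mem_used, bandwidth):
    """Eq. (1): T_mj = M_j / B_j. With memory in MB and bandwidth in MB/s
    the result is the migration time in seconds."""
    return mem_used / bandwidth

def migration_degradation(cpu_samples, dt):
    """Eq. (2): U_dj ~= 0.1 * sum(U_j(t) * dt), a discrete approximation of
    the integral of CPU utilization between t0 and t0 + T_mj.
    cpu_samples are utilization readings (fractions of capacity) taken
    every dt seconds during the migration window."""
    return 0.1 * sum(u * dt for u in cpu_samples)

# Hypothetical example: a VM with 4096 MB of memory on a 100 MB/s link,
# running at roughly 60% CPU utilization during the migration.
t_m = migration_time(4096, 100)                 # ~41 s
u_d = migration_degradation([0.6] * 9, dt=5)    # degradation over a ~45 s window
```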
In order to avoid SLA violations and performance degradation, the host should perform regular checks on system utilization by executing an SLA violation detection algorithm. One of the earliest methods relied on setting a static CPU utilization threshold to differentiate between the overloaded and non-overloaded states of the host. It is simple but inefficient for dynamic workloads, particularly when different types of applications share a physical node. In such a case, the system should be able to automatically adjust its behavior based on the workload patterns of the applications [15].

A. Local Regression

Our approach, as depicted in Figure 1, relies first on workload prediction based on statistical analysis of historical data collected during the VMs' lifetime. Local regression (LR) has proved its efficiency as a prediction method [17]. It builds a curve from localized subsets of data that approximates the original input. In the LR algorithm, derived from the local regression method, a new trend line is found for each new observation [15]:

$\hat{g}(x) = a + bx$   (4)

This trend line is used to predict the next observation $\hat{g}(x_{k+1})$. The new observation can be a host resource utilization, such as CPU or memory utilization. The host is considered overloaded when:

$\hat{g}(x_{k+1}) \ge 1, \quad x_{k+1} - x_k \le t_m$   (5)

where $t_m$ is the maximum time required for a VM migration.

Fig. 1. Proposed method.

B. Classification Trees

Classification trees are effective when the class of each training sample is known a priori. Let $t_p$ be a parent node and $t_l$, $t_r$ the left and right child nodes of that parent node, respectively. Assume a learning sample with variable matrix X consisting of M variables $x_j$ and N observations, and let the class vector Y consist of N observations with a total of K classes. A classification tree is based on a splitting rule that divides the learning sample into smaller parts. Since at each step the data have to be divided into two parts with maximum homogeneity of the left and right child nodes, this is equivalent to maximizing the change of the impurity function $\Delta i(t)$ [19]:

$\Delta i(t) = i(t_p) - E[i(t_c)]$   (6)

where $t_c$ represents the left and right child nodes of the parent node. Denoting by $P_l$ and $P_r$ the probabilities of the left and right nodes, we get [19]:

$\Delta i(t) = i(t_p) - P_l\,i(t_l) - P_r\,i(t_r)$   (7)

Therefore, at each node the classification tree solves the following maximization problem [20]:

$\arg\max_{x_j \le x_j^R,\; j = 1, \dots, M} \left[\, i(t_p) - P_l\,i(t_l) - P_r\,i(t_r) \,\right]$   (8)

From (8), all possible values of all variables in matrix X are searched through for the best split question of the form $x_j \le x_j^R$, i.e. the one that maximizes the change of the impurity measure $\Delta i(t)$ [19].
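As a concrete illustration of (6)-(8), the sketch below exhaustively evaluates candidate split questions $x_j \le x_j^R$ and keeps the one with the largest impurity decrease. The Gini index is used as the impurity function i(t), which is an assumption since the paper does not name a specific impurity measure, and all identifiers are illustrative.

```python
import numpy as np

def gini(labels):
    """Impurity i(t) of a node: 1 - sum of squared class probabilities."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustive search over all variables and split points for the
    question x_j <= x_j^R that maximizes the impurity decrease of (8):
    i(t_p) - P_l * i(t_l) - P_r * i(t_r)."""
    parent_impurity = gini(y)
    best = (None, None, -np.inf)          # (feature index, threshold, delta_i)
    n = len(y)
    for j in range(X.shape[1]):
        for threshold in np.unique(X[:, j]):
            left = X[:, j] <= threshold
            right = ~left
            if left.all() or right.all():  # degenerate split, skip
                continue
            delta_i = (parent_impurity
                       - left.sum() / n * gini(y[left])
                       - right.sum() / n * gini(y[right]))
            if delta_i > best[2]:
                best = (j, threshold, delta_i)
    return best

# Hypothetical usage on the utilization features described above:
# j, threshold, gain = best_split(X, y)
```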
C. Support Vector Machine

Support vector machines (SVMs) support nonlinear classification and can find the hyperplane of maximal margin. Given a training set of N data points $\{y_k, x_k\}_{k=1}^{N}$, where $x_k \in \mathbb{R}^n$ is the kth input pattern and $y_k$ is the kth output pattern, the support vector method aims at constructing a classifier of the form:

$y(x) = \mathrm{sign}\left[\sum_{k=1}^{N} a_k\, y_k\, K(x, x_k) + b\right]$   (9)

where $K(x, x_k)$ is the kernel function. Equation (10) shows an example of a hard-margin SVM, in which noise-free training data can be correctly classified by a linear function. The data points D (the training set) are represented mathematically as [20, 21]:

$D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$   (10)

where $x_i$ is an n-dimensional real vector and $y_i$ is either 1 or -1, indicating the class to which the point $x_i$ belongs. The SVM classification function F(x) takes the form [20]:

$F(x) = w \cdot x - b$   (11)

where b is the bias and w is the weight vector, both of which are calculated during the training process. To correctly classify the training set, F(x) (i.e. w and b) must return negative values for negative data points and positive values otherwise, for every point $x_i$ in D, as in (12) [20]:

$w \cdot x_i - b > 0 \ \text{if}\ y_i = 1, \qquad w \cdot x_i - b < 0 \ \text{if}\ y_i = -1$   (12)

D. K-Nearest Neighbors

The k-nearest neighbors (KNN) method is one of the simplest methods for pattern classification. When combined with prior knowledge, it can produce significant results [22]. In KNN, each unlabeled example is classified by the majority label among its k nearest neighbors in the training set, so its classification performance depends on the distance metric used to identify the nearest neighbors. In the absence of prior knowledge, most KNN classifiers measure similarity using Euclidean distances between examples represented as vector inputs. Let $\{(x_i, y_i)\}_{i=1}^{n}$ denote a training set of n labeled examples with inputs $x_i \in \mathbb{R}^d$ and discrete (in our case binary) class labels $y_i$. We use the binary matrix $y_{ij} \in \{0, 1\}$ to indicate whether or not the labels $y_i$ and $y_j$ match. Our goal is to learn a linear transformation L, which is used to compute squared distances as [21]:

$D(x_i, x_j) = \| L(x_i - x_j) \|^2$   (13)

IV. RESULTS AND ANALYSIS

In this work, the workload was collected from the CoMon project [16]. The extracted data was part of the resource utilization of more than a thousand VMs distributed across the world. Samples were selected from 6 servers for a period of one week with a 5-minute measurement interval. For the purpose of this experiment, CPU and other resource utilizations were adjusted manually to different values. The collected data did not contain memory and inter-VM bandwidth utilization. In this work, average TPR, Friedman rank summations and average ranking were used as performance metrics, where TPR is defined as the rate at which VM migrations caused by high utilization are correctly classified. Since the provided training set is not very large, cross validation was used to train, test and validate the classification techniques: the data is divided into 5 folds and each fold is held out in turn for testing while the remaining folds are used for training.
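The following is a minimal sketch of this 5-fold cross-validation step, assuming scikit-learn is available. The feature matrix X (CPU, inter-VM bandwidth and memory utilization) and the migration labels y are synthetic placeholders, and the decision tree settings only roughly mirror the "medium tree" configuration listed later in Table I; this is not the authors' actual implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: rows are measurement intervals, columns are
# CPU, inter-VM bandwidth and memory utilization; y marks SLA-driven
# migrations (1) versus normal operation (0).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = (X.max(axis=1) >= 0.9).astype(int)

# A depth-limited tree standing in for the "medium tree" setting
# (roughly 20 splits); recall is used as the score since recall = TPR.
clf = DecisionTreeClassifier(max_leaf_nodes=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="recall")
print(scores.mean())
```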
At the initial stage, the data is predicted using local regression, provided that the prediction window is less than or equal to the migration time, as in the bound $x_{k+1} - x_k \le t_m$; this is crucial to maintain the SLA. Different classification techniques are then investigated. Sample data were collected for the six servers named 146CS4, cs-planetlab3, cs-planetlab4, Fobos, jupiter_cs and node1. Figure 2 shows sample resource utilization of 146CS4 for one week. For the purpose of the experiment, the SLA thresholds were set to 90%, 80% and 70% for CPU, memory and inter-VM bandwidth utilization, respectively, for the 146CS4, cs-planetlab3 and cs-planetlab4 servers, and to 80%, 70% and 60% for the Fobos, jupiter_cs and node1 servers. Using the workload data described above, all the algorithms listed in Table I were applied and analyzed. The TPR results are shown in Tables II and III and the final ranking is shown in Table IV. The Friedman test was conducted to assess the statistical significance of the obtained results. It was chosen for multi-classifier performance assessment due to its non-parametric nature and its wide use in multi-domain analysis [22]. The null hypothesis tested is that all classifiers perform the same and that any observed differences are merely random. The algorithms are ranked for each data set separately and, under the null hypothesis, the Friedman statistic is computed from the rank sums $R_j$ as [22]:

$\chi_F^2 = \frac{12}{n k (k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1)$   (14)

TABLE I. CLASSIFICATION ALGORITHMS

Algorithm              Description
Complex Tree           Fine distinctions, maximum number of leaf splits set to 100
Medium Tree            Maximum number of leaf splits set to 20
Simple Tree            Coarse distinctions, maximum number of leaf splits set to 4
KNN Coarse             Coarse distinctions between classes, neighbors set to 100
KNN Cosine             Uses the cosine distance metric
KNN Cubic              Uses the cubic distance metric
KNN Fine               Finely detailed distinctions, neighbors set to 1
KNN Medium             Neighbors set to 10
SVM Coarse Gaussian    Coarse distinctions with a Gaussian kernel
SVM Cubic              Uses a cubic kernel
SVM Fine Gaussian      Finely detailed distinctions with a Gaussian kernel
SVM Linear             Uses a linear kernel
SVM Medium             Fewer distinctions are used
SVM Quadratic          Uses a quadratic kernel

Fig. 2. 146CS4 server resource utilization: (a) CPU, (b) memory, (c) inter-VM bandwidth.

TABLE II. TPR RESULTS (%)

Algorithm              146CS4 (90-80-70)_105   cs-planetlab3 (90-80-70)_54   cs-planetlab4 (90-80-70)_117
Complex Tree           89.7                    96.4                          95.9
Medium Tree            89.7                    96.4                          97.5
Simple Tree            89.7                    96.4                          96.7
KNN Coarse             88.9                    0                             57.4
KNN Cosine             89.7                    0                             96
KNN Cubic              89.7                    94.6                          95.1
KNN Fine               82.9                    96.4                          98.4
KNN Medium             89.7                    94.6                          95.1
SVM Coarse Gaussian    97.5                    100                           98.4
SVM Cubic              98.4                    98.2                          95.1
SVM Fine Gaussian      97.5                    94.6                          98.4
SVM Linear             89.7                    98.2                          96.7
SVM Medium             98.4                    98.4                          98.4
SVM Quadratic          89.7                    96.4                          98.4

The Friedman statistic is distributed according to $\chi_F^2$ with k-1 degrees of freedom. A significance level of p < 0.05 was chosen. If the null hypothesis is rejected, we can proceed with a post-hoc test; the Nemenyi test is used to compare the classifiers to each other. The performance of two classifiers is considered significantly different if their average ranks differ by at least the critical difference.
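The sketch below shows one way (14) and the Nemenyi critical difference can be computed. The invocation uses a small hypothetical example (3 classifiers over 4 data sets) rather than the paper's data, and the q_alpha value for that example comes from the standard studentized range table; all names are illustrative.

```python
import numpy as np

def friedman_statistic(rank_sums, n, k):
    """Eq. (14): chi_F^2 = 12/(n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1),
    where R_j is the rank sum of classifier j over the n data sets and
    k is the number of classifiers being compared."""
    r = np.asarray(rank_sums, dtype=float)
    return 12.0 / (n * k * (k + 1)) * np.sum(r ** 2) - 3.0 * n * (k + 1)

def nemenyi_cd(q_alpha, n, k):
    """Nemenyi critical difference: two classifiers are significantly
    different if their average ranks differ by at least this value."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))

# Hypothetical example: 3 classifiers ranked over 4 data sets, with rank
# sums 4, 8 and 12 (i.e. average ranks 1, 2 and 3).
chi_f = friedman_statistic([4, 8, 12], n=4, k=3)   # = 8.0 > 5.99 for df = 2
cd = nemenyi_cd(q_alpha=2.343, n=4, k=3)           # ~1.66
```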
TABLE III. TPR RESULTS (%)

Algorithm              Fobos (80-70-60)_25   jupiter_cs (80-70-60)_17   node1 (80-70-60)_57
Complex Tree           96                    88.2                       100
Medium Tree            96                    88.2                       100
Simple Tree            96                    88.2                       100
KNN Coarse             0                     0                          0
KNN Cosine             52                    10                         59
KNN Cubic              92                    20                         86
KNN Fine               92                    47.1                       93
KNN Medium             92                    10                         86
SVM Coarse Gaussian    100                   10                         80.7
SVM Cubic              96.5                  64                         93
SVM Fine Gaussian      44                    5                          75.4
SVM Linear             96                    11.8                       68.4
SVM Medium             88                    41.2                       41.2
SVM Quadratic          88                    52.9                       96.5

TABLE IV. FINAL RANKING

Algorithm              Average TPR   Friedman rank summation   Average ranking
Medium Tree            94.63333      13                        2.166667
Simple Tree            94            14                        2.333333
Complex Tree           94.3          16                        2.666667
SVM Cubic              90            16                        2.666667
SVM Coarse Gaussian    79.4          17                        2.833333
SVM Quadratic          86.89         17                        2.833333
KNN Fine               84.96         20                        3.333333
SVM Medium             77.26         23                        3.833333
SVM Linear             76.8          24                        4
SVM Fine Gaussian      68.3          27                        4.5
KNN Cubic              76.2          28                        4.666667
KNN Medium             76.2          28                        4.666667
KNN Cosine             49.6          33                        5.5
KNN Coarse             24.2          41                        6.833333

From Table IV we can see that the tree based algorithms perform better than the other algorithms, whereas KNN shows the worst performance. The medium tree has an average TPR of 94.63%, a Friedman rank summation of 13 and an overall average Friedman ranking of 2.166667. To measure statistical significance using the Friedman test, $\chi_F^2$ was calculated as 154.66, which is larger than the critical value of 23.68 at p < 0.05, indicating that the obtained TPR values are statistically significant. Accordingly, the null hypothesis that the tested classifiers have the same performance is rejected, and the Nemenyi test is used to pinpoint where the significance lies. With $q_\alpha$ found to be 3.353 and based on pairwise testing, we failed to reject the null hypothesis among the medium, simple and complex trees, as well as SVM Cubic, SVM Coarse Gaussian, SVM Quadratic and KNN Fine. However, the medium tree shows better performance in terms of average TPR, Friedman rank summation and average Friedman ranking. In addition, the medium tree results were consistent across all tested data sets.

V. CONCLUSION

In this work, a machine learning based approach with a modified kernel, together with Friedman rank summations and average ranking, was used for dynamic live migration based on adaptive prediction of utilization thresholds. In the analysis of the proposed approach, different classification techniques were used to predict VM migration, and it was shown that classification trees are more accurate than SVM and KNN. This approach can be used to manage SLAs in virtualized cloud based data centers for critical applications such as telecommunication applications, especially those with strict SLAs. Further analysis can be made of other dynamic consolidation problems, such as VM placement, following the approach presented in this paper.

REFERENCES
[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, "Xen and the art of virtualization", 19th ACM Symposium on Operating Systems Principles, pp. 164-177, 2003
[2] S. Akoush, R. Sohan, A. Rice, A. W. Moore, A. Hopper, "Predicting the Performance of Virtual Machine Migration", IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2010
[3] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, A.
Warfield, "Live migration of virtual machines", 2nd Symposium on Networked Systems Design and Implementation, pp. 273-286, 2005
[4] A. B. Nagarajan, F. Mueller, C. Engelmann, L. Scott, "Proactive fault tolerance for HPC with Xen virtualization", 21st Annual International Conference on Supercomputing, pp. 23-32, 2007
[5] R. Nathuji, K. Schwan, "Virtual power: Coordinated power management in virtualized enterprise systems", ACM SIGOPS Operating Systems Review, Vol. 41, No. 6, pp. 265-278, 2007
[6] Y. Song, H. Wang, Y. Li, B. Feng, Y. Sun, "Multi-tiered on-demand resource scheduling for VM-based data center", 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 148-155, 2009
[7] VMware Inc., VMware Distributed Power Management Concepts and Use, 2010
[8] A. Beloglazov, R. Buyya, Adaptive Threshold-Based Approach for Energy-Efficient Consolidation of Virtual Machines in Cloud Data Centers, Dept. of Computer Science and Software Engineering, University of Melbourne, 2010
[9] T. C. Ferreto, M. A. S. Netto, R. N. Calheiros, C. A. F. De Rose, "Server consolidation with migration control for virtualized data centers", Future Generation Computer Systems, Vol. 27, No. 8, pp. 1027-1034, 2011
[10] T. Wood, G. Tarasuk-Levin, P. Shenoy, P. Desnoyers, E. Cecchet, M. D. Corner, "Memory buddies: exploiting page sharing for smart co-location in virtualized data centers", ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 31-40, 2009
[11] T. Hirofuchi, H. Nakada, S. Itoh, S. Sekiguchi, "Reactive consolidation of virtual machines enabled by post copy live migration", 5th International Workshop on Virtualization Technologies in Distributed Computing, pp. 11-18, 2011
[12] D. Kakadia, N. Kopri, V. Varma, "Network-aware virtual machine consolidation for large data centers", 3rd International Workshop on Network-Aware Data Management, 2013
[13] H. Mi, H. Wang, G. Yin, Y. Zhou, D. Shi, L. Yuan, "Online self-reconfiguration with performance guarantee for energy-efficient large-scale cloud computing data centers", IEEE International Conference on Services Computing, pp. 514-521, 2010
[14] M. Sindelar, R. K. Sitaraman, P. Shenoy, "Sharing-aware algorithms for virtual machine co-location", 23rd ACM Symposium on Parallelism in Algorithms and Architectures, pp. 367-378, 2011
[15] A. Beloglazov, "Energy-efficient management of virtual machines in data centers for cloud computing", PhD Thesis, Department of Computer Science, University of Melbourne, 2013
[16] T. Chen, X. Gao, G. Chen, "Optimized Virtual Machine Placement with Traffic-Aware Balancing in Data Center Networks", Scientific Programming, Vol. 2016, Article ID 3101658, 2016
[17] Z. Zhou, Z. Hu, K. Li, "Virtual Machine Placement Algorithm for Both Energy-Awareness and SLA Violation Reduction in Cloud Data Centers", Scientific Programming, Vol. 2016, Article ID 5612039, 2016
[18] M. Khalaf Alla H. M., A. Babiker, M. B. M. Amien, M. Hamad, "Review in cloud based next generation telecommunication network", Jurnal Teknologi, Vol. 78, No. 6, pp. 51-57, 2016
[19] R. Timofeev, Classification and Regression Trees (CART): Theory and Applications, MSc Thesis, Humboldt University, Berlin, 2004
[20] H. Yu, S. Kim, "SVM Tutorial: Classification, Regression, and Ranking", in: Handbook of Natural Computing, Springer, pp. 479-506, 2012
[21] K. Q. Weinberger, L. K. Saul, "Distance Metric Learning for Large Margin Nearest Neighbor Classification", Journal of Machine Learning Research, Vol. 10, pp. 207-244, 2009
[22] N. Japkowicz, M.
Shah, Evaluating Learning Algorithms: A Classification Perspective, Cambridge University Press, 2011