Engineering, Technology & Applied Science Research Vol. 8, No. 1, 2018, 2459-2463 2459  
  

www.etasr.com Hassan et al.: SLA Management For Virtual Machine Live Migration Using Machine Learning with … 
 

SLA Management for Virtual Machine Live Migration Using Machine Learning with Modified Kernel and Statistical Approach

M. K. Hassan
Future University
Khartoum, Sudan
memo1023@gmail.com

A. Babiker
Neelain University
Khartoum, Sudan
amin.31766@hotmail.com

M. B. M. Amien
University of Gezira
Wad Madani, Sudan
magdy_baker@yahoo.co.uk

M. Hamad
Universiti Teknologi Malaysia
Johor Bahru, Malaysia
Taza1040@gmail.com

Abstract—The application of cloud computing is rising substantially due to its capability to deliver scalable computational power. A cloud system attempts to allocate the maximum number of resources in a manner that ensures that all service level agreements (SLAs) are maintained. Virtualization is considered a core technology of cloud computing. Virtual machine (VM) instances allow cloud providers to utilize datacenter resources more efficiently. Moreover, through dynamic VM consolidation using live migration, VMs can be placed according to their current resource requirements on the minimal number of physical nodes, and SLAs can consequently be maintained. Conversely, non-optimized and inefficient VM consolidation may lead to performance degradation. Therefore, to ensure acceptable quality of service (QoS) and SLA compliance, a machine learning technique with a modified kernel for VM live migration, based on adaptive prediction of utilization thresholds, is presented. The efficiency of the proposed technique is validated with different workload patterns from PlanetLab servers.

Keywords-virtual machine; migration; machine learning; SLA 

I. INTRODUCTION 
Resource optimization has been improved significantly by virtualization. It introduced isolation between applications and the physical resources [1] and allows live virtual machines (VMs) to seamlessly move between physical hosts. This allows service providers to host high-availability applications and to better commit to their level of service governed by a service level agreement (SLA). Most applications in the telecommunication industry permit only a small fraction of downtime, or no downtime at all [2]. SLA violations may occur when a server's resources are over-utilized. Therefore, high-availability and fault-tolerant systems are crucial in order to maintain such a demanding policy, which is costly in terms of capital and operational expenses. Tightly controlled live migration can provide a solution to this problem by moving VMs with little or no interruption, but such interruptions are often against the agreed SLA. Moreover, these interruptions can cause performance degradation which varies between applications [3-4]. Thus, predicting live migration at the earliest possible time will significantly reduce the performance degradation caused by SLA violations or by the duration of any interruption. Our objective is to provide a machine learning and statistical predictive model that predicts VM migration and consequently maintains the SLA. It is a heuristic-based predictive model in which a future SLA violation is first predicted, and the migration decision is then made by a machine learning classifier. CPU utilization, inter-VM bandwidth utilization and memory utilization are used as potential classification features.

II. RELATED WORK 
Resource optimization in cloud-based data centers has been extensively investigated in recent years. Authors in [5] suggested that live migration be handled by global policies applied to redistribute the VMs, based on a classification of resources into local and global policies. Authors in [6] adopted a priority-based approach to allocate resources in virtualized clusters. In [9] the dynamic consolidation problem was addressed by using a heuristic approach to the bin packing problem. In [8] a threshold-based reactive approach to dynamic workload consolidation was proposed; however, it was applicable only to certain types of applications. Popular approaches such as VMware distributed power management [7] have the drawback that they operate on fixed threshold values, which is not suitable for dynamic and unpredictable workloads [9]. In our proposed model, we introduce an approach that sets the threshold values dynamically, based on the historical predicted resource usage of each VM, with machine learning as the decision-making mechanism. Static and dynamic resource assignment policies in virtualized data centers are discussed in [10, 11]. Authors in [12-15] classified VM consolidation as centralized or decentralized. It was suggested that the VM migration trigger point be based on a predefined threshold, whereas other approaches [13, 15] trigger migration after the workload is analyzed, based on learned, intelligent QoS-based thresholds and predictive heuristic methods [16]. Nevertheless, a few approaches [14, 15] studied workload-independent QoS-based threshold approaches for the purpose of SLA violation avoidance and efficient migration management. In [16] the VM placement problem with traffic-aware balancing (VMPPTB) was discussed and a longest-processing-time-based placement algorithm (LPTBP algorithm) was designed to solve it. In addition, the locality-aware VM placement problem with traffic-aware balancing




(LVMPPTB) was proposed. Authors in [17] proposed a VM placement algorithm named ATEA (adaptive three-threshold energy-aware algorithm) to reduce energy consumption and SLA violations. It is based on historical resource usage data collected from VMs, and migrates VMs from heavily loaded to lightly loaded hosts. None of the previous works considered the effects of inter-VM bandwidth and memory utilization on the VM consolidation problem and on the SLA definition, especially in applications that do not tolerate any downtime or performance degradation, such as telecommunication applications [18].

III. SLA VIOLATIONS DETECTION 
VMs experience dynamic, variable workloads, so the hardware resources consumed by a VM change arbitrarily over time. During this variation the SLA can be breached or the host can become over-utilized, e.g. if all the VMs request their maximum allowed physical resources. In such a case, the algorithm must know the time at which an SLA violation will occur before it actually occurs. Live migrations can have a negative impact on the performance of applications in a VM during a migration. The length of a live migration depends on the total amount of memory used by the VM and the available network bandwidth. The migration time and performance degradation experienced by a VM j are expressed in (1) and (2) [15].

Tm_j = M_j / B_j    (1)

U_dj = 0.1 ∫_{t_0}^{t_0 + Tm_j} u_j(t) dt    (2)

where U_dj is the performance degradation experienced by VM_j during migration, t_0 is the time when the migration starts, Tm_j is the time taken to complete the migration, u_j(t) is the CPU utilization of VM_j, M_j is the amount of memory used by VM_j, and B_j is the available network bandwidth [15].

S_j = x_dj · U_dj    (3)

In (3), x_dj is the performance degradation incurred when the allocated resource utilization for VM_j is not aligned with the agreed SLA, and S_j is the total performance degradation experienced by VM_j. Thus, from (3), in order to minimize the total performance degradation, either U_dj or x_dj should be minimized. In this work we concentrate on minimizing x_dj, as U_dj is influenced not only by the CPU utilization u_j but also by the amount of memory used and the available network bandwidth. Moreover, VM_j is more likely to experience performance degradation while the host resource utilization is above the agreed SLA than during the actual live migration. To avoid SLA violations and performance degradation, the host should perform regular checks on system utilization by running an SLA violation detection algorithm. One of the earliest methods relied on a static CPU utilization threshold to differentiate between the overloaded and non-overloaded states of the host. It is simple but inefficient for dynamic workloads, particularly when different types of applications share a physical node. In such a case the system should be able to automatically adjust its behavior based on the workload patterns of the applications [15].
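The quantities in (1)-(3) can be sketched as follows; a minimal illustration in which the function and variable names (mem_mb, bw_mbps, cpu_util) are our own and the utilization trace is made up, not the paper's data:

```python
# Sketch of Eqs. (1)-(3): migration time, degradation during migration,
# and total performance degradation.

def migration_time(mem_mb, bw_mbps):
    """Eq. (1): Tm_j = M_j / B_j."""
    return mem_mb / bw_mbps

def migration_degradation(cpu_util, t0, tm, dt=1.0):
    """Eq. (2): U_dj = 0.1 * integral of u_j(t) over the migration window,
    approximated here with a simple Riemann sum over sampled utilization."""
    t, total = t0, 0.0
    while t < t0 + tm:
        total += cpu_util(t) * dt
        t += dt
    return 0.1 * total

def total_degradation(x_dj, u_dj):
    """Eq. (3): S_j = x_dj * U_dj."""
    return x_dj * u_dj

tm = migration_time(mem_mb=4096, bw_mbps=1024)           # 4.0 time units
u = migration_degradation(lambda t: 0.8, t0=0.0, tm=tm)  # constant 80% CPU
print(tm, round(u, 3), round(total_degradation(0.5, u), 3))
```

Minimizing x_dj, as the text argues, lowers S_j regardless of how long the migration itself takes.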

A. Local Regression 
Our approach, as depicted in Figure 1, relies first on workload prediction based on statistical analysis of historical data collected during the VMs' lifetime. Local regression (LR) has proved its efficiency as a prediction method [17]. It builds a curve from localized subsets of the data that approximates the original input. For each new observation a new trend line is found [15]:

ĝ(x) = a + b·x    (4)

This trend line is used to predict the next observation ĝ(x_{k+1}). The new observation can be a host resource utilization such as CPU or memory:

ĝ(x_{k+1}) ≥ 1,  x_{k+1} − x_k ≤ t_m    (5)

where t_m is the maximum time required for a VM migration.

 

 
Fig. 1.  Proposed Method 
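The LR step of Fig. 1 can be sketched as follows: fit a least-squares trend line in the sense of (4) over a window of recent utilization samples and evaluate it at the next observation index as in (5). The window contents and the 0.85 threshold below are illustrative assumptions, not values from the paper:

```python
# Minimal local trend-line prediction for one utilization window.

def fit_trend(ys):
    """Least-squares line y = a + b*x over x = 0..len(ys)-1 (Eq. 4)."""
    n = len(ys)
    xs = range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def predict_next(ys):
    """Evaluate the trend line at the next observation index (Eq. 5)."""
    a, b = fit_trend(ys)
    return a + b * len(ys)

cpu = [0.60, 0.65, 0.70, 0.75, 0.80]   # rising CPU utilization samples
pred = predict_next(cpu)
print(round(pred, 2))                  # predicted next utilization
```

A host would flag a migration candidate when the predicted value crosses the SLA threshold within the t_m bound.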

B. Classification Trees 
Classification trees are effective when the class is already known a priori in the training samples. Let t_p be a parent node and t_l, t_r the left and right child nodes of the parent node, respectively. Assume a learning sample with variable matrix X with M variables x_j and N observations, and let the class vector Y consist of N observations with a total of K classes. A classification tree is based on a splitting rule that divides the learning sample into smaller parts. Each time, the data have to be divided into two parts with maximum homogeneity of the left and right child nodes, which is equivalent to maximizing the change of the impurity function Δi(t) [19]:

Δi(t) = i(t_p) − E[i(t_c)]    (6)

where t_c represents the left and right child nodes of the parent node. With P_l, P_r the probabilities of the left and right nodes, we get [19]:

Δi(t) = i(t_p) − P_l·i(t_l) − P_r·i(t_r)    (7)

Therefore, at each node the classification tree solves the following maximization problem [20]:

argmax_{x_j ≤ x_j^R, j = 1, …, M} [ i(t_p) − P_l·i(t_l) − P_r·i(t_r) ]    (8)



Engineering, Technology & Applied Science Research Vol. 8, No. 1, 2018, 2459-2463 2461  
 

www.etasr.com Hassan et al.: SLA Management For Virtual Machine Live Migration Using Machine Learning with… 
 

From (8), all possible values of all variables in matrix X are searched through for the best split question x_j ≤ x_j^R, i.e. the one that maximizes the change of the impurity measure Δi(t) [19].
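The splitting rule of (6)-(8) can be sketched with Gini impurity: for each candidate threshold on a variable, compute Δi = i(t_p) − P_l·i(t_l) − P_r·i(t_r) and keep the maximizer. The samples and labels below are made up for illustration:

```python
# Exhaustive best-split search over one variable using Gini impurity.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, delta_i) maximising the impurity decrease."""
    parent = gini(ys)
    best = (None, 0.0)
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not left or not right:
            continue
        p_l, p_r = len(left) / len(ys), len(right) / len(ys)
        delta = parent - p_l * gini(left) - p_r * gini(right)
        if delta > best[1]:
            best = (thr, delta)
    return best

cpu = [0.2, 0.3, 0.4, 0.8, 0.9, 0.95]   # utilization samples
migrate = [0, 0, 0, 1, 1, 1]            # class labels (migrate or not)
print(best_split(cpu, migrate))         # → (0.4, 0.5)
```

A full tree repeats this search over every variable in X at every node, which is what (8) expresses.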

C. Support Vector Machine 
Support vector machines (SVMs) support nonlinear classification and can find the hyperplane of maximal margin. Given a training set of N data points {y_k, x_k}_{k=1}^N, where x_k ∈ R^n is the k-th input pattern and y_k the k-th output, the support vector method aims at constructing a classifier of the form:

y(x) = sign[ Σ_{k=1}^{N} a_k y_k ψ(x, x_k) + b ]    (9)

Equation (10) shows an example of a hard-margin SVM with noise-free training data to be correctly classified by a linear function. The data points D (the training set) are represented mathematically as [20, 21]:

D = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}    (10)

where x_i is an n-dimensional real vector and y_i is either 1 or −1, indicating the class to which the point x_i belongs. The SVM classification function F(x) takes the form [20]:

F(x) = w·x − b    (11)

where b is the bias and w is the weight vector, both calculated during the training process. To correctly classify the training set, F(x) (i.e. w and b) must return negative values for negative data points and positive values otherwise, for every point x_i in D (12) [20]:

w·x_i − b > 0 if y_i = 1,  w·x_i − b < 0 if y_i = −1    (12)
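The separating condition (11)-(12) can be sketched directly. The weights below are hand-picked for illustration, not learned; a real run would fit them with an SVM trainer such as sklearn.svm.SVC:

```python
# Check Eq. (12): F(x) = w.x - b must be positive for y = +1 points
# and negative for y = -1 points.

def decision(w, b, x):
    """Eq. (11): F(x) = w . x - b."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

# Toy training set D of Eq. (10): (CPU, memory) utilization pairs.
D = [((0.9, 0.8), 1), ((0.95, 0.9), 1),    # overloaded  -> migrate
     ((0.2, 0.3), -1), ((0.4, 0.2), -1)]   # underloaded -> keep

w, b = (1.0, 1.0), 1.2                     # assumed separating hyperplane
correct = all((decision(w, b, x) > 0) == (y == 1) for x, y in D)
print(correct)                             # → True
```

The kernel variants of Table I replace the dot product with ψ(x, x_k) as in (9).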

D. K-Nearest Neighbors 
The k-nearest neighbors (KNN) method is one of the simplest for pattern classification. When combined with prior knowledge it can produce significant results [22]. In KNN, each unlabeled example is classified by the majority label among its k nearest neighbors in the training set, so its classification performance depends on the distance metric used to identify the nearest neighbors. In the absence of prior knowledge, most KNN classifiers measure similarity using Euclidean distances between examples represented as vector inputs. Let {x_i, y_i}_{i=1}^n denote a training set of n labeled examples with inputs x_i ∈ R^d and discrete (in our case binary) class labels y_i. We use the binary matrix y_{ij} ∈ {0, 1} to indicate whether or not the labels match. Our goal is to learn a linear transformation L which we use to compute squared distances as [21]:

D(x_i, x_j) = ‖ L(x_i − x_j) ‖²    (13)
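Equation (13) and the majority vote can be sketched as follows. The matrix L here is an assumed diagonal scaling (weighting the CPU axis more than memory), not a trained metric, and the samples are made up:

```python
# KNN with a squared distance under a linear transformation L (Eq. 13).

def sq_dist(L, xi, xj):
    """D(x_i, x_j) = ||L(x_i - x_j)||^2."""
    diff = [a - b for a, b in zip(xi, xj)]
    Ld = [sum(L[r][c] * diff[c] for c in range(len(diff)))
          for r in range(len(L))]
    return sum(v * v for v in Ld)

def knn_predict(L, train, x, k=1):
    """Majority label among the k nearest training points."""
    ranked = sorted(train, key=lambda p: sq_dist(L, p[0], x))
    votes = [y for _, y in ranked[:k]]
    return max(set(votes), key=votes.count)

L = [[2.0, 0.0], [0.0, 1.0]]    # CPU axis weighted 2x (illustrative)
train = [((0.9, 0.8), 1), ((0.3, 0.4), 0),
         ((0.85, 0.7), 1), ((0.2, 0.2), 0)]
print(knn_predict(L, train, (0.8, 0.75), k=3))   # → 1
```

With L set to the identity matrix this reduces to ordinary Euclidean KNN.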

IV. RESULTS AND ANALYSIS 
In this work, the workload was collected from the CoMon project [16]. The extracted data cover the resource utilization of more than a thousand VMs distributed across the world. Samples were selected from 6 servers over a period of one week with a 5-minute measurement interval. CPUs and other resources were adjusted manually with different resource utilizations for the purpose of this experiment, since the collected data did not contain memory and inter-VM bandwidth utilization. In this work, average TPR (true positive rate), Friedman rank summations and average ranking were used as performance metrics. TPR is defined as the rate at which a VM migration is correctly classified as being due to high utilization. Since the provided training set is not very large, cross-validation was used to train, test and validate the classification techniques: the data are divided into 5 folds and each fold is held out in turn for testing.
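The 5-fold scheme described above can be sketched as follows; the fold-index generator and the sample count are illustrative placeholders, not the paper's dataset:

```python
# Generate train/test index splits for k-fold cross-validation.

def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs; each fold is held out once."""
    fold = n // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        train = [j for j in range(n) if j not in test]
        yield train, test

n = 20
splits = list(k_fold_indices(n, 5))
print(len(splits))                                       # → 5
print(all(len(tr) + len(te) == n for tr, te in splits))  # → True
```

Each classifier of Table I is trained on four folds and evaluated on the held-out fold, and the TPRs are averaged.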

At the initial stage, the data are predicted using local regression, provided that the prediction window is less than or equal to the migration time, as in the bound x_{k+1} − x_k ≤ t_m; this is crucial to maintaining the SLA. Different classification techniques are then investigated. Sample data were collected for the six servers named 146CS4, cs-planetlab3, cs-planetlab4, Fobos, jupiter_cs and node1. Figure 2 shows a sample of resource utilization for 146CS4 over one week. For the purpose of the experiment, the SLA thresholds for CPU, memory and inter-VM bandwidth utilization respectively were set to 90%, 80% and 70% for the 146CS4, cs-planetlab3 and cs-planetlab4 servers, and to 80%, 70% and 60% for the Fobos, jupiter_cs and node1 servers.

Using the workload data described above, all the algorithms listed in Table I were applied and analyzed. TPR results are shown in Tables II and III, and the final ranking is shown in Table IV. The Friedman test was conducted to assess the statistical significance of the obtained results. It was chosen for multi-classifier performance assessment due to its non-parametric nature and wide use in multi-domain analysis [22]. The null hypothesis tested is that all classifiers perform the same and that the observed differences are merely random. The algorithms are ranked for each data set separately. Under the null hypothesis all the algorithms are equivalent, and the Friedman statistic is calculated from their rank sums R_j as [22]:

X_F² = [12 / (n·k·(k+1))] · Σ_{j=1}^{k} R_j² − 3·n·(k+1)    (14)

TABLE I.  CLASSIFICATION ALGORITHMS

Algorithm            Description
Complex Tree         Fine distinction, maximum number of leaf splits 100
Medium Tree          Maximum number of leaf splits 20
Simple Tree          Coarse distinction, maximum number of leaf splits 4
KNN Coarse           Coarse distinction between classes, neighbours set to 100
KNN Cosine           Uses cosine distance metric
KNN Cubic            Uses cubic distance metric
KNN Fine             Finely detailed distance metric, neighbours set to 1
KNN Medium           Neighbours set to 10
SVM Coarse Gaussian  Coarse distinction with Gaussian kernel
SVM Cubic            Uses cubic kernel
SVM Fine Gaussian    Finely detailed distinction, Gaussian kernel
SVM Linear           Uses linear kernel
SVM Medium           Fewer distinctions are used
SVM Quad             Uses quadratic kernel




Fig. 2.  146CS4 server resource utilization: (a) CPU, (b) Memory, (c) Inter-VM bandwidth

TABLE II.  TPR RESULTS

Algorithm            146CS4 (90-80-70)_105  cs-planetlab3 (90-80-70)_54  cs-planetlab4 (90-80-70)_117
Complex Tree         89.7                   96.4                         95.9
Medium Tree          89.7                   96.4                         97.5
Simple Tree          89.7                   96.4                         96.7
KNN Coarse           88.9                   0                            57.4
KNN Cosine           89.7                   0                            96
KNN Cubic            89.7                   94.6                         95.1
KNN Fine             82.9                   96.4                         98.4
KNN Medium           89.7                   94.6                         95.1
SVM Coarse Gaussian  97.5                   100                          98.4
SVM Cubic            98.4                   98.2                         95.1
SVM Fine Gaussian    97.5                   94.6                         98.4
SVM Linear           89.7                   98.2                         96.7
SVM Medium           98.4                   98.4                         98.4
SVM Quad             89.7                   96.4                         98.4

The Friedman statistic is distributed according to X_F² with k−1 degrees of freedom. A significance level of p < 0.05 is chosen. If the null hypothesis is rejected, we can proceed with a post-hoc test. The Nemenyi test is used to compare classifiers to each other: if the corresponding average ranks differ by at least the critical difference, the performance of the two classifiers is considered significantly different.

TABLE III.  TPR RESULTS

Algorithm            fobos (80-70-60)_25  jupiter_cs (80-70-60)_17  node1 (80-70-60)_57
Complex Tree         96                   88.2                      100
Medium Tree          96                   88.2                      100
Simple Tree          96                   88.2                      100
KNN Coarse           0                    0                         0
KNN Cosine           52                   10                        59
KNN Cubic            92                   20                        86
KNN Fine             92                   47.1                      93
KNN Medium           92                   10                        86
SVM Coarse Gaussian  100                  10                        80.7
SVM Cubic            96.5                 64                        93
SVM Fine Gaussian    44                   5                         75.4
SVM Linear           96                   11.8                      68.4
SVM Medium           88                   41.2                      41.2
SVM Quad             88                   52.9                      96.5

 

TABLE IV.  FINAL RANKING

Algorithm      Average TPR  Friedman Rank Summation  Average Ranking
Medium Tree    94.63333     13                       2.166667
Simple Tree    94           14                       2.333333
Complex Tree   94.3         16                       2.666667
SVM Cubic      90           16                       2.666667
SVM Coarse     79.4         17                       2.833333
SVM Quad       86.89        17                       2.833333
KNN Fine       84.96        20                       3.333333
SVM Medium     77.26        23                       3.833333
SVM Linear     76.8         24                       4
SVM Fine       68.3         27                       4.5
KNN Cubic      76.2         28                       4.666667
KNN Medium     76.2         28                       4.666667
KNN Cosine     49.6         33                       5.5
KNN Coarse     24.2         41                       6.833333
 
From Table IV we see that the tree-based algorithms perform better than the other algorithms, whereas KNN shows the worst performance. Medium Tree has an average TPR of 94.63%, with a Friedman rank summation of 13 and an overall average Friedman ranking of 2.166667. To measure statistical significance using the Friedman test, X_F² was calculated as 154.66, larger than the critical value of 23.68 at p = 0.05, which indicates the statistical significance of the obtained TPR values. Accordingly, the null hypothesis that the tested classifiers have the same performance is rejected, and the Nemenyi test is used to pinpoint where the significance lies. With q_a = 3.353 and pairwise testing, we failed to reject the null hypothesis between the Medium, Simple and Complex trees, as well as SVM Cubic, SVM Coarse, SVM Quad and KNN Fine. However, Medium Tree shows better performance in terms of average TPR, Friedman rank




summation and average Friedman ranking. In addition, the Medium Tree algorithm's results were consistent across all tested data.
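The Nemenyi check used above can be sketched as follows, under the paper's setup of k = 14 classifiers, n = 6 datasets and q_a = 3.353 at the 0.05 level; the helper name is ours:

```python
# Nemenyi critical difference: two classifiers differ significantly
# when their average ranks differ by at least CD.
import math

def critical_difference(k, n, q_a):
    """CD = q_a * sqrt(k*(k+1) / (6*n))."""
    return q_a * math.sqrt(k * (k + 1) / (6.0 * n))

cd = critical_difference(k=14, n=6, q_a=3.353)
print(round(cd, 2))   # → 8.1
```

Any pair of average rankings from Table IV closer together than this critical difference fails the pairwise test, which is how the top group of trees and SVM variants remains statistically indistinguishable.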

V. CONCLUSION 
In this work, a machine learning approach with a modified kernel, together with Friedman rank summation and average ranking, was used for dynamic live migration based on adaptive prediction of utilization thresholds. In the analysis of the proposed approach, different classification techniques were used to predict VM migration, and it was shown that classification trees are more accurate than SVM and KNN. This approach can be used to manage SLAs in virtualized cloud-based data centers for critical applications such as telecommunication ones, especially applications with strict SLAs. Further analysis can be made of other dynamic consolidation problems, such as VM placement, following the approach presented in this paper.

REFERENCES 
[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. 

Neugebauer, I. Pratt, A. Warfield, “Xen and the art of virtualization”, 
19th ACM Symposium on Operating Systems Principles, pp 164-177, 
2003 

[2] S. Akoush, R. Sohan, A. Rice, A. W. Moore, A. Hopper, “Predicting the 
Performance of Virtual Machine Migration”, IEEE International 
Symposium on Modeling, Analysis & Simulation of Computer and 
Telecommunication Systems, 2010 

[3] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, 
A. Warfield, “Live migration of virtual machines”, 2nd Symposium on 
Networked Systems Design and Implementation, pp 273-286, 2005 

[4] A. B. Nagarajan , F. Mueller, C. Engelmann, L. Scott, “Proactive fault 
tolerance for HPC with Xen virtualization”, 21st Annual International 
Conference on Supercomputing, pp 23–32, 2007 

[5] R. Nathuji, K. Schwan, “Virtual power: Coordinated power management 
in virtualized enterprise systems”, ACM SIGOPS Operating Systems 
Review, Vol. 41, No. 6, pp. 265-278, 2007 

[6] Y. Song, H. Wang, Y. Li, B. Feng, Y. Sun, “Multi-tiered on-demand 
resource scheduling for VM-based data center”, 9th IEEE/ACM 
International Symposium on Cluster Computing and the Grid, pp. 148-
155, 2009 

[7] VMware Inc, VMware distributed power management concepts and use, 
2010 

[8] A. Beloglazov, R. Buyya, Adaptive Threshold-Based Approach for Energy-Efficient Consolidation of Virtual Machines in Cloud Data Centers, Dept. of Computer Science and Software Engineering, University of Melbourne, 2010 

[9] T. C. Ferreto, M. A. S. Netto, R. N. Calheiros, C. A. F. De Rose, “Server 
consolidation with migration control for virtualized data centers”, Future 
Generation Compute Systems, Vol. 27, No. 8, pp 1027–1034, 2011 

[10] T. Wood, G.Tarasuk-Levin, P. Shenoy, P. Desnoyers, E. Cecchet, M. D. 
Corner, “Memory buddies: exploiting page sharing for smart co-location 
in virtualized data centers”, ACM SIGPLAN/SIGOPS International 
Conference on Virtual Execution Environments, pp. 31-40,2009 

[11] T. Hirofuchi, H. Nakada, S. Itoh, S. Sekiguchi, “Reactive consolidation 
of virtual machines enabled by post copy live migration”, 5th 
international workshop on Virtualization technologies in distributed 
computing, pp 11-18, 2011  

[12] D. Kakadia, N. Kopri, V. Varma, “Network-aware virtual machine 
consolidation for large data centers”, 3rd International Workshop on 
Network-Aware Data Management, 2013  

[13] H. Mi, H. Wang, G. Yin, Y. Zhou, D. Shi, L. Yuan, “Online self-
reconfiguration with performance guarantee for energy-efficient large-
scale cloud computing data centers”, IEEE International Conference on 
Service Computing, pp. 514-521, 2010 

[14] M. Sindelar, R. K. Sitaraman, P. Shenoy, “Sharing-aware algorithms for virtual machine co-location”, ACM 23rd Symposium on Parallelism in Algorithms and Architectures, pp. 367-378, 2011 

[15] A. Beloglazov, “Energy-efficient management of virtual machines in 
data centers for cloud computing”, PhD Thesis, Department of Computer 
Science, Melbourne University, 2013 

[16]  T. Chen, X. Gao, G. Chen, “Optimized Virtual Machine Placement with 
Traffic-Aware Balancing in Data Center Networks”, Scientific 
Programming, Vol. 2016, Article ID 3101658, 2016 

[17] Z. Zhou, Z. Hu, K. Li, “Virtual Machine Placement Algorithm for Both 
Energy-Awareness and SLA Violation Reduction in Cloud Data 
Centers”, Scientific Programming, Vol. 2016, Article ID 5612039, 2016 

[18] M. Khalaf Alla H. M., A. Babiker, M. B. M. Amien, M. Hamad, “Review in cloud based next generation telecommunication network”, Jurnal Teknologi, Vol. 78, No. 6, pp. 51-57, 2016 

[19] R. Timofeev, Classification and Regression Trees (CART). Theory and 
Applications, MSc Thesis, Humboldt University, Berlin, 2004 

[20] H. Yu, S. Kim, “SVM Tutorial: Classification, Regression, and 
Ranking”, In: Handbook of Natural Computing, pp. 479-506, Springer, 
2012 

[21] K. Q. Weinberger, L. K. Saul, “Distance Metric Learning for Large 
Margin Nearest Neighbor Classification”, Journal of Machine Learning 
Research, Vol. 10, pp. 207-244, 2009 

[22] N. Japkowicz, M. Shah, Evaluating Learning Algorithms: A Classification Perspective, Cambridge University Press, 2011