Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. VI (2011), No. 4 (December), pp. 635-646 The Research of QoS Approach in Web Servers Y. Hu, D. Mu, A. Gao, G. Dai Yansu Hu, Dejun Mu, Ang Gao, Guanzhong Dai School of Automation Northwest Polytechnical University Xi’an 710072, China E-mail: huyansu@gmail.com, mudejun@nwpu.edu.cn, snailgao@gmail.com, daiguanzhong@nwpu.edu.cn Abstract: Proportional Delay Guarantee has been widely used in the Web QoS service, and the most basic methods are the feedback of control theory and the predictive control of queuing theory. While the former belonging to passive control has a long setting time and imperfect real-time, the latter can not simulate the Web server queuing system well because of the model limitations. After the experimental verification and shortages analysis of the two methods, an improved approach is proposed in this paper. Based on the queuing feature of Web server and the HTTP 1.1 persistent connection, the improved approach predicts the delay by calculating the queue length and service rate and achieves the relative delay guarantee of different classes by adjusting their quota of worker threads. The experimental results demonstrate that the approach could maintain the relative delay guarantees well even in poor network environment and performs a much better superior compared with the traditional methods. Keywords: Web Server, Proportional Delay Guarantee, Feedback Control, Predictive Control 1 Introduction The increasing diversity of Web applications the last decade has witnessed an increasing demand for provisioning of different levels of quality of service (QoS) to meet changing system configuration and satisfy different client requirements. Proportional delay differentiation (PDD) service aims to ensure the QoS parameters between data flow of different classes to meet the specified proportions, so the requests with higher priority will receive the quality of service which is "at least not lower than" the low priority. There are many previous studies worked on the Web QoS and two different methods are used to achieve the differentiated service: the feedback of control theory (see [1], [2], [3], [4] and [5]) and predictive control of queuing theory (see [2], [6] and [7]). Although the algorithms are different, the QoS architecture of Web server they used is non-distinctive. The QoS of Web application is usually related to the network layer transmission and the operating system kernel. The latter that needs to replace the operating system on all the termi- nals is difficult to deploy, so more and more researchers tend to the differentiation service on the application layer. It ensures the PDD service by changing the original Web FIFO mechanism and the modified Apache MPM (Multi-Processing Modules) architecture is illustrated in Figure 1. 1. The single connection queue is improved to a multi-queue structure in accordance with the classified strategy. The listener monitors the network port, accepts the client TCP connections, classifies the requests based on some classified strategy, and then puts them into the appropriate waiting queue. Copyright c⃝ 2006-2011 by CCC Publications 636 Y. Hu, D. Mu, A. Gao, G. Dai 2. Any class can not consume the server resource unlimitedly, so the requests of different classes must be isolated. The thread per connection structure of MPM allows all the worker threads in pool to be the resource to be allocated. So we divide the pool into several sub-pools which are isolated with each other. The number of threads in each sub- pool is known as the thread quota, and the requests are serviced in the corresponding sub-pool. 3. For N kinds of classes, let ci(i = 1, ..., N) be the thread quota allocated to the class i. When load varies, the size of sub-pool is dynamically adjusted to ensure the proportion between classes constant. And is calculated by feedback control used the delay observer to get the real delay (Figure 1 ¬method), or by predictive control used the queue length observer to get the request arrival rate(Figure 1 ­method). ` Worker Pool 1 Worker 1 Worker m Worker Pool n Waiting Queue n Waiting Queue 1 Requests Classified Queue Length Observer Quota Schedule Delay Observer Feedback Control HTTP Response Data Flow Control Flow Predictive Control Performance Isolation ( )Y k Figure 1: The Web server architecture for PDD service But there are some imperfections and limitations in both the methods. For example, the feedback control is essentially a passive approach which works only after the deviation appeared. The delay in response is significant since the server response time is particularly slow. Although the predictive control based on queuing theory adjusts the controller output in advance, it can not get the precise plant model because of limitations itself. So an improved predictive approach is proposed to overcome their shortages and the contributions in this paper can be summarized as follows: 1) A general architecture for proportional delay service in Web server is summed up. 2) The two methods referred above are verified and compared by experiments. 3) Implement and evaluate an improved approach to get a better performance. 2 The Comparison of Two PDD Methods 2.1 The Feedback of Control Theory Assuming C worker threads are concurrent on the Web server. For N kinds of classes, let δi be the constant weighting factor of class i where the higher priority has a smaller parameter value, and di be the expectation of class i’s average measured delay, then the proportional delay guarantees can be described as follows: di/dj = δi/δj, 1 ≤ i ≤ N, 1 ≤ j ≤ N, (1) The Research of QoS Approach in Web Servers 637 ∑N i=1 ci = C (2) Quota Schedule Feedback Control d Y ! Delay Observer i Xie Plant i Y i c jc Figure 2: The feedback control model Where ci is the thread quota of class i. So the improved MPM is equivalent to a control model as Figure 2. The system desired output is the inherent priority parameter ratio between class i and i + 1 , which is Ydesire = [y1desire, y2desire, ..., yN−1disire] T. (3) yidesire = δi/δi+1, i = 1, 2, ..., N − 1. (4) Similarly, the controller output is the thread quota ratio and Y(k) is the measured output ratio between the class i and i + 1, which is X(k) = [x1(k), x2(k), ..., xN−1(k)] T. (5) xi(k) = ci(k)/ci+1(k), i = 1, 2, ..., N − 1. (6) Y(k) = [y1(k), y2(k), ..., yN−1(k)] T. (7) yi(k) = di(k)/di+1(k), i = 1, 2, ..., N − 1. (8) Obviously, the deviation is E(k) = Y(k) − Ydesire. On the Web server, all classes share a single host resource. So the feedback controller��class-per-loop��adjusts X(k) to satisfy Equation (1) by the measured E(k) . The Web server is modeled as a linear system, i.e. a r-order difference equation is Y(k) = ∑r l=1 [alY(k − l) + blX(k − l)]. (9) Where al and bl are the unknown parameters. The transfer function in z-domain is G(z) = Y(z) X(z) = ∑r l=1 blz r−l zr − ∑r l=1 alz r−1 . (10) The plant’s mathematical model (order and parameters) is obtained by system identification which is presented by the authors in [8] in detail. Then we can design the controller based on the classical control theory. Take the PI control for example, the controller transfer function in z-domain is D(z) = Kp + KIT(z + 1) 2(z − 1) , (11) And the Closed-loop system transfer function is GC(z) = D(z)G(z) 1 + D(z)G(z) . (12) 638 Y. Hu, D. Mu, A. Gao, G. Dai 2.2 The Predictive Control of Queuing Theory Although the feedback maintains the system at the balance, it takes an imperfect real-time for the lagging output. So [2, 6, 7] try to correct the deviation before the system output being affected with the help of queuing theory. In the predictive control, the queue length observer (see Figure 1­) measures the request arrival rate λi and service rate µi. Then the predictive controller reallocates the thread quota ci for respective class. M/M/1/∞ Queuing System The Web server performance mainly subjects to the system bottlenecks(see [7]), so each class in Apache can be considered as a M/M/1/∞ queuing model. Define ρ = λ/µ be the system traffic intensity. And the residence time is the sum of queuing time (connecting time) and service time (transfer time), which is l = ω + χ (13) Where ω and χ are independent from each other. According to paper [2], the mean residence time is calculated by Equation (14). l = ω + E[χ] = ρ µ(1 − ρ) + 1 µ = 1 µ − λ , (ρ < 1) (14) To meet the proportion relationship specified by Equation (1), we can get µj(k) − λj(k) µi(k) − λi(k) = δi δj (15) While normalized the thread quota si = ci/Ci, 1 ≤ i ≤ N, ∑N i=1 si = 1, (16) Hence, for class i, the service rate can be rewritten as µi(k) = µsi(k) (17) Where µ is the total service capacity. Then Equation (14) becomes li = 1µsi−λi . Combined with Equation (15),   µs2(k) − λ2(k) µs1(k) − λ1(k) = δ2 δ1 µs3(k) − λ3(k) µs2(k) − λ2(k) = δ3 δ2 ... µsN(k) − λN(k) µsN−1(k) − λN−1(k) = δN δN−1 ∑N i=1 si = 1 (18) Solving equations at each sampling time, we can get the thread quota for different classes (s1, s2, ..., sN). The Research of QoS Approach in Web Servers 639 M/G/1/∞ Queuing System In M/M/1/∞ model, the service time obeys the memoryless exponential distribution. But when it is general distribution, the correlation between the current and several previous service time must be considered (see [9]). The paper [6, 10] described the "heavy tail" features of Web service time series {χn, n ≥ 1}, and gave a more compatible queuing model M/G/1/∞ for the Web server. According to the Pollaczek-Kinchin(PK) formula and lemma, if si is the normalized thread quota for class i, service time χi obeys the bounded pareto distribution, the mean queuing time is m1,i = E[χi] = mi/si, m2,i = E[χi] = m2/si 2 (19) Hence the mean residence time is li = ω + E[χi] = m2,iλi 2(1 − λiE[χ]) + m1 si = m2,iλi 2(1 − λiE[χ]) + m1 si (20) = m2λi 2si(si − m1λi) + m1 si Where m1 ,m2 are the constants related to shape parameter of bounded pareto distribu- tion, λi is the requests arrival rate which is obtained by the queue length observer. Combined Equation(20) with Equation (1), si(si − m1λi) si+1(si+1 − m1λi+1) 2m1si+1 + λi+1(m2 − 2m21) 2m1si + λi(m2 − 2m21) = δi+1 δi (21) Solve the equations similar to Equation (18) and get the results (s1, s2, ..., sN). 2.3 Experimental Results The test-bed consists of a Web sever and two client machines, each with a 3.0GHz Pentium 4 professor and 521MB RAM connected with 100 Mbps Ethernet. The Web server is Apache 2(Httpd-ver2.0.53) running on Windows NT and the total number of server processes is config- ured to 100. The two client machines run Liunx-2.6.27 with SURGE (see [11]) (ver 1.00a) as the workload generator and each operates 120 concurrent UE. Requests are classified according to their source IP address. All the experiments are under HTTP1.1 pipeline and the number of maximum concurrent clients in SURGE is 1. Set δ1/δ1 = 1/2,which means the delay of high class is half of the low one. The sampling period is set to 10 sec, and all the experiments last 3000 sec. 300 valid data is measured and the controller works after 750 sec. The results under PI controller and predictive controller M/G/1/∞ are respectively shown in Figure 3. Obviously, before the controllers works there is no differentiated service in different classes. But after 750 sec, the thread quota for high class increases until the expectation is satisfied. And all the methods achieve the proportional delay guarantees well. In order to compare the features of the two methods, the mean value of the measured delay ratio l1/l2 and the variance relative to δ1/δ2 = 0.5 are defined as follows: Ee = ∑300 k=75 l1(k)/l2(k) 300 − 75 (22) De = ∑300 k=75 [l1(k)/l2(k) − δ1(k)/δ2(k)]2 (23) Compute Equation (22) and (23), we get Ee = 0.5198,De = 4.56 under the PI controller,while Ee = 0.4136 ,De = 5.5376 under the M/G/1/∞ predictive controller.Now we can see 640 Y. Hu, D. Mu, A. Gao, G. Dai 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 500 1000 1500 2000 2500 3000 20 30 40 50 60 70 80 d e la y r a ti o th re a d q u o ta delay ratio high class thread low class thread (a) PI control 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 500 1000 1500 2000 2500 3000 20 30 40 50 60 70 80 d e la y r a ti o th re a d q u o ta delay ratio high class thread low class thread (b) M/G/1/∞ queuing predictive control Figure 3: The change of delay ratio and thread quota under different controllers 1. As a passive control, the PI controller works until the deviation appeared. Although the output is stabilized in the vicinity of the expectation value, there is a long setting time ts ≈ 300s. 2. In contrast, the latter with a much better setting time ts ≈ 40s only maintains the delay proportion at 0.4. This is because the queuing theory based on PK formula asks for the statistics balance which means the requests arrival rate is equal to service rate. So the model limitation leads to the ineffaceable deviation. 3 An Improved Predictive Control From Section 2, the queuing theory is invalid when the Web loads bursts and fluctuates dramatically. Meanwhile, for the Web queuing process, the number of requests in the front of request j can be measured. On this issue, an improved predictive control is proposed in this paper to achieve the relative delay guarantees. 3.1 Design The relationship of the requests arrival rate, service rate and the queue length of class i are demonstrated in Figure 4(see [12]).The arrival curve Arrviali(k) presents the rate of TCP connections, and the service curve Servicei(k) presents the capacity of Web server for class i. ∆Arrivali(k) ∆k = λi(k) (24) ∆Servicei(k) ∆k = µi(k) (25) The Research of QoS Approach in Web Servers 641 When classi is overload, the arrival rate is always greater than the Web service rate,i.e. λi(k) > µi(k).So the vertical distance between the two curves is the actual length ni(k) of waiting queue i at the kth sampling time, which is ni(k) = Arrivali(k) − Servicei(k). From the view of the tail, the waiting time wi,ni−1is the sum of the processing time of the top (ni − 1) requests. Considered the processing time χi,ni itself, the mean residence time is li,ni = wi,ni−1 + χi,ni = ∑ni j=1 χi,j (26) If the mean residence time li(k+1) can be predicted as l̃i(k) based on the queue length ni(k), we can adjust the thread quota to satisfy Equation (1). Q u e u e L e n g th time ( ) i l k ( ) i l k Service Curve Arrival Curve ( ) i n k k 1k 1k Figure 4: The arrival curve and service curve 3.2 Dealy Prediction The Web load is actually the superposition of lots of ON/OFF process(see [11]). Just like Figure 5, Web pages are transferred during Object, and each page is composed of several em- bedded files which are transferred in a single TCP persistence connection under HTTP 1.1. The intervals between every embedded URL are Active OFF time corresponding to the processing time spent by browse parsing Web page. The Inactive OFF time is a longer pause corresponding to clients’ "thinking procedure". Therefore, the request of a Web page consists a sequence of HTTP requests with self-similarity, which is Embedded URL ON Object Inactive OFF Active OFFURL time Figure 5: HTTP1.1 ON/OFF model si,n = ∑yi,n j=1 xi,jn = 1, 2, ...ni(k) (27) Where si,n means the size of the nth Web page in waiting queue i, xi,j is the size of embedded files, and yi,n is the number of its embedded files. The previous research indicates that the heavy- tailed distribution is more accuracy for Web page (see [13] and [14]),and the probability mass function is: 642 Y. Hu, D. Mu, A. Gao, G. Dai f(x) = αkα 1 − (k/p)α x−α−1, k < x < p (28) Where k and p are the size of minimum and maximal file,α(0 < α < 2) is the sharp parameter which determine the variability of distribution. Generally set 1 < α < 2to correspond to the actual communications. The number of embedded files yi,n follows Pareto distribution (see [11]).yi,n and xi,j,xi,j itself are all independent from each other. The mathematical expectation of page sizesiis E[si] = E[yi]E[xi], i = 1, 2, ...N (29) Let {si,1, si,2, ..., si,ni(k)}be the size of HTTP requests on a TCP connection at the k thsample. Then the total size of the page in waiting queue i to be transferred is ∆i(k) = ∑ni(k) n=1 si,n = ∑ni(k) n=1 ∑yi,n j=1 xi,j (30) This means that the forecasting delay is the time spent by ci(k) threads concurrently pro- cessing the files of size ∆i(k). The paper [15] illustrates that the processing time of a static Web page is approximately linear to the size of request files. Combined the condition that the unit of the operation system is 64KB, the total size of the page to be transferred is χ(s, ci) = b × (s/64) + d × ci (31) Where b is the transmission coefficient, d is the overhead of threads switching and synchro- nization. They can be obtained by linear regression. So the total delay predicted is: li(ni(k), ci(k)) = ∑ni(k) n=1 χi(si,n, ci(k))/ci(k) (32) l̃i(k) = E[li(ni(k), ci(k))] = ni(k) 64 ( bE[si] ni(k) + 64d) (33) 3.3 Threads Dispatch Combined Equation (33) with Equation (1): l̃i(k) l̃i+1(k) = δi δi+1 = ni(k)ci+1(k) ni+1(k)ci(k) bE[si] + 64dci(k) bE[si+1] + 64dci+1(k) (34) Regardless the priority, Web pages stored in server is identical. So we have E[xi] = E[xi+1]. If Zi(k) = ni(k)/ni+1(k),e = bE[si],f = 64d,Equation (34) can be rewritten as δi δi+1 = Zi(k) fci(k)ci+1(k) + eci+1(k) fci(k)ci+1(k)+eci(k) (35) Where Zi(k) is obtained by the queue length observer, and Equation (35) can be written into the form of ci+1 = Fi(ci(k), Zi(k)). Take class 1 be the reference (δ1 = 1).Considered the Equation (2), the number of threads for each class is solved. The Research of QoS Approach in Web Servers 643 3.4 Parameter Regression As described in Section 3.2, the parameters b and din Equation (31) are calculated by the Least Square algorithm. The sampling period is set to 10 sec and the parameter regression experiments totally obtain 61 group valid data. Figure 6 presents the pages size, the predict delay and the measured delay when the thread quota varies. The identification results are b = 9.7,d = 55.7 under 0.05 significant level. 20 40 60 80 100 120 0 20 40 60 80 100 20 40 60 80 D e la y ( m s ) / fi le s iz e ( K B ) th re a d q u o ta time(/sec) estimate processing time file size thread Figure 6: Parameter regression 3.5 Experimental Results The test-bed has been shown in the section 3.3. Request rate of class1 varies from 6/s to 15/s every 300 seconds and class 2 keeps 12/s. The sampling time is T = 10s and all the experiments last 2000s. The reference value is 0.5, i.e. δ1/δ1 = 0.5. The thread quota is equivalent for each class in the initial state. The results under different controllers are shown in Figure 7. Each sub-picture has two insets. The upper is the request arrival rate of two classes and their delay proportion. The lower presents the change of their thread quota. Compared the controllers, the results are in accordance with what we analyzed before: 1. Whatever using feedback, M/G/1/∞ predictive control or improved predictive control, they all achieve PDD in Web server. The experiments also did at sampling time T = 8s and T = 15s which are similar. So In order to save space, we only give the best effect figure at T = 10s. 2. Define the variance Ξ be the stability evaluation index ,which is Ξ = √∑n k=1 (yi(k) − y1d)2 / n (36) Compute it under the different methods respectively, we find that the variance of propor- tional delay to 0.5 in improved predictive controller and M/G/1/∞ predictive controller is 40.18% and 57.4% of which in PI controller. This means the predictive controller is more stable when the load changes. 3. Compared the setting time in Figure 7a,7b and 7c. For the feedback, the setting time is nearly 100s at rising edge and 150s at the falling edge. The M/G/1/∞ predictive controller is almost the same situation but a little shorter. But those of the improved predictive controller are less than 1s at both edges, which presents a much better dynamic 644 Y. Hu, D. Mu, A. Gao, G. Dai 6 8 10 12 14 16 0 500 1000 1500 2000 0 0.2 0.4 0.6 0.8 1 re q u e s t ra te p e r s e c o n d s (1) delay ratio class1 request rate class2 request rate service time ratio 30 35 40 45 50 55 60 65 70 0 500 1000 1500 2000 q u o ta (2) quota variation class1 thread class2 thread (a) PI control 6 8 10 12 14 16 0 500 1000 1500 2000 0 0.2 0.4 0.6 0.8 1 re q u e s t ra te p e r s e c o n d s (1) delay ratio class1 request rate class2 request rate service time ratio 30 35 40 45 50 55 60 65 70 0 500 1000 1500 2000 q u o ta (2) quota variation class1 thread class2 thread (b) M/G/1/∞ queuing predictive control 6 8 10 12 14 16 0 500 1000 1500 2000 0 0.2 0.4 0.6 0.8 1 re q u e s t ra te p e r s e c o n d s (1) delay ratio class1 request rate class2 request rate service time ratio 30 35 40 45 50 55 60 65 70 0 500 1000 1500 2000 q u o ta (2) quota variation class1 thread class2 thread (c) Improved predictive control Figure 7: The change of delay ratio and thread quota under different controllers performance. There are two reasons for this. One is the load changes influence waiting queue length first, and then this leads to the delay changes observed which exists delay. The other is the refreshed controller output works in the next sample time, while both the predictive controllers adjust the quota early. 4. Finally, compared how they adjust the thread quota when load changes in Figure 7b and 7c. Because of the limitation of PK formula, it takes a long time to adjust the quota to the steady state in Figure 7b. But the improved predictive controller can complete the The Research of QoS Approach in Web Servers 645 procedure in just "one step". 4 Conclusion and Future Work The paper first presents a general Web QoS architecture, and then analyzed the defects of two traditional differentiated methods by experimental results. Finally, based on the queue length observer and parameter regression, improved predictive controller is proposed and produces a much better performance in real-time and stability. But there are still some problems. For example, if the parameters b and d obtained online, it will be more accurate to the Web model. Meanwhile, the paper only talked about the static requests and is not compatible to the dynamic loads. In the future, we should also break the local area network, and take the network congestion into account. Bibliography [1] J. Wei, C. Xu, eQoS:Provisioning of client-perceived end-to-end qos guarantees in web servers. IEEE Transactions on Computers, Vol.55, No.12, pp.1543-1556,2006. [2] Y. Lu, T. Abdelzaher, C. Lu, L. Sha, X. Liu, Feedback control with queuing theoretic pre- diction for relative delay guarantees in web servers, The 9th IEEE Real-Time and Embedded Technology and Applications Symposium, Washington, DC., USA,pp.208-217,2003. [3] K.H. Chan, X. Chu, Design of a Cluster-Based Web Server with Proportional Connection Delay Guarantee, The IEEE International conference on Communications, Beijing,China, pp.5692-5696,2008. [4] W. Pan, D. Mu, H. Wu, Q. Sun, Proportional Delay Differentiation Service in Web Applica- tion Servers: A Feedback Control Approach, International Journal of Intelligent Information Technology Application, Vol.1, No.1,pp.37-42,2008. [5] Y. Lu, T.F. Abdelzaher, A.Saxena, Design, implementation, and evaluation of differenti- ated caching services, IEEE Transactions on Parallel and Distributed Systems, Vol.15, No.5, pp.440-452,2004. [6] J. Wei, X. Zhou, C. Xu, Robust processing rate allocation for proportional slowdown differen- tiation on Internet servers, IEEE Transactions on Computers, Vol.54, No.8,pp.964-977,2005. [7] L.Sha, X. Liu, U.Y. Lu, T. Abdelzaher, Queueing model based network server performance control, The Proceedings Real-Time Systems Symposium, Austin, USA,pp.81-90,2002. [8] C. Lu, T. Abdelzaher, J. Stankovic, S. Son, A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers, The 7th Real-Time Technology and Applications Symposium, Taipei, Taiwan,pp.51-62,2001. [9] Y. Tang, X. Tang, The queuing theory foundation and analysis, Science Press. Beijing. 2005. [10] X. Zhou, J. Wei, C.Z. Xu, Processing rate allocation for proportional slowdown differentia- tion on Internet servers. The 18th International Parallel and Distributed Processing Sympo- sium, Santa Fe, USA, pp.1247-1256,2004. [11] P. Barford, M. Crovella, Generating Representative Web Workloads for Network and Server Performance Evaluation, Proceedings of the 1998 Joint International Conference on Measure- ment and Modeling of Computer Systems, Madison, USA,pp.151-160,1998. 646 Y. Hu, D. Mu, A. Gao, G. Dai [12] J. Liebeherr, N. Christin, JoBS: Joint buffer management and scheduling for differentiated servers, Proceedings of the 9th International Workshop on Quality of Service, Karlsruhe, German,pp.404-418,2001. [13] M. Harchol-Balter, Task assignment with unknown duration, Journal of the ACM, Vol.49, No.2, pp.260-288,2002. [14] M. Arlitt, T. Jin, A workload characterization study of the 1998 world cup web site, IEEE network, Vol.14,No.3, pp.30-37,2000. [15] TF Abdelzaher, N. Bhatti, Web server QoS management by adaptive content delivery, Proceedings of the 7th International Workshop on Quality of Service, London,UK, pp.216- 225,1999. [16] A. Gao, D. Mu, Y. Hu, W. Pan, Proportional Delay Guarantee in Web QoS Based on Predictive Control, Proceedings of iCISE 2009, Nanjing,China,pp.173-176,2009.