Integrated Monitoring and Control for Performance Management of Distributed Enterprise Systems

Rajat Mehrotra*, Abhishek Dubey†, Sherif Abdelwahed*, Asser Tantawi‡
* Electrical and Computer Engineering, Mississippi State University, Miss. State, MS
† Institute for Software Integrated Systems, Vanderbilt University, Nashville, TN
‡ IBM TJ Watson Research Center, Hawthorne, NY

I. INTRODUCTION

Self-managing techniques in distributed systems have been investigated recently for optimizing operational cost and multidimensional quality of service metrics including response time, throughput and reliability. Practical applications include task scheduling [1], bandwidth allocation and QoS adaptation in web servers [2], load balancing in e-mail and file servers [3], and CPU provisioning [4]. The challenge in developing these techniques so that they remain valid even when the systems are under uncertain and dynamic operating conditions is to identify and learn the system model. Only then can we develop an appropriate management structure.

Contribution of Our Work: In this paper, we present an integrated framework that estimates workload patterns and system performance using mathematical models, and that uses model-predictive control to manage the system's Quality of Service (QoS) parameters. Our approach starts with system model identification through extensive experimentation and then identifies the parameters and underlying model structure of the system using regression and queuing theory techniques. Later, the values of the model parameters are estimated and refined using Kalman filters (KF). We use autoregressive moving average filters for workload forecasting. The effectiveness of our approach has been demonstrated by using a model predictive controller to minimize the power consumption of a multi-tier enterprise system while maintaining the system response time within a desired level. A detailed description of these results is available as a technical report [5].
System Setup: This work utilizes IBM WebSphere Application Server Community Edition with Daytrader as the representative application. A modified version of Httperf is used to generate client requests (see [5]). Table I summarizes the configuration of the physical machines. Numerous experiments were performed to understand the system behavior with respect to system utilization, various workload profiles, and bottleneck resource utilization, and their impact on system performance, in order to develop analytical models. The next subsections describe them in detail.

II. SYSTEM MODELING APPROACH

Queuing Model With Runtime Estimation of Various Parameters: Queues are a useful abstraction for understanding the nature of web servers. Typically, a new web request has to wait in a queue for older requests to release computational resources before it enters the system. Therefore, the total service time of the enterprise system is directly affected by the queuing policy at each tier. During these tests, we used an equivalent open single-tier queuing model, shown in Fig. 1, to approximate the combined behavior of all tiers. Here S is the average service time for each request, and D is a delay corresponding to the time taken to process the request in all subsequent tiers.

Fig. 1. An equivalent queuing model for the two-tier system.

To obtain the relationship between the state vector and the observation vector on-line, we consider a Processor Sharing (PS) queue system. From experiments, we found that this model provides a good estimate of the system behavior and is easier to analyze than a Limited Processor Sharing (LPS) queue system.
We implemented an exponential Kalman filter to predict the computational nature of the requests incident on the web server. The filter estimates S and D of a request by observing the current average response time of the incident requests and the request arrival rate at the web server. It uses an M/G/1/∞ PS queuing model and considers the variation in S and D at the previous approximation to estimate S and D at the next sample time. It operates on the exponential transformation of the system state variables, which allows us to enforce the feasibility constraints S, D ≥ 0. Such constraints cannot be enforced in typical Kalman filter implementations, as described in [6].

Note: We can approximate the system as an M/G/1/∞ PS queue only if the total number of requests in the system is less than the maximum concurrency limit, or the bottleneck resource utilization is less than 1, which represents an infinite PS queue model. Hence, we identify the operating regions through bottleneck utilization and analyze the system in the infinite PS queue region only.

The KF equations are written in terms of the exponentially transformed variables $x_1, x_2 \in \mathbb{R}$, such that $S = \exp(x_1)$ and $D = \exp(x_2)$.
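As a rough illustration of how a filter on transformed variables can keep S and D positive, the following is a minimal sketch and not the authors' implementation. It assumes an extended-Kalman-style linearization, a random walk applied directly to the log-state (log S, log D), and the PS observation model in which the mean response time is S / (1 - λS) + D. The function names, the 2x2 list representation of the covariance, and all numeric values are hypothetical.

```python
import math

def to_state(S, D):
    """Map positive S, D to the unconstrained log-state [x1, x2]."""
    return [math.log(S), math.log(D)]

def from_state(x):
    """Recover S = exp(x1), D = exp(x2); both are positive by construction."""
    return math.exp(x[0]), math.exp(x[1])

def ps_response_time(S, D, lam):
    """Mean response time of the infinite PS queue plus downstream delay."""
    return S / (1.0 - lam * S) + D

def ekf_step(x, P, lam, T_obs, Q, R):
    """One predict/update step; P and Q are 2x2 lists, R is a scalar variance."""
    # Predict: random walk on the log-state (assumption), covariance grows by Q.
    P = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    S, D = from_state(x)
    rho = lam * S  # predicted bottleneck utilization
    if rho >= 1.0:
        return x, P  # outside the infinite PS region: skip the update
    # Jacobian of the observation with respect to (x1, x2).
    H = [S / (1.0 - rho) ** 2, D]
    PHt = [P[0][0] * H[0] + P[0][1] * H[1], P[1][0] * H[0] + P[1][1] * H[1]]
    innov_var = H[0] * PHt[0] + H[1] * PHt[1] + R
    K = [PHt[0] / innov_var, PHt[1] / innov_var]  # Kalman gain
    y = T_obs - ps_response_time(S, D, lam)       # innovation
    x_new = [x[0] + K[0] * y, x[1] + K[1] * y]
    # Covariance update P = (I - K H) P.
    HP = [H[0] * P[0][0] + H[1] * P[1][0], H[0] * P[0][1] + H[1] * P[1][1]]
    P_new = [[P[i][j] - K[i] * HP[j] for j in range(2)] for i in range(2)]
    return x_new, P_new
```

Because the update acts on log S and log D, any real-valued correction maps back to strictly positive S and D, which is the property the exponential transformation is meant to provide.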
Note that this transformation ensures $S, D \in \mathbb{R}^+$. For a given time index of observation $k$, the equations

$$\begin{bmatrix} \exp(x_{1,k}) \\ \exp(x_{2,k}) \end{bmatrix} = \begin{bmatrix} \exp(x_{1,k-1}) \\ \exp(x_{2,k-1}) \end{bmatrix} + N(0, Q)$$

and

$$T = \frac{\exp(x_{1,k})}{1 - \lambda_k \exp(x_{1,k})} + \exp(x_{2,k}) + V(0, R)$$

define the state update dynamics and the observation, respectively. $N$ and $V$ are Gaussian process and measurement noises with mean zero and covariances $Q$ and $R$ respectively. The assumption that this process is Gaussian in nature is based on our limited observation. We do not claim that this will be applicable in all situations. The predicted bottleneck utilization is given by $\hat{\rho}_k = \lambda_k \exp(x_{1,k})$.

Note: For stability reasons, and because of the infinite PS queue assumption, the Kalman filter does not update its state when the predicted bottleneck resource utilization becomes equal to 1.

TABLE I. PHYSICAL MACHINE CONFIGURATION.

| Physical M/C | Cores | Description | RAM | DVFS | Virtual Machines |
|---|---|---|---|---|---|
| Nop01 | 8 | 2 Quad core 1.9GHz AMD Opteron 2347 HE | 8GB | No | Nop04, Nop07 (Development Machines) |
| Nop02 | 4 | 2.0 GHz Intel Xeon E5405 processor | 4GB | No | Nop05, Nop08 (Client Machines) |
| Nop03 | 8 | 2 Quad core 1.9GHz AMD Opteron 2350 | 8GB | Yes | Nop06, Nop09 (Application server) |
| Nop10 | 8 | 2 Quad core 1.9GHz AMD Opteron 2350z | 8GB | Yes | Nop11, Nop12 (Database Server) |

Model for Power Consumption: We extend our previous power consumption model described in [6] with additional parameters and data to achieve greater accuracy. The main observation during our experiments (details available in [5]) was that the power consumption model of a physical machine is non-linear, because power consumption in these machines depends not only upon the CPU core frequency and utilization, but also non-linearly on other power-consuming devices, e.g. the hard drive and the CPU cooling fan. As a result, a look-up table with near neighbor interpolation was used as the power consumption model of the physical machine.
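A look-up table of this kind could be sketched as follows. This is only an illustration under stated assumptions: the table is keyed by (CPU frequency in GHz, aggregate CPU utilization in [0, 1]), the query returns the power value of the nearest key after scaling each axis by its range, and every wattage in `POWER_TABLE` is made up for the example rather than measured.

```python
def nearest_neighbor_power(table, freq, util):
    """Return the power value stored at the table key closest to (freq, util).

    table: dict mapping (frequency, utilization) -> power in watts.
    Each axis is scaled by its range so that neither dominates the distance.
    """
    keys = list(table)
    f_span = (max(k[0] for k in keys) - min(k[0] for k in keys)) or 1.0
    u_span = (max(k[1] for k in keys) - min(k[1] for k in keys)) or 1.0

    def dist(key):
        return ((key[0] - freq) / f_span) ** 2 + ((key[1] - util) / u_span) ** 2

    return table[min(keys, key=dist)]

# Hypothetical table entries (illustrative values only, not measurements).
POWER_TABLE = {
    (1.0, 0.0): 180.0,
    (1.0, 1.0): 200.0,
    (1.9, 0.0): 190.0,
    (1.9, 1.0): 240.0,
}

estimate = nearest_neighbor_power(POWER_TABLE, 1.8, 0.9)
```

A table lookup avoids fitting a closed-form expression to a relationship the experiments found to be non-linear, at the cost of needing enough measured points to cover the operating range.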
The combination of CPU frequency and aggregate CPU core usage of the physical machine was used as the key of the lookup table to access the corresponding power consumption value. This aggregate power model was utilized mainly for the controlled experiments described in Section IV.

III. THE CONTROLLER

This section describes the implementation of a feedback-control-based online predictive controller that uses the Kalman filter and the queuing model identified in the previous section to maintain the multi-dimensional QoS demands. This controller is similar to the L0 Controller described in [7]. It predicts the aggregate response time of the incident requests and the estimated power consumption of the system during the next sample time (look-ahead horizon N) for different possible combinations of control inputs (CPU core frequency). It optimizes the system behavior in terms of QoS objectives by continuously observing the system measurements and choosing the best control input for the system in the next sample interval.

System Variables: We have chosen a small set of the most relevant parameters from the list in [5] for our predictive controller to show the performance of our modeling approach. The chosen control input is the CPU core frequency, because it affects system performance in multiple dimensions: the response time of the system and its power consumption. Experiments from [5] indicate that a higher application queue size represents contention for the computational resources of the application, and that the total response time indicates the system's capability to process the requests lying in the system queue in a timely manner. Therefore system queue size and response time were the chosen state variables, while power consumption was the performance variable. These are also the typical variables measured in the web service industry and used to define multi-dimensional service level agreements (SLA).
Control Objective: The controller minimizes the application queue size and the total response time as components of the cost function J (described later in this section).

Plant Model: The queuing model identified in the previous section was used to estimate the state of the managed system.

Controller Model: To combine the power consumption, QoS and the predicted response time, the controller uses a different internal model (not the same as the plant model, which is used for state estimation). The controller model uses the estimated system state, predicted response time and predicted power consumption to make the system decisions. The Kalman filter is used at run-time to estimate the service time $\hat{S}_t$ of the incident requests at the current frequency $u(t)$, which is then used by the controller to estimate the average service time for the next sampling interval.

Request Forecaster: An autoregressive moving average model is used as the estimator of the environmental input, with user-specified weights on the current and previous arrival rates for accurate prediction.

Control Algorithm and Performance Specification: We use a limited look-ahead controller algorithm, which is a type of model predictive control. Starting from a time $t_0$, the controller solves an optimization problem defined over a predefined horizon ($t = 1 \ldots N$) and chooses the first control input (CPU core frequency $u(t_0)$) that minimizes the total cost J of operating the system within the prediction horizon. During this work, we set the horizon to N = 2 to reduce the computation overhead.

The cost function J is a weighted combination of the drift of the system state from the desired set point (desired maximum queue size, desired maximum response time) and the power consumption (the desired power consumption is 0).
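The limited look-ahead search and weighted cost described above can be sketched roughly as follows. This is a simplified illustration, not the controller used in the experiments. It assumes that service time scales inversely with CPU frequency, that the queue evolves by a simple fluid approximation, that drift is penalized only when a state exceeds its set point, and that the arrival forecast is held constant over the horizon. All function names, parameters and the `power_of` callable are hypothetical.

```python
from itertools import product

def forecast_arrival(history, weights):
    """Weighted combination of the most recent arrival rates (newest last)."""
    recent = history[-len(weights):]
    return sum(w * r for w, r in zip(weights, recent))

def step_model(q, freq, lam, S_ref, f_ref, D, dt):
    """Predict (next queue size, response time, utilization) for one interval."""
    S = S_ref * f_ref / freq  # assumption: service time scales as 1 / frequency
    rho = lam * S
    t = S / (1.0 - rho) + D if rho < 1.0 else float("inf")
    q_next = max(0.0, q + (lam - 1.0 / S) * dt)  # fluid queue approximation
    return q_next, t, min(rho, 1.0)

def choose_frequency(q0, lam, freqs, power_of, S_ref, f_ref, D, dt,
                     q_max, t_max, weights=(1.0, 1.0, 1.0), horizon=2):
    """Enumerate all frequency sequences over the horizon; return the first
    input of the sequence with the lowest total cost J."""
    w_q, w_t, w_p = weights

    def total_cost(seq):
        q, cost = q0, 0.0
        for freq in seq:
            q, t, util = step_model(q, freq, lam, S_ref, f_ref, D, dt)
            cost += (w_q * max(0.0, q - q_max)      # queue drift above set point
                     + w_t * max(0.0, t - t_max)    # response-time drift
                     + w_p * power_of(freq, util))  # power (desired value is 0)
        return cost

    best = min(product(freqs, repeat=horizon), key=total_cost)
    return best[0]
```

With N = 2 and a handful of discrete frequencies, exhaustive enumeration stays cheap, which is consistent with the stated motivation of keeping the horizon short to limit computation overhead.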
The power consumption is predicted with the help of the lookup table generated from the system power consumption model, the current frequency of the CPU core, and the aggregate system utilization of the physical server.

IV. CASE STUDY: POWER CONSUMPTION AND RESPONSE TIME MANAGEMENT

This section uses the concepts introduced in the earlier sections for managing server power consumption while maintaining the predefined QoS requirement of minimum response time under a time-varying dynamic workload for an application hosted in a virtualized environment [5]. During this study, we performed two separate experiments to operate a multi-tier enterprise service (Daytrader), described in [5], with and without the predictive controller, and compared the cost of operating the system in terms of response time and power consumption over a period of four hours.

Fig. 2. Comparison of results with and without controller. Sampling period = 30 seconds. Standard deviation for all response measurements = 0.02 seconds (without controller), 0.019 seconds (with controller).

Fig. 3. Online exponential Kalman filter output corresponding to the experiment with controller. Service time and delay are in the millisecond range. Response time is specified in seconds.

Analysis of the controller results: Fig. 3 shows the Kalman filter tracking the average response time of the incident requests and the bottleneck utilization with high accuracy. According to sub-figure 3, the predicted response time from the Kalman filter, $T_{pred}$, and the actual response time $T$ observed at the web server are also very close to each other, which indicates the accuracy of the Kalman filter estimation. The controlled version runs at a lower frequency most of the time, which results in a considerable power saving (18%) over the four hours of the experiment (Fig. 2) compared to the baseline experiment without the controller, which runs at the maximum frequency all the time.
Fig. 3 shows the controller changing the frequency of the CPU core on very few occasions, yet it is able to identify the sudden increase in the incident request rate, which reflects the adaptive nature of the controller under dynamic load conditions. This experiment shows that the predictive controller has a negligible negative effect on the response time as well as on CPU and memory utilization (neither shown in the figure), but greatly reduces the power consumption.

V. CONCLUSION

We have presented a simple and novel approach to develop models with low variance for multi-tier enterprise systems. We showed that the developed model can be integrated with a predictive control framework for dynamically changing the system tuning parameters to achieve a pre-specified QoS objective. The results in Section IV show that the developed Kalman filter tracks the system model parameters at run time with high accuracy. Additionally, the proposed power consumption model of the system used by the controller predicts the overall physical server power consumption well (95% accurate). Using this model, we showed that we can optimize system performance and achieve an 18% reduction in power consumption over a four-hour experiment on a single server without severely affecting the response time. Furthermore, the experimental results (CPU and RAM consumption with and without the controller) indicate that the proposed approach has low run-time overhead in terms of computational and memory resources. We further plan to extend this framework and verify its performance over a cluster of multi-tier computing systems in a hierarchical fashion as described in [7].

Acknowledgment: This work was supported in part by the NSF SOD Program, contract number CNS-0804230.

REFERENCES

[1] Anton Cervin, Johan Eker, Bo Bernhardsson, and Karl-Erik. Feedback–feedforward scheduling of control tasks. Real-Time Syst., 23(1/2):25–53, 2002.
[2] T.F. Abdelzaher, K.G. Shin, and N. Bhatti. Performance guarantees for web server end-systems: a control-theoretical approach. Parallel and Distributed Systems, IEEE Transactions on, 13(1):80–96, Jan 2002.
[3] Chenyang Lu, Guillermo A. Alvarez, and John Wilkes. Aqueduct: Online data migration with performance guarantees. In FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies, page 21, Berkeley, CA, USA, 2002. USENIX Association.
[4] Dara Kusic, Nagarajan Kandasamy, and Guofei Jiang. Approximation modeling for the online performance management of distributed computing systems. In ICAC '07: Proceedings of the Fourth International Conference on Autonomic Computing, page 23, 2007.
[5] Rajat Mehrotra, Abhishek Dubey, Sherif Abdelwahed, and Asser Tantawi. Model identification for performance management of distributed enterprise systems. Technical Report ISIS-10-104, Institute for Software Integrated Systems, Vanderbilt University, April 2010.
[6] Abhishek Dubey, Rajat Mehrotra, Sherif Abdelwahed, and Asser Tantawi. Performance modeling of distributed multi-tier enterprise systems. SIGMETRICS Performance Evaluation Review, 37(2):9–11, 2009.
[7] N. Kandasamy, S. Abdelwahed, and M. Khandekar. A hierarchical optimization framework for autonomic performance management of distributed computing systems. In Proc. 26th IEEE Int'l Conf. Distributed Computing Systems (ICDCS), 2006.