
Showing papers on "Service level objective published in 2021"


Journal ArticleDOI
TL;DR: In this article, a hybrid simulation and container orchestration framework is proposed to optimize Quality of Service (QoS) parameters in large-scale fog platforms, driven by a gradient-based optimization strategy that back-propagates gradients with respect to the input.
Abstract: Intelligent task placement and management of tasks in large-scale fog platforms is challenging due to the highly volatile nature of modern workload applications and sensitive user requirements of low energy consumption and response time. Container orchestration platforms have emerged to alleviate this problem with prior art either using heuristics to quickly reach scheduling decisions or AI driven methods like reinforcement learning and evolutionary approaches to adapt to dynamic scenarios. The former often fail to quickly adapt in highly dynamic environments, whereas the latter have run-times that are slow enough to negatively impact response time. Therefore, there is a need for scheduling policies that are both reactive to work efficiently in volatile environments and have low scheduling overheads. To achieve this, we propose a Gradient Based Optimization Strategy using Back-propagation of gradients with respect to Input (GOBI). Further, we leverage the accuracy of predictive digital-twin models and simulation capabilities by developing a Coupled Simulation and Container Orchestration Framework (COSCO). Using this, we create a hybrid simulation driven decision approach, GOBI*, to optimize Quality of Service (QoS) parameters. Co-simulation and the back-propagation approaches allow these methods to adapt quickly in volatile environments. Experiments conducted using real-world data on fog applications using the GOBI and GOBI* methods, show a significant improvement in terms of energy consumption, response time, Service Level Objective and scheduling time by up to 15, 40, 4, and 82 percent respectively when compared to the state-of-the-art algorithms.
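As a rough illustration of the back-propagation-to-input idea behind GOBI, the sketch below optimizes a soft task-to-host placement by descending the gradient of a differentiable QoS surrogate with respect to the placement itself; the surrogate network, its dimensions, and the discretization step are hypothetical stand-ins, not the trained digital-twin model from the paper.

```python
import torch

n_tasks, n_hosts = 4, 3

# Differentiable QoS surrogate: maps a flattened placement matrix to a scalar cost
# (e.g., a weighted mix of predicted energy and response time). Untrained here.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(n_tasks * n_hosts, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)

# Soft task-to-host placement, optimized directly through gradients w.r.t. the input.
logits = torch.zeros(n_tasks, n_hosts, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(50):
    placement = torch.softmax(logits, dim=1)        # each task's host probabilities
    cost = surrogate(placement.flatten()).squeeze() # predicted QoS cost
    opt.zero_grad()
    cost.backward()                                 # back-propagate w.r.t. the decision
    opt.step()

schedule = torch.softmax(logits, dim=1).argmax(dim=1)  # discretize: one host per task
print(schedule.tolist())
```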

33 citations


Proceedings ArticleDOI
01 Sep 2021
TL;DR: In this paper, the authors present SLO Script, a language and accompanying framework, motivated by real-world, industrial needs, to define complex, high-level SLOs in an orchestrator-independent manner.
Abstract: Service Level Objectives (SLOs) allow defining expected performance of cloud services, such that cloud service providers know what they guarantee and service consumers know what to expect. Most approaches focus on low-level SLOs, closely related to resources, e.g., average CPU or memory usage, and are usually bound to specific elasticity controllers. We present SLO Script, a language and accompanying framework, motivated by real-world, industrial needs to allow service providers to define complex, high-level SLOs in an orchestrator-independent manner. The main features of SLO Script include: i) novel abstractions (StronglyTypedSLO) with type safety features, ensuring compatibility between SLOs and elasticity strategies, ii) abstractions that enable decoupling of SLOs from elasticity strategies, iii) a strongly typed metrics API, and iv) an orchestrator-independent object model that enables language extensibility. We present a case study about a real-world, cloud-native application and evaluate our language while implementing a realistic Cost Efficiency SLO.
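The StronglyTypedSLO idea can be pictured with a small typed sketch: an SLO produces a typed output, and only elasticity strategies declared to accept that output type can be bound to it. The Python names and types below are illustrative stand-ins, not the SLO Script API.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar

T = TypeVar("T", contravariant=True)

@dataclass
class SloCompliance:
    """Generic SLO output: 100 means exactly on target."""
    current_compliance_pct: float

class ElasticityStrategy(Protocol[T]):
    def apply(self, slo_output: T) -> None: ...

@dataclass
class CostEfficiencySlo:
    target_requests_per_dollar: float

    def evaluate(self, observed_requests_per_dollar: float) -> SloCompliance:
        ratio = observed_requests_per_dollar / self.target_requests_per_dollar
        return SloCompliance(current_compliance_pct=100 * ratio)

class HorizontalScaleStrategy:
    def apply(self, slo_output: SloCompliance) -> None:
        # A strategy only sees the SLO's typed output, not the SLO itself.
        print("scale out" if slo_output.current_compliance_pct < 100 else "scale in or hold")

def bind(slo_output: SloCompliance, strategy: ElasticityStrategy[SloCompliance]) -> None:
    strategy.apply(slo_output)  # a type checker rejects strategies with incompatible inputs

bind(CostEfficiencySlo(target_requests_per_dollar=5.0).evaluate(4.2), HorizontalScaleStrategy())
```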

22 citations


Proceedings ArticleDOI
01 Sep 2021
TL;DR: In this paper, the authors present a middleware that provides an orchestrator-independent SLO controller for periodically evaluating SLOs and triggering elasticity strategies, while decoupling SLOs from elasticity strategies to increase flexibility, and provider-independent services for obtaining low-level metrics and composing them into higher-level metrics.
Abstract: Service Level Objectives (SLOs) guide the elasticity of cloud applications, e.g., by deciding when and how much the resources provisioned to an application should be changed. Evaluating SLOs requires metrics, which can be directly measured on the application or system, or, more elaborately, be composed from multiple low-level metrics. The implementation of such metrics and SLOs, the triggering of elasticity strategies, and allowing configurability by the user deploying an application, requires a flexible middleware. In this paper, we present a middleware that provides an orchestrator-independent SLO controller for periodically evaluating SLOs and triggering elasticity strategies, while decoupling SLOs from the elasticity strategies to increase flexibility, and provider-independent services for obtaining low-level metrics and composing them into higher-level metrics. We evaluate our middleware by implementing a motivating use case, featuring a cost efficiency SLO for an application deployed on Kubernetes.
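A minimal sketch of such a periodic SLO control loop is shown below, assuming hypothetical metric sources and a placeholder elasticity trigger; it only illustrates the evaluate-then-trigger cycle, not the middleware's actual services.

```python
import random
import time

def cpu_cost_per_hour() -> float:          # stand-in for a provider metrics API
    return random.uniform(0.5, 2.0)

def requests_per_second() -> float:        # stand-in for an application-level metric
    return random.uniform(50, 300)

def cost_efficiency() -> float:
    """Composed higher-level metric: requests served per dollar per hour."""
    return requests_per_second() * 3600 / cpu_cost_per_hour()

def scale_out() -> None:
    print("triggering elasticity strategy: add one replica")

def slo_controller(target_efficiency: float, interval_s: float, iterations: int) -> None:
    for _ in range(iterations):
        observed = cost_efficiency()
        if observed < target_efficiency:
            scale_out()                     # SLO violated: hand off to the elasticity strategy
        else:
            print(f"SLO satisfied ({observed:,.0f} req/$)")
        time.sleep(interval_s)

slo_controller(target_efficiency=400_000, interval_s=0.1, iterations=5)
```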

19 citations


Journal ArticleDOI
TL;DR: An abstract model of CGs is described, together with a technique that leverages constraint satisfaction problem solvers to validate them automatically, paving the way for using CGs in a safer and more reliable way.
Abstract: A Service Level Agreement (SLA) regulates the provisioning of a service by defining a set of guarantees. Each guarantee sets a Service Level Objective (SLO) on some service metrics, and optionally a compensation that is applied when the SLO is unfulfilled or overfulfilled. Currently, there are software tools and research proposals that use the information about compensations to automate and optimise certain parts of the service management. However, they assume that compensations are well defined, which is too optimistic in some circumstances and can lead to undesirable situations. In this article we discuss the notion of validity of guarantees with a compensation, which we refer to as compensable guarantees (CG). We describe an abstract model of CGs and we provide a technique that leverages constraint satisfaction problem solvers to automatically validate them. We also present a materialisation of the model of CGs in iAgree, a language to specify SLAs, and tooling support that implements our whole approach. An assessment over 319 CGs taken from 24 real-world SLAs suggests that the expressiveness and effectiveness of our proposal can pave the way for using CGs in a safer and more reliable way.
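The validation idea can be approximated with a brute-force check over a discretized metric domain, as in the sketch below; the guarantee (an availability SLO with piecewise penalties) is invented for illustration, and a real constraint solver would replace the exhaustive loop.

```python
SLO_THRESHOLD = 99.0            # availability (%) the provider promises

def compensation(availability: float) -> float | None:
    """Penalty (% of monthly fee) owed for a given observed availability."""
    if availability >= 99.0:
        return 0.0
    if 95.0 <= availability < 99.0:
        return 10.0
    if availability < 95.0:
        return 25.0
    return None                  # unreachable here; a gap in the definition would show up as None

def validate(domain: range) -> list[str]:
    problems = []
    for value in domain:         # exhaustive check over the discretized metric domain
        c = compensation(float(value))
        if c is None:
            problems.append(f"no compensation defined for availability={value}")
        elif value >= SLO_THRESHOLD and c > 0:
            problems.append(f"penalty charged although SLO fulfilled at {value}")
    return problems

print(validate(range(0, 101)) or "guarantee looks consistent")
```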

13 citations


Journal ArticleDOI
TL;DR: A witness model implemented with smart contracts is proposed to solve the trust issue between cloud customer and provider; three types of malicious behavior are defined, along with quantitative indicators to audit and detect them.
Abstract: Trust is lacking between the cloud customer and provider when enforcing a traditional cloud SLA (Service Level Agreement), and the blockchain technique seems a promising solution. However, current explorations still face challenges in proving that off-chain SLO (Service Level Objective) violations really happened before they are recorded in on-chain transactions. In this paper, a witness model implemented with smart contracts is proposed to solve this trust issue. The introduced role, “Witness”, gains rewards as an incentive for reporting SLO violations, and the payoff function is carefully designed so that the witness has to tell the truth to maximize its rewards. The fact that the witness has to be honest is analyzed and proved using the Nash Equilibrium principle of game theory. To ensure the chosen witnesses are random and independent, an unbiased selection algorithm is proposed to avoid possible collusion. An auditing mechanism is also introduced to detect potentially malicious witnesses. Specifically, we define three types of malicious behavior and propose quantitative indicators to audit and detect these behaviors. Moreover, experimental studies based on the Ethereum blockchain demonstrate that the proposed model is feasible, and indicate that the performance, i.e., the transaction fee, of each interface follows the design expectations.
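The truth-telling incentive can be illustrated with a toy payoff calculation, as sketched below; the reward, penalty, and probability values are invented, and the payoff shape is only a simplified stand-in for the paper's carefully designed function.

```python
REWARD_MATCH = 10        # reward when a witness's report matches the majority verdict
PENALTY_MISMATCH = -12   # deposit lost when it does not
P_VIOLATION = 0.3        # probability that a violation truly occurred

def payoff(my_report: bool, others_truthful: int, violation: bool) -> float:
    # With at least one other truthful witness, the majority verdict equals the truth.
    majority_says_violation = violation if others_truthful >= 1 else my_report
    return REWARD_MATCH if my_report == majority_says_violation else PENALTY_MISMATCH

def expected_payoff(always_report_violation) -> float:
    """None = report truthfully; True/False = always report that value regardless of truth."""
    total = 0.0
    for violation in (True, False):
        p = P_VIOLATION if violation else 1 - P_VIOLATION
        report = violation if always_report_violation is None else always_report_violation
        total += p * payoff(report, others_truthful=2, violation=violation)
    return total

for strategy, label in [(None, "truthful"), (True, "always report"), (False, "never report")]:
    print(f"{label:>13}: expected payoff = {expected_payoff(strategy):+.1f}")
```

Under these toy numbers, the truthful strategy dominates both constant-reporting strategies, which is the intuition the Nash Equilibrium analysis formalizes.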

12 citations


Journal ArticleDOI
TL;DR: A novel scheduling algorithm called Bottleneck and Cost Value Scheduling (BCVS) is proposed, coupled with a novel dynamic data replication strategy called Correlation and Economic Model-based Replication (CEMR), to improve data access effectiveness and meet service level objectives.
Abstract: Task scheduling and data replication are highly coupled resource management techniques that are widely used by cloud providers to improve the overall system performance and ensure service level agreement (SLA) compliance while preserving their own economic profit. However, balancing the trade-off between system performance and provider profit is very challenging. In this paper, we propose a novel scheduling algorithm called Bottleneck and Cost Value Scheduling (BCVS) coupled with a novel dynamic data replication strategy called Correlation and Economic Model-based Replication (CEMR). The main goal is to improve data access effectiveness in order to meet service level objectives in terms of response time (SLO_RT) and minimum availability (SLO_MA), while preserving the provider profit. The BCVS algorithm focuses on reducing system bottleneck situations caused by data transfer, while the CEMR focuses on preventing future SLA violations and guaranteeing a minimum availability. An economic model is also proposed to estimate the cloud provider profit. Simulation results indicate that the proposed combination of scheduling and replication algorithms offers higher monetary profit for the cloud provider, by up to 30% compared to existing strategies. Moreover, it delivers better performance.

12 citations


Journal ArticleDOI
TL;DR: An in-depth empirical investigation into the scalability of the proposed system is carried out in order to address the challenge of transparently enforcing real-time monitoring of cloud-hosted services leveraging blockchain technology.
Abstract: Cloud computing is an important technology for businesses and individual users to obtain computing resources over the Internet on demand and flexibly. Although cloud computing has been adopted across diverse applications, the owners of time- and performance-critical applications require cloud service providers' guarantees about their services, such as availability and response times. Service Level Agreements (SLAs) are a mechanism to communicate and enforce such guarantees, typically represented as service level objectives (SLOs), and financial penalties are imposed on SLO violations. Due to delays and inaccuracies caused by manual processing, an automatic method to periodically verify SLA terms in a transparent and trustworthy manner is fundamental to effective SLA monitoring, leading to the acceptance and credibility of such a service by the customers of cloud services. This paper presents a blockchain-based distributed infrastructure that leverages fundamental blockchain properties to achieve immutable and trustworthy SLA monitoring within cloud services. The paper carries out an in-depth empirical investigation into the scalability of the proposed system in order to address the challenge of transparently enforcing real-time monitoring of cloud-hosted services leveraging blockchain technology. This will enable all the stakeholders to enforce accurate execution of the SLA without any imprecision or delay by maintaining an immutable ledger publicly across the blockchain network. The experimentation takes into consideration several attributes of blockchain which are critical in achieving optimum performance. The paper also investigates key characteristics of these factors and their impact on the behaviour of the system when scaling it up further under various cases of increased service utilization.

10 citations


Journal ArticleDOI
TL;DR: CEDULE+ is a data-driven framework that enables efficient resource management for burstable cloud instances by analyzing the system workload and latency data and is evaluated on Amazon EC2, where its efficiency and high accuracy are assessed through real-case scenarios.
Abstract: Nearly all principal cloud providers now provide burstable instances in their offerings. The main attraction of this type of instance is that it can boost its performance for a limited time to cope with workload variations. Although burstable instances are widely adopted, it is not clear how to efficiently manage them to avoid waste of resources. In this article, we use predictive data analytics to optimize the management of burstable instances. We design CEDULE+, a data-driven framework that enables efficient resource management for burstable cloud instances by analyzing the system workload and latency data. CEDULE+ selects the most profitable instance type to process incoming requests and controls CPU, I/O, and network usage to minimize the resource waste without violating Service Level Objectives (SLOs). CEDULE+ uses lightweight profiling and quantile regression to build a data-driven prediction model that estimates system performance for all combinations of instance type, resource type, and system workload. CEDULE+ is evaluated on Amazon EC2, and its efficiency and high accuracy are assessed through real-case scenarios. CEDULE+ predicts application latency with errors less than 10%, extends the maximum performance period of a burstable instance up to 2.4 times, and decreases deployment costs by more than 50%.
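The prediction step can be sketched with an off-the-shelf quantile regressor, as below; the synthetic workload data and the 95th-percentile latency target are illustrative assumptions, whereas CEDULE+ profiles live systems.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
instance_type = rng.integers(0, 3, n)          # e.g. three burstable instance sizes
cpu_quota = rng.uniform(0.2, 1.0, n)           # fraction of a vCPU available
req_rate = rng.uniform(10, 500, n)             # requests per second
latency_ms = (5
              + req_rate / (cpu_quota * (instance_type + 1) * 120) * 40
              + rng.exponential(3, n))         # synthetic latency with a heavy tail

X = np.column_stack([instance_type, cpu_quota, req_rate])
# Quantile regression at the 95th percentile: predicts tail latency, not the mean.
p95_model = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, latency_ms)

# Check a candidate configuration against a latency SLO before committing to it.
candidate = np.array([[1, 0.5, 300.0]])        # instance type 1, half a vCPU, 300 req/s
predicted_p95 = p95_model.predict(candidate)[0]
print(f"predicted p95 latency: {predicted_p95:.1f} ms",
      "OK" if predicted_p95 < 60 else "violates SLO")
```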

9 citations


Proceedings ArticleDOI
01 Nov 2021
TL;DR: Parslo as discussed by the authors proposes a Gradient Descent-based approach to assign partial SLOs among nodes in a microservice graph under an end-to-end latency SLO.
Abstract: Modern cloud services are implemented as graphs of loosely-coupled microservices to improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction. In such environments, each microservice is independently deployed and (auto-)scaled. However, it is unclear how to optimally scale individual microservices when end-to-end SLOs are violated or underutilized, and how to size each microservice to meet the end-to-end SLO at minimal total cost. In this paper, we propose Parslo---a Gradient Descent-based approach to assign partial SLOs among nodes in a microservice graph under an end-to-end latency SLO. At a high level, the Parslo algorithm breaks the end-to-end SLO budget into small incremental "SLO units", and iteratively allocates one marginal SLO unit to the best candidate microservice to achieve the highest total cost savings until the entire end-to-end SLO budget is exhausted. Parslo achieves a near-optimal solution, seeking to minimize the total cost for the entire service deployment, and is applicable to general microservice graphs that comprise patterns like dynamic branching, parallel fan-out, and microservice dependencies. Parslo reduces service deployment costs by more than 6x in real microservice-based applications, compared to a state-of-the-art partial SLO assignment scheme.
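The incremental allocation loop can be sketched as follows; the per-service cost curves are hypothetical, while in Parslo they would come from measured scaling behaviour.

```python
def cost(service: str, budget_ms: float) -> float:
    """Deployment cost ($/h) as a function of the latency budget granted to a service."""
    base = {"frontend": 40.0, "search": 90.0, "checkout": 60.0}[service]
    return base / max(budget_ms, 1.0)           # more budget -> fewer replicas -> cheaper

def assign_partial_slos(e2e_slo_ms: float, unit_ms: float = 1.0) -> dict[str, float]:
    services = ["frontend", "search", "checkout"]
    budgets = {s: unit_ms for s in services}     # start every service with a minimal budget
    remaining = e2e_slo_ms - sum(budgets.values())
    while remaining >= unit_ms:
        # marginal cost saving of granting one more SLO unit to each candidate
        savings = {s: cost(s, budgets[s]) - cost(s, budgets[s] + unit_ms) for s in services}
        best = max(savings, key=savings.get)     # give the unit where it saves the most
        budgets[best] += unit_ms
        remaining -= unit_ms
    return budgets

print(assign_partial_slos(e2e_slo_ms=100.0))
```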

9 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a QoE-oriented cloud service orchestration algorithm that can guide ASPs on how to plan their budget to deliver satisfactory QoE to end-users.
Abstract: New virtualization technologies allow Infrastructure Providers (InPs) to lease their resources to Application Service Providers (ASPs) for highly scalable delivery of cloud services to end-users. However, existing literature lacks knowledge on Quality of Experience (QoE)-oriented cloud service orchestration algorithms that can guide ASPs on how to plan their budget to enhance satisfactory QoE delivery to end-users. In contrast to the InP’s cloud service orchestration, the ASP’s orchestration should not rely on expensive infrastructure control mechanisms such as Software-Defined Networking (SDN), or require a priori knowledge of the number of services to be instantiated and their anticipated placement location within the InP’s infrastructure. In this paper, we address this issue of delivering satisfactory user QoE by synergistically optimizing both the ASP’s management and data planes. The optimization within the ASP management plane first maximizes Service Level Objective (SLO) coverage of users when application services are being deployed and are not yet operational. The optimization of the ASP data plane then enhances satisfactory user QoE delivery when application services are operational with real user access. Our evaluation of QoE-oriented algorithms using realistic numerical simulations, real-world cloud testbed experiments with actual users, and ASP case studies shows notably improved performance over existing cloud service orchestration solutions.

7 citations


Proceedings ArticleDOI
01 Nov 2021
Abstract: Resource management for geo-distributed infrastructures is challenging due to the scarcity and non-uniformity of edge resources, as well as the high client mobility and workload surges inherent to situation awareness applications. Due to their centralized nature, state-of-the-art schedulers that work well in datacenters lack the performance and feature requirements of such applications. We present OneEdge, a hybrid control plane that enables autonomous decision-making at edge sites for localized, rapid single-site application deployment. Edge sites handle mobility, churn, and load spikes by cooperating with a centralized controller that allows coordinated multi-site scheduling and dynamic reconfiguration. OneEdge's scheduling decisions are driven by each application's end-to-end service level objective (E2E SLO) as well as the specific requirements of situation awareness applications. OneEdge's novel distributed state management combines autonomous decision-making at the edge sites for rapid localized resource allocations with decision-making at the central controller when multi-site application deployment is needed. Using a mix of applications on multi-region Azure instances, we show that, in contrast to centralized or fully distributed control planes, OneEdge caters to the unique requirements of situation awareness applications. Compared to a centralized control plane, OneEdge reduces deployment latency by 66% for single-site applications, without compromising E2E SLOs.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the challenges encountered by network orchestrators in allocating resources to disparate 5G network slices, and propose the use of artificial intelligence to make core placement and scaling decisions that meet the requirements of network slices deployed on shared infrastructure.
Abstract: Network slicing enables communication service providers to partition physical infrastructure into logically independent networks. Network slices must be provisioned to meet the service-level objectives (SLOs) of disparate offerings, such as enhanced mobile broadband, ultrareliable low-latency communications, and massive machine-type communications. Network orchestrators must customize service placement and scaling to achieve the SLO of each network slice. In this article, we discuss the challenges encountered by network orchestrators in allocating resources to disparate 5G network slices, and propose the use of artificial intelligence to make core placement and scaling decisions that meet the requirements of network slices deployed on shared infrastructure. We explore how artificial intelligence-driven scaling algorithms, coupled with functionality-aware placement, can enable providers to design closed-loop solutions to meet the disparate SLOs of future network slices.

Proceedings ArticleDOI
01 Sep 2021
TL;DR: In this article, a lightweight fault localization system is described that establishes causal relationships among golden-signal service errors and error logs, and leverages PageRank centrality of the derived causal graph to generate a ranked list of faulty microservices.
Abstract: In cloud-native applications, a large fraction of operational failures, known as outages, result in violations of Service Level Objectives (SLOs). SLOs are defined around specific measurable characteristics: availability, throughput, frequency, response time, and quality. Four metrics, latency, traffic, errors, and saturation, ensure coverage for most outages of an application. These are often called golden signals. The dynamicity and complexity of cloud-native applications complicate Site Reliability Engineers’ (SREs) efforts in problem determination, in particular in fault localization. Fault localization is often a trial-and-error process in which SREs rely on their domain knowledge and experience. It is laborious and frequently results in long Mean Time To Resolution (MTTR) for outages. This paper describes a lightweight fault localization system that establishes causal relationships among the golden-signal service errors and error logs, and further leverages PageRank centrality of the derived causal graph for generating a ranked list of faulty microservices.
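The ranking step can be sketched with a small causal graph and PageRank centrality, as below; the services, edges, and edge-direction convention are invented for illustration rather than derived from real error logs.

```python
import networkx as nx

# Edge A -> B read as "errors observed at A are causally explained by B".
causal_graph = nx.DiGraph([
    ("frontend", "cart"),
    ("frontend", "catalog"),
    ("cart", "database"),
    ("catalog", "database"),
    ("checkout", "database"),
])

scores = nx.pagerank(causal_graph, alpha=0.85)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for service, score in ranked:
    print(f"{service:>10}: {score:.3f}")   # 'database' ranks first as the likely root cause
```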

Journal ArticleDOI
22 Mar 2021
TL;DR: In this article, the authors examine how human resources in the Maltese public service adopt new work practices in response to COVID-19 public health measures during the first wave of the pandemic.
Abstract: This study examines how human resources in the Maltese Public Service adopt new work practices in response to COVID-19 public health measures during the first wave of the pandemic. We analyze the data we collected through seven focus group discussions and ten in-depth interviews with Public Service employees and managers in a diversity of ministries and roles. Our study reveals that Public Service policies promoting remote working relied exclusively on the service's IT infrastructure. However, the ability to respond to customer needs effectively in a time of surging demand relied entirely on employees' effective access to responsive and efficient ICT support, as well as on employees' prior experience with remote work modes and their predisposition to change to remote working. Adopting remote working modes uncovered inherent weaknesses in the Public Service IT infrastructure that put additional strain on the Government's centralized IT support function, especially when Public Service employees adopted tools not supported by the centralized IT support. In circumstances where centralized IT support was ineffective, Public Service employees relied on their own knowledge resources, which they informally shared in groups of practice, or employed operant resources (or tacit knowledge) to achieve service level objectives. These observations suggest that in times when organizations respond to immediate and unprecedented change, human resources seek to adapt by relying on tacit knowledge that is shared among people in known (often informal) groups of people with a common interest or role.

Posted Content
TL;DR: In this article, a composable Just in Time Architecture for Data Science (DS) Pipelines named JITA-4DS is proposed, along with associated resource management techniques for configuring disaggregated data centers (DCs).
Abstract: This paper proposes a composable "Just in Time Architecture" for Data Science (DS) Pipelines named JITA-4DS and associated resource management techniques for configuring disaggregated data centers (DCs). DCs under our approach are composable based on vertical integration of the application, middleware/operating system, and hardware layers, customized dynamically to meet application Service Level Objectives (SLO-application-aware management). Thereby, pipelines utilize a set of flexible building blocks that can be dynamically and automatically assembled and reassembled to meet the dynamic changes in the workload's SLOs. To assess disaggregated DCs, we study how to model and validate their performance in large-scale settings.

Journal ArticleDOI
TL;DR: This article proposes an approach that continuously makes elastic deployment plans aimed at optimizing cost and performance, even during adaptation processes, to meet service level objectives (SLOs) at lower costs.
Abstract: Containers such as Docker provide a lightweight virtualization technology. They have gained popularity in developing, deploying and managing applications in and across Cloud platforms. Container management and orchestration platforms such as Kubernetes run application containers in virtual clusters that abstract the overheads in managing the underlying infrastructures to simplify the deployment of container solutions. These platforms are well suited for modern web applications that can give rise to geographic fluctuations in use based on the location of users. Such fluctuations often require dynamic global deployment solutions. A key issue is to decide how to adapt the number and placement of clusters to maintain performance, whilst incurring minimum operating and adaptation costs. Manual decisions are naive and can give rise to: over-provisioning and hence cost issues; improper placement and performance issues, and/or unnecessary relocations resulting in adaptation issues. Elastic deployment solutions are essential to support automated and intelligent adaptation of container clusters in geographically distributed Clouds. In this article, we propose an approach that continuously makes elastic deployment plans aimed at optimizing cost and performance, even during adaptation processes, to meet service level objectives (SLOs) at lower costs. Meta-heuristics are used for cluster placement and adjustment. We conduct experiments on the Australia-wide National eResearch Collaboration Tools and Resources Research Cloud using Docker and Kubernetes. Results show that with only a 0.5 ms sacrifice in SLO for the 95th percentile of response times we are able to achieve up to 44.44% improvement (reduction) in cost compared to a naive over-provisioning deployment approach.
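A toy flavour of the meta-heuristic placement step is sketched below as a simple hill climb over which regions host a cluster; the region costs, latency matrix, and objective weights are made-up values, not the paper's model or experimental setup.

```python
import random

REGIONS = ["melbourne", "sydney", "perth", "brisbane"]
COST_PER_CLUSTER = {"melbourne": 9.0, "sydney": 10.0, "perth": 8.0, "brisbane": 8.5}
# Round-trip latency (ms) from each user population to each candidate region.
LATENCY = {
    "vic_users": {"melbourne": 5, "sydney": 15, "perth": 45, "brisbane": 22},
    "nsw_users": {"melbourne": 15, "sydney": 5, "perth": 50, "brisbane": 14},
    "wa_users":  {"melbourne": 45, "sydney": 50, "perth": 5, "brisbane": 55},
}
LATENCY_WEIGHT = 0.4

def objective(placement: frozenset) -> float:
    """Operating cost plus a weighted penalty for users' latency to their nearest cluster."""
    if not placement:
        return float("inf")
    cost = sum(COST_PER_CLUSTER[r] for r in placement)
    latency = sum(min(lat[r] for r in placement) for lat in LATENCY.values())
    return cost + LATENCY_WEIGHT * latency

def hill_climb(steps: int = 200) -> frozenset:
    current = frozenset(random.sample(REGIONS, 2))
    for _ in range(steps):
        region = random.choice(REGIONS)       # toggle one region in or out of the plan
        candidate = current ^ {region}
        if objective(candidate) < objective(current):
            current = candidate
    return current

best = hill_climb()
print(sorted(best), round(objective(best), 1))
```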

Proceedings ArticleDOI
01 Sep 2021
TL;DR: In this article, a real-time messaging system motivated by Internet-of-Things (IoT) applications is designed and implemented in a cloud environment, and a solution capable of realizing an effective compromise between load distribution decisions and rate limiting is presented.
Abstract: The cloud's flexibility and promise of seamless auto-scaling notwithstanding, its ability to meet service level objectives (SLOs) typically calls for some form of control in resource usage. This seemingly traditional problem gives rise to new challenges in a cloud setting, and in particular a subtle yet significant trade-off involving load-distribution decisions (the distribution of workload across available cloud resources to optimize performance), and rate limiting (the capping of individual workloads to prevent global over-commitment). This paper investigates that trade-off through the design and implementation of a real-time messaging system motivated by Internet-of-Things (IoT) applications, and demonstrates a solution capable of realizing an effective compromise. The paper's contributions are in both explicating the source of this trade-off, and in demonstrating a possible solution.

Journal ArticleDOI
TL;DR: In this paper, a QoS optimization method is designed to obtain a near-optimal QoS solution that balances user satisfaction and provider profit, enabling win-win service applications between service providers and users.
Abstract: Cloud services incur lower costs than traditional self-purchased software and infrastructure thanks to on-demand resource provisioning and the pay-as-you-go model. In the cloud services market, service providers attempt to make more profit from their services, while users hope to choose low-cost services with high quality. The conflict of interest between users and service providers is an important challenge for the booming cloud service market. This paper characterizes this application problem formally based on a utility game model of service providers and users. In the model, QoS is considered as the basis for determining the utilities of both parties. By analyzing the behaviors of users and service providers, we introduce the concept of reputation cost for the first time in the model and find a QoS solution that balances the utilities of users and service providers in service transactions. In such a balance, any change in either party's strategy will result in a loss of utility. A QoS optimization method is then designed to obtain a near-optimal QoS solution for a trade-off between user satisfaction and provider profit. Extensive simulation experiments are conducted to substantiate the effectiveness of our method. The results are applicable to win-win service applications between service providers and users.
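A toy version of the utility-balancing idea is sketched below: user utility grows with QoS and shrinks with price, provider utility earns revenue but pays a quality-dependent operating cost plus a reputation cost for poor QoS, and a grid search finds the QoS level where the two utilities meet. All functional forms and constants are invented for illustration, not the paper's game model.

```python
import numpy as np

PRICE = 10.0

def user_utility(qos: float) -> float:
    return 20.0 * qos - PRICE                       # value of quality minus price paid

def provider_utility(qos: float) -> float:
    operating_cost = 8.0 * qos ** 2                 # better QoS is increasingly expensive
    reputation_cost = 6.0 * max(0.0, 0.5 - qos)     # poor QoS erodes future demand
    return PRICE - operating_cost - reputation_cost

qos_grid = np.linspace(0.1, 1.0, 91)
gaps = [abs(user_utility(q) - provider_utility(q)) for q in qos_grid]
balanced_qos = qos_grid[int(np.argmin(gaps))]       # QoS level where utilities are closest
print(f"balanced QoS level: {balanced_qos:.2f}",
      f"user={user_utility(balanced_qos):.2f}",
      f"provider={provider_utility(balanced_qos):.2f}")
```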

Journal ArticleDOI
TL;DR: A mathematical model is built to derive the upper bound of the acceptable request arrival rate on each server, and a Deadline Guaranteed storage service incorporating three basic algorithms is proposed; experiments show the superior performance of DGCloud compared with previous methods in terms of deadline guarantees and system resource utilization, and the effectiveness of its individual algorithms.
Abstract: More and more organizations move their data and workload to commercial cloud storage systems. However, the multiplexing and sharing of the resources in a cloud storage system present unpredictable data access latency to tenants, which may make online data-intensive applications unable to satisfy their deadline requirements. Thus, it is important for cloud storage systems to provide deadline guaranteed services. In this paper, to meet a current form of service level objective (SLO) that constrains the percentage of each tenant’s data access requests failing to meet its required deadline below a given threshold, we build a mathematical model to derive the upper bound of acceptable request arrival rate on each server. We then propose a Deadline Guaranteed storage service (called DGCloud) that incorporates three basic algorithms. Its deadline-aware load balancing scheme redirects requests and creates replicas to release the excess load of each server beyond the derived upper bound. Its workload consolidation algorithm tries to maximally reduce servers while still satisfying the SLO to maximize the resource utilization. Its data placement optimization algorithm re-schedules the data placement to minimize the transmission cost of data replication. We further propose three enhancement methods to further improve the performance of DGCloud. A dynamic load balancing method allows an overloaded server to quickly offload its excess workload. A data request queue improvement method sets different priorities to the data responses in a server’s queue so that more requests can satisfy the SLO requirement. A wakeup server selection method selects a sleeping server that stores more popular data to wake up, which allows it to handle more data requests. Our trace-driven experiments in simulation and Amazon EC2 show the superior performance of DGCloud compared with previous methods in terms of deadline guarantees and system resource utilization, and the effectiveness of its individual algorithms.
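The flavour of such an arrival-rate bound can be shown with a back-of-the-envelope M/M/1 calculation, sketched below; the exponential tail model is an assumption made here for illustration, not the bound actually derived in the paper.

```python
import math

# Under an M/M/1 assumption, P(response time > d) = exp(-(mu - lambda) * d), so
# requiring that at most `miss_fraction` of requests miss deadline d gives
# lambda <= mu + ln(miss_fraction) / d.
def max_arrival_rate(service_rate: float, deadline_s: float, miss_fraction: float) -> float:
    """Largest arrival rate (req/s) keeping P(response > deadline) <= miss_fraction."""
    bound = service_rate + math.log(miss_fraction) / deadline_s
    return max(0.0, bound)

mu = 200.0        # server can process 200 req/s
d = 0.05          # 50 ms deadline
eps = 0.05        # at most 5% of requests may miss the deadline
print(f"acceptable arrival rate <= {max_arrival_rate(mu, d, eps):.1f} req/s")
```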

Proceedings ArticleDOI
09 Apr 2021
TL;DR: In this paper, a multi-loop control approach is proposed to allocate resources to VMs based on the service level agreement (SLA) requirements and the run-time conditions, which can meet applications' performance goals by assigning the resources required by cloud-based applications.
Abstract: In the cloud computing model, resource sharing introduces major benefits for improving resource utilization and total cost of ownership, but it can create technical challenges for run-time performance. In practice, orchestrators are required to allocate sufficient physical resources to each Virtual Machine (VM) to meet a set of predefined performance goals. To ensure a specific service level objective, the orchestrator needs to be equipped with a dynamic tool for assigning computing resources to each VM based on the run-time state of the target environment. To this end, we present LOOPS, a multi-loop control approach, to allocate resources to VMs based on the service level agreement (SLA) requirements and the run-time conditions. LOOPS is mainly composed of one essential unit to monitor VMs, and three control levels to allocate resources to VMs based on requests from the essential node. A tailor-made controller is proposed for each level to regulate contention among collocated VMs, to reallocate resources if required, and to migrate VMs from one host to another. The three levels work together to meet the required SLA. The experimental results have shown that the proposed approach can meet applications' performance goals by assigning the resources required by cloud-based applications.

Proceedings ArticleDOI
09 Apr 2021
TL;DR: In this paper, the authors proposed Courier, a model that selects a batch size based on the type of machine learning job such that the response time adheres to the Service Level Objectives (SLOs) specified, while also rendering the highest possible accuracy.
Abstract: Distributed machine learning has seen immense rise in popularity in recent years. Many companies and universities are utilizing computational clusters to train and run machine learning models. Unfortunately, operating such a cluster imposes large costs. It is therefore crucial to attain as high system utilization as possible. Moreover, those who offer computational clusters as a service, apart from keeping high utilization, also have to meet the required Service Level Agreements (SLAs) for the system response time. This becomes increasingly more complex in multitenant scenarios, where the time dedicated to each task has to be limited to achieve fairness. In this work, we analyze how different parameters of the machine learning job influence the response time as well as system utilization and propose Courier. Courier is a model that, based on the type of machine learning job, can select a batch size such that the response time adheres to the Service Level Objectives (SLOs) specified, while also rendering the highest possible accuracy. We gather the data by conducting real-world experiments on a BigDL cluster. Later on, we study the influence of the factors and build several predictive models which lead us to the proposed Courier model.
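The selection step can be sketched as follows, using two hypothetical predictors for response time and accuracy in place of the models Courier fits from cluster measurements.

```python
def predicted_response_time_s(job_type: str, batch_size: int) -> float:
    per_sample = {"cnn": 0.004, "lstm": 0.009}[job_type]
    return 1.5 + per_sample * batch_size             # fixed overhead + per-sample cost

def predicted_accuracy(job_type: str, batch_size: int) -> float:
    best = {"cnn": 128, "lstm": 64}[job_type]        # accuracy peaks near a sweet spot
    return 0.95 - 0.0005 * abs(batch_size - best)

def choose_batch_size(job_type: str, slo_seconds: float):
    candidates = [16, 32, 64, 128, 256, 512]
    # Keep only batch sizes whose predicted response time meets the SLO...
    feasible = [b for b in candidates if predicted_response_time_s(job_type, b) <= slo_seconds]
    if not feasible:
        return None                                  # no batch size can meet this SLO
    # ...then pick the feasible one with the highest predicted accuracy.
    return max(feasible, key=lambda b: predicted_accuracy(job_type, b))

print(choose_batch_size("cnn", slo_seconds=2.0))     # feasible sizes are 16/32/64 -> picks 64
```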

Journal ArticleDOI
28 Oct 2021
TL;DR: In this article, a deep neural network-based multi-label classification methodology is proposed to identify and predict multiple categories of SLO breaches associated with an application state, and its performance is compared against a set of machine learning classifiers.
Abstract: Recent advancements in the domain of Network Function Virtualization (NFV), and rollout of next-generation networks have necessitated the requirement for the upkeep of latency-critical application architectures in future networks and communications. While Cloud service providers recognize the evolving mission-critical requirements in latency sensitive verticals such as autonomous driving, multimedia, gaming, telecommunications, and virtual reality, there is a wide gap to bridge the Quality of Service (QoS) constraints for the end-user experience. Most latency-critical services are over-provisioned on all fronts to offer reliability, which is inefficient towards scalability in the long run. To address this, we propose a strategy to model frequent violations on the application level as a multi-output target to enable more complex decision-making in the management of virtualised communication networks. In this work, we utilize data from a real-world deployment to configure and draft a realistic set of Service Level Objectives (SLOs) for a voice based NFV application, and develop a deep neural network based multi-label classification methodology to identify and predict multiple categories of SLO breaches associated with an application state. With this, we aim to gain granular SLA and SLO violation insights, enabling us to study and mitigate their impact and inform precision in drafting proactive scaling policies. We further compare the performance against a set of multi-label compatible machine learning classifiers, and address class imbalance in a multi-label setup. We perform a comprehensive evaluation to assess the performance on example-based, label-based and ranking-based measures, and demonstrate the suitability of deep learning in such a use-case.
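A compact sketch of multi-label SLO-breach prediction is shown below on synthetic data; the features, breach categories, and thresholds are invented, and a scikit-learn MLP stands in for the deep network trained in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
n = 3000
X = rng.uniform(0, 1, size=(n, 4))                 # e.g. CPU, memory, call rate, queue depth
# One application state can trigger several breach categories at once,
# so the target is a binary indicator matrix (one column per SLO).
y = np.column_stack([
    (0.6 * X[:, 0] + 0.4 * X[:, 2] > 0.7),         # latency SLO breach
    (X[:, 3] > 0.8),                               # jitter SLO breach
    (0.5 * X[:, 1] + 0.5 * X[:, 3] > 0.75),        # packet-loss SLO breach
]).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("micro-F1:", round(f1_score(y_te, pred, average="micro"), 3))
```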

Proceedings Article
17 May 2021
TL;DR: In this paper, the authors proposed a Kubernetes scheduler extension and resource rescheduling that incorporates QoE metrics into SLO, and evaluated the architecture using the ITU P.1203 standard in the context of video streaming services co-located with other services.
Abstract: Cloud management has traditionally considered Service Level Objectives (SLO) based on QoS metrics. However, QoS-focused metrics have a limited effect on the Quality of Experience (QoE) experienced by the clients. This paper proposes a Kubernetes scheduler extension and resource rescheduling that incorporates QoE metrics into SLOs. As a proof of concept, this work evaluates the architecture using the QoE metric proposed in the ITU P.1203 standard, in the context of video streaming services co-located with other services. Experimental results show that our scheduler improves the average QoE by 50% compared to other schedulers, while resource rescheduling improved the average QoE by 135%. In addition, our architecture eliminated over-provisioning altogether.

Posted Content
TL;DR: In this paper, the authors proposed and evaluated three dynamic placement strategies, two heuristic (a greedy approximation based on set cover, and an integer programming based optimization) and one learning-based algorithm, which satisfy the application constraints and minimize infrastructure deployment cost while ensuring availability of services to all clients and User Equipment (UE) in the network coverage area.
Abstract: Edge computing hosts applications close to the end users and enables low-latency real-time applications. Modern applications have in turn adopted the microservices architecture, which composes applications as loosely coupled smaller components, or services. This complements edge computing infrastructures, which are often resource constrained and may not handle monolithic applications. Instead, edge servers can independently deploy application service components, although at the cost of communication overheads. Consistently meeting application service level objectives while also optimizing application deployment (placement and migration of services) cost and communication overheads in a mobile edge cloud environment is non-trivial. In this paper, we propose and evaluate three dynamic placement strategies, two heuristic (a greedy approximation based on set cover, and an integer programming based optimization) and one learning-based algorithm. Their goal is to satisfy the application constraints, minimize infrastructure deployment cost, while ensuring availability of services to all clients and User Equipment (UE) in the network coverage area. The algorithms can be extended to any network topology and microservice-based edge computing applications. For the experiments, we use drone swarm navigation as a representative application for edge computing use cases. Since access to a real-world physical testbed for such an application is difficult, we demonstrate the efficacy of our algorithms in simulation. We also contrast these algorithms with respect to placement quality, utilization of clusters, and level of determinism. Our evaluation not only shows that the learning-based algorithm provides solutions of better quality; it also provides interesting conclusions regarding when the (more traditional) heuristic algorithms are actually better suited.
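The greedy set-cover heuristic mentioned above can be sketched in a few lines, as below; the edge sites, coverage sets, and per-site costs are invented for illustration.

```python
def greedy_set_cover(clients: set, site_coverage: dict, site_cost: dict) -> list:
    """Repeatedly deploy on the site with the best newly-covered-clients-per-cost ratio."""
    uncovered = set(clients)
    chosen = []
    while uncovered:
        best = max(site_coverage,
                   key=lambda s: len(site_coverage[s] & uncovered) / site_cost[s])
        newly = site_coverage[best] & uncovered
        if not newly:
            raise ValueError("some clients cannot be covered by any site")
        chosen.append(best)
        uncovered -= newly
    return chosen

clients = {f"ue{i}" for i in range(1, 7)}
coverage = {
    "edge-A": {"ue1", "ue2", "ue3"},
    "edge-B": {"ue3", "ue4"},
    "edge-C": {"ue4", "ue5", "ue6"},
}
cost = {"edge-A": 2.0, "edge-B": 1.0, "edge-C": 2.5}
print(greedy_set_cover(clients, coverage, cost))   # -> ['edge-B', 'edge-A', 'edge-C']
```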