
Showing papers in "ACM Transactions on Modeling and Performance Evaluation of Computing Systems" in 2022


Journal ArticleDOI
TL;DR: This paper focuses on optimal policies for “Network Friendly Recommendations” (NFR) and uses a Markov Decision Process (MDP) framework that offers significant advantages over existing works in terms of both modeling flexibility and computational efficiency.
Abstract: Controlling the network cost of delivering popular content to users, as well as improving streaming quality and overall user experience, have been key goals for content providers (CPs) in recent years. While proposals to improve performance, through caching or other mechanisms (DASH, multicasting, etc.), abound, recent works have proposed to turn the problem on its head and complement such efforts. Instead of trying to reduce the cost of delivering every possible content to a user, a potentially very expensive endeavour, one could leverage omnipresent recommendation systems to nudge users towards content of low(er) network cost, regardless of where this cost is coming from. In this paper, we focus on this latter problem, namely optimal policies for “Network Friendly Recommendations” (NFR). A key contribution is the use of a Markov Decision Process (MDP) framework that offers significant advantages over existing works in terms of both modeling flexibility and computational efficiency. Specifically, we show that this framework subsumes some state-of-the-art approaches and can also optimally tackle additional, more sophisticated setups. We validate our findings with real traces that suggest up to almost a 2X gain in cost performance and a 10X computational speed-up compared to recent state-of-the-art works.
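The abstract does not spell out the MDP itself, so the following is only a toy value-iteration sketch under assumed definitions: states are the item currently being consumed, an action is the batch of n_recs items to recommend, users follow a recommendation with probability p_follow, and the reward mixes a cached/low-cost indicator with a relevance score. None of these names or modeling choices come from the paper.

```python
import itertools
import numpy as np

def nfr_value_iteration(p_follow, cached, relevance, n_items, n_recs,
                        gamma=0.9, iters=200):
    """Toy value iteration for a network-friendly recommendation MDP.

    States are the item currently being consumed; an action is the set of
    n_recs items to recommend next. With probability p_follow the user picks
    a recommended item uniformly at random, otherwise any item uniformly.
    The reward favors cached (network-cheap) items weighted by relevance.
    """
    actions = list(itertools.combinations(range(n_items), n_recs))
    V = np.zeros(n_items)
    for _ in range(iters):
        V_new = np.empty(n_items)
        for s in range(n_items):
            best = -np.inf
            for a in actions:
                # expected immediate reward: cache friendliness x relevance
                r = np.mean([cached[j] * relevance[s][j] for j in a])
                # expected next-state value under the follow-or-ignore model
                nxt = p_follow * np.mean([V[j] for j in a]) \
                      + (1 - p_follow) * V.mean()
                best = max(best, r + gamma * nxt)
            V_new[s] = best
        V = V_new
    return V

# e.g. V = nfr_value_iteration(p_follow=0.8, cached=[1, 0, 1, 0],
#                              relevance=np.ones((4, 4)), n_items=4, n_recs=2)
```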

2 citations


Journal ArticleDOI
TL;DR: The proposed DVFS-based power management techniques are particularly effective for a class of memory-intensive benchmarks, improving EE by 121% to 183% and PxEE by 100% to 141%.
Abstract: This paper describes the results of our measurement-based study, conducted on an Intel Core i7 processor running the SPEC CPU2017 benchmark suites, that evaluates the impact of dynamic voltage frequency scaling (DVFS) on performance (P), energy efficiency (EE), and their product (PxEE). The results indicate that the default DVFS-based power management techniques heavily favor performance, resulting in poor energy efficiency. To remedy this problem, we introduce, implement, and evaluate four DVFS-based power management techniques driven by the following metrics derived from the processor's performance monitoring unit: (i) the total pipeline slot stall ratio (FS-PS), (ii) the total cycle stall ratio (FS-TS), (iii) the total memory-related cycle stall ratio (FS-MS), and (iv) the number of last level cache misses per kilo instructions (FS-LLCM). The proposed techniques linearly map these metrics onto the available processor clock frequencies. The experimental evaluation results show that the proposed techniques significantly improve EE and PxEE compared to existing approaches: EE improves by 44% to 92% and PxEE by 31% to 48% when all the benchmarks are considered together. Furthermore, we find that the proposed techniques are particularly effective for a class of memory-intensive benchmarks, improving EE by 121% to 183% and PxEE by 100% to 141%. Finally, we elucidate the advantages and disadvantages of each of the proposed techniques and offer recommendations on using them.
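As a rough illustration of the "linearly map these metrics onto the available processor clock frequencies" step, here is a minimal sketch; the clamping, the interpolation direction (high stall ratio mapped to low frequency), and the example frequency list are assumptions for illustration rather than details taken from the paper.

```python
def select_frequency(stall_ratio, freqs_mhz):
    """Hypothetical sketch of a linear metric-to-frequency mapping.

    A high stall ratio (memory-bound phase) maps to a low clock frequency,
    a low stall ratio (compute-bound phase) to a high one; the continuous
    target is then snapped to the nearest frequency the processor exposes.
    """
    freqs = sorted(freqs_mhz)
    f_min, f_max = freqs[0], freqs[-1]
    stall_ratio = min(max(stall_ratio, 0.0), 1.0)   # clamp the PMU metric
    target = f_max - stall_ratio * (f_max - f_min)  # linear interpolation
    return min(freqs, key=lambda f: abs(f - target))

# e.g. a 70% pipeline-slot stall ratio on an 800-3600 MHz range:
# select_frequency(0.7, [800, 1200, 1600, 2000, 2400, 2800, 3200, 3600]) -> 1600
```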

2 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a Configuration Health Index (CHI) framework, specifically attuned to the performance attribute, to capture the influence of CVs on the performance aspects of the system.
Abstract: Most IT systems depend on a set of configuration variables (CVs), each expressed as a name/value pair, that collectively defines the resource allocation for the system. While the ill effects of misconfiguration or improper resource allocation are well-known, there are no effective a priori metrics to quantify the impact of the configuration on desired system attributes such as performance, availability, etc. In this paper, we propose a Configuration Health Index (CHI) framework, specifically attuned to the performance attribute, to capture the influence of CVs on the performance aspects of the system. We show how CHI, which is defined as a configuration scoring system, can take advantage of domain knowledge and the available (but rather limited) performance data to produce important insights into the configuration settings. We compare CHI with both well-advertised segmented non-linear models and state-of-the-art data-driven models, and show that CHI not only consistently provides better results but also avoids the dangers of a pure data-driven approach, which may predict incorrect behavior or eliminate some essential configuration variables from consideration.

1 citation


Journal ArticleDOI
TL;DR: This paper designs a Spatial Power of Two (sPOT) policy, in which each job is allocated to the least loaded of its two geographically nearest servers; experimental results suggest the efficacy of sPOT with respect to expected implementation cost.
Abstract: Distributed load balancing is the act of allocating jobs among a set of servers as evenly as possible. The static interpretation of distributed load balancing leads to formulating the load-balancing problem as a classical balls-and-bins problem with jobs (balls) never leaving the system and accumulating at the servers (bins). While most of the previous work in the static setting focuses on studying the maximum number of jobs allocated to a server, or maximum load, little importance has been given to the implementation cost, i.e., the cost of moving a job/data to/from its allocated server, for such policies. This article designs and evaluates server-proximity-aware static load-balancing policies with the goal of reducing the implementation cost. We consider a class of proximity-aware Power of Two (POT) choice-based assignment policies for allocating jobs to servers, where both jobs and servers are located on a two-dimensional Euclidean plane. In this framework, we investigate the tradeoff between the implementation cost and load-balancing performance of different allocation policies. To this end, we first design and evaluate a Spatial Power of Two (sPOT) policy in which each job is allocated to the least loaded of its two geographically nearest servers. We provide expressions for the lower bound on the asymptotic expected maximum load on the servers and prove that sPOT does not achieve classical POT load-balancing benefits. However, experimental results suggest the efficacy of sPOT with respect to expected implementation cost. We also propose two non-uniform server-sampling-based POT policies that achieve the best of both implementation cost and load-balancing performance. We then extend our analysis to the case where servers are interconnected as an n-vertex graph G(S, E). We assume each job arrives at one of the servers, u, chosen uniformly at random from the vertex set S. We then assign each job to the server with minimum load among servers u and v, where v is chosen according to one of the following two policies: (i) Unif-POT(k): Sample a server v uniformly at random from the k-hop neighborhood of u; (ii) InvSq-POT(k): Sample a server v from the k-hop neighborhood of u with probability proportional to the inverse square of the distance between u and v. An extensive simulation over a wide range of topologies validates the efficacy of both policies. Our simulation results show that both policies consistently produce a load distribution very similar to that of classical POT. Depending on topology, we observe the total variation distance to be on the order of 0.002–0.08 for both policies, while achieving an 8%–99% decrease in implementation cost compared to classical POT.
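A minimal sketch of the two allocation rules the abstract describes concretely, sPOT and InvSq-POT(k); the helper interfaces (coordinate tuples, a loads list, a dist callable, a precomputed k-hop neighbor list) are assumptions for illustration, not the paper's API.

```python
import math
import random

def spot_assign(job_xy, servers_xy, loads):
    """sPOT as described: the job goes to the least loaded of its two
    geographically nearest servers (ties broken toward the nearer one)."""
    nearest = sorted(range(len(servers_xy)),
                     key=lambda i: math.dist(job_xy, servers_xy[i]))
    a, b = nearest[0], nearest[1]
    return a if loads[a] <= loads[b] else b

def invsq_pot_assign(u, k_hop_neighbors, dist, loads):
    """InvSq-POT(k) sketch: from u's k-hop neighborhood, sample v with
    probability proportional to 1 / dist(u, v)^2, then keep the less
    loaded of u and v."""
    weights = [1.0 / dist(u, v) ** 2 for v in k_hop_neighbors]
    v = random.choices(k_hop_neighbors, weights=weights, k=1)[0]
    return u if loads[u] <= loads[v] else v
```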

1 citation


Journal ArticleDOI
TL;DR: In this paper, the authors describe a focused model that includes the important components (the focus) and aggregates the rest into groups, called dependency groups, which can be used to indicate trustable sensitivity results.
Abstract: Performance models of server systems, based on layered queues, may be very complex. This is particularly true for cloud-based systems based on microservices, which may have hundreds of distinct components, and for models derived by automated data analysis. Often only a few of these many components determine the system performance, and a smaller simplified model is all that is needed. To assist an analyst, this work describes a focused model that includes the important components (the focus) and aggregates the rest in groups, called dependency groups. The method Focus-based Simplification with Preservation of Tasks described here fills an important gap in a previous method by the same authors. The use of focused models for sensitivity predictions is evaluated empirically in the article on a large set of randomly generated models. It is found that the accuracy depends on a “saturation ratio” (SR) between the highest utilization value in the model and the highest value of a component excluded from the focus; evidence suggests that SR must be at least 2 and must be larger to evaluate larger model changes. This dependency was captured in an “Accurate Sensitivity Hypothesis” based on SR, which can be used to indicate trustable sensitivity results.
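A small sketch of the saturation ratio as defined in the abstract (highest utilization in the model divided by the highest utilization of any component excluded from the focus), together with the SR-at-least-2 rule of thumb the abstract reports; the dict-based interface is an assumption for illustration.

```python
def saturation_ratio(utilizations, focus):
    """Saturation ratio as described in the abstract: highest utilization in
    the whole model divided by the highest utilization of any component left
    out of the focus. `utilizations` maps component name -> utilization."""
    top_overall = max(utilizations.values())
    top_excluded = max(u for name, u in utilizations.items()
                       if name not in focus)
    return top_overall / top_excluded

# Rule-of-thumb check (the abstract suggests SR must be at least 2):
# saturation_ratio({"db": 0.9, "web": 0.6, "cache": 0.3}, focus={"db", "web"})
# -> 0.9 / 0.3 = 3.0, above 2, so sensitivity results should be trustable.
```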

Journal ArticleDOI
TL;DR: In this article, the authors formulate the coordination problem for maximizing PPL utility under available resources, capacity, and demand constraints, and develop an exact pseudo-polynomial time dynamic programming algorithm for each scenario with a proven guarantee to produce an optimal coordination schedule.
Abstract: Many mission- and time-critical cyber-physical systems deploy an isolated power system for their power supply. Under extreme conditions, the power system must process critical missions by maximizing the Pulsed Power Load (PPL) utility while maintaining the normal loads in the cyber-physical system. Optimal operation requires careful coordination of PPL deployment and power supply processes. In this work, we formulate the coordination problem for maximizing PPL utility under available resources, capacity, and demand constraints. The coordination problem has two scenarios for different use cases, fixed and general normal loads. We develop an exact pseudo-polynomial time dynamic programming algorithm for each scenario with a proven guarantee to produce an optimal coordination schedule. The performance of the algorithms is also experimentally evaluated, and the results agree with our theoretical analysis, showing the practicality of the solutions.
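The paper's exact formulation is not given in the abstract, so the following is only a hedged, knapsack-style stand-in for a pseudo-polynomial dynamic program that maximizes PPL utility under an integer energy budget; the (demand, utility) pulse representation and the single capacity constraint are assumptions for illustration.

```python
def max_ppl_utility(pulses, capacity):
    """Sketch of a pseudo-polynomial DP in the spirit of the abstract: each
    candidate pulse has an integer energy demand and a utility, and total
    demand must stay within the integer energy budget `capacity` left after
    serving the normal loads.

    dp[c] = best utility achievable with at most c units of energy.
    Runs in O(len(pulses) * capacity) time, i.e. pseudo-polynomial.
    """
    dp = [0] * (capacity + 1)
    for demand, utility in pulses:
        for c in range(capacity, demand - 1, -1):  # 0/1 choice per pulse
            dp[c] = max(dp[c], dp[c - demand] + utility)
    return dp[capacity]

# max_ppl_utility([(3, 10), (4, 14), (2, 5)], capacity=6) -> 19
```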

Journal ArticleDOI
TL;DR: In this article, a prisoner's dilemma game is used to model zero-rating between Internet service providers and content providers to understand the conditions under which offering zero-rating is preferred, and who gains in utility.
Abstract: An objective of network neutrality is to design regulations for the Internet and ensure that it remains a public, open platform where innovations can thrive. While there is broad agreement that preserving the content quality of service falls under the purview of net neutrality, the role of differential pricing, especially the practice of zero-rating, remains controversial. Zero-rating refers to the practice of providing free Internet access to some users under certain conditions, which usually coincides with differentiation among users or content providers. Even though some countries (India, Canada) have banned zero-rating, others have either taken no stance or explicitly allowed it (South Africa, Kenya, U.S.). In this article, we model zero-rating between Internet service providers and content providers (CPs) to better understand the conditions under which offering zero-rating is preferred, and who gains in utility. We develop a formulation in which providers’ incomes vary, from low-income startups to high-income incumbents, and in which their decisions to zero-rate form a variation of the traditional prisoner’s dilemma game. We find that if zero-rating is permitted, low-income CPs often lose utility, whereas high-income CPs often gain utility. We also study the competitiveness of the CP markets via the Herfindahl Index. Our findings suggest that in most cases the introduction of zero-rating reduces competitiveness.
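To make the prisoner's-dilemma framing concrete, here is a small pure-strategy equilibrium check over a 2x2 game between two content providers; the payoff numbers are purely illustrative and are not taken from the paper.

```python
def pure_nash_equilibria(payoffs):
    """Find pure-strategy Nash equilibria of a 2-player, 2-action game.

    `payoffs[(a1, a2)] = (u1, u2)` with actions "zero_rate" / "abstain".
    """
    actions = ["zero_rate", "abstain"]
    eq = []
    for a1 in actions:
        for a2 in actions:
            u1, u2 = payoffs[(a1, a2)]
            if all(payoffs[(b1, a2)][0] <= u1 for b1 in actions) and \
               all(payoffs[(a1, b2)][1] <= u2 for b2 in actions):
                eq.append((a1, a2))
    return eq

# A prisoner's-dilemma-shaped instance (illustrative payoffs): mutual
# zero-rating is the unique equilibrium even though both CPs would prefer
# mutual abstention.
example = {
    ("zero_rate", "zero_rate"): (2, 2),
    ("zero_rate", "abstain"):   (4, 1),
    ("abstain",   "zero_rate"): (1, 4),
    ("abstain",   "abstain"):   (3, 3),
}
# pure_nash_equilibria(example) -> [("zero_rate", "zero_rate")]
```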

Journal ArticleDOI
TL;DR: In this paper, a model-based reinforcement learning (RL) algorithm is proposed to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized.
Abstract: With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
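A hedged sketch of the hybrid structure the abstract describes for RL-QN: apply the learned policy inside a finite region of the queue-length state space and fall back to a known stabilizing policy outside it. The threshold form of the region and the callable interfaces are assumptions, not details from the paper.

```python
def rl_qn_action(queue_lengths, threshold, rl_policy, stabilizing_policy):
    """Dispatch between the learned policy and a stabilizing fallback.

    `rl_policy` and `stabilizing_policy` are placeholder callables mapping a
    queue-length tuple to a control action; `threshold` bounds the finite
    region of the state space on which the model-based RL policy is used.
    """
    inside = all(q <= threshold for q in queue_lengths)
    policy = rl_policy if inside else stabilizing_policy
    return policy(tuple(queue_lengths))

# Example with a MaxWeight-style fallback that serves the longest queue:
# maxweight = lambda qs: max(range(len(qs)), key=lambda i: qs[i])
# rl_qn_action([3, 7, 2], threshold=10,
#              rl_policy=maxweight, stabilizing_policy=maxweight) -> 1
```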

Journal ArticleDOI
TL;DR: This work proposes a new method to compute an upper bound on hit probability for all non-anticipative caching policies, i.e., policies that have no knowledge of future requests, and finds it to be tighter than state-of-the-art upper bounds for some specific object request arrival processes such as independent renewal, Markov-modulated, and shot noise processes.
Abstract: Caching systems have long been crucial for improving the performance of a wide variety of network and web-based online applications. In such systems, end-to-end application performance heavily depends on the fraction of objects transferred from the cache, also known as the cache hit probability. Many caching policies have been proposed and implemented to improve the hit probability. In this work, we propose a new method to compute an upper bound on hit probability for all non-anticipative caching policies, i.e., policies that have no knowledge of future requests. Our key insight is to order the objects according to the ratio of their Hazard Rate (HR) function values to their sizes, and to place in the cache the objects with the largest ratios until the cache capacity is exhausted. When object request processes are conditionally independent, we prove that this cache allocation based on the HR-to-size ratio rule guarantees the maximum achievable expected number of object hits across all non-anticipative caching policies. Further, the HR ordering rule serves as an upper bound on cache hit probability when object request processes follow either an independent delayed renewal process or a Markov-modulated Poisson process. We also derive closed-form expressions for the upper bound under some specific object request arrival processes. We provide simulation results to validate its correctness and to compare it to state-of-the-art upper bounds, such as the one produced by Bélády’s algorithm. We find it to be tighter than state-of-the-art upper bounds for some specific object request arrival processes such as independent renewal, Markov-modulated, and shot noise processes.
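The HR-to-size allocation rule described in the abstract lends itself to a short greedy sketch; how the hazard-rate values are estimated from the request process, and the dict-based interface, are outside the scope of this illustration and are assumptions.

```python
def hr_to_size_allocation(hazard_rates, sizes, capacity):
    """Greedy allocation described in the abstract: order objects by the
    ratio of their hazard-rate value to their size and cache the largest
    ratios until the capacity is exhausted. Returns the set of cached ids.
    `hazard_rates` and `sizes` map object id -> value at the current time."""
    order = sorted(hazard_rates,
                   key=lambda o: hazard_rates[o] / sizes[o],
                   reverse=True)
    cached, used = set(), 0
    for obj in order:
        if used + sizes[obj] <= capacity:
            cached.add(obj)
            used += sizes[obj]
    return cached
```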

Journal ArticleDOI
TL;DR: This erratum states that, in Section 4.3 (Analysis), the last few lines of the proof of Claim 3 should be replaced.
Abstract: In Section 4.3 (Analysis), the last few lines of the proof of Claim 3 should be replaced with the following

Journal ArticleDOI
TL;DR: Experimental results demonstrate that PathTracer provides information on the nature of the application and response time models that can reach high accuracy when built post-execution, leading to prediction errors with an average and standard deviation under 5% and 3%, respectively.
Abstract: In embedded and cyber-physical systems, the design of a desired functionality under constraints increasingly requires parallel execution of a set of tasks on a heterogeneous architecture. The nature of such parallel systems complicates the process of understanding and predicting performance in terms of response time. Indeed, response time depends on many factors related to both the functionality and the target architecture. State-of-the-art strategies derive response time by examining the operations required by each task for both processing and accessing shared resources. This procedure is often followed by the addition or elimination of potential interference due to task concurrency. However, such approaches require advanced knowledge of the software and hardware details, rarely available in practice. This work presents an alternative “top-down” strategy, called PathTracer, aimed at understanding software response time and extending the cases in which it can be analyzed and estimated. PathTracer leverages a dataflow-based application representation and response time estimation of signal processing applications mapped on heterogeneous Multiprocessor Systems-on-a-Chip (MPSoCs). Experimental results demonstrate that PathTracer provides (i) information on the nature of the application (work-dominated, span-dominated, or balanced parallel), and (ii) response time models which can reach high accuracy when built post-execution, leading to prediction errors with an average and standard deviation under 5% and 3%, respectively.
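As a rough illustration of the work-dominated / span-dominated / balanced classification that PathTracer reports, the following compares the two terms of the classic greedy-scheduling bound T_P <= work/P + span; the threshold and margin are illustrative assumptions, not PathTracer's actual criteria.

```python
def classify_parallelism(work, span, processors, balance_margin=0.25):
    """Classify an application from measured work (total computation) and
    span (critical-path length) on `processors` cores, by comparing the two
    terms of the greedy-scheduling bound T_P <= work/P + span."""
    work_term = work / processors
    if abs(work_term - span) <= balance_margin * max(work_term, span):
        return "balanced parallel"
    return "work-dominated" if work_term > span else "span-dominated"

# classify_parallelism(work=8000, span=200, processors=8) -> "work-dominated"
```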