Bio: Mor Harchol-Balter is an academic researcher at Carnegie Mellon University. Her research centers on topics such as Server (computing) and Scheduling (computing). She has an h-index of 50 and has co-authored 192 publications receiving 10,213 citations. Her previous affiliations include the University of California, Berkeley and the Massachusetts Institute of Technology.
• 01 Feb 2013
TL;DR: Tackling the questions that systems designers care about, this book brings queueing theory decisively back to computer science and helps readers acquire the skills needed to model, analyze, and design large-scale systems with good performance and low cost.
Abstract: Computer systems design is full of conundrums: Given a choice between a single machine with speed s, or n machines each with speed s/n, which should we choose? If both the arrival rate and service rate double, will the mean response time stay the same? Should systems really aim to balance load, or is this a convenient myth? If a scheduling policy favors one set of jobs, does it necessarily hurt some other jobs, or are these "conservation laws" being misinterpreted? Do greedy, shortest-delay routing strategies make sense in a server farm, or is what's good for the individual disastrous for the system as a whole? How do high job size variability and heavy-tailed workloads affect the choice of a scheduling policy? How should one trade off energy and delay in designing a computer system? If 12 servers are needed to meet delay guarantees when the arrival rate is 9 jobs/sec, will we need 12,000 servers when the arrival rate is 9,000 jobs/sec? Tackling the questions that systems designers care about, this book brings queueing theory decisively back to computer science. The book is written with computer scientists and engineers in mind and is full of examples from computer systems, as well as manufacturing and operations research. Fun and readable, the book is highly approachable, even for undergraduates, while still being thoroughly rigorous and also covering a much wider span of topics than many queueing books. Readers benefit from a lively mix of motivation and intuition, with illustrations, examples, and more than 300 exercises, all while acquiring the skills needed to model, analyze, and design large-scale systems with good performance and low cost. The exercises are an important feature, teaching research-level counterintuitive lessons in the design of computer systems. The goal is to train readers not only to customize existing analyses but also to invent their own.
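As a taste of the book's first conundrum, here is a minimal sketch (not taken from the book; all numbers are illustrative) comparing one machine of speed s against n machines of speed s/n sharing a single FCFS queue, using the standard M/M/1 and M/M/k (Erlang-C) formulas:

```python
import math

def mmk_mean_response_time(lam, mu, k):
    """Mean response time E[T] of an M/M/k queue with one shared FCFS queue.
    lam: total arrival rate; mu: service rate of each server; k: number of servers."""
    rho = lam / (k * mu)
    assert rho < 1, "need lam < k*mu for stability"
    a = lam / mu  # offered load in Erlangs
    queue_term = a**k / (math.factorial(k) * (1 - rho))
    # Erlang-C: probability that an arriving job must wait in the queue
    p_queue = queue_term / (sum(a**i / math.factorial(i) for i in range(k)) + queue_term)
    return 1 / mu + p_queue / (k * mu - lam)  # own service time + expected wait

s, n, lam = 1.0, 4, 0.7  # illustrative: total speed s, n slow machines, arrival rate
print("one fast machine:", round(mmk_mean_response_time(lam, s, 1), 3))       # ~3.333
print(f"{n} slow machines:", round(mmk_mean_response_time(lam, s / n, n), 3))  # ~5.429
```

With exponential (low-variability) job sizes, the single fast machine wins; one of the book's lessons is that the answer can reverse under high job-size variability, where many slow servers keep short jobs from getting stuck behind long ones.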
• 15 Jun 2009
TL;DR: The analysis shows that the optimal power allocation is non-obvious and depends on many factors such as the power-to-frequency relationship in the processors, the arrival rate of jobs, the maximum server frequency, the lowest attainable server frequency and the server farm configuration.
Abstract: Server farms today consume more than 1.5% of the total electricity in the U.S. at a cost of nearly $4.5 billion. Given the rising cost of energy, many industries are now seeking solutions for how to best make use of their available power. An important question which arises in this context is how to distribute available power among servers in a server farm so as to get maximum performance. By giving more power to a server, one can get higher server frequency (speed). Hence it is commonly believed that, for a given power budget, performance can be maximized by operating servers at their highest power levels. However, it is also conceivable that one might prefer to run servers at their lowest power levels, which allows more servers to be turned on for a given power budget. To fully understand the effect of power allocation on performance in a server farm with a fixed power budget, we introduce a queueing theoretic model, which allows us to predict the optimal power allocation in a variety of scenarios. Results are verified via extensive experiments on an IBM BladeCenter. We find that the optimal power allocation varies for different scenarios. In particular, it is not always optimal to run servers at their maximum power levels. There are scenarios where it might be optimal to run servers at their lowest power levels or at some intermediate power levels. Our analysis shows that the optimal power allocation is non-obvious and depends on many factors such as the power-to-frequency relationship in the processors, the arrival rate of jobs, the maximum server frequency, the lowest attainable server frequency and the server farm configuration. Furthermore, our theoretical model allows us to explore more general settings than we can implement, including arbitrarily large server farms and different power-to-frequency curves. Importantly, we show that the optimal power allocation can significantly improve server farm performance, typically by a factor of 1.4, and by as much as a factor of 5 in some cases.
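The flavor of the model can be sketched with a toy version (an illustration, not the paper's exact formulation): assume a hypothetical linear power-to-frequency curve, split a fixed budget evenly across k servers, approximate the farm as k parallel M/M/1 queues with arrivals split evenly, and sweep k:

```python
P_BUDGET = 400.0            # total power budget in watts (illustrative)
P_MIN, P_MAX = 60.0, 100.0  # feasible per-server power range (illustrative)
F_MIN, ALPHA = 1.0, 0.05    # hypothetical linear curve: speed = F_MIN + ALPHA*(p - P_MIN)
LAMBDA = 6.0                # total arrival rate in jobs/sec (illustrative)

def farm_mean_response_time(k):
    """E[T] when the budget is split evenly across k servers, each an M/M/1 queue."""
    p = P_BUDGET / k
    if not (P_MIN <= p <= P_MAX):
        return None                      # this allocation violates the power range
    mu = F_MIN + ALPHA * (p - P_MIN)     # server speed (jobs/sec) from its power share
    lam_per_server = LAMBDA / k
    if lam_per_server >= mu:
        return None                      # unstable: work arrives faster than it finishes
    return 1.0 / (mu - lam_per_server)   # M/M/1 mean response time

for k in (4, 5, 6):                      # server counts that fit the power range here
    t = farm_mean_response_time(k)
    result = f"E[T] = {t:.2f}s" if t is not None else "infeasible or unstable"
    print(f"k={k} servers at {P_BUDGET/k:.0f}W each: {result}")
```

With this steep illustrative slope, a few high-powered servers minimize E[T] (0.67s at k=4 versus 3.00s at k=6); flattening ALPHA shifts the optimum toward many low-powered servers, which is exactly the kind of non-obvious dependence the paper quantifies.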
• 01 Apr 2010
TL;DR: It is shown that implementing least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput; ATLAS's performance benefit increases as the number of cores increases.
Abstract: Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control access to main memory. The scheduling algorithm employed by these memory controllers has a significant effect on system throughput, so choosing an efficient scheduling algorithm is important. The scheduling algorithm also needs to be scalable: as the number of cores increases, the number of memory controllers shared by the cores should also increase to provide sufficient bandwidth to feed the cores. Unfortunately, previous memory scheduling algorithms are inefficient with respect to system throughput and/or are designed for a single memory controller and do not scale well to multiple memory controllers, requiring significant fine-grained coordination among controllers. This paper proposes ATLAS (Adaptive per-Thread Least-Attained-Service memory scheduling), a fundamentally new memory scheduling technique that improves system throughput without requiring significant coordination among memory controllers. The key idea is to periodically order threads based on the service they have attained from the memory controllers so far, and prioritize those threads that have attained the least service over others in each period. The idea of favoring threads with least-attained-service is borrowed from the queueing theory literature, where, in the context of a single-server queue, it is known that least-attained-service optimally schedules jobs, assuming a Pareto (or any decreasing hazard rate) workload distribution. After verifying that our workloads have this characteristic, we show that our implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput. Furthermore, since the periods over which we accumulate the attained service are long, the controllers coordinate very infrequently to form the ordering of threads, thereby making ATLAS scalable to many controllers. We evaluate ATLAS on a wide variety of multiprogrammed SPEC 2006 workloads and systems with 4–32 cores and 1–16 memory controllers, and compare its performance to five previously proposed scheduling algorithms. Averaged over 32 workloads on a 24-core system with 4 controllers, ATLAS improves instruction throughput by 10.8%, and system throughput by 8.4%, compared to PAR-BS, the best previous CMP memory scheduling algorithm. ATLAS's performance benefit increases as the number of cores increases.
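At its core, the per-period ranking step can be sketched in a few lines (a hypothetical software rendering; the paper describes a hardware mechanism):

```python
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    attained_service: float = 0.0  # memory service received so far (illustrative units)

def rank_threads(threads):
    """Recompute the system-wide priority order: least attained service first.
    Because this runs once per long period, controllers need to exchange only
    per-thread service totals, and only infrequently."""
    return sorted(threads, key=lambda t: t.attained_service)

threads = [Thread(0, 120.0), Thread(1, 30.0), Thread(2, 75.0)]  # illustrative totals
print([t.tid for t in rank_threads(threads)])  # -> [1, 2, 0]: thread 1 is served first
```

Each controller would then break ties among pending requests using this shared thread order, adding each serviced request's cost to the requesting thread's attained_service total.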
TL;DR: The measurements indicate that the distribution of lifetimes for a UNIX process is Pareto (heavy-tailed), with a consistent functional form over a variety of workloads, and it is shown how to apply this distribution to derive a preemptive migration policy that requires no hand-tuned parameters.
Abstract: We consider policies for CPU load balancing in networks of workstations. We address the question of whether preemptive migration (migrating active processes) is necessary, or whether remote execution (migrating processes only at the time of birth) is sufficient for load balancing. We show that resolving this issue is strongly tied to understanding the process lifetime distribution. Our measurements indicate that the distribution of lifetimes for a UNIX process is Pareto (heavy-tailed), with a consistent functional form over a variety of workloads. We show how to apply this distribution to derive a preemptive migration policy that requires no hand-tuned parameters. We use a trace-driven simulation to show that our preemptive migration strategy is far more effective than remote execution, even when the memory transfer cost is high.
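A short worked example shows why a heavy-tailed (Pareto, alpha near 1) lifetime distribution points directly at age-based migration. Under the model Pr[lifetime > t] ∝ 1/t, a process that has already run for a seconds survives at least r more seconds with probability a/(a+r), so older processes are precisely the ones likely to amortize a migration cost (the numbers below are illustrative):

```python
def p_survives(age, extra):
    """Pr[process runs at least `extra` more seconds | it has run `age` seconds],
    under a Pareto(alpha=1) lifetime model: age / (age + extra)."""
    return age / (age + extra)

migration_cost = 2.0  # hypothetical seconds to transfer a process's memory
for age in (0.5, 2.0, 8.0, 32.0):
    p = p_survives(age, migration_cost)
    print(f"age {age:5.1f}s -> P[outlives a {migration_cost:.0f}s migration] = {p:.2f}")
```

The probabilities climb from 0.20 to 0.94 as age grows, and the median remaining lifetime always equals the current age; that scale-free property is what lets the derived policy work without hand-tuned parameters.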
• 04 Dec 2010
TL;DR: This paper presents a new memory scheduling algorithm that addresses system throughput and fairness separately, with the goal of achieving the best of both. TCM is evaluated on a wide variety of multiprogrammed workloads and compared against four previously proposed scheduling algorithms, and it achieves both the best system throughput and the best fairness.
Abstract: In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently executing threads. The memory scheduling algorithm should resolve memory contention by arbitrating memory access in such a way that competing threads progress at a relatively fast and even pace, resulting in high system throughput and fairness. Previously proposed memory scheduling algorithms are predominantly optimized for only one of these objectives: no scheduling algorithm provides the best system throughput and best fairness at the same time. This paper presents a new memory scheduling algorithm that addresses system throughput and fairness separately with the goal of achieving the best of both. The main idea is to divide threads into two separate clusters and employ different memory request scheduling policies in each cluster. Our proposal, Thread Cluster Memory scheduling (TCM), dynamically groups threads with similar memory access behavior into either the latency-sensitive (memory-non-intensive) or the bandwidth-sensitive (memory-intensive) cluster. TCM introduces three major ideas for prioritization: 1) we prioritize the latency-sensitive cluster over the bandwidth-sensitive cluster to improve system throughput, 2) we introduce a "niceness" metric that captures a thread's propensity to interfere with other threads, and 3) we use niceness to periodically shuffle the priority order of the threads in the bandwidth-sensitive cluster to provide fair access to each thread in a way that reduces inter-thread interference. On the one hand, prioritizing memory-non-intensive threads significantly improves system throughput without degrading fairness, because such "light" threads only use a small fraction of the total available memory bandwidth. On the other hand, shuffling the priority order of memory-intensive threads improves fairness because it ensures no thread is disproportionately slowed down or starved. We evaluate TCM on a wide variety of multiprogrammed workloads and compare its performance to four previously proposed scheduling algorithms, finding that TCM achieves both the best system throughput and fairness. Averaged over 96 workloads on a 24-core system with 4 memory channels, TCM improves system throughput and reduces maximum slowdown by 4.6%/38.6% compared to ATLAS (previous work providing the best system throughput) and 7.6%/4.6% compared to PAR-BS (previous work providing the best fairness).
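The clustering-plus-shuffling idea can be sketched as follows (thresholds and numbers are hypothetical; the real mechanism measures these statistics in hardware and shuffles on a fixed schedule):

```python
import random

MPKI_THRESHOLD = 10.0  # illustrative cutoff on misses per kilo-instruction

def cluster_threads(mpki):
    """Split {tid: MPKI} into the latency-sensitive and bandwidth-sensitive clusters."""
    light = [t for t, m in mpki.items() if m < MPKI_THRESHOLD]
    heavy = [t for t, m in mpki.items() if m >= MPKI_THRESHOLD]
    return light, heavy

def priority_order(light, heavy, niceness):
    """Light threads always outrank heavy ones. Within the heavy cluster, nicer
    threads (less prone to interfere) come first; the random tie-break stands in
    for TCM's periodic shuffling of the heavy cluster's order."""
    return light + sorted(heavy, key=lambda t: (-niceness[t], random.random()))

mpki = {0: 1.2, 1: 25.0, 2: 0.4, 3: 40.0}  # illustrative per-thread intensities
niceness = {1: 0.7, 3: 0.2}                # illustrative niceness for heavy threads
light, heavy = cluster_threads(mpki)
print(priority_order(light, heavy, niceness))  # -> [0, 2, 1, 3]
```

Re-running the clustering and shuffling every period is what lets light threads stay responsive while no heavy thread is starved for long.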
• 01 May 1975
TL;DR: Fundamentals of Queueing Theory, Fourth Edition provides a comprehensive overview of simple and more advanced queueing models, with a self-contained presentation of key concepts and formulae.
Abstract: Praise for the Third Edition: "This is one of the best books available. Its excellent organizational structure allows quick reference to specific models and its clear presentation . . . solidifies the understanding of the concepts being presented." (IIE Transactions on Operations Engineering) Thoroughly revised and expanded to reflect the latest developments in the field, Fundamentals of Queueing Theory, Fourth Edition continues to present the basic statistical principles that are necessary to analyze the probabilistic nature of queues. Rather than presenting a narrow focus on the subject, this update illustrates the wide-reaching, fundamental concepts in queueing theory and its applications to diverse areas such as computer science, engineering, business, and operations research. This update takes a numerical approach to understanding and making probable estimations relating to queues, with a comprehensive outline of simple and more advanced queueing models. Newly featured topics of the Fourth Edition include: retrial queues; approximations for queueing networks; numerical inversion of transforms; and determining the appropriate number of servers to balance quality and cost of service. Each chapter provides a self-contained presentation of key concepts and formulae, allowing readers to work with each section independently, while a summary table at the end of the book outlines the types of queues that have been discussed and their results. In addition, two new appendices have been added, discussing transforms and generating functions as well as the fundamentals of differential and difference equations. New examples are now included along with problems that incorporate QtsPlus software, which is freely available via the book's related Web site. With its accessible style and wealth of real-world examples, Fundamentals of Queueing Theory, Fourth Edition is an ideal book for courses on queueing theory at the upper-undergraduate and graduate levels. It is also a valuable resource for researchers and practitioners who analyze congestion in the fields of telecommunications, transportation, aviation, and management science.
• 01 Aug 1999
TL;DR: It is found that the SPIN protocols can deliver 60% more data for a given amount of energy than conventional approaches, and that, in terms of dissemination rate and energy usage, the SPIN protocols perform close to the theoretical optimum.
Abstract: In this paper, we present a family of adaptive protocols, called SPIN (Sensor Protocols for Information via Negotiation), that efficiently disseminates information among sensors in an energy-constrained wireless sensor network. Nodes running a SPIN communication protocol name their data using high-level data descriptors, called meta-data. They use meta-data negotiations to eliminate the transmission of redundant data throughout the network. In addition, SPIN nodes can base their communication decisions both upon application-specific knowledge of the data and upon knowledge of the resources that are available to them. This allows the sensors to efficiently distribute data given a limited energy supply. We simulate and analyze the performance of two specific SPIN protocols, comparing them to other possible approaches and a theoretically optimal protocol. We find that the SPIN protocols can deliver 60% more data for a given amount of energy than conventional approaches. We also find that, in terms of dissemination rate and energy usage, the SPIN protocols perform close to the theoretical optimum.
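The heart of SPIN's negotiation is a three-message ADV/REQ/DATA handshake, sketched below over a toy in-memory network (the real protocols also weigh each step against remaining energy and application-specific meta-data):

```python
class SpinNode:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.store = {}  # meta-data descriptor -> data

    def publish(self, meta, data):
        """ADV: a node with new data advertises its meta-data to all neighbors."""
        self.store[meta] = data
        for nbr in self.neighbors:
            nbr.on_adv(self, meta)

    def on_adv(self, sender, meta):
        """REQ: request the data only if we have not seen it, suppressing redundancy."""
        if meta not in self.store:
            sender.on_req(self, meta)

    def on_req(self, requester, meta):
        """DATA: send the actual data to the requester."""
        requester.on_data(self, meta, self.store[meta])

    def on_data(self, sender, meta, data):
        self.store[meta] = data
        for nbr in self.neighbors:  # re-advertise so the data keeps spreading
            if nbr is not sender:
                nbr.on_adv(self, meta)

a, b, c = SpinNode("a"), SpinNode("b"), SpinNode("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.publish("temp@sensor1", 23.5)
print(sorted(n.name for n in (a, b, c) if "temp@sensor1" in n.store))  # ['a', 'b', 'c']
```

Because a node never requests data it already holds, the small meta-data messages replace the redundant full-data transmissions that naive flooding would generate, which is the mechanism behind the energy savings the paper measures.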
TL;DR: An architectural framework and principles for energy-efficient Cloud computing are defined, and the proposed energy-aware allocation heuristics provision data center resources to client applications in a way that improves energy efficiency of the data center while delivering the negotiated Quality of Service (QoS).
Abstract: Cloud computing offers utility-oriented IT services to users worldwide. Based on a pay-as-you-go model, it enables hosting of pervasive applications from consumer, scientific, and business domains. However, data centers hosting Cloud applications consume huge amounts of electrical energy, contributing to high operational costs and large carbon footprints. Therefore, we need Green Cloud computing solutions that can not only minimize operational costs but also reduce the environmental impact. In this paper, we define an architectural framework and principles for energy-efficient Cloud computing. Based on this architecture, we present our vision, open research challenges, and resource provisioning and allocation algorithms for energy-efficient management of Cloud computing environments. The proposed energy-aware allocation heuristics provision data center resources to client applications in a way that improves energy efficiency of the data center, while delivering the negotiated Quality of Service (QoS). In particular, in this paper we conduct a survey of research in energy-efficient computing and propose: (a) architectural principles for energy-efficient management of Clouds; (b) energy-efficient resource allocation policies and scheduling algorithms considering QoS expectations and power usage characteristics of the devices; and (c) a number of open research challenges, addressing which can bring substantial benefits to both resource providers and consumers. We have validated our approach by conducting a performance evaluation study using the CloudSim toolkit. The results demonstrate that the Cloud computing model has immense potential, as it offers significant cost savings and high potential for improving energy efficiency under dynamic workload scenarios.
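One way to make such energy-aware heuristics concrete is a simplified power-aware placement sketch (an illustration in the spirit of the paper's allocation policies, not its exact algorithm; the power model and numbers are made up): sort VMs by CPU demand and put each on the host whose power draw would grow the least, so load concentrates on a few hosts and idle ones can be switched off:

```python
P_IDLE, P_MAX = 160.0, 250.0  # watts for an idle vs. fully loaded host (illustrative)

def host_power(util):
    """Linear power model: idle cost plus a utilization-proportional part."""
    return P_IDLE + (P_MAX - P_IDLE) * util

def place_vms(vm_demands, n_hosts, capacity=1.0):
    """Best-fit-decreasing placement minimizing the increase in power draw."""
    utils = [0.0] * n_hosts
    placement = {}
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        best, best_delta = None, float("inf")
        for h in range(n_hosts):
            if utils[h] + demand > capacity:
                continue  # QoS guard: never oversubscribe a host's CPU
            # an empty host is assumed powered off, so using it pays the idle cost too
            current = host_power(utils[h]) if utils[h] > 0 else 0.0
            delta = host_power(utils[h] + demand) - current
            if delta < best_delta:
                best, best_delta = h, delta
        if best is None:
            raise RuntimeError(f"no host can accommodate {vm}")
        utils[best] += demand
        placement[vm] = best
    return placement

vms = {"vm1": 0.6, "vm2": 0.3, "vm3": 0.3, "vm4": 0.2}  # CPU demand per VM
print(place_vms(vms, n_hosts=3))  # -> {'vm1': 0, 'vm2': 0, 'vm3': 1, 'vm4': 1}
```

Because waking an empty host costs its full idle power, the heuristic naturally consolidates VMs onto fewer machines (host 2 stays off above), which is the main lever behind the energy savings the paper reports.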
• 01 Jan 2008
TL;DR: The architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base are described.
Abstract: As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board. Table of Contents: Introduction / Workloads and Software Infrastructure / Hardware Building Blocks / Datacenter Basics / Energy and Power Efficiency / Modeling Costs / Dealing with Failures and Repairs / Closing Remarks