Home
/
Authors
/
Sergey Zhuravlev

Author

Sergey Zhuravlev

Bio: Sergey Zhuravlev is an academic researcher from Simon Fraser University. The author has contributed to research in topics: Scheduling (computing) & Multi-core processor. The author has an hindex of 9, co-authored 9 publications receiving 1536 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Addressing shared resource contention in multicore processors via scheduling

[...]

Sergey Zhuravlev¹, Sergey Blagodurov¹, Alexandra Fedorova¹•Institutions (1)

Simon Fraser University¹

13 Mar 2010

TL;DR: This study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling, and finds a classification scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory controller, memory bus and prefetching hardware.

...read moreread less

Abstract: Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resource can be mitigated via thread scheduling. Scheduling is an attractive tool, because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads, which would determine how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that enables to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory controller, memory bus and prefetching hardware. To show the applicability of our analysis we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2\% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving performance of a workload as a whole but in improving quality of service or performance isolation for individual applications.

...read moreread less

532 citations

Proceedings Article•

A case for NUMA-aware contention management on multicore systems

[...]

Sergey Blagodurov¹, Sergey Zhuravlev¹, Mohammad Dashti¹, Alexandra Fedorova¹•Institutions (1)

Simon Fraser University¹

15 Jun 2011

TL;DR: The effects on performance imposed by resource contention and remote access latency are quantified and a new contention management algorithm is proposed and evaluated that significantly outperforms a NUMA-unaware algorithm proposed before as well as the default Linux scheduler.

...read moreread less

Abstract: On multicore systems, contention for shared resources occurs when memory-intensive threads are co-scheduled on cores that share parts of the memory hierarchy, such as last-level caches and memory controllers. Previous work investigated how contention could be addressed via scheduling. A contention-aware scheduler separates competing threads onto separate memory hierarchy domains to eliminate resource sharing and, as a consequence, to mitigate contention. However, all previous work on contention-aware scheduling assumed that the underlying system is UMA (uniform memory access latencies, single memory controller). Modern multicore systems, however, are NUMA, which means that they feature non-uniform memory access latencies and multiple memory controllers. We discovered that state-of-the-art contention management algorithms fail to be effective on NUMA systems and may even hurt performance relative to a default OS scheduler. In this paper we investigate the causes for this behavior and design the first contention-aware algorithm for NUMA systems.

...read moreread less

264 citations

Journal Article•DOI•

Survey of Energy-Cognizant Scheduling Techniques

[...]

Sergey Zhuravlev¹, Juan Carlos Saez², Sergey Blagodurov¹, Alexandra Fedorova¹, Manuel Prieto² - Show less +1 more•Institutions (2)

Simon Fraser University¹, Complutense University of Madrid²

01 Jul 2013-IEEE Transactions on Parallel and Distributed Systems

TL;DR: How the energy-cognizant scheduler's role has been extended beyond simple energy minimization to also include related issues like the avoidance of negative thermal effects as well as addressing asymmetric multicore architectures is explored.

...read moreread less

Abstract: Execution time is no longer the only metric by which computational systems are judged. In fact, explicitly sacrificing raw performance in exchange for energy savings is becoming a common trend in environments ranging from large server farms attempting to minimize cooling costs to mobile devices trying to prolong battery life. Hardware designers, well aware of these trends, include capabilities like DVFS (to throttle core frequency) into almost all modern systems. However, hardware capabilities on their own are insufficient and must be paired with other logic to decide if, when, and by how much to apply energy-minimizing techniques while still meeting performance goals. One obvious choice is to place this logic into the OS scheduler. This choice is particularly attractive due to the relative simplicity, low cost, and low risk associated with modifying only the scheduler part of the OS. Herein we survey the vast field of research on energy-cognizant schedulers. We discuss scheduling techniques to perform energy-efficient computation. We further explore how the energy-cognizant scheduler's role has been extended beyond simple energy minimization to also include related issues like the avoidance of negative thermal effects as well as addressing asymmetric multicore architectures.

...read moreread less

161 citations

Journal Article•DOI•

Survey of scheduling techniques for addressing shared resources in multicore processors

[...]

Sergey Zhuravlev¹, Juan Carlos Saez², Sergey Blagodurov¹, Alexandra Fedorova¹, Manuel Prieto² - Show less +1 more•Institutions (2)

Simon Fraser University¹, Complutense University of Madrid²

07 Dec 2012-ACM Computing Surveys

TL;DR: A multitude of new and exciting work is surveyed that explores the diverse new roles the OS scheduler can successfully take on, including those that exclusively make use of OS thread-level scheduling to achieve their goals.

...read moreread less

Abstract: Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern computing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contention that results because CMP cores are not independent processors but rather share common resources among cores such as the last level cache (LLC). Shared resource contention can lead to severe and unpredictable performance impact on the threads running on the CMP. Conversely, CMPs offer tremendous opportunities for mulithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter thread data sharing. Many solutions have been proposed to deal with the negative aspects of CMPs and take advantage of the positive. This survey focuses on the subset of these solutions that exclusively make use of OS thread-level scheduling to achieve their goals. These solutions are particularly attractive as they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a multitude of new and exciting work that explores the diverse new roles the OS scheduler can successfully take on.

...read moreread less

161 citations

Journal Article•DOI•

Contention-Aware Scheduling on Multicore Systems

[...]

Sergey Blagodurov¹, Sergey Zhuravlev¹, Alexandra Fedorova¹•Institutions (1)

Simon Fraser University¹

01 Dec 2010-ACM Transactions on Computer Systems

...read moreread less

Abstract: Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resource can be mitigated via thread scheduling. Scheduling is an attractive tool, because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads, which would determine how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that enables to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory controller, memory bus and prefetching hardware. To show the applicability of our analysis we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2p of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving performance of a workload as a whole but in improving quality of service or performance isolation for individual applications and in optimizing system energy consumption.

...read moreread less

158 citations

Cited by

PDF

Open Access

More filters

Book•

コンピュータ・サイエンス : ACM computing surveys

[...]

共立出版株式会社

01 Jan 1978

1,055 citations

Proceedings Article•DOI•

Ligra: a lightweight graph processing framework for shared memory

[...]

Julian Shun¹, Guy E. Blelloch¹•Institutions (1)

Carnegie Mellon University¹

23 Feb 2013

TL;DR: This paper presents a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.

...read moreread less

Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts.In this paper, we present a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.

...read moreread less

816 citations

Proceedings Article•DOI•

Q-clouds: managing performance interference effects for QoS-aware clouds

[...]

Ripal Nathuji¹, Aman Kansal¹, Alireza Ghaffarkhah²•Institutions (2)

Microsoft¹, University of New Mexico²

13 Apr 2010

TL;DR: Q-Clouds, a QoS-aware control framework that tunes resource allocations to mitigate performance interference effects, is developed, which uses online feedback to build a multi-input multi-output (MIMO) model that captures performance interference interactions, and uses it to perform closed loop resource management.

...read moreread less

Abstract: Cloud computing offers users the ability to access large pools of computational and storage resources on demand. Multiple commercial clouds already allow businesses to replace, or supplement, privately owned IT assets, alleviating them from the burden of managing and maintaining these facilities. However, there are issues that must be addressed before this vision of utility computing can be fully realized. In existing systems, customers are charged based upon the amount of resources used or reserved, but no guarantees are made regarding the application level performance or quality-of-service (QoS) that the given resources will provide. As cloud providers continue to utilize virtualization technologies in their systems, this can become problematic. In particular, the consolidation of multiple customer applications onto multicore servers introduces performance interference between collocated workloads, significantly impacting application QoS. To address this challenge, we advocate that the cloud should transparently provision additional resources as necessary to achieve the performance that customers would have realized if they were running in isolation. Accordingly, we have developed Q-Clouds, a QoS-aware control framework that tunes resource allocations to mitigate performance interference effects. Q-Clouds uses online feedback to build a multi-input multi-output (MIMO) model that captures performance interference interactions, and uses it to perform closed loop resource management. In addition, we utilize this functionality to allow applications to specify multiple levels of QoS as application Q-states. For such applications, Q-Clouds dynamically provisions underutilized resources to enable elevated QoS levels, thereby improving system efficiency. Experimental evaluations of our solution using benchmark applications illustrate the benefits: performance interference is mitigated completely when feasible, and system utilization is improved by up to 35% using Q-states.

...read moreread less

614 citations

Proceedings Article•DOI•

Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations

[...]

Jason Mars¹, Lingjia Tang¹, Robert Hundt², Kevin Skadron¹, Mary Lou Soffa¹ - Show less +1 more•Institutions (2)

University of Virginia¹, Google²

03 Dec 2011

TL;DR: Bubble-Up is presented, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem and can predict the performance interference between co-locate applications with an accuracy within 1% to 2% of the actual performance degradation.

...read moreread less

Abstract: As much of the world's computing continues to move into the cloud, the overprovisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as web search, in modern datacenters is a major contributor to low machine utilization. Being unable to accurately predict performance degradation due to contention for shared resources on multicore systems has led to the heavy handed approach of simply disallowing the co-location of high-priority, latency-sensitive tasks with other tasks. Performing this precise prediction has been a challenging and unsolved problem. In this paper, we present Bubble-Up, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem. By using a bubble to apply a tunable amount of “pressure” to the memory subsystem on processors in production datacenters, our methodology can predict the performance interference between co-locate applications with an accuracy within 1% to 2% of the actual performance degradation. Using this methodology to arrive at “sensible” co-locations in Google's production datacenters with real-world large-scale applications, we can improve the utilization of a 500-machine cluster by 50% to 90% while guaranteeing a high quality of service of latency-sensitive applications.

...read moreread less

596 citations

Proceedings Article•DOI•

Heracles: improving resource efficiency at scale

[...]

David Lo¹, Liqun Cheng², Rama K. Govindaraju², Parthasarathy Ranganathan², Christos Kozyrakis¹ - Show less +1 more•Institutions (2)

Stanford University¹, Google²

13 Jun 2015

TL;DR: Heracles is presented, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service and dynamically manages multiple hardware and software isolation mechanisms to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best- Effort tasks.

...read moreread less

Abstract: User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy-efficiency of large-scale datacenters. With technology scaling slowing down, it becomes important to address this opportunity. We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.

...read moreread less

464 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse