Rakesh Kumar

University of Illinois at Urbana–Champaign

Other affiliations: Kurukshetra University, University of California, Los Angeles, University of California, San Diego ...

Bio: Rakesh Kumar is an academic researcher at the University of Illinois at Urbana–Champaign. He has contributed to research on the topics of multi-core processors and error detection and correction. He has an h-index of 36 and has co-authored 122 publications that have received 5508 citations. Previous affiliations of Rakesh Kumar include Kurukshetra University and the University of California, Los Angeles.

Papers published on a yearly basis (chart; years listed: 1977 and 2002 to 2023)

Papers

Proceedings Article

Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction

Rakesh Kumar¹, Keith Farkas², Norman P. Jouppi², Parthasarathy Ranganathan², Dean M. Tullsen¹

University of California, San Diego¹, Hewlett-Packard²

03 Dec 2003

TL;DR: This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation; results indicate a 39% average energy reduction while sacrificing only 3% in performance.

Abstract: This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cores representing different points in the power/performance design space; during an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements. Our evaluation of this architecture shows significant energy benefits. For an objective function that optimizes for energy efficiency with a tight performance threshold, for 14 SPEC benchmarks, our results indicate a 39% average energy reduction while only sacrificing 3% in performance. An objective function that optimizes for energy-delay with looser performance bounds achieves, on average, nearly a factor of three improvements in energy-delay product while sacrificing only 22% in performance. Energy savings are substantially more than chip-wide voltage/frequency scaling.
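The energy and energy-delay figures quoted in this abstract can be related with a little arithmetic. The sketch below is illustrative only: it assumes that a performance loss of p% means performance is (1 - p) times the baseline and that delay is the reciprocal of performance, which the abstract does not state. The function names are hypothetical and not taken from the paper.

```python
def edp_ratio(energy_ratio: float, perf_ratio: float) -> float:
    """Energy-delay product relative to a baseline (1.0 means unchanged).

    energy_ratio: new energy / baseline energy
    perf_ratio:   new performance / baseline performance
    Assumption: delay is the reciprocal of performance.
    """
    delay_ratio = 1.0 / perf_ratio
    return energy_ratio * delay_ratio


def implied_energy_ratio(edp_improvement: float, perf_ratio: float) -> float:
    """Energy ratio implied by an EDP improvement factor and a performance ratio.

    From EDP_new / EDP_base = 1 / edp_improvement = energy_ratio / perf_ratio.
    """
    return perf_ratio / edp_improvement


# Tight performance threshold: 39% less energy, 3% less performance.
tight_edp = edp_ratio(energy_ratio=1 - 0.39, perf_ratio=1 - 0.03)

# Looser bounds: roughly 3x better EDP at 22% less performance.
loose_energy = implied_energy_ratio(edp_improvement=3.0, perf_ratio=1 - 0.22)

print(f"relative EDP, tight threshold: {tight_edp:.3f}")
print(f"implied relative energy, loose bounds: {loose_energy:.3f}")
```

Under these assumptions the tight-threshold result corresponds to an energy-delay product of roughly 0.63 of the baseline, and the "nearly a factor of three" result would imply energy of roughly a quarter of the baseline. Both numbers are derived here, not reported by the paper.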

809 citations

Journal Article

Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Rakesh Kumar¹, Dean M. Tullsen¹, Parthasarathy Ranganathan², Norman P. Jouppi², Keith Farkas²

University of California, San Diego¹, Hewlett-Packard²

02 Mar 2004

TL;DR: This paper examines two single-ISA heterogeneous multi-core architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment.

Abstract: A single-ISA heterogeneous multi-core architecture is a chip multiprocessor composed of cores of varying size, performance, and complexity. This paper demonstrates that this architecture can provide significantly higher performance in the same area than a conventional chip multiprocessor. It does so by matching the various jobs of a diverse workload to the various cores. This type of architecture covers a spectrum of workloads particularly well, providing high single-thread performance when thread parallelism is low, and high throughput when thread parallelism is high. This paper examines two such architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment. It examines policies for heterogeneous architectures both with and without multithreading cores. One heterogeneous architecture we examine outperforms the comparable-area homogeneous architecture by up to 63%, and our best core assignment strategy achieves up to 31% speedup over a naive policy.

647 citations

Journal Article

Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Rakesh Kumar¹, Victor Zyuban², Dean M. Tullsen¹

University of California, San Diego¹, IBM²

01 May 2005

TL;DR: Examination of the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multi-core design.

Abstract: This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. This research shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multi-core design. Several examples are presented showing the need for careful co-design. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined which negates some of the performance costs of the assumed base-line architecture.

402 citations

Journal Article

Heterogeneous chip multiprocessors

Rakesh Kumar¹, Dean M. Tullsen¹, Norman P. Jouppi², Parthasarathy Ranganathan²

University of California, San Diego¹, Hewlett-Packard²

01 Nov 2005-IEEE Computer

TL;DR: Heterogeneous (or asymmetric) chip multiprocessors present unique opportunities for improving system throughput, reducing processor power, and mitigating Amdahl's law.

Abstract: Heterogeneous (or asymmetric) chip multiprocessors present unique opportunities for improving system throughput, reducing processor power, and mitigating Amdahl's law. On-chip heterogeneity allows the processor to better match execution resources to each application's needs and to address a much wider spectrum of system loads, from low to high thread parallelism, with high efficiency.

368 citations

Proceedings Article

Core architecture optimization for heterogeneous chip multiprocessors

Rakesh Kumar¹, Dean M. Tullsen¹, Norman P. Jouppi²

University of California, San Diego¹, Hewlett-Packard²

16 Sep 2006

TL;DR: This work assumes the flexibility to design a multi-core architecture from the ground up and seeks to address the following question: what should be the characteristics of the cores for a heterogeneous multi-processor for the highest area or power efficiency?

Abstract: Previous studies have demonstrated the advantages of single-ISA heterogeneous multi-core architectures for power and performance. However, none of those studies examined how to design such a processor; instead, they started with an assumed combination of pre-existing cores. This work assumes the flexibility to design a multi-core architecture from the ground up and seeks to address the following question: what should be the characteristics of the cores for a heterogeneous multi-processor for the highest area or power efficiency? The study is done for varying degrees of thread-level parallelism and for different area and power budgets. The most efficient chip multiprocessors are shown to be heterogeneous, with each core customized to a different subset of application characteristics — no single core is necessarily well suited to all applications. The performance ordering of cores on such processors is different for different applications; there is only a partial ordering among cores in terms of resources and complexity. This methodology produces performance gains as high as 40%. The performance improvements come with the added cost of customization.

244 citations

Cited by

The PASCAL Visual Object Classes Challenge

Jianguo Zhang

01 Jan 2006

3,012 citations

Journal Article

Superconductivity Research and Life at the National Institute of Standards and Technology

尚島影

01 Oct 2001-IEEJ Transactions on Fundamentals and Materials

2,687 citations

Proceedings Article

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Sheng Li¹, Jung Ho Ahn², Richard Strong³, Jay B. Brockman¹, Dean M. Tullsen³, Norman P. Jouppi⁴

University of Notre Dame¹, Seoul National University², University of California, San Diego³, Hewlett-Packard⁴

12 Dec 2009

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.
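The metrics named in this abstract can be made concrete with a small sketch. It assumes the usual reading of the names: EDP is energy times delay, EDAP multiplies that by area, and EDA2P multiplies it by area squared. The `Design` class and every number below are hypothetical placeholders chosen for illustration; none of them are McPAT outputs or results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A candidate chip configuration with normalized (hypothetical) costs."""
    name: str
    energy: float  # relative energy
    delay: float   # relative execution time
    area: float    # relative die area, used here as a stand-in for cost


def edp(d: Design) -> float:
    """Energy-delay product."""
    return d.energy * d.delay


def edap(d: Design) -> float:
    """Energy-delay-area product."""
    return edp(d) * d.area


def eda2p(d: Design) -> float:
    """Energy-delay-area^2 product."""
    return edp(d) * d.area ** 2


# Made-up numbers: the larger cluster is faster but needs more interconnect area.
DESIGNS = [
    Design("8-core clusters", energy=1.00, delay=0.90, area=1.30),
    Design("4-core clusters", energy=1.05, delay=1.00, area=1.00),
]


def best(metric) -> str:
    """Name of the design that minimizes the given metric."""
    return min(DESIGNS, key=metric).name


for metric in (edp, edap, eda2p):
    print(f"{metric.__name__:>5}: best = {best(metric)}")
```

With these placeholder values the preferred design changes once area enters the metric, which is the kind of tradeoff the abstract describes. The specific ranking here follows only from the invented inputs.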

2,487 citations

The Landscape of Parallel Computing Research: A View from Berkeley

Krste Asanovic, Ras Bodik, Bryan Catanzaro, Joseph Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Plishker, John Shalf, Samuel Williams, Katherine Yelick

18 Dec 2006

TL;DR: The parallel landscape is framed with seven questions, and the recommendations include the following: the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems, and the target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.

Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met for nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
• The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
• The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
• Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
• “Autotuners” should play a larger role than conventional compilers in translating parallel programs.
• To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
• To be successful, programming models should be independent of the number of processors.
• To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
• Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
• Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
• To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

2,262 citations

Tools and Algorithms for the Construction and Analysis of Systems. Proc. TACAS 2009

Stefan Kowalewski, Anna Philippou

01 Jan 2009

TL;DR: This paper presents a meta-modelling framework for modeling and testing the robustness of the modeled systems and some of the techniques used in this framework have been developed and tested in the field.

Abstract (table of contents excerpt): Abstracting WS1S Systems to Verify Parameterized Networks (Kai Baukus, Saddek Bensalem, Yassine Lakhnech and Karsten Stahl); FMona: A Tool for Expressing Validation Techniques over Infinite State Systems (J.-P. Bodeveix and M. Filali); Transitive Closures of Regular Relations for Verifying Infinite-State Systems (Bengt Jonsson and Marcus Nilsson).
Diagnostic and Test Generation: Using Static Analysis to Improve Automatic Test Generation (Marius Bozga, Jean-Claude Fernandez and Lucian Ghirvu); Efficient Diagnostic Generation for Boolean Equation Systems (Radu Mateescu).
Efficient Model-Checking: Compositional State Space Generation with Partial Order Reductions for Asynchronous Communicating Systems (Jean-Pierre Krimm and Laurent Mounier); Checking for CFFD-Preorder with Tester Processes (Juhana Helovuo and Antti Valmari); Fair Bisimulation (Thomas A. Henzinger and Sriram K. Rajamani); Integrating Low Level Symmetries into Reachability Analysis (Karsten Schmidt).
Model-Checking Tools: Model Checking Support for the ASM High-Level Language (Giuseppe Del Castillo and Kirsten Winter).

1,687 citations
