Proceedings ArticleDOI

The PARSEC benchmark suite: characterization and architectural implications

25 Oct 2008 - pp. 72-81
TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.
Abstract: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.

Citations
Proceedings ArticleDOI
04 Oct 2009
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,697 citations


Cites background or methods from "The PARSEC benchmark suite: charact..."

  • ...Needleman-Wunsch uses 16 threads per block as discussed earlier, and Leukocyte uses different thread block sizes (128 and 256) for its two kernels because it operates on different working sets in the detection and tracking phases....

  • ...• Fused CPU-GPU processors and other heterogeneous multiprocessor SoCs are likely to become common in PCs, servers and HPC environments....

Proceedings ArticleDOI
12 Dec 2009
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.
Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.
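
As a concrete reading of the metrics named above, EDP, EDAP, and EDA2P simply weight a configuration's energy-delay product by increasing powers of its die area. A minimal sketch of how such a comparison might be computed is shown below (Python; the cluster configurations and all numbers are illustrative placeholders, not McPAT or PARSEC results):

```python
from dataclasses import dataclass

@dataclass
class ClusterConfig:
    name: str
    energy_j: float   # total energy for the workload (J)
    delay_s: float    # execution time (s)
    area_mm2: float   # manycore die area (mm^2)

def edp(c):   return c.energy_j * c.delay_s                   # energy-delay product
def edap(c):  return c.energy_j * c.delay_s * c.area_mm2      # energy-delay-area product
def eda2p(c): return c.energy_j * c.delay_s * c.area_mm2**2   # energy-delay-area^2 product

# Hypothetical 4-core vs. 8-core cluster results (placeholder numbers only).
configs = [
    ClusterConfig("4-core clusters", energy_j=1.10, delay_s=0.95, area_mm2=300.0),
    ClusterConfig("8-core clusters", energy_j=1.00, delay_s=0.90, area_mm2=360.0),
]

for metric in (edp, edap, eda2p):
    best = min(configs, key=metric)
    print(f"best {metric.__name__}: {best.name}")
```

With these placeholder numbers the larger clusters win on EDP while the smaller, cheaper clusters win once area is factored in, mirroring the trade-off described in the abstract.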

2,487 citations


Cites methods from "The PARSEC benchmark suite: charact..."

  • ...For both the in-order and OOO cores running PARSEC benchmarks, if manycore die cost is not taken into account 8-core clusters provide the best EDP....

  • ...We use all SPLASH-2 applications and 5 of the PARSEC applications (only canneal, streamcluster, blackscholes, fluidanimate, and swaptions currently run on our infrastructure.)...

  • ...The same subset of PARSEC benchmark suite is used for the OOO simulations....

  • ...At the 22nm technology node when running PARSEC benchmarks for manycores built from both in-order and outof-order cores, we found that when cost is not taken into account, clusters of 8 cores provide the best EDP, but when cost is included clusters of 4 cores provide the best EDA2P and EDAP....

  • ...[5] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” in PACT, 2008....

Journal ArticleDOI
TL;DR: The Roofline model offers insight on how to improve the performance of software and hardware in the rapidly changing world of connected devices.
Abstract: The Roofline model offers insight on how to improve the performance of software and hardware.
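
The model itself is a single bound: attainable performance is the lesser of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's operational intensity. A minimal sketch of that bound (the peak figures are made-up examples, not taken from the paper):

```python
def roofline_bound(operational_intensity, peak_gflops, peak_bw_gbs):
    """Attainable GFLOP/s = min(peak compute, peak bandwidth * operational intensity)."""
    return min(peak_gflops, peak_bw_gbs * operational_intensity)

# Hypothetical machine: 100 GFLOP/s peak compute, 25 GB/s DRAM bandwidth.
for oi in (0.5, 2.0, 8.0):  # FLOPs per byte of DRAM traffic
    print(f"OI={oi}: {roofline_bound(oi, peak_gflops=100.0, peak_bw_gbs=25.0)} GFLOP/s")
# Kernels below the ridge point (OI = 100/25 = 4 FLOPs/byte) are bandwidth-bound.
```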

2,181 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
Abstract: Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm2 and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.

1,582 citations


Cites background from "The PARSEC benchmark suite: charact..."

  • ...This trend even starts to percolate in our community where it turns out that about half of the benchmarks of PARSEC [2], a suite partly introduced to highlight the emergence of new types of applications, can be implemented using machine-learning algorithms [4]....

Journal ArticleDOI
TL;DR: A comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity (dark silicon) is timely and crucial.
Abstract: A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity (dark silicon) is timely and crucial.

1,556 citations

References
Journal ArticleDOI
TL;DR: A theoretical valuation formula for options is derived from the principle that, if options are correctly priced in the market, it should not be possible to earn sure profits by creating portfolios of long and short positions in options and their underlying stocks.
Abstract: If options are correctly priced in the market, it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks. Using this principle, a theoretical valuation formula for options is derived. Since almost all corporate liabilities can be viewed as combinations of options, the formula and the analysis that led to it are also applicable to corporate liabilities such as common stock, corporate bonds, and warrants. In particular, the formula can be used to derive the discount that should be applied to a corporate bond because of the possibility of default.

28,434 citations


"The PARSEC benchmark suite: charact..." refers methods in this paper

  • ...The blackscholes benchmark was chosen to represent the wide field of analytic PDE solvers in general and their application in computational finance in particular....

  • ...Because HJM models are non-Markovian the analytic approach of solving the PDE to price a derivative cannot be used....

  • ...The workload was included in the benchmark suite because of the significance of PDEs and the wide use of Monte Carlo simulation....

  • ...It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE) [10], $\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0$, where V is an option on the underlying S with volatility σ at time t if the constant interest rate is r....

  • ...It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE) [7]....

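For European options the PDE quoted above has the standard closed-form solution, which is essentially what a blackscholes-style workload evaluates once per option. A minimal sketch of that closed form for a call (conventional textbook formula; the function and parameter names are illustrative, not taken from the benchmark's source):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_price(S, K, r, sigma, t):
    """European call price for spot S, strike K, rate r, volatility sigma, expiry t."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)

# Example: at-the-money call, one year to expiry.
print(bs_call_price(S=100.0, K=100.0, r=0.05, sigma=0.2, t=1.0))  # about 10.45
```
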
Book
01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

11,671 citations


"The PARSEC benchmark suite: charact..." refers background in this paper

  • ...Program execution time is the only accurate way to measure performance[18]....

Journal ArticleDOI
TL;DR: An overview of the technical features of H.264/AVC is provided, profiles and applications for the standard are described, and the history of the standardization process is outlined.
Abstract: H.264/AVC is newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.

8,646 citations


"The PARSEC benchmark suite: charact..." refers background in this paper

  • ...H.264 describes the lossy compression of a video stream [25] and is also part of ISO/IEC MPEG-4....

Journal ArticleDOI
Loup Verlet1
TL;DR: In this article, the equilibrium properties of a system of 864 particles interacting through a Lennard-Jones potential have been integrated for various values of the temperature and density, relative, generally, to a fluid state.
Abstract: The equation of motion of a system of 864 particles interacting through a Lennard-Jones potential has been integrated for various values of the temperature and density, relative, generally, to a fluid state. The equilibrium properties have been calculated and are shown to agree very well with the corresponding properties of argon. It is concluded that, to a good approximation, the equilibrium state of argon can be described through a two-body potential.

7,564 citations


"The PARSEC benchmark suite: charact..." refers methods in this paper

  • ...fluidanimate uses a Verlet integrator[42] for these computations which is implemented in function AdvanceParticles....

  • ...The workload uses Verlet integration[42] to update the position of the particles....

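Verlet integration, as used in the snippets above, advances each particle from its current and previous positions plus the current acceleration, with no explicit velocity state. A minimal one-dimensional sketch (the harmonic force here is a toy stand-in, not the SPH forces fluidanimate computes):

```python
def verlet_step(x, x_prev, accel, dt):
    """Stormer-Verlet position update: x_{n+1} = 2*x_n - x_{n-1} + a_n * dt^2."""
    return 2.0 * x - x_prev + accel * dt * dt

# Toy example: harmonic oscillator with a(x) = -x, which Verlet integrates stably.
dt = 0.01
x_prev = x = 1.0
for _ in range(5):
    x, x_prev = verlet_step(x, x_prev, accel=-x, dt=dt), x
    print(x)
```
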
Book
01 Jan 1989
TL;DR: This textbook covers futures and options markets, models of the behavior of stock prices, the Black-Scholes analysis of option prices, numerical procedures, and interest rate derivatives priced with Black's model and models of the yield curve.
Abstract: Contents: Introduction. Futures Markets and the Use of Futures for Hedging. Forward and Futures Prices. Interest Rate Futures. Swaps. Options Markets. Properties of Stock Option Prices. Trading Strategies Involving Options. Introduction to Binomial Trees. Model of the Behavior of Stock Prices. The Black-Scholes Analysis. Options on Stock Indices, Currencies, and Futures Contracts. General Approach to Pricing Derivatives. The Management of Market Risk. Numerical Procedures. Interest Rate Derivatives and the Use of Black's Model. Interest Rate Derivatives and Models of the Yield Curve. Exotic Options. Alternatives to Black-Scholes for Option Pricing. Credit Risk and Regulatory Capital. Review of Key Concepts.

6,873 citations