Proceedings ArticleDOI

The PARSEC benchmark suite: characterization and architectural implications

25 Oct 2008 - pp. 72-81
TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.
Abstract: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.

Citations
Proceedings ArticleDOI
04 Oct 2009
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,697 citations


Cites background or methods from "The PARSEC benchmark suite: charact..."

  • ...Needleman-Wunsch uses 16 threads per block as discussed earlier, and Leukocyte uses different thread block sizes (128 and 256) for its two kernels because it operates on different working sets in the detection and tracking phases....

  • ...• Fused CPU-GPU processors and other heterogeneous multiprocessor SoCs are likely to become common in PCs, servers and HPC environments....

Proceedings ArticleDOI
12 Dec 2009
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.
Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.
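
As a concrete reading of the metrics named above, EDP, EDAP, and EDA2P simply weight a configuration's energy-delay product by increasing powers of its die area. A minimal sketch of how such a comparison might be computed is shown below (Python; the cluster configurations and all numbers are illustrative placeholders, not McPAT or PARSEC results):

```python
from dataclasses import dataclass

@dataclass
class ClusterConfig:
    name: str
    energy_j: float   # total energy for the workload (J)
    delay_s: float    # execution time (s)
    area_mm2: float   # manycore die area (mm^2)

def edp(c):   return c.energy_j * c.delay_s                   # energy-delay product
def edap(c):  return c.energy_j * c.delay_s * c.area_mm2      # energy-delay-area product
def eda2p(c): return c.energy_j * c.delay_s * c.area_mm2**2   # energy-delay-area^2 product

# Hypothetical 4-core vs. 8-core cluster results (placeholder numbers only).
configs = [
    ClusterConfig("4-core clusters", energy_j=1.10, delay_s=0.95, area_mm2=300.0),
    ClusterConfig("8-core clusters", energy_j=1.00, delay_s=0.90, area_mm2=360.0),
]

for metric in (edp, edap, eda2p):
    best = min(configs, key=metric)
    print(f"best {metric.__name__}: {best.name}")
```

With these placeholder numbers the larger clusters win on EDP while the smaller, cheaper clusters win once area is factored in, mirroring the trade-off described in the abstract.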

2,487 citations


Cites methods from "The PARSEC benchmark suite: charact..."

  • ...For both the in-order and OOO cores running PARSEC benchmarks, if manycore die cost is not taken into account 8-core clusters provide the best EDP....

  • ...We use all SPLASH-2 applications and 5 of the PARSEC applications (only canneal, streamcluster, blackscholes, fluidanimate, and swaptions currently run on our infrastructure.)...

  • ...The same subset of PARSEC benchmark suite is used for the OOO simulations....

  • ...At the 22nm technology node when running PARSEC benchmarks for manycores built from both in-order and outof-order cores, we found that when cost is not taken into account, clusters of 8 cores provide the best EDP, but when cost is included clusters of 4 cores provide the best EDA2P and EDAP....

  • ...[5] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” in PACT, 2008....

Journal ArticleDOI
TL;DR: The Roofline model offers insight on how to improve the performance of software and hardware in the rapidly changing world of connected devices.
Abstract: The Roofline model offers insight on how to improve the performance of software and hardware.
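
The model itself is a single bound: attainable performance is the lesser of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's operational intensity. A minimal sketch of that bound (the peak figures are made-up examples, not taken from the paper):

```python
def roofline_bound(operational_intensity, peak_gflops, peak_bw_gbs):
    """Attainable GFLOP/s = min(peak compute, peak bandwidth * operational intensity)."""
    return min(peak_gflops, peak_bw_gbs * operational_intensity)

# Hypothetical machine: 100 GFLOP/s peak compute, 25 GB/s DRAM bandwidth.
for oi in (0.5, 2.0, 8.0):  # FLOPs per byte of DRAM traffic
    print(f"OI={oi}: {roofline_bound(oi, peak_gflops=100.0, peak_bw_gbs=25.0)} GFLOP/s")
# Kernels below the ridge point (OI = 100/25 = 4 FLOPs/byte) are bandwidth-bound.
```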

2,181 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
Abstract: Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm2 and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.

1,582 citations


Cites background from "The PARSEC benchmark suite: charact..."

  • ...This trend even starts to percolate in our community where it turns out that about half of the benchmarks of PARSEC [2], a suite partly introduced to highlight the emergence of new types of applications, can be implemented using machine-learning algorithms [4]....

Journal ArticleDOI
TL;DR: A comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity (dark silicon) is timely and crucial.
Abstract: A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity (dark silicon) is timely and crucial.

1,556 citations

References
Journal ArticleDOI
TL;DR: A theoretical valuation formula for options is derived from the principle that, if options are correctly priced in the market, it should not be possible to earn sure profits by creating portfolios of long and short positions in options and their underlying stocks.
Abstract: If options are correctly priced in the market, it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks. Using this principle, a theoretical valuation formula for options is derived. Since almost all corporate liabilities can be viewed as combinations of options, the formula and the analysis that led to it are also applicable to corporate liabilities such as common stock, corporate bonds, and warrants. In particular, the formula can be used to derive the discount that should be applied to a corporate bond because of the possibility of default.

28,434 citations


"The PARSEC benchmark suite: charact..." refers methods in this paper

  • ...The blackscholes benchmark was chosen to represent the wide field of analytic PDE solvers in general and their application in computational finance in particular....

  • ...Because HJM models are non-Markovian the analytic approach of solving the PDE to price a derivative cannot be used....

  • ...The workload was included in the benchmark suite because of the significance of PDEs and the wide use of Monte Carlo simulation....

  • ...It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE) [10], $\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0$, where V is an option on the underlying S with volatility σ at time t if the constant interest rate is r....

  • ...It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE) [7]....

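For European options the PDE quoted above has the standard closed-form solution, which is essentially what a blackscholes-style workload evaluates once per option. A minimal sketch of that closed form for a call (conventional textbook formula; the function and parameter names are illustrative, not taken from the benchmark's source):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_price(S, K, r, sigma, t):
    """European call price for spot S, strike K, rate r, volatility sigma, expiry t."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)

# Example: at-the-money call, one year to expiry.
print(bs_call_price(S=100.0, K=100.0, r=0.05, sigma=0.2, t=1.0))  # about 10.45
```
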
Book
01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

11,671 citations


"The PARSEC benchmark suite: charact..." refers background in this paper

  • ...Program execution time is the only accurate way to measure performance[18]....

Journal ArticleDOI
TL;DR: An overview of the technical features of H.264/AVC is provided, profiles and applications for the standard are described, and the history of the standardization process is outlined.
Abstract: H.264/AVC is newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.

8,646 citations


"The PARSEC benchmark suite: charact..." refers background in this paper

  • ...H.264 describes the lossy compression of a video stream [25] and is also part of ISO/IEC MPEG-4....

Journal ArticleDOI
Loup Verlet1
TL;DR: In this article, the equilibrium properties of a system of 864 particles interacting through a Lennard-Jones potential have been integrated for various values of the temperature and density, relative, generally, to a fluid state.
Abstract: The equation of motion of a system of 864 particles interacting through a Lennard-Jones potential has been integrated for various values of the temperature and density, relative, generally, to a fluid state. The equilibrium properties have been calculated and are shown to agree very well with the corresponding properties of argon. It is concluded that, to a good approximation, the equilibrium state of argon can be described through a two-body potential.

7,564 citations


"The PARSEC benchmark suite: charact..." refers methods in this paper

  • ...fluidanimate uses a Verlet integrator[42] for these computations which is implemented in function AdvanceParticles....

  • ...The workload uses Verlet integration[42] to update the position of the particles....

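Verlet integration, as used in the snippets above, advances each particle from its current and previous positions plus the current acceleration, with no explicit velocity state. A minimal one-dimensional sketch (the harmonic force here is a toy stand-in, not the SPH forces fluidanimate computes):

```python
def verlet_step(x, x_prev, accel, dt):
    """Stormer-Verlet position update: x_{n+1} = 2*x_n - x_{n-1} + a_n * dt^2."""
    return 2.0 * x - x_prev + accel * dt * dt

# Toy example: harmonic oscillator with a(x) = -x, which Verlet integrates stably.
dt = 0.01
x_prev = x = 1.0
for _ in range(5):
    x, x_prev = verlet_step(x, x_prev, accel=-x, dt=dt), x
    print(x)
```
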
Book
01 Jan 1989
TL;DR: This textbook covers futures and options markets, models of the behavior of stock prices, the Black-Scholes analysis of option prices, numerical procedures, and interest rate derivatives priced with Black's model and models of the yield curve.
Abstract: Contents: Introduction. Futures Markets and the Use of Futures for Hedging. Forward and Futures Prices. Interest Rate Futures. Swaps. Options Markets. Properties of Stock Option Prices. Trading Strategies Involving Options. Introduction to Binomial Trees. Model of the Behavior of Stock Prices. The Black-Scholes Analysis. Options on Stock Indices, Currencies, and Futures Contracts. General Approach to Pricing Derivatives. The Management of Market Risk. Numerical Procedures. Interest Rate Derivatives and the Use of Black's Model. Interest Rate Derivatives and Models of the Yield Curve. Exotic Options. Alternatives to Black-Scholes for Option Pricing. Credit Risk and Regulatory Capital. Review of Key Concepts.

6,873 citations