Topic

Design space exploration

About: Design space exploration is a research topic. Over its lifetime, 4643 publications have been published within this topic, receiving 67439 citations.


Papers
Proceedings ArticleDOI
12 Dec 2009
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap, including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area² product (EDA²P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.

2,487 citations
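The composite metrics named above combine energy, delay, and area multiplicatively. As a minimal sketch, assuming made-up energy/delay/area figures rather than actual McPAT or PARSEC results, the EDP, EDAP, and EDA²P of a design point could be computed like this:

```python
# Illustrative computation of the composite metrics named in the McPAT abstract.
# All input figures are invented placeholders, not McPAT outputs.

def composite_metrics(energy_j: float, delay_s: float, area_mm2: float) -> dict:
    """Energy-delay and energy-delay-area products for one design point."""
    return {
        "EDP":   energy_j * delay_s,                  # energy-delay product
        "EDAP":  energy_j * delay_s * area_mm2,       # energy-delay-area product
        "EDA2P": energy_j * delay_s * area_mm2 ** 2,  # energy-delay-area^2 product
    }

# Hypothetical cluster configurations with placeholder numbers.
designs = {
    "4-core clusters": composite_metrics(energy_j=1.2, delay_s=0.9, area_mm2=140.0),
    "8-core clusters": composite_metrics(energy_j=1.0, delay_s=1.0, area_mm2=180.0),
}

for name, metrics in designs.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```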

Journal ArticleDOI
18 Jun 2016
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
Abstract: A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near-data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions: (i) We design a pipelined architecture, with some crossbars dedicated for each neural network layer, and eDRAM buffers that aggregate data between pipeline stages. (ii) We define new data encoding techniques that are amenable to analog computations and that can reduce the high overheads of analog-to-digital conversion (ADC). (iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip. On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density (respectively), relative to the state-of-the-art DaDianNao architecture.

1,558 citations
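The crossbar dot-product idea the abstract builds on maps weights to conductances and input activations to voltages, so each column current is a weighted sum of the inputs. A small idealized sketch of that principle, ignoring the DACs, ADCs, bit-slicing, and device non-idealities that ISAAC actually has to handle:

```python
import numpy as np

# Idealized analog crossbar dot-product: weights stored as conductances G,
# inputs applied as voltages V, so each column current is I = V @ G by
# Kirchhoff's current law. Array sizes here are arbitrary for illustration.

rng = np.random.default_rng(0)
conductances = rng.uniform(0.0, 1.0, size=(4, 3))  # 4 input rows x 3 output columns
voltages = rng.uniform(0.0, 1.0, size=4)            # input activations

column_currents = voltages @ conductances            # analog accumulation per column
digital_reference = np.array(
    [sum(voltages[i] * conductances[i, j] for i in range(4)) for j in range(3)]
)

assert np.allclose(column_currents, digital_reference)
print(column_currents)
```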

Journal ArticleDOI
01 Jan 2006
TL;DR: This work reviews the state-of-the-art metamodel-based techniques from a practitioner's perspective according to the role of metamodeling in supporting design optimization, including model approximation, design space exploration, problem formulation, and solving various types of optimization problems.
Abstract: Computation-intensive design problems are becoming increasingly common in manufacturing industries. The computation burden is often caused by expensive analysis and simulation processes needed to reach a level of accuracy comparable to physical testing data. To address such a challenge, approximation or metamodeling techniques are often used. Metamodeling techniques have been developed from many different disciplines, including statistics, mathematics, computer science, and various engineering disciplines. These metamodels are initially developed as “surrogates” of the expensive simulation process in order to improve the overall computation efficiency. They are then found to be a valuable tool to support a wide scope of activities in modern engineering design, especially design optimization. This work reviews the state-of-the-art metamodel-based techniques from a practitioner’s perspective according to the role of metamodeling in supporting design optimization, including model approximation, design space exploration, problem formulation, and solving various types of optimization problems. Challenges and the future development of metamodeling in support of engineering design are also analyzed and discussed.

1,503 citations
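To make the surrogate idea concrete, the sketch below fits a simple polynomial metamodel to a handful of samples of an "expensive" objective and then explores the design space on the cheap model instead. The objective function is a stand-in for a costly simulation and is not taken from the survey:

```python
import numpy as np

# Stand-in for an expensive simulation; in practice this would be a long-running
# finite-element or circuit simulation rather than a closed-form expression.
def expensive_simulation(x: float) -> float:
    return (x - 0.3) ** 2 + 0.1 * np.sin(8 * x)

# Sample a few design points, fit a quadratic surrogate (the metamodel),
# then search the surrogate densely instead of the expensive model.
x_samples = np.linspace(0.0, 1.0, 6)
y_samples = np.array([expensive_simulation(x) for x in x_samples])

surrogate = np.poly1d(np.polyfit(x_samples, y_samples, deg=2))

x_dense = np.linspace(0.0, 1.0, 1001)
x_best = x_dense[np.argmin(surrogate(x_dense))]
print(f"surrogate optimum near x = {x_best:.3f}, "
      f"true objective there = {expensive_simulation(x_best):.4f}")
```

In a real metamodel-based workflow the surrogate would typically be refined iteratively, adding new expensive samples near promising regions before re-fitting.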

Proceedings ArticleDOI
20 Apr 2009
TL;DR: ORION 2.0 is an extensive enhancement of the original ORION models that includes completely new subcomponent power models and area models, as well as improved and updated technology models; its validation confirms the need for accurate early-stage NoC power estimation.
Abstract: As industry moves towards many-core chips, networks-on-chip (NoCs) are emerging as the scalable fabric for interconnecting the cores. With power now the first-order design constraint, early-stage estimation of NoC power has become crucially important. ORION [29] was amongst the first NoC power models released, and has since been fairly widely used for early-stage power estimation of NoCs. However, when validated against recent NoC prototypes -- the Intel 80-core Teraflops chip and the Intel Scalable Communications Core (SCC) chip -- we saw significant deviation that can lead to erroneous NoC design choices. This prompted our development of ORION 2.0, an extensive enhancement of the original ORION models which includes completely new subcomponent power models, area models, as well as improved and updated technology models. Validation against the two Intel chips confirms a substantial improvement in accuracy over the original ORION. A case study with these power models plugged within the COSI-OCC NoC design space exploration tool [23] confirms the need for, and value of, accurate early-stage NoC power estimation. To ensure the longevity of ORION 2.0, we will be releasing it wrapped within a semi-automated flow that automatically updates its models as new technology files become available.

799 citations
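Early-stage NoC power estimates of the kind ORION provides boil down to activity-weighted dynamic terms plus leakage, summed over router subcomponents. The sketch below is a generic first-order model of that structure (P = a·C·V²·f + P_leak) with invented capacitance and leakage figures; it does not reproduce ORION 2.0's actual subcomponent models:

```python
# First-order router power: dynamic power = activity * C * V^2 * f plus static
# leakage, summed over subcomponents. Capacitance and leakage values below are
# invented placeholders, not ORION 2.0 parameters.

COMPONENTS = {
    # name: (switched capacitance in farads, leakage power in watts)
    "input_buffers": (2.0e-12, 5.0e-3),
    "crossbar":      (1.5e-12, 3.0e-3),
    "arbiter":       (0.3e-12, 1.0e-3),
    "output_links":  (4.0e-12, 2.0e-3),
}

def router_power(vdd: float, freq_hz: float, activity: float) -> float:
    """Total router power in watts for a given supply, clock, and activity factor."""
    total = 0.0
    for cap, leak in COMPONENTS.values():
        total += activity * cap * vdd ** 2 * freq_hz + leak
    return total

print(f"estimated router power: {router_power(vdd=1.0, freq_hz=2e9, activity=0.3) * 1e3:.1f} mW")
```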

Proceedings ArticleDOI
01 Dec 2007
TL;DR: This work implements two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache, and adopts state-of-the-art design space exploration strategies for non-uniform cache access (NUCA).
Abstract: A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor performance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire models (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies. Keywords: cache models, non-uniform cache architectures (NUCA), memory hierarchies, on-chip interconnects.

778 citations
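The design space exploration described above amounts to enumerating cache organizations (bank counts, wire types, and so on), estimating delay and energy for each, and keeping the configuration that optimizes the chosen metric. The sketch below shows that enumeration pattern with made-up cost functions; it is not CACTI's model:

```python
from itertools import product

# Toy enumeration of a cache design space: bank count x wire type, with invented
# delay/energy cost functions standing in for CACTI-style analytical models.

WIRE_TYPES = {
    # name: (relative delay per hop, relative energy per hop)
    "global_rc": (1.0, 1.0),
    "low_swing": (1.4, 0.4),
}

def evaluate(num_banks: int, wire: str) -> tuple:
    """Return (access delay, access energy) for one organization (toy model)."""
    hop_delay, hop_energy = WIRE_TYPES[wire]
    avg_hops = num_banks ** 0.5            # rough average hop count in a mesh of banks
    delay = 1.0 / num_banks + 0.2 * avg_hops * hop_delay
    energy = 0.05 * num_banks + 0.1 * avg_hops * hop_energy
    return delay, energy

best = min(
    ((banks, wire, *evaluate(banks, wire))
     for banks, wire in product([4, 8, 16, 32], WIRE_TYPES)),
    key=lambda cfg: cfg[2] * cfg[3],       # minimize the energy-delay product
)
print(f"best: {best[0]} banks, {best[1]} wires, ED product = {best[2] * best[3]:.3f}")
```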


Network Information
Related Topics (5)
Benchmark (computing): 19.6K papers, 419.1K citations (90% related)
Cache: 59.1K papers, 976.6K citations (85% related)
Logic gate: 35.7K papers, 488.3K citations (83% related)
Compiler: 26.3K papers, 578.5K citations (83% related)
CMOS: 81.3K papers, 1.1M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    89
2022    157
2021    221
2020    259
2019    273
2018    239