Author

Brandon H. Dwiel

Bio: Brandon H. Dwiel is an academic researcher at North Carolina State University who has contributed to research on microarchitecture and multi-core processors. The author has an h-index of 5 and has co-authored 10 publications that have received 207 citations.

Papers

Proceedings Article

FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template

Niket K. Choudhary¹, Salil V. Wadhavkar¹, Tanmay A. Shah², Hiran Mayukh³, Jayneel Gandhi³, Brandon H. Dwiel¹, Sandeep Navada¹, Hashem Hashemi Najaf-abadi², Eric Rotenberg¹

North Carolina State University¹, Intel², University of Wisconsin-Madison³

04 Jun 2011

TL;DR: FabScalar is a toolset for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template, which defines canonical pipeline stages and the interfaces among them.

Abstract: A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the "Achilles' heel" of this paradigm: design and verification effort is multiplied by the number of different core types. This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.

128 citations

Journal Article

FabScalar: Automating Superscalar Core Design

Niket K. Choudhary¹, Salil V. Wadhavkar¹, Tanmay A. Shah¹, Hiran Mayukh¹, Jayneel Gandhi¹, Brandon H. Dwiel¹, Sandeep Navada¹, Hashem Hashemi Najaf-abadi¹, Eric Rotenberg¹

North Carolina State University¹

01 May 2012 - IEEE Micro

TL;DR: FabScalar aims to automate superscalar core design, opening up processor design to microarchitectural diversity and its many opportunities.

Abstract: Providing multiple superscalar core types on a chip, each tailored to different classes of instruction-level behavior, is an exciting direction for increasing processor performance and energy efficiency. Unfortunately, processor design and verification effort increases with each additional core type, limiting the microarchitectural diversity that can be practically implemented. FabScalar aims to automate superscalar core design, opening up processor design to microarchitectural diversity and its many opportunities.

27 citations

Proceedings Article

Rationale for a 3D heterogeneous multi-core processor

Eric Rotenberg¹, Brandon H. Dwiel¹, Elliott Forbes¹, Zhenqian Zhang¹, Randy Widialaksono¹, Rangeen Basu Roy Chowdhury¹, Nyunyi M. Tshibangu¹, Steve Lipa¹, W. Rhett Davis¹, Paul D. Franzon¹

North Carolina State University¹

07 Nov 2013

TL;DR: Single-ISA heterogeneous multi-core processors are comprised of multiple core types that are functionally equivalent but microarchitecturally diverse.

Abstract: Single-ISA heterogeneous multi-core processors are comprised of multiple core types that are functionally equivalent but microarchitecturally diverse. This paradigm has gained a lot of attention as a way to optimize performance and energy. As the instruction-level behavior of the currently executing program varies, it is migrated to the most efficient core type for that behavior.

20 citations

Proceedings Article

FPGA modeling of diverse superscalar processors

Brandon H. Dwiel¹, Niket K. Choudhary¹, Eric Rotenberg¹

North Carolina State University¹

01 Apr 2012

TL;DR: This paper describes FPGA-Sim, a configurable, automatically FPGA-synthesizable, register-transfer-level (RTL) model of an out-of-order superscalar processor that enables FPGA modeling of diverse superscalar processors out of the box.

Abstract: There is increasing interest in using Field Programmable Gate Arrays (FPGAs) as platforms for computer architecture simulation. This paper is concerned with modeling superscalar processors with FPGAs. To be transformative, the FPGA modeling framework should meet three criteria.

18 citations

Proceedings Article

Under 100-cycle thread migration latency in a single-ISA heterogeneous multi-core processor

Elliott Forbes¹, Zhenqian Zhang¹, Randy Widialaksono¹, Brandon H. Dwiel¹, Rangeen Basu Roy Chowdhury¹, Vinesh Srinivasan¹, Steve Lipa¹, Eric Rotenberg¹, W. Rhett Davis¹, Paul D. Franzon¹

North Carolina State University¹

01 Aug 2015

TL;DR: This article proposes hardware support for fast thread migration in single-ISA heterogeneous multi-core processors, which combine general-purpose cores with different microarchitectures, tuned for different energy/performance points.

Abstract: This article consists of a single slide from the authors' conference presentation. Single-ISA Heterogeneous Multi-core: General purpose cores with different microarchitectures, tuned for different energy/performance points. Performance and energy of a program can be optimized by migrating among the core types as program characteristics change. Prior research has shown as much as a 50% improvement in energy when migrating every 1,000 cycles versus every 10,000 cycles. Such fine-grained thread migration requires very low migration overhead. We propose hardware support for fast thread migration. To migrate a thread, committed register values and the program counter must be moved from the source core to the destination core.

7 citations

Cited by

Proceedings Article

FANCI: identification of stealthy malicious logic using boolean functional analysis

Adam Waksman¹, Matthew Suozzo¹, Simha Sethumadhavan¹

Columbia University¹

04 Nov 2013

TL;DR: FANCI is a tool that flags suspicious wires in a design that have the potential to be malicious, using scalable, approximate Boolean functional analysis to detect them.

Abstract: Hardware design today bears similarities to software design. Often vendors buy and integrate code acquired from third-party organizations into their designs, especially in embedded/system-on-chip designs. Currently, there is no way to determine if third-party designs have built-in backdoors that can compromise security after deployment. The key observation we use to approach this problem is that hardware backdoors incorporate logic that is nearly unused, i.e., stealthy. The wires used in stealthy backdoor circuits almost never influence the outputs of those circuits. Typically, they do so only when triggered using external inputs from an attacker. In this paper, we present FANCI, a tool that flags suspicious wires in a design that have the potential to be malicious. FANCI uses scalable, approximate, boolean functional analysis to detect these wires. Our examination of the TrustHub hardware backdoor benchmark suite shows that FANCI is able to flag all suspicious paths in the benchmarks that are associated with backdoors. Unlike prior work in the area, FANCI is not hindered by incomplete test suite coverage and thus is able to operate in practice without false negatives. Furthermore, FANCI reports low false positive rates: less than 1% of wires are reported as suspicious in most cases. All TrustHub designs were analyzed in a day or less. We also analyze a backdoor-free out-of-order microprocessor core to demonstrate applicability beyond benchmarks.

329 citations

Proceedings Article

Architecture exploration for ambient energy harvesting nonvolatile processors

Kaisheng Ma¹, Yang Zheng¹, Shuangchen Li¹, Karthik Swaminathan¹, Xueqing Li¹, Yongpan Liu², Jack Sampson¹, Yuan Xie³, Vijaykrishnan Narayanan¹

Pennsylvania State University¹, Tsinghua University², University of California, Santa Barbara³

09 Mar 2015

TL;DR: The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.

Abstract: Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, thermal gradients, etc. However, the power supplied by these sources is highly unreliable and dependent upon ambient environment factors. Hence, it is necessary to develop specialized systems that are tolerant to this power variation, and also capable of making forward progress on the computation tasks. The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.

225 citations

Proceedings Article

Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning

Mahdi Nazm Bojnordi¹, Engin Ipek¹

University of Rochester¹

12 Mar 2016

TL;DR: This paper examines a new class of hardware accelerators for large-scale combinatorial optimization and deep learning based on memristive Boltzmann machines, built on recently developed resistive RAM (RRAM) technology. Compared to a multicore system, the proposed accelerator achieves 57x higher performance and 25x lower energy with virtually no loss in the quality of the solution to the optimization problems.

Abstract: The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems. In recent years, it has been successfully applied to training deep machine learning models on massive datasets. High performance implementations of the Boltzmann machine using GPUs, MPI-based HPC clusters, and FPGAs have been proposed in the literature. Regrettably, the required all-to-all communication among the processing units limits the performance of these efforts. This paper examines a new class of hardware accelerators for large-scale combinatorial optimization and deep learning based on memristive Boltzmann machines. A massively parallel, memory-centric hardware accelerator is proposed based on recently developed resistive RAM (RRAM) technology. The proposed accelerator exploits the electrical properties of RRAM to realize in situ, fine-grained parallel computation within memory arrays, thereby eliminating the need for exchanging data between the memory cells and the computational units. Two classical optimization problems, graph partitioning and boolean satisfiability, and a deep belief network application are mapped onto the proposed hardware. As compared to a multicore system, the proposed accelerator achieves 57x higher performance and 25x lower energy with virtually no loss in the quality of the solution to the optimization problems. The memristive accelerator is also compared against an RRAM-based processing-in-memory (PIM) system, with respective performance and energy improvements of 6.89x and 5.2x.

173 citations

The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor

Krste Asanovic, David A. Patterson, Christopher Celio

13 Jun 2015

TL;DR: BOOM is a synthesizable, parameterized, superscalar out-of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of out-of-order processors.

Abstract: BOOM is a synthesizable, parameterized, superscalar out-of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of out-of-order processors. Our goal is to provide a readable, open-source implementation for use in education, research, and industry. BOOM is written in roughly 9,000 lines of the hardware construction language Chisel. We leveraged Berkeley's open-source Rocket-chip SoC generator, allowing us to quickly bring up an entire multi-core processor system (including caches and uncore) by replacing the in-order Rocket core with an out-of-order BOOM core. BOOM supports atomics, IEEE754-2008 floating-point, and page-based virtual memory. We have demonstrated BOOM running Linux, SPEC CINT2006, and CoreMark.

135 citations

Proceedings Article

Quantifying sources of error in McPAT and potential impacts on architectural studies

Sam Likun Xi¹, Hans M. Jacobson², Pradip Bose², Gu-Yeon Wei¹, David Brooks¹

Harvard University¹, IBM²

09 Mar 2015

TL;DR: This work presents the first rigorous assessment of McPAT's core power and area models with a detailed, validated power modeling toolchain used in current industrial practice and provides guidelines for creating accurate McPAT models, even without access to detailed industrial power modeling tools.

Abstract: Architectural power modeling tools are widely used by the computer architecture community for rapid evaluations of high-level design choices and design space explorations. Currently, McPAT [31] is the de facto power model, but the literature does not yet contain a careful examination of its modeling accuracy. In addition, the issue of how greatly power modeling error can affect architectural-level studies has not been quantified before. In this work, we present the first rigorous assessment of McPAT's core power and area models with a detailed, validated power modeling toolchain used in current industrial practice. We find that McPAT's predictions can have significant error because some of the models are either incomplete, too high-level, or assume implementations of structures that differ from that of the core at hand. We demonstrate that large errors are possible when using McPAT's dynamic power estimates in the context of voltage noise and thermal hotspots, but for steady-state properties, accurately modeling leakage power is more important. Based on our analysis, we are able to provide guidelines for creating accurate McPAT models, even without access to detailed industrial power modeling tools. We conclude that in spite of its accuracy gaps, McPAT is still a very useful tool for many architectural studies, and its limitations can often be adequately addressed for a given research study of interest.

99 citations
