scispace - formally typeset
Search or ask a question
Author

Brandon H. Dwiel

Bio: Brandon H. Dwiel is an academic researcher from North Carolina State University. The author has contributed to research in topics: Microarchitecture & Multi-core processor. The author has an hindex of 5, co-authored 10 publications receiving 207 citations.

Papers
More filters
Proceedings ArticleDOI
04 Jun 2011
TL;DR: From this idea, a toolset is developed, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template, which defines canonical pipeline stages and interfaces among them.
Abstract: A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the 'Achilles' heel of this paradigm: design and verification effort is multiplied by the number of different core types. This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage, that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.

128 citations

Journal ArticleDOI
TL;DR: FabScalar aims to automate superscalar core design, opening up processor design to microarchitectural diversity and its many opportunities.
Abstract: Providing multiple superscalar core types on a chip, each tailored to different classes of instruction-level behavior, is an exciting direction for increasing processor performance and energy efficiency. Unfortunately, processor design and verification effort increases with each additional core type, limiting the microarchitectural diversity that can be practically implemented. FabScalar aims to automate superscalar core design, opening up processor design to microarchitectural diversity and its many opportunities.

27 citations

Proceedings ArticleDOI
07 Nov 2013
TL;DR: Single-ISA heterogeneous multi-core processors are comprised of multiple core types that are functionally equivalent but microarchitecturally diverse.
Abstract: Single-ISA heterogeneous multi-core processors are comprised of multiple core types that are functionally equivalent but microarchitecturally diverse. This paradigm has gained a lot of attention as a way to optimize performance and energy. As the instruction-level behavior of the currently executing program varies, it is migrated to the most efficient core type for that behavior.

20 citations

Proceedings ArticleDOI
01 Apr 2012
TL;DR: FPGA-Sim is described, a configurable, automatically FGPA-synthesizable, and register-transfer-level (RTL) model of an out-of-order superscalar processor that enables FPGA modeling of diverse superscalars out- of-the-box.
Abstract: There is increasing interest in using Field Programmable Gate Arrays (FPGAs) as platforms for computer architecture simulation. This paper is concerned with modeling superscalar processors with FPGAs. To be transformative, the FPGA modeling framework should meet three criteria.

18 citations

Proceedings ArticleDOI
01 Aug 2015
TL;DR: This article proposes hardware support for fast thread migration in Single-ISA Heterogeneous Multi-core, which combines general purpose cores with different microarchitectures, tuned for different energy/performance points.
Abstract: This article consists of a single slide from the authors' conference presentation. Single-ISA Heterogeneous Multi-core: General purpose cores with different microarchitectures, tuned for different energy/performance points. Performance and energy of a program can be optimized by migrating among the core types as program characteristics change. Prior research has shown as much as a 50% improvement in energy when migrating every 1,000 cycles versus every 10,000 cycles. Such fine-grained thread migration requires very low migration overhead. We propose hardware support for fast thread migration. To migrate a thread, committed register values and the program counter must be moved from the source core to the destination core.

7 citations


Cited by
More filters
Proceedings ArticleDOI
04 Nov 2013
TL;DR: FANCI is a tool that flags suspicious wires, in a design, which have the potential to be malicious, which FANCI uses scalable, approximate, boolean functional analysis to detect these wires.
Abstract: Hardware design today bears similarities to software design. Often vendors buy and integrate code acquired from third-party organizations into their designs, especially in embedded/system-on-chip designs. Currently, there is no way to determine if third-party designs have built-in backdoors that can compromise security after deployment.The key observation we use to approach this problem is that hardware backdoors incorporate logic that is nearly-unused, i.e. stealthy. The wires used in stealthy backdoor circuits almost never influence the outputs of those circuits. Typically, they do so only when triggered using external inputs from an attacker. In this paper, we present FANCI, a tool that flags suspicious wires, in a design, which have the potential to be malicious. FANCI uses scalable, approximate, boolean functional analysis to detect these wires.Our examination of the TrustHub hardware backdoor benchmark suite shows that FANCI is able to flag all suspicious paths in the benchmarks that are associated with backdoors. Unlike prior work in the area, FANCI is not hindered by incomplete test suite coverage and thus is able to operate in practice without false negatives. Furthermore, FANCI reports low false positive rates: less than 1% of wires are reported as suspicious in most cases. All TrustHub designs were analyzed in a day or less. We also analyze a backdoor-free out-of-order microprocessor core to demonstrate applicability beyond benchmarks.

329 citations

Proceedings ArticleDOI
09 Mar 2015
TL;DR: The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonVolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.
Abstract: Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, thermal gradients, etc. However, the power supplied by these sources is highly unreliable and dependent upon ambient environment factors. Hence, it is necessary to develop specialized systems that are tolerant to this power variation, and also capable of making forward progress on the computation tasks. The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.

225 citations

Proceedings ArticleDOI
12 Mar 2016
TL;DR: A new class of hardware accelerators for large-scale combinatorial optimization and deep learning based on memristive Boltzmann machines is examined based on recently developed resistive RAM (RRAM) technology, achieving 57x higher performance and 25x lower energy with virtually no loss in the quality of the solution to the optimization problems.
Abstract: The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems. In recent years, it has been successfully applied to training deep machine learning models on massive datasets. High performance implementations of the Boltzmann machine using GPUs, MPI-based HPC clusters, and FPGAs have been proposed in the literature. Regrettably, the required all-to-all communication among the processing units limits the performance of these efforts. This paper examines a new class of hardware accelerators for large-scale combinatorial optimization and deep learning based on memristive Boltzmann machines. A massively parallel, memory-centric hardware accelerator is proposed based on recently developed resistive RAM (RRAM) technology. The proposed accelerator exploits the electrical properties of RRAm to realize in situ, fine-grained parallel computation within memory arrays, thereby eliminating the need for exchanging data between the memory cells and the computational units. Two classical optimization problems, graph partitioning and boolean satisfiability, and a deep belief network application are mapped onto the proposed hardware. As compared to a multicore system, the proposed accelerator achieves 57x higher performance and 25x lower energy with virtually no loss in the quality of the solution to the optimization problems. The memristive accelerator is also compared against an RRAM based processing-in-memory (PIM) system, with respective performance and energy improvements of 6.89x and 5.2x.

173 citations

13 Jun 2015
TL;DR: BOOM is a synthesizable, parameterized, superscalar out-of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of out- of-order processors.
Abstract: : BOOM is a synthesizable, parameterized, superscalar out-of-order RISC-V core designed to serve as the prototypical baseline processor for future micro-architectural studies of out-of-order processors. Our goal is to provide a readable, open-source implementation for use in education, research, and industry. BOOM is written in roughly 9,000 lines of the hardware construction language Chisel. We leveraged Berkeleys open-source Rocket-chip SoC generator, allowing us to quickly bring up an entire multi-core processor system (including caches and uncore) by replacing the in-order Rocket core with an out-of-order BOOM core. BOOM supports atomics, IEEE754-2008 floating-point, and page-based virtual memory. We have demonstrated BOOM running Linux, SPEC CINT2006, and CoreMark.

135 citations

Proceedings ArticleDOI
Sam Likun Xi1, Hans M. Jacobson2, Pradip Bose2, Gu-Yeon Wei1, David Brooks1 
09 Mar 2015
TL;DR: This work presents the first rigorous assessment of McPAT's core power and area models with a detailed, validated power modeling toolchain used in current industrial practice and provides guidelines for creating accurateMcPAT models, even without access to detailed industrial power modeling tools.
Abstract: Architectural power modeling tools are widely used by the computer architecture community for rapid evaluations of high-level design choices and design space explorations. Currently, McPAT [31] is the de facto power model, but the literature does not yet contain a careful examination of its modeling accuracy. In addition, the issue of how greatly power modeling error can affect architectural-level studies has not been quantified before. In this work, we present the first rigorous assessment of McPAT's core power and area models with a detailed, validated power modeling toolchain used in current industrial practice. We find that McPAT's predictions can have significant error because some of the models are either incomplete, too high-level, or assume implementations of structures that differ from that of the core at hand. We demonstrate that large errors are possible when using McPAT's dynamic power estimates in the context of voltage noise and thermal hotspots, but for steady-state properties, accurately modeling leakage power is more important. Based on our analysis, we are able to provide guidelines for creating accurate McPAT models, even without access to detailed industrial power modeling tools. We conclude that in spite of its accuracy gaps, McPAT is still a very useful tool for many architectural studies, and its limitations can often be adequately addressed for a given research study of interest.

99 citations