Author

Gabriel Black

Bio: Gabriel Black is an academic researcher from Google. The author has an hindex of 1, co-authored 1 publications receiving 3392 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The gem5 simulator

[...]

Nathan Binkert¹, Bradford M. Beckmann², Gabriel Black³, Steven K. Reinhardt², Ali G. Saidi, Arkaprava Basu⁴, Joel Hestness⁵, Derek R. Hower⁴, Tushar Krishna⁶, Somayeh Sardashti⁴, Rathijit Sen⁴, Korey Sewell⁷, Muhammad Shoaib⁴, Nilay Vaish⁴, Mark D. Hill⁴, Darien Wood⁴ - Show less +12 more•Institutions (7)

Hewlett-Packard¹, Advanced Micro Devices², Google³, University of Wisconsin-Madison⁴, University of Texas at Austin⁵, Massachusetts Institute of Technology⁶, University of Michigan⁷

31 Aug 2011-ACM Sigarch Computer Architecture News

TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

...read moreread less

Abstract: The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and exible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86).The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

...read moreread less

4,039 citations

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

A detailed and flexible cycle-accurate Network-on-Chip simulator

[...]

Nan Jiang¹, Daniel U. Becker¹, George Michelogiannakis², James Balfour³, Brian Towles, David E. Shaw², John Kim¹, William J. Dally¹ - Show less +4 more•Institutions (3)

Stanford University¹, Lawrence Berkeley National Laboratory², Google³

21 Apr 2013

TL;DR: The simulator, BookSim, is designed for simulation flexibility and accurate modeling of network components and offers a large set of configurable network parameters in terms of topology, routing algorithm, flow control, and router microarchitecture, including buffer management and allocation schemes.

...read moreread less

Abstract: Network-on-Chips (NoCs) are becoming integral parts of modern microprocessors as the number of cores and modules integrated on a single chip continues to increase. Research and development of future NoC technology relies on accurate modeling and simulations to evaluate the performance impact and analyze the cost of novel NoC architectures. In this work, we present BookSim, a cycle-accurate simulator for NoCs. The simulator is designed for simulation flexibility and accurate modeling of network components. It features a modular design and offers a large set of configurable network parameters in terms of topology, routing algorithm, flow control, and router microarchitecture, including buffer management and allocation schemes. BookSim furthermore emphasizes detailed implementations of network components that accurately model the behavior of actual hardware. We have validated the accuracy of the simulator against RTL implementations of NoC routers.

...read moreread less

645 citations

Journal Article•DOI•

Ramulator: A Fast and Extensible DRAM Simulator

[...]

Yoongu Kim¹, Weikun Yang¹, Onur Mutlu¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 2016-IEEE Computer Architecture Letters

TL;DR: This paper presents Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility, and is able to provide out-of-the-box support for a wide array of DRAM standards.

...read moreread less

Abstract: Recently, both industry and academia have proposed many different roadmaps for the future of DRAM. Consequently, there is a growing need for an extensible DRAM simulator, which can be easily modified to judge the merits of today's DRAM standards as well as those of tomorrow. In this paper, we present Ramulator , a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility. Unlike existing simulators, Ramulator is based on a generalized template for modeling a DRAM system, which is only later infused with the specific details of a DRAM standard. Thanks to such a decoupled and modular design, Ramulator is able to provide out-of-the-box support for a wide array of DRAM standards: DDR3/4, LPDDR3/4, GDDR5, WIO1/2, HBM, as well as some academic proposals (SALP, AL-DRAM, TL-DRAM, RowClone, and SARP). Importantly, Ramulator does not sacrifice simulation speed to gain extensibility: according to our evaluations, Ramulator is 2.5 $\times$ faster than the next fastest simulator. Ramulator is released under the permissive BSD license.

...read moreread less

535 citations

Proceedings Article•DOI•

DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling

[...]

Chen Sun¹, Chia-Hsin Owen Chen¹, George Kurian¹, Lan Wei¹, Jason Miller¹, Anant Agarwal¹, Li-Shiuan Peh¹, Vladimir Stojanovic¹ - Show less +4 more•Institutions (1)

Massachusetts Institute of Technology¹

09 May 2012

TL;DR: DSENT, a NoC modeling tool for rapid design space exploration of electrical and opto-electrical networks, is presented and the results show the implications of different technology scenarios and the need to reduce laser and thermal tuning power in a photonic network due to their non-data-dependent nature.

...read moreread less

Abstract: With the rise of many-core chips that require substantial bandwidth from the network on chip (NoC), integrated photonic links have been investigated as a promising alternative to traditional electrical interconnects While numerous opto-electronic NoCs have been proposed, evaluations of photonic architectures have thus-far had to use a number of simplifications, reflecting the need for a modeling tool that accurately captures the tradeoffs for the emerging technology and its impacts on the overall network In this paper, we present DSENT, a NoC modeling tool for rapid design space exploration of electrical and opto-electrical networks We explain our modeling framework and perform an energy-driven case study, focusing on electrical technology scaling, photonic parameters, and thermal tuning Our results show the implications of different technology scenarios and, in particular, the need to reduce laser and thermal tuning power in a photonic network due to their non-data-dependent nature

...read moreread less

529 citations

Proceedings Article•DOI•

ZSim: fast and accurate microarchitectural simulation of thousand-core systems

[...]

Daniel Sanchez¹, Christos Kozyrakis²•Institutions (2)

Massachusetts Institute of Technology¹, Stanford University²

23 Jun 2013

TL;DR: Zsim, a fast, scalable, and accurate simulator, is built using bound-weave, a two-phase parallelization technique that scales parallel simulation on multicore hosts efficiently with minimal loss of accuracy, and lightweight user-level virtualization is implemented to support complex workloads.

...read moreread less

Abstract: Architectural simulation is time-consuming, and the trend towards hundreds of cores is making sequential simulation even slower. Existing parallel simulation techniques either scale poorly due to excessive synchronization, or sacrifice accuracy by allowing event reordering and using simplistic contention models. As a result, most researchers use sequential simulators and model small-scale systems with 16-32 cores. With 100-core chips already available, developing simulators that scale to thousands of cores is crucial.We present three novel techniques that, together, make thousand-core simulation practical. First, we speed up detailed core models (including OOO cores) with instruction-driven timing models that leverage dynamic binary translation. Second, we introduce bound-weave, a two-phase parallelization technique that scales parallel simulation on multicore hosts efficiently with minimal loss of accuracy. Third, we implement lightweight user-level virtualization to support complex workloads, including multiprogrammed, client-server, and managed-runtime applications, without the need for full-system simulation, sidestepping the lack of scalable OSs and ISAs that support thousands of cores.We use these techniques to build zsim, a fast, scalable, and accurate simulator. On a 16-core host, zsim models a 1024-core chip at speeds of up to 1,500 MIPS using simple cores and up to 300 MIPS using detailed OOO cores, 2-3 orders of magnitude faster than existing parallel simulators. Simulator performance scales well with both the number of modeled cores and the number of host cores. We validate zsim against a real Westmere system on a wide variety of workloads, and find performance and microarchitectural events to be within a narrow range of the real system.

...read moreread less

481 citations

Proceedings Article•DOI•

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology

[...]

Vivek Seshadri¹, Donghyuk Lee¹, Thomas Mullins¹, Hasan Hassan², Amirali Boroumand¹, Jeremie S. Kim¹, Michael Kozuch³, Onur Mutlu¹, Phillip B. Gibbons¹, Todd C. Mowry¹ - Show less +6 more•Institutions (3)

Carnegie Mellon University¹, ETH Zurich², Intel³

14 Oct 2017

TL;DR: Ambit is proposed, an Accelerator-in-Memory for bulk bitwise operations that largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).

...read moreread less

Abstract: Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory).To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables the system to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the system can use the inverters present inside the sense amplifier to perform bitwise NOT operations. With these two components, Ambit can perform any bulk bitwise operation efficiently inside DRAM. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore it can be directly plugged onto the memory bus.Our extensive circuit simulations show that Ambit works as expected even in the presence of significant process variation. Averaged across seven bulk bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with Hybrid Memory Cube (HMC), a 3D-stacked DRAM with a logic layer, Ambit improves performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC. Ambit improves the performance of three real-world data-intensive applications, 1) database bitmap indices, 2) BitWeaving, a technique to accelerate database scans, and 3) bit-vector-based implementation of sets, by 3X-7X compared to a state-of-the-art baseline using SIMD optimizations. We describe four other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.CCS CONCEPTS• Computer systems organization → Single instruction, multiple data; • Hardware → Hardware accelerator; • Hardware → Dynamic memory;

...read moreread less

444 citations

Collapse