Author

Nathan Binkert

Other affiliations: Amazon.com, Lynn University, University of Michigan

Bio: Nathan Binkert is an academic researcher from Hewlett-Packard. The author has contributed to research on the topics Node (networking) and Optical switch. The author has an h-index of 32 and has co-authored 76 publications, which have received 7,796 citations. Previous affiliations of Nathan Binkert include Amazon.com and Lynn University.

Papers published on a yearly basis (chart omitted; years shown: 2002 to 2013, 2017, 2019, 2023)

Papers

Journal Article•DOI•

The gem5 simulator

Nathan Binkert¹, Bradford M. Beckmann², Gabriel Black³, Steven K. Reinhardt², Ali G. Saidi, Arkaprava Basu⁴, Joel Hestness⁵, Derek R. Hower⁴, Tushar Krishna⁶, Somayeh Sardashti⁴, Rathijit Sen⁴, Korey Sewell⁷, Muhammad Shoaib⁴, Nilay Vaish⁴, Mark D. Hill⁴, David A. Wood⁴

Hewlett-Packard¹, Advanced Micro Devices², Google³, University of Wisconsin-Madison⁴, University of Texas at Austin⁵, Massachusetts Institute of Technology⁶, University of Michigan⁷

31 Aug 2011-ACM Sigarch Computer Architecture News

TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

Abstract: The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and flexible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86). The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

4,039 citations

Journal Article•DOI•

The M5 Simulator: Modeling Networked Systems

Nathan Binkert¹, Ronald G. Dreslinski¹, Lisa R. Hsu¹, Kevin Lim¹, Ali G. Saidi¹, Steven K. Reinhardt¹

University of Michigan¹

01 Jul 2006-IEEE Micro

TL;DR: The M5 simulator provides features necessary for simulating networked hosts, including full-system capability, a detailed I/O subsystem, and the ability to simulate multiple networked systems deterministically.

Abstract: The M5 simulator is developed specifically to enable research in TCP/IP networking. The M5 simulator provides features necessary for simulating networked hosts, including full-system capability, a detailed I/O subsystem, and the ability to simulate multiple networked systems deterministically. M5's usefulness as a general-purpose architecture simulator and its liberal open-source license have led to its adoption by several academic and commercial groups.

839 citations

Journal Article•DOI•

Corona: System Implications of Emerging Nanophotonic Technology

Dana M. Vantrease¹, Robert Schreiber², Matteo Monchiero², Moray McLaren², Norman P. Jouppi², Marco Fiorentino², Al Davis³, Nathan Binkert², Raymond G. Beausoleil², Jung Ho Ahn²

University of Wisconsin-Madison¹, Hewlett-Packard², University of Utah³

01 Jun 2008

TL;DR: The authors argue that, compared with an electrically connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory-intensive workloads while simultaneously reducing power.

Abstract: We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off- and on-stack bandwidth requirements at acceptable power levels. Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We have simulated a 1024 thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.

688 citations

Proceedings Article•DOI•

HyperX: topology, routing, and packaging of efficient large-scale networks

Jung Ho Ahn¹, Nathan Binkert¹, Al Davis¹, Moray McLaren¹, Robert Schreiber¹

Hewlett-Packard¹

14 Nov 2009

TL;DR: This work proposes HyperX, an extension of the hypercube and flattened butterfly topologies that takes advantage of the high-radix switch components that integrated photonics will make available, and gives an adaptive routing algorithm, DAL.

Abstract: In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as growing cores-per-socket and improved single-core performance provide only part of the speedup needed. These systems will need affordable interconnect structures that scale to this level. To meet the need, we consider an extension of the hypercube and flattened butterfly topologies, the HyperX, and give an adaptive routing algorithm, DAL. HyperX takes advantage of high-radix switch components that integrated photonics will make available. Our main contributions include a formal descriptive framework, enabling a search method that finds optimal HyperX configurations; DAL; and a low cost packaging strategy for an exascale HyperX. Simulations show that HyperX can provide performance as good as a folded Clos, with fewer switches. We also describe a HyperX packaging scheme that reduces system cost. Our analysis of efficiency, performance, and packaging demonstrates that the HyperX is a strong competitor for exascale networks.

269 citations

Proceedings Article•DOI•

PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

Taeho Kgil¹, Shaun C. D'Souza¹, Ali G. Saidi¹, Nathan Binkert¹, Ronald G. Dreslinski¹, Trevor Mudge¹, Steven K. Reinhardt¹, Krisztian Flautner

University of Michigan¹

20 Oct 2006

TL;DR: It is shown how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing and that a PicoServer performs comparably to a Pentium 4-like class machine while consuming only about 1/10 of the power.

Abstract: In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture, PicoServer, employs 3D technology to bond one die containing several simple slow processing cores to multiple DRAM dies sufficient for a primary memory. The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied. The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread level parallelism. An architecture targeted to efficient throughput is ideal for this application domain. We find for a similar logic die area, a 12 CPU system with 3D stacking and no L2 cache outperforms an 8 CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that a PicoServer performs comparably to a Pentium 4-like class machine while consuming only about 1/10 of the power, even when conservative assumptions are made about the power consumption of the PicoServer.

229 citations

Cited by

Journal Article•DOI•

The gem5 simulator

Hewlett-Packard¹, Advanced Micro Devices², Google³, University of Wisconsin-Madison⁴, University of Texas at Austin⁵, Massachusetts Institute of Technology⁶, University of Michigan⁷

31 Aug 2011-ACM Sigarch Computer Architecture News

TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

4,039 citations

Proceedings Article•DOI•

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Sheng Li¹, Jung Ho Ahn², Richard Strong³, Jay B. Brockman¹, Dean M. Tullsen³, Norman P. Jouppi⁴

University of Notre Dame¹, Seoul National University², University of California, San Diego³, Hewlett-Packard⁴

12 Dec 2009

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.

Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area² product (EDA²P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA²P and EDAP.

2,487 citations

Journal Article•DOI•

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

Milo M. K. Martin¹, Daniel J. Sorin², Bradford M. Beckmann³, Michael R. Marty³, Min Xu³, Alaa R. Alameldeen³, Kevin E. Moore³, Mark D. Hill³, David A. Wood³

University of Pennsylvania¹, Duke University², University of Wisconsin-Madison³

01 Nov 2005-ACM Sigarch Computer Architecture News

TL;DR: The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers. The toolset includes a set of timing simulator modules for modeling the timing of the memory system and microprocessors.

Abstract: The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers. We leverage an existing full-system functional simulation infrastructure (Simics [14]) as the basis around which to build a set of timing simulator modules for modeling the timing of the memory system and microprocessors. This simulator infrastructure enables us to run architectural experiments using a suite of scaled-down commercial workloads [3]. To enable other researchers to more easily perform such research, we have released these timing simulator modules as the Multifacet General Execution-driven Multiprocessor Simulator (GEMS) Toolset, release 1.0, under GNU GPL [9].

1,515 citations

Multifacet's General Execution-Driven Multiprocessor Simulator (GEMS) Toolset

M. M. Martin

01 Jan 2005

TL;DR: The Wisconsin Multifacet Project has created a simulation toolset to characterize and evaluate the performance of multiprocessor hardware systems commonly used as database and web servers and has released a set of timing simulator modules for modeling the timing of the memory system and microprocessors.

1,464 citations

Proceedings Article•DOI•

DevoFlow: scaling flow management for high-performance networks

Andrew R. Curtis¹, Jeffrey C. Mogul², Jean Tourrilhes², Praveen Yalagandula², Puneet Sharma², Sujata Banerjee²

University of Waterloo¹, Hewlett-Packard²

15 Aug 2011

TL;DR: This work designs and evaluates DevoFlow, a modification of the OpenFlow model that gently breaks the coupling between control and global visibility in a way that maintains a useful amount of visibility without imposing unnecessary costs.

Abstract: OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch's control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering. In this paper, we analyze these overheads, and show that OpenFlow's current design cannot meet the needs of high-performance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10–53 times fewer flow table entries at an average switch, and uses 10–42 times fewer control messages.

1,132 citations
