Showing papers by "Robert Schreiber" published in 2009

Proceedings Article•DOI•

HyperX: topology, routing, and packaging of efficient large-scale networks

Jung Ho Ahn¹, Nathan Binkert¹, Al Davis¹, Moray McLaren¹, Robert Schreiber¹•Institutions (1)

14 Nov 2009

TL;DR: This work considers an extension of the hypercube and flattened butterfly topologies, the HyperX, and gives an adaptive routing algorithm, DAL, to take advantage of high-radix switch components that integrated photonics will make available.

Abstract: In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as growing cores-per-socket and improved single-core performance provide only part of the speedup needed. These systems will need affordable interconnect structures that scale to this level. To meet the need, we consider an extension of the hypercube and flattened butterfly topologies, the HyperX, and give an adaptive routing algorithm, DAL. HyperX takes advantage of high-radix switch components that integrated photonics will make available. Our main contributions include a formal descriptive framework, enabling a search method that finds optimal HyperX configurations; DAL; and a low cost packaging strategy for an exascale HyperX. Simulations show that HyperX can provide performance as good as a folded Clos, with fewer switches. We also describe a HyperX packaging scheme that reduces system cost. Our analysis of efficiency, performance, and packaging demonstrates that the HyperX is a strong competitor for exascale networks.
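
The abstract does not spell out the topology, so the following is only a minimal sketch under the usual HyperX definition: switches sit on a multi-dimensional integer lattice, and each switch links directly to every switch that differs from it in exactly one coordinate. The function names and the `shape` tuple are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def hyperx_switches(shape):
    """List every switch coordinate; shape[k] is the number of switches in dimension k."""
    return list(product(*(range(size) for size in shape)))


def hyperx_neighbors(coord, shape):
    """Return the switches that differ from coord in exactly one dimension."""
    result = []
    for dim, size in enumerate(shape):
        for value in range(size):
            if value != coord[dim]:
                result.append(coord[:dim] + (value,) + coord[dim + 1:])
    return result


def min_hops(a, b):
    """Minimal switch-to-switch hop count: one hop per coordinate that differs."""
    return sum(1 for x, y in zip(a, b) if x != y)
```

With every dimension of size 2 the sketch reduces to a binary hypercube, which is one way to see the HyperX as an extension of that topology.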

269 citations

Journal Article•DOI•

Devices and architectures for photonic chip-scale integration

Jung Ho Ahn¹, Marco Fiorentino¹, Raymond G. Beausoleil¹, Nathan Binkert¹, Al Davis¹, David A. Fattal¹, Norm Jouppi¹, Moray McLaren¹, Charles Santori¹, Robert Schreiber¹, Sean M. Spillane¹, Dana M. Vantrease¹, Qianfan Xu¹•Institutions (1)

Hewlett-Packard¹

20 Feb 2009-Applied Physics A

TL;DR: This paper presents a design study for a many-core architecture called Corona which utilizes dense wavelength division multiplexing (DWDM) for on- and off-chip communication together with the devices which will be needed to implement such a communication infrastructure.

Abstract: Silicon nanophotonics holds the promise of dramatically advancing the state of the art in computing by enabling parallel architectures that combine unprecedented performance and ease of use with affordable power consumption. This paper presents a design study for a many-core architecture called Corona which utilizes dense wavelength division multiplexing (DWDM) for on- and off-chip communication together with the devices which will be needed to implement such a communication infrastructure.

170 citations

Proceedings Article•DOI•

Future scaling of processor-memory interfaces

Jung Ho Ahn¹, Norman P. Jouppi¹, Christos Kozyrakis², Jacob Leverich², Robert Schreiber¹•Institutions (2)

Hewlett-Packard¹, Stanford University²

14 Nov 2009

TL;DR: The design of Multicore DIMM is extended for high-reliability systems and it is shown that compared with conventional chipkill approaches, it can lead to much higher system-level energy efficiency and performance at the cost of additional DRAM devices.

Abstract: Continuous evolution in process technology brings energy-efficiency and reliability challenges, which are harder for memory system designs since chip multiprocessors demand high bandwidth and capacity, global wires improve slowly, and more cells are susceptible to hard and soft errors. Recent proposals aim at better main-memory energy efficiency by dividing a memory rank into subsets. We holistically assess the effectiveness of rank subsetting from system-wide performance, energy-efficiency, and reliability perspectives. We identify the impact of rank subsetting on memory power and processor performance analytically, then verify the analyses by simulating a chip multiprocessor system using multithreaded and consolidated workloads. We extend the design of Multicore DIMM, one proposal embodying rank subsetting, for high-reliability systems and show that, compared with conventional chipkill approaches, it can lead to much higher system-level energy efficiency and performance at the cost of additional DRAM devices.

126 citations

Proceedings Article•DOI•

Light speed arbitration and flow control for nanophotonic interconnects

Dana M. Vantrease¹, Nathan Binkert², Robert Schreiber², Mikko H. Lipasti¹•Institutions (2)

University of Wisconsin-Madison¹, Hewlett-Packard²

12 Dec 2009

TL;DR: This work exploits CMOS nanophotonic devices to create arbiters that meet the demands of on-chip optical interconnects, yielding the first arbitration protocols that exploit optics to simultaneously achieve low latency, high utilization, and fairness.

Abstract: By providing high bandwidth chip-wide communication at low latency and low power, on-chip optics can improve many-core performance dramatically. Optical channels that connect many nodes and allow for single cycle cache-line transmissions will require fast, high bandwidth arbitration. We exploit CMOS nanophotonic devices to create arbiters that meet the demands of on-chip optical interconnects. We accomplish this by exploiting a unique property of optical devices that allows arbitration to scale with latency bounded by the time of flight of light through a silicon waveguide that passes all requesters. We explore two classes of distributed token-based arbitration, channel based and slot based, and tailor them to optics. Channel based protocols allocate an entire waveguide to one requester at a time, whereas slot based protocols allocate fixed sized slots in the waveguide. Simple optical protocols suffer from a fixed prioritization of users and can starve those with low priority; we correct this with new schemes that vary the priorities dynamically to ensure fairness. On a 64-node optical interconnect under uniform random single-cycle traffic, our fair slot protocol achieves 74% channel utilization, while our fair channel protocol achieves 45%. Ours are the first arbitration protocols that exploit optics to simultaneously achieve low latency, high utilization, and fairness.
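
The starvation problem described in the abstract can be illustrated with a toy model. This is a sketch under stated assumptions, not the paper's protocols: the token is modeled as visiting nodes in a fixed order, and the "fair" variant simply rotates the starting point after each grant, which is a stand-in for the dynamic-priority schemes the authors propose. All names are hypothetical.

```python
def arbitrate(requests, start=0):
    """Grant the token to the first requesting node at or after `start`, in ring order."""
    n = len(requests)
    for offset in range(n):
        node = (start + offset) % n
        if requests[node]:
            return node
    return None


def simulate(always_requesting, n, cycles, rotate):
    """Count grants per node when the nodes in `always_requesting` request every cycle."""
    grants = [0] * n
    start = 0
    for _ in range(cycles):
        requests = [i in always_requesting for i in range(n)]
        winner = arbitrate(requests, start)
        if winner is not None:
            grants[winner] += 1
            if rotate:
                # Assumed fairness rule: the next search begins just past the last winner.
                start = (winner + 1) % n
    return grants
```

With `rotate=False`, node 0 wins every cycle and any other persistent requester is starved; with `rotate=True`, two persistent requesters share the grants evenly in this model.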

122 citations

Journal Article•DOI•

Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs

Jung Ho Ahn¹, Jacob Leverich², Robert Schreiber¹, Norm Jouppi¹•Institutions (2)

Hewlett-Packard¹, Stanford University²

01 Jan 2009-IEEE Computer Architecture Letters

TL;DR: The Multicore DIMM is designed to improve the energy efficiency of memory systems with small impact on system performance, where DRAM chips are grouped into multiple virtual memory devices, each of which has its own data path and receives separate commands.

Abstract: Demand for memory capacity and bandwidth keeps increasing rapidly in modern computer systems, and memory power consumption is becoming a considerable portion of the system power budget. However, the current DDR DIMM standard is not well suited to effectively serve CMP memory requests from both a power and performance perspective. We propose a new memory module called a Multicore DIMM, where DRAM chips are grouped into multiple virtual memory devices, each of which has its own data path and receives separate commands. The Multicore DIMM is designed to improve the energy efficiency of memory systems with small impact on system performance. Dividing each memory module into 4 virtual memory devices brings a simultaneous 22%, 7.6%, and 18% improvement in memory power, IPC, and system energy-delay product, respectively, on a set of multithreaded applications and consolidated workloads.

103 citations

Patent•

Incremental adaptive packet routing in a multi-dimensional network

Jung Ho Ahn¹, Nathan Binkert¹, Al Davis¹, Moray McLaren¹, Robert Schreiber¹•Institutions (1)

Hewlett-Packard¹

13 Oct 2009

TL;DR: A computer system and method are described in which a Processing Element (PE) generates a data packet that is routed along a shortest path that includes a plurality of routers in a multiple dimension network.

Abstract: Illustrated is a computer system and method that includes a Processing Element (PE) to generate a data packet that is routed along a shortest path that includes a plurality of routers in a multiple dimension network. The system and method further include a router, of the plurality of routers, to de-route the data packet from the shortest path to an additional path, the de-route to occur where the shortest path is congested and the additional path links the router and an additional router in a dimension of the multiple dimension network.
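
One way to picture the de-routing step is the sketch below. It assumes routers are addressed by coordinate tuples and that any two routers sharing all but one coordinate are directly linked; `is_congested` is a hypothetical caller-supplied predicate. It omits safeguards a real router would need, such as limiting the number of de-routes per packet, and is not the patented method itself.

```python
def with_coord(coord, dim, value):
    """Copy of coord with the entry for dimension `dim` replaced by `value`."""
    return coord[:dim] + (value,) + coord[dim + 1:]


def next_hop(cur, dst, shape, is_congested):
    """Pick the next router: prefer a shortest-path hop, de-route within a dimension if needed."""
    unaligned = [k for k in range(len(shape)) if cur[k] != dst[k]]
    if not unaligned:
        return None  # already at the destination router

    # Shortest-path candidates: fix one unaligned dimension in a single hop.
    minimal = [with_coord(cur, k, dst[k]) for k in unaligned]
    for hop in minimal:
        if not is_congested(cur, hop):
            return hop

    # All shortest-path links are congested: de-route to another router
    # in one of the unaligned dimensions.
    for k in unaligned:
        for value in range(shape[k]):
            if value in (cur[k], dst[k]):
                continue
            hop = with_coord(cur, k, value)
            if not is_congested(cur, hop):
                return hop

    # Nothing is free; fall back to a shortest-path hop and wait for it.
    return minimal[0]
```

A de-route taken this way costs one extra hop in the chosen dimension, since the packet still has to reach the destination coordinate in that dimension afterwards.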

27 citations

Patent•

Computer-implemented method for obtaining a minimum biclique cover in a bipartite dataset

Robert Schreiber¹, Alina Ene¹, Nikola Milosavljevic¹, Robert E. Tarjan¹, Mehul A. Shah¹•Institutions (1)

Hewlett-Packard¹

07 Jan 2009

TL;DR: A bipartite graph is provided having vertices of a first type, vertices of a second type, and a plurality of edges, where each edge joins a vertex of the first type with a vertex of the second type.

Abstract: A method includes providing a bipartite graph having vertices of a first type, vertices of a second type, and a plurality of edges, wherein each edge joins a vertex of the first type with a vertex of the second type. A unipartite edge dual graph is generated from the bipartite graph, and a minimum clique partition of the edge dual graph is recursively determined. A biclique is then created in the bipartite graph corresponding to each clique in the minimum clique partition of the edge dual graph.
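
The pipeline in the abstract can be sketched as follows, with two assumptions flagged up front. First, the abstract does not define adjacency in the edge dual graph; the sketch assumes two edges are adjacent when they can lie in a common biclique. Second, finding a minimum clique partition is hard in general, so a greedy partition is swapped in here; it yields a valid biclique cover but not necessarily a minimum one. Function names are hypothetical.

```python
def compatible(e, f, edges):
    """Assumed dual-graph adjacency: edges e and f fit in one biclique of the graph."""
    (u1, v1), (u2, v2) = e, f
    return (u1, v2) in edges and (u2, v1) in edges


def greedy_biclique_cover(edges):
    """Greedily partition the edge dual graph into cliques, then map each clique to a biclique."""
    edges = set(edges)
    cliques = []
    for e in sorted(edges):
        for clique in cliques:
            # e may join a clique only if it is adjacent to every member.
            if all(compatible(e, f, edges) for f in clique):
                clique.append(e)
                break
        else:
            cliques.append([e])
    # Each clique of edges becomes the biclique spanned by its endpoints.
    return [({u for u, _ in c}, {v for _, v in c}) for c in cliques]
```

Because every pair of edges in a clique is compatible, every first-type endpoint in the clique is joined to every second-type endpoint, so each returned pair of vertex sets is a genuine biclique.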

13 citations

Patent•

Computer-implemented method for role discovery and simplification in access control systems

Robert Schreiber¹, William G. Horne¹•Institutions (1)

Hewlett-Packard¹

05 Jan 2009

TL;DR: In this paper, a method for selecting a first biclique role in a plurality of roles and finding all roles in the plurality that have a set of vertices of a second type that is a subset of the vertices in the first role is presented.

Abstract: A method includes selecting a first biclique role in a plurality of roles and finding all roles in the plurality that have a set of vertices of a second type that is a subset of a set of vertices of the second type in the first role; removing each of the subsets from the set of vertices of the second type corresponding to the first role; and reassigning the vertices of the first type to the roles such that original associations between the vertices of the first type and the vertices of the second type are maintained.
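
A minimal sketch of one simplification pass, assuming that vertices of the first type are users and vertices of the second type are permissions (the abstract leaves the types abstract). The dictionary layout and the names `simplify_role` and `granted` are illustrative, not taken from the patent.

```python
def granted(roles):
    """All (user, permission) associations implied by a role assignment."""
    return {(u, p) for users, perms in roles.values() for u in users for p in perms}


def simplify_role(roles, first):
    """Strip from `first` the permissions already bundled by nested roles.

    `roles` maps a role name to a (users, permissions) pair of sets.
    """
    users, perms = roles[first]
    # Roles whose permission set is a subset of the first role's permission set.
    nested = [name for name, (_, p) in roles.items()
              if name != first and p and p <= perms]
    result = {name: (set(u), set(p)) for name, (u, p) in roles.items()}
    for name in nested:
        # Remove the subset from the first role ...
        result[first][1].difference_update(roles[name][1])
        # ... and reassign the first role's users so they keep those permissions.
        result[name][0].update(users)
    return result
```

For example, if an `admin` role holds `read`, `write`, and `delete` while an `editor` role holds `read` and `write`, the pass leaves `admin` with only `delete` and adds the admin users to `editor`; `granted` returns the same set before and after.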

6 citations

Journal Article•DOI•

Robust software for computing camera motion parameters

Robert Schreiber¹, Zeyu Li¹, Harlyn Baker¹•Institutions (1)

Hewlett-Packard¹

01 Jan 2009-Journal of Mathematical Imaging and Vision

TL;DR: The method of Tsai, Huang, and Zhu for the computation of camera motion parameters in computer vision is revisited, and some spectral properties of the homography matrices that arise are elucidated, which are rank-one perturbations of rotation matrices.

Abstract: We revisit the method of Tsai, Huang, and Zhu for the computation of camera motion parameters in computer vision. We elucidate some spectral properties of the homography matrices that arise, which are rank-one perturbations of rotation matrices. We show how to correct for noise by finding the rank-one perturbation of a rotation closest to a given matrix. We illustrate some of the inaccuracies and computational failures that can arise when using the formulas given by Tsai, and we propose new formulas that avoid these pitfalls. A computational experiment shows that the new methods are indeed quite robust.
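
In symbols, the structure and the noise-correction problem described here might be written as below. The notation is an assumption: the abstract names neither the symbols nor the norm, so R, u, v, the measured matrix M, and the Frobenius norm are illustrative choices.

```latex
\begin{aligned}
H &= R + u v^{\mathsf{T}}, \qquad R^{\mathsf{T}} R = I, \quad \det R = 1, \\
(\hat{R}, \hat{u}, \hat{v}) &= \arg\min_{R,\,u,\,v} \; \bigl\lVert M - \bigl(R + u v^{\mathsf{T}}\bigr) \bigr\rVert_F .
\end{aligned}
```

The first line states that a homography is a rotation plus a rank-one term; the second asks for the matrix of that form nearest to a noisy measurement M.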

3 citations

Patent•

Fair token arbitration systems and methods

Nathan Binkert¹, Robert Schreiber¹•Institutions (1)

Hewlett-Packard¹

10 Dec 2009

TL;DR: In this article, the authors describe an arbitration system consisting of a loop-shaped arbitration waveguide, a hungry waveguide and a broadcast waveguide coupled to a home node and a number of requesting nodes.

Abstract: Various embodiments of the present invention are directed to arbitration systems and methods. In one embodiment, an arbitration system comprises a loop-shaped arbitration waveguide (602), a loop-shaped hungry waveguide (603), and a loop-shaped broadcast waveguide (604). The arbitration, hungry, and broadcast waveguides are optically coupled to a home node and a number of requesting nodes. The arbitration waveguide transmits tokens injected by the home node. A token extracted by a requesting node grants the node access to a resource for the duration or length of the token. The hungry waveguide transmits light injected by the home node. A requesting node in a hungry state extracts the light from the hungry waveguide. The broadcast waveguide transmits light injected by the home node such that the light indicates to requesting nodes not in the hungry state to stop extracting tokens from the arbitration waveguide.

1 citation