
Showing papers by "Robert Schreiber" published in 2009


Proceedings ArticleDOI
Jung Ho Ahn, Nathan Binkert, Al Davis, Moray McLaren, Robert Schreiber
14 Nov 2009
TL;DR: This work considers an extension of the hypercube and flattened butterfly topologies, the HyperX, and gives an adaptive routing algorithm, DAL, to take advantage of high-radix switch components that integrated photonics will make available.
Abstract: In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as growing cores-per-socket and improved single-core performance provide only part of the speedup needed. These systems will need affordable interconnect structures that scale to this level. To meet the need, we consider an extension of the hypercube and flattened butterfly topologies, the HyperX, and give an adaptive routing algorithm, DAL. HyperX takes advantage of high-radix switch components that integrated photonics will make available. Our main contributions include a formal descriptive framework, enabling a search method that finds optimal HyperX configurations; DAL; and a low cost packaging strategy for an exascale HyperX. Simulations show that HyperX can provide performance as good as a folded Clos, with fewer switches. We also describe a HyperX packaging scheme that reduces system cost. Our analysis of efficiency, performance, and packaging demonstrates that the HyperX is a strong competitor for exascale networks.

269 citations
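
As a rough illustration of the topology named in the abstract above, the following sketch assumes the usual description of a HyperX: switches sit at the points of an L-dimensional integer lattice, and each switch links directly to every switch that differs from it in exactly one coordinate. The function names and the example shape are hypothetical, not taken from the paper; terminals per switch, link trunking, the configuration search, and DAL itself are left out.

```python
from itertools import product

def hyperx_switches(shape):
    """All switch coordinates of a lattice with the given per-dimension sizes."""
    return list(product(*(range(s) for s in shape)))

def hyperx_neighbors(coord, shape):
    """Switches that differ from coord in exactly one dimension."""
    out = []
    for dim, size in enumerate(shape):
        for value in range(size):
            if value != coord[dim]:
                out.append(coord[:dim] + (value,) + coord[dim + 1:])
    return out

def min_hops(src, dst):
    """Minimal switch-to-switch hop count: one hop per differing coordinate."""
    return sum(1 for a, b in zip(src, dst) if a != b)

shape = (4, 4, 2)  # hypothetical example shape, not a configuration from the paper
switches = hyperx_switches(shape)
print(len(switches))                            # 32
print(len(hyperx_neighbors((0, 0, 0), shape)))  # 3 + 3 + 1 = 7
print(min_hops((0, 0, 0), (3, 2, 0)))           # 2
```

With every dimension of size 2 this reduces to a hypercube, which is one way to read the "extension of the hypercube" wording in the abstract.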


Journal ArticleDOI
TL;DR: This paper presents a design study for a many-core architecture called Corona which utilizes dense wavelength division multiplexing (DWDM) for on- and off-chip communication together with the devices which will be needed to implement such a communication infrastructure.
Abstract: Silicon nanophotonics holds the promise of dramatically advancing the state of the art in computing by enabling parallel architectures that combine unprecedented performance and ease of use with affordable power consumption. This paper presents a design study for a many-core architecture called Corona which utilizes dense wavelength division multiplexing (DWDM) for on- and off-chip communication together with the devices which will be needed to implement such a communication infrastructure.

170 citations


Proceedings ArticleDOI
14 Nov 2009
TL;DR: The design of Multicore DIMM is extended for high-reliability systems and it is shown that compared with conventional chipkill approaches, it can lead to much higher system-level energy efficiency and performance at the cost of additional DRAM devices.
Abstract: Continuous evolution in process technology brings energy-efficiency and reliability challenges, which are harder for memory system designs since chip multiprocessors demand high bandwidth and capacity, global wires improve slowly, and more cells are susceptible to hard and soft errors. Recent proposals aim at better main-memory energy efficiency by dividing a memory rank into subsets. We holistically assess the effectiveness of rank subsetting in the context of system-wide performance, energy-efficiency, and reliability perspectives. We identify the impact of rank subsetting on memory power and processor performance analytically, then verify the analyses by simulating a chip multiprocessor system using multithreaded and consolidated workloads. We extend the design of Multicore DIMM, one proposal embodying rank subsetting, for high-reliability systems and show that compared with conventional chipkill approaches, it can lead to much higher system-level energy efficiency and performance at the cost of additional DRAM devices.

126 citations


Proceedings ArticleDOI
12 Dec 2009
TL;DR: This work exploits CMOS nanophotonic devices to create arbiters that meet the demands of on-chip optical interconnects, yielding the first arbitration protocols that exploit optics to simultaneously achieve low latency, high utilization, and fairness.
Abstract: By providing high bandwidth chip-wide communication at low latency and low power, on-chip optics can improve many-core performance dramatically. Optical channels that connect many nodes and allow for single cycle cache-line transmissions will require fast, high bandwidth arbitration. We exploit CMOS nanophotonic devices to create arbiters that meet the demands of on-chip optical interconnects. We accomplish this by exploiting a unique property of optical devices that allows arbitration to scale with latency bounded by the time of flight of light through a silicon waveguide that passes all requesters. We explore two classes of distributed token-based arbitration, channel based and slot based, and tailor them to optics. Channel based protocols allocate an entire waveguide to one requester at a time, whereas slot based protocols allocate fixed sized slots in the waveguide. Simple optical protocols suffer from a fixed prioritization of users and can starve those with low priority; we correct this with new schemes that vary the priorities dynamically to ensure fairness. On a 64-node optical interconnect under uniform random single-cycle traffic, our fair slot protocol achieves 74% channel utilization, while our fair channel protocol achieves 45%. Ours are the first arbitration protocols that exploit optics to simultaneously achieve low latency, high utilization, and fairness.

122 citations


Journal ArticleDOI
TL;DR: The Multicore DIMM is designed to improve the energy efficiency of memory systems with small impact on system performance, where DRAM chips are grouped into multiple virtual memory devices, each of which has its own data path and receives separate commands.
Abstract: Demand for memory capacity and bandwidth keeps increasing rapidly in modern computer systems, and memory power consumption is becoming a considerable portion of the system power budget. However, the current DDR DIMM standard is not well suited to effectively serve CMP memory requests from both a power and performance perspective. We propose a new memory module called a Multicore DIMM, where DRAM chips are grouped into multiple virtual memory devices, each of which has its own data path and receives separate commands. The Multicore DIMM is designed to improve the energy efficiency of memory systems with small impact on system performance. Dividing each memory module into 4 virtual memory devices brings a simultaneous 22%, 7.6%, and 18% improvement in memory power, IPC, and system energy-delay product respectively on a set of multithreaded applications and consolidated workloads.

103 citations


Patent
Jung Ho Ahn, Nathan Binkert, Al Davis, Moray McLaren, Robert Schreiber
13 Oct 2009
TL;DR: A computer system and method are described in which a Processing Element (PE) generates a data packet that is routed along a shortest path that includes a plurality of routers in a multiple dimension network.
Abstract: Illustrated is a computer system and method that includes a Processing Element (PE) to generate a data packet that is routed along a shortest path that includes a plurality of routers in a multiple dimension network. The system and method further include a router, of the plurality of routers, to de-route the data packet from the shortest path to an additional path, the de-route to occur where the shortest path is congested and the additional path links the router and an additional router in a dimension of the multiple dimension network.

27 citations
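
The de-routing idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: routers are coordinate tuples in a lattice where any two routers that differ in one coordinate are linked, `congested` is a hypothetical predicate on links, and the limit of one de-route per dimension is an assumption added here to keep the path bounded.

```python
def next_hop(current, dest, shape, congested, derouted):
    """Pick the next router for a packet at `current` heading to `dest`.

    congested: hypothetical predicate (src, dst) -> bool for a link.
    derouted:  dimensions in which this packet has already taken a detour.
    Returns (next_router, derouted); next_router is None when the packet
    is already at dest or every candidate link is congested.
    """
    unaligned = [d for d in range(len(shape)) if current[d] != dest[d]]
    # First choice: a shortest-path hop that fixes one unaligned dimension.
    for d in unaligned:
        hop = current[:d] + (dest[d],) + current[d + 1:]
        if not congested(current, hop):
            return hop, derouted
    # Otherwise de-route: a non-minimal hop that stays within one dimension.
    for d in unaligned:
        if d in derouted:
            continue
        for v in range(shape[d]):
            if v in (current[d], dest[d]):
                continue
            hop = current[:d] + (v,) + current[d + 1:]
            if not congested(current, hop):
                return hop, derouted | {d}
    return None, derouted

shape = (4, 4)                 # hypothetical 4x4 network
busy = {((0, 0), (3, 0))}      # the direct link is congested
congested = lambda a, b: (a, b) in busy
hop, used = next_hop((0, 0), (3, 0), shape, congested, frozenset())
print(hop, sorted(used))       # (1, 0) [0]
```

Because the detour stays inside the dimension that is still unaligned, the packet remains one hop from the correct coordinate in that dimension after de-routing.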


Patent
07 Jan 2009
TL;DR: A bipartite graph having vertices of a first type, vertices of a second type, and a plurality of edges is provided, where each edge joins a vertex of the first type with a vertex of the second type.
Abstract: A method includes providing a bipartite graph having vertices of a first type, vertices of a second type, and a plurality of edges, wherein each edge joins a vertex of the first type with a vertex of the second type. A unipartite edge dual graph is generated from the bipartite graph, and a minimum clique partition of the edge dual graph is recursively determined. A biclique is then created in the bipartite graph corresponding to each clique in the minimum clique partition of the edge dual graph.

13 citations
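
The steps in the abstract above can be sketched as below, under two assumptions that the abstract does not spell out. First, the edge dual is taken to have one vertex per bipartite edge, with two such vertices joined when the endpoints of the two edges induce a biclique in the original graph. Second, the clique partition here is a simple greedy heuristic, so it is not guaranteed to be minimum and it is not the recursive procedure the abstract mentions. All function names are hypothetical.

```python
from itertools import combinations

def edge_dual(edges):
    """Adjacency of the assumed edge dual: two edges are joined when
    their endpoints induce a biclique in the bipartite graph."""
    edge_set = set(edges)
    adj = {e: set() for e in edge_set}
    for (u1, v1), (u2, v2) in combinations(edge_set, 2):
        if (u1, v2) in edge_set and (u2, v1) in edge_set:
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj

def greedy_clique_partition(adj):
    """Heuristic (not necessarily minimum) partition of the dual into cliques."""
    remaining = set(adj)
    cliques = []
    for seed in sorted(adj):
        if seed not in remaining:
            continue
        clique = {seed}
        for cand in sorted(adj[seed] & remaining):
            if all(cand in adj[member] for member in clique):
                clique.add(cand)
        remaining -= clique
        cliques.append(clique)
    return cliques

def to_biclique(clique):
    """Each clique of dual vertices yields one biclique in the bipartite graph."""
    return ({u for u, _ in clique}, {v for _, v in clique})

edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "z")]
cliques = greedy_clique_partition(edge_dual(edges))
print(len(cliques))  # 2
```

In this toy graph the four edges among a, b, x, y collapse into one biclique and the edge (c, z) forms its own.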


Patent
05 Jan 2009
TL;DR: In this paper, a method for selecting a first biclique role in a plurality of roles and finding all roles in the plurality that have a set of vertices of a second type that is a subset of the vertices in the first role is presented.
Abstract: A method includes selecting a first biclique role in a plurality of roles and finding all roles in the plurality that have a set of vertices of a second type that is a subset of a set of vertices of the second type in the first role; removing each of the subsets from the set of vertices of the second type corresponding to the first role; and reassigning the vertices of the first type to the roles such that original associations between the vertices of the first type and the vertices of the second type are maintained.

6 citations
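
A minimal sketch of the subset-removal step described above, assuming each role is stored as a pair of sets (first-type vertices, second-type vertices). The data layout and the names `absorb_subroles` and `associations` are hypothetical; the abstract does not specify a representation or how the first role is selected.

```python
def absorb_subroles(roles, first):
    """roles: name -> (set of first-type vertices, set of second-type vertices).

    For every other role whose second-type set is a subset of that of `first`,
    remove that subset from `first` and add the first-type vertices of `first`
    to the other role, so the original associations are preserved.
    """
    first_members, first_targets = roles[first]
    result = {name: (set(m), set(t)) for name, (m, t) in roles.items()}
    for name, (_, targets) in roles.items():
        if name != first and targets and targets <= first_targets:
            result[first][1].difference_update(targets)
            result[name][0].update(first_members)
    return result

def associations(roles):
    """All (first-type, second-type) pairs implied by a set of roles."""
    return {(m, t) for ms, ts in roles.values() for m in ms for t in ts}

roles = {
    "A": ({"u1"}, {"p1", "p2", "p3"}),
    "B": ({"u2"}, {"p1", "p2"}),
}
reduced = absorb_subroles(roles, "A")
print(sorted(reduced["A"][1]))                       # ['p3']
print(associations(roles) == associations(reduced))  # True
```

The final comparison checks the property the abstract requires: the pairs linking first-type and second-type vertices are the same before and after the reassignment.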


Journal ArticleDOI
TL;DR: The method of Tsai, Huang, and Zhu for the computation of camera motion parameters in computer vision is revisited, and some spectral properties of the homography matrices that arise are elucidated, which are rank-one perturbations of rotation matrices.
Abstract: We revisit the method of Tsai, Huang, and Zhu for the computation of camera motion parameters in computer vision. We elucidate some spectral properties of the homography matrices that arise, which are rank-one perturbations of rotation matrices. We show how to correct for noise by finding the rank-one perturbation of a rotation closest to a given matrix. We illustrate some of the inaccuracies and computational failures that can arise when using the formulas given by Tsai, and we propose new formulas that avoid these pitfalls. A computational experiment shows that the new methods are indeed quite robust.

3 citations
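
For orientation, the "rank-one perturbation of a rotation" structure mentioned above can be written as follows. The notation is an assumption based on the standard plane-induced homography (rotation R, translation t, plane normal n, plane distance d); sign and scale conventions vary, and the choice of the Frobenius norm for "closest" is also an assumption, since the abstract does not name a norm.

```latex
% Homography as a rotation plus a rank-one term (conventions assumed)
H = R + \frac{1}{d}\, t\, n^{\mathsf{T}}, \qquad R^{\mathsf{T}} R = I,\ \det R = 1

% Noise correction: nearest matrix of this form to a measured matrix M
\min_{R,\,a,\,b}\ \bigl\| M - \bigl( R + a\, b^{\mathsf{T}} \bigr) \bigr\|_F
\quad \text{subject to } R^{\mathsf{T}} R = I,\ \det R = 1
```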


Patent
10 Dec 2009
TL;DR: In this article, the authors describe an arbitration system consisting of a loop-shaped arbitration waveguide, a hungry waveguide and a broadcast waveguide coupled to a home node and a number of requesting nodes.
Abstract: Various embodiments of the present invention are directed to arbitration systems and methods. In one embodiment, an arbitration system comprises a loop-shaped arbitration waveguide (602), a loop-shaped hungry waveguide (603), and a loop-shaped broadcast waveguide (604). The arbitration, hungry, and broadcast waveguides are optically coupled to a home node and a number of requesting nodes. The arbitration waveguide transmits tokens injected by the home node. A token extracted by a requesting node grants the node access to a resource for the duration or length of the token. The hungry waveguide transmits light injected by the home node. A requesting node in a hungry state extracts the light from the hungry waveguide. The broadcast waveguide transmits light injected by the home node such that the light indicates to requesting nodes not in the hungry state to stop extracting tokens from the arbitration waveguide.

1 citation