
Showing papers by "Thomas Sterling" published in 2004


Journal ArticleDOI
01 Nov 2004
TL;DR: This paper explores the concept of productivity as a quantifiable parameter through a series of analytical models and considers the factors that contribute to it.
Abstract: Historically, high performance computing has been measured in terms of peak or delivered performance and, to a lesser extent, performance relative to cost. Such metrics fail to capture the usefulness and ease of use of such systems. Productivity has been identified as a new parameter for high end computing systems that includes both delivered system performance and the programmability of the system. System productivity is directly affected by many factors, including the achieved performance, the speed with which users construct application programs, and the availability of the system to run user applications. This paper explores the concept of productivity as a quantifiable parameter through a series of analytical models and considers the factors that contribute to it.
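The paper develops these models analytically; as a hedged sketch of the general form such a model can take (the notation below is an assumption for illustration, not the paper's own formulation), productivity can be framed as utility delivered per unit total cost, with development effort, achieved performance, and availability each appearing as the abstract describes:

```latex
% Illustrative productivity model; all symbols are assumptions for this sketch.
% Psi: productivity, U: utility of the computed results,
% C_dev: cost of constructing the application (programmability),
% C_exec: cost of executing it at achieved (not peak) performance.
\Psi \;=\; \frac{U}{C_{\mathrm{dev}} + C_{\mathrm{exec}}},
\qquad
C_{\mathrm{exec}} \;\propto\; \frac{W}{R_{\mathrm{achieved}} \cdot A}
% W: work (operations) the application requires,
% R_achieved: delivered (sustained) performance,
% A: fraction of time the system is available to run user jobs.
```

Under this form, improving programmability (lowering C_dev), raising sustained performance, or raising availability each increases productivity, which matches the three contributing factors the abstract identifies.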

26 citations


Proceedings ArticleDOI
06 Nov 2004
TL;DR: The work presented here was performed under the Cascade project to explore critical design-space issues that will determine the value of PIM in supercomputers and contribute to the optimization of its design.
Abstract: A major trend in high performance computer architecture over the last two decades is the migration of memory, in the form of high speed caches, onto the microprocessor semiconductor die. Where temporal locality in the computation is high, caches prove very effective at hiding memory access latency and contention for communication resources. However, where temporal locality is absent, caches may exhibit low hit rates, resulting in poor operational efficiency. Vector computing, exploiting pipelined arithmetic units and memory access, addresses this challenge for certain forms of data access patterns, for example those involving long contiguous data sets exhibiting high spatial locality. But for many advanced applications in science, technology, and national security, at least some data access patterns do not conform to the restricted forms handled well by either caches or vector processing. An important alternative is the reverse strategy: migrating logic into the main memory (DRAM) and performing operations directly on the data stored there. Processor in Memory (PIM) architecture has advanced to the point where it may fill this role and provide an important new mechanism for improving the performance and efficiency of future supercomputers for a broad range of applications. One important project considering both the role of PIM in supercomputer architecture and the design of such PIM components is the Cray Cascade Project sponsored by the DARPA High Productivity Computing Program. Cascade is a Petaflops-scale computer targeted for deployment at the end of the decade that merges the raw speed of an advanced custom vector architecture with the high-memory-bandwidth processing delivered by an innovative class of PIM architecture. The work presented here was performed under the Cascade project to explore critical design-space issues that will determine the value of PIM in supercomputers and contribute to the optimization of its design. It also has strong relevance to hybrid systems comprising a combination of conventional microprocessors and advanced PIM-based intelligent main memory.
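As a minimal C sketch of the access-pattern distinction the abstract draws (the function and array names are assumptions, not from the paper): the contiguous streaming loop is served well by caches and vector pipelines, while the pointer-chasing loop exhibits neither temporal nor spatial locality, yielding low cache hit rates and no opportunity for vector pipelining; this is the kind of workload for which moving the operation to the memory, as in PIM, is attractive.

```c
#include <stddef.h>

/* Streaming access: high spatial locality. Hardware prefetchers
 * and vector pipelines handle this pattern efficiently. */
double sum_stream(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chasing: each load depends on the previous one and
 * targets an unpredictable address, so caches see low hit rates
 * and vector units cannot pipeline the traversal. Executing it
 * near the DRAM avoids shipping every line across the memory bus. */
struct node { double value; struct node *next; };

double sum_chase(const struct node *p) {
    double s = 0.0;
    while (p) {
        s += p->value;
        p = p->next;
    }
    return s;
}
```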

16 citations


Proceedings ArticleDOI
06 Mar 2004
TL;DR: Cascade will provide the first scalable commercial system that combines future-generation heavyweight vector processors and multithreaded lightweight in-memory processors to support the two critical temporal-locality modes of computing, exploit very high system-wide bandwidth, and address conventional sources of inefficiency, performance degradation, and poor scalability.
Abstract: For the last decade, the high performance computing community has explored the potential opportunities and challenges of realizing computing systems capable of operating in the trans-Petaflops performance regime. Since the inaugural multiagency 1994 Workshop on Enabling Technologies for Petaflops Computing, the goal of establishing and exploiting such unprecedented capability for defense, scientific, industrial, and commercial applications has been investigated in the abstract.

1 citation


Proceedings Article
01 Jan 2004
TL;DR: The MIND (Memory, Intelligence, and Network Device) architecture as discussed by the authors is an advanced parallel computer architecture for high performance computing and scalable embedded processing, which integrates both DRAM bit cells and CMOS logic devices on the same silicon die.
Abstract: MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
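As a hedged sketch of what message-driven, multithreaded, split-transaction processing can look like (illustrative only; the parcel fields, actions, and handler are assumptions, not the MIND specification): a "parcel" travels to the memory node holding the target data and invokes an action there, and a reply parcel completes the split transaction, so the requesting thread never stalls waiting on a remote access.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative parcel: a message carrying an action to perform at
 * the node owning the target address. Request and reply form the
 * two halves of one split transaction. Field names are assumptions. */
typedef struct {
    uint64_t addr;     /* target address (treated here as an offset
                          into the destination node's local memory) */
    int      action;   /* operation to perform at that memory       */
    uint64_t operand;  /* payload for the operation                 */
    uint64_t reply_to; /* where the reply/continuation parcel goes  */
} parcel_t;

enum { ACT_LOAD, ACT_ATOMIC_ADD };

/* Handler run by a lightweight thread at the destination node:
 * the computation moves to the data rather than the data moving
 * to a remote processor. The return value would be carried back
 * in a reply parcel addressed to reply_to. */
uint64_t handle_parcel(uint64_t *local_mem, const parcel_t *p) {
    switch (p->action) {
    case ACT_LOAD:
        return local_mem[p->addr];         /* reply carries the value */
    case ACT_ATOMIC_ADD:
        local_mem[p->addr] += p->operand;  /* update in place         */
        return local_mem[p->addr];
    default:
        return 0;
    }
}

int main(void) {
    uint64_t mem[16] = {0};
    parcel_t p = { .addr = 3, .action = ACT_ATOMIC_ADD,
                   .operand = 5, .reply_to = 0 };
    printf("reply = %llu\n",
           (unsigned long long)handle_parcel(mem, &p));
    return 0;
}
```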

1 citation