
Showing papers on "FLOPS published in 2005"


01 Jan 2005
TL;DR: The performance of the Cray X1's distributed shared-memory system and its interconnection network is characterized using microbenchmarks and applications.
Abstract: The Cray X1 supercomputer, introduced in 2002, has several interesting architectural features. Two key features are the X1's distributed shared memory and its vector multiprocessors. The X1's distributed shared memory presents a 64-bit global address space that is directly addressable from every MSP (multistreaming processor), with an interconnect bandwidth per computation rate of 1 byte/flop. In this article, we characterize the performance of the X1's distributed shared-memory system and its interconnection network using microbenchmarks and applications.

11 citations
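The 1 byte/flop figure can be read as a machine balance: bytes the interconnect can deliver per floating-point operation at peak. A minimal roofline-style sketch, assuming (hypothetically) that every operand of a kernel crosses the interconnect with no cache or register reuse, shows what that ratio implies for a streaming kernel such as DAXPY. The function names and the DAXPY example are illustrative and are not taken from the article, which relies on microbenchmarks and applications rather than this bound.

```python
# Roofline-style bound: a kernel whose bytes-per-flop demand exceeds the
# machine's bytes-per-flop supply is limited by bandwidth, not peak flop rate.

def kernel_bytes_per_flop(bytes_moved: float, flops: float) -> float:
    """Bytes of operand traffic the kernel needs per floating-point operation."""
    return bytes_moved / flops

def attainable_fraction_of_peak(kernel_bpf: float, machine_bpf: float) -> float:
    """Upper bound on the fraction of peak flop rate reachable.

    Assumes every operand crosses the network (no reuse); machine_bpf is
    bandwidth divided by peak flop rate.
    """
    return min(1.0, machine_bpf / kernel_bpf)

# DAXPY, y[i] = a * x[i] + y[i], with assumed 8-byte (double) operands:
# 2 flops per element (one multiply, one add) and 24 bytes (load x, load y, store y).
daxpy_bpf = kernel_bytes_per_flop(bytes_moved=24, flops=2)
X1_BYTES_PER_FLOP = 1.0  # interconnect bandwidth per computation rate from the abstract

daxpy_fraction = attainable_fraction_of_peak(daxpy_bpf, X1_BYTES_PER_FLOP)
print(f"DAXPY needs {daxpy_bpf:.0f} bytes/flop -> at most {daxpy_fraction:.1%} of peak")
```

Under these assumptions a kernel with heavy operand reuse (well under 1 byte/flop) is not bandwidth-limited, while DAXPY is capped at one twelfth of peak.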



Patent
06 May 2005
TL;DR: This patent proposes a system and method for improving transition delay fault coverage through the use of augmented flip-flops (TL flops) in a broadside test approach.
Abstract: The present invention is directed to a system and method for improving transition delay fault coverage through the use of augmented flip-flops (TL flops) in a broadside test approach. The TL flops use the same clock for scan and functional operation. Thus, the TL flops do not require fast signal switching between launch and test response capture. Each TL flop adds a multiplexer (MUX) in front of a standard scan flop, together with a transition enable (TEN) signal. Moreover, only a heuristically selected subset of scan flip-flops is replaced with TL flops, so the area overhead may be limited to one additional MUX per selected scan flip-flop. Consequently, the overall chip area overhead may be minimal. The present invention may be implemented with currently available third-party ATPG tools.

7 citations


Journal Article
TL;DR: An analytic framework is developed to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches, that demonstrates Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
Abstract: The slowing pace of commodity microprocessor performance improvements, combined with ever-increasing chip power demands, has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance when compared with the Intel Itanium2 and Cray X1 processors.

6 citations
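Sparse matrix-vector multiplication (SpMV) is a representative example of the sparse matrix operations that such performance models target. The sketch that follows is a minimal scalar SpMV in compressed sparse row (CSR) format; it is not the authors' Cell implementation and does not model the SIMD units or the software-controlled memory. It counts 2 flops per stored nonzero, while each nonzero also requires loading a matrix value and a column index, which is part of why SpMV is commonly memory-bandwidth-bound.

```python
from typing import List, Tuple

def csr_spmv(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> Tuple[List[float], int]:
    """Compute y = A @ x for A stored in CSR form; also return the flop count."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    flops = 0
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # one multiply + one add
            flops += 2
        y[row] = acc
    return y, flops

# Example matrix A = [[1, 0, 2],
#                     [0, 3, 0]]
values, col_idx, row_ptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
y, flops = csr_spmv(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
print(y, flops)
```

CSR is used here only because it is a common baseline format; the paper considers a variety of algorithmic approaches, which this sketch does not attempt to reproduce.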


Proceedings Article
01 Jan 2005
TL;DR: This paper studies the impact of sinusoidal clock signals on the power and performance of six conventional flip-flops, and finds that two-phase master-slave flip-flops and single-phase sense-amplifier flip-flops both achieve robust timing behavior with minimal power-delay degradation.
Abstract: This paper can be viewed as a supplement to the recent interest in on-chip resonant clocking techniques. We present a study on the impact of sinusoidal clock signals on the power and performance of six conventional flip-flops. The dominant effects are delay penalties of 20-30% for the best flip-flops and reduced race margins. Two-phase master-slave flip-flops and single-phase sense-amplifier flip-flops both obtain robust timing behavior and minimal power-delay degradation.

6 citations


Proceedings Article
18 Sep 2005
TL;DR: Low-density parity-check (LDPC) codes achieve coding performance approaching the Shannon limit; in this radiation-tolerant encoder, generator polynomial reconstruction, partial product multiplication, and functional sharing in the parity register result in a highly efficient design.
Abstract: Low-density parity-check codes achieve coding performance that approaches the Shannon limit. Based on a novel method for deriving regular quasi-cyclic sub-codes, a radiation-tolerant encoder was implemented in 0.25 μm CMOS. The use of generator polynomial reconstruction, partial product multiplication, and functional sharing in the parity register results in a highly efficient design. Only 1,492 flip-flops, along with a programmable 21-bit look-ahead scheme, are used to achieve a 1 Gb/s data throughput. A comparable two-stage encoder requires 8,176 flip-flops.

6 citations
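The abstract mentions partial product multiplication and a parity register. A minimal sketch of generic systematic quasi-cyclic (QC) encoding, assuming a generator matrix of the form [I | P] where P is built from b×b circulant blocks, illustrates the idea: each set message bit XORs one cyclically shifted circulant row (a partial product) into the parity register. The matrix `P` and the function names are hypothetical; the paper's sub-code derivation, generator polynomial reconstruction, and 21-bit look-ahead are not reproduced.

```python
from typing import List

Bits = List[int]

def circulant_row(first_row: Bits, shift: int) -> Bits:
    """Row `shift` of the circulant whose row 0 is `first_row` (cyclic right shift)."""
    b = len(first_row)
    return [first_row[(j - shift) % b] for j in range(b)]

def qc_encode(message_blocks: List[Bits], P: List[List[Bits]]) -> Bits:
    """Systematic QC encoding over GF(2): codeword = message bits + parity bits.

    P[i][j] is the first row of the circulant linking message block i
    to parity block j (hypothetical example matrix, not from the paper).
    """
    num_parity = len(P[0])
    b = len(message_blocks[0])
    # Parity register: one b-bit accumulator per parity block.
    parity = [[0] * b for _ in range(num_parity)]
    for i, m in enumerate(message_blocks):
        for k, bit in enumerate(m):
            if bit:
                # Accumulate the partial product: XOR in row k of each circulant.
                for j in range(num_parity):
                    row = circulant_row(P[i][j], k)
                    parity[j] = [p ^ r for p, r in zip(parity[j], row)]
    codeword = [bit for m in message_blocks for bit in m]
    for block in parity:
        codeword.extend(block)
    return codeword

print(qc_encode([[1, 0, 1]], [[[1, 1, 0]]]))
```

Because each circulant is fully described by its first row, only that row needs to be stored per block; the shifted rows are generated on the fly, which is the usual motivation for QC structure in hardware encoders.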


Journal Article
TL;DR: Two novel static D flip-flops based on a novel bistable-gated bipolar device are proposed, and their logic functionality and improved speed in comparison to the conventional static D flip-flop are verified with SPICE simulation.
Abstract: Two novel static D flip-flops based on a novel bistable-gated bipolar device are proposed. Their logic functionality and improved speed in comparison to the conventional static D flip-flop are verified with SPICE simulation.

5 citations


13 Mar 2005
TL;DR: This paper evaluates three exponentially correlated acceleration models in terms of accuracy and computational complexity: the Singer model, the Bar-Shalom and Fortmann model, and the Sklansky model.
Abstract: This paper examines three exponentially correlated acceleration models with respect to accuracy and computational complexity. These are the Singer model, the Bar-Shalom and Fortmann model, and the Sklansky model. Simulation results show that the Singer model and the Bar-Shalom and Fortmann model, each a six-state estimator, require approximately the same number of flops, with the Bar-Shalom and Fortmann model requiring more because of the size of its Q and G matrices. The Sklansky model is a four-state estimator and requires about 2/3 of the number of flops of the Singer model.
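The Singer model treats target acceleration as an exponentially correlated (first-order Markov) process with correlation parameter α. A minimal sketch of its standard per-axis state transition matrix over the state [position, velocity, acceleration] follows. The six-state filter is assumed here, for illustration only, to stack two such axes block-diagonally; the function names are hypothetical, and the sketch does not reproduce the paper's flop counts or its Q and G matrices.

```python
import math
from typing import List

Matrix = List[List[float]]

def singer_transition(T: float, alpha: float) -> Matrix:
    """Singer-model transition matrix for one axis, state [pos, vel, acc].

    T: sample interval; alpha: reciprocal of the acceleration time constant.
    """
    aT = alpha * T
    return [
        # (-1 + aT + exp(-aT)) / alpha^2, computed with expm1 for accuracy
        [1.0, T, (aT + math.expm1(-aT)) / alpha ** 2],
        # (1 - exp(-aT)) / alpha
        [0.0, 1.0, -math.expm1(-aT) / alpha],
        # acceleration decays by exp(-aT) per step
        [0.0, 0.0, math.exp(-aT)],
    ]

def block_diag(blocks: List[Matrix]) -> Matrix:
    """Place square matrices on the diagonal of a larger zero matrix."""
    n = sum(len(blk) for blk in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for blk in blocks:
        for i, row in enumerate(blk):
            for j, v in enumerate(row):
                out[offset + i][offset + j] = v
        offset += len(blk)
    return out

F = singer_transition(T=1.0, alpha=0.5)
F6 = block_diag([F, F])  # assumed two-axis, six-state layout
for row in F6:
    print(["%.4f" % v for v in row])
```

As α approaches 0 the acceleration stays constant over the interval and the (0, 2) entry approaches T²/2; `math.expm1` keeps that entry accurate when αT is small.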