Showing papers on "Latency (engineering)" published in 1995

Journal Article•DOI•

Low-latency communication over ATM networks using active messages

T. von Eicken¹, Anindya Basu¹, V. Buch¹•Institutions (1)

01 Feb 1995-IEEE Micro

TL;DR: The differences in communication characteristics between workstation clusters built from standard hardware and software components and state-of-the-art multiprocessors are discussed, and a prototype implementation of an active message communication layer is evaluated.

Abstract: Today's communication architectures for parallel machines reduce communication overheads and latencies by over an order of magnitude. However, carrying over these techniques to workstation clusters connected by an ATM network presents major design challenges. We discuss the differences in communication characteristics between workstation clusters built from standard hardware and software components and state-of-the-art multiprocessors, and then evaluate a prototype implementation of an active message communication layer. Application round-trip latencies of about 50 microseconds for small messages roughly compare to a similar implementation on the Thinking Machines CM-5 multiprocessor.

79 citations

Proceedings Article•DOI•

Are crossbars really dead?: the case for optical multiprocessor interconnect systems

Andreas Nowatzyk¹, Paul R. Prucnal²•Institutions (2)

Sun Microsystems¹, Princeton University²

01 May 1995

TL;DR: OTDM technology is discussed only to the extent necessary to understand its characteristics and capabilities and it is expected that cost-reduced OTDM systems will become competitive with the next generation of interconnect systems.

Abstract: Crossbar switches are rarely considered for large, scalable multiprocessor interconnect systems because they require O(n²) switching elements, are difficult to control efficiently and are hard to implement once their size becomes too large to fit on one integrated circuit. However, these problems are technology dependent, and a recent innovation in fiber optic devices has led to a new implementation of crossbar switches that does not share these problems while retaining the full advantages of a crossbar switch: low latency, high throughput, complete connectivity and multi-cast capability. Moreover, this new technology has several characteristics that allow a distributed control system which scales linearly in the number of attached nodes. The innovation that led to this research is an optical AND gate that can be used to demultiplex multiple high speed data streams that are carried on one common optical medium. Optical time domain multiplexing (OTDM) can combine the data from many nodes and broadcast the result back to all nodes. This paper discusses OTDM technology only to the extent necessary to understand its characteristics and capabilities. The main contribution lies in the description and analysis of interconnect architectures that utilize OTDM to achieve a level of performance that is beyond electronic means. It is expected that cost-reduced OTDM systems will become competitive with the next generation of interconnect systems.

56 citations

Patent•

IP bridge for parallel machines

Ronald Mraz¹, Michael M. Tsao¹•Institutions (1)

IBM¹

10 Jan 1995

TL;DR: This patent describes high-performance, standard I/O interconnect "bridge" hardware for a parallel machine with a packet-switching network in place; combining new hardware and new software, the bridge connects parallel processors to the external world.

Abstract: This invention is a high performance, standard I/O interconnect "bridge" hardware for a parallel machine with a packet switching network in place. Combining new hardware and new software, this bridge connects parallel processors to the external world. The hardware is a "bridge" connecting an internal inter-processor switch to external asynchronous transfer mode (ATM) networks. The software is a "mirror" for making the connections. The invention provides high bandwidth, low latency and deterministic performance, and is inexpensive to build.

24 citations

Patent•

Asynchronous low latency data recovery apparatus and method

Robert Betts¹, Howard Thomas Olnowich¹•Institutions (1)

IBM¹

21 Mar 1995

TL;DR: A low-latency recovery apparatus for serially transmitted digital data is presented. The receiver device operates asynchronously with respect to the transmitting device and includes a metastability-proof latch.

Abstract: A receiver device is provided with a low latency recovery apparatus for recovering serially transmitted digital data. The receiver device operates asynchronously with respect to a transmitting device. The low latency recovery apparatus synchronizes the receiver device in one clock time to support throughput of high speed transmission messages received from interconnection networks or interface cables. A metastability proof latch is provided. A synchronization method provides individual alignment for each incoming message. There is instantaneous response to back-to-back messages from different sources. Synchronization is accomplished in the receiving device by implementing a clocking system capable of generating N phase-shifted clocks all operating at the same frequency as the incoming data. The N clocks are shifted an approximately equal amount in relation to each other. The data recovery apparatus selects the one of the N clocks that is best synchronized with the incoming serial data and then uses it to receive the message correctly. The apparatus has a two wire interface for serial data and a bracketing control signal. Serial data is synchronized first to the selected clock and then to a local clock. The bracketing control signal indicates when each message recovery is complete and triggers the start of another message recovery in as little as one clock time.

19 citations

Proceedings Article•DOI•

Low-latency asynchronous FIFO buffers

J.T. Yantchev¹, C.G. Huang¹, Mark B. Josephs¹, I.M. Nedelchev¹•Institutions (1)

University of Adelaide¹

30 May 1995

TL;DR: A parallel asynchronous implementation of a FIFO buffer is described and compared with the conventional alternative asynchronous implementation, Sutherland's micropipeline, and a high-throughput multiple-burst signalling scheme is supported, in which a second burst of data is transmitted at the same time as the previous burst is acknowledged, effectively increasing the overall throughput.

Abstract: A parallel asynchronous implementation of a FIFO buffer is described and compared with the conventional alternative asynchronous implementation, Sutherland's micropipeline. The parallel design has the potential for significant reductions in propagation delay at the cost of insignificant increases in cycle-time (i.e. reduced throughput) and area. Although in certain applications, e.g. DSP, only high throughput may be important, in others, e.g. packet switching, throughput and propagation delay both matter. We consider the parallel design to be most useful as part of the interface circuitry required by devices that asynchronously exchange data in bursts over inter-chip communication wires and use a single acknowledge signal for each burst of data. In particular, a high-throughput multiple-burst signalling scheme is supported, in which a second burst of data is transmitted at the same time as the previous burst is acknowledged, effectively increasing the overall throughput.

18 citations

Proceedings Article•DOI•

A prototype router for the massively parallel computer RWC-1

T. Yokota, Hiroshi Matsuoka, K. Okamoto, Hideo Hirono, A. Hori, Shuichi Sakai

02 Oct 1995

TL;DR: A new class of direct interconnection networks called MDCE (Multidimensional Directed Cycles Ensemble extension) has many desirable features for RWC-1 including small degree, low latency, and high throughput, and MDCE is thus adopted for the RWC-1 network.

Abstract: The RWC-1 is a massively parallel computer based on a multi-threaded architecture. This architecture requires extremely high communication performance with reasonable hardware cost. In this paper, we first introduce a new class of direct interconnection networks called MDCE (Multidimensional Directed Cycles Ensemble extension). MDCE has many desirable features for RWC-1 including small degree, low latency, and high throughput. MDCE is thus adopted for the RWC-1 network. We have designed an MDCE router and fabricated an experimental VLSI chip. We explain the design details in this paper. The chip employs operating system support features as well as communication functions, and enables advanced resource management. A prototype chip with about 125,000 gates has been fabricated using 0.6-μm CMOS gate array technology. Its clock runs at 50 MHz and a transmission rate of 300 Mbytes per second per communication port is achieved.

12 citations

Proceedings Article•DOI•

The simultaneous optical multiprocessor exchange bus

Jeffrey H. Kulick¹, W.E. Cohen¹, Constantine Katsinis¹, E. Wells¹, A. Thomsen¹, Rhonda Kay Gaede¹, Robert G. Lindquist¹, Gregory P. Nordin¹, M. Abushagur¹, D. Shen¹•Institutions (1)

University of Alabama¹

23 Oct 1995

TL;DR: This paper proposes an optical interconnect architecture for over a hundred processors, which contains a dedicated channel for each processor to eliminate global arbitration and to provide bandwidth that scales with the number of processors in the machine.

Abstract: Low latency, high bandwidth interconnection networks that directly link arbitrary pairs of processing elements without contention are very desirable for parallel computers. Most communication networks in parallel machines have made compromises due to the limitations of electronics. Many of the optical interconnection schemes proposed have simply replaced the point-to-point copper wiring with fiber optics and have not made use of the unique properties of optics. This paper proposes an optical interconnect architecture for over a hundred processors, which contains a dedicated channel for each processor to eliminate global arbitration and to provide bandwidth that scales with the number of processors in the machine. Unlike electrical buses, this architecture is not limited by the medium (fiber optics) used to connect the transmitters and receivers. Each processor has an array of receivers, one receiver for each processor channel. The architecture of the receiver array permits a variety of different parallel programming models to be efficiently supported.

8 citations

Journal Article•DOI•

A performance model of pipelined k-ary n-cubes

P.T. Gaughan¹, S. Yalamanchili•Institutions (1)

University of Alabama¹

01 Aug 1995-IEEE Transactions on Computers

TL;DR: An analytic performance model of pipelined communication in k-ary n-cubes is presented; it is intended to capture and study key performance issues, including the modeling of throughput and latency.

Abstract: Pipelined communication using virtual channels can realize low latency, high throughput, inter-processor communication. This paper presents an analytic performance model of pipelined communication in k-ary n-cubes. The model contains elements intended to capture and study key performance issues. In addition to the modeling of throughput and latency, the following issues are addressed using this model: (1) the tradeoff between full-duplex and half-duplex links, (2) the effects of intranode delay, (3) the effects of buffer size for each virtual channel. Detailed simulation experiments under a variety of conditions establish the viability of this model.

6 citations

Proceedings Article•DOI•

An efficient implementation of motion estimation algorithms

Qingming Shu¹, Hongyi Chen¹•Institutions (1)

Tsinghua University¹

24 Oct 1995

TL;DR: A novel low latency and high throughput programmable motion estimator architecture is proposed, which can efficiently implement both full search and hierarchical search algorithms in motion estimation in VLSI.

Abstract: In this paper, a novel low latency and high throughput programmable motion estimator architecture is proposed, which can efficiently implement both full search and hierarchical search algorithms in motion estimation in VLSI.

4 citations

Proceedings Article•DOI•

Improving the demand-priority protocol

Jörg Ottensmeyer¹, Peter Martini¹•Institutions (1)

University of Paderborn¹

20 Sep 1995

TL;DR: The drawbacks of the demand-priority protocol are revealed and the advantages of using service strategies different from those included in the current version of the draft standard are clearly shown.

Abstract: The demand-priority protocol currently in the process of standardization by IEEE 802.12 aims at supporting interactive multimedia applications by providing a low latency service for high-priority traffic. This goal may be achieved in case of large frames and/or small distances. Otherwise, improvements are required. This paper starts with a brief description of the basic characteristics of the network topology and the medium access control protocol. Then, it presents simulation results for normal and high priority traffic in scenarios with variable bit rate high priority loads for networks of different sizes. It reveals the drawbacks of the demand-priority protocol and clearly shows the advantages of using service strategies different from those included in the current version of the draft standard.

4 citations

Proceedings Article•DOI•

A supercomputer system interconnect and scalable IOS

S. Johnson¹, S. Scott¹•Institutions (1)

Cray¹

11 Sep 1995

TL;DR: A new system channel is being developed to meet the need for a new supercomputer system interconnect; it integrates control and data on a single physical path while providing low latency and variance for control messages.

Abstract: The evolution of system architectures and system configurations has created the need for a new supercomputer system interconnect. Attributes required of the new interconnect include commonality among system and subsystem types, scalability, low latency, high bandwidth, a high level of resiliency, and flexibility. Cray Research Inc. is developing a new system channel to meet these interconnect requirements in future systems. The channel has a ring-based architecture, but can also function as a point-to-point link. It integrates control and data on a single, physical path while providing low latency and variance for control messages. Extensive features for client isolation, diagnostic capabilities, and fault tolerance have been incorporated into the design. The attributes and features of this channel are discussed along with implementation and protocol specifics.

Book Chapter•DOI•

A low latency digital neural network architecture

William Fornaciari, Fabio Salice

02 Jan 1995

TL;DR: Neural networks are an effective approach to solving non-standard or non-algorithmic problems such as system control, classification and pattern recognition, but these capabilities are balanced by the costs of both design and silicon implementation.

Abstract: Neural networks are an effective approach to solving non-standard or non-algorithmic problems such as system control, classification and pattern recognition. These important capabilities are balanced by the costs of both design and silicon implementation.

Proceedings Article•

Ultrafast low-latency soliton logic gate using low-birefringence polarization-maintaining fiber

M. Vaziri, K.H. Ahn, B. C. Barnett, G. R. Williams, Mohammed N. Islam, K. O. Hill, B. Malo

21 May 1995

Journal Article•DOI•

Architecture of a parallel machine: Cenju‐3

Tsutomu Maruyama, Yasushi Kanoh, Tetsuya Hirose, Kazuhiro Muramatsu, Toshiyuki Nakata, Yoshihiro Asano, Yu Inamura

01 Jan 1995-Systems and Computers in Japan

TL;DR: The dedicated network interface hardware designed for the Cenju-3 system achieves low latency and high throughput and is presented in this paper.

Abstract: Cenju-3 is a parallel computer in which up to 256 processing elements (PEs) are connected by a high-speed multistage interconnection network. In designing the system, the architecture is tuned for systems of up to 256 processors. A VR4400 with 1 MB of secondary cache memory is implemented on a multi-chip module to realize a compact and high-performance PE. The multistage network is implemented very compactly. The number of cables is equal to the number of processors. The dedicated network interface hardware designed for the system achieves low latency and high throughput. This paper presents the machine architecture and its evaluation.

Improving the Demand-Priority Protocol

Jörg Ottensmeyer, Peter Martini

01 Jan 1995

TL;DR: Simulation results for normal and high priority traffic in scenarios with variable bit rate high priority loads for networks of different sizes are presented and the drawbacks of the demand-priority protocol are revealed.

Abstract: The demand-priority protocol currently in the process of standardization by IEEE 802.12 aims at supporting interactive multimedia applications by providing a low latency service for high-priority traffic. This goal may be achieved in case of large frames and/or small distances. Otherwise, improvements are required. This paper starts with a brief description of basic characteristics of the network topology and the medium access control protocol. Then, it presents simulation results for normal and high priority traffic in scenarios with variable bit rate high priority loads for networks of different sizes. It reveals the drawbacks of the demand-priority protocol and clearly shows the advantages of using service strategies different from those included in the current version of the draft standard.

Journal Article•DOI•

Design and simulation of the on-line trigger and reconstruction farm for the hera-b experiment

I. C. Legrand, U. Gensch, H. Leich, P. Wegner

01 Aug 1995-International Journal of Modern Physics C

TL;DR: The common third-level trigger and fourth-level reconstruction farm for the future HERA-B experiment will have to perform full on-line event reconstruction and calibration for an expected input rate of 2000 events/s.

Abstract: The common third-level trigger and fourth-level reconstruction farm for the future HERA-B experiment will have to perform full on-line event reconstruction and calibration for an expected input rate of 2000 events/s. More than a hundred powerful RISC processors connected in a network capable of distributing several hundreds of MB/s with low latency are likely to be necessary for this task. Proper simulation of real-time multi-processor systems is central to an optimal design (hardware and software protocol) of a scalable and flexible parallel data processing architecture. A discrete event, process oriented simulation developed in concurrent μC++ is used as a framework for modelling and evaluating different farm architectures. An object oriented graphic interface to the simulation allows the monitoring of various features and provides an easier way to optimize the system.

Architecture and Performance Analysis of DIRSMIN: A Fault-Tolerant Switch Using Dilated Reduced-Stage MIN.

Arun K. Somani¹, Tianming Zhang•Institutions (1)

University of Washington¹

01 Nov 1995

TL;DR: This paper develops and analyzes a dilated high performance fault tolerant fast packet multistage interconnection network (MIN) and shows that the new design has considerably higher performance in the presence of a faulty switching element or link in comparison to dilated networks.

Abstract: We develop and analyze a dilated high performance fault tolerant fast packet multistage interconnection network (MIN) in this paper. In this new design, the links at the input and the output stages of a dilated banyan-based MIN are rearranged to create multiple routes for each source-destination pair in the network after removing one stage in the network. These multiple paths are link- and node-disjoint. Fault tolerance at low latency is achieved by sending multiple copies of each input packet simultaneously using different routes and different priorities. This guarantees that high throughput is maintained even in the presence of faults. Throughput is analyzed using simulation and analysis and we show that the new design has considerably higher performance in the presence of a faulty switching element (SE) or link in comparison to dilated networks. We also analyze the reliability and show that the new design has superior reliability in comparison to competing proposals.
