
Showing papers on "Latency (engineering)" published in 1996


Proceedings ArticleDOI
23 Jun 1996
TL;DR: A rapid synchronization method is presented for an OFDM system using either continuous transmission or burst operation over a time-varying, fading channel; it acquires the signal and provides channel estimation upon receipt of a single training sequence of two symbols.
Abstract: A rapid synchronization method is presented for an OFDM system using either a continuous transmission or a burst operation over a time-varying, fading channel. It will acquire the signal and provide channel estimation upon the receipt of just one training sequence of two symbols in the presence of unknown symbol and frame timing, large carrier and sampling frequency offsets, and very low SNRs, while maintaining low latency and low complexity. It can then track the signal with the same algorithms. Digital TV and wireless LAN are used as examples.
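The one-training-sequence idea can be sketched with a half-symbol correlator in the style of Schmidl and Cox: a training symbol built from two identical halves produces a correlation peak at the symbol boundary, and the phase of the correlation exposes the fractional carrier frequency offset. The sketch below is illustrative only; the normalization by both half-window energies and the toy signal model are assumptions of this example, not the paper's exact estimator.

```python
import cmath
import random

def timing_metric(r, L):
    """M(d) = |P(d)|^2 / (R1(d) * R2(d)): correlation of two adjacent
    L-sample windows, normalized by both window energies so that
    M <= 1, with M = 1 exactly when the windows are identical."""
    M = []
    for d in range(len(r) - 2 * L):
        P = sum(r[d + k].conjugate() * r[d + L + k] for k in range(L))
        R1 = sum(abs(r[d + k]) ** 2 for k in range(L))
        R2 = sum(abs(r[d + L + k]) ** 2 for k in range(L))
        M.append(abs(P) ** 2 / (R1 * R2) if R1 > 0 and R2 > 0 else 0.0)
    return M

rng = random.Random(0)
L = 32
half = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(L)]
noise = lambda n: [complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(n)]
rx = noise(64) + half + half + noise(64)   # training symbol starts at sample 64

M = timing_metric(rx, L)
d_hat = max(range(len(M)), key=M.__getitem__)   # estimated symbol start
P_peak = sum(rx[d_hat + k].conjugate() * rx[d_hat + L + k] for k in range(L))
cfo_est = cmath.phase(P_peak) / cmath.pi        # fractional CFO, in subcarrier spacings
```

With no frequency offset applied, the correlation at the peak is real and positive, so `cfo_est` comes out near zero; an actual offset would rotate the second half of the training symbol relative to the first and show up in the correlation phase.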

153 citations


Journal ArticleDOI
01 Sep 1996
TL;DR: In this article, speculative completion, a method for designing asynchronous datapath components, is introduced; it has many of the advantages of a bundled-data approach, such as the use of single-rail synchronous datapaths, while also allowing early completion.
Abstract: A new general method for designing asynchronous datapath components, called speculative completion, is introduced. The method has many of the advantages of a bundled data approach, such as the use of single-rail synchronous datapaths, but it also allows early completion. As a case study, the method is applied to the high-performance parallel BLC adder design of Brent and Kung. Through careful gate-level analysis, performance improvements of up to 30% over a comparable synchronous implementation are expected.

100 citations


Journal ArticleDOI
TL;DR: Clock and power distribution as well as circuit design techniques of several blocks are addressed and the MIPS R10000, 200-MHz, 64-b superscalar dynamic issue RISC microprocessor is presented.
Abstract: Design and implementation details of the MIPS R10000, a 200-MHz, 64-b superscalar dynamic-issue RISC microprocessor, are presented. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined, low-latency execution units. Its hierarchical nonblocking memory system helps hide memory latency with two levels of set-associative, write-back caches. The processor has over 6.8 M transistors and is built in 3.3-V, 0.30-μm, four-layer-metal CMOS technology with under 30 W of power consumption. Operating at 200 MHz, the processor delivers peak performance of 9 SPECint95 and 19 SPECfp95. Clock and power distribution as well as circuit design techniques of several blocks are addressed.

88 citations


01 Mar 1996
TL;DR: This dissertation concentrates on two specific areas of handoff processing: routing updates and state distribution and describes the design, implementation and evaluation of these techniques in a variety of networking and computing environments.
Abstract: In this dissertation, we examine the problem of performing handoff quickly in cellular data networks. We define handoff as the process of reconfiguring the mobile host, wireless network and backbone wired network to support communication after a user enters a different cell of the wireless network. In order to support applications and protocols used on wired networks, the handoff processing must not significantly affect the typical end-to-end loss or delay of any communications. This dissertation concentrates on two specific areas of handoff processing: routing updates and state distribution. The techniques we use to solve these problems are: (1) Multicast to set up routing in advance of handoff. (2) Hints, based on information from the cellular wireless system, to predict handoff. (3) Intelligent buffering, enabled by the multicast of data, to prevent data loss without the use of complicated forwarding. (4) State replication, enabled by the multicast, to avoid explicit state transfers during the handoff processing. This dissertation describes the design, implementation and evaluation of these techniques in a variety of networking and computing environments. We have shown that any necessary routing updates and state transfers can be performed in a few milliseconds. For example, our implementation in an IP-based testbed completes typical handoffs in 5-15 msec. In addition, the handoff processing introduces no additional packet delays or data loss. The primary cost of our algorithms to improve handoff latency is the use of excess bandwidth on the wired backbone networks. However, we have introduced base station layout algorithms that reduce this cost. In current systems, the performance improvement provided by these techniques easily outweighs the resources consumed.
Since wired backbone networks will continue to have much greater available bandwidth than their wireless counterparts, this trade-off between handoff performance and network resources will continue to be advantageous in the future.
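The interplay of techniques (1)-(4) can be pictured as a toy simulation: every base station near the mobile joins its multicast group, standby stations buffer what they hear, and a handoff simply activates the new station and replays the packets the mobile has not yet seen. All class and method names below are invented for illustration, and the sequence-number replay is a simplification of the dissertation's actual protocol.

```python
from collections import deque

class BaseStation:
    """Base station that joins the mobile's multicast group and buffers
    packets while on standby, so a handoff loses no data."""
    def __init__(self, name, buffer_size=16):
        self.name = name
        self.buffer = deque(maxlen=buffer_size)  # intelligent buffering
        self.active = False

    def receive_multicast(self, seq):
        if self.active:
            return [seq]          # deliver straight to the mobile host
        self.buffer.append(seq)   # standby cell: hold for a possible handoff
        return []

def handoff(old_bs, new_bs, last_seq_delivered):
    """Activate the new base station; it replays only the buffered
    packets the mobile has not yet seen, avoiding loss and duplication."""
    old_bs.active, new_bs.active = False, True
    replay = [s for s in new_bs.buffer if s > last_seq_delivered]
    new_bs.buffer.clear()
    return replay

a, b = BaseStation("A"), BaseStation("B")
a.active = True
delivered = []
for seq in (0, 1, 2):               # mobile is in cell A; B buffers copies
    delivered += a.receive_multicast(seq)
    b.receive_multicast(seq)
for seq in (3, 4):                  # mobile leaves A's coverage mid-stream
    b.receive_multicast(seq)
delivered += handoff(a, b, last_seq_delivered=2)   # hint triggers handoff
delivered += b.receive_multicast(5)
```

Because routing (the multicast tree) is already in place before the handoff, the only work at handoff time is the activation flag and the buffer replay, which is consistent with the millisecond-scale handoffs the dissertation reports.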

48 citations


01 Jan 1996
TL;DR: The design in this note describes a new ADI that provides lower latency in common cases and is still easy to implement, while retaining many opportunities for customization to any advanced capabilities that the underlying hardware may support.
Abstract: In this paper we describe an abstract device interface (ADI) that may be used to efficiently implement the Message Passing Interface (MPI). After experience with a first-generation ADI that made certain assumptions about the devices and tradeoffs in the design, it has become clear that, particularly on systems with low-latency communication, the first-generation ADI design imposes too much additional latency. In addition, the first-generation design is awkward for heterogeneous systems, complex for noncontiguous messaging, and inadequate at error handling. The design in this note describes a new ADI that provides lower latency in common cases and is still easy to implement, while retaining many opportunities for customization to any advanced capabilities that the underlying hardware may support.
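The "lower latency in common cases" goal can be pictured as a device interface with a fast path for short messages. The sketch below is a hypothetical, much-reduced ADI: the eager/rendezvous split, the `EAGER_LIMIT` threshold, and all names are assumptions of this illustration, not the actual MPICH ADI functions.

```python
class AbstractDevice:
    """Minimal sketch of an MPI-style abstract device interface (ADI).
    The device exposes a small set of operations that a message-passing
    library is built on; a real ADI also covers receives, progress,
    and error handling."""
    EAGER_LIMIT = 1024   # short messages: send immediately, low latency

    def __init__(self):
        self.wire = []   # stands in for the underlying network hardware

    def send(self, dest, buf):
        if len(buf) <= self.EAGER_LIMIT:
            # Common case: one write, no handshake -> minimal latency.
            self.wire.append(("eager", dest, bytes(buf)))
        else:
            # Large message: request/acknowledge before moving the data.
            self.wire.append(("rts", dest, len(buf)))
            self.wire.append(("data", dest, bytes(buf)))

dev = AbstractDevice()
dev.send(1, b"hi")          # takes the eager fast path
dev.send(1, b"x" * 4096)    # takes the rendezvous path
kinds = [k for k, *_ in dev.wire]
```

The design point the paper argues for is visible even in this toy: the common short-message case touches the device exactly once, while the rarer large-message case pays for a handshake.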

39 citations


25 Sep 1996
TL;DR: The proposed protocol, which is based on TDMA, exploits the available bandwidth fully; the throughput per mobile station is higher than with other multiple-access protocols, it offers low latency for both real-time and non-real-time communication, and unused reserved bandwidth is reallocated for non-real-time communication.
Abstract: This paper describes a cellular multiple-access scheme based on TDMA for multimedia communication networks. The scheme proposes an admission control of two different multimedia application stream types: real-time and non-real-time. We do not consider interference between cells. The proposed protocol, which is based on TDMA, exploits the available bandwidth fully. The throughput per mobile station is higher than with other multiple-access protocols; the protocol offers low latency for both real-time and non-real-time communication, and unused reserved bandwidth is reallocated for non-real-time communication. Furthermore, the throughput and latency remain stable under high loads.
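The reallocation of unused reserved bandwidth can be sketched per frame: every slot reserved for a real-time stream that is silent this frame is handed to the non-real-time queue. The function and stream names below are invented, and the sketch ignores admission control and inter-frame state, so it illustrates only the reallocation rule.

```python
def allocate_frame(n_slots, rt_reservations, rt_active, nrt_queue):
    """Assign the slots of one TDMA frame.  Reserved real-time slots
    that their owners do not use this frame are handed to non-real-time
    traffic, so no reserved bandwidth is wasted."""
    frame = []
    for slot in range(n_slots):
        owner = rt_reservations.get(slot)
        if owner is not None and owner in rt_active:
            frame.append(("rt", owner))              # reserved and used
        elif nrt_queue:
            frame.append(("nrt", nrt_queue.pop(0)))  # reuse idle bandwidth
        else:
            frame.append(("idle", None))
    return frame

reservations = {0: "voice-1", 1: "voice-2", 2: "video-1"}
frame = allocate_frame(
    n_slots=5,
    rt_reservations=reservations,
    rt_active={"voice-1"},          # voice-2 and video-1 silent this frame
    nrt_queue=["ftp-a", "ftp-b", "ftp-c"],
)
```

Real-time streams keep their latency guarantee (a reserved slot is theirs whenever they have data), while non-real-time traffic absorbs whatever is left, which is why the protocol can claim full bandwidth exploitation.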

22 citations


Journal ArticleDOI
TL;DR: A new scheme for a high-throughput, low-latency systolic implementation of FIR digital filters is proposed; the sequences are in bit-parallel, LSB-first, bit-skewed form, and the throughput is limited by the propagation delay of a gated full adder and a latch.
Abstract: A new scheme for a high-throughput and low-latency systolic implementation of FIR digital filters is proposed. The input and output sequences are in bit-parallel LSB-first bit-skewed form, and the throughput is limited by the propagation delay of a gated full adder and a latch. The bits of a full-bit output sample start coming out of the array three clock cycles after the bits of the corresponding input sample enter the array.
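As a word-level stand-in for what a systolic FIR array does, the sketch below simulates a transposed-form tap array: each cell multiplies the broadcast input sample, adds the registered partial sum arriving from its neighbour, and latches the result every clock. This is an assumption-laden simplification; the paper's array operates bit-serially, LSB first and bit-skewed, with the three-cycle output latency described above.

```python
def transposed_fir(h, x):
    """Cycle-accurate simulation of a transposed-form FIR tap array,
    a common systolic structure: one multiplier, adder, and pipeline
    register per tap, no global accumulation path."""
    regs = [0] * len(h)              # one pipeline register per tap cell
    out = []
    for xn in x:
        products = [c * xn for c in h]
        # Each cell adds its product to the partial sum from the next cell.
        regs = [products[k] + (regs[k + 1] if k + 1 < len(h) else 0)
                for k in range(len(h))]
        out.append(regs[0])          # filter output y[n]
    return out

y = transposed_fir([1, 2, 3], [1, 0, 0, 0])   # impulse in -> coefficients out
```

Feeding in a unit impulse returns the coefficient sequence, the standard sanity check that the array computes the FIR convolution.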

19 citations


Proceedings ArticleDOI
S. Rathnavelu1
18 Nov 1996
TL;DR: A new scheduling scheme called the adaptive time slot (ATS) is described, which can be used at an end-point host for shaping the traffic on individual virtual channels whilst multiplexing them onto a common asynchronous transfer mode (ATM) link.

Abstract: A new scheduling scheme called the adaptive time slot (ATS) is described. The ATS scheme can be used at an end-point host for shaping the traffic on individual virtual channels whilst multiplexing them onto a common asynchronous transfer mode (ATM) link. The traffic parameters for a wide variety of traffic classes can be accommodated while achieving high link utilization levels. The virtual channels can be grouped into multiple priority levels. The cell spacing can be changed at every schedule time. Specific techniques to handle constant bit rate (CBR) and low-latency traffic have been developed. The ATS scheme is easy to implement in both hardware and firmware. Due to its low memory requirements, the scheme is suited for integration in silicon. The scheme was tested by simulation.
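One way to picture per-VC shaping onto a shared link is a next-departure-time table: each virtual channel carries its own cell spacing, and the scheduler always serves the eligible VC whose departure time is earliest. This is a hedged sketch of time-slot shaping in general; the ATS scheme's actual slot tables, priority grouping, and per-schedule spacing updates are not reproduced, and the VC names and rates below are made up.

```python
import heapq

def shape_link(vcs, link_rate, n_cells):
    """Shape several virtual channels onto one ATM link.  Each VC keeps
    the earliest time its next cell may depart (1/rate spacing); the
    link always transmits the most overdue eligible VC."""
    # heap of (next_departure_time, vc_name, spacing)
    heap = [(0.0, name, 1.0 / rate) for name, rate in vcs.items()]
    heapq.heapify(heap)
    t, out = 0.0, []
    for _ in range(n_cells):
        when, name, spacing = heapq.heappop(heap)
        t = max(t + 1.0 / link_rate, when)   # one cell per link time slot
        out.append(name)
        heapq.heappush(heap, (when + spacing, name, spacing))
    return out

# A CBR-like channel at half the link rate and a lower-rate channel.
cells = shape_link({"cbr": 0.5, "abr": 0.25}, link_rate=1.0, n_cells=8)
```

Because each VC's spacing is stored per channel and re-inserted on every transmission, changing the spacing at each schedule time (as ATS allows) is a one-field update.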

9 citations


Book ChapterDOI
26 Aug 1996
TL;DR: Simulation results show that randomization of output buffer selection in the input driven algorithm increases its performance and substantially reduces the performance discrepancy between the input and output driven algorithms.
Abstract: Communication in parallel computers requires a low-latency router. Once a suitable routing algorithm is selected, an implementation must be designed. Issues such as whether the router should be input or output driven need to be considered. In this paper, we use simulations to compare input-driven and output-driven routing algorithms. Three algorithms, the Dally-Seitz oblivious router, the *-channels router, and the minimal triplex algorithm, are evaluated. Each router is implemented as both an input-driven and an output-driven router. Experiments are run for each of the router implementations with seven different traffic patterns on both 256-node two-dimensional mesh and torus networks. The results show that in almost all cases, the output-driven router matches or outperforms the input-driven router. Furthermore, we find that randomization of output buffer selection in the input-driven algorithm increases its performance and substantially reduces the performance discrepancy between the input-driven and output-driven algorithms. Although the findings apply to the routers considered, we believe the results generalize to other routers.
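The randomization result can be seen in miniature: a deterministic input-driven selection always picks the first eligible output buffer, concentrating load, while a uniform random pick spreads it. The policy, names, and two-candidate setup below are invented for illustration and are far simpler than the simulated routers in the paper.

```python
import random

def route(packets, candidates, randomized, seed=0):
    """Assign each packet to one of its eligible output buffers.
    Deterministic policy: always the first candidate (load piles up).
    Randomized policy: uniform choice among candidates (load spreads)."""
    rng = random.Random(seed)
    load = {c: 0 for c in candidates}
    for _ in range(packets):
        choice = rng.choice(candidates) if randomized else candidates[0]
        load[choice] += 1
    return load

det = route(100, ["east", "north"], randomized=False)
rnd = route(100, ["east", "north"], randomized=True)
```

Spreading the load over equivalent buffers reduces contention hot spots, which is the intuition behind the performance gain the simulations report for randomized selection.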

7 citations



Proceedings Article
22 Jan 1996
TL;DR: For medium-size messages, the FLIPC system significantly outperforms other messaging systems on the Intel Paragon; an explicit design focus on programmable communication hardware, and the resulting use of wait-free synchronization, was a key factor in achieving this level of performance.

Abstract: FLIPC is a new messaging system intended to support distributed real-time applications on high-performance communication hardware. Application messaging systems designed for high-performance computing environments are not well suited to other environments because they lack support for the complex application structures, involving multiple processes, threads, and classes of message traffic, found in environments such as distributed real time. These messaging systems also have not been optimized for the medium-size messages found in important classes of real-time applications. FLIPC includes additional features to support applications outside the high-performance computing domain. For medium-size messages, our system significantly outperforms other messaging systems on the Intel Paragon. An explicit design focus on programmable communication hardware and the resulting use of wait-free synchronization was a key factor in achieving this level of performance. The implementation of FLIPC was accelerated by our use of PC clusters, connected by Ethernet or by a SCSI bus, as development platforms to reduce the need for Paragon time.
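Wait-free synchronization of the kind the abstract credits can be illustrated with a single-producer, single-consumer ring buffer in which each side updates only its own index, so neither ever spins on a lock. This is a generic sketch, not FLIPC's data structure; in Python it only models the logic, since real wait-freedom depends on the hardware's memory ordering.

```python
class SpscRing:
    """Single-producer single-consumer ring buffer.  The producer writes
    only `tail`, the consumer writes only `head`; both operations finish
    in a bounded number of steps, the defining property of wait-freedom."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # written only by the consumer
        self.tail = 0   # written only by the producer

    def enqueue(self, msg):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False            # full: fail fast instead of blocking
        self.buf[self.tail] = msg
        self.tail = nxt             # publish only after the slot is written
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None             # empty: return, never wait
        msg = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return msg

q = SpscRing(4)                                     # capacity-1 usable slots
sent = [q.enqueue(m) for m in ("a", "b", "c", "d")]
got = [q.dequeue() for _ in range(4)]
```

One slot is sacrificed to distinguish full from empty; in exchange, neither endpoint can ever stall the other, which matters when one side is communication hardware with its own clock.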

Journal ArticleDOI
TL;DR: Digital's low-latency interconnect, Memory Channel, is described from a low-level programmer's perspective, followed by an overview of the implementation of message-passing libraries on top of Memory Channel and, finally, the impact of such an optimized message-passing facility on an application.
Abstract: Traditional implementations of message passing libraries such as PVM [1] have incurred high latencies. The use of a low latency interconnect [2] and well tuned message passing software greatly reduces this latency. For an interesting class of applications, this reduced latency has a great impact on execution time. This paper describes Digital's low latency interconnect, Memory Channel, from a low-level programmer's perspective. It then gives an overview of the implementation of message passing libraries on top of Memory Channel, and finally describes the impact of such an optimized message passing facility on an application.


Proceedings ArticleDOI
TL;DR: An extremely low-latency semaphore-passing network is implemented within a point-to-point system in which the tradeoff between latency and bandwidth can be made on a case-by-case basis.
Abstract: Currently all networking hardware must have predefined tradeoffs between latency and bandwidth. In some applications one feature is more important than the other. We present a system where the tradeoff can be made on a case-by-case basis. To show this we implement an extremely low-latency semaphore-passing network within a point-to-point system. © 1996 SPIE, The International Society for Optical Engineering.