
Showing papers on "Latency (engineering) published in 2003"


Proceedings ArticleDOI
22 Mar 2003
TL;DR: An experiment investigating the effect of latency on other metrics of VE effectiveness (physiological response, simulator sickness, and self-reported sense of presence) found that participants in the low latency condition had a higher self-reported sense of presence and a statistically higher change in heart rate between the two rooms.
Abstract: Previous research has shown that even low end-to-end latency can have adverse effects on performance in virtual environments (VE). This paper reports on an experiment investigating the effect of latency on other metrics of VE effectiveness: physiological response, simulator sickness, and self-reported sense of presence. The VE used in the study includes two rooms: the first is normal and non-threatening; the second is designed to evoke a fear/stress response. Participants were assigned to either a low latency (~50 ms) or high latency (~90 ms) group. Participants in the low latency condition had a higher self-reported sense of presence and a statistically higher change in heart rate between the two rooms than did those in the high latency condition. There were no significant relationships between latency and simulator sickness.

278 citations


Proceedings ArticleDOI
01 Dec 2003
TL;DR: This paper proposes a heuristic solution for the problem of minimum-energy convergecast which also works toward minimizing data latency; surprisingly, the results show that this algorithm's performance for broadcasting is better than that of other broadcast techniques.
Abstract: In wireless sensor networks (WSN), the dissemination of data among various sensors (broadcast) and the collection of data from all sensors (convergecast or data aggregation) are common communication operations. With increasing demands on efficient use of battery power, many efficient broadcast tree construction and channel allocation algorithms have been proposed. Generally, convergecast is preceded by broadcast, hence the tree used for broadcast is also used for convergecast. Our research shows that this approach is inefficient in terms of latency and energy consumption. In this paper we propose a heuristic solution for the problem of minimum-energy convergecast which also works toward minimizing data latency. The algorithm constructs a tree using a greedy approach, where new nodes are added to the tree such that the weight of the branch to which a node is added is minimized. The algorithm then allocates direct sequence spread spectrum or frequency hopping spread spectrum codes. Simulation results show that the energy consumed and the communication latency of our approach are lower than those of some existing approaches for convergecast. We have then used our algorithm to perform broadcast. Surprisingly, our results show that this algorithm's performance for broadcasting is better than that of other broadcast techniques.
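The greedy rule described in the abstract (attach each new node to the branch currently carrying the least weight) can be sketched as follows. This is a minimal illustration of that idea, not the authors' algorithm: the `reachable` predicate, the unit traffic load, and the attachment order are all assumptions.

```python
# Hypothetical sketch of greedy convergecast-tree construction: each new
# node attaches to the reachable tree node whose branch (path to the sink)
# currently carries the least aggregated load.

def build_tree(sink, nodes, reachable):
    """reachable(u, v) -> True if node u can communicate with node v."""
    parent = {sink: None}
    load = {sink: 0}          # units of traffic forwarded through each node
    for n in nodes:
        candidates = [t for t in parent if reachable(n, t)]
        # pick the attachment point on the lightest branch
        best = min(candidates, key=lambda t: load[t])
        parent[n] = best
        load[n] = 0
        # one unit of traffic from n now flows through every ancestor
        t = best
        while t is not None:
            load[t] += 1
            t = parent[t]
    return parent
```

With full reachability this rule spreads load by deepening the tree; with realistic radio ranges it balances traffic across the sink's branches.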

84 citations


Proceedings ArticleDOI
01 Nov 2003
TL;DR: The present design allows us to instantiate arbitrary network topologies and offers low latency and high throughput; it is part of the platform the authors are developing for reconfigurable systems.
Abstract: An efficient methodology for building the billion-transistor systems on chip of tomorrow is a necessity. Networks on chip promise to be the solution to the numerous technological, economic and productivity problems. We believe that different types of networks are required for each application domain. Our approach therefore is to have a very flexible, highly scalable network design that easily accommodates the various needs. This paper presents the design of our network on chip, which is part of the platform we are developing for reconfigurable systems. The present design allows us to instantiate arbitrary network topologies and offers low latency and high throughput.

72 citations


Proceedings ArticleDOI
22 Apr 2003
TL;DR: The implementation of shielded processors in RedHawk Linux and their benefits are described and the results of real time performance benchmarks are presented.
Abstract: The low latency and preemption patches represent significant progress toward making standard Linux a more responsive system for real-time applications. These patches allow guarantees on worst-case interrupt response time of slightly above a millisecond. However, these guarantees can only be met when there is no networking or graphics activity in the system. This paper describes the implementation of shielded processors in RedHawk Linux and their benefits. It also presents the results of real-time performance benchmarks. Interrupt response time guarantees are significantly below one millisecond and can be met even in the presence of networking and graphics activity.

46 citations


Proceedings ArticleDOI
11 May 2003
TL;DR: The authors propose the use of OBS to realize a geographically distributed packet switch for metro rings by combining a multi-token based protocol for contention-free and loss-free transmission of bursts, known as the lightring protocol, with the creation of bursts that contain packets belonging to multiple traffic flows.
Abstract: Optical burst switching (OBS) provides statistical multiplexing capabilities at the optical layer with relaxed hardware requirements when compared to optical packet switching. One of the open challenges of OBS is to assemble as many packets as possible in the same burst, while at the same time ensuring low latency of the transmitted packets. The authors propose the use of OBS to realize a geographically distributed packet switch for metro rings. High efficiency of ring bandwidth use and low packet latency are obtained at the ring node by combining a multi-token based protocol for contention-free and loss-free transmission of bursts, known as the lightring protocol, with the creation of bursts that contain packets belonging to multiple traffic flows (classified by priority and destination). As illustrated in the paper, the proposed solution yields throughput that is significantly higher than that offered by a centralized packet switch connected to the ring nodes via dedicated optical circuits. Latency of real-time packets is kept to a few dozen milliseconds under a variety of network scenarios. The solution scales well geographically for metro applications.
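The tension the abstract describes, packing many packets per burst while keeping latency low, is commonly resolved with a size threshold plus a maximum-delay timer. The sketch below illustrates that generic mechanism under assumed thresholds; it is not the paper's lightring protocol, and the per-(priority, destination) flow keying merely mirrors the classification mentioned above.

```python
# Illustrative timer/size-threshold burst assembly: packets queue per
# (priority, destination) flow; a burst is emitted when it reaches
# max_bytes, or when its oldest packet has waited max_delay.

class BurstAssembler:
    def __init__(self, max_bytes, max_delay):
        self.max_bytes, self.max_delay = max_bytes, max_delay
        self.queues = {}   # (priority, dest) -> (first_arrival_time, [sizes])

    def add(self, now, priority, dest, size):
        key = (priority, dest)
        first, sizes = self.queues.get(key, (now, []))
        sizes.append(size)
        self.queues[key] = (first, sizes)
        if sum(sizes) >= self.max_bytes:      # size threshold reached
            return self._emit(key)
        return None

    def tick(self, now):
        """Emit every burst whose oldest packet exceeded max_delay."""
        due = [k for k, (first, _) in self.queues.items()
               if now - first >= self.max_delay]
        return [self._emit(k) for k in due]

    def _emit(self, key):
        first, sizes = self.queues.pop(key)
        return (key, len(sizes), sum(sizes))  # (flow, packet count, bytes)
```

The timer bounds worst-case packet latency; the size threshold bounds burst overhead.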

31 citations


Patent
Da-Shan Shiu1, Li Zhang1, Eugene Sy1
24 Apr 2003
TL;DR: In this paper, a controller receives a frequency switch command and generates a signal at a time determined in accordance with a system timer, with DC cancellation control and gain control, with signaling to control the iterations without need for processor intervention.
Abstract: Techniques for improved low latency frequency switching are disclosed. In one embodiment, a controller receives a frequency switch command and generates a frequency switch signal at a time determined in accordance with a system timer. In another embodiment, gain calibration is initiated subsequent to the frequency switch signal delayed by the expected frequency synthesizer settling time. In yet another embodiment, DC cancellation control and gain control are iterated to perform gain calibration, with signaling to control the iterations without need for processor intervention. Various other embodiments are also presented. Aspects of the embodiments disclosed may yield the benefit of reducing latency during frequency switching, allowing for increased measurements at alternate frequencies, reduced time spent on alternate frequencies, and the capacity and throughput improvements that follow from minimization of disruption of an active communication session and improved neighbor selection.

30 citations


Patent
06 Oct 2003
TL;DR: In this article, a technique that enables a shared communications medium to achieve an increased data rate under lossy conditions while maintaining low latency is disclosed, which incorporates two aspects that enable the improved performance.
Abstract: A technique that enables a shared communications medium to achieve an increased data rate under lossy conditions while maintaining low latency is disclosed. The technique incorporates two aspects that enable the improved performance. The first aspect comprises the rigorous use of a single message flow between two stations at any given time with interframe spaces that are adjusted to allow an uninterrupted flow of frames. An admission control protocol enforces the single flow. The second aspect is the creation of high shared channel utilization (i.e., 'efficiency'). Efficiency is achieved by generating enough opportunities for stations to get on the air, in part by minimizing backoff intervals when a priority flow is needed.

27 citations


Proceedings ArticleDOI
15 Sep 2003
TL;DR: The performance results show that all three interconnects achieve low latency, high bandwidth and low host overhead, however, they show quite different performance behaviors when handling completion notification, unbalanced communication patterns and different communication buffer reuse patterns.
Abstract: In this paper we present a comprehensive performance evaluation of three high speed cluster interconnects: InfiniBand, Myrinet and Quadrics. We propose a set of micro-benchmarks to characterize different performance aspects of these interconnects. Our micro-benchmark suite includes not only traditional tests and performance parameters, but also those specifically tailored to the interconnects' advanced features, such as user-level access for performing communication and remote direct memory access. In order to explore the full communication capability of the interconnects, we have implemented the micro-benchmark suite at the low-level messaging layer provided by each interconnect. Our performance results show that all three interconnects achieve low latency, high bandwidth and low host overhead. However, they show quite different performance behaviors when handling completion notification, unbalanced communication patterns and different communication buffer reuse patterns.
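Latency micro-benchmarks of the kind described above are classically built on a ping-pong test: halve the round-trip time over many iterations. The sketch below shows that generic pattern over a local socket pair purely for illustration; the paper itself measures at each interconnect's native messaging layer, not over sockets.

```python
# Generic ping-pong latency microbenchmark (illustrative only): average
# the round-trip time over many iterations and halve it to estimate
# one-way message latency.

import socket
import time

def pingpong_latency(rounds=1000, size=1):
    a, b = socket.socketpair()          # stand-in for a real interconnect
    msg = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(msg); b.recv(size)    # ping
        b.sendall(msg); a.recv(size)    # pong
    elapsed = time.perf_counter() - start
    a.close(); b.close()
    return elapsed / (2 * rounds)       # one-way latency estimate
```

Varying `size` yields the bandwidth curve; varying buffer reuse between iterations probes the registration-cache effects the paper highlights.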

26 citations


Proceedings ArticleDOI
11 May 2003
TL;DR: A low latency handoff protocol for MIPv4, the post-registration handoff method, is evaluated and a simple queuing model is proposed to study the influence of various parameters on the protocol performance.
Abstract: In this paper, we evaluate a low latency handoff protocol for MIPv4, the post-registration handoff method. This mechanism proposed by the IETF tries to improve the performance of hierarchical mobile IP by decreasing the handoff latency. We give a detailed description of the protocol behavior by means of an ns simulation and propose a simple queuing model to study the influence of various parameters on the protocol performance.

24 citations


Proceedings Article
01 Jan 2003

24 citations


Journal ArticleDOI
01 Jul 2003
TL;DR: This paper presents the design and analysis of a lightweight service for message-passing communication and parallel process coordination, based on the message passing interface specification, for unicast and collective communications.
Abstract: Rapid increases in the complexity of algorithms for real-time signal processing applications have led to performance requirements exceeding the capabilities of conventional digital signal processor (DSP) architectures. Many applications, such as autonomous sonar arrays, are distributed in nature and amenable to parallel computing on embedded systems constructed from multiple DSPs networked together. However, to realize the full potential of such applications, a lightweight service for message-passing communication and parallel process coordination is needed that is able to provide high throughput and low latency while minimizing processor and memory utilization. This paper presents the design and analysis of such a service, based on the message passing interface specification, for unicast and collective communications.

Book ChapterDOI
TL;DR: An analytical model is proposed to study the influence of various system parameters on the performance of the two protocols and the results are compared to show the impact of packet loss on handoff latency.
Abstract: In this paper we compare the performance of two low latency handoff protocols for MIPv4, Pre- and Post-Registration Handoff. These mechanisms proposed by the IETF aim at improving the performance of Hierarchical Mobile IP with respect to handoff latency and packet loss. We propose an analytical model to study the influence of various system parameters on the performance of the two protocols, followed by a comparison of the two schemes. We describe several handoff implementations over a wireless access based on the IEEE 802.11 standard and analyze them by means of an ns simulation.
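To illustrate how an analytical model can relate system parameters to handoff latency, consider a simple M/M/1 approximation in which handoff signalling messages queue at a router. This is purely illustrative and not the authors' model; the arrival rate, service rate, and message count below are assumed parameters.

```python
# Illustrative M/M/1 approximation: signalling messages arrive at rate lam
# and are served at rate mu; the mean sojourn time of each message adds to
# the base handoff delay.

def mm1_sojourn(lam, mu):
    """Mean time a message spends queued plus in service (W = 1/(mu-lam))."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (mu - lam)

def handoff_latency(base_delay, lam, mu, n_messages):
    # each of the n handoff signalling messages queues once
    return base_delay + n_messages * mm1_sojourn(lam, mu)
```

As the background load `lam` approaches the service rate `mu`, the queueing term dominates, which is exactly the kind of sensitivity such a model exposes.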

Proceedings Article
01 Jan 2003
TL;DR: The truncation error for a two-pass decoder is analyzed in a problem of phonetic speech recognition for very demanding latency constraints, i.e. for applications where a look-ahead length < 100 ms is required.
Abstract: The truncation error for a two-pass decoder is analyzed in a problem of phonetic speech recognition for very demanding latency constraints (look-ahead length < 100ms) and for applications where ...

15 Oct 2003
TL;DR: This paper describes how all the asynchronous overhead can be completely removed by instead running the entire coherence protocol in the requesting processor, and how this technique is applicable to both page-based and fine-grain software shared memory.
Abstract: Software implementations of shared memory are still far behind the performance of hardware-based shared memory implementations and are not viable options for most fine-grain shared-memory applications. The major source of their inefficiency comes from the cost of interrupt-based asynchronous protocol processing, not from the actual network latency. As the raw hardware latency of inter-node communication decreases, the asynchronous overhead in the communication becomes more dominant. Elaborate schemes, involving dedicated hardware and/or dedicated protocol processors, have been suggested to cut the overhead. This paper describes how all the asynchronous overhead can be completely removed by instead running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead, but also makes use of a processor that otherwise would stall. The technique is applicable to both page-based and fine-grain software shared memory. Our proof-of-concept implementation, DSZOOM-EMU, is a fine-grained software-based shared memory. It demonstrates a protocol-handling overhead below a microsecond for all the actions involved in a remote load operation, to be compared to the fastest implementation to date of around ten microseconds. The all-software protocol is implemented assuming only some basic low-level primitives in the cluster interconnect. Based on a remote atomic and simple remote put/get operations, the requesting processor can assume the role of the directory agent, traditionally assumed by a remote protocol agent in the home node in other implementations. The implementation is thread-safe and allows all processors in a node to simultaneously perform remote operations.
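The core move, letting the requesting processor play directory agent via a remote atomic plus remote get, can be caricatured as follows. This is a very schematic sketch and not DSZOOM's actual protocol; a local lock stands in for the interconnect's remote atomic, and the single data word stands in for a cache line.

```python
# Schematic sketch: the requesting processor itself acquires the directory
# entry with a (simulated) remote atomic, fetches the data with a remote
# get, and releases the entry. No interrupt handler or protocol agent ever
# runs on the home node.

import threading

class DirectoryEntry:
    def __init__(self, data):
        self._lock = threading.Lock()   # stands in for a remote atomic
        self.data = data

    def remote_cas_acquire(self):
        return self._lock.acquire(blocking=False)

    def remote_release(self):
        self._lock.release()

def remote_load(entry):
    # the requester spins until its remote atomic succeeds, then does
    # all protocol work itself (a processor that would otherwise stall)
    while not entry.remote_cas_acquire():
        pass
    value = entry.data                  # remote get
    entry.remote_release()
    return value
```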

Book ChapterDOI
18 May 2003
TL;DR: The proposed method allows the mobile node to perform Low Latency Handoff quickly as well as securely, and re-uses the previously assigned session keys in the key exchange between the old FA and the new FA.
Abstract: Mobile IP Low Latency Handoffs[1] allow greater support for real-time services on a Mobile IPv4 network by minimising the period of time when a mobile node is unable to send or receive IP packets due to the delay in the Mobile IP Registration process. However, on a Mobile IP network with AAA servers that are capable of performing Authentication, Authorization, and Accounting (AAA) services, every Regional Registration has to traverse to the home network to obtain the new session keys, distributed by the home AAA server, for a new Mobile IP session. This communication delay is the time taken to reauthenticate the mobile node and to traverse between the foreign and home networks, even if the mobile node has previously been authorized by the old foreign agent. In order to reduce this extra time overhead, we present a method that performs Low Latency Handoff without requiring further involvement by the home AAA server. The method re-uses the previously assigned session keys. To provide confidentiality of the session keys during the key exchange between the old FA and the new FA, it uses a key-sharing method with a trusted third party. The proposed method allows the mobile node to perform Low Latency Handoff quickly as well as securely.

Proceedings ArticleDOI
Krishna Kant1, Ravi Iyer1
13 Oct 2003
TL;DR: The results indicate that the proposed technique has a potential to reduce address bus width in most cases and data bus widths in some cases while maintaining equal or better performance than in the uncompressed case.
Abstract: As microprocessors scale rapidly in frequency, the design of fast and efficient interconnects becomes extremely important for low latency data access and high performance. We evaluate a technique for reducing the interconnect width by exploiting the spatial and temporal locality in communication transfers (addresses & data). The width reduction implies a number of other advantages including higher operating frequency, reduced pin-count, lower chip & board cost, etc. We evaluate the effectiveness of the proposed scheme by performing trace-driven simulations for two well-known commercial server workloads (SPECWeb99 and TPC-C). We also study the sensitivity of the compression hit ratio with respect to the number of bits compressed, size of the encoding/decoding table used and the replacement policy. The results indicate that the proposed technique has a potential to reduce address bus width in most cases and data bus widths in some cases while maintaining equal or better performance than in the uncompressed case.
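The general idea of exploiting locality to narrow a bus, a small encoding table kept consistent at both ends, so a hit sends only a short index plus the low-order bits, can be sketched as below. This is a hedged illustration of that class of scheme, not the paper's exact design; the table size, split point, and LRU replacement are assumptions (the paper itself studies several table sizes and replacement policies).

```python
# Illustrative width-compression scheme: look up the upper address bits in
# a small LRU-managed encoding table; on a hit, transmit only the table
# index and the low-order bits instead of the full address.

from collections import OrderedDict

class WidthCompressor:
    def __init__(self, entries=16, low_bits=8):
        self.table = OrderedDict()          # upper-bits value, LRU-ordered
        self.entries, self.low_bits = entries, low_bits
        self.hits = self.total = 0

    def transfer(self, addr):
        self.total += 1
        upper = addr >> self.low_bits
        if upper in self.table:
            self.table.move_to_end(upper)   # refresh LRU position
            self.hits += 1
            return ("index", list(self.table).index(upper))
        if len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict least recently used
        self.table[upper] = True
        return ("full", addr)               # miss: send the full address

    def hit_ratio(self):
        return self.hits / self.total if self.total else 0.0
```

The hit ratio of this table is precisely the quantity whose sensitivity to table size and replacement policy the paper studies.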

09 Sep 2003
TL;DR: A local area sub-network design avoiding optical buffering, bit level synchronization and regeneration is presented, and uses wavelength striping of the packets as a method of achieving low latency and high capacity for use in the target systems.
Abstract: We present a local area sub-network design avoiding optical buffering, bit level synchronization and regeneration. Using currently available components we calculate acceptable utilisation when scalability is limited to local, system, storage and desk area networks. The architecture draws upon well-understood computer networking concepts, and uses wavelength striping of the packets as a method of achieving low latency and high capacity for use in the target systems.

Journal Article
TL;DR: In this paper, two adaptive power-aware schemes, PT with secure region (PTSR) and adaptive prefetching with sliding caches (APSC), are proposed to achieve low latency and low power consumption for dynamic request patterns.
Abstract: Caching and prefetching are common techniques in mobile broadcast environments. Several prefetching policies have been proposed in the literature to either achieve the minimum access latency or save power as much as possible. However, little work has been carried out to adaptively balance these two metrics in a dynamic environment where the user request arrival patterns change over time. This paper investigates how both low latency and low power consumption can be achieved for dynamic request patterns. Two adaptive power-aware schemes, PT with secure region (PTSR) and adaptive prefetching with sliding caches (APSC), are proposed. Experimental results show the proposed policies, in particular APSC, achieve a much lower power consumption while maintaining a similar access latency as the existing policies.

Proceedings Article
22 May 2003
TL;DR: SensorBox is a low cost, low latency, high-resolution interface for obtaining gestural data from sensors for use in realtime with a computer-based interactive system.
Abstract: SensorBox is a low cost, low latency, high-resolution interface for obtaining gestural data from sensors for use in realtime with a computer-based interactive system. We discuss its implementation, benefits, current limitations, and compare it with several popular interfaces for gestural data acquisition.

01 Jan 2003
TL;DR: A memory-buffer-free switch node for connection circuits set up by packet routing is presented and implemented in a test chip with two switch nodes, addressing the SoC integration problem.
Abstract: Switch nodes in a 2D mesh SoC connection network have been suggested to solve the SoC integration problem. We present a memory-buffer-free switch node for connection circuits set up by packet routing. A test chip with two switch nodes has been implemented. The function of a switch node is to set up and tear down connections based on small packets and then to transport the payload data without buffering and with very low latency. The silicon cost of a node is 0.5 mm² in a 2-metal-layer 0.8 micrometer AMS CMOS technology. The connection latency cost is one clock cycle per switch node. The test chip works properly at 50 MHz.
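Using the figures quoted above (one clock cycle per switch node at 50 MHz), the per-connection latency is simple to tabulate; the 5-hop path below is just an assumed example.

```python
# Back-of-envelope latency for a circuit through the mesh: one clock
# cycle per switch node, at the quoted 50 MHz clock.

def connection_latency_ns(hops, clock_hz=50e6):
    cycle_ns = 1e9 / clock_hz      # 20 ns per cycle at 50 MHz
    return hops * cycle_ns

# e.g. a 5-hop path costs 5 cycles = 100 ns at 50 MHz
```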

Proceedings ArticleDOI
TL;DR: A critical analysis of intrinsic limitations of electrical interconnect indicates that these limitations can be overcome, and a scheme for this is presented, based on the utilization of upper-level metals as transmission lines.
Abstract: Global interconnects have been identified as a serious limitation to chip scaling, due to their limited bandwidth and large delay. A critical analysis of intrinsic limitations of electrical interconnect indicates that these limitations can be overcome. This basic analysis is presented, together with design constraints. We demonstrate a scheme for this, based on the utilization of upper-level metals as transmission lines. A global communication architecture based on a global mesochronous, local synchronous approach allows very high data-rate per wire and therefore very high bandwidth in buses of limited width. As an example, we demonstrate a 320μm wide bus with a capacity of 160Gb/s in a nearly standard 0.18μm process.

Book ChapterDOI
07 Dec 2003
TL;DR: A hybrid routing algorithm is presented which can fully exploit multicast in order to reduce the network bandwidth used, and which makes the overlay network dynamically adapt to changes in the underlying network topology caused by node or link failure.
Abstract: Efficient routing algorithms and self-configuration are two key challenges in the area of large-scale content-based publish-subscribe systems. In this paper we first propose a hierarchical system model with multicast clustering. Then a hybrid routing algorithm is presented, which can fully exploit multicast in order to reduce the network bandwidth used. We also propose a multicast clustering replication protocol and a content-based multicast tree protocol to make the overlay network dynamically adapt to changes in the underlying network topology caused by node or link failure, requiring no manual tuning or system administration. Simulation results show that the system has low cost, and events delivered over it experience moderately low latency.

Proceedings Article
01 Jan 2003
TL;DR: This paper directly addresses the performance of the core OPS module; results obtained from simulation models show that the proposed asynchronous OPS architecture exhibits low latency and low packet loss together with relatively high throughput.
Abstract: The prime objective of the EPSRC-funded OPSnet project is the design and demonstration of an asynchronous DWDM optical packet switch (OPS) capable of directly carrying IP packets over DWDM-based core networks at transport rates in the order of 100 Gb/s and above. To achieve such an objective demands a highly flexible and innovative core switch architecture. The operation and performance of such an architecture is the subject of this paper. The paper directly addresses the performance of the core OPS module, and results obtained from simulation models show that the proposed asynchronous OPS architecture exhibits low latency and low packet loss together with relatively high throughput.

Proceedings ArticleDOI
19 Nov 2003
TL;DR: This research aims at implementing a SoC platform that could support high throughput and low latency real time video streaming and presents a custom embedded processor used at the core of the platform.
Abstract: Network processing devices emerged as a result of the growing demand for enhanced, flexible next generation communication services. Vendors are increasingly in need of a network processor solution that allows meeting bandwidth requirements and new features while shortening time-to-market. This research aims at implementing a SoC platform that could support high throughput and low latency real-time video streaming. This paper explores a hardware/software system on chip (SoC) solution for a protocol conversion application. A methodology to develop an architecture for a SoC platform is discussed. We also present a custom embedded processor used at the core of our platform.

Patent
17 Jan 2003
TL;DR: An apparatus for and method of implementing a cluster lock processing system using highly scalable, off-the-shelf commodity processors is described in this article, which is the central component of a clustered computer system, providing locking and coordination between multiple host systems.
Abstract: An apparatus for and method of implementing a cluster lock processing system using highly scalable, off-the-shelf commodity processors. The cluster lock processing system is the central component of a clustered computer system, providing locking and coordination between multiple host systems. The host systems are coupled to the cluster lock processing system using off-the-shelf, low latency interconnects. The cluster lock processing system is composed of multiple commodity platforms that are also coupled to each other using low latency interconnects. Failure of one of the commodity platforms that comprise the cluster lock processing system results in no loss of functionality or interruption of service. This is made possible through the use of specialized software that runs on the commodity platforms. Through the use of custom software and inexpensive hardware the overall system cost is dramatically reduced when compared to typical solutions that use custom built hardware. By allowing the individual commodity platforms to be physically separated, the cluster lock processing system also provides for resiliency against physical damage to an individual platform that may be caused by a catastrophic site failure.

Journal Article
TL;DR: The current development, principles and implementations of VIA are analyzed, and a user-level high-performance communication software package, MyVIA, based on Myrinet is presented, which is compliant with the VIA specification.
Abstract: The virtual interface architecture (VIA) established a communication model with low latency and high bandwidth, and defined a standard for user-level high-performance communication in cluster systems. In this paper, the current development, principles and implementations of VIA are analyzed, and a user-level high-performance communication software package, MyVIA, based on Myrinet is presented, which is compliant with the VIA specification. First, the design principles and framework of MyVIA are described, and then optimization techniques for MyVIA are proposed, including UTLB, continuous physical memory and varied NIC buffers, pipelining based on resources and DMA chains, a physical descriptor ring and a dynamic cache. The experimental results indicate that the bandwidth of MyVIA for 4KB messages is 250MB/s and the lowest one-way latency is 8.46ms, showing that the performance of MyVIA surpasses that of other VIA implementations.

Proceedings ArticleDOI
07 Apr 2003
TL;DR: The European Union project 'HOLMS' aims at demonstrating the feasibility of an optical bus system for CPU memory access based on planar integrated free-space optics (PIFSO) in combination with fibre and PCB integrated waveguide optics to demonstrate a novel architecture of low latency memory access.
Abstract: In computer architecture, bandwidth and memory latency represent a major bottleneck. One possibility for solving these problems is the use of optical interconnections, with their inherent capability for large fanin and fanout, low skew, etc. Today, the technology for producing integrated chips with optical and electronic connections is well advanced, and the barrier to their adoption in computer systems is shrinking. The European Union project 'High-Speed Opto-Electronic Memory Systems' (HOLMS) aims at demonstrating the feasibility of an optical bus system for CPU memory access. The bus system is based on planar integrated free-space optics (PIFSO) in combination with fibre and PCB integrated waveguide optics. The goal is to demonstrate a novel architecture for low latency memory access. Here, we will discuss the task of the free-space optics. The assignment of the PIFSO is to perform all fanin and fanout operations for the interconnection between CPU and memory. Longer distances, such as connections between CPU and memory, are broadcast by waveguides in the PCB, and fibres are used to combine two PCBs into a multiprocessor system. The first task consists of the design and realization of the interface between the PIFSO and the PCB integrated waveguides. Besides the optical coupling, the main challenge is to find an optical solution that allows large mechanical tolerances in the packaging of the different parts of the system. The large number of optical lines and their fanout and fanin are a challenge for design and construction, too. Design issues will be discussed and first experimental results will be presented.

Proceedings ArticleDOI
16 Jun 2003
TL;DR: In this article, the issues involved in combining electronic and optical design constraints in opto-electronic multichip (OE-MCM) modules are explored with the aim of implementing a low latency opto-electronic memory system.
Abstract: This paper explores the issues involved in combining electronic and optical design constraints in opto-electronic multichip (OE-MCM) modules. It focuses on the OE-MCM components used in the HOLMS EU project, which aims at implementing a low latency opto-electronic memory system.

01 Jan 2003
TL;DR: Performance comparisons between PVM and MPI, as well as the optimizations achieved exploiting the GAMMA (Genoa Active Message MAchine) Active Message paradigm, are presented, and a GAMMA implementation for 3COM 3c966 NICs is presented.
Abstract: The main solutions currently adopted in deploying parallel applications are based on the use of high-performance parallel platforms and Networks of Workstations (NOW) exploiting off-the-shelf communication hardware. However, the former solutions are highly expensive, while the latter achieve only limited performance. An optimal solution consists of employing NOWs (workstations or high-end PCs) combined with high-performance network cards, effective parallel environments such as PVM or MPI, and modifications to the standard communication protocol layer. In this paper, performance comparisons between PVM and MPI, as well as the optimizations achieved by exploiting the GAMMA (Genoa Active Message MAchine) Active Message paradigm, are presented, and a GAMMA implementation for 3COM 3c966 NICs is presented.

Journal Article
TL;DR: The pCoR programming model as mentioned in this paper combines primitives to launch remote processes and threads with communication over Myrinet and achieves high performance communication among threads of parallel/distributed applications.
Abstract: In this paper we present some implementation details of a programming model - pCoR - that combines primitives to launch remote processes and threads with communication over Myrinet. Basically, we present the efforts we have made to achieve high performance communication among threads of parallel/distributed applications. The expected advantages of multiple threads launched across a low latency cluster of SMP workstations are emphasized with a graphical application that manages huge maps consisting of several JPEG images.