
Showing papers on "Latency (engineering) published in 2004"


Proceedings ArticleDOI
04 Oct 2004
TL;DR: An Adaptive ARF (AARF) algorithm for low latency systems that improves upon ARF to provide both short-term and long-term adaptation, and a new rate adaptation algorithm designed for high latency systems that has been implemented and evaluated on an AR5212-based device.
Abstract: Today, three different physical (PHY) layers for the IEEE 802.11 WLAN are available (802.11a/b/g); they all provide multi-rate capabilities. To achieve a high performance under varying conditions, these devices need to adapt their transmission rate dynamically. While this rate adaptation algorithm is a critical component of their performance, only very few algorithms such as Auto Rate Fallback (ARF) or Receiver Based Auto Rate (RBAR) have been published and the implementation challenges associated with these mechanisms have never been publicly discussed. In this paper, we first present the important characteristics of the 802.11 systems that must be taken into account when such algorithms are designed. Specifically, we emphasize the contrast between low latency and high latency systems, and we give examples of actual chipsets that fall in either of the different categories. We propose an Adaptive ARF (AARF) algorithm for low latency systems that improves upon ARF to provide both short-term and long-term adaptation. The new algorithm has very low complexity while obtaining a performance similar to RBAR, which requires incompatible changes to the 802.11 MAC and PHY protocol. Finally, we present a new rate adaptation algorithm designed for high latency systems that has been implemented and evaluated on an AR5212-based device. Experimentation results show a clear performance improvement over the algorithm previously implemented in the AR5212 driver we used.
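The adaptive-threshold idea behind AARF can be sketched in a few lines. This is a hedged simplification of our own (the class name, the 10/50 threshold values, and the single-failure fallback are illustrative assumptions, not the paper's exact specification): after a run of consecutive successes the sender probes the next higher rate, and AARF's twist over plain ARF is to double the required run length whenever such a probe fails.

```python
class AARF:
    """Illustrative sketch of Adaptive ARF rate adaptation.

    Rates are indices into the PHY's rate table. After `threshold`
    consecutive successful transmissions we probe the next higher rate;
    if that probe fails we fall back immediately and double the
    threshold (long-term adaptation), capped at MAX_THRESHOLD.
    """
    MIN_THRESHOLD, MAX_THRESHOLD = 10, 50  # illustrative values

    def __init__(self, num_rates):
        self.num_rates = num_rates
        self.rate = 0                      # start at the most robust rate
        self.threshold = self.MIN_THRESHOLD
        self.successes = 0
        self.just_increased = False        # True right after a rate increase

    def report_success(self):
        self.successes += 1
        self.just_increased = False
        if self.successes >= self.threshold and self.rate < self.num_rates - 1:
            self.rate += 1                 # probe the next rate
            self.successes = 0
            self.just_increased = True

    def report_failure(self):
        self.successes = 0
        if self.just_increased:
            # Probe failure: step back down and be slower to retry.
            self.rate -= 1
            self.threshold = min(self.threshold * 2, self.MAX_THRESHOLD)
        elif self.rate > 0:
            self.rate -= 1
            self.threshold = self.MIN_THRESHOLD
        self.just_increased = False
```

A driver would call `report_success`/`report_failure` from its transmit-completion path; the real algorithm also handles retry counts and timer-based probing, which this sketch omits.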

723 citations


Proceedings ArticleDOI
04 Dec 2004
TL;DR: This paper develops L2 cache designs for CMPs that incorporate block migration, transmission lines, and stride-based prefetching between L1 and L2 caches, and presents a hybrid design, combining all three techniques, that improves performance by an additional 2% to 19% over prefetching alone.
Abstract: In response to increasing (relative) wire delay, architects have proposed various technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block migration (e.g., D-NUCA and NuRapid) reduces average hit latency by migrating frequently used blocks towards the lower-latency banks. Transmission Line Caches (TLC) use on-chip transmission lines to provide low latency to all banks. Traditional stride-based hardware prefetching strives to tolerate, rather than reduce, latency. Chip multiprocessors (CMPs) present additional challenges. First, CMPs often share the on-chip L2 cache, requiring multiple ports to provide sufficient bandwidth. Second, multiple threads mean multiple working sets, which compete for limited on-chip storage. Third, sharing code and data interferes with block migration, since one processor's low-latency bank is another processor's high-latency bank. In this paper, we develop L2 cache designs for CMPs that incorporate these three latency management techniques. We use detailed full-system simulation to analyze the performance trade-offs for both commercial and scientific workloads. First, we demonstrate that block migration is less effective for CMPs because 40-60% of L2 cache hits in commercial workloads are satisfied in the central banks, which are equally far from all processors. Second, we observe that although transmission lines provide low latency, contention for their restricted bandwidth limits their performance. Third, we show stride-based prefetching between L1 and L2 caches alone improves performance by at least as much as the other two techniques. Finally, we present a hybrid design, combining all three techniques, that improves performance by an additional 2% to 19% over prefetching alone.
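As a concrete reference point for the third technique, here is a minimal table-based stride prefetcher of the textbook kind. This is a generic sketch under our own assumptions (table layout, two-hit confidence rule); the paper's L1-to-L2 prefetcher has its own table sizes and policies.

```python
class StridePrefetcher:
    """Per-PC stride prefetcher sketch: after two consistent strides,
    predict the next address and suggest prefetching it."""

    def __init__(self):
        self.table = {}  # pc -> (last_addr, stride, confidence)

    def access(self, pc, addr):
        """Record a memory access; return an address to prefetch, or None."""
        last, stride, conf = self.table.get(pc, (addr, 0, 0))
        new_stride = addr - last
        if new_stride == stride and stride != 0:
            conf = min(conf + 1, 3)   # saturating confidence counter
        else:
            conf = 0                  # stride changed: retrain
        self.table[pc] = (addr, new_stride, conf)
        if conf >= 2:
            return addr + new_stride  # predicted next access
        return None
```

For a load walking an array with 64-byte-line steps, the prefetcher stays quiet for the training accesses and then begins predicting one line ahead.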

391 citations


Journal ArticleDOI
TL;DR: A low-latency, high-throughput scalable optical interconnect switch for high-performance computer systems is described; it features a broadcast-and-select architecture based on wavelength- and space-division multiplexing, with an electronic control architecture optimized for low latency and high utilization.

Abstract: Feature Issue on Optical Interconnection Networks (OIN). We describe a low-latency, high-throughput scalable optical interconnect switch for high-performance computer systems that features a broadcast-and-select architecture based on wavelength- and space-division multiplexing. Its electronic control architecture is optimized for low latency and high utilization. Our demonstration system will support 64 nodes with a line rate of 40 Gbit/s per node and operate on fixed-length packets with a duration of 512 ns using burst-mode receivers. We address the key system-level requirements and challenges for such applications.

156 citations


Journal ArticleDOI
01 May 2004
TL;DR: Using an electronic ink display and a touch-panel input device, a paper-like drawing tablet was created, closely mimicking the behaviour of normal paper; it was initially intended to serve as an input device for cartoon drawing and editing.

Abstract: Using an electronic ink display and a touch-panel input device, a paper-like drawing tablet was created, closely mimicking the behaviour of normal paper. The tablet is initially intended to serve as an input device for cartoon drawing and editing. However, its potential use goes far beyond that.

73 citations


Proceedings ArticleDOI
20 Dec 2004
TL;DR: OSMOSIS is an optical packet switching interconnection network for high-performance computing systems that aims at delivering sustained high bandwidth, very low latency, and cost-effective scalability.
Abstract: OSMOSIS is an optical packet switching interconnection network for high-performance computing systems. It aims at delivering sustained high bandwidth, very low latency, and cost-effective scalability. We describe its system and control architecture.

55 citations


Proceedings Article
01 Jan 2004
TL;DR: It is argued that somewhat high latencies (around 20–30 ms) are probably perfectly acceptable for typical musical applications, and that various levels of latency can be accepted on a system if one is aware of the effects of this latency on the users of the system.

Abstract: Low latency processing is usually a goal in real-time audio applications; however, it is not clear how little latency is to be considered low enough. We discuss currently available experimental data on human perception and argue that somewhat high latencies (around 20–30 ms) are probably perfectly acceptable for typical musical applications. We also argue that it should be possible to accept various levels of latency on a system if we can be aware of the effects of this latency on the users of the system; therefore, we still need more experimental data on latency perception to be able to better assess the effects of latency on musical applications.
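To put the 20–30 ms figure in processing terms, a latency budget can be translated into an audio buffer size. This is our own back-of-the-envelope arithmetic (the 44.1 kHz sample rate is an illustrative choice, not from the paper), ignoring extra sources of delay such as double buffering and driver overhead.

```python
SAMPLE_RATE = 44_100  # Hz, CD-quality audio (illustrative assumption)

def max_buffer_frames(latency_ms, sample_rate=SAMPLE_RATE):
    """Largest whole number of audio frames whose playback duration
    fits inside the given latency budget."""
    return int(sample_rate * latency_ms / 1000)

# A 20-30 ms budget admits roughly 880-1320 frames at 44.1 kHz, so common
# 256- to 1024-frame hardware buffers sit inside the range argued above.
budgets = {ms: max_buffer_frames(ms) for ms in (10, 20, 30)}
```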

52 citations


Proceedings ArticleDOI
20 Jun 2004
TL;DR: This paper discusses die cost vs. performance tradeoffs for a PIM system that could serve as the memory system of a host processor and develops a custom multithreaded processor architecture and implementation style that is well-suited for embedding in a memory.
Abstract: This paper discusses die cost vs. performance tradeoffs for a PIM system that could serve as the memory system of a host processor. For an increase of less than twice the cost of a commodity DRAM part, it is possible to realize a performance speedup of nearly a factor of 4 on irregular applications. This cost efficiency derives from developing a custom multithreaded processor architecture and implementation style that is well-suited for embedding in a memory. Specifically, it takes advantage of the low latency and high row bandwidth both to simplify processor design (reducing area) and to improve processing throughput. To support our claims of cost and performance, we have used simulation, analysis of existing chips, and also designed and fully implemented a prototype chip, PIM Lite.

37 citations


Journal ArticleDOI
TL;DR: A novel K-nested layered look-ahead method and its corresponding architecture, which combine K trellis steps into one trellis step (where K is the encoder constraint length), are proposed for implementing low-latency, high-throughput Viterbi decoders.

Abstract: In this paper, a novel K-nested layered look-ahead method and its corresponding architecture, which combine K trellis steps into one trellis step (where K is the encoder constraint length), are proposed for implementing low-latency, high-throughput Viterbi decoders. The proposed method guarantees parallel paths between any two trellis states in the look-ahead trellises and distributes the add-compare-select (ACS) computations to all trellis layers. It leads to a regular and simple architecture for the Viterbi decoding algorithm. The look-ahead ACS computation latency of the proposed method increases logarithmically with respect to the look-ahead step (M) divided by the encoder constraint length (K), as opposed to linearly as in prior work. For a 4-state (i.e., K=3) convolutional code, the decoding latency of the Viterbi decoder using the proposed method is reduced by 84%, at the expense of an approximately 22% increase in hardware complexity, compared with the conventional M-step look-ahead method with M=48 (where M is also the level of parallelism). The main advantage of our proposed design is that it has the least latency among all known look-ahead Viterbi decoders for a given level of parallelism.
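For reference, the ACS recurrence that the look-ahead method restructures is the standard per-step form below. This is a plain software sketch of generic Viterbi ACS only; the paper's contribution is combining K such steps and arranging them in log-depth layers, which this sketch does not attempt.

```python
def acs_step(path_metrics, incoming_edges):
    """One add-compare-select (ACS) trellis step.

    path_metrics: current path metric for each state.
    incoming_edges[s]: list of (predecessor_state, branch_metric) pairs
                       for transitions into state s.
    Returns (new_metrics, survivors), where survivors[s] is the winning
    predecessor - the decision a traceback unit would store.
    """
    new_metrics, survivors = [], []
    for edges in incoming_edges:
        # Add each branch metric to its predecessor's path metric,
        # compare the candidates, and select the minimum.
        best_pred, best_metric = min(
            ((p, path_metrics[p] + bm) for p, bm in edges),
            key=lambda e: e[1])
        new_metrics.append(best_metric)
        survivors.append(best_pred)
    return new_metrics, survivors
```

The serial dependence between successive `acs_step` calls is exactly what limits throughput and motivates look-ahead designs.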

35 citations


Patent
01 Mar 2004
TL;DR: In this paper, the authors describe a network interface coupled directly to a CPU by a dedicated full-duplex packetized interconnect, where data may be exchanged between compute nodes using eager or rendezvous protocols.
Abstract: Compute nodes in a high performance computer system are interconnected by an inter-node communication network. Each compute node has a network interface coupled directly to a CPU by a dedicated full-duplex packetized interconnect. Data may be exchanged between compute nodes using eager or rendezvous protocols. The network interfaces may include facilities to manage data transfer between compute nodes.

30 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: A video distortion model is presented analyzing the performance of multi-path routing for low latency video streaming, in congestion-limited ad hoc networks, which captures the impact of quantization and of packet loss on the overall video quality.
Abstract: We present a video distortion model analyzing the performance of multi-path routing for low latency video streaming, in congestion-limited ad hoc networks. In such environments, a single node transmitting multimedia data may have an impact on the overall network conditions and may need to limit its rate to achieve the highest sustainable video quality. For this purpose, optimal routing which seeks to minimize congestion and distributes traffic over multiple paths is attractive. To predict the end-to-end rate-distortion tradeoff, we develop a model which captures the impact of quantization and of packet loss on the overall video quality. Network simulations are performed to confirm the validity of the model in different streaming scenarios over different numbers of paths.
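The flavor of such a rate-distortion model can be illustrated with the classic two-term form: quantization distortion that falls with encoding rate, plus a loss term that grows with packet loss probability. The function shape and all parameter values below are our own illustrative assumptions, not the paper's fitted model.

```python
def end_to_end_distortion(rate_kbps, loss_prob,
                          d0=1.0, theta=900.0, r0=10.0, k_loss=120.0):
    """Hedged sketch of an end-to-end video distortion model.

    Quantization term uses the common D0 + theta/(R - R0) shape;
    the loss term grows linearly with packet loss probability.
    All parameters here are invented for illustration.
    """
    d_enc = d0 + theta / (rate_kbps - r0)   # encoder (quantization) MSE
    d_loss = k_loss * loss_prob             # loss-induced MSE
    return d_enc + d_loss
```

In a congestion-limited network, raising `rate_kbps` lowers `d_enc` but raises `loss_prob`, so the total has an interior minimum - the sustainable-rate tradeoff the paper optimizes over multiple paths.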

30 citations


Journal ArticleDOI
TL;DR: This paper investigates the performance of two Layer 3 low latency handoff mechanisms proposed by the IETF, namely Pre- and Post-Registration, and proposes a simple analytical model that allows assessing the packet loss and the delay characteristics of these mechanisms.
Abstract: This paper investigates the performance of two Layer 3 low latency handoff mechanisms proposed by the IETF, namely Pre- and Post-Registration. These protocols use Layer 2 triggers to reduce the built-in delay components of Mobile IP. We propose a simple analytical model that allows assessing the packet loss and the delay characteristics of these mechanisms. We describe several handoff implementations over a wireless access based on the IEEE 802.11 standard and analyze several implementation issues. Finally we study the scalability of the protocols using an OPNET simulation.
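A first-order version of the packet-loss side of such a model: if downlink packets are not buffered during the handoff, every packet arriving in the blackout window is lost. The numbers below are illustrative, not taken from the paper's analysis.

```python
def expected_handoff_loss(packet_rate_pps, handoff_latency_s):
    """Expected packets lost in one unbuffered Layer 3 handoff:
    arrival rate times the duration of the blackout window."""
    return packet_rate_pps * handoff_latency_s

# Example (made-up figures): a 50 packet/s voice stream crossing a
# 100 ms handoff blackout loses about 5 packets unless a Pre- or
# Post-Registration scheme buffers or redirects them.
loss = expected_handoff_loss(50, 0.1)
```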

Book ChapterDOI
31 Aug 2004
TL;DR: A novel proxy server-network topology aimed at improved scalability of multiplayer games and low latency in client-server data transmission is presented, together with a mechanism to efficiently synchronize the distributed state of a game based on the concept of eventual consistency.
Abstract: Computer games played over the Internet have recently become an important class of distributed applications. In this paper we present a novel proxy server-network topology aiming at improved scalability of multiplayer games and low latency in client-server data transmission. We present a mechanism to efficiently synchronize the distributed state of a game based on the concept of eventual consistency. We analyse the benefits of our approach compared to commonly used client-server and peer-to-peer topologies, and present first experimental results.
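The eventual-consistency idea can be illustrated with a timestamped last-writer-wins replica. This is a generic sketch of the concept under our own assumptions (per-entity timestamps, full-state merge); the paper's synchronization mechanism is more involved.

```python
class GameStateReplica:
    """Per-proxy replica of game state: updates may arrive in any order,
    yet replicas converge once they exchange states, because merging is
    commutative, associative, and idempotent (last writer wins)."""

    def __init__(self):
        self.state = {}  # entity_id -> (timestamp, value)

    def apply(self, entity_id, timestamp, value):
        """Accept an update only if it is newer than what we hold."""
        current = self.state.get(entity_id)
        if current is None or timestamp > current[0]:
            self.state[entity_id] = (timestamp, value)

    def merge(self, other):
        """Fold another replica's state into ours."""
        for eid, (ts, val) in other.state.items():
            self.apply(eid, ts, val)
```

Two proxies that apply updates in different orders end up with identical state after a mutual merge, which is exactly the convergence property eventual consistency promises.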

Proceedings ArticleDOI
20 Jun 2004
TL;DR: This thesis presents LLEPS, an algorithm which provides a low-latency and efficient packet scheduling service for streaming applications.

Abstract: Adequate bandwidth allocation and strict delay requirements are critical for real-time applications. Packet scheduling algorithms like class-based queueing (CBQ) and nested deficit round robin (nested-DRR) are designed to ensure the bandwidth reservation function. However, they might cause unsteady packet latencies and introduce extra application handling overhead, such as allocating a large buffer for playing the media stream. High and unstable packet latency might jeopardize the corresponding quality of service, since real-time applications prefer low playback latency. Existing scheduling algorithms which keep packet latency stable require knowing the details of individual flows. GPS (generalized processor sharing)-like algorithms do not consider the real behavior of a stream: a real stream is not perfectly smooth after being forwarded by routers, and GPS-like algorithms introduce extra delay on such a stream. This thesis presents LLEPS, an algorithm which provides a low-latency and efficient packet scheduling service for streaming applications.
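For context, the nested-DRR family mentioned above builds on plain deficit round robin, which can be sketched as follows. This is the textbook DRR mechanism, not the thesis's LLEPS algorithm: each flow accumulates `quantum` bytes of credit per round and sends packets while its deficit covers them.

```python
from collections import deque

def drr_schedule(flows, quantum, rounds):
    """Deficit round robin sketch.

    flows: list of deques of packet sizes (bytes); consumed in place.
    Returns the service order as (flow_index, packet_size) tuples.
    """
    deficits = [0] * len(flows)
    served = []
    for _ in range(rounds):
        for i, q in enumerate(flows):
            if not q:
                deficits[i] = 0          # idle flows keep no credit
                continue
            deficits[i] += quantum       # grant this round's credit
            while q and q[0] <= deficits[i]:
                pkt = q.popleft()        # send while credit covers it
                deficits[i] -= pkt
                served.append((i, pkt))
    return served
```

The deficit counter is what lets DRR handle variable packet sizes fairly, but as the abstract notes, per-round batching is also why packet latency can be unsteady.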

01 Jan 2004
TL;DR: To the knowledge, this is the first peer-to-peer protocol that is both cheat-proof and maintains the low latency required by interactive, real-time games.
Abstract: In this paper, we describe a new protocol for ordering events in peer-to-peer games that is provably cheat-proof. We describe how we can optimize this protocol to react to changing delays and congestion in the network. We validate our protocol through simulations and demonstrate its feasibility as a real-time, interactive protocol. To our knowledge, this is the first peer-to-peer protocol that is both cheat-proof and maintains the low latency required by interactive, real-time games.
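The core cryptographic trick behind cheat-proof event ordering in protocols of this kind is commit-then-reveal, shown generically below (the paper's protocol layers pipelining and congestion adaptivity on top of this idea; the hash format here is our own choice): each player first broadcasts a hash of its move, and reveals the move only after collecting everyone's hashes, so no one can choose a move after seeing an opponent's.

```python
import hashlib

def commit(move: str, nonce: str) -> str:
    """Commitment to a move: SHA-256 of a secret nonce plus the move.
    The nonce prevents guessing commitments to a small move space."""
    return hashlib.sha256(f"{nonce}:{move}".encode()).hexdigest()

def verify(move: str, nonce: str, commitment: str) -> bool:
    """Check a revealed (move, nonce) pair against the earlier hash."""
    return commit(move, nonce) == commitment
```

A player who tries to swap moves after the commit round is caught because the revealed move no longer matches the broadcast hash.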

Patent
25 Oct 2004
TL;DR: In this article, a system and method for implementing an analog-to-digital converter (ADC) is described, which includes a converter for generating a timed pulse based on an analog signal and a control signal.
Abstract: A system and method for implementing an analog-to-digital converter (ADC). The ADC includes a converter for generating a timed pulse based on an analog signal and a control signal. The ADC also includes a timing analyzer for generating a digital signal based on the timed pulse. According to the system and method disclosed herein, the present invention achieves a high sampling rate and low latency at low power.

Patent
07 Jun 2004
TL;DR: In this article, a push-to-talk over cellular (PoC) system is provided in which a negative indication to speak may be delivered to a requesting unit when the requested wireless unit is not available.
Abstract: A push-to-talk over cellular (PoC) system is provided in which a negative indication-to-speak may be delivered to a requesting unit when the requested wireless unit is not available. In one embodiment of the instant invention, a first wireless unit is paged in response to receiving a request from the second wireless unit to transmit a message to the first wireless unit. A page response signal is received from the first wireless unit, and the negative indication-to-speak to the second wireless unit is delivered to the second wireless unit in response to receiving the page response signal.

Journal ArticleDOI
TL;DR: The analysis given shows that LLR avoids resource blocking of e-business systems and significantly reduces response time between such systems, and significantly improves the resiliency ofe-business transactions in the event of failures or the unavailability of requested services.
Abstract: This paper proposes a novel protocol for e-business transaction management, called the low latency resilient (LLR) protocol. LLR applies new correctness criteria based upon enforcing semantic atomicity with increased resilience to failure, and has been implemented as a prototype system for e-business transaction processing. The analysis given shows that LLR avoids resource blocking of e-business systems and significantly reduces response time between such systems. In addition, LLR significantly improves the resiliency of e-business transactions in the event of failures or the unavailability of requested services. This is achieved through the use of e-business transactions within which alternative sub-transactions are specified.

Patent
21 May 2004
TL;DR: In this paper, a rate-converting, low-latency, low-power interleaver architecture is implemented using block read-write methods, such that multiple input bits can be written into memory simultaneously.
Abstract: A rate-converting, low-latency, low power interleaver architecture is implemented using block read-write methods. The memory architecture is such that it allows multiple input bits to be written into memory simultaneously. In some embodiments, the number of simultaneous bits written into memory corresponds to an error encoding rate, such that an encoder and interleaver can operate within the same clock domain, regardless of the code rate. The memory architecture also allows an entire row of interleaved data to be read out in one clock cycle.
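In software, the row-write/column-read pattern underlying a block interleaver looks like this. It is a generic sketch of the technique only; the patent's contribution is a memory architecture that performs these row writes and reads in single clock cycles, which sequential code cannot express.

```python
def block_interleave(bits, rows, cols):
    """Write bits row by row into a rows x cols block, read column by
    column. Adjacent input bits end up `rows` positions apart, spreading
    burst errors across codewords."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(bits, rows, cols):
    """Inverse mapping: interleaving with swapped dimensions undoes it."""
    return block_interleave(bits, cols, rows)
```

A burst of errors hitting consecutive interleaved symbols lands on widely separated positions after deinterleaving, which is what makes the downstream error-correcting code effective.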

Patent
03 Mar 2004
TL;DR: In this paper, techniques are discussed to separate a file system and its related meta-data from the associated data stored in a mass storage device, and to store the meta-data on a low-latency random-access storage device with approximately uniform access times.

Abstract: Briefly, techniques are described to separate a file system and its related meta-data from the associated data stored in a mass storage device, and to store the meta-data on a low-latency random-access storage device with approximately uniform access times.

Proceedings ArticleDOI
01 Jun 2004
TL;DR: Modifying the server and kernel to avoid server-induced latency yields both qualitative and quantitative changes in the latency profiles: latency drops by more than an order of magnitude, and the effective service discipline also improves.

Abstract: We investigate the origins of server-induced latency to understand how to improve latency optimization techniques. Using the Flash Web server [4], we analyze latency behavior under various loads. Despite latency profiles that suggest standard queuing delays, we find that most latency actually originates from negative interactions between the application and the locking and blocking mechanisms in the kernel. Modifying the server and kernel to avoid these problems yields both qualitative and quantitative changes in the latency profiles: latency drops by more than an order of magnitude, and the effective service discipline also improves. We find our modifications also mitigate service burstiness in the application, reducing the event queue lengths dramatically and eliminating any benefit from application-level connection scheduling. We identify one remaining source of unfairness, related to competition in the networking stack. We show that adjusting the TCP congestion window size addresses this problem, reducing latency by an additional factor of three.

Patent
06 Apr 2004
TL;DR: In this article, a system and method for distributing time division multiplexed (TDM) data through low latency connections between TDM conversion entities using a packet-based infrastructure is presented.
Abstract: One aspect relates to a system and method for distributing TDM data using a packet-based infrastructure. In particular, a system and method is provided for distributing time division multiplexed (TDM) data through low latency connections between TDM conversion entities. In one example, a packet-to-TDM conversion method and device is provided that allows transport of TDM data over a packet-based infrastructure, and a method is provided to create and delete connections among separate conversion devices connected via the transport mechanism. The transport mechanism may include a packet transport such as Ethernet. Data may be switched based on MAC header information in an Ethernet frame. Because, according to one example, the network has a low latency in transmission of TDM data, receivers may be implemented without buffering, and therefore receiver circuitry may be less expensive.

Patent
12 Mar 2004
TL;DR: In this article, a page-event confirmed indication-to-speak method confirms that a targeted but dormant mobile station is within radio reach by first sending a paging message to the mobile station; when the mobile station responds to the page message, an indication-to-speak is delivered to an initiating mobile station to indicate to the user that he/she may now speak.

Abstract: The method described herein is useful in a push-to-talk over cellular (PoC) system to provide an accurate but low-latency indication-to-speak. Generally, the method may be referred to as a page-event confirmed indication-to-speak. The page-event confirmed indication-to-speak accurately confirms that a targeted but dormant mobile station is within radio reach by first sending a paging message to the dormant mobile station. When the dormant mobile station responds to the page message, an indication-to-speak is delivered to an initiating mobile station to indicate to the user that he/she may now speak.

Proceedings ArticleDOI
04 May 2004
TL;DR: Non-linear multiple-input multiple-output (MIMO) control models significantly improve positive process outcomes, while low latency computing grids and advanced control optimization techniques solve the resulting computational problems.

Abstract: Advanced Process Control (APC) plays an important role in the modern semiconductor fab. Using non-linear multiple-input multiple-output (MIMO) control models we significantly improve positive process outcomes. However, non-linear models also impose greater computing requirements, creating a computing bottleneck on the fab-wide scale, especially for wafer-to-wafer control. Low latency computing grids and advanced control optimization techniques solve these computational problems.

Book ChapterDOI
26 Sep 2004
TL;DR: An advanced optical tracking system for computer assisted surgery (CAS) that supports an arbitrary number of cameras that may be placed at suitable positions and allows adaptation to tracking scenarios of different complexity is presented.
Abstract: An advanced optical tracking system for computer assisted surgery (CAS) is presented. The system supports an arbitrary number of cameras that may be placed at suitable positions, e.g. fixed cameras at the ceiling of the operating theater or movable cameras on the operating lamps. The modular, scalable system architecture reduces occlusion problems and allows adaptation to tracking scenarios of different complexity. The camera modules each integrate hardware-based image processing to allow for the low latency of 10 ms required in demanding applications like robot control. As a first application, tracking of a handheld robotic manipulator has been implemented.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: Experimental results show that the proposed sender-based approach outperform the receiver- based approach, and the offline-computation approach very closely approximates the fully-optimized approach.
Abstract: We propose a sender-based rate-distortion optimized framework to stream scalable bitstreams of 3-D wavelet video stored at the sender to a remote receiver. Based on the requests and feedback from the receiver, the source rate-distortion profiles, the desired playout latency and transmission rate, and the network characteristics, the sender optimizes the responses sent to the receiver throughout the video playout session in order to minimize the distortion in the reconstructed frames. Rate-distortion optimized response is formulated as a convex optimization problem, and an offline-computation approach is proposed to further reduce the complexity at the sender. Experimental results show that the proposed sender-based approach outperforms the receiver-based approach, and the offline-computation approach very closely approximates the fully-optimized approach.

Proceedings ArticleDOI
20 Jun 2004
TL;DR: This paper presents the results of an experimental activity concerning the implementation and validation of an EF-PHB service in a high-speed metropolitan optical network with a differentiated services architecture.
Abstract: This paper presents the results of an experimental activity concerning the implementation and validation of an EF-PHB service in a high-speed metropolitan optical network with a differentiated services architecture. As EF-PHB can be used to create low loss, low latency and assured bandwidth services, real-time traffic flows have been aggregated and classified as EF. Measurements have been performed in different operating conditions to evaluate how the setting of the EF class of service parameters affects the time metrics of each traffic flow belonging to the aggregate. The transfer delay and the delay variation values, as specified by the ITU-T recommendation Y.1541 for real-time, jitter sensitive applications, have been taken as QoS performance objectives to satisfy.

01 Nov 2004
TL;DR: This paper introduces an Adaptive, Information-centric and Lightweight MAC (AI-LMAC) protocol that adapts its operation depending on the requirements of the application and presents a completely localised data management framework that helps capture information about traffic patterns in the network.

Abstract: In this paper we present a novel TDMA-based medium access control (MAC) protocol for wireless sensor networks. Unlike conventional MAC protocols which function independently of the application, we introduce an Adaptive, Information-centric and Lightweight MAC (AI-LMAC) protocol that adapts its operation depending on the requirements of the application. We also present a completely localised data management framework that helps capture information about traffic patterns in the network. This information is subsequently used by AI-LMAC to modify its operation accordingly. We present preliminary results showing how the MAC protocol efficiently manages the issues of fairness, latency and message buffer management.

Patent
24 Aug 2004
TL;DR: In this article, the authors propose a method for determining which parts of a network transmission might contain viruses and checking and cleaning only those parts of the transmission which could potentially be infected as soon as sufficient information is available for such a check.
Abstract: This invention relates generally to systems and methods for rapid, low-latency detection of viruses in network transmissions and specifically to methods for determining which parts of a network transmission might contain viruses and checking and cleaning only those parts of the transmission which could potentially be infected as soon as sufficient information is available for such a check.