Showing papers on "Latency (engineering)" published in 2005


Proceedings ArticleDOI
07 Mar 2005
TL;DR: This article presents ×pipes Lite, a design flow for automatic generation of heterogeneous NoCs, based on highly customizable, high-frequency, low-latency NoC modules that are fully synthesizable.
Abstract: The limited scalability of current bus topologies for systems on chips (SoCs) dictates the adoption of networks on chips (NoCs) as a scalable interconnection scheme. Current SoCs are highly heterogeneous in nature, denoting homogeneous, preconfigured NoCs as inefficient drop-in alternatives. While highly parametric, fully synthesizable (soft) NoC building blocks appear as a good match for heterogeneous MPSoC architectures, the impact of instantiation-time flexibility on performance, power and silicon cost has not yet been quantified. The paper details ×pipes Lite, a design flow for automatic generation of heterogeneous NoCs. ×pipes Lite is based on highly customizable, high frequency and low latency NoC modules that are fully synthesizable. Synthesis results provide modules that are directly comparable to, if not better than, the current published state-of-the-art NoCs in terms of area, power, latency and target operating frequency measurements.

159 citations


Patent
11 Mar 2005
TL;DR: In this paper, a real-time bandwidth monitor (RTBM) for VoIP applications senses the call path bandwidth between two endpoints (100 and 130) of a VoIP communication and adapts in real time the packet transmission rate to utilize that bandwidth.
Abstract: A system and method for dynamically adapting the transmission rate of packets in real-time voice over IP communications to the available bandwidth. A real-time bandwidth monitor (RTBM) for VoIP applications senses the call path bandwidth between two endpoints (100 and 130) of a VoIP communication and adapts in real-time the packet transmission rate to utilize that bandwidth. If sufficient bandwidth is available, the RTBM (115) selects a low compression, low latency CODEC (110) to offer best possible voice quality to the user. If the bandwidth is constrained, the RTBM degrades gracefully by switching to a high compression CODEC (140). On further bandwidth reduction, the RTBM increases the media frames per packet. Because the bandwidth reduction may be transitory, the RTBM constantly monitors the end-to-end available bandwidth so as to invoke the CODEC/frame per packet combination that provides the best quality of service (QoS).
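The graceful-degradation policy described in this abstract can be sketched as a simple threshold ladder. This is only an illustration under stated assumptions: the setting labels, the bandwidth thresholds, and the frames-per-packet values below are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    codec: str              # hypothetical label, not taken from the patent
    frames_per_packet: int


# Hypothetical ladder, ordered from best voice quality to most frugal.
# Each entry: (minimum available bandwidth in kbit/s, setting to use).
LADDER = [
    (80.0, Setting("low-compression", 1)),   # ample bandwidth: best quality
    (24.0, Setting("high-compression", 1)),  # constrained: switch CODEC
    (0.0, Setting("high-compression", 3)),   # further reduction: more frames per packet
]


def choose_setting(available_kbps: float) -> Setting:
    """Return the first (best) setting whose bandwidth floor is met."""
    for floor, setting in LADDER:
        if available_kbps >= floor:
            return setting
    # Fall back to the most frugal setting for nonsensical (negative) input.
    return LADDER[-1][1]


if __name__ == "__main__":
    # Re-evaluate on every new bandwidth measurement, since a reduction
    # may be transitory.
    for measured in (120.0, 40.0, 10.0):
        print(measured, choose_setting(measured))
```

Calling `choose_setting` on each fresh measurement mirrors the constant monitoring the abstract describes: when bandwidth recovers, the same ladder moves the call back to the higher-quality setting.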

108 citations


Journal ArticleDOI
TL;DR: This article evaluates different low-latency schemes based on mobile IP and compares their performance in terms of disruption time for VoIP services, focusing on network-layer mobility and mobile IP since it is a natural candidate for providing such mobility.
Abstract: The introduction of IP-based real-time services in next-generation mobile systems requires coupling mobility with quality of service. The mobility of the node can disrupt or even intermittently disconnect an ongoing real-time session. The duration of such an interruption is called disruption time or handover latency, and can heavily affect user satisfaction. Therefore, this delay needs to be minimized to provide good quality of VoIP services. In this article, we focus on network-layer mobility and mobile IP since it is a natural candidate for providing such mobility. We evaluate different low-latency schemes based on mobile IP and compare their performances in terms of disruption time for VoIP services. Low-latency handoffs are performed by anticipating and/or postponing the mobile IP registration process. With these methods, disruption time is reduced to 200 ms in most considered cases.

88 citations


Journal ArticleDOI
08 Jul 2005
TL;DR: A very flexible network design is proposed that is highly scalable, and can be easily changed to accommodate various needs, suitable for building networks with irregular topologies, and with low latency and high throughput.
Abstract: Network-on-chip designs promise to offer considerable advantages over the traditional bus-based designs in solving the numerous technological, economic and productivity problems associated with billion-transistor system-on-chip development. The authors believe that different types of networks will be required, depending on the application domain. Therefore, a very flexible network design is proposed that is highly scalable, and can be easily changed to accommodate various needs. A network-on-chip design, realised as part of the platform that the authors are developing for reconfigurable systems, is presented. This design is suitable for building networks with irregular topologies, and with low latency and high throughput.

80 citations


Proceedings ArticleDOI
25 Sep 2005
TL;DR: The proposed BSN-MAC is an adaptive, feedback-based and IEEE 802.15.4-compatible MAC protocol that exploits the feedback information from the deployed sensors to form a closed-loop control of the MAC parameters.
Abstract: In this paper, BSN-MAC, a medium access control (MAC) protocol designed for Body Sensor Networks (BSNs), is proposed. Due to the traffic coupling and sensor diversity characteristics of BSNs, common MAC protocols cannot satisfy the unique requirements of the biomedical sensors in BSNs. BSN-MAC exploits the feedback information from the deployed sensors to form a closed-loop control of the MAC parameters. A control algorithm is proposed to enable the BSN coordinator to adjust parameters of the IEEE 802.15.4 superframe to achieve both energy efficiency and low latency on energy critical nodes. We evaluate the performance of BSN-MAC by comparing it with the IEEE 802.15.4 MAC protocol using energy efficiency as the primary metric.

77 citations


Patent
12 Jan 2005
TL;DR: A low latency switch architecture for high performance packet-switched networks is proposed in this paper, which is a combination of input buffers capable of avoiding head-of-line blocking and an internal switch interconnect capable of allowing different input ports to access a single output simultaneously.
Abstract: A low latency switch architecture for high performance packet-switched networks which is a combination of input buffers capable of avoiding head-of-line blocking and an internal switch interconnect capable of allowing different input ports to access a single output simultaneously.

68 citations


Proceedings ArticleDOI
13 Jun 2005
TL;DR: An additional flow control is described which enhances the overall performance of the multi-stream DDR-SDRAM controller IP and demonstrates the superiority of this architecture with an FPGA based high-end video platform.
Abstract: Today, high-end video and multimedia processing applications require huge amounts of memory. For cost reasons, the usage of conventional dynamic RAM (SDRAM) is preferred. However, SDRAM access optimization is a complex task, especially if multi-stream access with different QoS requirements is involved. In (Heithecker et al., 2003), a multi-stream DDR-SDRAM controller IP covering combinations of low latency requirements for processor cache access, hard real-time constraints for periodic video signals and hard real-time bursty accesses for video coprocessors was described. To handle these contradictory QoS requirements at high system performance, a combination of a 2-stage scheduling algorithm and static priorities was used. This paper describes an additional flow control which enhances the overall performance. Experiments with an FPGA based high-end video platform demonstrate the superiority of this architecture.

66 citations


Patent
25 May 2005
TL;DR: A video processor uses a low latency pyramid processing technique for fusing images from multiple sensors (115): the imagery is enhanced (202, 204, 206, 208, 210), warped into alignment (212, 214, 216, 218) and then fused (236, 238) so that the fusing occurs within a single frame of video, i.e., sub-frame processing.
Abstract: A video processor (114) that uses a low latency pyramid processing technique for fusing images from multiple sensors (115). The imagery from multiple sensors (115) is enhanced (202, 204, 206, 208, 210), warped into alignment (212,214,216,218) and then fused with one another (236,238) in a manner that provides the fusing to occur within a single frame of video, i.e., sub-frame processing. Such sub-frame processing results in a sub-frame delay between a moment of capturing the images to the display of the fused imagery.

66 citations


Proceedings ArticleDOI
27 Jun 2005
TL;DR: The floating-point unit in the synergistic processor element of the 1st generation multi-core CELL processor is described, optimizing the performance critical single precision FMA operations, which are executed with a 6-cycle latency at an 11FO4 cycle time.
Abstract: The floating-point unit in the synergistic processor element of the 1st generation multi-core CELL processor is described. The FPU supports 4-way SIMD single precision and integer operations and 2-way SIMD double precision operations. The design required a high-frequency, low latency, power and area efficiency with primary application to the multimedia streaming workloads, such as 3D graphics. The FPU has 3 different latencies, optimizing the performance critical single precision FMA operations, which are executed with a 6-cycle latency at an 11FO4 cycle time. The latency includes the global forwarding of the result. These challenging performance, power, and area goals were achieved through the co-design of architecture and implementation with optimizations at all levels of the design. This paper focuses on the logical and algorithmic aspects of the FPU we developed, to achieve these goals.

64 citations


Patent
05 Aug 2005
TL;DR: In this paper, an asynchronous network interface and method of synchronisation between two applications on different computers is provided, where the network interface contains snooping hardware which can be programmed to contain triggering values comprising either addresses, address ranges or other data which are to be matched.
Abstract: Asynchronous network interface and method of synchronisation between two applications on different computers is provided. The network interface contains snooping hardware which can be programmed to contain triggering values comprising either addresses, address ranges or other data which are to be matched. These data are termed “trip wires”. Once programmed, the interface monitors the data stream, including address data, passing through the interface for addresses and data which match the trip wires which have been set. On a match, the snooping hardware can generate interrupts, increment event counters, or perform some other application-specified action. This snooping hardware is preferably based upon Content-Addressable Memory. The invention thus provides in-band synchronisation by using synchronisation primitives which are programmable by user level applications, while still delivering high bandwidth and low latency. The programming of the synchronisation primitives can be made by the sending and receiving applications independently of each other and no synchronisation information is required to traverse the network.

43 citations


Proceedings ArticleDOI
13 Jun 2005
TL;DR: Initial experimental results indicate that these two approaches can be composed to yield voice quality on par with the PSTN, and deployed on the routers of an application-level overlay network and require no changes to the underlying infrastructure.
Abstract: The cost savings and novel features associated with Voice over IP (VoIP) are driving its adoption by service providers. Such a transition however can successfully happen only if the quality and reliability offered is comparable to the existing PSTN. Unfortunately, the Internet's best effort service model provides no inherent quality of service guarantees. Because low latency and jitter are the key requirements for supporting high quality interactive conversations, VoIP applications use UDP to transfer data, thereby subjecting themselves to performance degradations caused by packet loss and network failures. In this paper we describe two algorithms to improve the performance of such VoIP applications. These mechanisms are used for localized packet loss recovery and rapid rerouting in the event of network failures. The algorithms are deployed on the routers of an application-level overlay network and require no changes to the underlying infrastructure. Initial experimental results indicate that these two approaches can be composed to yield voice quality on par with the PSTN.

Proceedings ArticleDOI
15 Nov 2005
TL;DR: The use of UDP as a binding protocol for SOAP provides throughput that is ten times higher compared to SOAP-over-HTTP in a wireless setting, and using UDP to transport SOAP messages reduces transmission overhead by more than 50% compared to SOAP-over-HTTP.
Abstract: Existing Web services rely on HTTP and TCP as the underlying transport protocols for SOAP messaging. While these protocols provide a number of benefits, including being able to pass through firewalls and being universally supported across different platforms, they were designed for wired networks with high bandwidth, low latency and low error rate transmissions. Due to the variability of wireless channels however, these assumptions do not hold in wireless environments. In this paper, we investigate the performance of HTTP and TCP as transport protocols for SOAP in wireless environments. Through extensive testing, we show that SOAP-over-HTTP and SOAP-over-TCP are inefficient and lead to high latency and transmission overhead for wireless applications. To overcome these limitations, we study the use of UDP as a binding protocol for SOAP. The results obtained are promising and show that SOAP-over-UDP provides throughput that is ten times higher compared to SOAP-over-HTTP in a wireless setting. Furthermore, using UDP to transport SOAP messages reduces transmission overhead by more than 50% compared to SOAP-over-HTTP. Finally, to illustrate where UDP binding can be useful, example applications are also described in this paper.
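To make the idea of a UDP binding concrete, the following minimal sketch sends one SOAP envelope as a single datagram over the loopback interface, with no connection setup and no HTTP headers. It is not the binding evaluated in the paper: the `<Ping/>` body element and the function name `send_soap_over_udp` are hypothetical, and a real binding would also need to handle loss, reordering, and message size limits.

```python
import socket

# A tiny SOAP 1.2 envelope; the <Ping/> body element is a made-up placeholder.
SOAP_REQUEST = (
    '<?xml version="1.0"?>'
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    "<soap:Body><Ping/></soap:Body>"
    "</soap:Envelope>"
).encode("utf-8")


def send_soap_over_udp(payload: bytes) -> bytes:
    """Send the payload as one datagram over loopback; return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        # No handshake and no HTTP framing: the envelope is the whole datagram.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    received = send_soap_over_udp(SOAP_REQUEST)
    print(len(received), "bytes received")
```

The sketch only shows the transport mechanics; it makes no claim about the throughput or overhead figures reported in the abstract.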

Patent
21 Nov 2005
TL;DR: In this paper, a pair of processing modules and methods that enable low latency communications between a data processing system and devices located at a remote graphic user interface across a standard shared network in accordance with the present invention is disclosed.
Abstract: A pair of processing modules and methods that enable low latency communications between a data processing system and devices located at a remote graphic user interface across a standard shared network in accordance with the present invention is disclosed. The present invention provides a method for communicating graphics data in a synchronous manner from the data processing system to the user. This method is used in conjunction with a feedback error recovery method to provide lossless, low-latency communications of graphics data across the network.

01 Dec 2005
TL;DR: This document examines this overhead, and addresses an architectural, IP-based "copy avoidance" solution for its elimination, by enabling Remote Direct Memory Access (RDMA).
Abstract: Overhead due to the movement of user data in the end-system network I/O processing path at high speeds is significant, and has limited the use of Internet protocols in interconnection networks, and the Internet itself -- especially where high bandwidth, low latency, and/or low overhead are required by the hosted application. This document examines this overhead, and addresses an architectural, IP-based "copy avoidance" solution for its elimination, by enabling Remote Direct Memory Access (RDMA). This memo provides information for the Internet community.

Proceedings ArticleDOI
17 Aug 2005
TL;DR: A focus is on solving the technical issues related to the electronic control path, and it is shown that it is feasible at the targeted design point.
Abstract: A crucial part of any high-performance computing system is its interconnection network. In the OSMOSIS project, Corning and IBM are jointly developing a demonstrator interconnect based on optical cell switching with electronic control. Starting from the core set of requirements, we present the system design rationale and show how it impacts the practical implementation. Our focus is on solving the technical issues related to the electronic control path, and we show that it is feasible at the targeted design point.

01 Jan 2005
TL;DR: Drawing on the experiences deploying Tor (the second-generation onion routing network), social challenges and technical issues that must be faced in building, deploying, and sustaining a scalable, distributed, low-latency anonymity network are described.
Abstract: There are many unexpected or unexpectedly difficult obstacles to deploying anonymous communications. Drawing on our experiences deploying Tor (the second-generation onion routing network), we describe social challenges and technical issues that must be faced in building, deploying, and sustaining a scalable, distributed, low-latency anonymity network.

Proceedings ArticleDOI
17 Jan 2005
TL;DR: In this article, a peer-to-peer streaming architecture called ACTIVE is proposed, which is based on the following observation: even in large group discussions only a fraction of the users are active at a given time.
Abstract: Peer-to-peer (P2P) streaming is emerging as a viable communications paradigm. Recent research has focused on building efficient and optimal overlay multicast trees at the application level. However, scant attention has been paid to interactive scenarios where the end-to-end delay is crucial. Furthermore, even algorithms that construct an optimal minimum spanning tree often make the unreasonable assumption that the processing time involved at each node is zero. However, these delays can add up to a significant amount of time after just a few overlay hops and make interactive applications difficult. In this paper, we introduce a novel peer-to-peer streaming architecture called ACTIVE that is based on the following observation. Even in large group discussions only a fraction of the users are active at a given time. We term these users, who have more critical demands for low-latency, active users. The ACTIVE system significantly reduces the end-to-end delay experienced among active users while at the same time being capable of providing streaming services to very large multicast groups. ACTIVE uses realistic processing assumptions at each node and dynamically optimizes the multicast tree while the group of active users changes over time. Consequently, it provides virtually all users with the low-latency service that before was only possible with a centralized approach. We present results that show the feasibility and performance of our approach.

Proceedings ArticleDOI
14 Aug 2005
TL;DR: This study proposes a new MAC layer protocol called MOBMAC to support mobility in WSNs, which uses an adaptive frame size approach to overcome the effect of frame losses caused by the Doppler shifts in mobile scenarios.
Abstract: Numerous MAC protocols have been proposed for stationary wireless sensor networks (WSNs). However, there have been very few approaches proposed to make the MAC layer in WSNs suitable for mobile scenarios. We propose a new MAC layer protocol called MOBMAC to support mobility in WSNs. MOBMAC uses an adaptive frame size approach to overcome the effect of frame losses caused by the Doppler shifts in mobile scenarios. An extended Kalman filter (EKF) is used to predict the frame size for each transmission, which enhances the energy efficiency of the system and minimizes latencies. Our study shows that under mobile scenarios, MOBMAC reduces energy consumption by 60% and shows a decrease of 25% in latency in comparison with the well known base protocol SMAC. Our study also includes the contribution of a more realistic physical layer implementation model in ns-2, which processes the received frame based not only on the fading characteristics of the signal but also on the SNR and relative velocity between the communicating sensor nodes.

Journal ArticleDOI
01 Mar 2005
TL;DR: Preliminary benchmark results are discussed, showing performance similar to or better than that found in high-end commercial network systems.
Abstract: Developed by the APE group, APENet is a new high speed, low latency, 3-dimensional interconnect architecture optimized for PC clusters running LQCD-like numerical applications. The hardware implementation is based on a single PCI-X 133MHz network interface card hosting six independent bi-directional channels with a peak bandwidth of 676 MB/s each direction. We discuss preliminary benchmark results showing exciting performances similar to or better than those found in high-end commercial network systems.

Proceedings ArticleDOI
24 Oct 2005
TL;DR: XenoSearch is introduced and evaluated, a new distributed service for selecting the machines to host components of multi-node distributed systems and which is uniquely able to express and efficiently answer complex queries with inter-related location constraints.
Abstract: The high bandwidth and low latency of the modern internet has made possible the deployment of distributed computing platforms. The XenoServe platform provides a distributed computing platform open to all and presents three major new challenges for resource discovery: Firstly, network location is key for effectively provisioning services, to mitigate against high-latency, high-load or component failure. Secondly, many services require a presence on several servers, with inter-related requirements. Finally, as the platform is open with respect to users and servers, large numbers of queries and updates are expected. To address these requirements we introduce and evaluate XenoSearch, a new distributed service for selecting the machines to host components of multi-node distributed systems and which is uniquely able to express and efficiently answer complex queries with inter-related location constraints. We demonstrate that XenoSearch represents a trade-off between accuracy and query time which avoids exhaustive search and supports multiple resources. In addition the performance of the algorithm and the quality of its server selections is investigated and the performance of the distributed service shown to be invariant as the number of nodes or items indexed increases.

Patent
12 Aug 2005
TL;DR: In this article, it is determined that a processing system is to transition from a higher-power state to a lower power state, and system information may then be copied from a volatile memory device to a low-latency persistent memory device.
Abstract: According to some embodiments, it may be determined that a processing system is to transition from a higher-power state to a lower-power state. System information may then be copied from a volatile memory device to a low-latency persistent memory device, and it may be arranged for the processing system to transition from the higher-power state to the lower-power state.

Proceedings ArticleDOI
25 Sep 2005
TL;DR: FuzzyMAC is proposed, a CSMA/CA based MAC protocol that utilizes two separate fuzzy logic controllers to optimize both MAC parameters and a sleeping schedule duty-cycle; some increased latency results from the adaptive sleeping cycles.
Abstract: Quality-of-Service, fairness, low power consumption, low latency and high throughput are all desirable attributes of medium access control (MAC) protocols. MAC protocols for sensor networks are designed with one principal attribute in mind - low power consumption. Minimizing power usage becomes the principal objective of MAC protocol design. This paper proposes FuzzyMAC, a CSMA/CA based MAC protocol that utilizes two separate fuzzy logic controllers to optimize both MAC parameters and a sleeping schedule duty-cycle. A second fuzzy logic controller attempts to optimize the size of the contention window using three performance metrics as inputs. The primary goal of both fuzzy logic controllers is to ensure maximum power efficiency of the network, which is demonstrated by the results. However, some increased latency results from the adaptive sleeping cycles.

Patent
29 Jun 2005
TL;DR: In this paper, a data buffer that is a target for data received over a communication channel is examined, and a device associated with the channel is polled, to find, process, and return data transmitted over the channel.
Abstract: A data buffer that is a target for data received over a communication channel is examined, and a device associated with the communication channel is polled, to find, process, and return data transmitted over the channel. Other methods and apparatus to reduce network latency are described and claimed.

Proceedings ArticleDOI
16 May 2005
TL;DR: A cross-layer measurement and link analysis strategy for video transport over IEEE 802.11 is introduced, with field trial measurement data presented for streamed H.264 video over ad-hoc 802.11g links.
Abstract: This work introduces a cross-layer measurement and link analysis strategy for video transport over IEEE 802.11. Field trial measurement data is presented for streamed H.264 video over ad-hoc 802.11g links. The data is used to analyze the interactions between the physical/network/transport and application layers. The development of cross layer optimized low latency error resilient video transmission schemes is discussed.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: The proposed BSN-MAC is an adaptive, feedback-based and IEEE 802.15.4-compatible MAC protocol that exploits the feedback information from the deployed sensors to form a closed-loop control of the MAC parameters.
Abstract: In this paper, a medium access control (MAC) protocol designed for body sensor networks (BSN-MAC) is proposed. BSN-MAC is an adaptive, feedback-based and IEEE 802.15.4-compatible MAC protocol. Due to the traffic coupling and sensor diversity characteristics of BSNs, common MAC protocols cannot satisfy the unique requirements of the biomedical sensors in BSNs. BSN-MAC exploits the feedback information from the deployed sensors to form a closed-loop control of the MAC parameters. A control algorithm is proposed to enable the BSN coordinator to adjust parameters of the IEEE 802.15.4 superframe to achieve both energy efficiency and low latency on energy critical nodes. We evaluate the performance of BSN-MAC using energy efficiency as the primary metric.

Proceedings ArticleDOI
02 Apr 2005
TL;DR: This paper compares performance - bandwidth, bandwidth density, latency and power consumption - of the package level transmission lines with conventional on-chip global interconnects for different ITRS technology nodes and shows package level interconnects are an effective alternative for on-chip global wiring.
Abstract: Scaling enhances intrinsic transistor performance and degrades interconnects. As the technology steps into the nanometer era, global interconnects are becoming a bottleneck for overall chip performance. In this paper, we show package level interconnects are an effective alternative for on-chip global wiring. These interconnects behave as LC transmission lines and can be exploited for their near speed of light transmission and low attenuation characteristics. We compare performance - bandwidth, bandwidth density, latency and power consumption - of the package level transmission lines with conventional on-chip global interconnects for different ITRS technology nodes. Based on these results, we show package level interconnects are well suited for power demanding low latency applications and we analyze different interconnect options like memory buses, long inter tile interconnects, clock and power distribution.

01 Jan 2005
TL;DR: Cluster computers (parallel computers built from commodity processors) are becoming the predominant supercomputer architecture because of their combined scalable performance and attractive price.
Abstract: Cluster computers (parallel computers built from commodity processors) are becoming the predominant supercomputer architecture because of their combined scalable performance and attractive price. As of June 2005, 61 percent of the world’s top-500 supercomputers were clusters (http://www.top500.org). This is a significant paradigm shift from a few decades ago, when supercomputers were special purpose, like the Cray vector machines, and designers built them from expensive, custom components. Clusters that use commodity processors still require high-performance, low-latency networks, if their applications are fine-grained, or if the cluster has many processors. Clusters can use commodity networks, such as Gigabit Ethernet, but these fall short in many scalability and performance aspects.

Patent
05 Jan 2005
TL;DR: In this paper, a look-ahead technique was used to improve the performance of a clock and data recovery (CDR) circuit by employing lookahead techniques to produce a low latency timing adjustment.
Abstract: The present invention enhances the performance of a clock and data recovery (CDR) circuit by employing look-ahead techniques to produce a low latency timing adjustment. In one example of the invention employed in a CDR circuit having a decimation filter processing the CDR's phase detector output, the invention uses the most significant bits of the decimation filter output to quickly determine a look-ahead adjustment.

Journal ArticleDOI
TL;DR: A high-throughput architecture of a decoder for structured LDPC codes is presented, thanks to the peculiar code definition and to the envisaged architecture featuring memory paging, and the support of different code rates is achieved with no significant hardware overhead.
Abstract: As an enhancement of the state-of-the-art solutions, a high-throughput architecture of a decoder for structured LDPC codes is presented in this paper. Thanks to the peculiar code definition and to the envisaged architecture featuring memory paging, the decoder is very flexible, and the support of different code rates is achieved with no significant hardware overhead. A top-down design flow of a real decoder is reported, starting from the analysis of the system performance in finite-precision arithmetic, up to the VLSI implementation details of the elementary modules. The synthesis of the whole decoder on 0.18 μm standard cells CMOS technology showed remarkable performances: small implementation loss (0.2 dB down to BER = 10^-8), low latency (less than 6.0 μs), high useful throughput (up to 940 Mbps) and low complexity (about 375 Kgates).

Proceedings ArticleDOI
18 May 2005
TL;DR: An approach to real-time Java based on ahead-of-time compilation is presented, real-time properties and problems are examined, and optimizations in both the compiler and run-time system are presented.
Abstract: One of the main challenges in getting acceptance for safe object-oriented languages in hard real-time systems is to combine automatic memory management with hard real-time constraints, while providing adequate general execution performance. An approach to real-time Java based on ahead-of-time compilation is presented, and real-time properties and problems are examined. In particular, achieving both low latency and high throughput in an environment where neither the back-end compiler nor the scheduler is aware of automatic memory management is considered. Optimizations in both the compiler and run-time system, aimed at reducing the execution time overhead while still allowing very short latency times, are presented and experimentally verified.