
Showing papers on "Latency (engineering) published in 2013"


Proceedings Article
02 Apr 2013
TL;DR: The evaluation shows that the Eiger system achieves low latency, has throughput competitive with eventually-consistent and non-transactional Cassandra, and scales out to large clusters almost linearly (averaging 96% increases up to 128 server clusters).
Abstract: We present the first scalable, geo-replicated storage system that guarantees low latency, offers a rich data model, and provides "stronger" semantics. Namely, all client requests are satisfied in the local datacenter in which they arise; the system efficiently supports useful data model abstractions such as column families and counter columns; and clients can access data in a causally-consistent fashion with read-only and write-only transactional support, even for keys spread across many servers. The primary contributions of this work are enabling scalable causal consistency for the complex column-family data model, as well as novel, non-blocking algorithms for both read-only and write-only transactions. Our evaluation shows that our system, Eiger, achieves low latency (single-ms), has throughput competitive with eventually-consistent and non-transactional Cassandra (less than 7% overhead for one of Facebook's real-world workloads), and scales out to large clusters almost linearly (averaging 96% increases up to 128 server clusters).

284 citations


Proceedings ArticleDOI
15 Apr 2013
TL;DR: This work advocates a powerful new abstraction called resilient substitution that caters to the specific needs in this new computation model to handle failure recovery and dynamic reconfiguration in response to load changes.
Abstract: TimeStream is a distributed system designed specifically for low-latency continuous processing of big streaming data on a large cluster of commodity machines. The unique characteristics of this emerging application domain have led to a significantly different design from the popular MapReduce-style batch data processing. In particular, we advocate a powerful new abstraction called resilient substitution that caters to the specific needs in this new computation model to handle failure recovery and dynamic reconfiguration in response to load changes. Several real-world applications running on our prototype have been shown to scale robustly with low latency while at the same time maintaining the simple and concise declarative programming model. TimeStream handles an on-line advertising aggregation pipeline at a rate of 700,000 URLs per second with a 2-second delay, while performing sentiment analysis of Twitter data at a peak rate close to 10,000 tweets per second, with approximately 2-second delay.

262 citations


Posted Content
TL;DR: In this paper, the authors argue that the use of redundancy is an effective way to convert extra capacity into reduced latency, and demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.
Abstract: Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency --- especially the tail of the latency distribution --- can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.

184 citations
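The paper's core claim can be illustrated with a toy model: replicate every operation, keep the first completion, and the tail's contribution to mean latency collapses. The latency distribution, outlier probability, and trial counts below are invented for illustration only:

```python
import random

def request_latency(rng):
    """Simulated per-request latency: usually 1 unit, occasionally a 50-unit outlier."""
    return 1.0 if rng.random() < 0.95 else 50.0

def latency_with_redundancy(rng, copies):
    """Issue `copies` redundant requests across diverse resources and
    use the first result that completes."""
    return min(request_latency(rng) for _ in range(copies))

def mean_latency(copies, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(latency_with_redundancy(rng, copies) for _ in range(trials)) / trials
```

With duplication, both copies must hit the outlier (probability 0.05² = 0.0025) for the request to be slow, so the mean drops from about 3.45 to about 1.12 units, at the cost of doubled utilization — exactly the tradeoff the paper characterizes.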


Proceedings ArticleDOI
09 Dec 2013
TL;DR: Digit-Reversal Bouncing achieves perfect packet interleaving and results in smaller and bounded queues even when traffic load approaches 100%, and it uses smaller re-sequencing buffer for absorbing out-of-order packet arrivals.
Abstract: Clos-based networks including Fat-tree and VL2 are being built in data centers, but existing per-flow based routing causes low network utilization and long latency tail. In this paper, by studying the structural properties of Fat-tree and VL2, we propose a per-packet round-robin based routing algorithm called Digit-Reversal Bouncing (DRB). DRB achieves perfect packet interleaving. Our analysis and simulations show that, compared with random-based load-balancing algorithms, DRB results in smaller and bounded queues even when traffic load approaches 100%, and it uses smaller re-sequencing buffer for absorbing out-of-order packet arrivals. Our implementation demonstrates that our design can be readily implemented with commodity switches. Experiments on our testbed, a Fat-tree with 54 servers, confirm our analysis and simulations, and further show that our design handles network failures in 1-2 seconds and has the desirable graceful performance degradation property.

159 citations
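The digit-reversal permutation at the heart of DRB fits in a few lines. This sketch shows only the permutation itself (inferred from the algorithm's name and the structure of Clos topologies), not the full per-destination bouncing and re-sequencing logic:

```python
import math

def digit_reverse(k, radix, digits):
    """Reverse the base-`radix` digits of k, e.g. 6 = 110 (base 2) -> 011 = 3."""
    out = 0
    for _ in range(digits):
        out = out * radix + k % radix
        k //= radix
    return out

def drb_core_order(num_cores, radix):
    """Order in which one source spreads successive packets over core switches:
    the digit-reversal permutation visits maximally separated cores back to back."""
    digits = round(math.log(num_cores, radix))
    return [digit_reverse(k, radix, digits) for k in range(num_cores)]
```

For example, `drb_core_order(8, 2)` gives `[0, 4, 2, 6, 1, 5, 3, 7]`: consecutive packets never land on neighboring cores, which is the interleaving property behind the small, bounded queues.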


Proceedings ArticleDOI
01 Nov 2013
TL;DR: A method for low-latency pose tracking using a DVS and Active LED Markers (ALMs), LEDs blinking at high frequency (>1 kHz), compared against traditional pose tracking based on a CMOS camera.
Abstract: At the current state of the art, the agility of an autonomous flying robot is limited by its sensing pipeline, because the relatively high latency and low sampling frequency limit the aggressiveness of the control strategies that can be implemented. To obtain more agile robots, we need faster sensing pipelines. A Dynamic Vision Sensor (DVS) is a very different sensor than a normal CMOS camera: rather than providing discrete frames like a CMOS camera, the sensor output is a sequence of asynchronous timestamped events each describing a change in the perceived brightness at a single pixel. The latency of such sensors is measured in microseconds, thus offering the theoretical possibility of creating a sensing pipeline whose latency is negligible compared to the dynamics of the platform. However, to use these sensors we must rethink the way we interpret visual data. This paper presents a method for low-latency pose tracking using a DVS and Active LED Markers (ALMs), which are LEDs blinking at high frequency (>1 kHz). The sensor's time resolution allows distinguishing different frequencies, thus avoiding the need for data association. This approach is compared to traditional pose tracking based on a CMOS camera. The DVS performance is not affected by fast motion, unlike the CMOS camera, which suffers from motion blur.

93 citations
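The data-association-free idea can be sketched minimally: estimate a blink frequency from event timestamps at one image location and match it to the nearest known marker frequency. This assumes, for simplicity, one DVS event per blink period; the real pipeline also handles event polarity and spatial clustering:

```python
def classify_marker(event_times_us, marker_freqs_hz):
    """Estimate a blink frequency from event timestamps (in microseconds) at one
    image location, then return the nearest known Active LED Marker frequency."""
    intervals = [b - a for a, b in zip(event_times_us, event_times_us[1:])]
    mean_period_us = sum(intervals) / len(intervals)
    estimated_hz = 1e6 / mean_period_us
    return min(marker_freqs_hz, key=lambda f: abs(f - estimated_hz))
```

Events arriving roughly every 500 µs match a 2 kHz marker even with timing jitter, so each marker identifies itself by frequency and no data-association step is needed.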


Proceedings Article
01 Jan 2013
TL;DR: Zoolander is presented, a key value store that meets strict, low latency service level objectives (SLOs), and scales out using replication for predictability, an old but seldom-used approach that uses redundant accesses to mask outlier response times.
Abstract: Internet services access networked storage many times while processing a request. Just a few slow storage accesses per request can raise response times a lot, making the whole service less usable and hurting profits. This paper presents Zoolander, a key value store that meets strict, low latency service level objectives (SLOs). Zoolander scales out using replication for predictability, an old but seldom-used approach that uses redundant accesses to mask outlier response times. Zoolander also scales out using traditional replication and partitioning. It uses an analytic model to efficiently combine these competing approaches based on systems data and workload conditions. For example, when workloads under utilize system resources, Zoolander’s model often suggests replication for predictability, strengthening service levels by reducing outlier response times. When workloads use system resources heavily, causing large queuing delays, Zoolander’s model suggests scaling out via traditional approaches. We used a diurnal trace to test Zoolander at scale (up to 40M accesses per hour). Zoolander reduced SLO violations by 32%.

73 citations


Proceedings ArticleDOI
27 Aug 2013
TL;DR: Aqua, a high-bandwidth anonymity system that resists traffic analysis, is presented, and it is shown that Aqua achieves latency low enough for efficient bulk TCP flows, bandwidth sufficient to carry BitTorrent traffic with reasonable efficiency, and resistance to traffic analysis within anonymity sets of hundreds of clients.
Abstract: Existing IP anonymity systems tend to sacrifice one of low latency, high bandwidth, or resistance to traffic-analysis. High-latency mix-nets like Mixminion batch messages to resist traffic-analysis at the expense of low latency. Onion routing schemes like Tor deliver low latency and high bandwidth, but are not designed to withstand traffic analysis. Designs based on DC-nets or broadcast channels resist traffic analysis and provide low latency, but are limited to low bandwidth communication. In this paper, we present the design, implementation, and evaluation of Aqua, a high-bandwidth anonymity system that resists traffic analysis. We focus on providing strong anonymity for BitTorrent, and evaluate the performance of Aqua using traces from hundreds of thousands of actual BitTorrent users. We show that Aqua achieves latency low enough for efficient bulk TCP flows, bandwidth sufficient to carry BitTorrent traffic with reasonable efficiency, and resistance to traffic analysis within anonymity sets of hundreds of clients. We conclude that Aqua represents an interesting new point in the space of anonymity network designs.

72 citations


01 Jan 2013
TL;DR: In this article, the first low-latency search for gravitational-waves from binary inspirals in LIGO and Virgo data was conducted, and the resulting triggers were sent to electromagnetic observatories for followup.
Abstract: Aims: The detection and measurement of gravitational-waves from coalescing neutron-star binary systems is an important science goal for ground-based gravitational-wave detectors. In addition to emitting gravitational-waves at frequencies that span the most sensitive bands of the LIGO and Virgo detectors, these sources are also amongst the most likely to produce an electromagnetic counterpart to the gravitational-wave emission. A joint detection of the gravitational-wave and electromagnetic signals would provide a powerful new probe for astronomy. Methods: During the period between September 19 and October 20, 2010, the first low-latency search for gravitational-waves from binary inspirals in LIGO and Virgo data was conducted. The resulting triggers were sent to electromagnetic observatories for followup. We describe the generation and processing of the low-latency gravitational-wave triggers. The results of the electromagnetic image analysis will be described elsewhere. Results: Over the course of the science run, three gravitational-wave triggers passed all of the low-latency selection cuts. Of these, one was followed up by several of our observational partners. Analysis of the gravitational-wave data leads to an estimated false alarm rate of once every 6.4 days, falling far short of the requirement for a detection based solely on gravitational-wave data.

64 citations


Patent
04 Oct 2013
TL;DR: In this article, a system for processing user input includes an input device, an input processing unit, a high latency subsystem, a low-latency subsystem, and an output device.
Abstract: A system for processing user input includes an input device, an input processing unit, a high-latency subsystem, a low-latency subsystem, input processing unit software for generating signals in response to user inputs, and an output device. The low-latency subsystem receives the signals and generates low-latency output, and the high-latency subsystem processes the signals and generates high-latency output.

58 citations


Book ChapterDOI
08 Apr 2013
TL;DR: LOLA (LOw LAtency audio visual streaming system), a system for distributed performing arts interaction over advanced packet networks, demonstrated its effectiveness and suitability for distance musical interaction, even when professional players are involved and very "tempo sensitive" classical baroque music repertoire is concerned.
Abstract: We present LOLA (LOw LAtency audio visual streaming system), a system for distributed performing arts interaction over advanced packet networks. It is intended to operate on high performance networking infrastructures, and is based on low latency audio/video acquisition hardware and on the integration and optimization of audio/video data acquisition, presentation and transmission. The extremely low round trip delay of the transmitted data makes the system suitable for remote musical education, real time distributed musical performance and performing arts activities, but in general also for any human-human interactive distributed activity in which timing and responsiveness are critical factors for the quality of the interaction. The experimentation conducted so far with professional music performers and skilled music students, on geographical distances up to 3500 Km, demonstrated its effectiveness and suitability for distance musical interaction, even when professional players are involved and very "tempo sensitive" classical baroque music repertoire is concerned.

48 citations


Proceedings ArticleDOI
21 Apr 2013
TL;DR: This paper proposes centralized elastic bubble router - a router micro-architecture based on the use of centralized buffers with elastic buffered links that enables end-to-end latency reduction via high radix switches with low overall buffer requirements.
Abstract: While router buffers have been used as performance multipliers, they are also major consumers of area and power in on-chip networks. In this paper, we propose centralized elastic bubble router - a router micro-architecture based on the use of centralized buffers (CB) with elastic buffered (EB) links. At low loads, the CB is power gated, bypassed, and optimized to produce single cycle operation. A novel extension to bubble flow control enables routing deadlock and message dependent deadlock to be avoided with the same mechanism having constant buffer size per router independent of the number of message types. This solution enables end-to-end latency reduction via high radix switches with low overall buffer requirements. Comparisons made with other low latency routers across different topologies show consistent performance improvement, for example 26% improvement in no load latency of a 2D Mesh and 4X improvement in saturation throughput in a 2D-Generalized Hypercube.

Journal ArticleDOI
01 Sep 2013
TL;DR: A mechanism which utilizes path diversity so that broadcast messages can be disseminated with a short delay and a high reliability compared with the acknowledgment based retransmission approach and the message overhead is low.
Abstract: In vehicular ad hoc networks, many applications require a low latency and high reliability especially the safety applications. Reliable multi-hop broadcast protocols have been widely discussed recently. However, most of them use explicit acknowledgments and timeout retransmissions to provide reliability. The retransmission method incurs delays when a packet loss cannot be detected on time. Acknowledgment messages also increase the MAC layer contention time at each node. In order to provide a high reliability and low latency with a low overhead, we propose a mechanism which utilizes path diversity. In the proposed mechanism, a message is delivered through two different paths. By cooperation of these different paths, broadcast messages can be disseminated with a short delay and a high reliability compared with the acknowledgment based retransmission approach. Since the proposed mechanism does not use any explicit acknowledgment message, the message overhead is low. We evaluate the proposed mechanism using both theoretical analysis and computer simulations.

Posted Content
TL;DR: A taxonomy to categorize existing work based on four main techniques, reducing queue length, accelerating retransmissions, prioritizing mice flows, and exploiting multi-path is proposed.
Abstract: Datacenters are the cornerstone of the big data infrastructure supporting numerous online services. The demand for interactivity, which significantly impacts user experience and provider revenue, is translated into stringent timing requirements for flows in datacenter networks. Thus low latency networking is becoming a major concern of both industry and academia. We provide a short survey of recent progress made by the networking community for low latency datacenter networks. We propose a taxonomy to categorize existing work based on four main techniques: reducing queue length, accelerating retransmissions, prioritizing mice flows, and exploiting multi-path. Then we review select papers, highlight the principal ideas, and discuss their pros and cons. We also present our perspectives on the research challenges and opportunities, hoping to inspire more future work in this space.

Journal ArticleDOI
TL;DR: NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel, making it suitable for building low-latency, real-time GPU-based computing systems.
Abstract: NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware modular architecture. Benchmarks for latency and bandwidth for GbE and APElink channels are presented, followed by a performance analysis on the case study of the GPU-based low level trigger for the RICH detector in the NA62 CERN experiment, using either the GbE or the APElink channel. Finally, we give an outline of the project's future activities.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A memory efficient architecture for single-pass connected components analysis suited for high throughput embedded image processing systems is proposed which achieves a high throughput by partitioning the image into several vertical slices processed in parallel.
Abstract: A memory efficient architecture for single-pass connected components analysis suited for high throughput embedded image processing systems is proposed which achieves a high throughput by partitioning the image into several vertical slices processed in parallel. The low latency of the architecture allows reuse of labels associated with the image objects. This reduces the amount of memory by a factor of more than 5 compared to previous work. This is significant, since memory is a critical resource in embedded image processing on FPGAs.

Journal ArticleDOI
01 May 2013
TL;DR: Expected transmission delay (ETD), a metric that simultaneously considers sleep latency and wireless link quality, is formulated and it is shown that the metric is left-monotonic and left-isotonic, proving that its use in distributed algorithms such as the distributed Bellman-Ford yields consistent, loop-free and optimal paths.
Abstract: In environmentally-powered wireless sensor networks (EPWSNs), low latency wakeup scheduling and packet forwarding is challenging due to dynamic duty cycling, posing time-varying sleep latencies and necessitating the use of dynamic wakeup schedules. We show that the variance of the intervals between receiving wakeup slots affects the expected sleep latency: when the variance of the intervals is low (high), the expected latency is low (high). We therefore propose a novel scheduling scheme that uses the bit-reversal permutation sequence (BRPS) - a finite integer sequence that positions receiving wakeup slots as evenly as possible to reduce the expected sleep latency. At the same time, the sequence serves as a compact representation of wakeup schedules thereby reducing storage and communication overhead. But while low latency wakeup schedule can reduce per-hop delay in ideal conditions, it does not necessarily lead to low latency end-to-end paths because wireless link quality also plays a significant role in the performance of packet forwarding. We therefore formulate expected transmission delay (ETD), a metric that simultaneously considers sleep latency and wireless link quality. We show that the metric is left-monotonic and left-isotonic, proving that its use in distributed algorithms such as the distributed Bellman-Ford yields consistent, loop-free and optimal paths. We perform extensive simulations using real-world energy harvesting traces to evaluate the performance of the scheduling and forwarding scheme.
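The bit-reversal permutation sequence (BRPS) construction can be sketched directly. This toy version (a guess at the construction from the abstract alone) takes the first n entries of the bit-reversal permutation as receiving slots, which spreads them near-evenly and keeps the variance of inter-slot gaps low:

```python
def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i, e.g. 3 = 011 -> 110 = 6 for bits=3."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out

def brps_wakeup_slots(num_active, frame_bits):
    """Choose `num_active` receiving wakeup slots in a frame of 2**frame_bits
    slots: the first entries of the bit-reversal permutation, sorted. Any prefix
    is near-evenly spaced, so the schedule is a compact representation -- only
    the count needs to change as harvested energy varies."""
    return sorted(bit_reverse(i, frame_bits) for i in range(num_active))
```

With an 8-slot frame, 2 active slots land at [0, 4] and 4 at [0, 2, 4, 6]: raising the duty cycle halves every gap instead of clustering slots, which is what keeps the expected sleep latency low.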

Journal ArticleDOI
TL;DR: The evaluation results on up to 55,296 nodes of the K computer show the new implementation of MPI collective communication outperforms the existing one for long messages by a factor of 4 to 11 times and shows the short-message algorithms complement the long-message ones.
Abstract: This paper proposes the design of ultra scalable MPI collective communication for the K computer, which consists of 82,944 computing nodes and is the world's first system over 10 PFLOPS. The nodes are connected by a Tofu interconnect that introduces six dimensional mesh/torus topology. Existing MPI libraries, however, perform poorly on such a direct network system since they assume typical cluster environments. Thus, we design collective algorithms optimized for the K computer. On the design of the algorithms, we place importance on collision-freeness for long messages and low latency for short messages. The long-message algorithms use multiple RDMA network interfaces and consist of neighbor communication in order to gain high bandwidth and avoid message collisions. On the other hand, the short-message algorithms are designed to reduce software overhead, which comes from the number of relaying nodes. The evaluation results on up to 55,296 nodes of the K computer show the new implementation outperforms the existing one for long messages by a factor of 4 to 11 times. It also shows the short-message algorithms complement the long-message ones.

Journal ArticleDOI
TL;DR: This paper proposes a versatile Shack–Hartmann WFS based on an industrial smart camera for high-performance measurements of wavefront deformations, using a low-cost field-programmable gate array as the parallel processing platform.
Abstract: Wavefront sensing is important in various optical measurement systems, particularly in the field of adaptive optics (AO). For AO systems, the sampling rate, as well as the latency time, of the wavefront sensors (WFSs) imposes a restriction on the overall achievable temporal resolution. In this paper, we propose a versatile Shack–Hartmann WFS based on an industrial smart camera for high-performance measurements of wavefront deformations, using a low-cost field-programmable gate array as the parallel processing platform. The proposed wavefront reconstruction adds a processing latency of only 740 ns for calculating wavefront characteristics from the pixel stream of the image sensor, providing great potential for demanding AO system designs.
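The computation a Shack–Hartmann WFS pipeline performs is per-lenslet spot centroiding followed by slope extraction. A plain software sketch of that math (the paper's FPGA does it directly on the pixel stream, which is how the 740 ns processing latency is achieved):

```python
def spot_centroid(subimage):
    """Intensity-weighted centroid (x, y), in pixels, of one lenslet's
    subaperture image, given as a list of rows."""
    total = sx = sy = 0.0
    for y, row in enumerate(subimage):
        for x, v in enumerate(row):
            total += v
            sx += v * x
            sy += v * y
    return sx / total, sy / total

def wavefront_slopes(subimages, references):
    """Local wavefront slope per subaperture: the spot's centroid shift from its
    flat-wavefront reference position (in pixels; scale by lenslet focal length
    and pixel pitch for physical units)."""
    return [(cx - rx, cy - ry)
            for (cx, cy), (rx, ry) in
            zip(map(spot_centroid, subimages), references)]
```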

Patent
29 Apr 2013
TL;DR: In this article, the authors propose variable bandwidth allocations such that smaller frequency sub-bands are allocated to users, as their number increases, but the individual users/nodes insert more data-carrying signals in order to compensate for the loss of operating bandwidth arising from the accommodation of more users.
Abstract: Cost, electronic circuitry limitations, and communication channel behaviour yield communication systems with strict bandwidth constraints. Hence, maximally utilizing available bandwidth is crucial, for example in wireless networks, to supporting ever increasing numbers of users and their demands for increased data volumes, low latency, and high download speeds. Accordingly, it would be beneficial for such networks to support variable bandwidth allocations such that smaller frequency sub-bands are allocated to users, as their number increases, but the individual users/nodes insert more data-carrying signals in order to compensate for the loss of operating bandwidth arising from the accommodation of more users. It would further be beneficial for transmitters and receivers according to embodiments of such a network architecture to be based upon low cost design methodologies allowing their deployment within a wide range of applications including high volume, low cost consumer electronics for example.

Journal ArticleDOI
TL;DR: The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications, using the leader-follower replication technique, and achieves low latency message delivery during normal operation and low latency reconfiguration and recovery when a fault occurs.
Abstract: The low latency fault tolerance (LLFT) system provides fault tolerance for distributed applications within a local-area network, using a leader–follower replication strategy. LLFT provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. Its novel system model enables LLFT to maintain a single consistent infinite computation, despite faults and asynchronous communication. The LLFT messaging protocol provides reliable, totally ordered message delivery by employing a group multicast, where the message ordering is determined by the primary replica in the destination group. The leader-determined membership protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica joins or leaves a group, where the membership of the group is determined by the primary replica. The virtual determinizer framework captures the ordering information at the primary replica and enforces the same ordering of non-deterministic operations at the backup replicas. LLFT does not employ a majority-based, multiple-round consensus algorithm and, thus, it can operate in the common industrial case where there is a primary replica and only one backup replica. The LLFT system achieves low latency message delivery during normal operation and low latency reconfiguration and recovery when a fault occurs.

Journal ArticleDOI
TL;DR: Simulation and analysis results show that the proposed architectures can be considered a viable solution for future NoCs, yielding high scalability, high bandwidth, low latency, and low power consumption.

Patent
23 Jan 2013
TL;DR: In this paper, a low-latency touch-input device receives writing as input to the device and temporarily displays the writing on a physical layer that overlays a touchscreen display of the device.
Abstract: This document describes embodiments of a low-latency touch-input device. The low-latency touch-input device receives writing as input to the device and temporarily displays the writing on a physical layer that overlays a touchscreen display of the device. The writing is displayed instantaneously on the physical layer before the touch-input device processes the input. The low-latency touch-input device then processes the input to generate a digital representation of the writing and renders the digital representation of the writing on the touchscreen display to replace the writing displayed on the physical layer.

Journal ArticleDOI
TL;DR: A hardware image rectification engine, which supports the processing of stereo high-definition serial digital interfaces video streams with up to 1080p30 video with a latency below 1 ms.
Abstract: The emerging market of digital 3-D film productions in HD resolution leads to the need for high-quality equipment in the production chain. The incoming video streams of the two cameras require an image rectification due to unavoidable misalignments within the stereoscopic camera setup. This rectification can either take place in postprocessing of the recorded material or it can be applied in real time during the shooting. Especially in the case of streaming and recording of live events, real-time processing is necessary and, additionally, the system has to provide a very low latency. We present a hardware image rectification engine, which supports the processing of stereo high-definition serial digital interface video streams with up to 1080p30 video at a latency below 1 ms. The image rectification engines for the two channels are implemented on two Altera Stratix III EP3SL340 FPGAs running at 74.25 MHz. They can be controlled by the stereoscopy analysis software, which calculates the parameters required for the image rectification at runtime.

DOI
16 Jun 2013
TL;DR: A CMOS vision sensor that combines event-driven asynchronous readout of temporal contrast with synchronous frame-based active pixel sensor (APS) readout of intensity, allowing low latency at low data rate and low system-level power consumption, is proposed.
Abstract: This paper proposes a CMOS vision sensor that combines event-driven asynchronous readout of temporal contrast with synchronous frame-based active pixel sensor (APS) readout of intensity. The image frames can be used for scene content analysis, and the temporal contrast events can be used to track fast moving objects, to adjust the frame rate, or to guide a region-of-interest readout. The sensor is therefore suitable for mobile applications because it allows low latency at low data rate and low system-level power consumption. Sharing the photodiode for both readout types allows a compact pixel design that is 60% smaller than a comparable design. The 240x180 sensor has a power consumption of 10 mW. It is built in 0.18 μm technology with 18.5 μm pixels. The temporal contrast pathway has a minimum latency of 12 μs, a dynamic range of 120 dB, a 12% contrast detection threshold, and 3.5% contrast matching. The APS readout has 55 dB dynamic range with 1% FPN.

Journal ArticleDOI
TL;DR: This work designs a Two-Phased Service-Oriented Broker (2SOB) for replica selection, showing that Data Grid jobs can be executed faster by reducing transfer time.

Proceedings ArticleDOI
19 May 2013
TL;DR: A fully parallel 64K point radix-4⁴ FFT processor that shows a significant reduction in intermediate memory but with increased hardware complexity, and reduced latency with comparable throughput and area, is proposed.
Abstract: In this paper we propose a fully parallel 64K point radix-4⁴ FFT processor. The radix-4⁴ parallel unrolled architecture uses a novel radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. The radix-4⁴ block can take all 256 inputs in parallel and can use the select control signals to generate one out of the 256 outputs. The resultant 64K point FFT processor shows a significant reduction in intermediate memory but with increased hardware complexity. Compared to the state-of-the-art implementation [5], our architecture shows reduced latency with comparable throughput and area. The 64K point FFT architecture was synthesized using a 130nm CMOS technology which resulted in a throughput of 1.4 GSPS and latency of 47.7μs with a maximum clock frequency of 350MHz. When compared to [5], the latency is reduced by 303μs with 50.8% reduction in area.
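Functionally, the distinctive butterfly described above — all four inputs consumed in parallel, only one selected output produced — is one row of a 4-point DFT. A floating-point sketch of that behavior (the hardware would use fixed-point arithmetic and hardwired twiddle factors):

```python
import cmath

def radix4_butterfly(x, select):
    """Radix-4 DFT butterfly: take four complex inputs x[0..3] in parallel and
    return only the selected output k in {0, 1, 2, 3}:
        X[k] = sum_n x[n] * exp(-2j*pi*k*n/4)."""
    return sum(xn * cmath.exp(-2j * cmath.pi * select * n / 4)
               for n, xn in enumerate(x))
```

Producing one output instead of four is what lets a fully parallel radix-4⁴ block (256 inputs, one selected output) avoid storing intermediate results, trading memory for combinational hardware.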

Proceedings Article
27 Jun 2013
TL;DR: This work presents new cooperative schemes including software and hardware to address performance issues with deploying storage-class memory technologies as a storage device, including a new polling scheme called dynamic interval polling and a pipelined execution between storage device and host OS called pipelined post I/O processing.
Abstract: Emerging non-volatile memory technologies as a disk drive replacement raise some issues of software stack and interfaces, which have not been considered in disk-based storage systems. In this work, we present new cooperative schemes including software and hardware to address performance issues with deploying storage-class memory technologies as a storage device. First, we propose a new polling scheme called dynamic interval polling to avoid the unnecessary polls and reduce the burden on storage system bus. Second, we propose a pipelined execution between storage device and host OS called pipelined post I/O processing. By extending vendor-specific I/O interfaces between software and hardware, we can improve the responsiveness of I/O requests with no sacrifice of throughput.
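The dynamic-interval-polling idea can be sketched as follows. This is a hedged illustration of the general technique, with an assumed policy (predict the service time from recent history, sleep through most of it, then poll at a short interval near completion); the paper's exact algorithm and names may differ.

```python
import time

class DynamicIntervalPoller:
    """Illustrative poller: defer the first poll until near predicted completion."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha       # EWMA smoothing factor
        self.predicted = 0.0     # predicted I/O service time (seconds)

    def wait_for(self, is_complete, poll_interval=1e-5):
        start = time.monotonic()
        if self.predicted > 0:
            # Skip the polls that cannot possibly succeed yet.
            time.sleep(self.predicted * 0.9)
        while not is_complete():          # tight polling only near completion
            time.sleep(poll_interval)
        elapsed = time.monotonic() - start
        # Fold the observed latency into the prediction (EWMA update).
        self.predicted += self.alpha * (elapsed - self.predicted)
        return elapsed

poller = DynamicIntervalPoller()
deadline = time.monotonic() + 0.02        # fake device: completes after ~20 ms
elapsed = poller.wait_for(lambda: time.monotonic() >= deadline)
print(f"I/O completed after {elapsed * 1e3:.1f} ms")
```

The point of the scheme is visible in the loop structure: wasted polls (and bus traffic) scale with the polling window, not with the full I/O latency.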

Proceedings ArticleDOI
09 Jun 2013
TL;DR: RACS is proposed, a data center transport protocol that minimizes flow completion times by approximating the Shortest Remaining Processing Time (SRPT) scheduling policy, which is known to be optimal, in a distributed manner.
Abstract: Today's data centers face extreme challenges in providing low latency for online services such as web search, social networking, and recommendation systems. Achieving low latency is important as it impacts user experience, which in turn impacts operator revenue. However, most current congestion control protocols approximate Processor Sharing (PS), which is known to be sub-optimal for minimizing latency. In this paper, we propose Router Assisted Capacity Sharing (RACS), a data center transport protocol that minimizes flow completion times by approximating the Shortest Remaining Processing Time (SRPT) scheduling policy, which is known to be optimal, in a distributed manner. With RACS, flows are assigned weights which determine their relative priority and thus the rate assigned to them. By changing these weights, RACS can approximate a range of scheduling disciplines. Through extensive ns-2 simulations, we demonstrate that RACS outperforms TCP, DCTCP, and RCP in data center environments. In particular, it improves completion times by up to 95% over TCP, 88% over DCTCP, and 80% over RCP. Our results also show that RACS can outperform deadline-aware transport protocols for typical data center workloads.
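The weight mechanism described above can be illustrated with a toy rate allocator. This is a hedged sketch of the general idea (weights inversely related to remaining flow size, capacity shared in proportion to weight), not the RACS protocol itself; the `skew` exponent is an invented knob showing how sharpening the weights moves the allocation from PS-like sharing toward strict SRPT.

```python
def allocate_rates(remaining_bytes, capacity, skew=4):
    """Share link capacity in proportion to (1/remaining_size)^skew."""
    weights = {f: (1.0 / size) ** skew for f, size in remaining_bytes.items()}
    total = sum(weights.values())
    return {f: capacity * w / total for f, w in weights.items()}

flows = {"short": 10_000, "medium": 100_000, "long": 1_000_000}
rates = allocate_rates(flows, capacity=10e9)  # 10 Gbps link
for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.4f} Gbps")
```

With a large skew the flow with the smallest remaining size captures nearly the whole link, which is exactly the SRPT behavior that minimizes mean flow completion time.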

Patent
16 Dec 2013
TL;DR: In this paper, the authors describe a system and methods for transmitting data over physical channels to provide a high speed, low latency interface such as between a memory controller and memory devices.
Abstract: Systems and methods are described for transmitting data over physical channels to provide a high speed, low latency interface such as between a memory controller and memory devices. Controller-side and memory-side embodiments of such channel interfaces are disclosed which require a low pin count and have low power utilization. In some embodiments of the invention, different voltage, current, etc. levels are used for signaling and more than two levels may be used, such as a vector signaling code wherein each wire signal may take on one of four signal values.
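Four-level signaling, as mentioned in the abstract, can be illustrated with a minimal PAM-4-style codec: each wire symbol takes one of four levels and therefore carries two bits, halving the symbol rate needed for a given bit rate. This is a generic sketch with an assumed Gray-coded level map; the patent's vector signaling codes, which spread code words across multiple wires, are more elaborate than this single-wire example.

```python
# Gray-coded mapping of bit pairs to four signal levels (illustrative values).
LEVELS = {0b00: -3, 0b01: -1, 0b11: +1, 0b10: +3}
SYMBOLS = {level: bits for bits, level in LEVELS.items()}

def encode(bits):
    """Pack a bit string (even length) into a sequence of 4-level symbols."""
    assert len(bits) % 2 == 0
    return [LEVELS[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2)]

def decode(symbols):
    """Recover the bit string from a sequence of 4-level symbols."""
    return "".join(f"{SYMBOLS[s]:02b}" for s in symbols)

tx = encode("0110001011")
print(tx, "->", decode(tx))
```

Gray coding is the conventional choice here because a one-level receiver error then corrupts only a single bit of the pair.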

Patent
14 Aug 2013
TL;DR: A low-latency storage method for massive small files based on HBase is proposed, in which a small-file table comprising a row key and two column families is established on top of Hadoop and HBase, a storage environment suited to small files is set up, and an application process covering small-file writing, inserting, and reading realizes reasonable storage and low-latency reading and writing of the massive small files.
Abstract: The invention provides a low-latency storage method for massive small files based on HBase. A small-file table comprising a row key and two column families is established on top of Hadoop and HBase; a storage environment suitable for small files is established; and an application process including small-file writing, small-file inserting, and small-file reading is provided. Reasonable storage and low-latency reading and writing of the massive small files are thereby realized, meeting practical requirements.
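The table layout described above can be sketched with an in-memory model. All column and family names below are invented for illustration; the patent only specifies a row key plus two column families. One family might hold file metadata and the other the file bytes, so many small files become rows of one HBase table instead of individual HDFS files, avoiding per-file NameNode overhead.

```python
import hashlib
import time

def make_row_key(path):
    # A short hash prefix spreads writes across regions instead of
    # hot-spotting on lexically adjacent paths (a common HBase idiom).
    return hashlib.md5(path.encode()).hexdigest()[:8] + "|" + path

def put_small_file(table, path, data):
    table[make_row_key(path)] = {
        "meta": {"path": path, "size": len(data), "mtime": time.time()},
        "data": {"content": data},
    }

def get_small_file(table, path):
    return table[make_row_key(path)]["data"]["content"]

table = {}  # a plain dict stands in for the HBase table here
put_small_file(table, "/logs/a.txt", b"hello")
print(get_small_file(table, "/logs/a.txt"))
```

Reads then cost a single row lookup by key, which is what gives the scheme its low-latency access path for small files.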