scispace - formally typeset

Latency (engineering)

About: Latency (engineering) is a research topic. Over its lifetime, 7,278 publications have been published within this topic, receiving 115,409 citations. The topic is also known as: lag.


Papers
Patent
22 Feb 2002
TL;DR: In this paper, a multi-dimensional traffic classification scheme is proposed, in which multiple orthogonal traffic classification methods are successively implemented for each traffic stream traversing the system.
Abstract: A method and system for conveying an arbitrary mixture of high and low latency traffic streams across a common switch fabric implements a multi-dimensional traffic classification scheme, in which multiple orthogonal traffic classification methods are successively implemented for each traffic stream traversing the system. At least two diverse paths are mapped through the switch fabric, each path being optimized to satisfy respective different latency requirements. A latency classifier is adapted to route each traffic stream to a selected path optimized to satisfy latency requirements most closely matching a respective latency requirement of the traffic stream. A prioritization classifier independently prioritizes traffic streams in each path. A fairness classifier at an egress of each path can be used to enforce fairness between responsive and non-responsive traffic streams in each path. This arrangement enables traffic streams having similar latency requirements to traverse the system through a path optimized for those latency requirements.

166 citations
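The multi-dimensional classification in the patent above can be sketched generically: a latency classifier picks the path whose latency bound most closely matches (while still satisfying) each stream's requirement, and a prioritization classifier orders streams within each path independently. The path names, latency bounds, and field names below are illustrative assumptions, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class Stream:
    stream_id: str
    max_latency_ms: float  # latency requirement of the stream
    priority: int          # lower value = higher priority

# Hypothetical paths through the switch fabric, each optimized
# for a different latency bound.
PATHS = {"low_latency": 5.0, "bulk": 100.0}

def classify_latency(stream: Stream) -> str:
    """First dimension: choose the path with the tightest latency
    bound that still satisfies the stream's requirement."""
    feasible = {name: bound for name, bound in PATHS.items()
                if bound <= stream.max_latency_ms}
    if not feasible:
        # No path satisfies the requirement: fall back to the fastest.
        return min(PATHS, key=PATHS.get)
    return max(feasible, key=feasible.get)  # closest feasible bound

def prioritize(streams: list[Stream]) -> list[Stream]:
    """Second, orthogonal dimension: order streams within one path."""
    return sorted(streams, key=lambda s: s.priority)
```

Each classifier looks at one dimension only, which is what makes the dimensions orthogonal: the latency class never inspects priority, and vice versa.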

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This paper proposes a prefix-to-prefix framework for simultaneous translation that implicitly learns to anticipate in a single translation model, achieving low latency and reasonable quality (compared to full-sentence translation) on 4 directions.
Abstract: Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences. While the conventional seq-to-seq framework is only suitable for full-sentence translation, we propose a novel prefix-to-prefix framework for simultaneous translation that implicitly learns to anticipate in a single translation model. Within this framework, we present a very simple yet surprisingly effective “wait-k” policy trained to generate the target sentence concurrently with the source sentence, but always k words behind. Experiments show our strategy achieves low latency and reasonable quality (compared to full-sentence translation) on 4 directions: zh↔en and de↔en.

163 citations
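The wait-k policy described above has a simple read/write schedule: the t-th target word is emitted once k + t - 1 source words have been read (capped at the source length). A minimal sketch of the schedule and the interleaved actions it implies; the function names are illustrative, not from the paper's code:

```python
def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> list[int]:
    """Number of source tokens visible when emitting target word t
    (1-indexed) under wait-k: g(t) = min(k + t - 1, src_len)."""
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]

def wait_k_actions(src_len: int, tgt_len: int, k: int) -> list[str]:
    """The interleaved READ/WRITE stream implied by the schedule."""
    actions, read = [], 0
    for visible in wait_k_schedule(src_len, tgt_len, k):
        while read < visible:      # catch up on source tokens
            actions.append("READ")
            read += 1
        actions.append("WRITE")    # emit one target token
    return actions
```

With k = 1 the policy alternates READ/WRITE word by word; larger k trades latency for more source context, and once the source is exhausted the remaining target words are written back to back.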

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This work proposes Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications.
Abstract: Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.

163 citations
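Tail latency, as the Ubik abstract uses the term, is a high percentile of the per-request latency distribution, which is why guaranteeing average performance says little about it. A nearest-rank percentile sketch (not Ubik itself, just the metric it targets):

```python
import math

def tail_latency(samples_ms: list[float], pct: float = 99.0) -> float:
    """pct-th percentile (nearest-rank) of observed request latencies."""
    ordered = sorted(samples_ms)
    rank = max(0, math.ceil(pct / 100.0 * len(ordered)) - 1)
    return ordered[rank]

# One slow request in a hundred barely moves the mean (~11 ms here)
# but completely dominates the extreme tail.
samples = [1.0] * 99 + [1000.0]
```

This asymmetry is the paper's starting point: a cache partition that keeps the mean healthy can still let the slowest few requests blow past a millisecond-range target.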

Patent
07 Jul 2008
TL;DR: In this paper, a network device with integrated functionality and a cache that stores policy information is provided, reducing the signaling needed to set up and tear down sessions.

Abstract: Systems and methods for reducing latency in call setup and teardown are provided. A network device with integrated functionality and a cache is provided that stores policy information to reduce the amount of signaling that is necessary to set up and tear down sessions. By handling various aspects of setup and teardown within a network device, latency is reduced and the bandwidth needed for setup signaling is also reduced.

161 citations
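The caching idea in the entry above can be sketched generically: keep policy results at the network device so that repeat session setups skip the signaling round trip to the policy server. Everything here (the class, `fetch_policy`, the counter) is an illustrative assumption, not an API from the patent:

```python
class PolicyCache:
    """Cache policy lookups at the network device so session setup
    avoids a signaling round trip when the policy is already known."""

    def __init__(self, fetch_policy):
        self._fetch = fetch_policy   # hypothetical, expensive remote call
        self._cache = {}
        self.signaling_calls = 0     # round trips actually performed

    def setup_session(self, subscriber: str) -> dict:
        if subscriber not in self._cache:
            self.signaling_calls += 1
            self._cache[subscriber] = self._fetch(subscriber)
        return self._cache[subscriber]
```

Only the first setup for a subscriber pays the signaling cost; subsequent setups are served from the local cache, which is where both the latency and bandwidth reductions come from.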

Journal ArticleDOI
TL;DR: In four experiments, subjects freely recalled previously studied items while a voice key and computer recorded each item’s recall latency relative to the onset of the recall period, suggesting that retrieval includes a brief normally distributed initiation stage followed by a longer exponentially distributed search stage.
Abstract: In four experiments, subjects freely recalled previously studied items while a voice key and computer recorded each item’s recall latency relative to the onset of the recall period. The measures of recall probability and mean recall latency were shown to be empirically independent, demonstrating that there exists no a priori relationship between the two. In all four experiments, latency distributions were fit well by the ex-Gaussian, suggesting that retrieval includes a brief normally distributed initiation stage followed by a longer exponentially distributed search stage. Further, the variation in mean latency stemmed from the variation in the duration of the search stage, not the initiation stage. Interresponse times (IRTs), the time elapsed between two successive item recalls, were analyzed as well. The growth of mean IRTs, plotted as a function of output position, was shown to be a simple function of the number of items not yet recalled. Finally, the mathematical nature of both free recall latency and IRT growth are shown to be consistent with a simple theoretical account of retrieval that depicts mean recall latency as a measure of the breadth of search.

161 citations
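The ex-Gaussian distribution described above is simply a normal variate (the brief initiation stage) plus an independent exponential variate (the longer search stage), so mean recall latency is mu + tau. A sampling sketch with illustrative parameter values:

```python
import random

def exgaussian_sample(mu: float, sigma: float, tau: float,
                      rng: random.Random) -> float:
    """One ex-Gaussian draw: normal initiation stage (mu, sigma)
    plus an exponential search stage with mean tau."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)

def mean_latency(mu: float, sigma: float, tau: float,
                 n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of mean recall latency; approaches mu + tau."""
    rng = random.Random(seed)
    return sum(exgaussian_sample(mu, sigma, tau, rng) for _ in range(n)) / n
```

Because the exponential component carries most of the mean and all of the right skew, variation in mean latency tracks the search stage (tau) rather than the initiation stage, matching the paper's finding.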


Network Information
Related Topics (5)
The Internet: 213.2K papers, 3.8M citations (75% related)
Node (networking): 158.3K papers, 1.7M citations (75% related)
Wireless: 133.4K papers, 1.9M citations (74% related)
Server: 79.5K papers, 1.4M citations (74% related)
Network packet: 159.7K papers, 2.2M citations (74% related)
Performance Metrics
No. of papers in the topic in previous years:
2022: 2
2021: 485
2020: 529
2019: 533
2018: 500
2017: 405