
Showing papers on "Latency (engineering)" published in 2020


Proceedings ArticleDOI
04 May 2020
TL;DR: In this article, a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer were developed.
Abstract: Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking. In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. On the quality side, we incorporate a large number of utterances across varied domains [1] to increase acoustic diversity and the vocabulary seen by the model. We also train with accented English speech to make the model more robust to different pronunciations. In addition, given the increased amount of training data, we explore a varied learning rate schedule. On the latency front, we explore using the end-of-sentence decision emitted by the RNN-T model to close the microphone, and also introduce various optimizations to improve the speed of LAS rescoring. Overall, we find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model. For example, for the same latency, RNN-T+LAS obtains an 8% relative improvement in WER, while being more than 400-times smaller in model size.
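As a rough illustration of the two-pass idea described above, the sketch below runs a toy streaming first pass that emits n-best hypotheses and an end-of-sentence flag, then rescores the n-best list once the microphone is closed. The model stubs, scores, and the 50/50 score interpolation are hypothetical placeholders, not the paper's implementation.

```python
# Hypothetical sketch of a two-pass decode loop: a streaming first pass emits
# partial hypotheses plus an end-of-sentence signal; a second pass rescores the
# n-best list once the microphone is closed. All names and scores are
# illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float   # log-probability from the streaming model
    rescored: float = 0.0

def first_pass_stream(audio_frames):
    """Toy stand-in for a streaming first pass: yields (n_best, end_of_sentence)."""
    n_best = [Hypothesis("play some music", -1.2),
              Hypothesis("play some musing", -2.5)]
    for i, _ in enumerate(audio_frames):
        yield n_best, i == len(audio_frames) - 1   # EOS fires on the last frame here

def rescore(hyp: Hypothesis) -> float:
    """Toy stand-in for second-pass rescoring; interpolates both passes' scores."""
    second_pass = -0.5 if "music" in hyp.text else -3.0
    return 0.5 * hyp.first_pass_score + 0.5 * second_pass

def recognize(audio_frames):
    for n_best, eos in first_pass_stream(audio_frames):
        if eos:                        # end-of-sentence decision closes the mic
            for h in n_best:
                h.rescored = rescore(h)
            return max(n_best, key=lambda h: h.rescored).text
    return ""

print(recognize(audio_frames=range(10)))   # -> "play some music"
```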

167 citations


Journal ArticleDOI
TL;DR: This work proposes a load balancing scheme in a fog network to minimize the latency of data flows in the communications and processing procedures by associating IoT devices to suitable BSs and proves the convergence and the optimality of the proposed workload balancing scheme.
Abstract: As latency is the key performance metric for IoT applications, fog nodes co-located with cellular base stations can move the computing resources close to IoT devices. Therefore, data flows of IoT devices can be offloaded to fog nodes in their proximity, instead of the remote cloud, for processing. However, the latency of data flows in IoT devices consists of both the communications latency and computing latency. Owing to the spatial and temporal dynamics of IoT device distributions, some BSs and fog nodes are lightly loaded, while others, which may be overloaded, may incur congestion. Thus, the traffic load allocation among base stations (BSs) and computing load allocation among fog nodes affect the communications latency and computing latency of data flows, respectively. To solve this problem, we propose a workload balancing scheme in a fog network to minimize the latency of data flows in the communications and processing procedures by associating IoT devices to suitable BSs. We further prove the convergence and the optimality of the proposed workload balancing scheme. Through extensive simulations, we have compared the performance of the proposed load balancing scheme with other schemes and verified its advantages for fog networking.
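The snippet below is a minimal sketch of the kind of latency-aware association the abstract describes, assuming simple M/M/1-style delay terms for the communication and computing queues; the device rates, node capacities, and greedy assignment rule are illustrative choices, not the paper's optimization algorithm.

```python
# Minimal illustrative sketch: associate each IoT device with the base
# station / fog node that minimizes its estimated communication + computing
# latency, using M/M/1-style delay terms. All parameters are hypothetical.
devices = [{"rate": 2.0} for _ in range(5)]          # task arrival rates (tasks/s)
nodes = [{"tx": 10.0, "cpu": 8.0, "load": 0.0},      # tx: link service rate
         {"tx": 6.0,  "cpu": 12.0, "load": 0.0}]     # cpu: processing rate

def latency(node, extra):
    """M/M/1-style delay for communication plus computing under the prospective load."""
    load = node["load"] + extra
    if load >= node["tx"] or load >= node["cpu"]:
        return float("inf")                          # association would overload the node
    return 1.0 / (node["tx"] - load) + 1.0 / (node["cpu"] - load)

assignment = []
for d in devices:
    best = min(range(len(nodes)), key=lambda i: latency(nodes[i], d["rate"]))
    nodes[best]["load"] += d["rate"]
    assignment.append(best)

print(assignment)            # device -> base station / fog node index
```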

142 citations


Journal ArticleDOI
TL;DR: This work introduces a customized implementation of the genetic algorithm (GA) as a heuristic approach to schedule the IoT requests to achieve the objective of minimizing the overall latency.

138 citations


Journal ArticleDOI
TL;DR: This work provides an integrated framework for partial offloading and interference management using an orthogonal frequency-division multiple access (OFDMA) scheme and proposes a novel scheme named Joint Partial Offloading and Resource Allocation (JPORA), with the aim of reducing the task execution latency.
Abstract: We consider a Device-to-Device (D2D)-enabled mobile edge computing offloading scenario, where a device can partially offload its computation task to the edge server or exploit the computation resources of proximal devices. Keeping in view the millisecond-scale latency requirement in 5G service scenarios and the spectrum scarcity, we focus on minimizing the sum of task execution latency of all the devices in a shared spectrum with interference. In particular, we provide an integrated framework for partial offloading and interference management using an orthogonal frequency-division multiple access (OFDMA) scheme. Accordingly, we formulate total latency minimization as a mixed integer nonlinear programming (MINLP) problem by considering desired energy consumption, partial offloading, and resource allocation constraints. We use a decomposition approach to solve our problem and propose a novel scheme named Joint Partial Offloading and Resource Allocation (JPORA). With the aim of reducing the task execution latency, JPORA iteratively adjusts data segmentation and solves the underlying problem of quality of service (QoS)-aware communication resource allocation to the cellular links, and interference-aware communication resource allocation to D2D links. Extensive evaluation results demonstrate that JPORA achieves the lowest latency as compared to the other baseline schemes, meanwhile limiting the local energy consumption of user devices.
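As a rough sketch of the underlying partial-offloading trade-off (not JPORA itself), the following code splits a task between local execution and an edge server and searches for the split fraction that minimizes completion time when both parts run in parallel; all rates and sizes are hypothetical.

```python
# Rough sketch of the partial-offloading trade-off: split a task of `bits`
# between local execution and an edge server, and pick the split that minimizes
# completion time when both parts run in parallel. Values are hypothetical.
bits = 8e6                 # task size in bits
f_local = 1e9              # local CPU cycles/s
f_edge = 4e9               # edge CPU cycles/s
cycles_per_bit = 100       # compute intensity
rate_up = 20e6             # uplink rate in bits/s

def completion_time(alpha):
    """alpha = fraction of the task offloaded to the edge."""
    t_local = (1 - alpha) * bits * cycles_per_bit / f_local
    t_edge = alpha * bits / rate_up + alpha * bits * cycles_per_bit / f_edge
    return max(t_local, t_edge)        # the two parts execute in parallel

best = min((a / 100 for a in range(101)), key=completion_time)
print(f"best offload fraction ~ {best:.2f}, latency ~ {completion_time(best):.3f} s")
```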

126 citations


Posted Content
TL;DR: This paper employs differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency, and obtains state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks.
Abstract: Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using TensorFlow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection.
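A minimal sketch of the linear op-count latency proxy mentioned above: fit latency ≈ a·ops + b from a few on-device measurements and use the fit to screen candidate architectures. The measurement values below are made up for illustration.

```python
# Sketch of the linear op-count latency proxy: fit latency ~ a * ops + b from a
# handful of (hypothetical) on-device measurements, then use the fit to predict
# latency for new candidate architectures during search.
import numpy as np

ops = np.array([2e6, 5e6, 9e6, 14e6])            # measured model op counts
lat_ms = np.array([3.1, 7.4, 13.0, 20.2])        # measured MCU latencies (ms)

a, b = np.polyfit(ops, lat_ms, 1)                # least-squares linear fit

def predicted_latency(op_count):
    return a * op_count + b

print(f"predicted latency for 7e6 ops: {predicted_latency(7e6):.1f} ms")
```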

105 citations


Proceedings ArticleDOI
Bo Li1, Shuo-Yiin Chang1, Tara N. Sainath1, Ruoming Pang1, Yanzhang He1, Trevor Strohman1, Yonghui Wu1 
24 Apr 2020
TL;DR: This work proposes to reduce the E2E model’s latency by extending the RNN-T endpointer (RNN-T EP) model with additional early and late penalties and achieves 8.0% relative word error rate (WER) reduction and 130ms 90-percentile latency reduction over [2] on a Voice Search test set.
Abstract: End-to-end (E2E) models fold the acoustic, pronunciation and language models of a conventional speech recognition model into one neural network with a much smaller number of parameters than a conventional ASR system, thus making it suitable for on-device applications. For example, recurrent neural network transducer (RNN-T) as a streaming E2E model has shown promising potential for on-device ASR [1]. For such applications, quality and latency are two critical factors. We propose to reduce the E2E model’s latency by extending the RNN-T endpointer (RNN-T EP) model [2] with additional early and late penalties. By further applying the minimum word error rate (MWER) training technique [3], we achieved 8.0% relative word error rate (WER) reduction and 130ms 90-percentile latency reduction over [2] on a Voice Search test set. We also experimented with a second-pass Listen, Attend and Spell (LAS) rescorer [4]. Although it did not directly improve the first pass latency, the large WER reduction provides extra room to trade WER for latency. RNN-T EP+LAS, together with MWER training, brings in 18.7% relative WER reduction and 160ms 90-percentile latency reduction compared to the originally proposed RNN-T EP [2] model.
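The following is an illustrative sketch, not the paper's exact training objective: it penalizes the frame at which the end-of-sentence token is emitted when it falls outside early/late buffers around the reference end of speech. Buffer sizes and weights are assumptions.

```python
# Illustrative early/late emission penalty: penalize the end-of-sentence emission
# frame when it falls before an "early" buffer or after a "late" buffer around the
# reference end of speech. Frame indices and weights are hypothetical.
def eos_penalty(emit_frame, ref_end_frame, early_buf=5, late_buf=10,
                early_w=1.0, late_w=0.5):
    if emit_frame < ref_end_frame - early_buf:            # fired too early
        return early_w * (ref_end_frame - early_buf - emit_frame)
    if emit_frame > ref_end_frame + late_buf:             # fired too late
        return late_w * (emit_frame - ref_end_frame - late_buf)
    return 0.0                                            # inside the tolerance window

print(eos_penalty(emit_frame=80, ref_end_frame=100))    # early -> penalized
print(eos_penalty(emit_frame=105, ref_end_frame=100))   # within buffers -> 0.0
```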

92 citations


Posted Content
TL;DR: A first-pass Recurrent Neural Network Transducer model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpass a conventional model in both quality and latency are developed, and RNN-T+LAS is found to offer a better WER and latency tradeoff than a conventional model.
Abstract: Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking. In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. On the quality side, we incorporate a large number of utterances across varied domains to increase acoustic diversity and the vocabulary seen by the model. We also train with accented English speech to make the model more robust to different pronunciations. In addition, given the increased amount of training data, we explore a varied learning rate schedule. On the latency front, we explore using the end-of-sentence decision emitted by the RNN-T model to close the microphone, and also introduce various optimizations to improve the speed of LAS rescoring. Overall, we find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model. For example, for the same latency, RNN-T+LAS obtains an 8% relative improvement in WER, while being more than 400-times smaller in model size.

78 citations


Journal ArticleDOI
26 Nov 2020
TL;DR: The causes and effects of latency with regard to cybersickness are described, different existing approaches to measure and report latency are reported on, and readers are provided with the knowledge to understand and report latency for their own applications, evaluations, and experiments.
Abstract: Latency is a key characteristic inherent to any computer system. Motion-to-Photon (MTP) latency describes the time between the movement of a tracked object and its corresponding movement rendered and depicted by computer-generated images on a graphical output screen. High MTP latency can cause a loss of performance in interactive graphics applications and, even worse, can provoke cybersickness in Virtual Reality (VR) applications. Here, cybersickness can deteriorate VR experiences or it may render the experiences completely unusable. It can confound research findings of an otherwise sound experiment. Latency, as a contributing factor to cybersickness, needs to be properly understood. Its effects need to be analyzed, its sources need to be identified, good measurement methods need to be developed, and proper countermeasures need to be developed in order to reduce potentially harmful impacts of latency on the usability and safety of VR systems. Research shows that latency can exhibit intricate timing patterns with various spiking and periodic behavior. These timing behaviors may vary, yet most are found to provoke cybersickness. Overall, latency can differ drastically between different systems, which hinders generalization of measurement results. This review article describes the causes and effects of latency with regard to cybersickness. We report on different existing approaches to measure and report latency. Hence, the article provides readers with the knowledge to understand and report latency for their own applications, evaluations, and experiments. It should also help to measure, identify, and finally control and counteract latency and hence gain confidence in the soundness of empirical data collected by VR exposures. Low latency increases the usability and safety of VR systems.
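One common way to estimate motion-to-photon latency is to cross-correlate a tracked motion signal with the measured on-screen response (for example, from a photodiode); the toy sketch below does this with synthetic signals and a known 45 ms delay. It illustrates the idea only and is not a specific method from the article.

```python
# Toy latency-estimation sketch: cross-correlate a tracked motion signal with the
# on-screen response and read the motion-to-photon delay off the peak lag.
# Both signals below are synthetic.
import numpy as np

fs = 1000                                   # sample rate in Hz
t = np.arange(0, 2, 1 / fs)
motion = np.sin(2 * np.pi * 1.5 * t)        # tracked object movement
true_delay = 0.045                          # 45 ms system latency (synthetic)
display = np.interp(t - true_delay, t, motion, left=0.0)   # delayed screen response

corr = np.correlate(display - display.mean(), motion - motion.mean(), "full")
lag = np.argmax(corr) - (len(motion) - 1)   # peak lag in samples
print(f"estimated MTP latency: {lag / fs * 1000:.1f} ms")
```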

61 citations


Journal ArticleDOI
TL;DR: Simulation results show that the proposed cluster-based Data Aggregation Scheme for Latency and Packet Loss Reduction in WSN reduces the latency and overhead and increases the packet delivery ratio and residual energy.

59 citations


Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper introduces InferLine, a system which provisions and manages the individual stages of prediction pipelines to meet end-to-end tail latency constraints while minimizing cost and generalizes across state-of-the-art model serving frameworks.
Abstract: Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet tight end-to-end latency goals is complicated by the interaction between model batch size, the choice of hardware accelerator, and variation in the query arrival process. In this paper we introduce InferLine, a system which provisions and manages the individual stages of prediction pipelines to meet end-to-end tail latency constraints while minimizing cost. InferLine consists of a low-frequency combinatorial planner and a high-frequency auto-scaling tuner. The low-frequency planner leverages stage-wise profiling, discrete event simulation, and constrained combinatorial search to automatically select hardware type, replication, and batching parameters for each stage in the pipeline. The high-frequency tuner uses network calculus to auto-scale each stage to meet tail latency goals in response to changes in the query arrival process. We demonstrate that InferLine outperforms existing approaches by up to 7.6x in cost while achieving up to 34.5x lower latency SLO miss rate on realistic workloads and generalizes across state-of-the-art model serving frameworks.
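As a simplified illustration of profile-driven batching under a tail-latency budget (far simpler than InferLine's planner), the sketch below picks the largest batch size whose worst-case batch-fill wait plus batched processing time still fits a stage's latency budget; the profile numbers are hypothetical.

```python
# Simplified profile-driven batching sketch: for one pipeline stage, pick the
# largest batch size whose worst-case queueing wait plus batched processing time
# still fits the stage's latency budget. Profile numbers are hypothetical.
profile_ms = {1: 4.0, 2: 6.5, 4: 10.0, 8: 18.0}   # batch size -> processing time (ms)
arrival_rate = 400.0                              # queries per second
budget_ms = 30.0                                  # stage latency budget

def worst_case_latency(batch):
    wait = (batch - 1) / arrival_rate * 1000.0    # time to fill the batch
    return wait + profile_ms[batch]

feasible = [b for b in profile_ms if worst_case_latency(b) <= budget_ms]
print("largest feasible batch:", max(feasible))   # throughput-friendly choice
```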

51 citations


Journal ArticleDOI
Youmin Chen1, Youyou Lu1, Kedong Fang1, Qing Wang1, Jiwu Shu1 
01 Jul 2020
TL;DR: A B+-tree variant named μTree is proposed, which incorporates a shadow list-based layer into the leaf nodes of a B+-tree to gain benefits from both list and tree data structures, achieving a 99th percentile latency that is one order of magnitude lower and a throughput that is 2.7 times higher.
Abstract: Tail latency is a critical design issue in recent storage systems. B+-tree, as a fundamental building block in storage systems, incurs high tail latency, especially when placed in persistent memory (PM). Our empirical study specifies two factors that lead to such latency spikes: (i) the internal structural refinement operations (i.e., split, merge, and balance), and (ii) the interference between concurrent operations. The problem is even worse when high concurrency meets with the low write bandwidth of persistent memory. In this paper, we propose a B+-tree variant named μTree. It incorporates a shadow list-based layer to the leaf nodes of a B+-tree to gain benefits from both list and tree data structures. The list layer in PM is exempt from the structural refinement operations since list nodes in the list layer own separate PM spaces, which are organized in an element-based way. Meanwhile, μTree still gains the locality benefit from the tree-based nodes. To alleviate the interference overhead, μTree coordinates the concurrency control between the tree and list layer, which moves the slow PM accesses out of the critical path. We compare μTree to state-of-the-art designs of PM-aware B+-tree indices under both YCSB workload and real-world applications. μTree achieves a 99th percentile latency that is one order of magnitude lower and 2.8 - 4.7 times higher throughput.
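A very simplified sketch of the layered idea, not μTree itself: an ordered in-DRAM index stands in for the tree layer, while each key-value pair is allocated as its own linked-list node standing in for the PM list layer, so inserts on the "PM" side avoid structural refinement. All class and field names are invented for illustration.

```python
# Very simplified layered-index sketch (illustration only): an ordered key index
# stands in for the tree layer, while each key-value pair lives in its own list
# node (element-based allocation), so inserting never splits a "PM" leaf.
import bisect

class ListNode:
    __slots__ = ("key", "value", "next")
    def __init__(self, key, value, nxt=None):
        self.key, self.value, self.next = key, value, nxt

class LayeredIndex:
    def __init__(self):
        self.keys = []          # sorted key index (tree-layer stand-in, "DRAM")
        self.nodes = {}         # key -> ListNode (list-layer stand-in, "PM")
        self.head = None

    def insert(self, key, value):
        node = ListNode(key, value, self.head)   # separate space per entry
        self.head = node
        self.nodes[key] = node
        bisect.insort(self.keys, key)            # no structural refinement on the "PM" side

    def get(self, key):
        node = self.nodes.get(key)
        return node.value if node else None

idx = LayeredIndex()
idx.insert(42, "a"); idx.insert(7, "b")
print(idx.get(7), idx.keys)
```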

Journal ArticleDOI
TL;DR: During herpes simplex virus (HSV) latency, the viral genome is harbored in peripheral neurons in the absence of infectious virus but with the potential to restart infection.
Abstract: During herpes simplex virus (HSV) latency, the viral genome is harbored in peripheral neurons in the absence of infectious virus but with the potential to restart infection. Advances in epigenetics have helped explain how viral gene expression is largely inhibited during latency. Paradoxically, at the same time, the view that latency is entirely silent has been eroding. This low-level noise has implications for our understanding of HSV latency and should not be ignored.

Journal ArticleDOI
28 Jan 2020
TL;DR: This article formulates a joint optimization problem of the offloading decision, the local computation capability, and the computing resource allocation of the fog node to minimize the task completion time under an energy constraint, practically considering M/M/1 waiting queues in the wireless channel and the fog node.
Abstract: The rapid growth of the number of sensing devices enables computation offloading to be a promising solution to alleviate the burden of core network communication and provide low delay serv...

Journal ArticleDOI
05 Aug 2020-Cortex
TL;DR: It is found that correlations between the P3 latency and the SSRT are indeed replicable, but also unspecific, suggesting that these manifest effects are driven by underlying latent processes other than inhibition, such as behavioral adaptations in the context of performance monitoring operations.

Journal ArticleDOI
TL;DR: Important components of 5G are discussed; key aspects such as latency, MIMO, cell distribution, speed, mmWave, slicing, and spectrum are briefly described, which will form a new platform for all the upcoming technologies.
Abstract: The increasing rate of connectivity is driving towards a new era of virtual reality, made possible by high-speed internet. The connectivity between every object is shaping the latest technology, i.e., IoT (Internet of Things), which requires enormous networking connectivity. Compared to previous technologies, the demands placed on 5G keep growing, with data rates of up to 20 Gbps and a capacity a thousand times greater. 5G networks will provide a flexible platform for upcoming services such as IoT, artificial intelligence, cloud computing, natural language processing, machine communication, and other emerging technologies. In this research paper, important components of 5G are discussed. Key aspects such as latency, MIMO, cell distribution, speed, mmWave, slicing, and spectrum are briefly described, which will form a new platform for all the upcoming technologies.

Proceedings ArticleDOI
16 Jun 2020
TL;DR: In this article, the authors propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of phase change memory writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones.
Abstract: Phase change memory (PCM) is a scalable non-volatile memory technology that has low access latency (like DRAM) and high capacity (like Flash). Writing to PCM incurs significantly higher latency and energy penalties compared to reading its content. A prominent characteristic of PCM’s write operation is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON improves the effective access latency by 31%, overall system performance by 27%, and total memory system energy consumption by 43% compared to the best of performance-oriented state-of-the-art techniques.
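The sketch below illustrates the content-aware intuition with a toy cost model (not DATACON's actual estimator): count how many cells would need SET versus RESET when overwriting all-zeros or all-ones, and pick the cheaper target. The per-operation costs are hypothetical.

```python
# Toy content-aware write-cost estimate: if the target location holds all-zeros,
# only the 1-bits of the new data must be programmed (SET); if it holds
# all-ones, only the 0-bits must be programmed (RESET). Costs are hypothetical.
SET_COST, RESET_COST = 1.0, 2.5       # relative latency/energy per programmed cell

def write_cost(data: int, width: int = 64) -> dict:
    ones = bin(data & ((1 << width) - 1)).count("1")
    zeros = width - ones
    return {
        "over_all_zeros": ones * SET_COST,
        "over_all_ones":  zeros * RESET_COST,
    }

costs = write_cost(0x00FF00FF00FF00FF)
print(costs, "-> cheaper target:", min(costs, key=costs.get))
```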

Journal ArticleDOI
TL;DR: In this article, an AI-assisted SMART is presented to address the information latency optimization problem in wireless networked control systems, and case studies of typical applications (i.e., dense platooning and intersection management) are demonstrated.
Abstract: The 5G Phase-2 and beyond wireless systems will focus more on vertical applications such as autonomous driving (AD) and the Industrial Internet of Things, many of which are categorized as uRLLC. In this article, an alternative view of uRLLC is presented: information latency, which measures the distortion of information resulting from the time lag of its acquisition process and is more relevant than the conventional communication latency of uRLLC in wireless networked control systems. An AI-assisted SMART is presented to address the information latency optimization challenge. Case studies of typical applications (i.e., dense platooning and intersection management) in AD are demonstrated, which show that SMART can effectively optimize information latency, and more importantly, information latency-optimized systems outperform conventional uRLLC-oriented systems significantly in terms of AD performance such as traffic efficiency, thus pointing out a new research and system design paradigm.

Journal ArticleDOI
TL;DR: The relationship between latency, throughput, and resource consumption, and the performance impact of adding different common operations to the pipeline are analyzed, and the results show that the latency disadvantages of using a micro-batch system are most apparent for stateless operations.
Abstract: The increasing need for real-time insights in data sparked the development of multiple stream processing frameworks. Several benchmarking studies were conducted in an effort to form guidelines for identifying the most appropriate framework for a use case. In this article, we extend this research and present the results gathered. In addition to Spark Streaming and Flink, we also include the emerging frameworks Structured Streaming and Kafka Streams. We define four workloads with custom parameter tuning. Each of these is optimized for a certain metric or for measuring performance under specific scenarios such as bursty workloads. We analyze the relationship between latency, throughput, and resource consumption and we measure the performance impact of adding different common operations to the pipeline. To ensure correct latency measurements, we use a single Kafka broker. Our results show that the latency disadvantages of using a micro-batch system are most apparent for stateless operations. With more complex pipelines, customized implementations can give event-driven frameworks a large latency advantage. Due to its micro-batch architecture, Structured Streaming can handle very high throughput at the cost of high latency. Under tight latency SLAs, Flink sustains the highest throughput. Additionally, Flink shows the least performance degradation when confronted with periodic bursts of data. When a burst of data needs to be processed right after startup, however, micro-batch systems catch up faster while event-driven systems output the first events sooner.

Journal ArticleDOI
Xiaoqun Yu1, Hai Qiu, Shuping Xiong1
TL;DR: The proposed novel hybrid ConvLSTM model has great potential to be embedded into wearable inertial sensor-based systems to predict pre-impact fall in real-time so that protective devices could be triggered in time to prevent fall-related injuries for older people.
Abstract: Falls in the elderly are a major public health concern due to their high prevalence, serious consequences and heavy burden on the society. Many falls in older people happen within a very short time, which makes it difficult to predict a fall before it occurs and then to provide protection for the person who is falling. The primary objective of this study was to develop deep neural networks for predicting a fall during its initiation and descending but before the body impacts to the ground so that a safety mechanism can be enabled to prevent fall-related injuries. We divided the falling process into three stages (non-fall, pre-impact fall and fall) and developed deep neural networks to perform three-class classification. Three deep learning models, convolutional neural network (CNN), long short term memory (LSTM), and a novel hybrid model integrating both convolution and long short term memory (ConvLSTM) were proposed and evaluated on a large public dataset of various falls and activities of daily living (ADL) acquired with wearable inertial sensors (accelerometer and gyroscope). Fivefold cross validation results showed that the hybrid ConvLSTM model had mean sensitivities of 93.15, 93.78, and 96.00% for non-fall, pre-impact fall and fall, respectively, which were higher than both LSTM (except the fall class) and CNN models. ConvLSTM model also showed higher specificities for all three classes (96.59, 94.49, and 98.69%) than LSTM and CNN models. In addition, latency test on a microcontroller unit showed that ConvLSTM model had a short latency of 1.06 ms, which was much lower than LSTM model (3.15 ms) and comparable with CNN model (0.77 ms). High prediction accuracy (especially for pre-impact fall) and low latency on the microboard indicated that the proposed hybrid ConvLSTM model outperformed both LSTM and CNN models. These findings suggest that our proposed novel hybrid ConvLSTM model has great potential to be embedded into wearable inertial sensor-based systems to predict pre-impact fall in real-time so that protective devices could be triggered in time to prevent fall-related injuries for older people.
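A hedged sketch of a hybrid Conv+LSTM classifier for the three-class task described above, written with tf.keras; the window length, six inertial channels (accelerometer + gyroscope), and layer sizes are assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a hybrid Conv+LSTM three-class classifier (non-fall,
# pre-impact fall, fall) over inertial windows; all sizes are assumptions.
import tensorflow as tf

def build_convlstm(window_len=128, channels=6, n_classes=3):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(window_len, channels)),
        tf.keras.layers.Conv1D(32, 5, activation="relu"),   # local motion features
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.LSTM(64),                           # temporal dynamics
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_convlstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```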

Proceedings ArticleDOI
Hirofumi Inaguma1, Yashesh Gaur2, Liang Lu2, Jinyu Li2, Yifan Gong2 
04 May 2020
TL;DR: This work proposes several strategies during training by leveraging external hard alignments extracted from the hybrid model to reduce latency, and investigates utilizing the alignments in both the encoder and the decoder.
Abstract: Recently, a few novel streaming attention-based sequence-to-sequence (S2S) models have been proposed to perform online speech recognition with linear-time decoding complexity. However, in these models, the decisions to generate tokens are delayed compared to the actual acoustic boundaries since their unidirectional encoders lack future information. This leads to an inevitable latency during inference. To alleviate this issue and reduce latency, we propose several strategies during training by leveraging external hard alignments extracted from the hybrid model. We investigate utilizing the alignments in both the encoder and the decoder. On the encoder side, (1) multi-task learning and (2) pre-training with the framewise classification task are studied. On the decoder side, we (3) remove inappropriate alignment paths beyond an acceptable latency during the alignment marginalization, and (4) directly minimize the differentiable expected latency loss. Experiments on the Cortana voice search task demonstrate that our proposed methods can significantly reduce the latency, and even improve the recognition accuracy in certain cases on the decoder side. We also present some analysis to understand the behaviors of streaming S2S models.
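The snippet below sketches one way a differentiable expected-latency term could look (an illustration, not the paper's exact loss), using PyTorch purely for the autodiff illustration: it computes the expected emission frame from a softmax over frames and penalizes delay beyond the reference alignment frame. Tensor shapes and the clamping choice are assumptions.

```python
# Sketch of a differentiable expected-latency term: expected emission frame from
# a softmax over frames, penalized when it lags the reference alignment frame.
import torch

def expected_latency_loss(emit_logits, ref_frames):
    """emit_logits: (tokens, frames) scores; ref_frames: (tokens,) alignment frames."""
    probs = torch.softmax(emit_logits, dim=-1)
    frames = torch.arange(emit_logits.size(-1), dtype=probs.dtype)
    expected_frame = (probs * frames).sum(dim=-1)          # differentiable emission time
    return torch.clamp(expected_frame - ref_frames, min=0).mean()

loss = expected_latency_loss(torch.randn(4, 50), torch.tensor([10., 20., 30., 40.]))
print(loss)
```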

Journal ArticleDOI
TL;DR: A new efficient scheduling algorithm (ETDMA-GA) based on the Genetic Algorithm (GA) minimizes the latency of communication, where two-dimensional encoding representations are designed to allocate slots, and minimizes the total network latency using a proposed fitness function.
Abstract: Data collection is a major operation in Wireless Sensor Networks (WSNs) and minimizing the delay in transmitting the collected data is critical for a lot of applications where specific actions depend on the required deadline, such as event-based mission-critical applications. Scheduling algorithms such as Time Division Multiple Access (TDMA) are extensively used for data delivery with the aim of minimizing the time duration for transporting data to the sink. To minimize the average latency and the average normalized latency in TDMA, we propose a new efficient scheduling algorithm (ETDMA-GA) based on a Genetic Algorithm (GA). ETDMA-GA minimizes the latency of communication, where two-dimensional encoding representations are designed to allocate slots, and minimizes the total network latency using a proposed fitness function. The simulation results show that the proposed algorithm outperforms the existing state-of-the-art approaches such as Rand-LO, Depth-LO, DepthRe-LO, IDegRe-LO, and IDeg-LO in terms of average latency, average normalized latency, and average schedule length.
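As a compact illustration of GA-based slot assignment (not ETDMA-GA's two-dimensional encoding), the sketch below evolves slot vectors whose fitness penalizes interfering nodes sharing a slot and rewards a low average slot index as a latency proxy; the topology and GA parameters are hypothetical.

```python
# Compact GA sketch for TDMA slot assignment: a chromosome assigns a slot to each
# node; fitness penalizes interfering nodes in the same slot and rewards a low
# average slot index as a latency proxy. Topology and parameters are hypothetical.
import random

N_NODES, N_SLOTS = 8, 6
CONFLICTS = {(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)}  # interfering pairs

def fitness(chrom):
    clashes = sum(1 for a, b in CONFLICTS if chrom[a] == chrom[b])
    avg_slot = sum(chrom) / N_NODES
    return clashes * 100 + avg_slot              # lower is better

def evolve(pop_size=40, generations=200, mut_rate=0.1):
    pop = [[random.randrange(N_SLOTS) for _ in range(N_NODES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]           # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_NODES)
            child = a[:cut] + b[cut:]            # one-point crossover
            if random.random() < mut_rate:
                child[random.randrange(N_NODES)] = random.randrange(N_SLOTS)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```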

Journal ArticleDOI
03 Mar 2020
TL;DR: A new generic IoV architecture based on fifth-generation (5G) communications is suggested to enhance data dissemination through promising and efficient transmission technologies such as 5G network properties, software-defined networking (SDN) functionalities, and cloud-fog computing services.
Abstract: With the increase of the number of connected vehicles on roads, the data dissemination within traditional network suffers from many limitations like high latency, a significant amount of d...

Journal ArticleDOI
TL;DR: Three criteria have been set forth to define latency and differentiate it from persistent or abortive infection: 1) persistence of the viral genome, 2) limited viral gene expression with no viral particle production, and 3) the ability to reactivate to a lytic cycle.
Abstract: Latency establishment is the hallmark feature of herpesviruses, a group of viruses, of which nine are known to infect humans. They have co-evolved alongside their hosts, and mastered manipulation of cellular pathways and tweaking various processes to their advantage. As a result, they are very well adapted to persistence. The members of the three subfamilies belonging to the family Herpesviridae differ with regard to cell tropism, target cells for the latent reservoir, and characteristics of the infection. The mechanisms governing the latent state also seem quite different. Our knowledge about latency is most complete for the gammaherpesviruses due to previously missing adequate latency models for the alpha and beta-herpesviruses. Nevertheless, with advances in cell biology and the availability of appropriate cell-culture and animal models, the common features of the latency in the different subfamilies began to emerge. Three criteria have been set forth to define latency and differentiate it from persistent or abortive infection: 1) persistence of the viral genome, 2) limited viral gene expression with no viral particle production, and 3) the ability to reactivate to a lytic cycle. This review discusses these criteria for each of the subfamilies and highlights the common strategies adopted by herpesviruses to establish latency.

Journal ArticleDOI
TL;DR: A novel framework for performance analysis and design of relay selection schemes in millimeter-wave based multi-hop vehicle-to-vehicle (V2V) communications to facilitate ultra-reliable low-latency information sharing among AVs is proposed.
Abstract: Timely and reliable information sharing among autonomous vehicles (AVs) provides a promising approach for reducing traffic congestion and improving traffic efficiency in future intelligent transportation systems. In this paper, we consider millimeter-wave (mmWave) based multi-hop vehicle-to-vehicle (V2V) communications to facilitate ultra-reliable low-latency information sharing among AVs. We propose a novel framework for performance analysis and design of relay selection schemes in mmWave multi-hop V2V communications, while taking into account the mmWave signal propagation characteristics, road topology, and traffic conditions. In particular, considering the minimum tracking distance requirement of road traffic, the headway, i.e., the distance between adjacent AVs, is modeled as a shifted-exponential distribution. Moreover, we model the communication path losses using the Manhattan distance metric in the taxicab geometry, which can more accurately capture the characteristics of mmWave signal propagation in urban grid roads than conventional Euclidean distance geometry. Based on the proposed model, we investigate the latency and reliability of mmWave multi-hop V2V communications for three widely adopted relay selection schemes, i.e., random with forward progress (RFP), most forward with fixed radius (MFR), and nearest with forward progress (NFP), respectively. Furthermore, we propose a novel relay selection scheme for joint optimization of the single-hop forward progress (FP) and single-hop latency according to the AVs’ instantaneous locations and an estimate of the residual multi-hop latency. Simulation results show that, by balancing the current single-hop latency and the residual multi-hop latency for the multi-hop V2V network, the proposed relay selection scheme significantly outperforms the MFR, NFP and RFP in both multi-hop transmission latency and reliability of mmWave V2V communications.
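The toy sketch below contrasts the three baseline relay-selection rules named in the abstract (RFP, MFR, NFP) on a handful of hypothetical candidate relays; the paper's proposed joint scheme is not reproduced here.

```python
# Toy comparison of the three baseline relay-selection rules; candidates and the
# fixed radius are hypothetical, with "progress" standing in for forward progress
# toward the destination in meters.
import random

candidates = [{"id": 1, "progress": 35.0},
              {"id": 2, "progress": 80.0},
              {"id": 3, "progress": 10.0}]
RADIUS = 100.0                               # fixed transmission radius (m)

def rfp(cands):   # random with forward progress
    return random.choice([c for c in cands if c["progress"] > 0])

def mfr(cands):   # most forward within the fixed radius
    return max((c for c in cands if c["progress"] <= RADIUS),
               key=lambda c: c["progress"])

def nfp(cands):   # nearest relay that still makes forward progress
    return min((c for c in cands if c["progress"] > 0),
               key=lambda c: c["progress"])

print(rfp(candidates)["id"], mfr(candidates)["id"], nfp(candidates)["id"])
```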

Journal ArticleDOI
01 Mar 2020
TL;DR: A tractable model of NB-IoT connectivity is developed, comprising message exchanges in random-access, control, and data channels, and it is confirmed that channel scheduling and coexistence of coverage classes significantly affect latency and battery lifetime performance of IoT devices.
Abstract: Narrowband Internet-of-Things (NB-IoT) offers a significant link budget improvement in comparison with the legacy networks by introducing different coverage classes, allowing repeated transmissions, and tuning the repetition order based on the path-loss in communications. However, those repetitions necessarily increase energy consumption and latency in the whole NB-IoT system. The extent to which the whole system is affected depends on the scheduling of the uplink and downlink channels. We address this question, not treated previously, by developing a tractable model of NB-IoT connectivity, comprising message exchanges in random-access, control, and data channels. The model is then used to analyze the impact of channel scheduling and interaction of coverage classes on the performance of IoT devices through the derivation of the expected latency and battery lifetime. These results are subsequently employed in determining the optimized operation points, i.e., (i) scheduling of data and control channels for a given set of users and respective coverage classes, or (ii) determining the optimal set of coverage classes and served users per coverage class for a given scheduling strategy. Simulation results show the validity of the analysis and confirm that channel scheduling and coexistence of coverage classes significantly affect latency and battery lifetime performance of NB-IoT devices.

Book ChapterDOI
23 Aug 2020
TL;DR: This paper rethinks three freedoms of differentiable NAS, i.e., operation-level, depth-level and width-level, and proposes a novel method, named Three-Freedom NAS (TF-NAS), to achieve both good classification accuracy and precise latency constraint.
Abstract: With the flourish of differentiable neural architecture search (NAS), automatically searching latency-constrained architectures gives a new perspective to reduce human labor and expertise. However, the searched architectures are usually suboptimal in accuracy and may have large jitters around the target latency. In this paper, we rethink three freedoms of differentiable NAS, i.e. operation-level, depth-level and width-level, and propose a novel method, named Three-Freedom NAS (TF-NAS), to achieve both good classification accuracy and precise latency constraint. For the operation-level, we present a bi-sampling search algorithm to moderate the operation collapse. For the depth-level, we introduce a sink-connecting search space to ensure the mutual exclusion between skip and other candidate operations, as well as eliminate the architecture redundancy. For the width-level, we propose an elasticity-scaling strategy that achieves precise latency constraint in a progressively fine-grained manner. Experiments on ImageNet demonstrate the effectiveness of TF-NAS. Particularly, our searched TF-NAS-A obtains 76.9% top-1 accuracy, achieving state-of-the-art results with less latency. Code is available at https://github.com/AberHu/TF-NAS.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the content service provision of information-centric vehicular networks (ICVNs) from the aspect of mobile edge caching, considering the dynamic driving-related context information.
Abstract: In this paper, the content service provision of information-centric vehicular networks (ICVNs) is investigated from the aspect of mobile edge caching, considering the dynamic driving-related context information. To provide up-to-date information with low latency, two schemes are designed for cache update and content delivery at the roadside units (RSUs). The roadside unit centric (RSUC) scheme decouples cache update and content delivery through bandwidth splitting, where the cached content items are updated regularly in a round-robin manner. The request adaptive (ReA) scheme updates the cached content items upon user requests with certain probabilities. The performance of both proposed schemes is analyzed, whereby the average age of information (AoI) and service latency are derived in closed forms. Surprisingly, the AoI-latency trade-off does not always exist, and frequent cache update can degrade both performances. Thus, the RSUC and ReA schemes are further optimized to balance the AoI and latency. Extensive simulations are conducted on SUMO and OMNeT++ simulators, and the results show that the proposed schemes can reduce service latency by up to 80% while guaranteeing content freshness in heavily loaded ICVNs.
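A minimal simulation sketch of a round-robin cache update in the spirit of the RSUC scheme (heavily simplified): items share one update channel, each refresh takes a fixed time, and an item's age of information grows until its next refresh. All parameters are hypothetical.

```python
# Minimal round-robin cache-update simulation: N items share one update channel,
# each refresh takes UPDATE_TIME, and an item's age of information (AoI) is the
# time since its cached copy was last refreshed. Parameters are hypothetical.
N_ITEMS, UPDATE_TIME, DT = 5, 0.2, 0.01
STEPS = 5000                                     # 50 s of simulated time
steps_per_update = int(UPDATE_TIME / DT)

last_update = [0.0] * N_ITEMS
aoi_sum = 0.0
for step in range(STEPS):
    t = step * DT
    if step % steps_per_update == 0:             # a refresh completes now
        item = (step // steps_per_update) % N_ITEMS
        last_update[item] = t
    aoi_sum += sum(t - u for u in last_update)

print(f"time-average AoI: {aoi_sum / (STEPS * N_ITEMS):.2f} s")   # ~ half the round length
```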

Journal ArticleDOI
TL;DR: A VR simulator of a forestry crane used for loading logs onto a truck is investigated to study the effects of latency on the subjective experience, with regard to delays in the crane control interface.
Abstract: In this article, we have investigated a VR simulator of a forestry crane used for loading logs onto a truck. We have mainly studied the Quality of Experience (QoE) aspects that may be relevant for task completion, and whether there are any discomfort-related symptoms experienced during the task execution. QoE experiments were designed to capture the general subjective experience of using the simulator, and to study task performance. The focus was to study the effects of latency on the subjective experience, with regard to delays in the crane control interface. Subjective studies were performed with controlled delays added to the display update and hand controller (joystick) signals. The added delays ranged from 0 to 30 ms for the display update, and from 0 to 800 ms for the hand controller. We found a strong effect of latency in the display update and a significant negative effect of the 800 ms added delay in the hand controller (in total approx. 880 ms latency including the system delay). The Simulator Sickness Questionnaire (SSQ) gave significantly higher scores after the experiment compared to before the experiment, but a majority of the participants reported experiencing only minor symptoms. Some test subjects ceased the test before finishing due to their symptoms, particularly due to the added latency in the display update.

Posted Content
Hirofumi Inaguma1, Yashesh Gaur2, Liang Lu2, Jinyu Li2, Yifan Gong2 
TL;DR: In this paper, the authors propose several strategies during training by leveraging external hard alignments extracted from the hybrid model to alleviate the inevitable latency during inference, including directly minimizing the differentiable expected latency loss.
Abstract: Recently, a few novel streaming attention-based sequence-to-sequence (S2S) models have been proposed to perform online speech recognition with linear-time decoding complexity. However, in these models, the decisions to generate tokens are delayed compared to the actual acoustic boundaries since their unidirectional encoders lack future information. This leads to an inevitable latency during inference. To alleviate this issue and reduce latency, we propose several strategies during training by leveraging external hard alignments extracted from the hybrid model. We investigate utilizing the alignments in both the encoder and the decoder. On the encoder side, (1) multi-task learning and (2) pre-training with the framewise classification task are studied. On the decoder side, we (3) remove inappropriate alignment paths beyond an acceptable latency during the alignment marginalization, and (4) directly minimize the differentiable expected latency loss. Experiments on the Cortana voice search task demonstrate that our proposed methods can significantly reduce the latency, and even improve the recognition accuracy in certain cases on the decoder side. We also present some analysis to understand the behaviors of streaming S2S models.

Journal ArticleDOI
TL;DR: A simple epidemiological model with two infectious stages, where hosts in the first stage can be partially or fully asymptomatic, and proves the uniqueness of interior evolutionarily singular strategies for power-law and exponential trade-offs, which means bistability is always between zero and maximal latency.
Abstract: Pathogens exhibit a rich variety of life history strategies, shaped by natural selection. An important pathogen life history characteristic is the propensity to induce an asymptomatic yet productive (transmissive) stage at the beginning of an infection. This characteristic is subject to complex trade-offs, ranging from immunological considerations to population-level social processes. We aim to classify the evolutionary dynamics of such asymptomatic behavior of pathogens (hereafter “latency”) in order to unify epidemiology and evolution for this life history strategy. We focus on a simple epidemiological model with two infectious stages, where hosts in the first stage can be partially or fully asymptomatic. Immunologically, there is a trade-off between transmission and progression in this first stage. For arbitrary trade-offs, we derive different conditions that guarantee either at least one evolutionarily stable strategy (ESS) at zero, some, or maximal latency of the first stage or, perhaps surprisingly, at least one unstable evolutionarily singular strategy. In this latter case, there is bistability between zero and nonzero (possibly maximal) latency. We then prove the uniqueness of interior evolutionarily singular strategies for power-law and exponential trade-offs: Thus, bistability is always between zero and maximal latency. Overall, previous multistage infection models can be summarized with a single model that includes evolutionary processes acting on latency. Since small changes in parameter values can lead to abrupt transitions in evolutionary dynamics, appropriate disease control strategies could have a substantial impact on the evolution of first-stage latency.