
Showing papers on "Latency (engineering)" published in 2021


Journal ArticleDOI
TL;DR: In this paper, a detailed overview of the URLLC features from 5G Release 15 to Release 16 is presented, describing how these features allow meeting URLLC target requirements in 5G networks.
Abstract: Ultra-reliable low-latency communication (URLLC) has been introduced in 5G new radio for new applications that have strict reliability and latency requirements such as augmented/virtual reality, industrial automation and autonomous vehicles. The first full set of the physical layer design of 5G release, Release 15, was finalized in December 2017. It provided a foundation for URLLC with new features such as flexible sub-carrier spacing, a sub-slot-based transmission scheme, new channel quality indicator, new modulation and coding scheme tables, and configured-grant transmission with automatic repetitions. The second 5G release, Release 16, was finalized in December 2019 and allows achieving improved metrics for latency and reliability to support new use cases of URLLC. A number of new features such as enhanced physical downlink (DL) control channel monitoring capability, new DL control information format, sub-slot physical uplink (UL) control channel transmission, sub-slot-based physical UL shared channel repetition, enhanced mobile broadband and URLLC inter-user-equipment multiplexing with cancellation indication and enhanced power control were standardized. This article provides a detailed overview of the URLLC features from 5G Release 15 to Release 16 by describing how these features allow meeting URLLC target requirements in 5G networks. The ongoing Release 17 targets further enhanced URLLC operation by improving mechanisms such as feedback, intra-user-equipment multiplexing and prioritization of traffic with different priority, support of time synchronization and new quality of service related parameters. In addition, a fundamental feature targeted in URLLC Release 17 is to enable URLLC operation over shared unlicensed spectrum. The potential directions of URLLC research in unlicensed spectrum in Release 17 are presented to serve as a bridge from URLLC in licensed spectrum in Release 16 to URLLC in unlicensed spectrum in Release 17.

88 citations


Journal ArticleDOI
TL;DR: DIET-SNN as mentioned in this paper proposes to optimize the membrane leak and threshold for each layer of the SNN with end-to-end backpropagation to achieve competitive accuracy at reduced latency.
Abstract: Bio-inspired spiking neural networks (SNNs), operating with asynchronous binary signals (or spikes) distributed over time, can potentially lead to greater computational efficiency on event-driven hardware. The state-of-the-art SNNs suffer from high inference latency, resulting from inefficient input encoding, and sub-optimal settings of the neuron parameters (firing threshold, and membrane leak). We propose DIET-SNN, a low latency deep spiking network that is trained with gradient descent to optimize the membrane leak and the firing threshold along with other network parameters (weights). The membrane leak and threshold for each layer of the SNN are optimized with end-to-end backpropagation to achieve competitive accuracy at reduced latency. The analog pixel values of an image are directly applied to the input layer of DIET-SNN without the need to convert to spike-train. The first convolutional layer is trained to convert inputs into spikes where leaky-integrate-and-fire (LIF) neurons integrate the weighted inputs and generate an output spike when the membrane potential crosses the trained firing threshold. The trained membrane leak controls the flow of input information and attenuates irrelevant inputs to increase the activation sparsity in the convolutional and linear layers of the network. The reduced latency combined with high activation sparsity provides large improvements in computational efficiency. We evaluate DIET-SNN on image classification tasks from CIFAR and ImageNet datasets on VGG and ResNet architectures. We achieve top-1 accuracy of 69% with 5 timesteps (inference latency) on the ImageNet dataset with 12x less compute energy than an equivalent standard ANN. Additionally, DIET-SNN performs 20-500x faster inference compared to other state-of-the-art SNN models.
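
To make the mechanism concrete, here is a minimal PyTorch sketch (not the authors' released code) of a leaky-integrate-and-fire layer whose membrane leak and firing threshold are ordinary trainable parameters, with a surrogate gradient standing in for the non-differentiable spike; the initial values and reset rule are assumptions.

```python
import torch
import torch.nn as nn

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, triangular surrogate gradient in the backward pass."""
    @staticmethod
    def forward(ctx, v_minus_thr):
        ctx.save_for_backward(v_minus_thr)
        return (v_minus_thr > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        surrogate = torch.clamp(1.0 - x.abs(), min=0.0)   # non-zero only near threshold
        return grad_out * surrogate

class TrainableLIF(nn.Module):
    """LIF layer whose membrane leak and firing threshold are trained by backprop."""
    def __init__(self, leak_init=0.9, thr_init=1.0):      # initial values are assumptions
        super().__init__()
        self.leak = nn.Parameter(torch.tensor(leak_init))
        self.threshold = nn.Parameter(torch.tensor(thr_init))

    def forward(self, weighted_input, v):
        v = self.leak * v + weighted_input                 # integrate weighted inputs
        spikes = SpikeFn.apply(v - self.threshold)         # fire on threshold crossing
        v = v - spikes * self.threshold                    # soft reset of firing neurons
        return spikes, v
```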

86 citations


Journal ArticleDOI
TL;DR: Simulation results show that the proposed hybrid model fits the problem with near-optimal accuracy for the offloading decision-making, latency, and energy-consumption predictions in the proposed self-management framework.

75 citations


Journal ArticleDOI
TL;DR: In this paper, a deep learning framework for enabling proactive handoff in wireless networks is presented, relying on visual data captured by red-green-blue (RGB) cameras deployed at the base stations.
Abstract: The sensitivity to blockages is a key challenge for millimeter wave and terahertz networks in 5G and beyond. Since these networks mainly rely on line-of-sight (LOS) links, sudden link blockages highly threaten the reliability of the networks. Further, when the LOS link is blocked, the network typically needs to hand off the user to another LOS basestation, which may incur critical time latency, especially if a search over a large codebook of narrow beams is needed. A promising way to tackle the reliability and latency challenges lies in enabling proaction in wireless networks. Proaction allows the network to anticipate future blockages, especially dynamic blockages, and initiate user hand-off beforehand. This article presents a complete machine learning framework for enabling proaction in wireless networks relying on visual data captured, for example, by red-green-blue (RGB) cameras deployed at the base stations. In particular, the article proposes a vision-aided wireless communication solution that utilizes bimodal machine learning to perform proactive blockage prediction and user hand-off. This is mainly achieved via a deep learning algorithm that learns from visual and wireless data how to predict incoming blockages. The predictions of this algorithm are used by the wireless network to proactively initiate hand-off decisions and avoid any unnecessary latency. The algorithm is developed on a vision-wireless dataset generated using the ViWi data-generation framework. Experimental results on two basestations with different cameras indicate that the algorithm is capable of accurately detecting incoming blockages more than ${\sim} 90\%$ of the time. Such blockage prediction ability is directly reflected in the accuracy of proactive hand-off, which also approaches 87%. This highlights a promising direction for enabling high reliability and low latency in future wireless networks.

74 citations


Proceedings ArticleDOI
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
06 Jun 2021
TL;DR: In this article, the authors explored the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset.
Abstract: Recently, Transformer based end-to-end models have achieved great success in many areas including speech recognition. However, compared to LSTM models, the heavy computational cost of the Transformer during inference is a key issue preventing their application. In this work, we explored the potential of Transformer Transducer (T-T) models for first pass decoding with low latency and fast speed on a large-scale dataset. We combine the idea of Transformer-XL and chunk-wise streaming processing to design a streamable Transformer Transducer model. We demonstrate that T-T outperforms the hybrid model, RNN Transducer (RNN-T), and streamable Transformer attention-based encoder-decoder model in the streaming scenario. Furthermore, the runtime cost and latency can be optimized with a relatively small look-ahead.
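
The chunk-wise streaming idea can be illustrated with a small sketch of the attention mask it implies: each frame attends to its own chunk plus a limited Transformer-XL-style left context, never to future chunks. The function name, chunk size, and left-context length below are illustrative assumptions, not details from the paper.

```python
import torch

def chunkwise_streaming_mask(seq_len, chunk_size, left_chunks=1):
    """Boolean self-attention mask for chunk-wise streaming: each frame attends to
    its own chunk and up to `left_chunks` previous chunks, never to future chunks."""
    chunk_id = torch.arange(seq_len) // chunk_size
    q = chunk_id.unsqueeze(1)                  # query frame's chunk index
    k = chunk_id.unsqueeze(0)                  # key frame's chunk index
    return (k <= q) & (k >= q - left_chunks)   # True = attend

# e.g. an 8-frame sequence processed in chunks of 2 with one chunk of left context
mask = chunkwise_streaming_mask(seq_len=8, chunk_size=2, left_chunks=1)
```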

71 citations


Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, the long-range history context is distilled into an augmented memory bank to reduce self-attention's computation complexity, and a cache mechanism saves the computation for the key and value in self-attention for the left context.
Abstract: This paper proposes an efficient memory transformer, Emformer, for low latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce self-attention's computation complexity. A cache mechanism saves the computation for the key and value in self-attention for the left context. Emformer applies parallelized block processing in training to support low latency models. We carry out experiments on benchmark LibriSpeech data. Under an average latency of 960 ms, Emformer gets WER 2.50% on test-clean and 5.62% on test-other. Compared with a strong baseline augmented memory transformer (AM-TRF), Emformer gets a 4.6-fold training speedup and an 18% relative real-time factor (RTF) reduction in decoding, with relative WER reductions of 17% on test-clean and 9% on test-other. For a low latency scenario with an average latency of 80 ms, Emformer achieves WER 3.01% on test-clean and 7.09% on test-other. Compared with the LSTM baseline with the same latency and model size, Emformer gets relative WER reductions of 9% and 16% on test-clean and test-other, respectively.
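
The key/value caching step described above can be pictured with a toy cache that keeps previously computed self-attention projections so the left context of each new chunk is reused rather than recomputed; the class, its capacity, and tensor shapes are assumptions rather than Emformer's actual implementation.

```python
import torch

class LeftContextKVCache:
    """Keep previously computed key/value projections so the left context of each
    new chunk is reused instead of recomputed (cache size is an assumption)."""
    def __init__(self, max_left_frames=64):
        self.max_left = max_left_frames
        self.k = None
        self.v = None

    def extend(self, new_k, new_v):
        # new_k, new_v: (frames, dim) projections of the current chunk
        self.k = new_k if self.k is None else torch.cat([self.k, new_k], dim=0)
        self.v = new_v if self.v is None else torch.cat([self.v, new_v], dim=0)
        self.k, self.v = self.k[-self.max_left:], self.v[-self.max_left:]
        return self.k, self.v
```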

63 citations


Journal ArticleDOI
TL;DR: In this article, the authors present the possibility of using the federated reinforcement learning (FRL) technique, one of the ML techniques, to meet 5G NR URLLC requirements and summarize the corresponding achievements.
Abstract: The tactile internet (TI) is believed to be the prospective advancement of the internet of things (IoT), comprising human-to-machine and machine-to-machine communication. TI focuses on enabling real-time interactive techniques with a portfolio of engineering, social, and commercial use cases. For this purpose, the prospective $5^{th}$ generation (5G) technology focuses on achieving ultra-reliable low latency communication (URLLC) services. TI applications require an extraordinary degree of reliability and latency. The $3^{rd}$ generation partnership project (3GPP) defines that URLLC is expected to provide 99.99% reliability of a single transmission of a 32 byte packet with a latency of less than one millisecond. 3GPP proposes to include an adjustable orthogonal frequency division multiplexing (OFDM) technique, called 5G new radio (5G NR), as a new radio access technology (RAT). With the emergence of a novel physical layer RAT, the need to design prospective next-generation technologies arises, especially with a focus on network intelligence. In such situations, machine learning (ML) techniques are expected to be essential in designing intelligent network resource allocation protocols for 5G NR URLLC requirements. Therefore, in this survey, we present the possibility of using the federated reinforcement learning (FRL) technique, one of the ML techniques, for 5G NR URLLC requirements and summarize the corresponding achievements for URLLC. We provide a comprehensive discussion of MAC layer channel access mechanisms that enable URLLC in 5G NR for TI. Besides, we identify seven very critical future use cases of FRL as potential enablers for URLLC in 5G NR.

62 citations


Journal ArticleDOI
TL;DR: In this paper, a method for designing optimally heterogeneously quantized versions of deep neural network models for minimum energy, high-accuracy, nanosecond inference and fully automated deployment on chip is introduced.
Abstract: Although the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices demand efficient inference and therefore reduction in model size, latency and energy consumption. One technique to limit model size is quantization, which implies using fewer bits to represent weights and biases. Such an approach usually results in a decline in performance. Here, we introduce a method for designing optimally heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated deployment on chip. With a per-layer, per-parameter type automatic quantization procedure, sampling from a wide range of quantizers, model energy consumption and size are minimized while high accuracy is maintained. This is crucial for the event selection procedure in proton–proton collisions at the CERN Large Hadron Collider, where resources are strictly limited and a latency of $${\mathcal{O}}(1)\,\upmu{\rm{s}}$$ is required. Nanosecond inference and a resource consumption reduced by a factor of 50 when implemented on field-programmable gate array hardware are achieved. With edge computing on custom hardware, real-time inference with deep neural networks can reach the nanosecond timescale. An important application in this regime is event processing at particle collision detectors like those at the Large Hadron Collider (LHC). To ensure high performance as well as reduced resource consumption, a method is developed, and made available as an extension of the Keras library, to automatically design optimal quantization of the different layers in a deep neural network.
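
A generic sketch of per-layer heterogeneous quantization, under stated assumptions: each layer is assigned its own bit width and its weights are uniformly fake-quantized to that width. The released tool (a Keras extension) searches these widths automatically; the layer names and bit widths below are hypothetical.

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

# Hypothetical per-layer bit widths such a search might return:
layer_bits = {"conv1": 6, "dense1": 4, "dense_out": 8}
quantized = {name: fake_quantize(np.random.randn(64, 64), bits)
             for name, bits in layer_bits.items()}
```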

60 citations


Journal ArticleDOI
TL;DR: In this paper, a novel experienced deep reinforcement learning (deep-RL) framework is proposed to provide model-free resource allocation for ultra reliable low latency communication (URLLC-6G) in the downlink of a wireless network.
Abstract: In this paper, a novel experienced deep reinforcement learning (deep-RL) framework is proposed to provide model-free resource allocation for ultra reliable low latency communication (URLLC-6G) in the downlink of a wireless network. The goal is to guarantee high end-to-end reliability and low end-to-end latency, under explicit data rate constraints, for each wireless user without any models of or assumptions on the users’ traffic. In particular, in order to enable the deep-RL framework to account for extreme network conditions and operate in highly reliable systems, a new approach based on generative adversarial networks (GANs) is proposed. This GAN approach is used to pre-train the deep-RL framework using a mix of real and synthetic data, thus creating an experienced deep-RL framework that has been exposed to a broad range of network conditions. The proposed deep-RL framework is particularly applied to a multi-user orthogonal frequency division multiple access (OFDMA) resource allocation system. Formally, this URLLC-6G resource allocation problem in OFDMA systems is posed as a power minimization problem under reliability, latency, and rate constraints. To solve this problem using experienced deep-RL, first, the rate of each user is determined. Then, these rates are mapped to the resource block and power allocation vectors of the studied wireless system. Finally, the end-to-end reliability and latency of each user are used as feedback to the deep-RL framework. It is then shown that at the fixed-point of the deep-RL algorithm, the reliability and latency of the users are near-optimal. Moreover, for the proposed GAN approach, a theoretical limit for the generator output is analytically derived. Simulation results show how the proposed approach can achieve near-optimal performance within the rate-reliability-latency region, depending on the network and service requirements. The results also show that the proposed experienced deep-RL framework is able to remove the transient training time that makes conventional deep-RL methods unsuitable for URLLC-6G. Moreover, during extreme conditions, it is shown that the proposed, experienced deep-RL agent can recover instantly while a conventional deep-RL agent takes several epochs to adapt to new extreme conditions.
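
The "experienced" pre-training step can be summarized as mixing real and GAN-generated transitions in each update batch so the agent has already been exposed to rare, extreme network conditions before deployment. The sketch below assumes hypothetical agent.update and gan_sample interfaces; it is not the paper's training procedure.

```python
import random

def pretrain_experienced_agent(agent, real_buffer, gan_sample,
                               steps=1000, batch=32, synth_ratio=0.5):
    """Pre-train a deep-RL agent on a mix of real and synthetic (GAN) transitions."""
    for _ in range(steps):
        n_synth = int(batch * synth_ratio)
        real = random.sample(real_buffer, batch - n_synth)   # real network traces
        synth = gan_sample(n_synth)                          # generator output
        agent.update(real + synth)                           # one gradient step
```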

58 citations


Journal ArticleDOI
TL;DR: The advantages of edge computing over cloud computing for overcoming latency and reliability issues in critical applications are discussed, along with a demonstration in the form of a case study on combating COVID-19 using 6G-based edge intelligence.

58 citations


Journal ArticleDOI
TL;DR: A vehicular-edge computing (VEC) fog-enabled scheme allowing offloading intrusion detection tasks to federated vehicle nodes located within nearby formed ad hoc vehicular fog to be cooperatively executed with minimal latency is proposed.
Abstract: Internet of Vehicles and vehicular networks have been compelling targets for malicious security attacks where several intrusion detection solutions have been proposed for protecting them. Nonetheless, their main problem lies in their heavy computation, which makes them unsuitable for next-generation artificial intelligence-powered self-driving vehicles whose computational power needs to be primarily reserved for real-time driving decisions. To address this challenge, several approaches have been lately presented to take advantage of the cloud computing for offloading intrusion detection tasks to central cloud servers, thus reducing storage and processing costs on vehicles. However, centralized cloud computing entails high latency on intrusion detection related data transmission and plays against its adoption in delay-critical intelligent applications. In this context, this article proposes a vehicular-edge computing (VEC) fog-enabled scheme allowing offloading intrusion detection tasks to federated vehicle nodes located within nearby formed ad hoc vehicular fog to be cooperatively executed with minimal latency. The problem has been formulated as a multiobjective optimization model and solved using a genetic algorithm maximizing offloading survivability in the presence of high mobility and minimizing computation execution time and energy consumption. Experiments performed on resource-constrained devices within actual ad hoc fog environment illustrate that our solution significantly reduces the execution time of the detection process while maximizing the offloading survivability under different real-life scenarios.
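
A compact genetic-algorithm sketch of the offloading formulation: each chromosome assigns every detection task to a fog vehicle, and a weighted fitness rewards offloading survivability while penalizing execution time and energy. The weights, operators, and input matrix shapes are illustrative assumptions.

```python
import random

def fitness(assign, surv, time_cost, energy, w=(1.0, 0.5, 0.5)):
    """Weighted scalarization: reward survivability, penalize time and energy."""
    s = sum(surv[t][n] for t, n in enumerate(assign))
    c = sum(time_cost[t][n] for t, n in enumerate(assign))
    e = sum(energy[t][n] for t, n in enumerate(assign))
    return w[0] * s - w[1] * c - w[2] * e

def genetic_offload(surv, time_cost, energy, pop=30, gens=100, p_mut=0.1):
    """Evolve task-to-fog-node assignments; assumes at least two tasks."""
    n_tasks, n_nodes = len(surv), len(surv[0])
    popn = [[random.randrange(n_nodes) for _ in range(n_tasks)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda a: fitness(a, surv, time_cost, energy), reverse=True)
        parents, children = popn[: pop // 2], []
        while len(parents) + len(children) < pop:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_tasks)            # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < p_mut:                   # random mutation
                child[random.randrange(n_tasks)] = random.randrange(n_nodes)
            children.append(child)
        popn = parents + children
    return max(popn, key=lambda a: fitness(a, surv, time_cost, energy))
```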

Journal ArticleDOI
TL;DR: The optimal value of packet length is obtained with the objective of maximizing the ET by applying one-dimensional search in unmanned-aerial-vehicle (UAV) communications.
Abstract: In this paper, we study the average packet error probability (APEP) and effective throughput (ET) of the control link in unmanned-aerial-vehicle (UAV) communications, where the ground central station (GCS) sends control signals to the UAV that requires ultra-reliable and low-latency communications (URLLC). To ensure the low latency, short packets are adopted for the control signal. As a result, the Shannon capacity theorem cannot be adopted here due to its assumption of infinite channel blocklength. We consider both free space (FS) and 3-Dimensional (3D) channel models by assuming that the locations of the UAV are randomly distributed within a restricted space. We first characterize the statistical characteristics of the signal-to-noise ratio (SNR) for both FS and 3D models. Then, the closed-form analytical expressions of APEP and ET are derived by using Gaussian-Chebyshev quadrature. Also, the lower bounds are derived to obtain more insights. Finally, we obtain the optimal value of packet length with the objective of maximizing the ET by applying one-dimensional search. Our analytical results are verified by the Monte-Carlo simulations.
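
The one-dimensional search over packet length can be illustrated by substituting the standard finite-blocklength normal approximation for the paper's closed-form APEP expressions; the SNR, payload size, and search range below are illustrative.

```python
import math

def q_func(x):
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def packet_error_prob(snr, n, k):
    """Normal (finite-blocklength) approximation of the packet error probability
    for n channel uses carrying k information bits."""
    c = math.log2(1 + snr)                                  # Shannon capacity per use
    v = (1 - 1 / (1 + snr) ** 2) * math.log2(math.e) ** 2   # channel dispersion
    return q_func((n * c - k) / math.sqrt(n * v))

def best_packet_length(snr, k, n_range):
    """One-dimensional search for the blocklength maximizing effective throughput."""
    def effective_throughput(n):
        return (k / n) * (1 - packet_error_prob(snr, n, k))
    return max(n_range, key=effective_throughput)

# Example: 32-byte (256-bit) control packet at 5 dB average SNR
n_opt = best_packet_length(snr=10 ** (5 / 10), k=256, n_range=range(260, 2000))
```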

Journal ArticleDOI
TL;DR: A novel live video transcoding and streaming scheme that maximizes the video bitrate and decreases time-delays and bitrate variations in vehicular fog-computing (VFC)-enabled IoV is proposed, by jointly optimizing vehicle scheduling, bitrate selection, and computational/spectrum resource allocation.
Abstract: With the rapid development of the automotive industry and telecommunication technologies, live streaming services in the Internet of Vehicles (IoV) play an even more crucial role in vehicular infotainment systems. However, it is a big challenge to provide a high quality, low latency, and low bitrate variance live streaming service for vehicles due to the dynamic properties of wireless resources and channels of IoV. To solve this challenge, we propose a novel live video transcoding and streaming scheme that maximizes the video bitrate and decreases time-delays and bitrate variations in vehicular fog-computing (VFC)-enabled IoV, by jointly optimizing vehicle scheduling, bitrate selection, and computational/spectrum resource allocation. This joint optimization problem is modeled as a Markov decision process (MDP), considering time-varying characteristics of the available resources and wireless channels of IoV. A soft actor–critic deep reinforcement learning (DRL) algorithm based on the maximum entropy framework is subsequently utilized to solve the above MDP. Extensive simulation results based on a real-world data set show that, compared to other baseline algorithms, the proposed scheme can effectively improve video quality while decreasing latency and bitrate variations, and achieves excellent performance in terms of learning speed and stability.

Journal ArticleDOI
TL;DR: The proposed algorithm outperforms the Full MEC, Full Local, and Full Cloud schemes in terms of execution latency; it quickly and effectively reaches convergence and attains the minimum task execution delay at convergence for different numbers of tasks.

Journal ArticleDOI
TL;DR: The role of 5G and its impact on supply chain management are explored through a systematic literature review, finding that most studies have focussed on the technical features of emerging 5G and conceptualised its impact on Industry 4.0 and supply chain processes.
Abstract: The high bandwidth and low latency features of 5G network are perceived to offer a unified platform for multiple device connectivity in real time. Despite the promising benefits of disruptive 5G wi...

Journal ArticleDOI
TL;DR: A digital-twin-enabled model-based scheme is proposed to achieve an intelligent clock synchronization for reducing resource consumption associated with distributed synchronization in fast-changing IIoT environments and a significant enhancement on the clock accuracy is accomplished with dramatically reduced communication resource consumption in networks with different packet delay variations.
Abstract: Tight cooperation among distributively connected equipment and infrastructures of an Industrial-Internet-of-Things (IIoT) system hinges on low latency data exchange and accurate time synchronization within sophisticated networks. However, the temperature-induced clock drift in connected industry facilities constitutes a fundamental challenge for conventional synchronization techniques due to dynamic industrial environments. Furthermore, the variation of packet delivery latency in IIoT networks hinders the reliability of time information exchange, leading to deteriorated clock synchronization performance in terms of synchronization accuracy and network resource consumption. In this article, a digital-twin-enabled model-based scheme is proposed to achieve an intelligent clock synchronization for reducing resource consumption associated with distributed synchronization in fast-changing IIoT environments. By leveraging the digital-twin-enabled clock models at remote locations, required interactions among distributed IIoT facilities to achieve synchronization is dramatically reduced. The virtual clock modeling in advance of the clock calibrations helps to characterize each clock so that its behavior under dynamic operating environments is predictable, which is beneficial to avoiding excessive synchronization-related timestamp exchange. An edge-cloud collaborative architecture is also developed to enhance the overall system efficiency during the development of remote digital-twin models. Simulation results demonstrate that the proposed scheme can create an accurate virtual model remotely for each local clock according to the information gathered. Meanwhile, a significant enhancement on the clock accuracy is accomplished with dramatically reduced communication resource consumption in networks with different packet delay variations.
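
The digital-twin idea reduces, in its simplest form, to keeping a local model of each remote clock's offset and skew and triggering a timestamp exchange only when the predicted error exceeds a budget. The linear model and parameter names below are a simplified sketch, not the paper's clock model.

```python
import numpy as np

class ClockTwin:
    """Minimal 'digital twin' of a remote clock: fit offset/skew from past
    timestamp exchanges, predict drift locally, and request a new
    synchronization exchange only when the predicted error exceeds a budget."""
    def __init__(self, error_budget_us=10.0):
        self.budget = error_budget_us
        self.t0 = None
        self.offset = 0.0   # microseconds at reference time t0
        self.skew = 0.0     # microseconds of drift per second

    def calibrate(self, times_s, offsets_us):
        """Fit a linear drift model to measured (time, offset) samples."""
        self.t0 = times_s[-1]
        self.skew, self.offset = np.polyfit(np.asarray(times_s) - self.t0,
                                            offsets_us, deg=1)

    def predicted_offset(self, now_s):
        return self.offset + self.skew * (now_s - self.t0)

    def needs_resync(self, now_s):
        return abs(self.predicted_offset(now_s)) > self.budget
```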

Journal ArticleDOI
TL;DR: This article proposes a resource representation scheme, allowing each ED to expose its resource information to the supervisor of the edge node through the mobile EC application programming interfaces proposed by the European Telecommunications Standards Institute.
Abstract: Low-latency IoT applications, such as autonomous vehicles, augmented/virtual reality devices, and security applications, require high computation resources to make decisions on the fly. However, these kinds of applications cannot tolerate offloading their tasks to be processed on a cloud infrastructure due to the experienced latency. Therefore, edge computing (EC) is introduced to enable low latency by moving the tasks processing closer to the users at the edge of the network. The edge of the network is characterized by the heterogeneity of edge devices (EDs) forming it; thus, it is crucial to devise novel solutions that take into account the different physical resources of each ED. In this article, we propose a resource representation scheme, allowing each ED to expose its resource information to the supervisor of the edge node through the mobile EC application programming interfaces proposed by the European Telecommunications Standards Institute. The information about the ED resource is exposed to the supervisor of the edge node each time a resource allocation is required. To this end, we leverage a Lyapunov optimization framework to dynamically allocate resources at the EDs. To test our proposed model, we performed intensive theoretical and experimental simulations on a testbed to validate the proposed scheme and its impact on different system’s parameters. The simulations have shown that our proposed approach outperforms other benchmark approaches and provides low latency and optimal resource consumption.
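
The Lyapunov step can be sketched with the textbook drift-plus-penalty rule: at each slot, a device's queue backlog weighs its service gain against the penalty (e.g., power), so longer queues attract more resources. The power_of and service_of callables and the weight V are assumptions.

```python
def drift_plus_penalty_alloc(queue_backlog, cpu_options, power_of, service_of, V=10.0):
    """One slot of the drift-plus-penalty rule: for each edge device, pick the CPU
    option minimizing V*power - Q*service, so devices with long queues get more
    service while the time-average penalty stays bounded."""
    return {dev: min(cpu_options, key=lambda c: V * power_of(c) - q * service_of(c))
            for dev, q in queue_backlog.items()}
```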

Proceedings ArticleDOI
01 Nov 2021
TL;DR: In this article, the authors present Atoll, a serverless platform that overcomes the challenges of serverless platforms via a ground-up redesign of the control and data planes.
Abstract: With user-facing apps adopting serverless computing, good latency performance of serverless platforms has become a strong fundamental requirement. However, it is difficult to achieve this on platforms today due to the design of their underlying control and data planes that are particularly ill-suited to short-lived functions with unpredictable arrival patterns. We present Atoll, a serverless platform, that overcomes the challenges via a ground-up redesign of the control and data planes. In Atoll, each app is associated with a latency deadline. Atoll achieves its per-app request latency goals by: (a) partitioning the cluster into (semi-global scheduler, worker pool) pairs, (b) performing deadline-aware scheduling and proactive sandbox allocation, and (c) using a load balancing layer to do sandbox-aware routing, and automatically scale the semi-global schedulers per app. Our results show that Atoll reduces missed deadlines by ~66x and tail latencies by ~3x compared to state-of-the-art alternatives.
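
The deadline-aware part of the design can be pictured as an earliest-deadline-first dispatch queue maintained per (semi-global scheduler, worker pool) pair; the toy class below only sketches that ordering and omits proactive sandbox allocation and sandbox-aware routing.

```python
import heapq
import time

class DeadlineAwareQueue:
    """Toy earliest-deadline-first dispatch queue for per-app latency deadlines."""
    def __init__(self):
        self._heap = []
        self._seq = 0          # tie-breaker so requests never compare directly

    def submit(self, request, latency_deadline_s):
        deadline = time.time() + latency_deadline_s
        heapq.heappush(self._heap, (deadline, self._seq, request))
        self._seq += 1

    def next_request(self):
        """Return the most urgent pending request, or None if idle."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```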

Journal ArticleDOI
TL;DR: DeepSlicing as mentioned in this paper is a collaborative and adaptive inference system that adapts to various CNNs and supports customized fine-grained scheduling by partitioning both model and data.
Abstract: The booming of Convolutional Neural Networks (CNNs) has empowered lots of computer-vision applications. Due to its stringent requirement for computing resources, substantial research has been conducted on how to optimize its deployment and execution on resource-constrained devices. However, previous works have several weaknesses, including limited support for various CNN structures, fixed scheduling strategies, overlapped computations, high synchronization overheads, etc. In this article, we present DeepSlicing, a collaborative and adaptive inference system that adapts to various CNNs and supports customized flexible fine-grained scheduling. As a built-in functionality, DeepSlicing has supported typical CNNs including GoogLeNet, ResNet, etc. By partitioning both model and data, we also design an efficient scheduler, Proportional Synchronized Scheduler (PSS), which achieves the trade-off between computation and synchronization. Based on PyTorch, we have implemented DeepSlicing on the testbed with real-world edge settings that consists of 8 heterogeneous Raspberry Pi's. The results indicate that DeepSlicing with PSS outperforms the existing systems dramatically, e.g., the inference latency and memory footprint are reduced up to 5.79× and 14.72×, respectively.

Journal ArticleDOI
TL;DR: This work investigates workflow scheduling in fog–cloud environments to provide an energy-efficient task schedule within acceptable application completion times and introduces a scheduling algorithm, Energy Makespan Multi-Objective Optimization, that works in two phases.
Abstract: The rapid evolution of smart services and Internet of Things devices accessing cloud data centers can lead to network congestion and increased latency. Fog computing, focusing on ubiquitously connected heterogeneous devices, addresses latency and privacy requirements of workflows executing at the network edge. However, allocating resources in this paradigm is challenging due to the complex and strict Quality of Service constraints. Moreover, simultaneously optimizing conflicting objectives, e.g., energy consumption and workflow makespan increases the complexity of the scheduling process. We investigate workflow scheduling in fog–cloud environments to provide an energy-efficient task schedule within acceptable application completion times. We introduce a scheduling algorithm, Energy Makespan Multi-Objective Optimization, that works in two phases. First, it models the problem as a multi-objective optimization problem and computes a tradeoff between conflicting objectives while allocating fog and cloud resources, and schedules latency-sensitive tasks (with lower computational requirements) to fog resources and computationally complex tasks (with low latency requirements) on cloud resources. We adapt the Deadline-Aware stepwise Frequency Scaling approach to further reduce energy consumption by utilizing unused time slots between two already scheduled tasks on a single node. Our evaluation using synthesized and real-world applications shows that our approach reduces energy consumption, up to 50%, as compared to existing approaches with minimal impact on completion times.

Proceedings ArticleDOI
06 Jun 2021
TL;DR: FastEmit as mentioned in this paper applies latency regularization directly on per-sequence probability in training transducer models, and does not require any alignment, which is more suitable to the sequence-level optimization of transducers for streaming ASR.
Abstract: Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches including Early and Late Penalties [1] and Constrained Alignments [2], [3] penalize emission delay by manipulating per-token or per-frame probability prediction in sequence transducer models [4]. While being successful in reducing delay, these approaches suffer from significant accuracy regression and also require additional word alignment information from an existing model. In this work, we propose a sequence-level emission regularization method, named FastEmit, that applies latency regularization directly on per-sequence probability in training transducer models, and does not require any alignment. We demonstrate that FastEmit is more suitable to the sequence-level optimization of transducer models [4] for streaming ASR by applying it on various end-to-end streaming ASR networks including RNN-Transducer [5], Transformer-Transducer [6], [7], ConvNet-Transducer [8] and Conformer-Transducer [9]. We achieve 150 ~ 300ms latency reduction with significantly better accuracy over previous techniques on a Voice Search test set. FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, meanwhile reduces 90th percentile latency from 210ms to only 30ms on LibriSpeech.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a rendering-aware tile caching scheme to optimize the end-to-end latency for VR video delivery over multi-cell MEC networks by enabling multiple cell sites to share caches so that the network caching performance is improved.
Abstract: Delivering high fidelity virtual reality (VR) video over mobile networks is very challenging since VR applications usually require very high bandwidth and ultra low latency. With the evolution of 5G mobile networks, multi-cell multi-access edge computing (MEC) networks enable low latency data communication. However, even in this setting, the requirements of VR applications are tough to meet. To optimize the end-to-end latency for VR video delivery over multi-cell MEC networks, we propose a rendering-aware tile caching scheme. As a first step we propose collaborative tile caching in 5G MEC networks by enabling multiple cell sites to share caches so that the network caching performance is improved. Hence, the amount of redundant data delivered and eventually the latency are both significantly reduced. Second, our scheme offloads viewport rendering of VR video to the MEC server and closely couples this with cache placement, allowing thus the rendering-induced latency to be reduced. Finally, a low-delay request routing algorithm is integrated with the proposed cache placement scheme to further optimize the end-to-end latency of VR video delivery. Extensive simulation results show that the proposed rendering-aware caching scheme can achieve better latency performance than the state-of-the-art decoupled caching/rendering schemes.
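
The collaborative caching step can be illustrated by a lookup that tries the local MEC cache, then the caches shared by neighboring cell sites, and only then the origin server; rendering offload and the optimized placement/routing are omitted, and all names below are assumptions.

```python
class CollaborativeTileCache:
    """Sketch of collaborative tile caching across cooperating cell sites:
    a miss at the local MEC cache is served from a neighbor's cache before
    falling back to the origin server (placement policy omitted)."""
    def __init__(self, local, neighbors, origin_fetch):
        self.local = local            # dict: tile_id -> tile bytes
        self.neighbors = neighbors    # list of dicts shared by nearby cells
        self.origin_fetch = origin_fetch

    def get(self, tile_id):
        if tile_id in self.local:
            return self.local[tile_id], "local"
        for cache in self.neighbors:
            if tile_id in cache:
                return cache[tile_id], "neighbor"
        tile = self.origin_fetch(tile_id)     # slowest path: back to origin
        self.local[tile_id] = tile
        return tile, "origin"
```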

Journal ArticleDOI
TL;DR: This paper proposes a scalable, responsive, and reliable AI-enabled IoT and edge-computing-based healthcare solution that serves patients with low latency, with results reported for end-to-end time and for computing, optimization, and transmission latency.

Proceedings ArticleDOI
21 Apr 2021
TL;DR: ChameleonDB as discussed by the authors uses an in-DRAM hash table to bypass LSM-tree's multiple levels for fast reads, which helps avoid long-tail read latency.
Abstract: The emergence of Intel's Optane DC persistent memory (Optane Pmem) draws much interest in building persistent key-value (KV) stores to take advantage of its high throughput and low latency. A major challenge in the efforts stems from the fact that Optane Pmem is essentially a hybrid storage device with two distinct properties. On one hand, it is a high-speed byte-addressable device similar to DRAM. On the other hand, the write to the Optane media is conducted at the unit of 256 bytes, much like a block storage device. Existing KV store designs for persistent memory do not take into account of the latter property, leading to high write amplification and constraining both write and read throughput. In the meantime, a direct re-use of a KV store design intended for block devices, such as LSM-based ones, would cause much higher read latency due to the former property. In this paper, we propose ChameleonDB, a KV store design specifically for this important hybrid memory/storage device by considering and exploiting these two properties in one design. It uses LSM tree structure to efficiently admit writes with low write amplification. It uses an in-DRAM hash table to bypass LSM-tree's multiple levels for fast reads. In the meantime, ChameleonDB may choose to opportunistically maintain the LSM multi-level structure in the background to achieve short recovery time after a system crash. ChameleonDB's hybrid structure is designed to be able to absorb sudden bursts of a write workload, which helps avoid long-tail read latency. Our experiment results show that ChameleonDB improves write throughput by 3.3× and reduces read latency by around 60% compared with a legacy LSM-tree based KV store design. ChameleonDB provides performance competitive even with KV stores using fully in-DRAM index by using much less DRAM space. Compared with CCEH, a persistent hash table design, ChameleonDB provides 6.4× higher write throughput.
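
The core structure, an LSM-style write path paired with an in-DRAM hash index for reads, can be reduced to a toy sketch in which an append-only log stands in for the persistent LSM levels; compaction, crash recovery, and Optane-specific behavior are omitted.

```python
class HashIndexedKV:
    """Toy illustration of the core idea: writes append sequentially (low write
    amplification on block-like media) while an in-DRAM hash table maps each key
    to its latest location so reads bypass the multi-level structure."""
    def __init__(self):
        self.log = []        # append-only storage, stands in for the LSM levels
        self.index = {}      # in-DRAM hash table: key -> position in log

    def put(self, key, value):
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]
```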

Journal ArticleDOI
TL;DR: The hardware implementation of wireless SHARP (w-SHARP), a promising wireless technology for real-time industrial applications that follows the principles of time-sensitive networking and provides time synchronization and time-aware scheduling with bounded latency and high reliability, is presented.
Abstract: Real-time industrial applications in the scope of Industry 4.0 present significant challenges from the communication perspective: low latency, ultra-reliability, and determinism. Given that wireless networks provide a significant cost reduction, lower deployment time, and free movement of the wireless nodes, wireless solutions have attracted the industry's attention. However, industrial networks are mostly built by wired means because state-of-the-art wireless networks cannot cope with industrial application requirements. In this article, we present the hardware implementation of wireless SHARP (w-SHARP), a promising wireless technology for real-time industrial applications. w-SHARP follows the principles of time-sensitive networking and provides time synchronization, time-aware scheduling with bounded latency, and high reliability. The implementation has been carried out on a field-programmable gate array-based software-defined radio platform. We demonstrate, through a hardware testbed, that w-SHARP is able to provide ultra-low control cycles, low latency, and high reliability. This implementation may open new perspectives for high-performance industrial wireless networks, as both PHY and MAC layers are now subject to optimization for specific industrial applications.

Journal ArticleDOI
TL;DR: This paper investigates the novel problem of UAV-aided ultra-reliable low-latency computation offloading which would enable future IoT services with strict requirements and proposes a two-stage approximate algorithm where the two problems are transformed into approximate convex programs.
Abstract: Modern 5G services with stringent reliability and latency requirements such as smart healthcare and industrial automation have become possible through the advancement of Multi-access Edge Computing (MEC). However, the rigidity of ground MEC and its susceptibility to infrastructure failure would prevent satisfying the resiliency and strict requirements of those services. Unmanned Aerial Vehicles (UAVs) have been proposed for providing flexible edge computing capability through UAV-mounted cloudlets, harnessing their advantages such as mobility, low-cost, and line-of-sight communication. However, UAV-mounted cloudlets may have failure rates that would impact mission-critical applications, necessitating a novel study for the provisioned reliability considering UAV node reliability and task redundancy. In this paper, we investigate the novel problem of UAV-aided ultra-reliable low-latency computation offloading which would enable future IoT services with strict requirements. We aim at maximizing the rate of served requests, by optimizing the UAVs’ positions, the offloading decisions, and the allocated resources while respecting the stringent latency and reliability requirements. To do so, the problem is divided into two phases, the first being a planning problem to optimize the placement of UAVs and the second an operational problem to make optimized offloading and resource allocation decisions with constrained UAVs’ energy. We formulate both problems associated with each phase as non-convex mixed-integer programs, and due to their non-convexity, we propose a two-stage approximate algorithm where the two problems are transformed into approximate convex programs. Further, we approach the problem considering the task partitioning model which will be prevalent in 5G networks. Through numerical analysis, we demonstrate the efficiency of our solution considering various scenarios, and compare it to other baseline approaches.

Journal ArticleDOI
TL;DR: Mez as discussed by the authors is a publish-subscribe messaging system for latency sensitive multi-camera machine vision applications at the IoT edge that adapts to channel conditions by dynamically adjusting the video frame quality using the image transformation control knobs.
Abstract: Mez is a novel publish-subscribe messaging system for latency-sensitive multi-camera machine vision applications at the IoT Edge. The unlicensed wireless communication in IoT Edge systems is characterized by large latency variations due to intermittent channel interference. To achieve user-specified latency in the presence of wireless channel interference, Mez takes advantage of the ability of machine vision applications to temporarily tolerate lower quality video frames if overall application accuracy is not too adversely affected. Control knobs that involve lossy image transformation techniques that modify the frame size, and thereby the video frame transfer latency, are identified. Mez implements a network latency feedback controller that adapts to channel conditions by dynamically adjusting the video frame quality using the image transformation control knobs, so as to simultaneously satisfy latency and application accuracy requirements. Additionally, Mez uses an application-domain-specific design of the storage layer to provide low latency operations. Experimental evaluation on an IoT Edge testbed with a pedestrian detection machine vision application indicates that Mez is able to tolerate latency variations of up to 10x with a worst-case reduction of 42% of the application accuracy F1 score metric. The performance of Mez is also experimentally evaluated against the state-of-the-art low latency NATS messaging system.
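
The latency feedback controller can be sketched as a simple rule that trades frame quality against measured transfer latency; the step size, bounds, and slack threshold below are illustrative values, not Mez's tuned controller.

```python
def adjust_quality(quality, measured_latency_ms, target_ms, step=5,
                   q_min=20, q_max=95):
    """One iteration of a latency feedback loop: degrade frame quality when the
    measured transfer latency exceeds the target, restore it when there is slack
    (knob granularity and thresholds are illustrative assumptions)."""
    if measured_latency_ms > target_ms:
        quality = max(q_min, quality - step)   # smaller frames -> lower latency
    elif measured_latency_ms < 0.8 * target_ms:
        quality = min(q_max, quality + step)   # recover accuracy when channel allows
    return quality
```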

Journal ArticleDOI
TL;DR: A unique fusion of Deep Learning based mobility prediction and Genetic Algorithm assisted service orchestration is devised to keep the average service latency minimal by offering personalized service migration, while tightly packing as many services as possible at the edge of the network to maximize resource utilization.
Abstract: As technology progresses, cars can not only be considered as a transportation medium but also as an intelligent part of the cellular network that generates highly valuable data and offers both entertainment and security services to the passengers. Therefore, forthcoming 5G networks are said to deliver Ultra-Reliable Ultra-Low-Latency communication that will allow for a new breed of services that will disrupt the industry as we know it today. In this work, we devise a unique fusion of Deep Learning based mobility prediction and Genetic Algorithm assisted service orchestration to keep the average service latency minimal by offering personalized service migration, while tightly packing as many services as possible at the edge of the network to maximize resource utilization. Through an extensive simulation based on real data, we evaluate the proposed mobility-orchestration combination and find latency gains in all examined scenarios.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an update-importance-based client scheduling scheme to reduce the training time of federated learning (FL) by exploiting the update importance of edge devices.
Abstract: Motivated by the ever-increasing demands for massive data processing and intelligent data analysis at the network edge, federated learning (FL), a distributed architecture for machine learning, has been introduced to enhance edge intelligence without compromising data privacy. Nonetheless, due to the large number of edge devices (referred to as clients in FL) with only limited wireless resources, client scheduling, which chooses only a subset of devices to participate in each round of FL, becomes a more feasible option. Unfortunately, the training latency can be intolerable in the iterative process of FL. To tackle the challenge, this article introduces update-importance-based client scheduling schemes to reduce the required number of rounds. Then latency-based client scheduling schemes are proposed to shorten the time interval for each round. We consider the scenario where no prior information regarding the channel state and the resource usage of the devices is available, and propose a scheme based on the multi-armed bandit theory to strike a balance between exploration and exploitation. Finally, we propose a latency-based technique that exploits update importance to reduce the training time. Computer simulation results are presented to evaluate the convergence rate with respect to the rounds and wall-clock time consumption.
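
A minimal upper-confidence-bound sketch of the bandit-style client scheduling: rarely scheduled clients receive an exploration bonus, while clients whose past updates proved important (and arrived quickly) keep being exploited. The reward definition and function signature are assumptions.

```python
import math

def ucb_select_clients(counts, mean_reward, t, k):
    """Pick k clients by UCB score, balancing exploration of rarely scheduled
    clients against exploitation of clients with historically useful updates.
    `t` is the current round (>= 1); the reward is assumed to capture, e.g.,
    update importance delivered per unit latency."""
    def score(i):
        if counts[i] == 0:
            return float("inf")                       # never-scheduled clients first
        return mean_reward[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(counts)), key=score, reverse=True)[:k]
```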

Proceedings ArticleDOI
Siyu Yan, Xiaoliang Wang, Zheng Xiaolong, Xia Yinben, Derui Liu, Deng Weishan
09 Aug 2021
TL;DR: In this paper, the authors report the design and implementation of an automatic run-time optimization scheme, which leverages the multi-agent reinforcement learning technique to dynamically adjust the marking threshold at each switch.
Abstract: For the widely deployed ECN-based congestion control schemes, the marking threshold is the key to deliver high bandwidth and low latency. However, due to traffic dynamics in the high-speed production networks, it is difficult to maintain persistent performance by using the static ECN setting. To meet the operational challenge, in this paper we report the design and implementation of an automatic run-time optimization scheme, ACC, which leverages the multi-agent reinforcement learning technique to dynamically adjust the marking threshold at each switch. The proposed approach works in a distributed fashion and combines offline and online training to adapt to dynamic traffic patterns. It can be easily deployed based on the common features supported by major commodity switching chips. Both testbed experiments and large-scale simulations have shown that ACC achieves low flow completion time (FCT) for both mice flows and elephant flows at line-rate. Under heterogeneous production environments with 300 machines, compared with the well-tuned static ECN settings, ACC achieves up to 20% improvement on IOPS and 30% lower FCT for storage service. ACC has been applied in high-speed datacenter networks and significantly simplifies the network operations.
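
ACC's control loop can be illustrated with a tabular Q-learning toy: the state is a coarse bin of switch statistics, actions nudge the ECN marking threshold, and the reward trades throughput against queuing delay. ACC itself uses multi-agent deep reinforcement learning with combined offline and online training, so this only sketches the interface.

```python
import random

class EcnThresholdAgent:
    """Tabular Q-learning sketch of per-switch ECN threshold tuning. The state is
    assumed to be a coarse bin of (queue depth, link utilization); actions raise,
    keep, or lower the marking threshold; the reward is assumed to combine
    throughput and queuing delay."""
    ACTIONS = (-8, 0, +8)   # threshold change in KB (illustrative)

    def __init__(self, eps=0.1, lr=0.1, gamma=0.9):
        self.q = {}
        self.eps, self.lr, self.gamma = eps, lr, gamma

    def act(self, state):
        if random.random() < self.eps:                 # epsilon-greedy exploration
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, s, a, reward, s_next):
        best_next = max(self.q.get((s_next, b), 0.0) for b in self.ACTIONS)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.lr * (reward + self.gamma * best_next - old)
```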