
Showing papers on "Latency (engineering)" published in 2020


Journal ArticleDOI
TL;DR: The problem of joint power and resource allocation (JPRA) for ultra-reliable low-latency communication (URLLC) in vehicular networks is studied and a novel distributed approach based on federated learning (FL) is proposed to estimate the tail distribution of the queues.
Abstract: In this paper, the problem of joint power and resource allocation (JPRA) for ultra-reliable low-latency communication (URLLC) in vehicular networks is studied. Therein, the network-wide power consumption of vehicular users (VUEs) is minimized subject to high reliability in terms of probabilistic queuing delays. Using extreme value theory (EVT), a new reliability measure is defined to characterize extreme events pertaining to vehicles' queue lengths exceeding a predefined threshold. To learn these extreme events, assuming they are independently and identically distributed over VUEs, a novel distributed approach based on federated learning (FL) is proposed to estimate the tail distribution of the queue lengths. Considering the communication delays incurred by FL over wireless links, Lyapunov optimization is used to derive the JPRA policies enabling URLLC for each VUE in a distributed manner. The proposed solution is then validated via extensive simulations using a Manhattan mobility model. Simulation results show that FL enables the proposed method to estimate the tail distribution of queues with an accuracy close to that of a centralized solution, with up to 79% reductions in the amount of exchanged data. Furthermore, the proposed method yields up to 60% reductions of VUEs with large queue lengths, while halving the average power consumption, compared to an average queue-based baseline.
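
As a hedged illustration of the estimation idea (not the paper's algorithm or code): each VUE fits a Generalized Pareto Distribution, the standard EVT model for threshold exceedances, to its own queue-length samples, and a coordinator averages the locally estimated parameters weighted by sample counts, so raw queue data never leaves the vehicles. The function names, the averaging rule, and the toy data below are all assumptions.

```python
import numpy as np
from scipy.stats import genpareto

def local_gpd_fit(queue_samples, threshold):
    """Each VUE fits a GPD to its own queue-length exceedances (peaks over threshold)."""
    exceedances = queue_samples[queue_samples > threshold] - threshold
    shape, _, scale = genpareto.fit(exceedances, floc=0.0)
    return np.array([shape, scale]), len(exceedances)

def federated_average(local_results):
    """The coordinator averages local GPD parameters, weighted by exceedance counts."""
    params = np.array([p for p, _ in local_results])
    weights = np.array([n for _, n in local_results], dtype=float)
    return np.average(params, axis=0, weights=weights)

# Toy example: three VUEs with heavy-tailed queue lengths.
rng = np.random.default_rng(0)
vues = [rng.pareto(2.5, 5000) * 10 for _ in range(3)]
global_shape, global_scale = federated_average([local_gpd_fit(q, threshold=20.0) for q in vues])
print(f"global GPD estimate: shape={global_shape:.3f}, scale={global_scale:.3f}")
```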

353 citations


Journal ArticleDOI
Jun Du, Chunxiao Jiang, Jian Wang, Yong Ren, Merouane Debbah
TL;DR: Some state-of-the-art techniques based on AI/ML and their applications in 6G to support ultrabroadband, ultramassive access, and ultrareliable and low-latency services are surveyed.
Abstract: To satisfy the expected plethora of demanding services, the future generation of wireless networks (6G) has been mandated as a revolutionary paradigm to carry forward the capacities of enhanced broadband, massive access, and ultrareliable and low-latency service in 5G wireless networks to a more powerful and intelligent level. Recently, the structure of 6G networks has tended to be extremely heterogeneous, densely deployed, and dynamic. Combined with tight quality of service (QoS) requirements, such complex architecture will result in the untenability of legacy network operation routines. In response, artificial intelligence (AI), especially machine learning (ML), is emerging as a fundamental solution to realize fully intelligent network orchestration and management. By learning from uncertain and dynamic environments, AI-/ML-enabled channel estimation and spectrum management will open up opportunities for bringing the excellent performance of ultrabroadband techniques, such as terahertz communications, into full play. Additionally, challenges brought by ultramassive access with respect to energy and security can be mitigated by applying AI-/ML-based approaches. Moreover, intelligent mobility management and resource allocation will guarantee the ultrareliability and low latency of services. Concerning these issues, this article introduces and surveys some state-of-the-art techniques based on AI/ML and their applications in 6G to support ultrabroadband, ultramassive access, and ultrareliable and low-latency services.

140 citations


Journal ArticleDOI
TL;DR: In this paper, the authors considered a scenario where the central controller transmits different packets to a robot and an actuator, where the actuator is located far from the controller, and the robot can move between the controller and the actuator.
Abstract: Ultra-reliable and low-latency communication (URLLC) is one of three pillar applications defined in the fifth generation new radio (5G NR), and its research is still in its infancy due to the difficulties in guaranteeing extremely high reliability (say a 10⁻⁹ packet loss probability) and low latency (say 1 ms) simultaneously. In URLLC, short packet transmission is adopted to reduce latency, such that conventional Shannon's capacity formula is no longer applicable, and the achievable data rate in the finite blocklength regime becomes a complex expression with respect to the decoding error probability and the blocklength. To provide URLLC service in a factory automation scenario, we consider that the central controller transmits different packets to a robot and an actuator, where the actuator is located far from the controller, and the robot can move between the controller and the actuator. In this scenario, we consider four fundamental downlink transmission schemes, including orthogonal multiple access (OMA), non-orthogonal multiple access (NOMA), relay-assisted, and cooperative NOMA (C-NOMA) schemes. For all these transmission schemes, we aim to jointly optimize the blocklength and power allocation to minimize the decoding error probability of the actuator subject to the reliability requirement of the robot, the total energy constraints, as well as the latency constraints. We further develop low-complexity algorithms to address the optimization problems for each transmission scheme. For the general case with more than two devices, we also develop a low-complexity efficient algorithm for the OMA scheme. Our results show that the relay-assisted transmission significantly outperforms the OMA scheme, while the NOMA scheme performs well when the blocklength is very limited. We further show that the relay-assisted transmission has superior performance over the C-NOMA scheme due to the larger feasible region of the former scheme.
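
For context, the finite-blocklength expression alluded to here is commonly taken to be the normal approximation of the maximal achievable rate; this is the standard form from finite-blocklength information theory, not an equation reproduced from this particular paper, and the notation below is ours:

```latex
% Normal approximation to the achievable rate over m channel uses at SNR gamma,
% with decoding error probability epsilon (Q^{-1} is the inverse Gaussian Q-function):
R(\gamma, m, \epsilon) \;\approx\; \log_2(1+\gamma)
  \;-\; \sqrt{\frac{V(\gamma)}{m}}\; Q^{-1}(\epsilon)\,\log_2 e,
\qquad
V(\gamma) \;=\; 1 - \frac{1}{(1+\gamma)^{2}} .
```

Shrinking the blocklength m (for latency) or tightening the error probability ε (for reliability) both cut into the rate, which is exactly the coupling that a joint blocklength and power optimization has to manage.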

134 citations


Proceedings ArticleDOI
08 Jun 2020
TL;DR: In this paper, the authors focus on the downlink of a RIS-assisted multi-user MISO communication system and present a method based on the PARAllel FACtor (PARAFAC) decomposition to unfold the resulting cascaded channel model.
Abstract: Reconfigurable Intelligent Surfaces (RISs) have been recently considered as an energy-efficient solution for future wireless networks due to their fast and low power configuration enabling massive connectivity and low latency communications. Channel estimation in RIS-based systems is one of the most critical challenges due to the large number of reflecting unit elements and their distinctive hardware constraints. In this paper, we focus on the downlink of a RIS-assisted multi-user Multiple Input Single Output (MISO) communication system and present a method based on the PARAllel FACtor (PARAFAC) decomposition to unfold the resulting cascaded channel model. The proposed method includes an alternating least squares algorithm to iteratively estimate the channel between the base station and RIS, as well as the channels between RIS and users. Our selective simulation results show that the proposed iterative channel estimation method outperforms a benchmark scheme using genie-aided information. We also provide insights on the impact of different RIS settings on the proposed algorithm.
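
As a hedged, toy sketch of the tensor structure being exploited (assumed dimensions, a generic CP-ALS solver from tensorly rather than the authors' algorithm, and no noise, pilots, or known-phase refinements): the cascaded training observations form a three-way tensor whose CP/PARAFAC factors are the BS-RIS channel, the RIS-user channels, and the RIS phase configurations.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

M, K, N, T = 8, 4, 8, 32             # BS antennas, users, RIS elements, training phases (assumed)
rng = np.random.default_rng(1)
G = rng.standard_normal((M, N))       # toy BS-RIS channel
H_r = rng.standard_normal((K, N))     # toy RIS-user channels
Phi = rng.choice([-1.0, 1.0], size=(T, N))   # RIS reflection patterns used during training

# Noiseless cascaded observations: Y[m, k, t] = sum_n G[m, n] * H_r[k, n] * Phi[t, n],
# i.e. a rank-N CP (PARAFAC) model with factor matrices G, H_r, and Phi.
Y = np.einsum('mn,kn,tn->mkt', G, H_r, Phi)

# Alternating least squares recovers the factors up to the usual scaling/permutation ambiguity.
weights, factors = parafac(tl.tensor(Y), rank=N, n_iter_max=500, tol=1e-12)
G_hat, Hr_hat, Phi_hat = factors
print(G_hat.shape, Hr_hat.shape, Phi_hat.shape)   # (8, 8), (4, 8), (32, 8)
```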

98 citations


Proceedings Article
01 Jun 2020
TL;DR: This work adopts a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance and demonstrates that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
Abstract: Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable - on the contrary we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100ms latency targets for 99.9999% of requests. We further demonstrate that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
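
A minimal sketch of the underlying scheduling idea, assuming a single worker and externally measured per-model execution times; this is our illustration of "predictable execution enables proactive decisions", not Clockwork's actual scheduler or API:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    deadline_ms: float                       # absolute deadline (ms)
    model: str = field(compare=False)

class PredictiveScheduler:
    """Earliest-deadline-first with admission control based on predicted run times."""
    def __init__(self, predicted_exec_ms):
        self.predicted = predicted_exec_ms   # model name -> measured deterministic latency (ms)
        self.worker_free_at = 0.0            # time the single worker becomes idle (ms)
        self.queue = []

    def submit(self, request):
        heapq.heappush(self.queue, request)

    def next_action(self, now_ms):
        while self.queue:
            req = heapq.heappop(self.queue)
            start = max(now_ms, self.worker_free_at)
            finish = start + self.predicted[req.model]
            if finish <= req.deadline_ms:    # will meet its SLO: execute
                self.worker_free_at = finish
                return req, finish
            # predicted to miss its SLO: drop it now instead of serving it late
        return None

sched = PredictiveScheduler({"resnet50": 40.0, "bert": 90.0})
sched.submit(Request(deadline_ms=100.0, model="resnet50"))
sched.submit(Request(deadline_ms=120.0, model="bert"))
print(sched.next_action(now_ms=0.0))         # resnet50 finishes at 40 ms
print(sched.next_action(now_ms=0.0))         # bert would finish at 130 ms > 120 ms -> dropped
```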

83 citations


Posted Content
TL;DR: A quantification of the risk for an unreliable VR performance is conducted through a novel and rigorous characterization of the tail of the end-to-end (E2E) delay, and system reliability for scenarios with guaranteed line-of-sight (LoS) is derived as a function of THz network parameters after deriving a novel expression for the probability distribution function of the THz transmission delay.
Abstract: Wireless virtual reality (VR) imposes new visual and haptic requirements that are directly linked to the quality-of-experience (QoE) of VR users. These QoE requirements can only be met by wireless connectivity that offers high-rate and high-reliability low latency communications (HRLLC), unlike the low rates usually considered in vanilla ultra-reliable low latency communication scenarios. The high rates for VR over short distances can only be supported by an enormous bandwidth, which is available in terahertz (THz) wireless networks. Guaranteeing HRLLC requires dealing with the uncertainty that is specific to the THz channel. To explore the potential of THz for meeting HRLLC requirements, a quantification of the risk for an unreliable VR performance is conducted through a novel and rigorous characterization of the tail of the end-to-end (E2E) delay. Then, a thorough analysis of the tail-value-at-risk (TVaR) is performed to concretely characterize the behavior of extreme wireless events crucial to the real-time VR experience. System reliability for scenarios with guaranteed line-of-sight (LoS) is then derived as a function of THz network parameters after deriving a novel expression for the probability distribution function of the THz transmission delay. Numerical results show that abundant bandwidth and low molecular absorption are necessary to improve the reliability. However, their effect remains secondary compared to the availability of LoS, which significantly affects the THz HRLLC performance. In particular, for scenarios with guaranteed LoS, a reliability of 99.999% (with an E2E delay threshold of 20 ms) for a bandwidth of 15 GHz can be achieved by the THz network, compared to a reliability of 96% for twice the bandwidth, when blockages are considered.
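
For reference, the tail-value-at-risk used in this analysis is the standard risk measure from the risk-theory literature (notation ours; the paper's exact conditioning and delay model are not reproduced here): given the E2E delay D and a confidence level α,

```latex
\mathrm{VaR}_{\alpha}(D) \;=\; \inf\{\, d : \Pr(D \le d) \ge \alpha \,\}, \qquad
\mathrm{TVaR}_{\alpha}(D) \;=\; \mathbb{E}\bigl[\, D \,\big|\, D \ge \mathrm{VaR}_{\alpha}(D) \bigr].
```

That is, TVaR reports the average of the worst (1 − α) fraction of delays, which is why it captures the extreme events that matter for real-time VR rather than the mean delay.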

79 citations


Proceedings Article
01 Jan 2020
TL;DR: Opera is presented, a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits.
Abstract: Datacenters need networks that support both low-latency and high-bandwidth packet delivery to meet the stringent requirements of modern applications. We present Opera, a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits. The key to Opera's design is the rapid and deterministic reconfiguration of the network, piece-by-piece, such that at any moment in time the network implements an expander graph, yet, integrated across time, the network provides bandwidth-efficient single-hop paths between all racks. We show that Opera supports low-latency traffic with flow completion times comparable to cost-equivalent static topologies, while delivering up to 4x the bandwidth for all-to-all traffic and supporting 60% higher load for published datacenter workloads.

64 citations


Journal ArticleDOI
TL;DR: An optimization problem is formulated that maximizes the number of URLLC services supported by the system by optimizing time and frequency resources and the prediction horizon; the results show that the tradeoff between user-experienced delay and reliability can be improved significantly via prediction and communication co-design.
Abstract: Ultra-reliable and low-latency communications (URLLC) are considered one of the three new application scenarios in fifth-generation cellular networks. In this work, we aim to reduce the user-experienced delay through prediction and communication co-design, where each mobile device predicts its future states and sends them to a data center in advance. Since predictions are not error-free, we consider prediction errors and packet losses in communications when evaluating the reliability of the system. Then, we formulate an optimization problem that maximizes the number of URLLC services supported by the system by optimizing time and frequency resources and the prediction horizon. Simulation results verify the effectiveness of the proposed method, and show that the tradeoff between user-experienced delay and reliability can be improved significantly via prediction and communication co-design. Furthermore, we carried out an experiment on remote control in a virtual factory, and validated our concept of prediction and communication co-design with practical mobility data generated by a real tactile device.
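
One way to make the co-design explicit (our hedged reading of the setup in simplified notation, not an equation quoted from the paper): if the device predicts its state T_p seconds ahead and the network delivers the packet within a delay budget D_c, then roughly

```latex
D_{\mathrm{user}} \;\approx\; D_{\mathrm{c}} - T_{\mathrm{p}}, \qquad
1-\varepsilon_{\mathrm{overall}} \;\ge\; \bigl(1-\varepsilon_{\mathrm{pred}}(T_{\mathrm{p}})\bigr)
  \bigl(1-\varepsilon_{\mathrm{comm}}(D_{\mathrm{c}})\bigr),
```

so lengthening the prediction horizon hides communication delay but inflates the prediction error, and the optimization is over this tradeoff together with the time and frequency resources.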

58 citations


Journal ArticleDOI
18 Mar 2020
TL;DR: The proposed algorithm, based on stochastic optimization, strikes an optimal balance between the service delay and the energy spent at the mobile device, while guaranteeing a target out-of-service probability, in a multi-access edge computing scenario.
Abstract: The goal of this work is to propose an energy-efficient algorithm for dynamic computation offloading, in a multi-access edge computing scenario, where multiple mobile users compete for a common pool of radio and computational resources. We focus on delay-critical applications, incorporating an upper bound on the probability that the overall time required to send the data and process them exceeds a prescribed value. In a dynamic setting, the above constraint translates into preventing the sum of the communication and computation queues' lengths from exceeding a given value. Ultra-reliable low latency communications (URLLC) are also taken into account using finite blocklengths and reliability constraints. The proposed algorithm, based on stochastic optimization, strikes an optimal balance between the service delay and the energy spent at the mobile device, while guaranteeing a target out-of-service probability. Starting from a long-term average optimization problem, our algorithm reduces to the solution of a convex problem in each time slot, which is solved with a very fast iterative strategy. Finally, we extend the approach to mobile devices with energy harvesting capabilities, typical of Internet of Things scenarios, thus devising an energy-efficient dynamic offloading strategy that stabilizes the battery level of each device around a prescribed operating level.
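
The per-slot convex problem mentioned above is characteristic of Lyapunov drift-plus-penalty methods. As a hedged sketch of that structure (the generic form in our notation, not the paper's exact formulation), in each slot t one minimizes a weighted sum of the instantaneous energy and the queue-weighted backlog change:

```latex
% Generic drift-plus-penalty step solved in each slot t:
\min_{\mathbf{x}(t)\in\mathcal{X}} \;\;
V\,E\bigl(\mathbf{x}(t)\bigr) \;+\; \sum_{i} Q_i(t)\,\bigl[a_i\bigl(\mathbf{x}(t)\bigr) - b_i\bigl(\mathbf{x}(t)\bigr)\bigr],
```

where x(t) collects the radio and offloading decisions, E is the energy spent in the slot, the Q_i are the (physical and virtual) queue backlogs, a_i and b_i are their arrivals and departures, and the parameter V trades average energy against queue stability, and hence against delay.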

57 citations


Journal ArticleDOI
TL;DR: To reduce the backhaul traffic load and the transmission latency from the remote cloud, this study uses Q-learning to design the cache mechanism and proposes an action selection strategy that finds the appropriate cache state through reinforcement learning.
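
Since only the TL;DR is shown here, the following is a deliberately small, hedged sketch of what a Q-learning cache controller can look like, with a toy state/action/reward design of our own rather than the paper's formulation: the state is the most recently requested item, the action is which single item to keep cached at the edge for the next slot, and the reward is +1 when the next request hits the edge cache and -1 when it must be fetched from the remote cloud.

```python
import random
from collections import defaultdict

class QLearningCache:
    """Toy Q-learning edge cache: state = last requested item, action = item to cache next."""
    def __init__(self, catalog_size, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n = catalog_size
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)                    # Q[(state, action)]

    def choose(self, state):
        if random.random() < self.epsilon:             # explore
            return random.randrange(self.n)
        return max(range(self.n), key=lambda a: self.Q[(state, a)])   # exploit

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in range(self.n))
        td_error = reward + self.gamma * best_next - self.Q[(state, action)]
        self.Q[(state, action)] += self.alpha * td_error

# Toy request trace with temporal locality: item i is usually followed by (i + 1) % n.
random.seed(0)
n, agent, state, hits = 5, QLearningCache(catalog_size=5), 0, 0
for step in range(20000):
    cached = agent.choose(state)                       # decide what to pre-cache at the edge
    nxt = (state + 1) % n if random.random() < 0.8 else random.randrange(n)
    reward = 1.0 if cached == nxt else -1.0            # edge hit (low latency) vs. cloud fetch
    agent.update(state, cached, reward, nxt)
    hits += cached == nxt
    state = nxt
print(f"edge hit rate after learning: {hits / 20000:.2f}")
```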

54 citations


Journal ArticleDOI
TL;DR: This work considers a fundamental problem of NFV-enabled multicasting in a mobile edge cloud, where each multicast request has both service function chain and end-to-end delay requirements, and devises an approximation algorithm with a provable approximation ratio as well as an efficient heuristic.
Abstract: Stringent delay requirements of many mobile applications have led to the development of mobile edge clouds, to offer low latency network services at the network edges. Most conventional network services are implemented via hardware-based network functions, including firewalls and load balancers, to guarantee service security and performance. However, implementing hardware-based network functions usually incurs both a high capital expenditure (CAPEX) and operating expenditure (OPEX). Network Function Virtualization (NFV) exhibits a potential to reduce CAPEX and OPEX significantly, by deploying software-based network functions in virtual machines (VMs) on edge-clouds. We consider a fundamental problem of NFV-enabled multicasting in a mobile edge cloud, where each multicast request has both service function chain and end-to-end delay requirements. Specifically, each multicast request requires chaining of a sequence of network functions (referred to as a service function chain) from a source to a set of destinations within specified end-to-end delay requirements. We devise an approximation algorithm with a provable approximation ratio for a single multicast request admission if its delay requirement is negligible; otherwise, we propose an efficient heuristic. Furthermore, we also consider admissions of a given set of the delay-aware NFV-enabled multicast requests, for which we devise an efficient heuristic such that the system throughput is maximized, while the implementation cost of admitted requests is minimized. We finally evaluate the performance of the proposed algorithms in a real test-bed, and experimental results show that our algorithms outperform other similar approaches reported in literature.

Journal ArticleDOI
TL;DR: A distributed learning-based AMC (DistAMC) method is proposed, which relies on the cooperation of multiple edge devices and a model averaging (MA) algorithm; it offers higher training efficiency and lower computing overhead, which are well matched to the characteristics of edge devices.
Abstract: Automatic modulation classification (AMC) is a typical technology for identifying different modulation types, which has been widely applied in various scenarios. Recently, deep learning (DL), one of the most advanced classification algorithms, has been applied to AMC. However, these previously proposed AMC methods are centralized in nature, i.e., all training data must be collected together to train the same neural network. In addition, they generally rely on powerful computing devices and may not be suitable for edge devices. Thus, a distributed learning-based AMC (DistAMC) method is proposed, which relies on the cooperation of multiple edge devices and a model averaging (MA) algorithm. Compared with centralized AMC (CentAMC), the DistAMC has two advantages: higher training efficiency and lower computing overhead, which are well matched to the characteristics of edge devices. Simulation results show that there is only a slight performance gap between the DistAMC and the CentAMC, and they have similar convergence speed, but the training time consumed per epoch by the former will be shorter than that of the latter if low latency and high bandwidth are available for the model transmission process of the DistAMC. Moreover, the DistAMC can combine the computing power of multiple edge devices to reduce the computing overhead of a single edge device relative to the CentAMC.
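
A hedged sketch of the model averaging (MA) step, illustrative only: a logistic-regression stand-in for the AMC network, synthetic data, and a simple sample-count-weighted average; none of this is the paper's code or exact training setup.

```python
import numpy as np

def local_sgd(weights, X, y, lr=0.05, epochs=1):
    """One round of local training on an edge device (logistic-regression stand-in)."""
    w = weights.copy()
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (probs - y) / len(y)
        w -= lr * grad
    return w

def model_average(local_weights, sample_counts):
    """Coordinator averages local models, weighted by how much data each device has."""
    return np.average(np.stack(local_weights), axis=0, weights=np.asarray(sample_counts, float))

rng = np.random.default_rng(0)
w_global = np.zeros(16)
devices = [(rng.standard_normal((256, 16)), rng.integers(0, 2, 256)) for _ in range(4)]
for _ in range(50):                                    # communication rounds
    locals_ = [local_sgd(w_global, X, y) for X, y in devices]
    w_global = model_average(locals_, [len(y) for _, y in devices])
```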

Journal ArticleDOI
TL;DR: This work collects more than 200 packet traces under different application settings and network conditions, ranging from a broadband network to poor mobile conditions, for 3 cloud gaming services, namely Stadia from Google, GeForce Now from NVIDIA and PS Now from Sony, and analyses the employed protocols and the workload that they impose on the network.
Abstract: Cloud gaming is a new class of services that promises to revolutionize the videogame market. It allows the user to play a videogame with basic equipment while using a remote server for the actual execution. The multimedia content is streamed through the network from the server to the user. This service requires low latency and a large bandwidth to work properly with low response time and high-definition video. Three of the leading tech companies (Google, Sony and NVIDIA) entered this market with their own products, and others, like Microsoft and Amazon, are planning to launch their own platforms in the near future. However, these companies have so far released little information about their cloud gaming operation and how they utilize the network. In this work, we study these new cloud gaming services from the network point of view. We collect more than 200 packet traces under different application settings and network conditions for 3 cloud gaming services, namely Stadia from Google, GeForce Now from NVIDIA and PS Now from Sony. We analyze the employed protocols and the workload they impose on the network. We find that GeForce Now and Stadia use the RTP protocol to stream the multimedia content, with the latter relying on the standard WebRTC APIs. They turn out to be bandwidth-hungry, consuming up to 45 Mbit/s, depending on the network and video quality. PS Now instead uses only undocumented protocols and never exceeds 13 Mbit/s.

Journal ArticleDOI
TL;DR: A novel adaptive power storage replica management system, named PARMS, is proposed based on stochastic configuration networks (SCNs), in which network traffic and data center (DC) geodistribution are taken into consideration to improve real-time data processing.
Abstract: In the power industry, processing business big data from geographically distributed locations, such as online line-loss analysis, has emerged as an important application. How to achieve highly efficient big data storage that meets the requirements of low-latency processing applications is quite challenging. In this paper, we propose a novel adaptive power storage replica management system, named PARMS, based on stochastic configuration networks (SCNs), in which the network traffic and the data center (DC) geodistribution are taken into consideration to improve real-time data processing. First, as a fast learning model with a light computation burden and sound prediction performance, the SCN model is employed to estimate the traffic state of power data networks. Then, a series of data replica management algorithms is proposed to lessen the effects of limited bandwidths and a fixed underlying infrastructure. Finally, the proposed PARMS is implemented using data-parallel computing frameworks (DCFs) for the power industry. Experiments are carried out in an electric power corporation with 230 million users, China Southern Power Grid, and the results show that our proposed solution can handle power big data storage efficiently, with job completion times across geodistributed DCs reduced by 12.19% on average.

Journal ArticleDOI
01 Dec 2020
TL;DR: A Deep Reinforcement Learning (DRL) based joint computation offloading and resource allocation scheme that achieves a suboptimal solution in F-RANs is proposed and results show that the proposed approach significantly minimizes latency and increases throughput in the system.
Abstract: Fog Radio Access Networks (F-RANs) have been considered a groundbreaking technique to support the services of Internet of Things by leveraging edge caching and edge computing. However, the current contributions in computation offloading and resource allocation are inefficient; moreover, they merely consider the static communication mode, and the increasing demand for low latency services and high throughput poses tremendous challenges in F-RANs. A joint problem of mode selection, resource allocation, and power allocation is formulated to minimize latency under various constraints. We propose a Deep Reinforcement Learning (DRL) based joint computation offloading and resource allocation scheme that achieves a suboptimal solution in F-RANs. The core idea of the proposal is that the DRL controller intelligently decides whether to process the generated computation task locally at the device level or offload the task to a fog access point or cloud server and allocates an optimal amount of computation and power resources on the basis of the serving tier. Simulation results show that the proposed approach significantly minimizes latency and increases throughput in the system.

Journal ArticleDOI
TL;DR: The experimental outcomes show the better performance of the developed protocol in terms of high packet delivery ratio (PDR) and network throughput (NT) with low latency and energy consumption (EC) compared to existing routing protocols in UWSNs.
Abstract: Energy-efficient and reliable data gathering using highly stable links in underwater wireless sensor networks (UWSNs) is challenging because of the time- and location-dependent communication characteristics of the acoustic channel. In this paper, we propose a novel dynamic firefly mating optimization inspired routing scheme, called FFRP, for Internet of UWSNs-based event monitoring applications. During event data gathering, the proposed FFRP scheme employs a self-learning dynamic firefly mating optimization intelligence to find highly stable and reliable routing paths to route packets around connectivity voids and shadow zones in UWSNs. While conveying information, the proposed scheme mitigates the high energy consumption and latency issues by balancing the data traffic load evenly across a large-scale network. In addition, data transmission over highly stable links between acoustic nodes increases the overall packet delivery ratio and network throughput in UWSNs. Several simulation experiments are carried out to verify the effectiveness of the proposed scheme against existing schemes using NS2 and AquaSim 2.0 in UWSNs. The experimental outcomes show the better performance of the developed protocol in terms of high packet delivery ratio (PDR) and network throughput (NT) with low latency and energy consumption (EC) compared to existing routing protocols in UWSNs.

Journal ArticleDOI
TL;DR: The main limitations of designs based on CSI at the transmitter side (CSIT) toward the 6G era are overviewed, ways to design and use efficient CSIT-limited schemes that meet the new and stringent requirements are assessed, and some key research directions are highlighted.
Abstract: Channel state information (CSI) has been a key component in traditional wireless communication systems. This might no longer hold in future networks supporting services with stringent quality of service constraints such as extremely low latency, low energy, and/or large number of simultaneously connected devices, where acquiring CSI would become extremely costly or even impractical. We overview the main limitations of CSI at the transmitter side (CSIT)-based designs toward the 6G era, assess how to design and use efficient CSIT-limited schemes that allow meeting the new and stringent requirements, and highlight some key research directions. We delve into the ideas of efficiently allocating pilot sequences, relying on statistical CSIT and/or using location-based strategies, and demonstrate viability via a selected use case.

Journal ArticleDOI
TL;DR: ElasticFog, which runs on top of the Kubernetes platform and enables real-time elastic resource provisioning for containerized applications in Fog computing, collects network traffic information in real time and allocates computational resources proportionally to the distribution of network traffic.
Abstract: The recent increase in the number of Internet of Things (IoT) devices has led to the generation of a large amount of data. These data are generally processed by cloud servers because of their high scalability and ability to provide resources on demand. However, processing large amounts of data in the cloud is an impractical solution for the strict requirements of IoT services, such as low latency and high bandwidth. Fog computing, which brings computational resources closer to the IoT devices, has emerged as a suitable solution to mitigate these problems. Resource provisioning and application orchestration are two of the key challenges when running IoT applications in a Fog computing environment. In this article, we present ElasticFog, which runs on top of the Kubernetes platform and enables real-time elastic resource provisioning for containerized applications in Fog computing. Specifically, ElasticFog collects network traffic information in real time and allocates computational resources proportionally to the distribution of network traffic. The experimental results prove that ElasticFog achieves a significant improvement in terms of throughput and network latency compared with the default mechanism in Kubernetes.
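
The allocation rule described above can be illustrated with a short, hedged sketch (our own simplification, not ElasticFog's controller code or the Kubernetes API it drives): distribute a fixed number of application replicas across Fog nodes in proportion to the network traffic each node observes, using largest-remainder rounding so the counts stay integral.

```python
def proportional_replicas(traffic_by_node, total_replicas):
    """Assign replicas to nodes proportionally to observed traffic (illustrative names)."""
    total = sum(traffic_by_node.values()) or 1.0
    shares = {n: total_replicas * t / total for n, t in traffic_by_node.items()}
    alloc = {n: int(s) for n, s in shares.items()}
    # Largest-remainder rounding so the replica counts sum exactly to total_replicas.
    leftovers = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for n in leftovers[: total_replicas - sum(alloc.values())]:
        alloc[n] += 1
    return alloc

print(proportional_replicas({"edge-a": 120.0, "edge-b": 40.0, "edge-c": 40.0}, 10))
# -> {'edge-a': 6, 'edge-b': 2, 'edge-c': 2}
```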

Journal ArticleDOI
TL;DR: In this article, the authors investigated the content service provision of information-centric vehicular networks (ICVNs) from the aspect of mobile edge caching, considering the dynamic driving-related context information.
Abstract: In this paper, the content service provision of information-centric vehicular networks (ICVNs) is investigated from the aspect of mobile edge caching, considering the dynamic driving-related context information. To provide up-to-date information with low latency, two schemes are designed for cache update and content delivery at the roadside units (RSUs). The roadside unit centric (RSUC) scheme decouples cache update and content delivery through bandwidth splitting, where the cached content items are updated regularly in a round-robin manner. The request adaptive (ReA) scheme updates the cached content items upon user requests with certain probabilities. The performance of both proposed schemes are analyzed, whereby the average age of information (AoI) and service latency are derived in closed forms. Surprisingly, the AoI-latency trade-off does not always exist, and frequent cache update can degrade both performances. Thus, the RSUC and ReA schemes are further optimized to balance the AoI and latency. Extensive simulations are conducted on SUMO and OMNeT++ simulators, and the results show that the proposed schemes can reduce service latency by up to 80% while guaranteeing content freshness in heavily loaded ICVNs.

Proceedings ArticleDOI
16 Nov 2020
TL;DR: ApproxDet, an adaptive video object detection framework for mobile devices, is introduced to meet accuracy-latency requirements in the face of changing content and resource contention scenarios; it is able to adapt to a wide variety of contention and content characteristics and outshines all baselines.
Abstract: Advanced video analytic systems, including scene classification and object detection, have seen widespread success in various domains such as smart cities and autonomous systems. With an evolution of heterogeneous client devices, there is incentive to move these heavy video analytics workloads from the cloud to mobile devices for low latency and real-time processing and to preserve user privacy. However, most video analytic systems are heavyweight and are trained offline with some pre-defined latency or accuracy requirements. This makes them unable to adapt at runtime in the face of three types of dynamism --- the input video characteristics change, the amount of compute resources available on the node changes due to co-located applications, and the user's latency-accuracy requirements change. In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios. To achieve this, we introduce a multi-branch object detection kernel, which incorporates a data-driven modeling approach on the performance metrics, and a latency SLA-driven scheduler to pick the best execution branch at runtime. We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3. We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines, e.g., it achieves 52% lower latency and 11.1% higher accuracy over YOLOv3. Our software is open-sourced at https://github.com/purdue-dcsl/ApproxDet.
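
As a hedged sketch of the scheduler's decision rule (the Branch fields, the multiplicative contention model, and the numbers are illustrative assumptions, not ApproxDet's profiled performance models): among the detection branches whose predicted latency under the current contention level fits the latency SLA, choose the most accurate one, and degrade gracefully when none fits.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    name: str
    accuracy: float            # offline-profiled accuracy of this branch (e.g., mAP)
    base_latency_ms: float     # profiled latency with no co-located contention

def pick_branch(branches, latency_sla_ms, contention_factor):
    """contention_factor >= 1.0 scales latency under resource contention (assumed model)."""
    feasible = [b for b in branches if b.base_latency_ms * contention_factor <= latency_sla_ms]
    if not feasible:
        return min(branches, key=lambda b: b.base_latency_ms)   # degrade gracefully
    return max(feasible, key=lambda b: b.accuracy)

branches = [Branch("r50_hi_res", 0.42, 210.0), Branch("r50_lo_res", 0.37, 120.0),
            Branch("mobilenet", 0.30, 45.0)]
print(pick_branch(branches, latency_sla_ms=150.0, contention_factor=1.5).name)  # mobilenet
```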

Proceedings ArticleDOI
25 Oct 2020
TL;DR: Experimental results show that the acoustic model can produce feature sequences with minimal latency, about 31 times faster than real time on a computer CPU and 6.5 times on a mobile CPU, enabling it to meet the conditions required for real-time applications on both devices.
Abstract: This paper presents an end-to-end text-to-speech system with low latency on a CPU, suitable for real-time applications. The system is composed of an autoregressive attention-based sequence-to-sequence acoustic model and the LPCNet vocoder for waveform generation. An acoustic model architecture that adopts modules from both the Tacotron 1 and 2 models is proposed, while stability is ensured by using a recently proposed purely location-based attention mechanism, suitable for arbitrary sentence length generation. During inference, the decoder is unrolled and acoustic feature generation is performed in a streaming manner, allowing for a nearly constant latency which is independent from the sentence length. Experimental results show that the acoustic model can produce feature sequences with minimal latency about 31 times faster than real-time on a computer CPU and 6.5 times on a mobile CPU, enabling it to meet the conditions required for real-time applications on both devices. The full end-to-end system can generate almost natural quality speech, which is verified by listening tests.

Proceedings ArticleDOI
TL;DR: An efficient training algorithm is described that guarantees the correctness of the RQ-RMI-based classification, and a novel approach, NuevoMatch, is presented, which improves the memory scaling of existing methods.
Abstract: Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die caches; however, they do not scale well with the number of rules. We present a novel approach, NuevoMatch, which improves the memory scaling of existing methods. A new data structure, Range Query Recursive Model Index (RQ-RMI), is the key component that enables NuevoMatch to replace most of the accesses to main memory with model inference computations. We describe an efficient training algorithm that guarantees the correctness of the RQ-RMI-based classification. The use of RQ-RMI allows the rules to be compressed into model weights that fit into the hardware cache. Further, it takes advantage of the growing support for fast neural network processing in modern CPUs, such as wide vector instructions, achieving a rate of tens of nanoseconds per lookup. Our evaluation using 500K multi-field rules from the standard ClassBench benchmark shows a geometric mean compression factor of 4.9x, 8x, and 82x, and average performance improvement of 2.4x, 2.6x, and 1.6x in throughput compared to CutSplit, NeuroCuts, and TupleMerge, all state-of-the-art algorithms.
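
A much-simplified, hedged sketch of the learned-index idea that RQ-RMI builds on (a single linear model instead of the paper's recursive multi-stage RQ-RMI, and none of its training-correctness machinery): learn a mapping from a key to the position of its matching range, record the worst-case prediction error over the rule set, and at lookup time search only within that error bound.

```python
import bisect
import numpy as np

class LearnedRangeIndex:
    """Toy single-stage learned index over non-overlapping ranges given by their start keys."""
    def __init__(self, range_starts):
        self.starts = np.sort(np.asarray(range_starts, dtype=float))
        positions = np.arange(len(self.starts), dtype=float)
        # "Model": one linear segment fitted from key -> array position.
        self.slope, self.intercept = np.polyfit(self.starts, positions, 1)
        predictions = self.slope * self.starts + self.intercept
        # Worst-case prediction error on the rule set bounds the local search window.
        self.err = int(np.ceil(np.max(np.abs(predictions - positions))))

    def lookup(self, key):
        guess = int(self.slope * key + self.intercept)
        lo = max(0, guess - self.err - 2)
        hi = min(len(self.starts), guess + self.err + 2)
        # Exact rule: the last range whose start is <= key, searched only inside [lo, hi).
        window = self.starts[lo:hi].tolist()
        return max(lo + bisect.bisect_right(window, key) - 1, 0)

idx = LearnedRangeIndex([0, 64, 256, 1024, 4096, 65536])
print(idx.lookup(300))   # -> 2, the range starting at 256
```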

Proceedings ArticleDOI
30 Jul 2020
TL;DR: Neutrino, a cellular control plane that provides users an abstraction of reliable access to cellular services while ensuring lower latency, is designed, and it is shown how these improvements translate into improving end-user application performance.
Abstract: 5G networks aim to provide ultra-low latency and higher reliability to support emerging and near real-time applications such as augmented and virtual reality, remote surgery, self-driving cars, and multi-player online gaming. This imposes new requirements on the design of cellular core networks. A key component of the cellular core is the control plane. Time to complete control plane operations (e.g. mobility handoff, service establishment) directly impacts the delay experienced by end-user applications. In this paper, we design Neutrino, a cellular control plane that provides users an abstraction of reliable access to cellular services while ensuring lower latency. Our testbed evaluations based on real cellular control traffic traces show that Neutrino provides an improvement in control procedure completion times by up to 3.1x without failures, and up to 5.6x under control plane failures, over existing cellular core proposals. We also show how these improvements translate into improving end-user application performance: for AR/VR applications and self-driving cars, Neutrino performs up to 2.5x and up to 2.8x better, respectively, as compared to existing EPC.

Journal ArticleDOI
01 Aug 2020
TL;DR: This demonstration presents Apache IoTDB managing time-series data to enable new classes of IoT applications and shows how IoTDB handles time-series data in real time and supports advanced analytics by integrating with Hadoop and Spark.
Abstract: The amount of time-series data that is generated has exploded due to the growing popularity of Internet of Things (IoT) devices and applications. These applications require efficient management of the time-series data on both the edge and the cloud side that supports high-throughput ingestion, low-latency queries and advanced time-series analysis. In this demonstration, we present Apache IoTDB, which manages time-series data to enable new classes of IoT applications. IoTDB has both edge and cloud versions, provides an optimized columnar file format for efficient time-series data storage, and offers a time-series database with a high ingestion rate, low-latency queries and data analysis support. It is specially optimized for time-series-oriented operations like aggregation queries, down-sampling and sub-sequence similarity search. An edge-to-cloud time-series data management application is chosen to demonstrate how IoTDB handles time-series data in real time and supports advanced analytics by integrating with Hadoop and Spark. An end-to-end IoT data management solution is shown by integrating IoTDB with PLC4x, Calcite, and Grafana.

Book ChapterDOI
01 Jan 2020
TL;DR: This chapter introduces fog and edge computing as a model in which computing power moves toward the sources where the data are generated, and describes eight of its unique characteristics, including contextual location awareness and low latency.
Abstract: Thanks to innovations like the Internet of Things and autonomous driving, millions of new devices, sensors, and applications will be going online in the near future. They will generate huge amounts of data, which connected technologies will have to be able to handle. Measuring, monitoring, analyzing, processing, and reacting are just a few examples of tasks involving the vast quantities of data that these devices, sensors, and applications will generate. Existing models like cloud computing are reaching their limits and will struggle to cope with this deluge of data. This chapter introduces fog and edge computing as a model in which computing power moves toward the sources where the data are generated. Following a brief definition and overview of fog and edge computing, eight of their unique characteristics are described, including contextual location awareness and low latency. Differences between this model and the better-known cloud computing model, as well as other related models, are also explained, and the challenges and opportunities of fog and edge computing are discussed. In addition to the definition and characteristics of fog and edge computing, examples of practical implementation are presented.

Proceedings ArticleDOI
27 May 2020
TL;DR: A novel algorithm for bitrate adaptation in HTTP Adaptive Streaming (HAS), based on Online Convex Optimization (OCO), is shown to provide a robust adaptation strategy which, unlike most of the state-of-the-art techniques, does not require parameter tuning, channel model assumptions, throughput estimation or application-specific adjustments.
Abstract: Achieving low latency is paramount for live streaming scenarios, which are nowadays becoming increasingly popular. In this paper, we propose a novel algorithm for bitrate adaptation in HTTP Adaptive Streaming (HAS), based on Online Convex Optimization (OCO). The proposed algorithm, named Learn2Adapt-LowLatency (L2A-LL), is shown to provide a robust adaptation strategy which, unlike most of the state-of-the-art techniques, does not require parameter tuning, channel model assumptions, throughput estimation or application-specific adjustments. These properties make it very suitable for users who typically experience fast variations in channel characteristics. The proposed algorithm has been implemented in DASH-IF's reference video player (dash.js) and has been made publicly available for research purposes at [22]. Real experiments show that L2A-LL reduces latency significantly, while providing a high average streaming bitrate, without impairing the overall Quality of Experience (QoE); a result that is independent of the channel and application scenarios. The presented optimization framework is robust by design; its ability to learn allows for modular QoE prioritization, and it facilitates easy adjustments to consider applications beyond live streaming and/or multiple user classes.
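
As a hedged illustration of online-learning bitrate adaptation in general (a generic exponentiated-gradient/Hedge update over the bitrate ladder with a made-up loss; L2A-LL's constrained OCO formulation and primal-dual scheme are not reproduced here, and all numbers are assumptions):

```python
import numpy as np

bitrates = np.array([1.0, 2.5, 5.0, 8.0])      # Mbps ladder (assumed)
segment_s, eta = 1.0, 0.5                      # segment duration (s), learning rate
w = np.ones(len(bitrates)) / len(bitrates)     # decision weights over bitrate choices

rng = np.random.default_rng(0)
for t in range(200):
    choice = rng.choice(len(bitrates), p=w)    # randomized decision, OCO-style
    throughput = max(0.5, 4.0 + rng.normal(0, 1.5))   # realized channel (Mbps), unknown in advance
    # Counterfactual per-action loss: rebuffering penalty minus a concave quality reward.
    download_time = bitrates * segment_s / throughput
    loss = np.maximum(download_time - segment_s, 0.0) * 10.0 - 0.5 * np.log1p(bitrates)
    w *= np.exp(-eta * (loss - loss.min()))    # exponentiated-gradient / Hedge update
    w /= w.sum()
print("learned preference over bitrates:", np.round(w, 3))
```

The update needs no throughput estimator or channel model: it only reacts to the losses actually observed, which is the property the abstract highlights.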

Journal ArticleDOI
TL;DR: This article proposes a novel QoE-driven Tactile Internet architecture for smart city, containing five layers: sensing layer, transmission layer, storage layer, computing layer, and application layer, where the techniques supported in each layer follow the requirements of low latency, high reliability, and high quality of user experience.
Abstract: The Tactile Internet, described as a communication infrastructure combining low latency and high reliability, is considered as a leap compared with mobile Internet and Internet of things, and will thereby provide a revolution for almost every segment of society. When the Tactile Internet is introduced into smart city, it not only innovates existing communication and interaction, but also promotes the intelligent process of smart city and end-users' quality of experience (QoE) by taking advantage of haptic information and haptic-related applications. To fully integrate tactile technology with smart city, this article proposes a novel QoE-driven Tactile Internet architecture for smart city, containing five layers: sensing layer, transmission layer, storage layer, computing layer, and application layer. Importantly, the techniques supported in each layer of this architecture follow the requirements of low latency, high reliability, and high quality of user experience. Under this architecture, we devise a fast and reliable QoE management framework based on the broad learning system. Simulation results show that it can achieve high QoE performance under the premise of low computational complexity and high availability.

Journal ArticleDOI
TL;DR: This paper proposes RoPE, an architecture that adapts the routing strategy of the underlying edge network based on future available bandwidth and proposes a bandwidth prediction method that adjusts dynamically based on the required time-to-solution and on the available data.
Abstract: The demand of low latency applications has fostered interest in edge computing, a recent paradigm in which data is processed locally, at the edge of the network. The challenge of delivering services with low-latency and high bandwidth requirements has seen the flourishing of Software-Defined Networking (SDN) solutions that utilize ad-hoc data-driven statistical learning solutions to dynamically steer edge computing resources. In this paper, we propose RoPE, an architecture that adapts the routing strategy of the underlying edge network based on future available bandwidth. The bandwidth prediction method is a policy that we adjust dynamically based on the required time-to-solution and on the available data. An SDN controller keeps track of past link loads and takes a new route if the current path is predicted to be congested. We tested RoPE on different use case applications comparing different well-known prediction policies. Our evaluation results demonstrate that our adaptive solution outperforms other ad-hoc routing solutions and edge-based applications, in turn, benefit from adaptive routing, as long as the prediction is accurate and easy to obtain.
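
As a hedged sketch of the kind of policy such a controller can plug in (an exponentially weighted moving average predictor and a simple threshold rule of our own; the paper evaluates several prediction policies and its SDN controller logic is not reproduced here):

```python
class LinkPredictor:
    """EWMA estimate of per-link load; the path estimate is its bottleneck link."""
    def __init__(self, alpha=0.3):
        self.alpha, self.estimates = alpha, {}

    def observe(self, link, load):
        prev = self.estimates.get(link, load)
        self.estimates[link] = self.alpha * load + (1 - self.alpha) * prev

    def predicted(self, path):
        return max(self.estimates.get(l, 0.0) for l in path)

def choose_path(predictor, current_path, alternate_paths, threshold):
    """Keep the current path unless it is predicted to be congested; else pick the calmest alternative."""
    if predictor.predicted(current_path) <= threshold:
        return current_path
    candidates = [p for p in alternate_paths if predictor.predicted(p) <= threshold]
    return min(candidates or alternate_paths, key=predictor.predicted)

pred = LinkPredictor()
for load in (10, 30, 80, 95):                  # load samples (Mbps) on the current path's link
    pred.observe("s1-s2", load)
pred.observe("s1-s3", 20); pred.observe("s3-s2", 25)
print(choose_path(pred, ["s1-s2"], [["s1-s3", "s3-s2"]], threshold=50.0))  # reroutes via s3
```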

Journal ArticleDOI
TL;DR: This analysis aims to introduce the QKD blocks as a pillar of the quantum-safe security framework of the 5G/B5G-oriented fronthaul infrastructure and finds that for the dark fiber case, secret keys can be distilled at fiber lengths much longer than the maximum fiber fronthaul distance corresponding to the round-trip latency barrier.
Abstract: A research contribution focusing on the Quantum Key Distribution (QKD)-enabled solutions assisting in the security framework of an optical 5G fronthaul segment is presented. We thoroughly investigate the integration of a BB84-QKD link, operating at telecom band, delivering quantum keys for the Advanced Encryption Standard (AES)-256 encryption engines of a packetized fronthaul layer interconnecting multiple 5G terminal nodes. Secure Key Rate calculations are studied for both dedicated and shared fiber configurations to identify the attack surface of AES-encrypted data links in each deployment scenario. We also propose a converged fiber-wireless scenario, exploiting a mesh networking extension operated by mmWave wireless links. In addition to the quantum layer performance, emphasis is placed on the strict requirements of 5G-oriented optical edge segments, such as the latency and the availability of quantum keys. We find that for the dark fiber case, secret keys can be distilled at fiber lengths much longer than the maximum fiber fronthaul distance corresponding to the round-trip latency barrier, for both P2P and P2MP topologies. On the contrary, the inelastic Raman scattering makes the simultaneous transmission of quantum and classical signals much more challenging. To counteract the contamination of noise photons, a resilient classical/QKD coexistence scheme is adopted. Motivated by the recent advancements in quantum technology roadmap, our analysis aims to introduce the QKD blocks as a pillar of the quantum-safe security framework of the 5G/B5G-oriented fronthaul infrastructure.

Proceedings ArticleDOI
16 Jun 2020
TL;DR: In this article, the authors propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor, which enables non-uniform accesses within each memory type (i.e., intra-memory asymmetry), leading to performance and reliability trade-offs in the DRAM-NVM hybrid memory.
Abstract: Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that its NVM access latency is much higher than its DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We show that our hybrid tiered-memory architecture has a tremendous potential to improve performance and reliability, if exploited by an efficient page management policy at the operating system (OS). Modern OSes are already aware of inter-memory asymmetry. They migrate pages between the two memory types during program execution, starting from an initial allocation of the page to a randomly-selected free physical address in the memory. We extend existing OS awareness in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach to enabling and exploiting asymmetries in DRAM-NVM hybrid tiered memory, which we call MNEME, improves both performance and reliability for single-core and multi-programmed workloads.
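
A hedged sketch of intensity-aware page placement across four tiers (the tier names, thresholds, and the EWMA intensity predictor are illustrative assumptions of ours, not MNEME's actual policy or the OS interfaces it uses): new pages are placed in the tier matching their predicted access intensity, and pages are migrated at the end of each epoch when their measured intensity no longer matches their tier.

```python
from collections import defaultdict

TIERS = ["DRAM-fast", "DRAM-slow", "NVM-fast", "NVM-slow"]   # assumed tier ordering

class TieredPlacer:
    def __init__(self, alpha=0.5, thresholds=(32, 8, 2)):
        self.alpha = alpha
        self.thresholds = thresholds        # accesses/epoch cut-offs for the top three tiers
        self.intensity = defaultdict(float) # EWMA of per-page access counts
        self.placement = {}                 # page -> tier

    def _tier_for(self, intensity):
        for tier, cut in zip(TIERS, self.thresholds):
            if intensity >= cut:
                return tier
        return TIERS[-1]

    def allocate(self, page, predicted_intensity=0.0):
        """Initial allocation goes straight to the tier matching the predicted intensity."""
        self.intensity[page] = predicted_intensity
        self.placement[page] = self._tier_for(predicted_intensity)
        return self.placement[page]

    def end_of_epoch(self, access_counts):
        """Update intensities from this epoch's counters and return the migrations needed."""
        migrations = []
        for page, tier in self.placement.items():
            count = access_counts.get(page, 0)
            self.intensity[page] = self.alpha * count + (1 - self.alpha) * self.intensity[page]
            target = self._tier_for(self.intensity[page])
            if target != tier:
                migrations.append((page, tier, target))
                self.placement[page] = target
        return migrations

placer = TieredPlacer()
placer.allocate("pg1", predicted_intensity=40.0)    # predicted hot -> DRAM-fast
placer.allocate("pg2")                              # unknown -> NVM-slow
print(placer.end_of_epoch({"pg1": 1, "pg2": 64}))   # pg1 cools down, pg2 heats up
```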