
Showing papers on "Overhead (computing) published in 2021"


Proceedings ArticleDOI
20 Jun 2021
TL;DR: BoTNet as mentioned in this paper incorporates self-attention into a ResNet backbone for image classification, object detection, and instance segmentation, and achieves strong performance on the COCO and ImageNet benchmarks.
Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework; surpassing the previous best published single model and single scale results of ResNeSt [67] evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in "compute" time than the popular EfficientNet models on TPU-v3 hardware. We hope our simple and effective approach will serve as a strong baseline for future research in self-attention models for vision.
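
The core change described above, replacing a spatial convolution with global self-attention over all H x W positions of a feature map, can be illustrated with a minimal numpy sketch. This is a single-head toy version for intuition only; the actual BoTNet block is multi-head and adds relative position encodings, both omitted here, and all shapes and weights below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(feat, Wq, Wk, Wv):
    """Single-head global self-attention over all H*W positions of a feature map.

    feat: (H, W, C) feature map; Wq/Wk/Wv: (C, C) projection matrices.
    Every position attends to every other position, unlike a local convolution.
    """
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)                # flatten spatial positions into tokens
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # linear projections
    attn = softmax(q @ k.T / np.sqrt(C))      # (H*W, H*W) attention weights
    return (attn @ v).reshape(H, W, C)

# toy usage on an 8x8 feature map with 16 channels
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
out = global_self_attention(feat, Wq, Wk, Wv)
print(out.shape)  # (8, 8, 16)
```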

675 citations


Journal ArticleDOI
TL;DR: In this article, a novel LIS architecture based on sparse channel sensors is proposed, where all the LIS elements are passive except for a few elements that are connected to the baseband.
Abstract: Employing large intelligent surfaces (LISs) is a promising solution for improving the coverage and rate of future wireless systems. These surfaces comprise massive numbers of nearly-passive elements that interact with the incident signals, for example by reflecting them, in a smart way that improves the wireless system performance. Prior work focused on the design of the LIS reflection matrices assuming full channel knowledge. Estimating these channels at the LIS, however, is a key challenging problem. With the massive number of LIS elements, channel estimation or reflection beam training will be associated with (i) huge training overhead if all the LIS elements are passive (not connected to a baseband) or with (ii) prohibitive hardware complexity and power consumption if all the elements are connected to the baseband through a fully-digital or hybrid analog/digital architecture. This paper proposes efficient solutions for these problems by leveraging tools from compressive sensing and deep learning. First, a novel LIS architecture based on sparse channel sensors is proposed. In this architecture, all the LIS elements are passive except for a few elements that are active (connected to the baseband). We then develop two solutions that design the LIS reflection matrices with negligible training overhead. In the first approach, we leverage compressive sensing tools to construct the channels at all the LIS elements from the channels seen only at the active elements. In the second approach, we develop a deep-learning based solution where the LIS learns how to interact with the incident signal given the channels at the active elements, which represent the state of the environment and transmitter/receiver locations. We show that the achievable rates of the proposed solutions approach the upper bound, which assumes perfect channel knowledge, with negligible training overhead and with only a few active elements, making them promising for future LIS systems.

405 citations


Journal ArticleDOI
TL;DR: A two-timescale channel estimation framework to exploit the property that the BS-RIS channel is high-dimensional but quasi-static, while the RIS-UE channel is mobile but low-dimensional is proposed.
Abstract: Channel estimation is challenging for the reconfigurable intelligent surface (RIS)-aided wireless communications. Since the number of coefficients of the cascaded channel among the base station (BS), the RIS, and the user equipment (UE), is the product of the number of BS antennas, the number of RIS elements, and the number of UEs, the pilot overhead can be prohibitively high. In this paper, we propose a two-timescale channel estimation framework to exploit the property that the BS-RIS channel is high-dimensional but quasi-static, while the RIS-UE channel is mobile but low-dimensional. Specifically, to estimate the quasi-static BS-RIS channel, we propose a dual-link pilot transmission scheme, where the BS transmits downlink pilots and receives uplink pilots reflected by the RIS. Then, we propose a coordinate descent-based algorithm to recover the BS-RIS channel. Since the quasi-static BS-RIS channel is estimated less frequently than the mobile channel is, the average pilot overhead can be reduced from a long-term perspective. Although the mobile RIS-UE channel has to be frequently estimated in a small timescale, the associated pilot overhead is low thanks to its low dimension. Simulation results show that the proposed two-timescale channel estimation framework can achieve accurate channel estimation with low pilot overhead.

236 citations


Book ChapterDOI
27 Sep 2021
TL;DR: In this paper, a self-attention mechanism along with relative position encoding was proposed to reduce the complexity of the self-attention operation significantly from O(n^2) to approximately O(n).
Abstract: The Transformer architecture has emerged to be successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation. UTNet applies self-attention modules in both the encoder and decoder for capturing long-range dependency at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism along with relative position encoding that reduces the complexity of the self-attention operation significantly from \(O(n^2)\) to approximately \(O(n)\). A new self-attention decoder is also proposed to recover fine-grained details from the skip connections in the encoder. Our approach addresses the dilemma that the Transformer requires huge amounts of data to learn vision inductive bias. Our hybrid layer design allows the initialization of the Transformer within convolutional networks without a need for pre-training. We have evaluated UTNet on a multi-label, multi-vendor cardiac magnetic resonance imaging cohort. UTNet demonstrates superior segmentation performance and robustness compared with state-of-the-art approaches, holding the promise to generalize well to other medical image segmentation tasks.
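
The quadratic-to-linear reduction mentioned above can be sketched generically: if keys and values are spatially sub-sampled to m tokens with m much smaller than n, the attention map shrinks from n x n to n x m. The numpy sketch below shows this general idea under that assumption; the pooling factor, shapes, and the omission of relative position encoding are illustrative simplifications, not UTNet's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(x, Wq, Wk, Wv, pool=8):
    """Attention with sub-sampled keys/values: cost O(n*m) with m = n // pool.

    x: (n, c) flattened feature tokens; Wq/Wk/Wv: (c, c) projections.
    """
    n, c = x.shape
    q = x @ Wq                                                       # (n, c) full-resolution queries
    kv = x[: (n // pool) * pool].reshape(-1, pool, c).mean(axis=1)   # (m, c) pooled tokens
    k, v = kv @ Wk, kv @ Wv
    attn = softmax(q @ k.T / np.sqrt(c))                             # (n, m) instead of (n, n)
    return attn @ v                                                  # (n, c)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((1024, 32))                  # e.g. a flattened 32x32 feature map
Wq, Wk, Wv = (rng.standard_normal((32, 32)) * 0.1 for _ in range(3))
out = efficient_attention(tokens, Wq, Wk, Wv)             # attention map is 1024 x 128
print(out.shape)
```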

214 citations


Journal ArticleDOI
26 Jan 2021
TL;DR: This article proposes the first secure aggregation framework, named Turbo-Aggregate, which employs a multi-group circular strategy for efficient model aggregation, and leverages additive secret sharing and novel coding techniques for injecting aggregation redundancy in order to handle user dropouts while guaranteeing user privacy.
Abstract: Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users. In particular, the overhead of the state-of-the-art protocols for secure model aggregation grows quadratically with the number of users. In this article, we propose the first secure aggregation framework, named Turbo-Aggregate, that in a network with $N$ users achieves a secure aggregation overhead of $O(N\log N)$, as opposed to $O(N^2)$, while tolerating up to a user dropout rate of 50%. Turbo-Aggregate employs a multi-group circular strategy for efficient model aggregation, and leverages additive secret sharing and novel coding techniques for injecting aggregation redundancy in order to handle user dropouts while guaranteeing user privacy. We experimentally demonstrate that Turbo-Aggregate achieves a total running time that grows almost linearly in the number of users, and provides up to a $40\times$ speedup over the state-of-the-art protocols with up to $N=200$ users. Our experiments also demonstrate the impact of model size and bandwidth on the performance of Turbo-Aggregate.
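
Additive secret sharing, one of the ingredients named above, can be shown in a few lines: each user splits its update into random shares that individually reveal nothing, yet the shares of all users sum to the true aggregate. This is a minimal sketch of that primitive alone, not of Turbo-Aggregate's multi-group circular protocol or its dropout handling.

```python
import numpy as np

rng = np.random.default_rng(1)

def additive_shares(update, n_shares):
    """Split `update` into n random shares whose sum equals `update`."""
    shares = [rng.standard_normal(update.shape) for _ in range(n_shares - 1)]
    shares.append(update - sum(shares))
    return shares

# three users, each holding a 4-dimensional model update
updates = [rng.standard_normal(4) for _ in range(3)]
# each user splits its update among all users; only share sums are ever revealed
all_shares = [additive_shares(u, 3) for u in updates]
aggregate = sum(sum(s[i] for s in all_shares) for i in range(3))
assert np.allclose(aggregate, sum(updates))  # the aggregate is recovered exactly
```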

170 citations


Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a cross-stage connection path is proposed to transfer knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network.
Abstract: Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network. Previous methods mostly focus on proposing feature transformations and loss functions between features of the same level to improve the effectiveness. We instead study the factor of connection paths across levels between teacher and student networks, and reveal their great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. Our new review mechanism is effective and structurally simple. The resulting nested and compact framework requires negligible computation overhead, and outperforms other methods on a variety of tasks. We apply our method to classification, object detection, and instance segmentation tasks. All of them witness significant student network performance improvements.

165 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an attention mechanism-based convolutional neural network-long short-term memory (AMCNN-LSTM) model to accurately detect anomalies.
Abstract: Since edge device failures (i.e., anomalies) seriously affect the production of industrial products in the Industrial IoT (IIoT), detecting anomalies accurately and in a timely manner is becoming increasingly important. Furthermore, the data collected by edge devices contain massive amounts of users’ private data, which challenges current detection approaches, as user privacy has attracted more and more public concern. With this focus, this article proposes a new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT. Specifically, we first introduce an FL framework to enable decentralized edge devices to collaboratively train an anomaly detection model, which can improve its generalization ability. Second, we propose an attention mechanism-based convolutional neural network-long short-term memory (AMCNN-LSTM) model to accurately detect anomalies. The AMCNN-LSTM model uses attention mechanism-based convolutional neural network units to capture important fine-grained features, thereby preventing memory loss and gradient dispersion problems. Furthermore, this model retains the advantages of the long short-term memory unit in predicting time-series data. Third, to adapt the proposed framework to the timeliness of industrial anomaly detection, we propose a gradient compression mechanism based on Top-$k$ selection to improve communication efficiency. Extensive experimental studies on four real-world data sets demonstrate that our framework detects anomalies accurately and in a timely manner, and also reduces the communication overhead by 50% compared to the FL framework that does not use the gradient compression scheme.
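
The Top-k gradient compression step works by transmitting only the k largest-magnitude gradient entries together with their indices, which is where the communication saving comes from. A minimal numpy sketch of that mechanism follows (compression and server-side reconstruction only, without the error-feedback or framework details of the paper).

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; transmit only (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, vals, size):
    """Rebuild a sparse gradient of the original size on the receiving side."""
    g = np.zeros(size)
    g[idx] = vals
    return g

grad = np.random.default_rng(0).standard_normal(1000)
idx, vals = topk_compress(grad, k=100)          # ~90% fewer values transmitted
restored = topk_decompress(idx, vals, grad.size)
print(np.count_nonzero(restored))               # 100 retained coordinates
```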

159 citations


Journal ArticleDOI
TL;DR: In this paper, a hybrid metaheuristic algorithm named genetic simulated annealing-based particle swarm optimization (GSP) was proposed to minimize the total energy consumed by mobile devices and edge servers by jointly optimizing the offloading ratio of tasks, CPU speeds of mobile devices, allocated bandwidth of available channels, and transmission power of each mobile device in each time slot.
Abstract: Smart mobile devices (SMDs) can meet users’ high expectations by executing computationally intensive applications but they only have limited resources, including CPU, memory, battery power, and wireless medium. To tackle this limitation, partial computation offloading can be used as a promising method to schedule some tasks of applications from resource-limited SMDs to high-performance edge servers. However, it brings communication overhead issues caused by limited bandwidth and inevitably increases the latency of tasks offloaded to edge servers. Therefore, it is highly challenging to achieve a balance between high-resource consumption in SMDs and high communication cost for providing energy-efficient and low-latency services to users. This work proposes a partial computation offloading method to minimize the total energy consumed by SMDs and edge servers by jointly optimizing the offloading ratio of tasks, CPU speeds of SMDs, allocated bandwidth of available channels, and transmission power of each SMD in each time slot. It jointly considers the execution time of tasks performed in SMDs and edge servers, and the transmission time of data. It also jointly considers latency limits, CPU speeds, transmission power limits, available energy of SMDs, and the maximum number of CPU cycles and memories in edge servers. Considering these factors, a nonlinear constrained optimization problem is formulated and solved by a novel hybrid metaheuristic algorithm named genetic simulated annealing-based particle swarm optimization (GSP) to produce a close-to-optimal solution. GSP achieves joint optimization of computation offloading between a cloud data center and the edge, and resource allocation in the data center. Real-life data-based experimental results prove that it achieves lower energy consumption in less convergence time than its three typical peers.

138 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a novel impulse-like timing metric based on length-alterable differential cross-correlation (LDCC), which is immune to carrier frequency offset (CFO) and capable of mitigating the impact of noise on timing estimation.
Abstract: Satellite communication systems are expected to play a vital role in realizing various remote Internet-of-Things (IoT) applications in the sixth-generation vision. Due to the unique characteristics of the satellite environment, one of the main challenges in this system is to accommodate massive random access (RA) requests of IoT devices while minimizing their energy consumption. In this article, we focus on the reliable design and detection of the RA preamble to effectively enhance the access efficiency in high-dynamic low-earth-orbit (LEO) scenarios. To avoid additional signaling overhead and detection process, a long preamble sequence is constructed by concatenating the conjugated and circularly shifted replicas of a single root Zadoff–Chu (ZC) sequence in the RA procedure. Moreover, we propose a novel impulse-like timing metric based on length-alterable differential cross-correlation (LDCC), which is immune to carrier frequency offset (CFO) and capable of mitigating the impact of noise on timing estimation. Statistical analysis of the proposed metric reveals that increasing correlation length can significantly improve the output signal-to-noise power ratio, and the first-path detection threshold is independent of noise statistics. Simulation results in different LEO scenarios validate the robustness of the proposed method to severe channel distortion, and show that our method can achieve significant performance enhancement in terms of timing estimation accuracy, success probability of first access, and mean normalized access energy, compared with the existing RA methods.
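
The two building blocks named above, a root Zadoff-Chu sequence and a differential cross-correlation timing metric, can be sketched as follows. Because a constant CFO contributes only a fixed phase to each differential product r[n] * conj(r[n+lag]), the magnitude of the correlation peak is insensitive to CFO. The sketch shows that generic idea, not the paper's length-alterable construction or its concatenated preamble, and all parameters are illustrative.

```python
import numpy as np

def zadoff_chu(root, length):
    """Root Zadoff-Chu sequence of odd length (constant amplitude, ideal autocorrelation)."""
    n = np.arange(length)
    return np.exp(-1j * np.pi * root * n * (n + 1) / length)

def differential_metric(rx, ref, lag=1):
    """Correlate differential products; a constant CFO only adds a fixed phase,
    so the metric magnitude is insensitive to carrier frequency offset."""
    d_rx = rx[:-lag] * np.conj(rx[lag:])
    d_ref = ref[:-lag] * np.conj(ref[lag:])
    return np.abs(np.correlate(d_rx, d_ref, mode="valid"))  # sliding correlation

N = 139
zc = zadoff_chu(root=25, length=N)
cfo = np.exp(1j * 2 * np.pi * 0.3 * np.arange(3 * N) / N)   # large residual CFO
rx = np.concatenate([np.zeros(N), zc, np.zeros(N)]) * cfo   # preamble starts at index N
metric = differential_metric(rx, zc)
print(int(np.argmax(metric)))  # peak at the true timing offset N despite the CFO
```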

130 citations


Journal ArticleDOI
TL;DR: The double-structured orthogonal matching pursuit (DS-OMP) algorithm is proposed, in which the completely common non-zero rows and the partially common non-zero columns are jointly estimated for all users.
Abstract: Reconfigurable intelligent surface (RIS) can manipulate the wireless communication environment by controlling the coefficients of RIS elements. However, due to the large number of passive RIS elements without signal processing capability, channel estimation in RIS assisted wireless communication system requires high pilot overhead. In the second part of this invited paper, we propose to exploit the double-structured sparsity of the angular cascaded channels among users to reduce the pilot overhead. Specifically, we first reveal the double-structured sparsity, i.e., different angular cascaded channels for different users enjoy the completely common non-zero rows and the partially common non-zero columns. By exploiting this double-structured sparsity, we further propose the double-structured orthogonal matching pursuit (DS-OMP) algorithm, where the completely common non-zero rows and the partially common non-zero columns are jointly estimated for all users. Simulation results show that the pilot overhead required by the proposed scheme is lower than existing schemes.
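
For context, the single-vector orthogonal matching pursuit routine that DS-OMP builds on is sketched below in numpy; the double-structured variant additionally shares the support (completely common rows and partially common columns) across users, which is not shown here, and the measurement matrix and sparsity level are illustrative.

```python
import numpy as np

def omp(A, y, sparsity):
    """Standard orthogonal matching pursuit: recover a sparse x from y = A x."""
    residual, support = y.copy(), []
    for _ in range(sparsity):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(j)
        # least-squares fit on the selected support, then update the residual
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1], dtype=A.dtype)
    x[support] = x_s
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256)) / np.sqrt(64)     # measurement (pilot) matrix
x_true = np.zeros(256)
x_true[rng.choice(256, 5, replace=False)] = rng.standard_normal(5)
x_hat = omp(A, A @ x_true, sparsity=5)
print(np.linalg.norm(x_hat - x_true))                # small reconstruction error
```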

123 citations


Journal ArticleDOI
TL;DR: In this paper, a two-timescale (TTS) transmission protocol was proposed to maximize the achievable average sum-rate for an IRS-aided multiuser system under the general correlated Rician channel model.
Abstract: Intelligent reflecting surface (IRS) has drawn a lot of attention recently as a promising new solution to achieve high spectral and energy efficiency for future wireless networks. By utilizing massive low-cost passive reflecting elements, the wireless propagation environment becomes controllable and thus can be made favorable for improving the communication performance. Prior works on IRS mainly rely on the instantaneous channel state information (I-CSI), which, however, is practically difficult to obtain for IRS-associated links due to its passive operation and large number of reflecting elements. To overcome this difficulty, we propose in this paper a new two-timescale (TTS) transmission protocol to maximize the achievable average sum-rate for an IRS-aided multiuser system under the general correlated Rician channel model. Specifically, the passive IRS phase shifts are first optimized based on the statistical CSI (S-CSI) of all links, which varies much more slowly than their I-CSI; while the transmit beamforming/precoding vectors at the access point (AP) are then designed to cater to the I-CSI of the users’ effective fading channels with the optimized IRS phase shifts, thus significantly reducing the channel training overhead and passive beamforming design complexity over the existing schemes based on the I-CSI of all channels. Besides, for ease of practical implementation, we consider discrete phase shifts at each reflecting element of the IRS. For the single-user case, an efficient penalty dual decomposition (PDD)-based algorithm is proposed, where the IRS phase shifts are updated in parallel to reduce the computational time. For the multiuser case, we propose a general TTS stochastic successive convex approximation (SSCA) algorithm by constructing a quadratic surrogate of the objective function, which cannot be explicitly expressed in closed form. Simulation results are presented to validate the effectiveness of our proposed algorithms and evaluate the impact of S-CSI and channel correlation on the system performance.

Journal ArticleDOI
TL;DR: This study proposes an offloading model for a multi-user MEC system with multiple tasks, and an equivalent form of reinforcement learning is created, where the state spaces are defined based on all possible solutions and the actions are defined on the basis of movement between the different states.
Abstract: Computation offloading at mobile edge computing (MEC) servers can mitigate the resource limitation and reduce the communication latency for mobile devices. Thereby, in this study, we propose an offloading model for a multi-user MEC system with multiple tasks. In addition, a new caching concept is introduced for the computation tasks, where the application program and related code for the completed tasks are cached at the edge server. Furthermore, an efficient model of task offloading and caching integration is formulated as a nonlinear problem whose goal is to reduce the total overhead of time and energy. However, solving these types of problems is computationally prohibitive, especially for a large number of mobile users. Thus, an equivalent form of reinforcement learning is created, where the state spaces are defined based on all possible solutions and the actions are defined on the basis of movement between the different states. Afterwards, two effective Q-learning and Deep-Q-Network-based algorithms are proposed to derive the near-optimal solution for this problem. Finally, experimental evaluations verify that our proposed model can substantially minimize the mobile devices’ overhead by deploying the computation offloading and task caching strategy reasonably.
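
The tabular Q-learning component reduces to the standard update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). Below is a generic, self-contained sketch of that loop; the state/action encoding and the placeholder reward standing in for the time-plus-energy overhead are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 16, 4          # placeholder offloading/caching states and decisions
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Placeholder environment: returns (next_state, reward).
    In the paper the reward would reflect the negative time + energy overhead."""
    next_state = int(rng.integers(n_states))
    reward = -rng.random()           # smaller overhead => larger (less negative) reward
    return next_state, reward

state = int(rng.integers(n_states))
for _ in range(10_000):
    # epsilon-greedy action selection
    action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[state].argmax())
    next_state, reward = step(state, action)
    # standard Q-learning temporal-difference update
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```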

Journal ArticleDOI
TL;DR: In this article, an overhead-aware resource allocation framework for wireless networks where reconfigurable intelligent surfaces are used to improve the communication performance is proposed and incorporated in the expressions of the system rate and energy efficiency.
Abstract: Reconfigurable intelligent surfaces have emerged as a promising technology for future wireless networks. Given that a large number of reflecting elements is typically used and that the surface has no signal processing capabilities, a major challenge is to cope with the overhead that is required to estimate the channel state information and to report the optimized phase shifts to the surface. This issue has not been addressed by previous works, which do not explicitly consider the overhead during the resource allocation phase. This work aims at filling this gap, by developing an overhead-aware resource allocation framework for wireless networks where reconfigurable intelligent surfaces are used to improve the communication performance. An overhead model is proposed and incorporated in the expressions of the system rate and energy efficiency, which are then optimized with respect to the phase shifts of the reconfigurable intelligent surface, the transmit and receive filters, the power and bandwidth used for the communication and feedback phases. The bi-objective maximization of the rate and energy efficiency is investigated, too. The proposed framework characterizes the trade-off between optimized radio resource allocation policies and the related overhead in networks with reconfigurable intelligent surfaces.
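
The effect of folding overhead into the rate expression can be illustrated with a simple hedged model: if a fraction of each frame is spent on channel estimation and phase-shift feedback, only the remaining fraction carries data, and the achievable rate is scaled accordingly. This is a generic textbook-style model, not the paper's actual overhead model, and all numbers are illustrative.

```python
import math

def effective_rate(bandwidth_hz, snr_linear, frame_s, overhead_s):
    """Rate after discounting the fraction of the frame spent on CSI estimation
    and phase-shift feedback (generic model; the paper's overhead model is richer)."""
    data_fraction = max(0.0, 1.0 - overhead_s / frame_s)
    return data_fraction * bandwidth_hz * math.log2(1.0 + snr_linear)

# a 10 MHz channel at 20 dB SNR with 0.2 ms of overhead in a 1 ms frame
print(effective_rate(10e6, 10 ** (20 / 10), 1e-3, 0.2e-3) / 1e6, "Mbit/s")
```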

Proceedings ArticleDOI
19 Apr 2021
TL;DR: SIMDRAM as mentioned in this paper is a general-purpose processing-using-DRAM framework that enables the efficient implementation of complex operations and provides a flexible mechanism to support the implementation of arbitrary user-defined operations.
Abstract: Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism to support the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation’s input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for the SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes. We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM’s generality. Our evaluations using a single DRAM bank show that (1) over 16 operations, SIMDRAM provides 2.0X the throughput and 2.6X the energy efficiency of Ambit, a state-of-the-art processing-using-DRAM mechanism; (2) over seven real-world applications, SIMDRAM provides 2.5X the performance of Ambit. Using 16 DRAM banks, SIMDRAM provides (1) 88X and 5.8X the throughput, and 257X and 31X the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; (2) 21X and 2.1X the performance of the CPU and GPU, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.
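
The MAJ/NOT representation used in the first step relies on {majority, NOT} being functionally complete: AND and OR (and hence any Boolean function) can be expressed with 3-input majority gates whose third input is tied to 0 or 1. A tiny Python illustration of that logic identity (not of the DRAM command sequence itself):

```python
def maj(a, b, c):
    """3-input majority, the bulk bitwise primitive that triple-row activation provides."""
    return int(a + b + c >= 2)

def not_(a):
    return 1 - a

def and_(a, b):
    return maj(a, b, 0)   # AND(a, b) = MAJ(a, b, 0)

def or_(a, b):
    return maj(a, b, 1)   # OR(a, b)  = MAJ(a, b, 1)

def xor_(a, b):           # a more complex operation built only from MAJ and NOT
    return and_(or_(a, b), not_(and_(a, b)))

assert [xor_(a, b) for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 0]
```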

Journal ArticleDOI
TL;DR: Simulation results prove that the proposed trust-based authentication method for clustered vehicular ad hoc networks increases the accuracy of detecting malicious nodes and the packet delivery ratio, and decreases the authentication delay and overhead.
Abstract: Vehicular Ad hoc Networks (VANETs) are a subset of mobile ad hoc networks that allow any vehicle to communicate with other adjacent vehicles, road side units and infrastructure. In these networks, the purpose is to enhance security, improve the management of urban and road traffic and provide services to passengers. Due to problems such as reliability and privacy, messages that are exchanged in the network should be confidential and secure. Therefore, we need a secure topology to maintain trust, which enables the cryptographic process. In this paper, a trust based authentication method for clustered vehicular ad hoc networks is proposed. An efficient authentication method should be able to accurately detect malicious nodes and reduce delay and overhead. The main purpose of the proposed method is to create trustworthy and stable clusters that lead to the stability of the entire network. For this purpose, we estimate the trust degree of each vehicle by combining the trust between vehicles and the trust between the vehicle and Road Side Units (RSUs), and Cluster Heads (CHs) are selected based on this estimated trust degree. Cluster Heads along with verifiers are responsible for monitoring each vehicle. On the other hand, the cluster heads provide an optimal and secure route for transmitting messages. Messages are digitally signed by the sender, encrypted using a public/private key distributed by a Trusted Authority (TA), and decrypted by the destination, so that each message contains a certificate from a trusted authority. Through this identification, the sender and receiver of the message are verified and authentication is achieved. Simulation results prove that the proposed method increases the accuracy of detecting malicious nodes and the packet delivery ratio, and decreases the authentication delay and overhead.

Journal ArticleDOI
TL;DR: This article proposes an efficient certificateless aggregate signature scheme with conditional privacy preservation that is suitable for resource-constrained environments, and it is compared with related works from aspects of computation cost, communication efficiency, and security requirements.
Abstract: As an extension of traditional vehicular ad hoc networks, the Internet of Vehicles (IoV) enables information collection and dissemination, which brings a lot of convenience and benefits to the intelligent transportation systems. However, the booming IoV confronts a few challenges in the aspects of vehicle location privacy preservation and the authenticity of the transmitted information. In order to meet these challenges, we propose an efficient certificateless aggregate signature scheme with conditional privacy preservation in this article. Our scheme utilizes the technique of full aggregation to reduce the bandwidth resources and computing overhead. Besides, the conditional privacy preservation in IoV system is realized by using pseudonym mechanism. We demonstrate that the proposed scheme is secure against the Type-I and Type-II adversaries in the random oracle under the computational Diffie–Hellman assumption. In addition, the proposed scheme is compared with related works from aspects of computation cost, communication efficiency, and security requirements. The comparison results show that the proposed scheme is efficient, and it is suitable for resource-constrained environments.

Journal ArticleDOI
TL;DR: This paper utilizes the deep learning technique to conduct the routing computation for the SDCSs and considers an online training manner to reduce the computation overhead of the central controller and improve the adaptation of CNNs to the changing traffic pattern.
Abstract: Software Defined Networking (SDN) is regarded as the next generation paradigm as it simplifies the structure of the data plane and improves the resource utilization. However, in current Software Defined Communication Systems (SDCSs), the maximum or minimum metric value based routing strategies come from traditional networks, which lack the ability of self-adaptation and do not efficiently utilize the computation resource in the controllers. To solve these problems, in this paper, we utilize the deep learning technique to conduct the routing computation for the SDCSs. Specifically, in our proposal, the considered Convolutional Neural Networks (CNNs) are adopted to intelligently compute the paths according to the input real-time traffic traces. To reduce the computation overhead of the central controller and improve the adaptation of CNNs to the changing traffic pattern, we consider an online training manner. Analysis shows that the computation complexity can be significantly reduced through the online training manner. Moreover, the simulation results demonstrate that our proposed CNNs are able to compute the appropriate paths combinations with high accuracy. Furthermore, the adopted periodical retraining enables the deep learning structures to adapt to the traffic changes.

Journal ArticleDOI
TL;DR: A hybrid D2D message authentication (HDMA) scheme is proposed for 5G-enabled VANETs, in which a novel group signature-based algorithm is used for mutual authentication between vehicle to vehicle (V2V) communication.
Abstract: The fifth-generation (5G) mobile communication technology with higher capacity and data rate, ultra-low device to device (D2D) latency, and massive device connectivity will greatly promote the development of vehicular ad hoc networks (VANETs). Meanwhile, new challenges such as security, privacy, and efficiency arise. In this article, a hybrid D2D message authentication (HDMA) scheme is proposed for 5G-enabled VANETs, in which a novel group signature-based algorithm is used for mutual authentication in vehicle to vehicle (V2V) communication. In addition, a pre-computed lookup table is adopted to reduce the computation overhead of the modular exponentiation operation. Security analysis shows that HDMA is robust against various security attacks, and performance analysis also shows that the authentication of HDMA is more efficient than that of some traditional schemes, with the help of the pre-computed lookup table, in V2V and vehicle to infrastructure (V2I) communication.
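
The pre-computed lookup table for modular exponentiation mentioned above is a standard fixed-base technique: the powers g^(2^i) mod p are stored once offline, and every later exponentiation needs only modular multiplications over the stored entries. A generic sketch under illustrative parameters (the scheme's actual group and table layout may differ):

```python
def build_table(g, p, bits):
    """Precompute g^(2^i) mod p for i = 0..bits-1 (done once, offline)."""
    table, cur = [], g % p
    for _ in range(bits):
        table.append(cur)
        cur = cur * cur % p
    return table

def fixed_base_pow(table, exponent, p):
    """Online phase: only multiplications over the stored powers, no squarings."""
    result = 1
    for i, t in enumerate(table):
        if (exponent >> i) & 1:
            result = result * t % p
    return result

p, g = 2**127 - 1, 5                      # illustrative prime modulus and base
table = build_table(g, p, bits=127)
e = 0x1234_5678_9ABC_DEF0
assert fixed_base_pow(table, e, p) == pow(g, e, p)
```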

Proceedings ArticleDOI
01 Jan 2021
TL;DR: A novel system, POSEIDON, is proposed, the first of its kind in the regime of privacy-preserving neural network training, employing multiparty lattice-based cryptography and preserving the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties.
Abstract: In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.

Proceedings ArticleDOI
19 Apr 2021
TL;DR: In this article, a caching-inspired Greedy-Dual keep-alive policy is proposed to reduce the cold-start overhead of FaaS applications by more than 3× compared to current approaches.
Abstract: Functions as a Service (also called serverless computing) promises to revolutionize how applications use cloud resources. However, functions suffer from cold-start problems due to the overhead of initializing their code and data dependencies before they can start executing. Keeping functions alive and warm after they have finished execution can alleviate the cold-start overhead. Keep-alive policies must keep functions alive based on their resource and usage characteristics, which is challenging due to the diversity in FaaS workloads. Our insight is that keep-alive is analogous to caching. Our caching-inspired Greedy-Dual keep-alive policy can be effective in reducing the cold-start overhead by more than 3× compared to current approaches. Caching concepts such as reuse distances and hit-ratio curves can also be used for auto-scaled server resource provisioning, which can reduce the resource requirement of FaaS providers by 30% for real-world dynamic workloads. We implement caching-based keep-alive and resource provisioning policies in our FaasCache system, which is based on OpenWhisk. We hope that our caching analogy opens the door to more principled and optimized keep-alive and resource provisioning techniques for future FaaS workloads and platforms.
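
The caching analogy can be made concrete with a Greedy-Dual-Size-Frequency style policy: each warm container gets a priority of clock + frequency * cost / size (cost standing in for its cold-start initialization time), and the lowest-priority container is evicted when memory is needed, with the clock advanced to that priority. The sketch below is a hedged illustration of that classic policy; FaasCache's exact cost model and implementation details may differ.

```python
class GreedyDualKeepAlive:
    """Greedy-Dual-Size-Frequency style keep-alive for warm function containers.

    cost ~ cold-start initialization time, size ~ container memory footprint.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = 0.0
        self.entries = {}   # name -> dict(prio, freq, cost, size)
        self.used = 0

    def _priority(self, e):
        return self.clock + e["freq"] * e["cost"] / e["size"]

    def access(self, name, cost, size):
        if name in self.entries:                  # warm hit: bump frequency and priority
            e = self.entries[name]
            e["freq"] += 1
            e["prio"] = self._priority(e)
            return "warm"
        while self.used + size > self.capacity and self.entries:
            victim = min(self.entries, key=lambda n: self.entries[n]["prio"])
            self.clock = self.entries[victim]["prio"]   # advance clock to victim's priority
            self.used -= self.entries[victim]["size"]
            del self.entries[victim]
        e = {"freq": 1, "cost": cost, "size": size}
        e["prio"] = self._priority(e)
        self.entries[name] = e
        self.used += size
        return "cold"

cache = GreedyDualKeepAlive(capacity=1024)
print(cache.access("resize_image", cost=800, size=512))   # cold start
print(cache.access("resize_image", cost=800, size=512))   # warm hit
```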

Proceedings ArticleDOI
24 Jun 2021
TL;DR: ClusterFL as mentioned in this paper is a similarity-aware federated learning system that can provide high model accuracy and low communication overhead for human activity recognition (HAR) applications, which can efficiently drop out the nodes that converge slower or have little correlation with other nodes in each cluster.
Abstract: Federated Learning (FL) has recently received significant interest thanks to its capability of protecting data privacy. However, existing FL paradigms yield unsatisfactory performance for a wide class of human activity recognition (HAR) applications since they are oblivious to the intrinsic relationship between data of different users. We propose ClusterFL, a similarity-aware federated learning system that can provide high model accuracy and low communication overhead for HAR applications. ClusterFL features a novel clustered multi-task federated learning framework that maximizes the training accuracy of multiple learned models while automatically capturing the intrinsic clustering relationship among the data of different nodes. Based on the learned cluster relationship, ClusterFL can efficiently drop out the nodes that converge slower or have little correlation with other nodes in each cluster, significantly speeding up the convergence while maintaining the accuracy performance. We evaluate the performance of ClusterFL on an NVIDIA edge testbed using four new HAR datasets collected from a total of 145 users. The results show that ClusterFL outperforms several state-of-the-art FL paradigms in terms of overall accuracy, and saves more than 50% of the communication overhead at the expense of negligible accuracy degradation.

Journal ArticleDOI
TL;DR: In this article, a joint link scheduling and rate adaptation problem for a hierarchical satellite-UAV-terrestrial network on the ocean is addressed to minimize the total energy consumption with quality of service (QoS) guarantees.
Abstract: In the coming smart ocean era, reliable and efficient communications are crucial for promoting a variety of maritime activities. Current maritime communication networks (MCNs) mainly rely on marine satellites and on-shore base stations (BSs). The former generally provides limited transmission rate, while the latter lacks wide-area coverage capability. Due to these facts, the state-of-the-art MCN falls far behind terrestrial fifth-generation (5G) networks. To fill up the gap in the coming sixth-generation (6G) era, we explore the benefit of deployable BSs for maritime coverage enhancement. Both unmanned aerial vehicles (UAVs) and mobile vessels are used to configure deployable BSs. This leads to a hierarchical satellite-UAV-terrestrial network on the ocean. We address the joint link scheduling and rate adaptation problem for this hybrid network, to minimize the total energy consumption with quality of service (QoS) guarantees. Different from previous studies, we use only the large-scale channel state information (CSI), which is location-dependent and thus can be predicted through the position information of each UAV/vessel based on its specific trajectory/shipping lane. The problem is shown to be an NP-hard mixed integer nonlinear programming problem with a group of hidden non-linear equality constraints. We solve it suboptimally by using Min-Max transformation and iterative problem relaxation, leading to a process-oriented joint link scheduling and rate adaptation scheme. As observed by simulations, the scheme can provide agile on-demand coverage for all users with much reduced system overhead and a polynomial computation complexity. Moreover, it can achieve a prominent performance close to the optimal solution.

Journal ArticleDOI
TL;DR: A convergence upper bound is provided characterizing the tradeoff between convergence rate and global rounds, showing that a small number of active UEs per round still guarantees convergence and advocating the proposed FL algorithm for a paradigm shift in bandwidth-constrained learning wireless IoT networks.
Abstract: Federated learning (FL) allows multiple edge computing nodes to jointly build a shared learning model without having to transfer their raw data to a centralized server, thus reducing communication overhead. However, FL still faces a number of challenges such as nonindependent and identically distributed data and heterogeneity of user equipments (UEs). Enabling a large number of UEs to join the training process in every round raises a potential issue of the heavy global communication burden. To address these issues, we generalize the current state-of-the-art federated averaging (FedAvg) by adding a weight-based proximal term to the local loss function. The proposed FL algorithm runs stochastic gradient descent in parallel on a sampled subset of the total UEs with replacement during each global round. We provide a convergence upper bound characterizing the tradeoff between convergence rate and global rounds, showing that a small number of active UEs per round still guarantees convergence. Next, we employ the proposed FL algorithm in wireless Internet-of-Things (IoT) networks to minimize either total energy consumption or completion time of FL, where a simple yet efficient path-following algorithm is developed for its solutions. Finally, numerical results on unbalanced data sets are provided to demonstrate the performance improvement and robustness on the convergence rate of the proposed FL algorithm over FedAvg. They also reveal that the proposed algorithm requires much less training time and energy consumption than the FL algorithm with full user participation. These observations advocate the proposed FL algorithm for a paradigm shift in bandwidth-constrained learning wireless IoT networks.
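
The weight-based proximal term is the key modification over plain FedAvg: each sampled device locally minimizes f_i(w) + (mu/2) * ||w - w_global||^2 before the server averages the returned models. A minimal numpy sketch of one such scheme on a toy least-squares objective follows; the per-user weighting, sampling-with-replacement details, and hyperparameters are simplified assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_users, mu, lr = 5, 20, 0.1, 0.05

# toy local datasets: user i holds (A_i, b_i) for a least-squares loss
data = [(rng.standard_normal((30, dim)), rng.standard_normal(30)) for _ in range(n_users)]

def local_update(w_global, A, b, steps=10):
    """Local gradient descent on f_i(w) + (mu/2) * ||w - w_global||^2 (proximal term)."""
    w = w_global.copy()
    for _ in range(steps):
        grad = A.T @ (A @ w - b) / len(b) + mu * (w - w_global)
        w -= lr * grad
    return w

w = np.zeros(dim)
for _ in range(50):
    sampled = rng.choice(n_users, size=5, replace=True)   # a few active users per round
    updates = [local_update(w, *data[int(i)]) for i in sampled]
    w = np.mean(updates, axis=0)                          # server averages returned models
```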

Journal ArticleDOI
TL;DR: A performance analysis of the proposed protocol shows that the proposed strategy significantly reduces the number of authentication packets and MAC/PHY overhead while the security analysis demonstrates its robustness against various types of attacks.
Abstract: One of the most important and critical requirements for the Internet of Vehicles (IoV) is security under strict latency. Typically, in authentication protocols for vehicular ad hoc networks, vehicles need to authenticate themselves frequently. This results in reduced application traffic and increased overhead. Moreover, the mobile nature of vehicles makes them a prime target for physical, side channel, and cloning attacks. To address these issues, this article presents an efficient protocol for authentication in the IoV. The proposed protocol uses physical unclonable functions to provide the desired security characteristics. To reduce the overhead of authentication and improve the throughput of application layer packets, the proposed protocol uses a three-layered infrastructure architecture for IoVs, i.e., roadside units (RSUs), RSU gateways, and trusted authority. A vehicle needs to authenticate only once when it enters the area of an RSU gateway, which may cover multiple RSUs. A performance analysis of the protocol shows that the proposed strategy significantly reduces the number of authentication packets and MAC/PHY overhead while the security analysis demonstrates its robustness against various types of attacks.

Journal ArticleDOI
Lu Wei, Jie Cui, Yan Xu, Jiujun Cheng, Hong Zhong
TL;DR: An SSK updating algorithm is designed, which is constructed on Shamir’s secret sharing algorithm and a secure pseudo-random function, so that the TPDs of unrevoked vehicles can update the SSK securely.
Abstract: Owing to the development of wireless communication technology and the increasing number of automobiles, vehicular ad hoc networks (VANETs) have become essential tools for ensuring traffic safety and enhancing driving convenience. It is necessary to design a conditional privacy-preserving authentication (CPPA) scheme for VANETs because of their vulnerability and security requirements. Traditional CPPA schemes have two deficiencies. One is that the communication or storage overhead is not sufficiently low, whereas traffic emergency messages require an ultra-low transmission delay. The other is that traditional CPPA schemes do not consider updating the system secret key (SSK), which is stored in an unhackable Tamper Proof Device (TPD), whereas side-channel attack methods and the wide usage of the SSK increase the probability of breaking the SSK. To solve the first issue, we propose a CPPA signature scheme based on elliptic curve cryptography, which can achieve message recovery and whose security can be reduced to the elliptic curve discrete logarithm assumption, so that traffic emergency messages are secured with ultra-low communication overhead. To solve the second issue, we design an SSK updating algorithm, which is constructed on Shamir’s secret sharing algorithm and a secure pseudo-random function, so that the TPDs of unrevoked vehicles can update the SSK securely. Formal security proof and analysis show that our proposed scheme satisfies the security and privacy requirements of VANETs. Performance analysis demonstrates that our proposed scheme requires less storage and has a lower transmission delay compared with related schemes.
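
Shamir's secret sharing, on which the SSK updating algorithm is built, hides a secret as the constant term of a random degree-(t-1) polynomial over a prime field; any t shares recover it by Lagrange interpolation at x = 0. A minimal sketch over an illustrative prime field (toy parameters, not the scheme's actual field or share distribution):

```python
import random

P = 2**61 - 1   # a prime field large enough for a toy secret

def make_shares(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789     # any 3 of the 5 shares suffice
```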

Journal ArticleDOI
TL;DR: In this article, the authors proposed a predictive beamforming scheme in the context of dual-functional radar-communication (DFRC) systems, where the road-side unit estimates and predicts the motion parameters of vehicles based on the echoes of the DFRC signal.
Abstract: The development of dual-functional radar-communication (DFRC) systems, where vehicle localization and tracking can be combined with vehicular communication, will lead to more efficient future vehicular networks. In this paper, we develop a predictive beamforming scheme in the context of DFRC systems. We consider a system model where the road-side unit estimates and predicts the motion parameters of vehicles based on the echoes of the DFRC signal. Compared to the conventional feedback-based beam tracking approaches, the proposed method can reduce the signaling overhead and improve the accuracy of the angle estimation. To accurately estimate the motion parameters of vehicles in real time, we propose a novel message passing algorithm based on a factor graph, which yields performance close to that of maximum a posteriori estimation. The beamformers are then designed based on the predicted angles for establishing the communication links. With the employment of appropriate approximations, all messages on the factor graph can be derived in closed form, thus reducing the complexity. Simulation results show that the proposed DFRC based beamforming scheme is superior to the feedback-based approach in terms of both estimation and communication performance. Moreover, the proposed message passing algorithm achieves performance similar to that of the high-complexity particle filtering-based methods.

Journal ArticleDOI
TL;DR: A model-driven deep learning (MDDL)-based channel estimation and feedback scheme for wideband millimeter-wave (mmWave) massive hybrid multiple-input multiple-output (MIMO) systems, where the angle-delay domain channels’ sparsity is exploited for reducing the overhead.
Abstract: This paper proposes a model-driven deep learning (MDDL)-based channel estimation and feedback scheme for wideband millimeter-wave (mmWave) massive hybrid multiple-input multiple-output (MIMO) systems, where the angle-delay domain channels’ sparsity is exploited for reducing the overhead. First, we consider the uplink channel estimation for time-division duplexing systems. To reduce the uplink pilot overhead for estimating high-dimensional channels from a limited number of radio frequency (RF) chains at the base station (BS), we propose to jointly train the phase shift network and the channel estimator as an auto-encoder. Particularly, by exploiting the channels’ structured sparsity from an a priori model and learning the integrated trainable parameters from the data samples, the proposed multiple-measurement-vectors learned approximate message passing (MMV-LAMP) network with the devised redundant dictionary can jointly recover multiple subcarriers’ channels with significantly enhanced performance. Moreover, we consider the downlink channel estimation and feedback for frequency-division duplexing systems. Similarly, the pilots at the BS and channel estimator at the users can be jointly trained as an encoder and a decoder, respectively. Besides, to further reduce the channel feedback overhead, only the received pilots on part of the subcarriers are fed back to the BS, which can exploit the MMV-LAMP network to reconstruct the spatial-frequency channel matrix. Numerical results show that the proposed MDDL-based channel estimation and feedback scheme outperforms state-of-the-art approaches.

Proceedings ArticleDOI
19 Apr 2021
TL;DR: In this paper, cache coherence is used instead of virtual memory for tracking applications' memory accesses transparently, at cache-line granularity, eliminating page faults from the application critical path when accessing remote data, and decoupling the application memory access tracking from the virtual memory page size.
Abstract: Disaggregated memory can address resource provisioning inefficiencies in current datacenters. Multiple software runtimes for disaggregated memory have been proposed in an attempt to make disaggregated memory practical. These systems rely on the virtual memory subsystem to transparently offer disaggregated memory to applications using a local memory abstraction. Unfortunately, using virtual memory for disaggregation has multiple limitations, including high overhead that comes from the use of page faults to identify what data to fetch and cache locally, and high dirty data amplification that comes from the use of page-granularity for tracking changes to the cached data (4KB or higher). In this paper, we propose a fundamentally new approach to designing software runtimes for disaggregated memory that addresses these limitations. Our main observation is that we can use cache coherence instead of virtual memory for tracking applications' memory accesses transparently, at cache-line granularity. This simple idea (1) eliminates page faults from the application critical path when accessing remote data, and (2) decouples the application memory access tracking from the virtual memory page size, enabling cache-line granularity dirty data tracking and eviction. Using this observation, we implemented a new software runtime for disaggregated memory that improves average memory access time by 1.7-5X and reduces dirty data amplification by 2-10X, compared to state-of-the-art systems.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a deep learning-based CSI compression scheme called DeepCMC, which is composed of convolutional layers followed by quantization and entropy coding blocks.
Abstract: Massive multiple-input multiple-output (MIMO) systems require downlink channel state information (CSI) at the base station (BS) to achieve spatial diversity and multiplexing gains. In a frequency division duplex (FDD) multiuser massive MIMO network, each user needs to compress and feedback its downlink CSI to the BS. The CSI overhead scales with the number of antennas, users and subcarriers, and becomes a major bottleneck for the overall spectral efficiency. In this paper, we propose a deep learning (DL)-based CSI compression scheme, called DeepCMC , composed of convolutional layers followed by quantization and entropy coding blocks. In comparison with previous DL-based CSI reduction structures, DeepCMC proposes a novel fully-convolutional neural network (NN) architecture, with residual layers at the decoder, and incorporates quantization and entropy coding blocks into its design. DeepCMC is trained to minimize a weighted rate-distortion cost, which enables a trade-off between the CSI quality and its feedback overhead. Simulation results demonstrate that DeepCMC outperforms the state of the art CSI compression schemes in terms of the reconstruction quality of CSI for the same compression rate. We also propose a distributed version of DeepCMC for a multi-user MIMO scenario to encode and reconstruct the CSI from multiple users in a distributed manner. Distributed DeepCMC not only utilizes the inherent CSI structures of a single MIMO user for compression, but also benefits from the correlations among the channel matrices of nearby users to further improve the performance in comparison with DeepCMC. We also propose a reduced-complexity training method for distributed DeepCMC, allowing to scale it to multiple users, and suggest a cluster-based distributed DeepCMC approach for practical implementation.

Proceedings ArticleDOI
13 Feb 2021
TL;DR: In this paper, the intrinsic charge sharing operation during a dynamic memory access can be used effectively to perform analog CIM computations: by reconfiguring existing eDRAM columns as charge domain circuits, thus, greatly minimizing peripheral circuit area and power overhead.
Abstract: The unprecedented growth in deep neural networks (DNN) size has led to massive amounts of data movement from off-chip memory to on-chip processing cores in modern machine learning (ML) accelerators. Compute-in-memory (CIM) designs performing analog DNN computations within a memory array, along with peripheral mixed-signal circuits, are being explored to mitigate this memory-wall bottleneck: consisting of memory latency and energy overhead. Embedded-dynamic random-access memory (eDRAM) [1], [2], which integrates the 1T1C (T=Transistor, C=Capacitor) DRAM bitcell monolithically along with high-performance logic transistors and interconnects, can enable custom CIM designs. It offers the densest embedded bitcell, a low pJ/bit access energy, a low soft error rate, high-endurance, high-performance, and high-bandwidth: all desired attributes for ML accelerators. In addition, the intrinsic charge sharing operation during a dynamic memory access can be used effectively to perform analog CIM computations: by reconfiguring existing eDRAM columns as charge domain circuits, thus, greatly minimizing peripheral circuit area and power overhead. Configuring a part of eDRAM as a CIM engine (for data conversion, DNN computations, and weight storage) and retaining the remaining part as a regular memory (for inputs, gradients during training, and non-CIM workload data) can help to meet the layer/kernel dependent variable storage needs during a DNN inference/training step. Thus, the high cost/bit of eDRAM can be amortized by repurposing part of existing large capacity, level-4 eDRAM caches [7] in high-end microprocessors, into large-scale CIM engines.