
Showing papers on "Overhead (computing) published in 2020"


Journal ArticleDOI
TL;DR: In this paper, the authors propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment, which extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates.
Abstract: Federated learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning, however, comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods, however, are only of limited utility in the federated learning setting, as they either only compress the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or only perform well under idealized conditions, such as i.i.d. distribution of the client data, which typically cannot be found in federated learning. In this article, we propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment. STC extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms federated averaging in common federated learning scenarios. These results advocate for a paradigm shift in federated optimization toward high-frequency low-bitwidth communication, in particular in the bandwidth-constrained learning environments.
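
A minimal numpy sketch of the upstream half of such a scheme (illustrative only; the function and variable names are assumptions, and the Golomb position encoding and downstream path are omitted): keep the top-$k$ entries of a weight update by magnitude and replace them with a single shared ternary magnitude.

```python
import numpy as np

def sparse_ternary_compress(delta, k):
    """Top-k sparsification followed by ternarization of a weight update.

    Keeps the k largest-magnitude entries, zeroes the rest, and replaces the
    survivors by +/- mu, where mu is their mean magnitude. Also returns the
    residual, which clients typically accumulate locally for the next round.
    """
    flat = delta.ravel()
    if k >= flat.size:
        return delta.copy(), np.zeros_like(delta)
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k magnitudes
    mu = np.abs(flat[idx]).mean()                  # one shared magnitude per update
    compressed = np.zeros_like(flat)
    compressed[idx] = mu * np.sign(flat[idx])      # ternary values {-mu, 0, +mu}
    residual = flat - compressed                   # carried over to the next round
    return compressed.reshape(delta.shape), residual.reshape(delta.shape)

# Example: compress a toy update to 1% density
update = np.random.randn(10_000)
comp, res = sparse_ternary_compress(update, k=100)
print(np.count_nonzero(comp), "non-zeros at a single shared magnitude")
```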

618 citations


Journal ArticleDOI
TL;DR: In this article, an IRS-enhanced orthogonal frequency division multiplexing (OFDM) system under frequency-selective channels is considered and a practical transmission protocol with channel estimation is proposed.
Abstract: Intelligent reflecting surface (IRS) is a promising new technology for achieving both spectrum and energy efficient wireless communication systems in the future. However, existing works on IRS mainly consider frequency-flat channels and assume perfect knowledge of channel state information (CSI) at the transmitter. Motivated by the above, in this paper we study an IRS-enhanced orthogonal frequency division multiplexing (OFDM) system under frequency-selective channels and propose a practical transmission protocol with channel estimation. First, to reduce the overhead in channel training as well as exploit the channel spatial correlation, we propose a novel IRS elements grouping method, where each group consists of a set of adjacent IRS elements that share a common reflection coefficient. Based on this method, we propose a practical transmission protocol where only the combined channel of each group needs to be estimated, thus substantially reducing the training overhead. Next, with any given grouping and estimated CSI, we formulate the problem to maximize the achievable rate by jointly optimizing the transmit power allocation and the IRS passive array reflection coefficients. Although the formulated problem is non-convex and thus difficult to solve, we propose an efficient algorithm to obtain a high-quality suboptimal solution for it, by alternately optimizing the power allocation and the passive array coefficients in an iterative manner, along with a customized method for the initialization. Simulation results show that the proposed design significantly improves the OFDM link rate performance as compared to the case without using IRS. Moreover, it is shown that there exists an optimal size for IRS elements grouping which achieves the maximum achievable rate due to the practical trade-off between the training overhead and IRS passive beamforming flexibility.
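
A toy numpy sketch of the grouping idea (narrowband and single-antenna for brevity; all names are illustrative assumptions): adjacent IRS elements share one reflection coefficient, so only one combined channel per group, rather than one per element, needs to be estimated.

```python
import numpy as np

def grouped_reflection(h_cascaded, group_size, phases):
    """Effective channel when adjacent IRS elements share a reflection coefficient.

    h_cascaded : (N,) per-element cascaded channel gains
    phases     : (N // group_size,) phase shifts, one per group of adjacent elements
    """
    groups = h_cascaded.reshape(-1, group_size)
    group_channels = groups.sum(axis=1)            # only these per-group sums need estimating
    return np.sum(group_channels * np.exp(1j * phases))

N, group_size = 64, 8
h = (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)
phi = np.zeros(N // group_size)                    # 8 coefficients to train instead of 64
print(abs(grouped_reflection(h, group_size, phi)))
```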

594 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a three-phase pilot-based channel estimation framework for IRS-assisted uplink multiuser communications, in which the user-BS direct channels and the user-IRS-BS reflected channels of a typical user are estimated in Phase I and Phase II, respectively, while the user-IRS-BS reflected channels of the remaining users are estimated with low overhead in Phase III by leveraging their strong correlation with those of the typical user; the minimum pilot overhead is derived for the case without receiver noise at the BS.
Abstract: In intelligent reflecting surface (IRS) assisted communication systems, the acquisition of channel state information is a crucial impediment for achieving the beamforming gain of IRS because of the considerable overhead required for channel estimation. Specifically, under the current beamforming design for IRS-assisted communications, in total $KMN+KM$ channel coefficients should be estimated, where $K$, $N$, and $M$ denote the numbers of users, IRS reflecting elements, and antennas at the base station (BS), respectively. For the first time in the literature, this paper points out that despite the vast number of channel coefficients that should be estimated, significant redundancy exists in the user-IRS-BS reflected channels of different users arising from the fact that each IRS element reflects the signals from all the users to the BS via the same channel. To utilize this redundancy for reducing the channel estimation time, we propose a novel three-phase pilot-based channel estimation framework for IRS-assisted uplink multiuser communications, in which the user-BS direct channels and the user-IRS-BS reflected channels of a typical user are estimated in Phase I and Phase II, respectively, while the user-IRS-BS reflected channels of the other users are estimated with low overhead in Phase III via leveraging their strong correlation with those of the typical user. Under this framework, we analytically prove that a time duration consisting of $K+N+\max (K-1,\lceil (K-1)N/M \rceil)$ pilot symbols is sufficient for perfectly recovering all the $KMN+KM$ channel coefficients under the case without receiver noise at the BS. Further, under the case with receiver noise, the user pilot sequences, IRS reflecting coefficients, and BS linear minimum mean-squared error channel estimators are characterized in closed-form.
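
The derived pilot overhead is easy to sanity-check numerically; the system sizes below are illustrative, not taken from the paper.

```python
import math

def min_pilot_symbols(K, N, M):
    """Pilot duration K + N + max(K-1, ceil((K-1)N/M)) from the noiseless analysis."""
    return K + N + max(K - 1, math.ceil((K - 1) * N / M))

K, N, M = 8, 64, 16   # users, IRS reflecting elements, BS antennas (illustrative)
print(min_pilot_symbols(K, N, M), "pilot symbols to recover",
      K * M * N + K * M, "channel coefficients")
```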

571 citations


Proceedings ArticleDOI
07 Jun 2020
TL;DR: In this paper, the authors proposed a client-edge-cloud hierarchical federated learning system, supported with a HierFAVG algorithm that allows multiple edge servers to perform partial model aggregation.
Abstract: Federated Learning is a collaborative machine learning framework to train a deep learning model without accessing clients' private data. Previous works assume one central parameter server either at the cloud or at the edge. The cloud server can access more data but with excessive communication overhead and long latency, while the edge server enjoys more efficient communications with the clients. To combine their advantages, we propose a client-edge-cloud hierarchical Federated Learning system, supported with a HierFAVG algorithm that allows multiple edge servers to perform partial model aggregation. In this way, the model can be trained faster and better communication-computation trade-offs can be achieved. Convergence analysis is provided for HierFAVG and the effects of key parameters are also investigated, which lead to qualitative design guidelines. Empirical experiments verify the analysis and demonstrate the benefits of this hierarchical architecture in different data distribution scenarios. Particularly, it is shown that by introducing the intermediate edge servers, the model training time and the energy consumption of the end devices can be simultaneously reduced compared to cloud-based Federated Learning.
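
The two-level aggregation step can be sketched in a few lines of numpy (local SGD is omitted and all names are assumptions, not the paper's code): clients are first averaged at their edge server, and the edge models are then averaged at the cloud.

```python
import numpy as np

def fedavg(models, weights):
    """Weighted average of flattened model parameter vectors."""
    return np.average(np.stack(models), axis=0, weights=weights)

def hierarchical_round(client_models, edge_assignment, client_sizes):
    """One cloud round of client -> edge -> cloud averaging.

    edge_assignment maps an edge server id to the indices of its clients.
    """
    edge_models, edge_sizes = [], []
    for clients in edge_assignment.values():
        sizes = [client_sizes[c] for c in clients]
        edge_models.append(fedavg([client_models[c] for c in clients], sizes))
        edge_sizes.append(sum(sizes))
    return fedavg(edge_models, edge_sizes)     # global model after cloud aggregation

# Toy example: 6 clients, 2 edge servers
clients = [np.random.randn(5) for _ in range(6)]
sizes = [100, 50, 80, 120, 60, 90]
edges = {0: [0, 1, 2], 1: [3, 4, 5]}
print(hierarchical_round(clients, edges, sizes))
```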

433 citations


Posted Content
TL;DR: In this article, deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Abstract: The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
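
Reading an ensemble as approximate Bayesian marginalization amounts to averaging the members' predictive distributions rather than their weights; a minimal sketch with toy numbers:

```python
import numpy as np

def ensemble_predictive(member_probs):
    """Approximate Bayesian model average: mean of the members' predictive distributions.

    member_probs: list of (num_examples, num_classes) softmax outputs, one per
    ensemble member (or per sample drawn within a basin of attraction).
    """
    return np.mean(np.stack(member_probs), axis=0)

p1 = np.array([[0.7, 0.2, 0.1]])
p2 = np.array([[0.2, 0.6, 0.2]])
p3 = np.array([[0.4, 0.4, 0.2]])
print(ensemble_predictive([p1, p2, p3]))   # a flatter, typically better-calibrated predictive
```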

328 citations


Journal ArticleDOI
TL;DR: The inherent sparsity in mmWave channels is exploited to reduce the training overhead; simulation results show that the proposed method can provide an accurate channel estimate while achieving a substantial training overhead reduction.
Abstract: In this letter, we consider channel estimation for intelligent reflecting surface (IRS)-assisted millimeter wave (mmWave) systems, where an IRS is deployed to assist the data transmission from the base station (BS) to a user. It is shown that for the purpose of joint active and passive beamforming, the knowledge of a large-size cascade channel matrix needs to be acquired. To reduce the training overhead, the inherent sparsity in mmWave channels is exploited. By utilizing properties of Khatri-Rao and Kronecker products, we find a sparse representation of the cascade channel and convert cascade channel estimation into a sparse signal recovery problem. Simulation results show that our proposed method can provide an accurate channel estimate and achieve a substantial training overhead reduction.
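
The Khatri-Rao/Kronecker step can be made concrete with the standard vectorization identity $\mathrm{vec}(A\,\mathrm{diag}(x)\,B) = (B^{T} \odot A)\,x$, which makes the cascade channel linear in the coefficients to be recovered; below is a small numpy check of this identity (illustrative only, not the paper's full estimator).

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (m x n) and B (p x n)."""
    return np.vstack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])]).T

# Identity used to linearize the cascade channel: vec(A diag(x) B) = khatri_rao(B.T, A) @ x
m, n, p = 4, 6, 3
A = np.random.randn(m, n) + 1j * np.random.randn(m, n)
B = np.random.randn(n, p) + 1j * np.random.randn(n, p)
x = np.random.randn(n) + 1j * np.random.randn(n)

lhs = (A @ np.diag(x) @ B).reshape(-1, order="F")   # column-major vectorization
rhs = khatri_rao(B.T, A) @ x
print(np.allclose(lhs, rhs))   # True; in the mmWave angle domain x is sparse,
                               # so estimating it becomes a sparse recovery problem
```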

327 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper introduced a privacy-preserving machine learning technique named federated learning and proposed a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU), which differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations.
Abstract: Existing traffic flow forecasting approaches by deep learning models achieve excellent success based on a large volume of datasets gathered by governments and organizations. However, these datasets may contain lots of users' private data, which challenges the current prediction approaches as user privacy has drawn increasing public concern in recent years. Therefore, how to develop accurate traffic prediction while preserving privacy is a significant problem to be solved, and there is a trade-off between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning and propose a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction. FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. In the secure parameter aggregation mechanism, we adopt a Federated Averaging algorithm to reduce the communication overhead during the model parameter transmission process. Furthermore, we design a Joint Announcement Protocol to improve the scalability of FedGRU. We also propose an ensemble clustering-based scheme for traffic flow prediction by grouping the organizations into clusters before applying the FedGRU algorithm. Through extensive case studies on a real-world dataset, it is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models, which confirms that FedGRU can achieve accurate and timely traffic prediction without compromising the privacy and security of raw data.

217 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of joint computing, caching, communication, and control (4C) in big data MEC is formulated as an optimization problem whose goal is to jointly optimize a linear combination of the bandwidth consumption and network latency.
Abstract: The concept of Multi-access Edge Computing (MEC) has been recently introduced to supplement cloud computing by deploying MEC servers to the network edge so as to reduce the network delay and alleviate the load on cloud data centers. However, compared to the resourceful cloud, an MEC server has limited resources. When each MEC server operates independently, it cannot handle all computational and big data demands stemming from users' devices. Consequently, the MEC server cannot provide significant gains in overhead reduction of data exchange between users' devices and remote cloud. Therefore, joint Computing, Caching, Communication, and Control (4C) at the edge with MEC server collaboration is needed. To address these challenges, in this paper, the problem of joint 4C in big data MEC is formulated as an optimization problem whose goal is to jointly optimize a linear combination of the bandwidth consumption and network latency. However, the formulated problem is shown to be non-convex. As a result, a proximal upper bound problem of the original formulated problem is proposed. To solve the proximal upper bound problem, the block successive upper bound minimization method is applied. Simulation results show that the proposed approach satisfies computation deadlines and minimizes bandwidth consumption and network latency.

208 citations


Journal ArticleDOI
TL;DR: This work introduces a privacy-preserving machine learning technique named federated learning (FL) and proposes an FL-based gated recurrent unit neural network algorithm (FedGRU) for traffic flow prediction (TFP) that differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
Abstract: Existing traffic flow forecasting approaches by deep learning models achieve excellent success based on a large volume of data sets gathered by governments and organizations. However, these data sets may contain lots of users' private data, which challenges the current prediction approaches as user privacy has drawn increasing public concern in recent years. Therefore, how to develop accurate traffic prediction while preserving privacy is a significant problem to be solved, and there is a tradeoff between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning (FL) and propose an FL-based gated recurrent unit neural network algorithm (FedGRU) for traffic flow prediction (TFP). FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. In the secure parameter aggregation mechanism, we adopt a federated averaging algorithm to reduce the communication overhead during the model parameter transmission process. Furthermore, we design a joint announcement protocol to improve the scalability of FedGRU. We also propose an ensemble clustering-based scheme for TFP by grouping the organizations into clusters before applying the FedGRU algorithm. Extensive case studies on a real-world data set demonstrate that FedGRU can produce predictions that are merely 0.76 km/h worse than the state of the art in terms of mean average error under the privacy preservation constraint, confirming that the proposed model develops accurate traffic predictions without compromising the data privacy.

195 citations


Journal ArticleDOI
TL;DR: Two efficient channel estimation schemes for different channel setups in an IRS-assisted multi-user broadband communication system employing the orthogonal frequency division multiple access (OFDMA) are proposed and the fundamental limits on the minimum training overhead and the maximum number of supportable users are derived.
Abstract: To achieve the full passive beamforming gains of intelligent reflecting surface (IRS), accurate channel state information (CSI) is indispensable but practically challenging to acquire, due to the excessive amount of channel parameters to be estimated which increases with the number of IRS reflecting elements as well as that of IRS-served users. To tackle this challenge, we propose in this paper two efficient channel estimation schemes for different channel setups in an IRS-assisted multi-user broadband communication system employing the orthogonal frequency division multiple access (OFDMA). The first channel estimation scheme, which estimates the CSI of all users in parallel simultaneously at the access point (AP), is applicable for arbitrary frequency-selective fading channels. In contrast, the second channel estimation scheme, which exploits a key property that all users share the same (common) IRS-AP channel to enhance the training efficiency and support more users, is proposed for the typical scenario with line-of-sight (LoS) dominant user-IRS channels. For the two proposed channel estimation schemes, we further optimize their corresponding training designs (including pilot tone allocations for all users and IRS time-varying reflection pattern) to minimize the channel estimation error. Moreover, we derive and compare the fundamental limits on the minimum training overhead and the maximum number of supportable users of these two schemes. Simulation results verify the effectiveness of the proposed channel estimation schemes and training designs, and show their significant performance improvement over various benchmark schemes.

191 citations


Journal ArticleDOI
TL;DR: This paper forms the channel estimation problem in the RIS-assisted multiuser MIMO system as a matrix-calibration based matrix factorization task and proposes a novel message-passing based algorithm to factorize the cascaded channels.
Abstract: Reconfigurable intelligent surface (RIS) is envisioned to be an essential component of the paradigm for beyond 5G networks as it can potentially provide similar or higher array gains with much lower hardware cost and energy consumption compared with the massive multiple-input multiple-output (MIMO) technology. In this paper, we focus on one of the fundamental challenges, namely the channel acquisition, in a RIS-assisted multiuser MIMO system. The state-of-the-art channel acquisition approach in such a system with fully passive RIS elements estimates the cascaded transmitter-to-RIS and RIS-to-receiver channels by adopting excessively long training sequences. To estimate the cascaded channels with an affordable training overhead, we formulate the channel estimation problem in the RIS-assisted multiuser MIMO system as a matrix-calibration based matrix factorization task. By exploiting the information on the slow-varying channel components and the hidden channel sparsity, we propose a novel message-passing based algorithm to factorize the cascaded channels. Furthermore, we present an analytical framework to characterize the theoretical performance bound of the proposed estimator in the large-system limit. Finally, we conduct simulations to verify the high accuracy and efficiency of the proposed algorithm.

Journal ArticleDOI
TL;DR: Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.
Abstract: In this paper, we propose a Distributed Intelligent Video Surveillance (DIVS) system using Deep Learning (DL) algorithms and deploy it in an edge computing environment. We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The DIVS system can migrate computing workloads from the network center to network edges to reduce huge network communication overhead and provide low-latency and accurate video analysis solutions. We implement the proposed DIVS system and address the problems of parallel training, model synchronization, and workload balancing. Task-level parallel and model-level parallel training methods are proposed to further accelerate the video analysis process. In addition, we propose a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Moreover, a dynamic data migration approach is proposed to address the imbalance of workload and computational power of edge nodes. Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors propose the Adaptive Bezier-Curve Network (ABCNet), which adaptively fits oriented or curved text with a parameterized Bezier curve.
Abstract: Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit oriented or curved text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on oriented or curved benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://git.io/AdelaiDet.
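
A cubic Bezier curve is fully determined by four control points, so a curved text boundary can be described by a handful of regressed parameters; below is a minimal numpy sketch of sampling such a curve (the detection head and the BezierAlign layer are omitted, and the numbers are illustrative).

```python
import numpy as np

def bezier_points(control_pts, num=20):
    """Sample a cubic Bezier curve from its 4 control points.

    control_pts: (4, 2) array of (x, y) control points, e.g. regressed for the
    top or bottom boundary of a text instance.
    """
    t = np.linspace(0.0, 1.0, num)[:, None]
    c = np.asarray(control_pts, dtype=float)
    # Bernstein basis of degree 3
    b = np.hstack([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])
    return b @ c                                   # (num, 2) points along the boundary

# Toy example: a gently curved top boundary
top = bezier_points([[0, 0], [30, -10], [70, -10], [100, 0]])
print(top[:3])
```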

Journal ArticleDOI
Siqi Luo1, Xu Chen1, Qiong Wu1, Zhi Zhou1, Shuai Yu1 
TL;DR: A novel Hierarchical Federated Edge Learning (HFEL) framework is introduced in which model aggregation is partially migrated to edge servers from the cloud and achieves better training performance compared to conventional federated learning.
Abstract: Federated Learning (FL) has been proposed as an appealing approach to handle data privacy issue of mobile devices compared to conventional machine learning at the remote cloud with raw user data uploading. By leveraging edge servers as intermediaries to perform partial model aggregation in proximity and relieve core network transmission overhead, it enables great potentials in low-latency and energy-efficient FL. Hence we introduce a novel Hierarchical Federated Edge Learning (HFEL) framework in which model aggregation is partially migrated to edge servers from the cloud. We further formulate a joint computation and communication resource allocation and edge association problem for device users under HFEL framework to achieve global cost minimization. To solve the problem, we propose an efficient resource scheduling algorithm in the HFEL framework. It can be decomposed into two subproblems: resource allocation given a scheduled set of devices for each edge server and edge association of device users across all the edge servers. With the optimal policy of the convex resource allocation subproblem for a set of devices under a single edge server, an efficient edge association strategy can be achieved through iterative global cost reduction adjustment process, which is shown to converge to a stable system point. Extensive performance evaluations demonstrate that our HFEL framework outperforms the proposed benchmarks in global cost saving and achieves better training performance compared to conventional federated learning.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed AE-IDS (Auto-Encoder Intrusion Detection System) is superior to traditional machine learning based intrusion detection methods in terms of easy training, strong adaptability, and high detection accuracy.

Proceedings ArticleDOI
30 Jul 2020
TL;DR: Using real topologies and traffic characteristics, it is shown that PINT concurrently enables applications such as congestion control, path tracing, and computing tail latencies, using only sixteen bits per packet, with performance comparable to the state of the art.
Abstract: Commodity network devices support adding in-band telemetry measurements into data packets, enabling a wide range of applications, including network troubleshooting, congestion control, and path tracing. However, including such information on packets adds significant overhead that impacts both flow completion times and application-level performance. We introduce PINT, an in-band network telemetry framework that bounds the amount of information added to each packet. PINT encodes the requested data on multiple packets, allowing per-packet overhead limits that can be as low as one bit. We analyze PINT and prove performance bounds, including cases when multiple queries are running simultaneously. PINT is implemented in P4 and can be deployed on network devices. Using real topologies and traffic characteristics, we show that PINT concurrently enables applications such as congestion control, path tracing, and computing tail latencies, using only sixteen bits per packet, with performance comparable to the state of the art.
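
A deliberately simplified toy of the per-packet budget idea (all names are illustrative; PINT's actual encoding relies on probabilistic and approximation techniques this sketch omits): every packet carries at most a fixed number of telemetry bits, and which query gets the slot varies pseudo-randomly per packet, so the collector recovers all queries across a flow instead of inflating every packet.

```python
import random

DIGEST_BITS = 16   # fixed per-packet budget, as in the sixteen-bit example above

def on_packet(packet_id, hop_values, queries):
    """Write one query's value into the packet's small telemetry digest."""
    rng = random.Random(packet_id)        # a switch could derive this from a packet hash
    query = rng.choice(queries)           # one query "wins" this packet's digest slot
    value = hop_values[query] & ((1 << DIGEST_BITS) - 1)
    return query, value

# Example: congestion (queue depth) and path-tracing (hop id) queries share the budget
hop_values = {"queue_depth": 37, "hop_id": 12}
print([on_packet(pid, hop_values, list(hop_values)) for pid in range(6)])
```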

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries.
Abstract: The success of convolutional neural networks (CNNs) in computer vision applications has been accompanied by a significant increase of computation and memory costs, which prohibits their usage on resource-limited environments, such as mobile systems or embedded devices. To this end, the research of CNN compression has recently become emerging. In this paper, we propose a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries. Concretely, the proposed scheme incorporates two different regularizers of structured sparsity into the original objective function of filter pruning, which fully coordinates the global output and local pruning operations to adaptively prune filters. We further propose an alternative updating with Lagrange multipliers (AULM) scheme to efficiently solve its optimization. AULM follows the principle of alternating direction method of multipliers (ADMM) and alternates between promoting the structured sparsity of CNNs and optimizing the recognition loss, which leads to a very efficient solver ($2.5\times$ faster than the most recent work that directly solves the group sparsity-based regularization). Moreover, by imposing the structured sparsity, the online inference is extremely memory-light since the number of filters and the output feature maps are simultaneously reduced. The proposed scheme has been deployed to a variety of state-of-the-art CNN structures, including LeNet, AlexNet, VGGNet, ResNet, and GoogLeNet, over different data sets. Quantitative results demonstrate that the proposed scheme achieves superior performance over the state-of-the-art methods. We further demonstrate the proposed compression scheme for the task of transfer learning, including domain adaptation and object detection, which also show exciting performance gains over the state-of-the-art filter pruning methods.
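
The paper couples two structured-sparsity regularizers with an ADMM-style (AULM) solver; as a much simpler illustration of the underlying idea, the following PyTorch sketch adds a filter-wise group-lasso penalty to a training loss so that whole filters are pushed toward zero and can then be pruned. The layer sizes and penalty weight are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

def filter_group_lasso(conv: nn.Conv2d) -> torch.Tensor:
    """Group-lasso penalty with one group per output filter.

    Each filter's weights (in_channels x kH x kW) form one group; penalizing
    the group's l2 norm drives entire filters to zero so they can be pruned.
    """
    w = conv.weight                                   # (out_ch, in_ch, kH, kW)
    return w.flatten(start_dim=1).norm(dim=1).sum()   # sum of per-filter l2 norms

conv = nn.Conv2d(16, 32, kernel_size=3)
x = torch.randn(4, 16, 8, 8)
task_loss = conv(x).pow(2).mean()                     # stand-in for the recognition loss
lam = 1e-4                                            # regularization strength (illustrative)
loss = task_loss + lam * filter_group_lasso(conv)
loss.backward()
```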

Journal ArticleDOI
TL;DR: A fast deep-reinforcement-learning (DRL)-based detection algorithm for virtual IP watermarks is proposed by combining the technologies of mapping function and DRL to preprocess the ownership information of the IP circuit resource.
Abstract: With the fast advancements of electronic chip technologies in the Internet of Things (IoT), it is urgent to address the copyright protection issue of intellectual property (IP) circuit resources of the electronic devices in IoT environments. In this article, a fast deep-reinforcement-learning (DRL)-based detection algorithm for virtual IP watermarks is proposed by combining the technologies of mapping function and DRL to preprocess the ownership information of the IP circuit resource. The deep $Q$-learning (DQN) algorithm is used to generate the watermarked positions adaptively, making the watermarked positions secure yet close to the original design. An artificial neural network (ANN) algorithm is utilized for training the position distance characteristic vectors of the IP circuit, in which the characteristic function of the virtual position for IP watermark is generated after training. In IP ownership verification, the DRL model can quickly locate the range of virtual watermark positions. With the characteristic values of the virtual positions in each lookup table (LUT) area and surrounding areas, the mapping position relationship can be calculated in a supervised manner in the neural network, as the algorithm realizes the fast location of the real ownership information in an IP circuit. The experimental results show that the proposed algorithm can effectively improve the speed of watermark detection while also reducing the resource overhead. Besides, it also achieves excellent performance in security.

Journal ArticleDOI
Jie Cui1, Lu Wei1, Hong Zhong1, Jing Zhang1, Yan Xu1, Lu Liu2 
TL;DR: In the proposed scheme, a roadside unit (RSU) can find the popular data by analyzing the encrypted requests sent from nearby vehicles without having to sacrifice the privacy of their download requests.
Abstract: With the advancements in social media and rising demand for real traffic information, the data shared in vehicular ad hoc networks (VANETs) indicate that the size and amount of requested data will continue increasing. Vehicles in the same area often have similar data downloading requests. If we ignore the common requests, the resource allocation efficiency of the VANET system will be quite low. Motivated by this fact, we propose an efficient and privacy-preserving data downloading scheme for VANETs, based on the edge computing concept. In the proposed scheme, a roadside unit (RSU) can find the popular data by analyzing the encrypted requests sent from nearby vehicles without having to sacrifice the privacy of their download requests. Further, the RSU caches the popular data in nearby qualified vehicles called edge computing vehicles (ECVs). If a vehicle wishes to download the popular data, it can download it directly from the nearby ECVs. This method increases the downloading efficiency of the system. The security analysis results show that the proposed scheme can resist multiple security attacks. The performance analysis results demonstrate that our scheme has reasonable computation and communication overhead. Finally, the OMNeT++ simulation results indicate that our scheme has good network performance.

Proceedings ArticleDOI
26 May 2020
TL;DR: In this article, a deep reinforcement learning framework was proposed to predict the reflecting coefficients of reflecting surfaces without requiring massive channel estimation or beam training overhead, and the proposed online learning framework can converge to the optimal rate that assumes perfect channel knowledge.
Abstract: The promising coverage and spectral efficiency gains of intelligent reflecting surfaces (IRSs) are attracting increasing interest. To adopt these surfaces in practice, however, several challenges need to be addressed. One of these main challenges is how to configure the reflecting coefficients on these passive surfaces without requiring massive channel estimation or beam training overhead. Earlier work suggested leveraging supervised learning tools to predict the IRS reflection matrices. While this approach has the potential of reducing the beam training overhead, it requires collecting large datasets for training the neural network models. In this paper, we propose a novel deep reinforcement learning framework for predicting the IRS reflection matrices with minimal beam training overhead. Simulation results show that the proposed online learning framework can converge to the optimal rate that assumes perfect channel knowledge. This represents an important step towards realizing a standalone IRS operation, where the surface configures itself without any control from the infrastructure.

Journal ArticleDOI
TL;DR: A static random access memory (SRAM) CIM unit-macro is presented, which uses compact-rule compatible twin-8T cells for weighted CIM MAC operations to reduce area overhead and vulnerability to process variation, and an even–odd dual-channel (EODC) input mapping scheme to extend input bandwidth.
Abstract: Computation-in-memory (CIM) is a promising candidate to improve the energy efficiency of multiply-and-accumulate (MAC) operations of artificial intelligence (AI) chips. This work presents an static random access memory (SRAM) CIM unit-macro using: 1) compact-rule compatible twin-8T (T8T) cells for weighted CIM MAC operations to reduce area overhead and vulnerability to process variation; 2) an even–odd dual-channel (EODC) input mapping scheme to extend input bandwidth; 3) a two’s complement weight mapping (C2WM) scheme to enable MAC operations using positive and negative weights within a cell array in order to reduce area overhead and computational latency; and 4) a configurable global–local reference voltage generation (CGLRVG) scheme for kernels of various sizes and bit precision. A 64 $\times $ 60 b T8T unit-macro with 1-, 2-, 4-b inputs, 1-, 2-, 5-b weights, and up to 7-b MAC-value (MACV) outputs was fabricated as a test chip using a foundry 55-nm process. The proposed SRAM-CIM unit-macro achieved access times of 5 ns and energy efficiency of 37.5–45.36 TOPS/W under 5-b MACV output.

Journal ArticleDOI
TL;DR: A new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT is proposed, along with an attention mechanism-based convolutional neural network-long short-term memory (AMCNN-LSTM) model to accurately detect anomalies.
Abstract: Since edge device failures (i.e., anomalies) seriously affect the production of industrial products in Industrial IoT (IIoT), accurately and timely detecting anomalies is becoming increasingly important. Furthermore, data collected by the edge device may contain the user's private data, which challenges current detection approaches as user privacy has drawn increasing public concern in recent years. With this focus, this paper proposes a new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT. Specifically, we first introduce an FL framework to enable decentralized edge devices to collaboratively train an anomaly detection model, which can improve its generalization ability. Second, we propose an Attention Mechanism-based Convolutional Neural Network-Long Short Term Memory (AMCNN-LSTM) model to accurately detect anomalies. The AMCNN-LSTM model uses attention mechanism-based CNN units to capture important fine-grained features, thereby preventing memory loss and gradient dispersion problems. Furthermore, this model retains the advantages of the LSTM unit in predicting time series data. Third, to adapt the proposed framework to the timeliness of industrial anomaly detection, we propose a gradient compression mechanism based on Top-k selection to improve communication efficiency. Extensive experiment studies on four real-world datasets demonstrate that the proposed framework can accurately and timely detect anomalies and also reduce the communication overhead by 50% compared to the federated learning framework that does not use a gradient compression scheme.

Posted Content
Siqi Luo1, Xu Chen1, Qiong Wu1, Zhi Zhou1, Shuai Yu1 
TL;DR: In this article, the authors proposed a hierarchical federated edge learning (HFEL) framework in which model aggregation is partially migrated to edge servers from the cloud and formulated a joint computation and communication resource allocation and edge association problem for device users under HFEL framework to achieve global cost minimization.
Abstract: Federated Learning (FL) has been proposed as an appealing approach to handle data privacy issue of mobile devices compared to conventional machine learning at the remote cloud with raw user data uploading. By leveraging edge servers as intermediaries to perform partial model aggregation in proximity and relieve core network transmission overhead, it enables great potentials in low-latency and energy-efficient FL. Hence we introduce a novel Hierarchical Federated Edge Learning (HFEL) framework in which model aggregation is partially migrated to edge servers from the cloud. We further formulate a joint computation and communication resource allocation and edge association problem for device users under HFEL framework to achieve global cost minimization. To solve the problem, we propose an efficient resource scheduling algorithm in the HFEL framework. It can be decomposed into two subproblems: resource allocation given a scheduled set of devices for each edge server and edge association of device users across all the edge servers. With the optimal policy of the convex resource allocation subproblem for a set of devices under a single edge server, an efficient edge association strategy can be achieved through iterative global cost reduction adjustment process, which is shown to converge to a stable system point. Extensive performance evaluations demonstrate that our HFEL framework outperforms the proposed benchmarks in global cost saving and achieves better training performance compared to conventional federated learning.

Journal ArticleDOI
TL;DR: A novel storage auditing scheme that achieves highly-efficient user revocation independent of the total number of file blocks possessed by the revoked user in the cloud is proposed by exploring a novel strategy for key generation and a new private key update technique.
Abstract: Cloud storage auditing schemes for shared data refer to checking the integrity of cloud data shared by a group of users. User revocation is commonly supported in such schemes, as users may be subject to group membership changes for various reasons. Previously, the computational overhead for user revocation in such schemes was linear with the total number of file blocks possessed by a revoked user. The overhead, however, may become a heavy burden because of the sheer amount of the shared cloud data. Thus, how to reduce the computational overhead caused by user revocations becomes a key research challenge for achieving practical cloud data auditing. In this paper, we propose a novel storage auditing scheme that achieves highly-efficient user revocation independent of the total number of file blocks possessed by the revoked user in the cloud. This is achieved by exploring a novel strategy for key generation and a new private key update technique. Using this strategy and the technique, we realize user revocation by just updating the non-revoked group users’ private keys rather than authenticators of the revoked user. The integrity auditing of the revoked user's data can still be correctly performed when the authenticators are not updated. Meanwhile, the proposed scheme is based on identity-based cryptography, which eliminates the complicated certificate management in traditional Public Key Infrastructure (PKI) systems. The security and efficiency of the proposed scheme are validated via both analysis and experimental results.

Journal ArticleDOI
TL;DR: A privacy-preserving optimization of clinical pathway query scheme (PPO-CPQ) to achieve the secureclinical pathway query under e-healthcare CSs without revealing neither the private information of patients, such as name, gender, age, and physical index, nor the sensitive information of hospitals.
Abstract: With the help of patients’ health data, e-healthcare providers can offer reliable data services for better medical treatment. For example, the clinical pathway provides optimal detailed guidance for the clinical treatment. However, the e-healthcare providers are incompetent with the huge volumes of e-healthcare data, and a popular and feasible solution is to outsource the medical data to powerful cloud servers (CSs). Because the medical data are very sensitive yet the outsourced servers are not fully trusted, the straightforward execution of clinical pathway query service will inevitably bring huge privacy risks to patients’ data. Apart from the privacy issues, the efficiency issues also need to be taken into consideration, such as the communication overhead and computational cost between servers and providers. Considering the above issues, this article proposes a privacy-preserving optimization of clinical pathway query scheme (PPO-CPQ) to achieve the secure clinical pathway query under e-healthcare CSs without revealing either the private information of patients, such as name, gender, age, and physical index, or the sensitive information of hospitals, such as treatment, medication, and expense. In our proposed scheme, it first designs several secure and privacy-preserving subprotocols, such as privacy-preserving comparison, privacy-preserving clinical comparison, privacy-preserving stage selection, and privacy-preserving stage update protocols, to ensure privacy in the e-healthcare system; then it adopts the greedy algorithm in a secure manner to perform the query and the min-heap technology to improve efficiency. The experimental result shows that our scheme is practical and efficient in terms of computational cost and communication overhead.

Journal ArticleDOI
TL;DR: A deep learning-based beam selection that is compatible with the 5G NR standard is proposed; a deep neural network (DNN) structure is introduced, and it is explained how a power delay profile (PDP) of a sub-6 GHz channel is used as the input of the DNN.
Abstract: In fifth-generation (5G) communications, millimeter wave (mmWave) is one of the key technologies to increase the data rate. To overcome this technology's poor propagation characteristics, it is necessary to employ a number of antennas and form narrow beams. It becomes crucial then, especially for initial access, to attain fine beam alignment between a next generation NodeB (gNB) and a user equipment (UE). The current 5G New Radio (NR) standard, however, adopts an exhaustive search-based beam sweeping, which causes time overhead of a half frame for initial beam establishment. In this paper, we propose a deep learning-based beam selection, which is compatible with the 5G NR standard. To select a mmWave beam, we exploit sub-6 GHz channel information. We introduce a deep neural network (DNN) structure and explain how we estimate a power delay profile (PDP) of a sub-6 GHz channel, which is used as an input of the DNN. We then validate its performance with real environment-based 3D ray-tracing simulations and over-the-air experiments with a mmWave prototype. Evaluation results confirm that, with support from the sub-6 GHz connection, the proposed beam selection reduces the beam sweeping overhead by up to 79.3 %.
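
A minimal PyTorch sketch of the inference path under assumed sizes (64 PDP taps, 64 candidate beams; the actual network and training are described in the paper): the DNN maps the sub-6 GHz power delay profile to one score per mmWave beam, and only a short list of top-scoring beams needs to be swept.

```python
import torch
import torch.nn as nn

NUM_TAPS, NUM_BEAMS = 64, 64          # illustrative sizes

model = nn.Sequential(
    nn.Linear(NUM_TAPS, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, NUM_BEAMS),        # one logit per candidate mmWave beam
)

pdp = torch.rand(1, NUM_TAPS)         # estimated PDP of the sub-6 GHz channel
logits = model(pdp)
candidates = logits.topk(k=4, dim=1).indices   # sweep only a few beams instead of all 64
print(candidates)
```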

Journal ArticleDOI
TL;DR: A collaborative learning-based routing scheme for the multi-access vehicular edge computing environment is proposed; it employs a reinforcement learning algorithm based on end-edge-cloud collaboration to find routes in a proactive manner with a low communication overhead, and routes are preemptively changed based on the learned information.
Abstract: Some Internet-of-Things (IoT) applications have a strict requirement on the end-to-end delay where edge computing can be used to provide a short delay for end-users by conducting efficient caching and computing at the edge nodes. However, a fast and efficient communication route creation in multi-access vehicular environment is an underexplored research problem. In this paper, we propose a collaborative learning-based routing scheme for multi-access vehicular edge computing environment. The proposed scheme employs a reinforcement learning algorithm based on end-edge-cloud collaboration to find routes in a proactive manner with a low communication overhead. The routes are also preemptively changed based on the learned information. By integrating the “proactive” and “preemptive” approach, the proposed scheme can achieve a better forwarding of packets as compared with existing alternatives. We conduct extensive and realistic computer simulations to show the performance advantage of the proposed scheme over existing baselines.

Posted Content
TL;DR: For the first time, a novel BezierAlign layer is designed for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods and introducing negligible computation overhead.
Abstract: Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our realtime version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://git.io/AdelaiDet.

Posted Content
TL;DR: This work proposes a set of algorithms with periodical compressed (quantized or sparsified) communication and analyzes their convergence properties in both homogeneous and heterogeneous local data distributions settings and introduces a scheme to mitigate data heterogeneity.
Abstract: In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions. Two notable trends to deal with the communication overhead of federated algorithms are gradient compression and local computation with periodic communication. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a set of algorithms with periodical compressed (quantized or sparsified) communication and analyze their convergence properties in both homogeneous and heterogeneous local data distributions settings. For the homogeneous setting, our analysis improves existing bounds by providing tighter convergence rates for both strongly convex and non-convex objective functions. To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and nonconvex settings. We complement our theoretical results and demonstrate the effectiveness of our proposed methods by several experiments on real-world datasets.

Proceedings Article
01 Jan 2020
TL;DR: Tiny-Transfer-Learning (TinyTL) as mentioned in this paper proposes a new memory-efficient bias module, the lite residual module, to refine the feature extractor by learning small residual feature maps adding only 3.8% memory overhead.
Abstract: On-device learning enables edge devices to continually adapt the AI models to new data, which requires a small memory footprint to fit the tight memory constraint of edge devices. Existing work solves this problem by reducing the number of trainable parameters. However, this doesn't directly translate to memory saving since the major bottleneck is the activations, not parameters. In this work, we present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning. TinyTL freezes the weights while only learns the bias modules, thus no need to store the intermediate activations. To maintain the adaptation capacity, we introduce a new memory-efficient bias module, the lite residual module, to refine the feature extractor by learning small residual feature maps adding only 3.8% memory overhead. Extensive experiments show that TinyTL significantly saves the memory (up to 6.5x) with little accuracy loss compared to fine-tuning the full network. Compared to fine-tuning the last layer, TinyTL provides significant accuracy improvements (up to 33.8%) with little memory overhead. Furthermore, combined with feature extractor adaptation, TinyTL provides 7.5-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.
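
A minimal PyTorch sketch of the bias-only part of the recipe (the lite residual modules and the feature-extractor adaptation are omitted, and the tiny model here is a stand-in, not the paper's backbone): freezing all weight tensors means their input activations no longer have to be stored for backpropagation, which is where the memory saving comes from.

```python
import torch.nn as nn

def freeze_weights_train_biases(model: nn.Module):
    """Bias-only fine-tuning: freeze every weight tensor, keep biases trainable."""
    for name, p in model.named_parameters():
        p.requires_grad_(name.endswith("bias"))

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                     # in practice the classifier head is also trained
)
freeze_weights_train_biases(net)
print([n for n, p in net.named_parameters() if p.requires_grad])   # only '*.bias' entries
```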