
Showing papers on "Overhead (computing) published in 2020"


Journal ArticleDOI
TL;DR: In this paper, the authors propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment, which extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates.
Abstract: Federated learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning, however, comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods, however, are only of limited utility in the federated learning setting, as they either only compress the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or only perform well under idealized conditions, such as i.i.d. distribution of the client data, which typically cannot be found in federated learning. In this article, we propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment. STC extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms federated averaging in common federated learning scenarios. These results advocate for a paradigm shift in federated optimization toward high-frequency low-bitwidth communication, in particular in the bandwidth-constrained learning environments.
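
A minimal numpy sketch of the upstream half of such a scheme (illustrative only; the function and variable names are assumptions, and the Golomb position encoding and downstream path are omitted): keep the top-$k$ entries of a weight update by magnitude and replace them with a single shared ternary magnitude.

```python
import numpy as np

def sparse_ternary_compress(delta, k):
    """Top-k sparsification followed by ternarization of a weight update.

    Keeps the k largest-magnitude entries, zeroes the rest, and replaces the
    survivors by +/- mu, where mu is their mean magnitude. Also returns the
    residual, which clients typically accumulate locally for the next round.
    """
    flat = delta.ravel()
    if k >= flat.size:
        return delta.copy(), np.zeros_like(delta)
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k magnitudes
    mu = np.abs(flat[idx]).mean()                  # one shared magnitude per update
    compressed = np.zeros_like(flat)
    compressed[idx] = mu * np.sign(flat[idx])      # ternary values {-mu, 0, +mu}
    residual = flat - compressed                   # carried over to the next round
    return compressed.reshape(delta.shape), residual.reshape(delta.shape)

# Example: compress a toy update to 1% density
update = np.random.randn(10_000)
comp, res = sparse_ternary_compress(update, k=100)
print(np.count_nonzero(comp), "non-zeros at a single shared magnitude")
```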

618 citations


Journal ArticleDOI
TL;DR: In this article, an IRS-enhanced orthogonal frequency division multiplexing (OFDM) system under frequency-selective channels is considered and a practical transmission protocol with channel estimation is proposed.
Abstract: Intelligent reflecting surface (IRS) is a promising new technology for achieving both spectrum and energy efficient wireless communication systems in the future. However, existing works on IRS mainly consider frequency-flat channels and assume perfect knowledge of channel state information (CSI) at the transmitter. Motivated by the above, in this paper we study an IRS-enhanced orthogonal frequency division multiplexing (OFDM) system under frequency-selective channels and propose a practical transmission protocol with channel estimation. First, to reduce the overhead in channel training as well as exploit the channel spatial correlation, we propose a novel IRS elements grouping method, where each group consists of a set of adjacent IRS elements that share a common reflection coefficient. Based on this method, we propose a practical transmission protocol where only the combined channel of each group needs to be estimated, thus substantially reducing the training overhead. Next, with any given grouping and estimated CSI, we formulate the problem to maximize the achievable rate by jointly optimizing the transmit power allocation and the IRS passive array reflection coefficients. Although the formulated problem is non-convex and thus difficult to solve, we propose an efficient algorithm to obtain a high-quality suboptimal solution for it, by alternately optimizing the power allocation and the passive array coefficients in an iterative manner, along with a customized method for the initialization. Simulation results show that the proposed design significantly improves the OFDM link rate performance as compared to the case without using IRS. Moreover, it is shown that there exists an optimal size for IRS elements grouping which achieves the maximum achievable rate due to the practical trade-off between the training overhead and IRS passive beamforming flexibility.
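
A toy numpy sketch of the grouping idea (narrowband and single-antenna for brevity; all names are illustrative assumptions): adjacent IRS elements share one reflection coefficient, so only one combined channel per group, rather than one per element, needs to be estimated.

```python
import numpy as np

def grouped_reflection(h_cascaded, group_size, phases):
    """Effective channel when adjacent IRS elements share a reflection coefficient.

    h_cascaded : (N,) per-element cascaded channel gains
    phases     : (N // group_size,) phase shifts, one per group of adjacent elements
    """
    groups = h_cascaded.reshape(-1, group_size)
    group_channels = groups.sum(axis=1)            # only these per-group sums need estimating
    return np.sum(group_channels * np.exp(1j * phases))

N, group_size = 64, 8
h = (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)
phi = np.zeros(N // group_size)                    # 8 coefficients to train instead of 64
print(abs(grouped_reflection(h, group_size, phi)))
```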

594 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a three-phase pilot-based channel estimation framework for IRS-assisted uplink multiuser communications, in which the user-BS direct channels and the user-IRS-BS reflected channels of a typical user are estimated in Phase I and Phase II, respectively, while the user-IRS-BS reflected channels of the remaining users are estimated with low overhead in Phase III by leveraging their strong correlation with those of the typical user; the minimum pilot overhead is derived for the case without receiver noise at the BS.
Abstract: In intelligent reflecting surface (IRS) assisted communication systems, the acquisition of channel state information is a crucial impediment for achieving the beamforming gain of IRS because of the considerable overhead required for channel estimation. Specifically, under the current beamforming design for IRS-assisted communications, in total $KMN+KM$ channel coefficients should be estimated, where $K$, $N$, and $M$ denote the numbers of users, IRS reflecting elements, and antennas at the base station (BS), respectively. For the first time in the literature, this paper points out that despite the vast number of channel coefficients that should be estimated, significant redundancy exists in the user-IRS-BS reflected channels of different users arising from the fact that each IRS element reflects the signals from all the users to the BS via the same channel. To utilize this redundancy for reducing the channel estimation time, we propose a novel three-phase pilot-based channel estimation framework for IRS-assisted uplink multiuser communications, in which the user-BS direct channels and the user-IRS-BS reflected channels of a typical user are estimated in Phase I and Phase II, respectively, while the user-IRS-BS reflected channels of the other users are estimated with low overhead in Phase III via leveraging their strong correlation with those of the typical user. Under this framework, we analytically prove that a time duration consisting of $K+N+\max (K-1,\lceil (K-1)N/M \rceil)$ pilot symbols is sufficient for perfectly recovering all the $KMN+KM$ channel coefficients under the case without receiver noise at the BS. Further, under the case with receiver noise, the user pilot sequences, IRS reflecting coefficients, and BS linear minimum mean-squared error channel estimators are characterized in closed-form.
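
The derived pilot overhead is easy to sanity-check numerically; the system sizes below are illustrative, not taken from the paper.

```python
import math

def min_pilot_symbols(K, N, M):
    """Pilot duration K + N + max(K-1, ceil((K-1)N/M)) from the noiseless analysis."""
    return K + N + max(K - 1, math.ceil((K - 1) * N / M))

K, N, M = 8, 64, 16   # users, IRS reflecting elements, BS antennas (illustrative)
print(min_pilot_symbols(K, N, M), "pilot symbols to recover",
      K * M * N + K * M, "channel coefficients")
```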

571 citations


Proceedings ArticleDOI
07 Jun 2020
TL;DR: In this paper, the authors proposed a client-edge-cloud hierarchical federated learning system, supported with a HierFAVG algorithm that allows multiple edge servers to perform partial model aggregation.
Abstract: Federated Learning is a collaborative machine learning framework to train a deep learning model without accessing clients' private data. Previous works assume one central parameter server either at the cloud or at the edge. The cloud server can access more data but with excessive communication overhead and long latency, while the edge server enjoys more efficient communications with the clients. To combine their advantages, we propose a client-edge-cloud hierarchical Federated Learning system, supported with a HierFAVG algorithm that allows multiple edge servers to perform partial model aggregation. In this way, the model can be trained faster and better communication-computation trade-offs can be achieved. Convergence analysis is provided for HierFAVG and the effects of key parameters are also investigated, which lead to qualitative design guidelines. Empirical experiments verify the analysis and demonstrate the benefits of this hierarchical architecture in different data distribution scenarios. Particularly, it is shown that by introducing the intermediate edge servers, the model training time and the energy consumption of the end devices can be simultaneously reduced compared to cloud-based Federated Learning.
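
The two-level aggregation step can be sketched in a few lines of numpy (local SGD is omitted and all names are assumptions, not the paper's code): clients are first averaged at their edge server, and the edge models are then averaged at the cloud.

```python
import numpy as np

def fedavg(models, weights):
    """Weighted average of flattened model parameter vectors."""
    return np.average(np.stack(models), axis=0, weights=weights)

def hierarchical_round(client_models, edge_assignment, client_sizes):
    """One cloud round of client -> edge -> cloud averaging.

    edge_assignment maps an edge server id to the indices of its clients.
    """
    edge_models, edge_sizes = [], []
    for clients in edge_assignment.values():
        sizes = [client_sizes[c] for c in clients]
        edge_models.append(fedavg([client_models[c] for c in clients], sizes))
        edge_sizes.append(sum(sizes))
    return fedavg(edge_models, edge_sizes)     # global model after cloud aggregation

# Toy example: 6 clients, 2 edge servers
clients = [np.random.randn(5) for _ in range(6)]
sizes = [100, 50, 80, 120, 60, 90]
edges = {0: [0, 1, 2], 1: [3, 4, 5]}
print(hierarchical_round(clients, edges, sizes))
```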

433 citations


Posted Content
TL;DR: In this article, deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Abstract: The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
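
Reading an ensemble as approximate Bayesian marginalization amounts to averaging the members' predictive distributions rather than their weights; a minimal sketch with toy numbers:

```python
import numpy as np

def ensemble_predictive(member_probs):
    """Approximate Bayesian model average: mean of the members' predictive distributions.

    member_probs: list of (num_examples, num_classes) softmax outputs, one per
    ensemble member (or per sample drawn within a basin of attraction).
    """
    return np.mean(np.stack(member_probs), axis=0)

p1 = np.array([[0.7, 0.2, 0.1]])
p2 = np.array([[0.2, 0.6, 0.2]])
p3 = np.array([[0.4, 0.4, 0.2]])
print(ensemble_predictive([p1, p2, p3]))   # a flatter, typically better-calibrated predictive
```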

328 citations


Journal ArticleDOI
TL;DR: The inherent sparsity in mmWave channels is exploited to reduce the training overhead; simulation results show that the proposed method can provide an accurate channel estimate while achieving a substantial training overhead reduction.
Abstract: In this letter, we consider channel estimation for intelligent reflecting surface (IRS)-assisted millimeter wave (mmWave) systems, where an IRS is deployed to assist the data transmission from the base station (BS) to a user. It is shown that for the purpose of joint active and passive beamforming, the knowledge of a large-size cascade channel matrix needs to be acquired. To reduce the training overhead, the inherent sparsity in mmWave channels is exploited. By utilizing properties of Khatri-Rao and Kronecker products, we find a sparse representation of the cascade channel and convert cascade channel estimation into a sparse signal recovery problem. Simulation results show that our proposed method can provide an accurate channel estimate and achieve a substantial training overhead reduction.
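
The Khatri-Rao/Kronecker step can be made concrete with the standard vectorization identity $\mathrm{vec}(A\,\mathrm{diag}(x)\,B) = (B^{T} \odot A)\,x$, which makes the cascade channel linear in the coefficients to be recovered; below is a small numpy check of this identity (illustrative only, not the paper's full estimator).

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (m x n) and B (p x n)."""
    return np.vstack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])]).T

# Identity used to linearize the cascade channel: vec(A diag(x) B) = khatri_rao(B.T, A) @ x
m, n, p = 4, 6, 3
A = np.random.randn(m, n) + 1j * np.random.randn(m, n)
B = np.random.randn(n, p) + 1j * np.random.randn(n, p)
x = np.random.randn(n) + 1j * np.random.randn(n)

lhs = (A @ np.diag(x) @ B).reshape(-1, order="F")   # column-major vectorization
rhs = khatri_rao(B.T, A) @ x
print(np.allclose(lhs, rhs))   # True; in the mmWave angle domain x is sparse,
                               # so estimating it becomes a sparse recovery problem
```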

327 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper introduced a privacy-preserving machine learning technique named federated learning and proposed a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU), which differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations.
Abstract: Existing traffic flow forecasting approaches by deep learning models achieve excellent success based on a large volume of datasets gathered by governments and organizations. However, these datasets may contain lots of users' private data, which challenges the current prediction approaches as user privacy has drawn increasing public concern in recent years. Therefore, how to develop accurate traffic prediction while preserving privacy is a significant problem to be solved, and there is a trade-off between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning and propose a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction. FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. In the secure parameter aggregation mechanism, we adopt a Federated Averaging algorithm to reduce the communication overhead during the model parameter transmission process. Furthermore, we design a Joint Announcement Protocol to improve the scalability of FedGRU. We also propose an ensemble clustering-based scheme for traffic flow prediction by grouping the organizations into clusters before applying the FedGRU algorithm. Through extensive case studies on a real-world dataset, it is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models, which confirms that FedGRU can achieve accurate and timely traffic prediction without compromising the privacy and security of raw data.

217 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of joint computing, caching, communication, and control (4C) in big data MEC is formulated as an optimization problem whose goal is to jointly optimize a linear combination of the bandwidth consumption and network latency.
Abstract: The concept of Multi-access Edge Computing (MEC) has been recently introduced to supplement cloud computing by deploying MEC servers to the network edge so as to reduce the network delay and alleviate the load on cloud data centers. However, compared to the resourceful cloud, an MEC server has limited resources. When each MEC server operates independently, it cannot handle all computational and big data demands stemming from users' devices. Consequently, the MEC server cannot provide significant gains in overhead reduction of data exchange between users' devices and remote cloud. Therefore, joint Computing, Caching, Communication, and Control (4C) at the edge with MEC server collaboration is needed. To address these challenges, in this paper, the problem of joint 4C in big data MEC is formulated as an optimization problem whose goal is to jointly optimize a linear combination of the bandwidth consumption and network latency. However, the formulated problem is shown to be non-convex. As a result, a proximal upper bound problem of the original formulated problem is proposed. To solve the proximal upper bound problem, the block successive upper bound minimization method is applied. Simulation results show that the proposed approach satisfies computation deadlines and minimizes bandwidth consumption and network latency.

208 citations


Journal ArticleDOI
TL;DR: This work introduces a privacy-preserving machine learning technique named federated learning (FL) and proposes an FL-based gated recurrent unit neural network algorithm (FedGRU) for traffic flow prediction (TFP) that differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
Abstract: Existing traffic flow forecasting approaches by deep learning models achieve excellent success based on a large volume of data sets gathered by governments and organizations. However, these data sets may contain lots of users' private data, which challenges the current prediction approaches as user privacy has drawn increasing public concern in recent years. Therefore, how to develop accurate traffic prediction while preserving privacy is a significant problem to be solved, and there is a tradeoff between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning (FL) and propose an FL-based gated recurrent unit neural network algorithm (FedGRU) for traffic flow prediction (TFP). FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. In the secure parameter aggregation mechanism, we adopt a federated averaging algorithm to reduce the communication overhead during the model parameter transmission process. Furthermore, we design a joint announcement protocol to improve the scalability of FedGRU. We also propose an ensemble clustering-based scheme for TFP by grouping the organizations into clusters before applying the FedGRU algorithm. Extensive case studies on a real-world data set demonstrate that FedGRU can produce predictions that are merely 0.76 km/h worse than the state of the art in terms of mean average error under the privacy preservation constraint, confirming that the proposed model develops accurate traffic predictions without compromising the data privacy.

195 citations


Journal ArticleDOI
TL;DR: Two efficient channel estimation schemes for different channel setups in an IRS-assisted multi-user broadband communication system employing the orthogonal frequency division multiple access (OFDMA) are proposed and the fundamental limits on the minimum training overhead and the maximum number of supportable users are derived.
Abstract: To achieve the full passive beamforming gains of intelligent reflecting surface (IRS), accurate channel state information (CSI) is indispensable but practically challenging to acquire, due to the excessive amount of channel parameters to be estimated which increases with the number of IRS reflecting elements as well as that of IRS-served users. To tackle this challenge, we propose in this paper two efficient channel estimation schemes for different channel setups in an IRS-assisted multi-user broadband communication system employing the orthogonal frequency division multiple access (OFDMA). The first channel estimation scheme, which estimates the CSI of all users in parallel simultaneously at the access point (AP), is applicable for arbitrary frequency-selective fading channels. In contrast, the second channel estimation scheme, which exploits a key property that all users share the same (common) IRS-AP channel to enhance the training efficiency and support more users, is proposed for the typical scenario with line-of-sight (LoS) dominant user-IRS channels. For the two proposed channel estimation schemes, we further optimize their corresponding training designs (including pilot tone allocations for all users and IRS time-varying reflection pattern) to minimize the channel estimation error. Moreover, we derive and compare the fundamental limits on the minimum training overhead and the maximum number of supportable users of these two schemes. Simulation results verify the effectiveness of the proposed channel estimation schemes and training designs, and show their significant performance improvement over various benchmark schemes.

191 citations


Journal ArticleDOI
TL;DR: This paper forms the channel estimation problem in the RIS-assisted multiuser MIMO system as a matrix-calibration based matrix factorization task and proposes a novel message-passing based algorithm to factorize the cascaded channels.
Abstract: Reconfigurable intelligent surface (RIS) is envisioned to be an essential component of the paradigm for beyond 5G networks as it can potentially provide similar or higher array gains with much lower hardware cost and energy consumption compared with the massive multiple-input multiple-output (MIMO) technology. In this paper, we focus on one of the fundamental challenges, namely the channel acquisition, in a RIS-assisted multiuser MIMO system. The state-of-the-art channel acquisition approach in such a system with fully passive RIS elements estimates the cascaded transmitter-to-RIS and RIS-to-receiver channels by adopting excessively long training sequences. To estimate the cascaded channels with an affordable training overhead, we formulate the channel estimation problem in the RIS-assisted multiuser MIMO system as a matrix-calibration based matrix factorization task. By exploiting the information on the slow-varying channel components and the hidden channel sparsity, we propose a novel message-passing based algorithm to factorize the cascaded channels. Furthermore, we present an analytical framework to characterize the theoretical performance bound of the proposed estimator in the large-system limit. Finally, we conduct simulations to verify the high accuracy and efficiency of the proposed algorithm.

Journal ArticleDOI
TL;DR: Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.
Abstract: In this paper, we propose a Distributed Intelligent Video Surveillance (DIVS) system using Deep Learning (DL) algorithms and deploy it in an edge computing environment. We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The DIVS system can migrate computing workloads from the network center to network edges to reduce huge network communication overhead and provide low-latency and accurate video analysis solutions. We implement the proposed DIVS system and address the problems of parallel training, model synchronization, and workload balancing. Task-level parallel and model-level parallel training methods are proposed to further accelerate the video analysis process. In addition, we propose a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Moreover, a dynamic data migration approach is proposed to address the imbalance of workload and computational power of edge nodes. Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors propose the Adaptive Bezier-Curve Network (ABCNet), which adaptively fits oriented or curved text with a parameterized Bezier curve.
Abstract: Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit oriented or curved text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on oriented or curved benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://git.io/AdelaiDet.
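
A cubic Bezier curve is fully determined by four control points, so a curved text boundary can be described by a handful of regressed parameters; below is a minimal numpy sketch of sampling such a curve (the detection head and the BezierAlign layer are omitted, and the numbers are illustrative).

```python
import numpy as np

def bezier_points(control_pts, num=20):
    """Sample a cubic Bezier curve from its 4 control points.

    control_pts: (4, 2) array of (x, y) control points, e.g. regressed for the
    top or bottom boundary of a text instance.
    """
    t = np.linspace(0.0, 1.0, num)[:, None]
    c = np.asarray(control_pts, dtype=float)
    # Bernstein basis of degree 3
    b = np.hstack([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])
    return b @ c                                   # (num, 2) points along the boundary

# Toy example: a gently curved top boundary
top = bezier_points([[0, 0], [30, -10], [70, -10], [100, 0]])
print(top[:3])
```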

Journal ArticleDOI
Siqi Luo1, Xu Chen1, Qiong Wu1, Zhi Zhou1, Shuai Yu1 
TL;DR: A novel Hierarchical Federated Edge Learning (HFEL) framework is introduced in which model aggregation is partially migrated to edge servers from the cloud and achieves better training performance compared to conventional federated learning.
Abstract: Federated Learning (FL) has been proposed as an appealing approach to handle data privacy issue of mobile devices compared to conventional machine learning at the remote cloud with raw user data uploading. By leveraging edge servers as intermediaries to perform partial model aggregation in proximity and relieve core network transmission overhead, it enables great potentials in low-latency and energy-efficient FL. Hence we introduce a novel Hierarchical Federated Edge Learning (HFEL) framework in which model aggregation is partially migrated to edge servers from the cloud. We further formulate a joint computation and communication resource allocation and edge association problem for device users under HFEL framework to achieve global cost minimization. To solve the problem, we propose an efficient resource scheduling algorithm in the HFEL framework. It can be decomposed into two subproblems: resource allocation given a scheduled set of devices for each edge server and edge association of device users across all the edge servers. With the optimal policy of the convex resource allocation subproblem for a set of devices under a single edge server, an efficient edge association strategy can be achieved through iterative global cost reduction adjustment process, which is shown to converge to a stable system point. Extensive performance evaluations demonstrate that our HFEL framework outperforms the proposed benchmarks in global cost saving and achieves better training performance compared to conventional federated learning.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed AE-IDS (Auto-Encoder Intrusion Detection System) is superior to traditional machine learning based intrusion detection methods in terms of easy training, strong adaptability, and high detection accuracy.

Proceedings ArticleDOI
30 Jul 2020
TL;DR: Using real topologies and traffic characteristics, it is shown that PINT concurrently enables applications such as congestion control, path tracing, and computing tail latencies, using only sixteen bits per packet, with performance comparable to the state of the art.
Abstract: Commodity network devices support adding in-band telemetry measurements into data packets, enabling a wide range of applications, including network troubleshooting, congestion control, and path tracing. However, including such information on packets adds significant overhead that impacts both flow completion times and application-level performance. We introduce PINT, an in-band network telemetry framework that bounds the amount of information added to each packet. PINT encodes the requested data on multiple packets, allowing per-packet overhead limits that can be as low as one bit. We analyze PINT and prove performance bounds, including cases when multiple queries are running simultaneously. PINT is implemented in P4 and can be deployed on network devices. Using real topologies and traffic characteristics, we show that PINT concurrently enables applications such as congestion control, path tracing, and computing tail latencies, using only sixteen bits per packet, with performance comparable to the state of the art.
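
A deliberately simplified toy of the per-packet budget idea (all names are illustrative; PINT's actual encoding relies on probabilistic and approximation techniques this sketch omits): every packet carries at most a fixed number of telemetry bits, and which query gets the slot varies pseudo-randomly per packet, so the collector recovers all queries across a flow instead of inflating every packet.

```python
import random

DIGEST_BITS = 16   # fixed per-packet budget, as in the sixteen-bit example above

def on_packet(packet_id, hop_values, queries):
    """Write one query's value into the packet's small telemetry digest."""
    rng = random.Random(packet_id)        # a switch could derive this from a packet hash
    query = rng.choice(queries)           # one query "wins" this packet's digest slot
    value = hop_values[query] & ((1 << DIGEST_BITS) - 1)
    return query, value

# Example: congestion (queue depth) and path-tracing (hop id) queries share the budget
hop_values = {"queue_depth": 37, "hop_id": 12}
print([on_packet(pid, hop_values, list(hop_values)) for pid in range(6)])
```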

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries.
Abstract: The success of convolutional neural networks (CNNs) in computer vision applications has been accompanied by a significant increase of computation and memory costs, which prohibits their usage on resource-limited environments, such as mobile systems or embedded devices. To this end, the research of CNN compression has recently become emerging. In this paper, we propose a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries. Concretely, the proposed scheme incorporates two different regularizers of structured sparsity into the original objective function of filter pruning, which fully coordinates the global output and local pruning operations to adaptively prune filters. We further propose an alternative updating with Lagrange multipliers (AULM) scheme to efficiently solve its optimization. AULM follows the principle of alternating direction method of multipliers (ADMM) and alternates between promoting the structured sparsity of CNNs and optimizing the recognition loss, which leads to a very efficient solver ($2.5\times$ faster than the most recent work that directly solves the group sparsity-based regularization). Moreover, by imposing the structured sparsity, the online inference is extremely memory-light since the number of filters and the output feature maps are simultaneously reduced. The proposed scheme has been deployed to a variety of state-of-the-art CNN structures, including LeNet, AlexNet, VGGNet, ResNet, and GoogLeNet, over different data sets. Quantitative results demonstrate that the proposed scheme achieves superior performance over the state-of-the-art methods. We further demonstrate the proposed compression scheme for the task of transfer learning, including domain adaptation and object detection, which also show exciting performance gains over the state-of-the-art filter pruning methods.
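
The paper couples two structured-sparsity regularizers with an ADMM-style (AULM) solver; as a much simpler illustration of the underlying idea, the following PyTorch sketch adds a filter-wise group-lasso penalty to a training loss so that whole filters are pushed toward zero and can then be pruned. The layer sizes and penalty weight are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

def filter_group_lasso(conv: nn.Conv2d) -> torch.Tensor:
    """Group-lasso penalty with one group per output filter.

    Each filter's weights (in_channels x kH x kW) form one group; penalizing
    the group's l2 norm drives entire filters to zero so they can be pruned.
    """
    w = conv.weight                                   # (out_ch, in_ch, kH, kW)
    return w.flatten(start_dim=1).norm(dim=1).sum()   # sum of per-filter l2 norms

conv = nn.Conv2d(16, 32, kernel_size=3)
x = torch.randn(4, 16, 8, 8)
task_loss = conv(x).pow(2).mean()                     # stand-in for the recognition loss
lam = 1e-4                                            # regularization strength (illustrative)
loss = task_loss + lam * filter_group_lasso(conv)
loss.backward()
```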

Journal ArticleDOI
TL;DR: A fast deep-reinforcement-learning (DRL)-based detection algorithm for virtual IP watermarks is proposed by combining the technologies of mapping function and DRL to preprocess the ownership information of the IP circuit resource.
Abstract: With the fast advancements of electronic chip technologies in the Internet of Things (IoT), it is urgent to address the copyright protection issue of intellectual property (IP) circuit resources of the electronic devices in IoT environments. In this article, a fast deep-reinforcement-learning (DRL)-based detection algorithm for virtual IP watermarks is proposed by combining the technologies of mapping function and DRL to preprocess the ownership information of the IP circuit resource. The deep $Q$-learning (DQN) algorithm is used to generate the watermarked positions adaptively, making the watermarked positions secure yet close to the original design. An artificial neural network (ANN) algorithm is utilized for training the position distance characteristic vectors of the IP circuit, in which the characteristic function of the virtual position for IP watermark is generated after training. In IP ownership verification, the DRL model can quickly locate the range of virtual watermark positions. With the characteristic values of the virtual positions in each lookup table (LUT) area and surrounding areas, the mapping position relationship can be calculated in a supervised manner in the neural network, as the algorithm realizes the fast location of the real ownership information in an IP circuit. The experimental results show that the proposed algorithm can effectively improve the speed of watermark detection while also reducing the resource overhead. Besides, it also achieves excellent performance in security.

Journal ArticleDOI
Jie Cui1, Lu Wei1, Hong Zhong1, Jing Zhang1, Yan Xu1, Lu Liu2 
TL;DR: In the proposed scheme, a roadside unit (RSU) can find the popular data by analyzing the encrypted requests sent from nearby vehicles without having to sacrifice the privacy of their download requests.
Abstract: With the advancements in social media and rising demand for real traffic information, the data shared in vehicular ad hoc networks (VANETs) indicate that the size and amount of requested data will continue increasing. Vehicles in the same area often have similar data downloading requests. If we ignore the common requests, the resource allocation efficiency of the VANET system will be quite low. Motivated by this fact, we propose an efficient and privacy-preserving data downloading scheme for VANETs, based on the edge computing concept. In the proposed scheme, a roadside unit (RSU) can find the popular data by analyzing the encrypted requests sent from nearby vehicles without having to sacrifice the privacy of their download requests. Further, the RSU caches the popular data in nearby qualified vehicles called edge computing vehicles (ECVs). If a vehicle wishes to download the popular data, it can download it directly from the nearby ECVs. This method increases the downloading efficiency of the system. The security analysis results show that the proposed scheme can resist multiple security attacks. The performance analysis results demonstrate that our scheme has reasonable computation and communication overhead. Finally, the OMNeT++ simulation results indicate that our scheme has good network performance.

Proceedings ArticleDOI
26 May 2020
TL;DR: In this article, a deep reinforcement learning framework was proposed to predict the reflecting coefficients of reflecting surfaces without requiring massive channel estimation or beam training overhead, and the proposed online learning framework can converge to the optimal rate that assumes perfect channel knowledge.
Abstract: The promising coverage and spectral efficiency gains of intelligent reflecting surfaces (IRSs) are attracting increasing interest. To adopt these surfaces in practice, however, several challenges need to be addressed. One of these main challenges is how to configure the reflecting coefficients on these passive surfaces without requiring massive channel estimation or beam training overhead. Earlier work suggested leveraging supervised learning tools to predict the IRS reflection matrices. While this approach has the potential of reducing the beam training overhead, it requires collecting large datasets for training the neural network models. In this paper, we propose a novel deep reinforcement learning framework for predicting the IRS reflection matrices with minimal beam training overhead. Simulation results show that the proposed online learning framework can converge to the optimal rate that assumes perfect channel knowledge. This represents an important step towards realizing a standalone IRS operation, where the surface configures itself without any control from the infrastructure.

Journal ArticleDOI
TL;DR: A static random access memory (SRAM) CIM unit-macro is presented, which uses compact-rule compatible twin-8T cells for weighted CIM MAC operations to reduce area overhead and vulnerability to process variation, and an even–odd dual-channel (EODC) input mapping scheme to extend input bandwidth.
Abstract: Computation-in-memory (CIM) is a promising candidate to improve the energy efficiency of multiply-and-accumulate (MAC) operations of artificial intelligence (AI) chips. This work presents an static random access memory (SRAM) CIM unit-macro using: 1) compact-rule compatible twin-8T (T8T) cells for weighted CIM MAC operations to reduce area overhead and vulnerability to process variation; 2) an even–odd dual-channel (EODC) input mapping scheme to extend input bandwidth; 3) a two’s complement weight mapping (C2WM) scheme to enable MAC operations using positive and negative weights within a cell array in order to reduce area overhead and computational latency; and 4) a configurable global–local reference voltage generation (CGLRVG) scheme for kernels of various sizes and bit precision. A 64 $\times $ 60 b T8T unit-macro with 1-, 2-, 4-b inputs, 1-, 2-, 5-b weights, and up to 7-b MAC-value (MACV) outputs was fabricated as a test chip using a foundry 55-nm process. The proposed SRAM-CIM unit-macro achieved access times of 5 ns and energy efficiency of 37.5–45.36 TOPS/W under 5-b MACV output.

Journal ArticleDOI
TL;DR: A new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT is proposed, along with an attention mechanism-based convolutional neural network-long short-term memory (AMCNN-LSTM) model to accurately detect anomalies.
Abstract: Since edge device failures (i.e., anomalies) seriously affect the production of industrial products in Industrial IoT (IIoT), accurately and timely detecting anomalies is becoming increasingly important. Furthermore, data collected by the edge device may contain the user's private data, which challenges current detection approaches as user privacy has drawn increasing public concern in recent years. With this focus, this paper proposes a new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT. Specifically, we first introduce an FL framework to enable decentralized edge devices to collaboratively train an anomaly detection model, which can improve its generalization ability. Second, we propose an Attention Mechanism-based Convolutional Neural Network-Long Short Term Memory (AMCNN-LSTM) model to accurately detect anomalies. The AMCNN-LSTM model uses attention mechanism-based CNN units to capture important fine-grained features, thereby preventing memory loss and gradient dispersion problems. Furthermore, this model retains the advantages of the LSTM unit in predicting time series data. Third, to adapt the proposed framework to the timeliness of industrial anomaly detection, we propose a gradient compression mechanism based on Top-k selection to improve communication efficiency. Extensive experiment studies on four real-world datasets demonstrate that the proposed framework can accurately and timely detect anomalies and also reduce the communication overhead by 50% compared to the federated learning framework that does not use a gradient compression scheme.

Posted Content
Siqi Luo1, Xu Chen1, Qiong Wu1, Zhi Zhou1, Shuai Yu1 
TL;DR: In this article, the authors proposed a hierarchical federated edge learning (HFEL) framework in which model aggregation is partially migrated to edge servers from the cloud and formulated a joint computation and communication resource allocation and edge association problem for device users under HFEL framework to achieve global cost minimization.
Abstract: Federated Learning (FL) has been proposed as an appealing approach to handle data privacy issue of mobile devices compared to conventional machine learning at the remote cloud with raw user data uploading. By leveraging edge servers as intermediaries to perform partial model aggregation in proximity and relieve core network transmission overhead, it enables great potentials in low-latency and energy-efficient FL. Hence we introduce a novel Hierarchical Federated Edge Learning (HFEL) framework in which model aggregation is partially migrated to edge servers from the cloud. We further formulate a joint computation and communication resource allocation and edge association problem for device users under HFEL framework to achieve global cost minimization. To solve the problem, we propose an efficient resource scheduling algorithm in the HFEL framework. It can be decomposed into two subproblems: resource allocation given a scheduled set of devices for each edge server and edge association of device users across all the edge servers. With the optimal policy of the convex resource allocation subproblem for a set of devices under a single edge server, an efficient edge association strategy can be achieved through iterative global cost reduction adjustment process, which is shown to converge to a stable system point. Extensive performance evaluations demonstrate that our HFEL framework outperforms the proposed benchmarks in global cost saving and achieves better training performance compared to conventional federated learning.

Journal ArticleDOI
TL;DR: A novel storage auditing scheme that achieves highly-efficient user revocation independent of the total number of file blocks possessed by the revoked user in the cloud is proposed by exploring a novel strategy for key generation and a new private key update technique.
Abstract: Cloud storage auditing schemes for shared data refer to checking the integrity of cloud data shared by a group of users. User revocation is commonly supported in such schemes, as users may be subject to group membership changes for various reasons. Previously, the computational overhead for user revocation in such schemes was linear with the total number of file blocks possessed by a revoked user. The overhead, however, may become a heavy burden because of the sheer amount of the shared cloud data. Thus, how to reduce the computational overhead caused by user revocations becomes a key research challenge for achieving practical cloud data auditing. In this paper, we propose a novel storage auditing scheme that achieves highly-efficient user revocation independent of the total number of file blocks possessed by the revoked user in the cloud. This is achieved by exploring a novel strategy for key generation and a new private key update technique. Using this strategy and the technique, we realize user revocation by just updating the non-revoked group users’ private keys rather than authenticators of the revoked user. The integrity auditing of the revoked user's data can still be correctly performed when the authenticators are not updated. Meanwhile, the proposed scheme is based on identity-based cryptography, which eliminates the complicated certificate management in traditional Public Key Infrastructure (PKI) systems. The security and efficiency of the proposed scheme are validated via both analysis and experimental results.

Journal ArticleDOI
TL;DR: A privacy-preserving optimization of clinical pathway query scheme (PPO-CPQ) to achieve the secureclinical pathway query under e-healthcare CSs without revealing neither the private information of patients, such as name, gender, age, and physical index, nor the sensitive information of hospitals.
Abstract: With the help of patients’ health data, e-healthcare providers can offer reliable data services for better medical treatment. For example, the clinical pathway provides optimal detailed guidance for the clinical treatment. However, the e-healthcare providers are incompetent with the huge volumes of e-healthcare data, and a popular and feasible solution is to outsource the medical data to powerful cloud servers (CSs). Because the medical data are very sensitive yet the outsourced servers are not fully trusted, the straightforward execution of clinical pathway query service will inevitably bring huge privacy risks to patients’ data. Apart from the privacy issues, the efficiency issues also need to be taken into consideration, such as the communication overhead and computational cost between servers and providers. Considering the above issues, this article proposes a privacy-preserving optimization of clinical pathway query scheme (PPO-CPQ) to achieve the secure clinical pathway query under e-healthcare CSs without revealing either the private information of patients, such as name, gender, age, and physical index, or the sensitive information of hospitals, such as treatment, medication, and expense. In our proposed scheme, it first designs several secure and privacy-preserving subprotocols, such as privacy-preserving comparison, privacy-preserving clinical comparison, privacy-preserving stage selection, and privacy-preserving stage update protocols, to ensure privacy in the e-healthcare system; then it adopts the greedy algorithm in a secure manner to perform the query and the min-heap technology to improve efficiency. The experimental result shows that our scheme is practical and efficient in terms of computational cost and communication overhead.

Journal ArticleDOI
TL;DR: A deep learning-based beam selection that is compatible with the 5G NR standard is proposed; a deep neural network (DNN) structure is introduced, and it is explained how a power delay profile (PDP) of a sub-6 GHz channel is used as the input of the DNN.
Abstract: In fifth-generation (5G) communications, millimeter wave (mmWave) is one of the key technologies to increase the data rate. To overcome this technology's poor propagation characteristics, it is necessary to employ a number of antennas and form narrow beams. It becomes crucial then, especially for initial access, to attain fine beam alignment between a next generation NodeB (gNB) and a user equipment (UE). The current 5G New Radio (NR) standard, however, adopts an exhaustive search-based beam sweeping, which causes time overhead of a half frame for initial beam establishment. In this paper, we propose a deep learning-based beam selection, which is compatible with the 5G NR standard. To select a mmWave beam, we exploit sub-6 GHz channel information. We introduce a deep neural network (DNN) structure and explain how we estimate a power delay profile (PDP) of a sub-6 GHz channel, which is used as an input of the DNN. We then validate its performance with real environment-based 3D ray-tracing simulations and over-the-air experiments with a mmWave prototype. Evaluation results confirm that, with support from the sub-6 GHz connection, the proposed beam selection reduces the beam sweeping overhead by up to 79.3 %.
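
A minimal PyTorch sketch of the inference path under assumed sizes (64 PDP taps, 64 candidate beams; the actual network and training are described in the paper): the DNN maps the sub-6 GHz power delay profile to one score per mmWave beam, and only a short list of top-scoring beams needs to be swept.

```python
import torch
import torch.nn as nn

NUM_TAPS, NUM_BEAMS = 64, 64          # illustrative sizes

model = nn.Sequential(
    nn.Linear(NUM_TAPS, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, NUM_BEAMS),        # one logit per candidate mmWave beam
)

pdp = torch.rand(1, NUM_TAPS)         # estimated PDP of the sub-6 GHz channel
logits = model(pdp)
candidates = logits.topk(k=4, dim=1).indices   # sweep only a few beams instead of all 64
print(candidates)
```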

Journal ArticleDOI
TL;DR: A collaborative learning-based routing scheme for the multi-access vehicular edge computing environment is proposed; it employs a reinforcement learning algorithm based on end-edge-cloud collaboration to find routes in a proactive manner with a low communication overhead, and routes are preemptively changed based on the learned information.
Abstract: Some Internet-of-Things (IoT) applications have a strict requirement on the end-to-end delay where edge computing can be used to provide a short delay for end-users by conducting efficient caching and computing at the edge nodes. However, a fast and efficient communication route creation in multi-access vehicular environment is an underexplored research problem. In this paper, we propose a collaborative learning-based routing scheme for multi-access vehicular edge computing environment. The proposed scheme employs a reinforcement learning algorithm based on end-edge-cloud collaboration to find routes in a proactive manner with a low communication overhead. The routes are also preemptively changed based on the learned information. By integrating the “proactive” and “preemptive” approach, the proposed scheme can achieve a better forwarding of packets as compared with existing alternatives. We conduct extensive and realistic computer simulations to show the performance advantage of the proposed scheme over existing baselines.

Posted Content
TL;DR: For the first time, a novel BezierAlign layer is designed for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods and introducing negligible computation overhead.
Abstract: Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our realtime version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://git.io/AdelaiDet.

Posted Content
TL;DR: This work proposes a set of algorithms with periodical compressed (quantized or sparsified) communication and analyzes their convergence properties in both homogeneous and heterogeneous local data distributions settings and introduces a scheme to mitigate data heterogeneity.
Abstract: In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions. Two notable trends to deal with the communication overhead of federated algorithms are gradient compression and local computation with periodic communication. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a set of algorithms with periodical compressed (quantized or sparsified) communication and analyze their convergence properties in both homogeneous and heterogeneous local data distributions settings. For the homogeneous setting, our analysis improves existing bounds by providing tighter convergence rates for both strongly convex and non-convex objective functions. To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and nonconvex settings. We complement our theoretical results and demonstrate the effectiveness of our proposed methods by several experiments on real-world datasets.

Proceedings Article
01 Jan 2020
TL;DR: Tiny-Transfer-Learning (TinyTL) as mentioned in this paper proposes a new memory-efficient bias module, the lite residual module, to refine the feature extractor by learning small residual feature maps adding only 3.8% memory overhead.
Abstract: On-device learning enables edge devices to continually adapt the AI models to new data, which requires a small memory footprint to fit the tight memory constraint of edge devices. Existing work solves this problem by reducing the number of trainable parameters. However, this doesn't directly translate to memory saving since the major bottleneck is the activations, not parameters. In this work, we present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning. TinyTL freezes the weights while only learns the bias modules, thus no need to store the intermediate activations. To maintain the adaptation capacity, we introduce a new memory-efficient bias module, the lite residual module, to refine the feature extractor by learning small residual feature maps adding only 3.8% memory overhead. Extensive experiments show that TinyTL significantly saves the memory (up to 6.5x) with little accuracy loss compared to fine-tuning the full network. Compared to fine-tuning the last layer, TinyTL provides significant accuracy improvements (up to 33.8%) with little memory overhead. Furthermore, combined with feature extractor adaptation, TinyTL provides 7.5-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.
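
A minimal PyTorch sketch of the bias-only part of the recipe (the lite residual modules and the feature-extractor adaptation are omitted, and the tiny model here is a stand-in, not the paper's backbone): freezing all weight tensors means their input activations no longer have to be stored for backpropagation, which is where the memory saving comes from.

```python
import torch.nn as nn

def freeze_weights_train_biases(model: nn.Module):
    """Bias-only fine-tuning: freeze every weight tensor, keep biases trainable."""
    for name, p in model.named_parameters():
        p.requires_grad_(name.endswith("bias"))

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                     # in practice the classifier head is also trained
)
freeze_weights_train_biases(net)
print([n for n, p in net.named_parameters() if p.requires_grad])   # only '*.bias' entries
```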