
Showing papers on "Block (data storage) published in 2021"


Journal ArticleDOI
TL;DR: Res2Net as mentioned in this paper constructs hierarchical residual-like connections within one single residual block to represent multi-scale features at a granular level and increases the range of receptive fields for each network layer.
Abstract: Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available on https://mmcheng.net/res2net/ .
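As a rough illustration of the hierarchical residual-like connections described above, the following PyTorch sketch splits the output of a 1 × 1 convolution into scale groups and feeds each group (plus the previous group's output) through its own 3 × 3 convolution before concatenation. The module name, the fixed scale of 4, and the omission of batch normalization are simplifying assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class Res2NetBlockSketch(nn.Module):
    """Minimal sketch of a Res2Net-style block (simplified; no BN, stride 1)."""
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0
        self.width = channels // scale
        self.conv_in = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        # One 3x3 conv per group except the first, which is passed through unchanged.
        self.convs = nn.ModuleList(
            [nn.Conv2d(self.width, self.width, kernel_size=3, padding=1, bias=False)
             for _ in range(scale - 1)]
        )
        self.conv_out = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.conv_in(x))
        groups = torch.split(out, self.width, dim=1)
        outs = [groups[0]]                      # first split: identity mapping
        prev = None
        for i, conv in enumerate(self.convs):
            inp = groups[i + 1] if prev is None else groups[i + 1] + prev
            prev = self.relu(conv(inp))         # hierarchical residual-like connection
            outs.append(prev)
        out = self.conv_out(torch.cat(outs, dim=1))
        return self.relu(out + identity)        # standard residual shortcut

x = torch.randn(1, 64, 32, 32)
print(Res2NetBlockSketch(64)(x).shape)          # torch.Size([1, 64, 32, 32])
```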

1,553 citations


Journal ArticleDOI
TL;DR: A new framework model based on a novel feature selection metric approach named CorrAUC is proposed, along with a new feature selection algorithm based on the wrapper technique that filters features accurately and selects effective features for the chosen ML algorithm using the area under the curve (AUC) metric.
Abstract: Identification of anomalous and malicious traffic in the Internet-of-Things (IoT) network is essential for IoT security: it keeps watch over and blocks unwanted traffic flows in the IoT network. For this purpose, numerous machine-learning (ML) models have been presented by many researchers to block malicious traffic flows in the IoT network. However, due to inappropriate feature selection, several ML models are prone to misclassifying malicious traffic flows. Nevertheless, a significant problem still needs to be studied in more depth: how to select effective features for accurate malicious traffic detection in the IoT network. To address this problem, a new framework model is proposed. First, a novel feature selection metric approach named CorrAUC is proposed, and then, based on CorrAUC, a new feature selection algorithm of the same name is developed and designed; it is based on the wrapper technique to filter the features accurately and select effective features for the selected ML algorithm by using the area under the curve (AUC) metric. Then, we applied the integrated TOPSIS and Shannon entropy based on a bijective soft set to validate the selected features for malicious traffic identification in the IoT network. We evaluate our proposed approach by using the Bot-IoT data set and four different ML algorithms. The experimental results analysis showed that our proposed method is efficient and can achieve >96% accuracy on average.
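The CorrAUC metric itself is not reproduced here, but the following scikit-learn sketch illustrates the general wrapper idea the abstract describes: a feature is kept only if it improves the validation AUC of the chosen classifier. The forward-selection loop, the function name, and the thresholds are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def auc_wrapper_select(X_tr, y_tr, X_val, y_val, clf, min_gain=1e-3):
    """Greedy forward selection: add a feature only if it improves validation AUC."""
    selected, best_auc = [], 0.0
    remaining = list(range(X_tr.shape[1]))
    improved = True
    while improved and remaining:
        improved = False
        scores = []
        for f in remaining:
            cols = selected + [f]
            clf.fit(X_tr[:, cols], y_tr)
            auc = roc_auc_score(y_val, clf.predict_proba(X_val[:, cols])[:, 1])
            scores.append((auc, f))
        auc, f = max(scores)
        if auc > best_auc + min_gain:
            selected.append(f)
            remaining.remove(f)
            best_auc, improved = auc, True
    return selected, best_auc

# Synthetic stand-in for a labeled traffic dataset (not Bot-IoT).
X, y = make_classification(n_samples=600, n_features=15, n_informative=4, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
feats, auc = auc_wrapper_select(X_tr, y_tr, X_val, y_val,
                                RandomForestClassifier(n_estimators=50, random_state=0))
print(feats, round(auc, 3))
```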

244 citations


Proceedings ArticleDOI
13 Apr 2021
TL;DR: Lite-HRNet as mentioned in this paper introduces a lightweight unit, conditional channel weighting, to replace costly pointwise (1 × 1) convolutions in shuffle blocks, and can be easily applied to semantic segmentation tasks.
Abstract: We present an efficient high-resolution network, Lite-HRNet, for human pose estimation. We start by simply applying the efficient shuffle block in ShuffleNet to HRNet (high-resolution network), yielding stronger performance over popular lightweight networks, such as MobileNet, ShuffleNet, and Small HRNet. We find that the heavily-used pointwise (1 × 1) convolutions in shuffle blocks become the computational bottleneck. We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1 × 1) convolutions in shuffle blocks. The complexity of channel weighting is linear w.r.t. the number of channels and lower than the quadratic time complexity for pointwise convolutions. Our solution learns the weights from all the channels and over multiple resolutions that are readily available in the parallel branches in HRNet. It uses the weights as the bridge to exchange information across channels and resolutions, compensating for the role played by the pointwise (1 × 1) convolution. Lite-HRNet demonstrates superior results on human pose estimation over popular lightweight networks. Moreover, Lite-HRNet can be easily applied to semantic segmentation tasks in the same lightweight manner. The code and models are publicly available at https://github.com/HRNet/Lite-HRNet.

161 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a hybrid implicit-explicit network architecture and training strategy that adaptively allocates resources during training and inference based on the local complexity of a signal of interest.
Abstract: Neural representations have emerged as a new paradigm for applications in rendering, imaging, geometric modeling, and simulation. Compared to traditional representations such as meshes, point clouds, or volumes they can be flexibly incorporated into differentiable learning-based pipelines. While recent improvements to neural representations now make it possible to represent signals with fine details at moderate resolutions (e.g., for images and 3D shapes), adequately representing large-scale or complex scenes has proven a challenge. Current neural representations fail to accurately represent images at resolutions greater than a megapixel or 3D scenes with more than a few hundred thousand polygons. Here, we introduce a new hybrid implicit-explicit network architecture and training strategy that adaptively allocates resources during training and inference based on the local complexity of a signal of interest. Our approach uses a multiscale block-coordinate decomposition, similar to a quadtree or octree, that is optimized during training. The network architecture operates in two stages: using the bulk of the network parameters, a coordinate encoder generates a feature grid in a single forward pass. Then, hundreds or thousands of samples within each block can be efficiently evaluated using a lightweight feature decoder. With this hybrid implicit-explicit network architecture, we demonstrate the first experiments that fit gigapixel images to nearly 40 dB peak signal-to-noise ratio. Notably this represents an increase in scale of over 1000X compared to the resolution of previously demonstrated image-fitting experiments. Moreover, our approach is able to represent 3D shapes significantly faster and better than previous techniques; it reduces training times from days to hours or minutes and memory requirements by over an order of magnitude.

134 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Diverse Branch Block (DBB) as mentioned in this paper enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multiscale convolutions and average pooling.
Abstract: We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs. The block is named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multiscale convolutions, and average pooling. After training, a DBB can be equivalently converted into a single conv layer for deployment. Unlike the advancements of novel ConvNet architectures, DBB complicates the training-time microstructure while maintaining the macro architecture, so that it can be used as a drop-in replacement for regular conv layers of any architecture. In this way, the model can be trained to reach a higher level of performance and then transformed into the original inference-time structure for inference. DBB improves ConvNets on image classification (up to 1.9% higher top-1 accuracy on ImageNet), object detection and semantic segmentation. The PyTorch code and models are released at https://github.com/DingXiaoH/DiverseBranchBlock.
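The equivalence that makes DBB free at inference time rests on the linearity of convolution: parallel branches whose outputs are summed can be folded into a single kernel. The following PyTorch snippet verifies this numerically for a simplified three-branch block (a 3 × 3 conv, a 1 × 1 conv, and average pooling, no batch normalization); the branch set in the paper is richer, so treat this only as a sketch of the re-parameterization idea.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
cin, cout = 8, 8                              # equal channels so the avg-pool branch is mergeable
k3 = torch.randn(cout, cin, 3, 3)             # 3x3 conv branch
k1 = torch.randn(cout, cin, 1, 1)             # 1x1 conv branch
x = torch.randn(2, cin, 16, 16)

# Training-time computation: sum of three parallel branches.
y_branches = (
    F.conv2d(x, k3, padding=1)
    + F.conv2d(x, k1, padding=0)
    + F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
)

# Inference-time re-parameterization: fold all branches into one 3x3 kernel.
k1_as_3x3 = F.pad(k1, (1, 1, 1, 1))           # place the 1x1 kernel at the centre of a 3x3 kernel
k_avg = torch.zeros(cout, cin, 3, 3)
for c in range(cin):                          # average pooling == per-channel conv with a 1/9 kernel
    k_avg[c, c] = 1.0 / 9.0
k_merged = k3 + k1_as_3x3 + k_avg

y_merged = F.conv2d(x, k_merged, padding=1)
print(torch.allclose(y_branches, y_merged, atol=1e-4))   # True
```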

119 citations


Journal ArticleDOI
TL;DR: A new quadrant-based search algorithm with a zero motion prejudgment method is proposed for motion estimation (ME) in the HEVC (High Efficiency Video Coding) standard to obtain efficient output with low motion estimation time.
Abstract: In this manuscript, a new quadrant-based search algorithm with zero motion prejudgment is proposed for motion estimation (ME) in the HEVC (High Efficiency Video Coding) standard, with the aim of obtaining efficient output with low motion estimation time. The proposed quadrant-based search algorithm is a fast block matching algorithm that obtains a better match between the current block and the reference block. The zero motion prejudgment (ZMP) method is used to determine whether a block is moving or static, and it reduces the computational complexity (CC) of the proposed quadrant-based search algorithm. The proposed quadrant-based search algorithm with the ZMP technique for motion estimation in HEVC is implemented on an FPGA hardware platform. The entire architecture is implemented in Verilog HDL on Virtex-5 technology and integrated with Xilinx ISE Design Suite 14.5. The results are evaluated on CIF (352 × 288 pixels) and HD (1280 × 720 pixels) video input sequences. Evaluation metrics such as PSNR, motion estimation time, and the sum of absolute differences (SAD) are analyzed against existing methods such as the hexagon, adaptive root pattern, and diamond search algorithms. Then, hardware parameters such as power consumption and maximum operating frequency are measured. The hardware utilization is reduced, and the power consumption of the proposed model is lowered to 0.143 W. The maximum operating frequency of the proposed model is 440.470 MHz. The experimental outcomes demonstrate that the proposed motion estimation approach in HEVC is more effective than existing algorithms.
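A minimal NumPy sketch of SAD-based block matching with a zero-motion-prejudgment early exit is given below. The quadrant-based search itself is replaced by a plain full search inside the window, and the block size, search range, and ZMP threshold are illustrative values, not those of the paper.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum()

def motion_vector(cur, ref, y, x, block=16, search=8, zmp_threshold=512):
    """Block matching with zero-motion prejudgment (ZMP), full search fallback."""
    cur_blk = cur[y:y + block, x:x + block]
    zero_cost = sad(cur_blk, ref[y:y + block, x:x + block])
    if zero_cost < zmp_threshold:            # static block: skip the search entirely
        return (0, 0), zero_cost
    best_mv, best_cost = (0, 0), zero_cost
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue
            cost = sad(cur_blk, ref[ry:ry + block, rx:rx + block])
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(-2, -3), axis=(0, 1))   # content shifted so the true motion is (2, 3)
print(motion_vector(cur, ref, y=16, x=16))         # ((2, 3), 0)
```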

104 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Wang et al. as mentioned in this paper proposed a learnable module that learns spatial contextual features from large-scale point clouds, called SCF, which mainly consists of three blocks, including the local polar representation block, the dual-distance attentive pooling block, and the global contextual feature block.
Abstract: How to learn effective features from large-scale point clouds for semantic segmentation has attracted increasing attention in recent years. Addressing this problem, we propose a learnable module that learns Spatial Contextual Features from large-scale point clouds, called SCF in this paper. The proposed module mainly consists of three blocks, including the local polar representation block, the dual-distance attentive pooling block, and the global contextual feature block. For each 3D point, the local polar representation block is firstly explored to construct a spatial representation that is invariant to the z-axis rotation, then the dual-distance attentive pooling block is designed to utilize the representations of its neighbors for learning more discriminative local features according to both the geometric and feature distances among them, and finally, the global contextual feature block is designed to learn a global context for each 3D point by utilizing its spatial location and the volume ratio of the neighborhood to the global point cloud. The proposed module could be easily embedded into various network architectures for point cloud segmentation, naturally resulting in a new 3D semantic segmentation network with an encoder-decoder architecture, called SCF-Net in this work. Extensive experimental results on two public datasets demonstrate that the proposed SCF-Net performs better than several state-of-the-art methods in most cases.

100 citations


Journal ArticleDOI
TL;DR: Simulation results demonstrate that the proposed algorithm can offer significant average sum-rate enhancement compared to that achieved using the ideal IRS reflection model, which confirms the importance of the use of the practical model for the design of wideband systems.
Abstract: Intelligent reflecting surface (IRS) is envisioned as a revolutionary technology for future wireless communication systems since it can intelligently change radio environment and integrate it into wireless communication optimization. However, most existing works adopted an ideal IRS reflection model, which is impractical and can cause significant performance degradation in realistic wideband systems. To address this issue, we first study the dual phase- and amplitude-squint effect of reflected signals and present a simplified practical IRS reflection model for wideband signals. Then, an IRS enhanced wideband multiuser multi-input single-output orthogonal frequency division multiplexing (MU-MISO-OFDM) system is investigated. We aim to jointly design the transmit beamformer and IRS reflection for the case of using both continuous and discrete phase shifters to maximize the average sum-rate over all subcarriers. By exploiting the relationship between sum-rate maximization and mean square error (MSE) minimization, the original problem is equivalently transformed into a multi-block/variable problem, which can be efficiently solved by the block coordinate descent (BCD) method. Complexity and convergence for both cases are analyzed or illustrated. Simulation results demonstrate that the proposed algorithm can offer significant average sum-rate enhancement compared to that achieved using the ideal IRS reflection model, which confirms the importance of the use of the practical model for the design of wideband systems.
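Block coordinate descent, which recurs in several of the papers listed here, simply cycles through blocks of variables and optimizes one block while the others are held fixed. The toy Python example below applies it to a small least-squares problem; it is a generic illustration of the loop structure, not the beamformer/IRS design in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
B = rng.standard_normal((30, 4))
c = rng.standard_normal(30)

def bcd(iters=50):
    """Block coordinate descent on f(u, v) = ||A u + B v - c||^2.

    Each block update is a least-squares solve with the other block held fixed,
    mirroring the alternating structure of multi-block designs.
    """
    u = np.zeros(A.shape[1])
    v = np.zeros(B.shape[1])
    for _ in range(iters):
        u, *_ = np.linalg.lstsq(A, c - B @ v, rcond=None)   # update block 1, block 2 fixed
        v, *_ = np.linalg.lstsq(B, c - A @ u, rcond=None)   # update block 2, block 1 fixed
    return u, v, np.linalg.norm(A @ u + B @ v - c) ** 2

u, v, obj = bcd()
print(round(obj, 4))   # the objective is monotonically non-increasing across iterations
```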

97 citations


Journal ArticleDOI
TL;DR: This approach leverages nearby edge devices to create the decoupled blocks in the blockchain so as to securely transmit healthcare data from sensors to the edge nodes, which then transmit and store the data at the cloud using an incremental tensor-based scheme.
Abstract: The in-house health monitoring sensors form a large network of Internet of things (IoT) that continuously monitors and sends the data to the nearby devices or server. However, the connectivity of these IoT-based sensors with different entities leads to security loopholes wherein the adversary can exploit the vulnerabilities due to the openness of the data. This is a major concern especially in the healthcare sector, where a change in data values from sensors can change the course of diagnosis and cause severe health issues. Therefore, in order to prevent data tampering and preserve the privacy of patients, we present a decoupled blockchain-based approach in the edge-envisioned ecosystem. This approach leverages the nearby edge devices to create the decoupled blocks in blockchain so as to securely transmit the healthcare data from sensors to the edge nodes. The edge nodes then transmit and store the data at the cloud using the incremental tensor-based scheme. This helps to reduce the data duplication of the huge amount of data transmitted in the large IoT healthcare network. The results show the effectiveness of the proposed approach in terms of the block preparation time, header generation time, tensor reduction ratio, and approximation error.
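The paper's decoupled-block construction is not spelled out in the abstract, but the hash-chained block sketch below conveys the basic mechanics an edge node would perform: hash the sensor payload, build a small header, and chain it to the previous block. The field names and structure are assumptions for illustration only.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_block(prev_hash: str, sensor_records: list) -> dict:
    """Assemble a block: hash the payload, then build a small header that chains
    to the previous block. Header generation is kept separate from payload
    storage, loosely mirroring the 'decoupled block' idea in the abstract."""
    payload = json.dumps(sensor_records, sort_keys=True).encode()
    header = {
        "prev_hash": prev_hash,
        "payload_hash": sha256(payload),
        "timestamp": time.time(),
    }
    header["block_hash"] = sha256(json.dumps(header, sort_keys=True).encode())
    return {"header": header, "payload": sensor_records}

# An edge node packaging readings from in-house health sensors into a chain.
genesis = make_block("0" * 64, [{"sensor": "hr", "bpm": 72}])
block_1 = make_block(genesis["header"]["block_hash"],
                     [{"sensor": "spo2", "pct": 97}, {"sensor": "hr", "bpm": 75}])
print(block_1["header"]["prev_hash"] == genesis["header"]["block_hash"])   # True
```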

92 citations


Journal ArticleDOI
TL;DR: This article investigates the security problems for dual UAV-assisted mobile edge computing systems, where one UAV is invoked to help the ground terminal devices (TDs) to compute the offloaded tasks and the other one acts as a jammer to suppress the vicious eavesdroppers.
Abstract: Unmanned aerial vehicle (UAV) has been widely applied in Internet-of-Things (IoT) scenarios while the security for UAV communications remains a challenging problem due to the broadcast nature of the line-of-sight (LoS) wireless channels. This article investigates the security problems for dual UAV-assisted mobile edge computing (MEC) systems, where one UAV is invoked to help the ground terminal devices (TDs) to compute the offloaded tasks and the other one acts as a jammer to suppress the vicious eavesdroppers. In our framework, minimum secure computing capacity maximization problems are proposed for both the time division multiple access (TDMA) scheme and non-orthogonal multiple access (NOMA) scheme by jointly optimizing the communication resources, computation resources, and UAVs’ trajectories. The formulated problems are non-trivial and challenging to solve due to the highly coupled variables. To tackle these problems, we first transform them into more tractable ones, then a block coordinate descent based algorithm and a penalized block coordinate descent based algorithm are proposed to solve the problems for the TDMA and NOMA schemes, respectively. Finally, numerical results show that the security computing capacity performance of the systems is enhanced by the proposed algorithms as compared with the benchmarks. Meanwhile, the NOMA scheme is superior to the TDMA scheme for security improvement.

91 citations


Journal ArticleDOI
TL;DR: This paper provides a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-Learning paradigm with epsilon-greedy exploration strategy and proposes a distributed asynchronous framework and an early stop strategy.
Abstract: Convolutional neural networks have gained a remarkable success in computer vision. However, most popular network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-Learning paradigm with epsilon-greedy exploration strategy. The optimal network block is constructed by the learning agent which is trained to choose component layers sequentially. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early stop strategy. The block-wise generation brings unique advantages: (1) it yields state-of-the-art results in comparison to the hand-crafted networks on image classification, particularly, the best network generated by BlockQNN achieves 2.35 percent top-1 error rate on CIFAR-10. (2) it offers tremendous reduction of the search space in designing networks, spending only 3 days with 32 GPUs. A faster version can yield a comparable result with only 1 GPU in 20 hours. (3) it has strong generalizability in that the network built on CIFAR also performs well on the larger-scale dataset. The best network achieves very competitive accuracy of 82.0 percent top-1 and 96.0 percent top-5 on ImageNet.
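The agent's core decision rule is ordinary epsilon-greedy action selection over a Q-table whose actions are candidate component layers. The toy sketch below shows that loop; the layer vocabulary, the reward (a stand-in for the validation accuracy of the sampled block), and the hyperparameters are illustrative assumptions, not BlockQNN's actual search space.

```python
import random

LAYER_CHOICES = ["conv3x3", "conv5x5", "maxpool", "identity", "terminal"]

def epsilon_greedy(q_table, state, epsilon):
    """Pick the next layer type: random with probability epsilon, else the best known."""
    if random.random() < epsilon:
        return random.choice(LAYER_CHOICES)
    q_row = q_table.get(state, {})
    return max(LAYER_CHOICES, key=lambda a: q_row.get(a, 0.0))

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """One tabular Q-learning update."""
    q_row = q_table.setdefault(state, {})
    next_best = max(q_table.get(next_state, {}).values(), default=0.0)
    old = q_row.get(action, 0.0)
    q_row[action] = old + alpha * (reward + gamma * next_best - old)

# Toy episode: the reward stands in for the accuracy of the trained candidate block.
random.seed(0)
q_table, state = {}, ("start",)
for step in range(5):
    action = epsilon_greedy(q_table, state, epsilon=0.3)
    next_state = state + (action,)
    reward = random.random() if action == "terminal" else 0.0
    q_update(q_table, state, action, reward, next_state)
    if action == "terminal":
        break
    state = next_state
print(state)   # sequence of sampled component layers
```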

Journal ArticleDOI
TL;DR: This study proposes a neural network—a progressive-recursive image enhancement network (PRIEN)—to enhance low-light images and demonstrates the advantages of the method compared with other methods, from both qualitative and quantitative perspectives.
Abstract: Low-light images have low brightness and contrast, which presents a huge obstacle to computer vision tasks. Low-light image enhancement is challenging because multiple factors (such as brightness, contrast, artifacts, and noise) must be considered simultaneously. In this study, we propose a neural network—a progressive-recursive image enhancement network (PRIEN)—to enhance low-light images. The main idea is to use a recursive unit, composed of a recursive layer and a residual block, to repeatedly unfold the input image for feature extraction. Unlike in previous methods, in the proposed study, we directly input low-light images into the dual attention model for global feature extraction. Next, we use a combination of recurrent layers and residual blocks for local feature extraction. Finally, we output the enhanced image. Furthermore, we input the global feature map of dual attention into each stage in a progressive way. In the local feature extraction module, a recurrent layer shares depth features across stages. In addition, we perform recursive operations on a single residual block, significantly reducing the number of parameters while ensuring good network performance. Although the network structure is simple, it can produce good results for a range of low-light conditions. We conducted experiments on widely adopted datasets. The results demonstrate the advantages of our method compared with other methods, from both qualitative and quantitative perspectives.

Journal ArticleDOI
TL;DR: This work proposes a distributed sensor-fault detection and diagnosis system based on machine learning algorithms, where the fault detection block is implemented in the sensor in order to produce an output immediately after data collection, and shows the efficiency of the proposed fuzzy learning-based model over classic neuro-fuzzy and non-fuzzy learning approaches.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Nirkin et al. propose a real-time semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder, and the weights at each decoder block vary spatially.
Abstract: We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the weights at each decoder block vary spatially. For this purpose, we design a new type of hypernetwork, composed of a nested U-Net for drawing higher level context features, a multi-headed weight generating module which generates the weights of each block in the decoder immediately before they are consumed, for efficient memory utilization, and a primary network that is composed of novel dynamic patch-wise convolutions. Despite the usage of less-conventional blocks, our architecture obtains real-time performance. In terms of the runtime vs. accuracy trade-off, we surpass state of the art (SotA) results on popular semantic segmentation benchmarks: PASCAL VOC 2012 (val. set) and real-time semantic segmentation on Cityscapes, and CamVid. The code is available: https://nirkin.com/hyperseg.
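As a very rough sketch of the weight-generating idea (not the paper's spatially varying, patch-wise convolutions), the snippet below maps a context vector to the weights and biases of a small convolution and applies them through the functional conv2d interface. The class name `WeightGeneratingHead` and all dimensions are hypothetical, and a batch size of one is assumed because the generated weights differ per sample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGeneratingHead(nn.Module):
    """Toy hypernetwork head: a context vector (e.g., pooled encoder features)
    is mapped to the weights of a small conv that is then applied to the
    decoder features."""
    def __init__(self, ctx_dim=256, cin=32, cout=32, k=3):
        super().__init__()
        self.cin, self.cout, self.k = cin, cout, k
        self.gen = nn.Linear(ctx_dim, cout * cin * k * k + cout)  # weights + biases

    def forward(self, feat, ctx):                  # feat: (1, cin, H, W), ctx: (1, ctx_dim)
        params = self.gen(ctx).squeeze(0)
        w, b = params[:-self.cout], params[-self.cout:]
        w = w.view(self.cout, self.cin, self.k, self.k)
        return F.conv2d(feat, w, b, padding=self.k // 2)

feat = torch.randn(1, 32, 64, 64)                  # a decoder feature map
ctx = torch.randn(1, 256)                          # higher-level context features
print(WeightGeneratingHead()(feat, ctx).shape)     # torch.Size([1, 32, 64, 64])
```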

Proceedings ArticleDOI
13 Feb 2021
TL;DR: In this paper, the authors proposed a block-wise sparsity strategy for energy-efficient neural network (NN) processors, especially for low-power edge devices, and demonstrated a macro- and system-level design enabling multi-bit operations and sparsity support.
Abstract: Computing-in-memory (CIM) is an attractive approach for energy-efficient neural network (NN) processors, especially for low-power edge devices. Previous CIM chips [1]–[5] have demonstrated macro and system-level design enabling multi-bit operations and sparsity support. However, several challenges exist, as shown in Fig. 15.2.1. First, though a previously proposed block-wise sparsity strategy [5] can power off ADCs, zeros still contributed to storage requirements, and power gating was not applied to computing resources. Second, on-chip SRAM CIM macros are not large enough to hold all weights. Updating weights between computing operations leads to significant performance loss. Finally, the limited sensing margin incurs poor accuracy for large NN models on practical datasets, such as ImageNet. The precision and power of the ADCs should be optimized and adjusted.

Journal ArticleDOI
TL;DR: The simulation results prove that the proposed density-based content distribution method can obviously reduce the average transmission delay of content distribution under different network conditions and has better stability and self-adaptability under continuous time variation.
Abstract: The satellite-terrestrial networks (STN) utilize the spacious coverage and low transmission latency of Low Earth Orbit (LEO) constellation to distribute requested content for ground subscribers. With the development of storage and computing capacity of satellite onboard equipment, it is considered promising to leverage in-network caching technology on STN to improve content distribution efficiency. However, traditional ground network caching schemes are not suitable in STN, considering dynamic satellite propagation and time-varying topology. More specifically, the unevenness of user distribution results in difficulties for assurance of quality of experience. To address these problems, we firstly propose a density-based block division algorithm to divide the content subscribers into a series of blocks with different sizes according to user density. The LEO satellite orbit and time-varying network model is established to describe STN topology. Next, we propose an approximate minimum coverage vertex set algorithm and a novel cache node selection algorithm for optimal user blocks matching. The simulation results prove that the proposed density-based content distribution method can obviously reduce the average transmission delay of content distribution under different network conditions and has better stability and self-adaptability under continuous time variation.

Journal ArticleDOI
TL;DR: A UAV-aided mobile edge computing system to jointly minimize the energy consumption at the IoT devices and the UAVs during task execution is proposed and a block successive upper-bound minimization (BSUM) algorithm is introduced.
Abstract: Unmanned aerial vehicles (UAVs) have been deployed to enhance the network capacity and provide services to mobile users with or without infrastructure coverage. At the same time, we have observed the exponential growth in Internet of Things (IoT) devices and applications. However, as IoT devices have limited computation capacity and battery lifetime, it is challenging to process data locally on the devices. To this end, in this letter, a UAV-aided mobile edge computing system is proposed. The problem to jointly minimize the energy consumption at the IoT devices and the UAVs during task execution is studied by optimizing the task offloading decision, resource allocation mechanism and UAV’s trajectory while considering the communication and computation latency requirements. A non-convex structure of the formulated problem is revealed and shown to be challenging to solve. To address this challenge, a block successive upper-bound minimization (BSUM) algorithm is introduced. Finally, simulation results are provided to show the efficiency of our proposed algorithm.

Journal ArticleDOI
TL;DR: A new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods is presented.
Abstract: Motivation The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor analysis. Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. Results We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. Availability and implementation The package glmGamPoi is available from Bioconductor for Windows, macOS and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license. The scripts to reproduce the results of this paper are available on github.com/const-ae/glmGamPoi-Paper. Supplementary information Supplementary data are available at Bioinformatics online.
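glmGamPoi itself is an R/Bioconductor package; purely to illustrate the model being fitted, the Python sketch below estimates the mean and overdispersion of a Gamma-Poisson (negative binomial) sample by maximum likelihood using SciPy. The parameterization (mean mu and size theta, with variance mu + mu^2/theta) follows the usual Gamma-Poisson convention, but none of this code reflects the package's on-disk or low-count optimizations.

```python
import numpy as np
from scipy import optimize, stats

true_mu, true_theta = 3.0, 0.8                      # mean and overdispersion (size) parameter
counts = stats.nbinom.rvs(n=true_theta, p=true_theta / (true_theta + true_mu),
                          size=5000, random_state=1)

def neg_log_lik(params, y):
    """Negative log-likelihood of the Gamma-Poisson (negative binomial) model,
    parameterized by log(mu) and log(theta) so the optimizer stays unconstrained."""
    mu, theta = np.exp(params)
    p = theta / (theta + mu)
    return -stats.nbinom.logpmf(y, n=theta, p=p).sum()

res = optimize.minimize(neg_log_lik, x0=np.log([1.0, 1.0]),
                        args=(counts,), method="Nelder-Mead")
mu_hat, theta_hat = np.exp(res.x)
print(round(mu_hat, 2), round(theta_hat, 2))        # estimates should land near 3.0 and 0.8
```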

Journal ArticleDOI
TL;DR: An alternating optimization algorithm with guaranteed convergence is developed to minimize the maximum computation delay among IoT devices with the joint scheduling for association control, computation task allocation, transmission power and bandwidth allocation, UAV computation resource, and deployment position optimization.
Abstract: Space-aerial-assisted computation offloading has been recognized as a promising technique to provide ubiquitous computing services for remote Internet of Things (IoT) applications, such as forest fire monitoring and disaster rescue. This article considers a space-aerial-assisted mixed cloud-edge computing framework, where the flying unmanned aerial vehicles (UAVs) provide IoT devices with low-delay edge computing service and satellites provide ubiquitous access to cloud computing. We aim to minimize the maximum computation delay among IoT devices with the joint scheduling for association control, computation task allocation, transmission power and bandwidth allocation, UAV computation resource, and deployment position optimization. Through exploiting block coordinate descent and successive convex approximation, we develop an alternating optimization algorithm with guaranteed convergence, to solve the formulated problem. Extensive simulation results are provided to demonstrate the remarkable delay reduction of the proposed scheme than existing benchmark methods.

Book ChapterDOI
27 Sep 2021
TL;DR: In this paper, a network based on self-attention between neighboring patches and without any convolution operations was proposed to achieve better segmentation performance than a traditional CNN model for medical image segmentation.
Abstract: Like other applications in computer vision, medical image segmentation has been most successfully addressed using deep learning models that rely on the convolution operation as their main building block. Convolutions enjoy important properties such as sparse interactions, weight sharing, and translation equivariance. These properties give convolutional neural networks (CNNs) a strong and useful inductive bias for vision tasks. However, the convolution operation also has important shortcomings: it performs a fixed operation on every test image regardless of the content and it cannot efficiently model long-range interactions. In this work we show that a network based on self-attention between neighboring patches and without any convolution operations can achieve better results. Given a 3D image block, our network divides it into $n^3$ 3D patches, where $n = 3$ or $5$, and computes a 1D embedding for each patch. The network predicts the segmentation map for the center patch of the block based on the self-attention between these patch embeddings. We show that the proposed model can achieve higher segmentation accuracies than a state-of-the-art CNN. For scenarios with very few labeled images, we propose methods for pre-training the network on large corpora of unlabeled images. Our experiments show that with pre-training the advantage of our proposed network over CNNs can be significant when labeled training data is small.
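A minimal, convolution-free sketch of the idea is given below in PyTorch: a 3D block is cut into $n^3$ patches, each patch is embedded with a linear layer, self-attention is applied across the patch embeddings, and a segmentation map is predicted for the central patch. Patch size, embedding width, and head count are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class PatchAttentionSketch(nn.Module):
    """Toy convolution-free block: embed n^3 patches of a 3D block, apply
    self-attention among the patch embeddings, predict labels for the centre patch."""
    def __init__(self, n=3, patch=8, embed_dim=128, num_classes=2, heads=4):
        super().__init__()
        self.n, self.patch, self.num_classes = n, patch, num_classes
        self.embed = nn.Linear(patch ** 3, embed_dim)
        self.pos = nn.Parameter(torch.zeros(n ** 3, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.head = nn.Linear(embed_dim, num_classes * patch ** 3)

    def forward(self, block):                         # block: (B, n*patch, n*patch, n*patch)
        B, n, p = block.shape[0], self.n, self.patch
        patches = (block.reshape(B, n, p, n, p, n, p)
                        .permute(0, 1, 3, 5, 2, 4, 6)
                        .reshape(B, n ** 3, p ** 3))  # (B, n^3, p^3)
        tokens = self.embed(patches) + self.pos
        tokens, _ = self.attn(tokens, tokens, tokens)
        centre = tokens[:, (n ** 3) // 2]             # embedding of the central patch
        return self.head(centre).reshape(B, self.num_classes, p, p, p)

x = torch.randn(2, 24, 24, 24)                        # a 24^3 block = 3^3 patches of 8^3 voxels
print(PatchAttentionSketch()(x).shape)                # torch.Size([2, 2, 8, 8, 8])
```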

Journal ArticleDOI
TL;DR: A lightweight single image super-resolution network with an expectation-maximization attention mechanism (EMASRN) for better balancing performance and applicability and the experimental results demonstrate the superiority of the EMASRN over state-of-the-art lightweight SISR methods in terms of both quantitative metrics and visual quality.
Abstract: In recent years, with the rapid development of deep learning, super-resolution methods based on convolutional neural networks (CNNs) have made great progress. However, the parameters and the required consumption of computing resources of these methods are also increasing to the point that such methods are difficult to implement on devices with low computing power. To address this issue, we propose a lightweight single image super-resolution network with an expectation-maximization attention mechanism (EMASRN) for better balancing performance and applicability. Specifically, a progressive multi-scale feature extraction block (PMSFE) is proposed to extract feature maps of different sizes. Furthermore, we propose an HR-size expectation-maximization attention block (HREMAB) that directly captures the long-range dependencies of HR-size feature maps. We also utilize a feedback network to feed the high-level features of each generation into the next generation's shallow network. Compared with the existing lightweight single image super-resolution (SISR) methods, our EMASRN reduces the number of parameters by almost one-third. The experimental results demonstrate the superiority of our EMASRN over state-of-the-art lightweight SISR methods in terms of both quantitative metrics and visual quality. The source code can be downloaded at https://github.com/xyzhu1/EMASRN.

Proceedings ArticleDOI
17 Oct 2021
TL;DR: In this article, an edge-oriented convolution block (ECB) is proposed for efficient and light-weight super resolution (SR) design, which extracts features in multiple paths, including a normal 3 x 3 convolution, a channel expanding-and-squeezing convolution and 1st-order and 2nd-order spatial derivatives from intermediate features.
Abstract: Efficient and light-weight super resolution (SR) is highly demanded in practical applications. However, most of the existing studies focusing on reducing the number of model parameters and FLOPs may not necessarily lead to faster running speed on mobile devices. In this work, we propose a re-parameterizable building block, namely Edge-oriented Convolution Block (ECB), for efficient SR design. In the training stage, the ECB extracts features in multiple paths, including a normal 3 × 3 convolution, a channel expanding-and-squeezing convolution, and 1st-order and 2nd-order spatial derivatives from intermediate features. In the inference stage, the multiple operations can be merged into one single 3 × 3 convolution. ECB can be regarded as a drop-in replacement to improve the performance of normal 3 × 3 convolution without introducing any additional cost in the inference stage. We then propose an extremely efficient SR network for mobile devices based on ECB, namely ECBSR. Extensive experiments across five benchmark datasets demonstrate the effectiveness and efficiency of ECB and ECBSR. Our ECBSR achieves comparable PSNR/SSIM performance to state-of-the-art light-weight SR models, while it can super resolve images from 270p/540p to 1080p in real-time on commodity mobile devices, e.g., Snapdragon 865 SOC and Dimensity 1000+ SOC. The source code can be found at https://github.com/xindongzhang/ECBSR.

Journal ArticleDOI
TL;DR: This article investigates a backscatter-assisted data offloading in OFDMA-based wireless-powered MEC for IoT systems and concludes that the FEA is the best solution which results in a near-globally-optimal solution at a much lower complexity as compared to benchmark schemes.
Abstract: Mobile-edge computing (MEC) has emerged as a prominent technology to overcome sudden demands on computation-intensive applications of the Internet of Things (IoT) with finite processing capabilities. Nevertheless, the limited energy resources also seriously hinder IoT devices from offloading tasks that consume high power in active RF communications. Despite the development of energy harvesting (EH) techniques, the harvested energy from surrounding environments could be inadequate for power-hungry tasks. Fortunately, backscatter communications (Backcom) is an intriguing technology to narrow the gap between the power needed for communication and harvested power. Motivated by these considerations, this article investigates a backscatter-assisted data offloading in OFDMA-based wireless-powered (WP) MEC for IoT systems. Specifically, we aim at maximizing the sum computation rate by jointly optimizing the transmit power at the gateway (GW), backscatter coefficient, time-splitting (TS) ratio, and binary decision-making matrices. This problem is challenging to solve due to its nonconvexity. To find solutions, we first simplify the problem by determining the optimal values of transmit power of the GW and backscatter coefficient. Then, the original problem is decomposed into two subproblems, namely, TS ratio optimization with given offloading decision matrices and offloading decision optimization with given TS ratio. Especially, a closed-form expression for the TS ratio is obtained which greatly enhances the CPU execution time. Based on the solutions of the two subproblems, an efficient algorithm, termed the fast-efficient algorithm (FEA), is proposed by leveraging the block coordinate descent method. Then, it is compared with exhaustive search (ES), the bisection-based algorithm (BA), edge computing (EC), and local computing (LC) used as reference methods. As a result, the FEA is the best solution which results in a near-globally-optimal solution at a much lower complexity as compared to benchmark schemes. For instance, the CPU execution time of FEA is about 0.029 s in a 50-user network, which is tailored for ultralow latency applications of IoT networks.

Journal ArticleDOI
TL;DR: The location-only deep learning method is shown to offer a particularly practical and robust solution alleviating the need for CSI estimation and feedback when line-of-sight (LoS) direct links exist between UEs and the AP.
Abstract: In this paper, we explore optimization-based and data-driven solutions in a reconfigurable intelligent surface (RIS)-aided multi-user mobile edge computing (MEC) system, where the user equipment (UEs) can partially offload their computation tasks to the access point (AP). We aim at maximizing the total completed task-input bits (TCTB) of all UEs with limited energy budgets during a given time slot, through jointly optimizing the RIS reflecting coefficients, the AP’s receive beamforming vectors, and the UEs’ energy partition strategies for local computing and offloading. A three-step block coordinate descending (BCD) algorithm is first proposed to effectively solve the non-convex TCTB maximization problem with guaranteed convergence. In order to reduce the computational complexity and facilitate lightweight online implementation of the optimization algorithm, we further construct two deep learning architectures. The first one takes channel state information (CSI) as input, while the second one exploits the UEs’ locations only for online inference. The two data-driven approaches are trained using data samples generated by the BCD algorithm via supervised learning. Our simulation results reveal a close match between the performance of the optimization-based BCD algorithm and the low-complexity learning-based architectures, all with superior performance to existing schemes in both cases with perfect and imperfect input features. Importantly, the location-only deep learning method is shown to offer a particularly practical and robust solution alleviating the need for CSI estimation and feedback when line-of-sight (LoS) direct links exist between UEs and the AP.

Journal ArticleDOI
TL;DR: Experimental results reveal that the approach outperforms many state-of-the-art techniques in terms of payload, imperceptibility, computational complexity, and capability to detect and localize tamper.
Abstract: Internet of Medical Things (IoMT)-driven smart health and emotional care is revolutionizing the healthcare industry by embracing several technologies related to multimodal physiological data collection, communication, intelligent automation, and efficient manufacturing. The authentication and secure exchange of electronic health records (EHRs), comprising of patient data collected using wearable sensors and laboratory investigations, is of paramount importance. In this article, we present a novel high payload and reversible EHR embedding framework to secure the patient information successfully and authenticate the received content. The proposed approach is based on novel left data mapping (LDM), pixel repetition method (PRM), RC4 encryption, and checksum computation. The input image of size $M \times N$ is upscaled by using PRM that guarantees reversibility with lesser computational complexity. The binary secret data are encrypted using the RC4 encryption algorithm and then the encrypted data are grouped into 3-bit chunks and converted into decimal equivalents. Before embedding, these decimal digits are encoded by LDM. To embed the shifted data, the cover image is divided into $2\times 2$ blocks and then in each block, two digits are embedded into the counter diagonal pixels. For tamper detection and localization, a checksum digit computed from the block is embedded into one of the main diagonal pixels. A fragile logo is embedded into the cover images in addition to EHR to facilitate early tamper detection. The average peak signal to noise ratio (PSNR) of the stego-images obtained is 41.95 dB for a very high embedding capacity of 2.25 bits per pixel. Furthermore, the embedding time is less than 0.2 s. Experimental results reveal that our approach outperforms many state-of-the-art techniques in terms of payload, imperceptibility, computational complexity, and capability to detect and localize tamper. All the attributes affirm that the proposed scheme is a potential candidate for providing better security and authentication solutions for IoMT-based smart health.
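To make the embedding mechanics concrete, the NumPy sketch below performs the pixel-repetition upscaling and hides two base-8 digits (3-bit chunks written as decimals, per the abstract) in the counter-diagonal pixels of each 2 × 2 block, then shows that both the digits and the original image can be recovered. LDM shifting, RC4 encryption, the checksum digit, and overflow handling are deliberately omitted, so this is only a simplified illustration of the reversible scheme, not the paper's algorithm.

```python
import numpy as np

def prm_upscale(img):
    """Pixel repetition method: each pixel becomes a 2x2 block of identical values."""
    return np.kron(img, np.ones((2, 2), dtype=img.dtype))

def embed(cover, digits):
    """Embed two base-8 digits per 2x2 block into the counter-diagonal pixels."""
    stego = cover.astype(np.int16)
    h, w = cover.shape
    idx = 0
    for by in range(0, h, 2):
        for bx in range(0, w, 2):
            if idx + 1 >= len(digits):
                return stego
            stego[by, bx + 1] += digits[idx]        # counter-diagonal pixel 1
            stego[by + 1, bx] += digits[idx + 1]    # counter-diagonal pixel 2
            idx += 2
    return stego

def extract(stego):
    """Recover the digits and the original (pre-embedding) image."""
    h, w = stego.shape
    digits = []
    original = stego[0::2, 0::2].astype(np.uint8)   # top-left pixel of each block is untouched
    for by in range(0, h, 2):
        for bx in range(0, w, 2):
            seed = stego[by, bx]
            digits.extend([stego[by, bx + 1] - seed, stego[by + 1, bx] - seed])
    return digits, original

rng = np.random.default_rng(0)
img = rng.integers(0, 248, size=(4, 4), dtype=np.uint8)   # keep headroom; overflow handling omitted
payload = [5, 2, 7, 0, 1, 3]                               # base-8 digits (3-bit chunks)
stego = embed(prm_upscale(img), payload)
digits, recovered = extract(stego)
print([int(d) for d in digits[:6]] == payload, np.array_equal(recovered, img))   # True True
```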

Journal ArticleDOI
TL;DR: Based on the benefits of digital CIM, reconfigurability, and bit-serial computing architecture, the Colonnade can achieve both high performance and energy efficiency for processing neural networks.
Abstract: This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1–16 bit input and weight precisions based on bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from the existing analog and digital implementations. First, its full-digital circuit implementation is free from process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full-adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. So far, most of the analog macros were used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1–16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead while sacrificing latency due to bit-by-bit operation cycles. Based on the benefits of digital CIM, reconfigurability, and bit-serial computing architecture, the Colonnade can achieve both high performance and energy efficiency (i.e., both benefits of prior analog and digital accelerators) for processing neural networks. A test-chip with $128 \times 128$ SRAM-based bitcells for digital bit-serial computing is implemented using 65-nm technology and tested with 1–16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
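The arithmetic behind bit-serial computing is easy to state in software: stream the inputs one bit-plane at a time, accumulate a column MAC per bit, and shift-and-add the partial sums. The Python sketch below demonstrates that equivalence for unsigned operands; the macro's XNOR-based signed bitcells and adder trees are simplified away, so this is a functional illustration only.

```python
import numpy as np

def bit_serial_dot(weights, inputs, in_bits=8):
    """Dot product computed bit-serially over the input bits (LSB to MSB).

    Each "cycle" processes one input bit-plane: a column of 1-bit multiplies is
    accumulated, then shifted according to the bit weight. Signed/XNOR bitcell
    arithmetic used by the actual macro is simplified to unsigned AND here.
    """
    acc = 0
    for b in range(in_bits):                       # one cycle per input bit position
        bit_plane = (inputs >> b) & 1              # serialize inputs LSB-first
        partial = int(np.dot(weights, bit_plane))  # in-memory column MAC for this bit
        acc += partial << b                        # shift-and-add into the accumulator
    return acc

rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=128)                  # 4-bit weights stored in a 128-cell column
x = rng.integers(0, 256, size=128)                 # 8-bit inputs streamed bit by bit
print(bit_serial_dot(w, x, in_bits=8) == int(np.dot(w, x)))   # True
```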

Journal ArticleDOI
01 Mar 2021
TL;DR: A 3-D multi-UAV deployment approach to provide Quality-of-Service requirements for different types of user distributions in the presence of co-channel interference by maximizing the minimum achievable system throughput for all of the ground users is proposed.
Abstract: Over the past few years, there has been a growing interest in using unmanned aerial vehicles (UAVs) for high-rate wireless communication systems due to their highly flexible deployment and maneuverability. The aim of this article is to propose a 3-D multi-UAV deployment approach to provide Quality-of-Service (QoS) requirements for different types of user distributions in the presence of co-channel interference by maximizing the minimum achievable system throughput for all of the ground users. The proposed approach is divided into two separate algorithms. In the first algorithm, by using the mean-shift technique and prior knowledge of users’ positions provided by the global positioning system (GPS), it has been shown that one can simultaneously find $xy$ coordinates of UAVs, which are associated with the maximum of users’ density, and schedule users to UAVs. Once the $xy$-Cartesian coordinates of UAVs are determined, UAVs’ altitudes and transmit powers are separately optimized. Since these problems are nonconvex optimizations, the successive convex optimization technique has been applied to approximate their nonconvex constraints. In the second algorithm, the block coordinate descent technique is leveraged to jointly optimize UAVs’ altitudes and transmit powers by tightening the bounds obtained for approximations. It is then proven that the suggested algorithm is guaranteed to converge. The computational complexity of the proposed placement approach is derived. Numerical experiments are carried out to evaluate the performance of our technique and show its superiority to conventional benchmarks.
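The first algorithm's use of the mean-shift technique can be illustrated directly with scikit-learn: density modes of the user positions give candidate $xy$ coordinates for the UAVs, and the cluster labels give the user-to-UAV scheduling. The user distribution and bandwidth below are invented for illustration and are not the paper's setup.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical GPS positions (metres) of ground users: three hotspots of different density.
rng = np.random.default_rng(0)
users = np.vstack([
    rng.normal(loc=[0, 0], scale=40, size=(120, 2)),
    rng.normal(loc=[500, 300], scale=60, size=(80, 2)),
    rng.normal(loc=[-300, 600], scale=50, size=(50, 2)),
])

# Mean shift places one mode at each density peak: the modes give the UAVs'
# xy coordinates, and the cluster labels give the user-to-UAV scheduling.
ms = MeanShift(bandwidth=150).fit(users)
uav_xy = ms.cluster_centers_
assignment = ms.labels_
print(uav_xy.round(1))
print(np.bincount(assignment))        # number of users scheduled to each UAV
```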

Journal ArticleDOI
TL;DR: The intra prediction and mode coding of the Versatile Video Coding (VVC) standard is presented and a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric.
Abstract: This paper presents the intra prediction and mode coding of the Versatile Video Coding (VVC) standard. This standard was collaboratively developed by the Joint Video Experts Team (JVET). It follows the traditional architecture of a hybrid block-based codec that was also the basis of previous standards. Almost all intra prediction features of VVC either contain substantial modifications in comparison with its predecessor H.265/HEVC or were newly added. The key aspects of these tools are the following: 65 angular intra prediction modes with block shape-adaptive directions and 4-tap interpolation filters are supported as well as the DC and Planar mode, Position Dependent Prediction Combination is applied for most of these modes, Multiple Reference Line Prediction can be used, an intra block can be further subdivided by the Intra Subpartition mode, Matrix-based Intra Prediction is supported, and the chroma prediction signal can be generated by the Cross Component Linear Model method. Finally, the intra prediction mode in VVC is coded separately for luma and chroma. Here, a Most Probable Mode list containing six modes is applied for luma. The individual compression performance of tools is reported in this paper. For the full VVC intra codec, a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric. Significant subjective benefits are illustrated with specific examples.

Journal ArticleDOI
TL;DR: A dilated transaction access and retrieval method that identifies the transaction history based on the non-replicated identity and recursive organization of the block and performs relevance based retrieval to improve the correctness of transaction assessment.
Abstract: Blockchain technology is designed to improve the security features and information access of a transaction in a connected Internet of Things platform. The private information retrieval from the transactions using blockchain improves the quality of experience through systematic assessments. However, the information retrieval from the fore-gone transaction does not result in maximum profit due to time and sequence of transactions. This article introduces a dilated transaction access and retrieval method. The proposed method identifies the transaction history based on the non-replicated identity and recursive organization of the block. A non-recurrent binary searching process assists information access and retrieval randomly. The random process increases the time, and therefore, a transaction-time constraint is used to limit the number of random searches. In this method, multi-random searches are initiated in a branched manner for identifying the block. Pursued by this access, the relevance based retrieval is performed to improve the correctness of transaction assessment.

Journal ArticleDOI
TL;DR: In this paper, a UAV-assisted mobile-edge computing (MEC) system was considered, in which a moving UAV equipped with computing resources was employed to help user devices (UDs) compute their tasks.
Abstract: Unmanned aerial vehicles (UAVs) have been introduced into wireless communication systems to provide high-quality services and enhanced coverage due to their high mobility. In this article, we study a UAV-assisted mobile-edge computing (MEC) system in which a moving UAV equipped with computing resources is employed to help user devices (UDs) compute their tasks. The computing tasks of each UD can be divided into two parts: one portion is processed locally and the remaining portion is offloaded to the UAV for computing. Offloading is enabled by uplink and downlink communications between UDs and the UAV. On this basis, two types of access modes are considered, namely, nonorthogonal and orthogonal multiple access. For both access modes, we formulate new optimization problems to minimize the weighted-sum energy consumption of the UAV and UDs by jointly optimizing the UAV trajectory and computation resource allocation, under the constraint on the number of computation bits. These problems are nonconvex optimization problems that are difficult to solve directly. Accordingly, we develop alternating iterative algorithms to solve them based on the block alternating descent method. Specifically, the UAV trajectory and computation resource allocation are alternately optimized in each iteration. Extensive simulation results demonstrate the significant energy savings of our proposed joint design over the benchmarks.