
Showing papers by "Huawei" published in 2021


Journal ArticleDOI
TL;DR: 6G, with technical requirements beyond those of 5G, will enable faster and farther-reaching communications, to the extent that the boundary between the physical and cyber worlds disappears.
Abstract: The fifth generation (5G) wireless communication networks have been deployed worldwide since 2020, and more capabilities are in the process of being standardized, such as mass connectivity, ultra-reliability, and guaranteed low latency. However, 5G will not meet all requirements of the future in 2030 and beyond, and sixth generation (6G) wireless communication networks are expected to provide global coverage, enhanced spectral/energy/cost efficiency, a higher level of intelligence and security, and more. To meet these requirements, 6G networks will rely on new enabling technologies, i.e., air interface and transmission technologies and novel network architecture, such as waveform design, multiple access, channel coding schemes, multi-antenna technologies, network slicing, cell-free architecture, and cloud/fog/edge computing. Our vision of 6G is that it will bring four new paradigm shifts. First, to satisfy the requirement of global coverage, 6G will not be limited to terrestrial communication networks, which will need to be complemented with non-terrestrial networks such as satellite and unmanned aerial vehicle (UAV) communication networks, thus achieving a space-air-ground-sea integrated communication network. Second, all spectra will be fully explored to further increase data rates and connection density, including the sub-6 GHz, millimeter wave (mmWave), terahertz (THz), and optical frequency bands. Third, facing the big datasets generated by extremely heterogeneous networks, diverse communication scenarios, large numbers of antennas, wide bandwidths, and new service requirements, 6G networks will enable a new range of smart applications with the aid of artificial intelligence (AI) and big data technologies. Fourth, network security will have to be strengthened when developing 6G networks. This article provides a comprehensive survey of recent advances and future trends in these four aspects. Clearly, 6G with technical requirements beyond those of 5G will enable faster and farther-reaching communications, to the extent that the boundary between the physical and cyber worlds disappears.

935 citations


Journal ArticleDOI
TL;DR: This work focuses on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries; it studies the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compares methods in terms of required memory, computation time, and storage.
Abstract: Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are 1) a taxonomy and extensive overview of the state of the art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.

866 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a pre-trained image processing transformer (IPT) for denoising, super-resolution and deraining, trained on corrupted image pairs with multi-heads and multi-tails.
Abstract: As the computing power of modern hardware increases strongly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally excavate the capability of the transformer, we utilize the well-known ImageNet benchmark to generate a large amount of corrupted image pairs. The IPT model is trained on these images with multi-heads and multi-tails. In addition, contrastive learning is introduced to adapt the model well to different image processing tasks. The pre-trained model can therefore be efficiently employed on a desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT
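
The multi-head/multi-tail layout lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of the idea of task-specific heads and tails around a shared transformer body; module sizes, task names, and the token handling are illustrative assumptions, not the released IPT configuration.

```python
import torch
import torch.nn as nn

class MultiTaskIPT(nn.Module):
    """Sketch of a shared body with per-task heads/tails (hypothetical sizes)."""
    def __init__(self, tasks=("denoise", "sr_x2", "derain"), dim=64):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Conv2d(3, dim, 3, padding=1) for t in tasks})
        self.body = nn.TransformerEncoder(          # shared transformer backbone
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), 2)
        self.tails = nn.ModuleDict({t: nn.Conv2d(dim, 3, 3, padding=1) for t in tasks})

    def forward(self, x, task):
        f = self.heads[task](x)                      # task-specific head
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        tokens = self.body(tokens)                   # shared processing
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.tails[task](f)                   # task-specific tail

model = MultiTaskIPT()
out = model(torch.rand(1, 3, 48, 48), task="denoise")
print(out.shape)  # torch.Size([1, 3, 48, 48])
```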

416 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an in-depth tutorial of the 3GPP Release 16 5G NR V2X standard for vehicular communications, with a particular focus on the sidelink.
Abstract: The Third Generation Partnership Project (3GPP) has recently published its Release 16, which includes the first Vehicle-to-Everything (V2X) standard based on the 5G New Radio (NR) air interface. 5G NR V2X introduces advanced functionalities on top of the 5G NR air interface to support connected and automated driving use cases with stringent requirements. This article presents an in-depth tutorial of the 3GPP Release 16 5G NR V2X standard for V2X communications, with a particular focus on the sidelink, since it is the most significant part of 5G NR V2X. The main part of the paper is an in-depth treatment of the key aspects of 5G NR V2X: the physical layer, the resource allocation, the quality of service management, the enhancements introduced to the Uu interface and the mobility management for V2N (Vehicle-to-Network) communications, as well as the co-existence mechanisms between 5G NR V2X and LTE V2X. We also review the use cases and the system architecture, and describe the evaluation methodology and simulation assumptions for 5G NR V2X. Finally, we provide an outlook on possible 5G NR V2X enhancements, including those identified within Release 17.

350 citations


Journal ArticleDOI
TL;DR: In this article, the authors focus on convergent 6G communication, localization and sensing systems by identifying key technology enablers, discussing their underlying challenges, implementation issues, and recommending potential solutions.
Abstract: Herein, we focus on convergent 6G communication, localization and sensing systems by identifying key technology enablers, discussing their underlying challenges and implementation issues, and recommending potential solutions. Moreover, we discuss exciting new opportunities for integrated localization and sensing applications, which will disrupt traditional design principles and revolutionize the way we live, interact with our environment, and do business. Regarding potential enabling technologies, 6G will continue to develop towards even higher frequency ranges, wider bandwidths, and massive antenna arrays. In turn, this will enable sensing solutions with very fine range, Doppler, and angular resolutions, as well as localization with cm-level accuracy. Besides, new materials, device types, and reconfigurable surfaces will allow network operators to reshape and control the electromagnetic response of the environment. At the same time, machine learning and artificial intelligence will leverage the unprecedented availability of data and computing resources to tackle the biggest and hardest problems in wireless communication systems. As a result, 6G will be a truly intelligent wireless system that provides not only ubiquitous communication but also high-accuracy localization and high-resolution sensing services. It will become the catalyst for this revolution by bringing about a unique new set of features and service capabilities, where localization and sensing will coexist with communication, continuously sharing the available resources in time, frequency, and space. This work concludes by highlighting foundational research challenges, as well as implications and opportunities related to privacy, security, and trust.

224 citations


Journal ArticleDOI
TL;DR: The struggles of designing a family of polar codes able to satisfy the demands of 5G systems are illustrated, with particular attention to rate flexibility and low decoding latency.
Abstract: Polar codes have attracted the attention of academia and industry alike in the past decade, such that the 5th generation (5G) wireless systems standardization process of the 3rd Generation Partnership Project (3GPP) chose polar codes as a channel coding scheme. In this tutorial, we provide a description of the encoding process of polar codes adopted by the 5G standard. We illustrate the struggles of designing a family of polar codes able to satisfy the demands of 5G systems, with particular attention to rate flexibility and low decoding latency. The result of these efforts is an elaborate framework that applies novel coding techniques to provide a solid channel code for NR requirements.
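
The encoding process described in the tutorial builds on the basic polar transform: the codeword is the payload vector multiplied, over GF(2), by the n-fold Kronecker power of the 2x2 Arikan kernel. A minimal NumPy sketch, assuming a toy block length and a hypothetical frozen set; the 5G standard additionally specifies a reliability sequence, CRC attachment, and rate matching, and some formulations include a bit-reversal permutation omitted here.

```python
import numpy as np

def polar_encode(info_bits, frozen_mask):
    """Encode via x = u @ F^(kron n) over GF(2).

    frozen_mask: boolean array of length N (True = frozen position, bit 0).
    info_bits:   array with N - sum(frozen_mask) payload bits.
    """
    N = len(frozen_mask)
    n = int(np.log2(N))
    u = np.zeros(N, dtype=np.uint8)
    u[~np.asarray(frozen_mask)] = info_bits          # payload on reliable positions

    F = np.array([[1, 0], [1, 1]], dtype=np.uint8)   # Arikan kernel
    G = F
    for _ in range(n - 1):                           # n-fold Kronecker power
        G = np.kron(G, F)
    return (u @ G) % 2                               # codeword over GF(2)

# Toy example: N = 8, rate 1/2, with a hypothetical frozen set.
frozen = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)
print(polar_encode(np.array([1, 0, 1, 1], dtype=np.uint8), frozen))
```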

197 citations


Proceedings ArticleDOI
Qinwei Xu1, Ruipeng Zhang1, Ya Zhang1, Yanfeng Wang1, Qi Tian2 
01 Jun 2021
TL;DR: In this article, a Fourier-based data augmentation strategy called amplitude mix is proposed to force the model to capture phase information, which linearly interpolates between the amplitude spectrums of two images.
Abstract: Modern deep neural networks suffer from performance degradation when evaluated on testing data whose distribution differs from the training data. Domain generalization aims to tackle this problem by learning transferable knowledge from multiple source domains in order to generalize to unseen target domains. This paper introduces a novel Fourier-based perspective for domain generalization. The main assumption is that the Fourier phase information contains high-level semantics and is not easily affected by domain shifts. To force the model to capture phase information, we develop a novel Fourier-based data augmentation strategy called amplitude mix, which linearly interpolates between the amplitude spectra of two images. A dual-formed consistency loss called co-teacher regularization is further introduced between the predictions induced from original and augmented images. Extensive experiments on three benchmarks demonstrate that the proposed method achieves state-of-the-art performance for domain generalization.
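
The amplitude-mix augmentation is easy to prototype. A minimal NumPy sketch, assuming single-channel float images of equal size; the paper's full recipe (mixing-ratio sampling and the co-teacher regularization) may differ in detail.

```python
import numpy as np

def amplitude_mix(img_a, img_b, lam):
    """Linearly interpolate amplitude spectra while keeping img_a's phase."""
    fft_a, fft_b = np.fft.fft2(img_a), np.fft.fft2(img_b)
    amp_a, amp_b = np.abs(fft_a), np.abs(fft_b)
    phase_a = np.angle(fft_a)                      # phase carries the semantics
    amp_mix = lam * amp_a + (1.0 - lam) * amp_b    # interpolate amplitudes only
    mixed = amp_mix * np.exp(1j * phase_a)
    return np.real(np.fft.ifft2(mixed))

rng = np.random.default_rng(0)
a, b = rng.random((32, 32)), rng.random((32, 32))
augmented = amplitude_mix(a, b, lam=rng.uniform(0.0, 1.0))
```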

184 citations


Posted Content
TL;DR: In this paper, the authors provide a comprehensive overview on the background, range of key applications and state-of-the-art approaches of Integrated Sensing and Communications (ISAC).
Abstract: As the standardization of 5G solidifies, researchers are speculating what 6G will be. Integrating sensing functionality is emerging as a key feature of the 6G Radio Access Network (RAN), making it possible to exploit the dense cell infrastructure of 5G to construct a perceptive network. In this paper, we provide a comprehensive overview of the background, key applications and state-of-the-art approaches of Integrated Sensing and Communications (ISAC). We commence by discussing the interplay between sensing and communications (S&C) from a historical point of view, and then consider the multiple facets of ISAC and its performance gains. By introducing both ongoing and potential use cases, we shed light on industrial progress and standardization activities related to ISAC. We analyze a number of performance tradeoffs between S&C, spanning information-theoretic limits, tradeoffs in physical-layer performance, and tradeoffs in cross-layer designs. Next, we discuss the signal processing aspects of ISAC, namely ISAC waveform design and receive signal processing. As a step further, we provide our vision on the deeper integration of S&C within the framework of perceptive networks, where the two functionalities are expected to mutually assist each other, i.e., communication-assisted sensing and sensing-assisted communications. Finally, we summarize the paper by identifying the potential integration of ISAC with other emerging communication technologies, and their positive impact on the future of wireless networks.

181 citations


Proceedings ArticleDOI
Mu Hu1, Shuling Wang1, Bin Li1, Shiyu Ning2, Li Fan2, Xiaojin Gong1 
30 May 2021
TL;DR: PENet, presented at ICRA 2021, proposes a two-branch backbone consisting of a color-dominant branch and a depth-dominant branch to exploit and fuse the two modalities thoroughly, together with a simple geometric convolutional layer that encodes 3D geometric cues.
Abstract: Image-guided depth completion is the task of generating a dense depth map from a sparse depth map and a high-quality image. In this task, how to fuse the color and depth modalities plays an important role in achieving good performance. This paper proposes a two-branch backbone that consists of a color-dominant branch and a depth-dominant branch to exploit and fuse the two modalities thoroughly. More specifically, one branch inputs a color image and a sparse depth map to predict a dense depth map. The other branch takes as inputs the sparse depth map and the previously predicted depth map, and outputs a dense depth map as well. The depth maps predicted by the two branches are complementary to each other and are therefore adaptively fused. In addition, we propose a simple geometric convolutional layer to encode 3D geometric cues. The geometry-encoded backbone conducts the fusion of the different modalities at multiple stages, leading to good depth completion results. We further implement a dilated and accelerated CSPN++ to refine the fused depth map efficiently. The proposed full model ranked 1st in the KITTI depth completion online leaderboard at the time of submission. It also infers much faster than most of the top-ranked methods. The code of this work is available at https://github.com/JUGGHM/PENet_ICRA2021.
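
A geometric convolutional layer of this kind can be sketched as an ordinary convolution whose input is augmented with per-pixel 3D coordinates back-projected from the depth map through the camera intrinsics. The class below is a hypothetical PyTorch illustration; the name GeometryConv, the intrinsics handling, and the exact set of appended channels are assumptions, not PENet's released code.

```python
import torch
import torch.nn as nn

class GeometryConv(nn.Module):
    def __init__(self, in_ch, out_ch, fx, fy, cx, cy):
        super().__init__()
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        # +3 input channels for the (X, Y, Z) position maps
        self.conv = nn.Conv2d(in_ch + 3, out_ch, kernel_size=3, padding=1)

    def forward(self, feat, depth):
        b, _, h, w = depth.shape
        v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        u = u.to(depth).expand(b, 1, h, w)
        v = v.to(depth).expand(b, 1, h, w)
        x = (u - self.cx) / self.fx * depth       # pinhole back-projection
        y = (v - self.cy) / self.fy * depth
        return self.conv(torch.cat([feat, x, y, depth], dim=1))

layer = GeometryConv(in_ch=32, out_ch=64, fx=720.0, fy=720.0, cx=64.0, cy=48.0)
out = layer(torch.randn(2, 32, 96, 128), torch.rand(2, 1, 96, 128))
```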

156 citations


Proceedings ArticleDOI
20 Jun 2021
TL;DR: Neighbor2Neighbor uses a random neighbor sub-sampler to generate training image pairs, satisfying the requirement that paired pixels of the paired images are neighbors and have very similar appearance.
Abstract: In the last few years, image denoising has benefited greatly from the fast development of neural networks. However, the requirement of large amounts of noisy-clean image pairs for supervision limits the wide use of these models. Although there have been a few attempts at training an image denoising model with only single noisy images, existing self-supervised denoising approaches suffer from inefficient network training, loss of useful information, or dependence on noise modeling. In this paper, we present a very simple yet effective method named Neighbor2Neighbor to train an effective image denoising model with only noisy images. First, a random neighbor sub-sampler is proposed for the generation of training image pairs. In detail, the input and target used to train the network are images sub-sampled from the same noisy image, satisfying the requirement that paired pixels of the paired images are neighbors and have very similar appearance. Second, a denoising network is trained on the sub-sampled training pairs generated in the first stage, with a proposed regularizer as an additional loss for better performance. The proposed Neighbor2Neighbor framework is able to benefit from the progress of state-of-the-art supervised denoising networks in network architecture design. Moreover, it avoids heavy dependence on assumptions about the noise distribution. We explain our approach from a theoretical perspective and further validate it through extensive experiments, including synthetic experiments with different noise distributions in sRGB space and real-world experiments on a denoising benchmark dataset in raw-RGB space.
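
The random neighbor sub-sampler can be sketched directly from the description above: tile the noisy image into 2x2 cells and draw two different pixels per cell to form the paired half-resolution images. A minimal NumPy sketch under that reading; the regularizer and training loop are omitted, and the exact sampling details may differ from the paper.

```python
import numpy as np

def neighbor_subsample(noisy, rng):
    """noisy: (H, W) array with H, W even. Returns two (H/2, W/2) images."""
    h, w = noisy.shape
    cells = noisy.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    cells = cells.reshape(h // 2, w // 2, 4)          # 4 pixels per 2x2 cell
    idx1 = rng.integers(0, 4, size=(h // 2, w // 2))  # first neighbor
    shift = rng.integers(1, 4, size=(h // 2, w // 2))
    idx2 = (idx1 + shift) % 4                         # a *different* neighbor
    rows, cols = np.ogrid[: h // 2, : w // 2]
    return cells[rows, cols, idx1], cells[rows, cols, idx2]

rng = np.random.default_rng(0)
g1, g2 = neighbor_subsample(rng.random((64, 64)), rng)  # network input / target
```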

Proceedings ArticleDOI
Danny Kai Pin Tan1, Jia He1, Yanchun Li1, Alireza Bayesteh1, Yan Chen1, Peiying Zhu1, Wen Tong1 
23 Feb 2021
TL;DR: In this paper, the authors discuss novel applications, key performance requirements, challenges and future research directions for Integrated Sensing and Communication (ISAC) design in 6G, and demonstrate how the information obtained through sensing can significantly improve the performance of communication.
Abstract: In 6G, it is envisaged that the sensing and communication functions will coexist and be fully integrated in one system, sharing the same resources in the time, frequency and space domains, as well as key elements including the waveform, signal processing, hardware, etc. This paper discusses novel applications, key performance requirements, challenges and future research directions for Integrated Sensing and Communication (ISAC) design in 6G. First, four categories of ISAC use cases are described as new services in 6G, together with their corresponding performance requirements on 6G design. In addition, it is demonstrated how the information obtained through sensing can significantly improve the performance of communication. Thereafter, the challenges in the design and evaluation of a practical ISAC system are discussed, including sensing and communication performance tradeoffs, hardware imperfections, as well as the investigation of an appropriate channel model. Finally, future research directions are presented to set the path for designing an efficient ISAC network.

Posted ContentDOI
12 Mar 2021-medRxiv
TL;DR: In this paper, meta-transcriptome sequencing of 1,067 nasopharyngeal swab samples collected between May 9th and Jun 29th, 2020 during the first peak of the local COVID-19 epidemic was conducted.
Abstract: To unravel the source of SARS-CoV-2 introduction and the pattern of its spread and evolution in the United Arab Emirates, we conducted meta-transcriptome sequencing of 1,067 nasopharyngeal swab samples collected between May 9th and June 29th, 2020, during the first peak of the local COVID-19 epidemic. We identified the global clade distribution and eleven novel genetic variants that were almost absent in the rest of the world and that defined five subclades specific to the UAE viral population. Cross-settlement human-to-human transmission was related to local business activity. Perhaps surprisingly, at least 5% of the population were co-infected by SARS-CoV-2 of multiple clades within the same host. We also discovered an enrichment of cytosine-to-uracil mutations among the viral population collected from the nasopharynx, which differs from the adenosine-to-inosine change previously reported in bronchoalveolar lavage fluid samples, and a previously unidentified upregulation of APOBEC4 expression in the nasopharynx of infected patients, indicating that the innate immune host response mediated by the ADAR and APOBEC gene families could be tissue-specific. The genomic epidemiological and molecular biological knowledge reported here provides new insights into SARS-CoV-2 evolution and transmission and points out future directions for investigating host-pathogen interactions.

Proceedings ArticleDOI
Ran Cheng1, Ryan Razani1, Ehsan Taghavi1, Enxu Li1, Bingbing Liu1 
20 Jun 2021
TL;DR: In this paper, an end-to-end encoder-decoder CNN network for 3D LiDAR semantic segmentation is proposed, where a multibranch attentive feature fusion module in the encoder and a unique adaptive feature selection module with feature map re-weighting in the decoder are introduced.
Abstract: Autonomous robotic systems and self-driving cars rely on accurate perception of their surroundings, as the safety of passengers and pedestrians is the top priority. Semantic segmentation is one of the essential components of road scene perception, providing semantic information about the surrounding environment. Recently, several methods have been introduced for 3D LiDAR semantic segmentation. While they can lead to improved performance, they either suffer from high computational complexity, and are therefore inefficient, or they lose the fine details of smaller object instances. To alleviate these problems, we propose (AF)2-S3Net, an end-to-end encoder-decoder CNN network for 3D LiDAR semantic segmentation. We present a novel multi-branch attentive feature fusion module in the encoder and a unique adaptive feature selection module with feature map re-weighting in the decoder. Our (AF)2-S3Net fuses voxel-based and point-based learning methods into a unified framework to effectively process potentially large 3D scenes. Our experimental results show that the proposed method outperforms the state-of-the-art approaches on the large-scale nuScenes-lidarseg and SemanticKITTI benchmarks, ranking 1st on both public leaderboards upon publication.

Journal ArticleDOI
TL;DR: A novel Hierarchical Long Short-Term Concurrent Memory (H-LSTCM) is proposed to model the long-term inter-related dynamics among a group of persons for recognizing human interactions; it is validated by comparison against baseline and state-of-the-art methods.
Abstract: In this work, we aim to address the problem of human interaction recognition in videos by exploring the long-term inter-related dynamics among multiple persons. Recently, Long Short-Term Memory (LSTM) has become a popular choice for modeling individual dynamics in single-person action recognition, due to its ability to capture temporal motion information over a range. However, most existing LSTM-based methods focus only on capturing the dynamics of human interaction by simply combining all dynamics of individuals or modeling them as a whole. Such methods neglect the inter-related dynamics of how human interactions change over time. To this end, we propose a novel Hierarchical Long Short-Term Concurrent Memory (H-LSTCM) to model the long-term inter-related dynamics among a group of persons for recognizing human interactions. Specifically, we first feed each person's static features into a Single-Person LSTM to model the single-person dynamics. Subsequently, at each time step, the outputs of all Single-Person LSTM units are fed into a novel Concurrent LSTM (Co-LSTM) unit, which mainly consists of multiple sub-memory units, a new cell gate, and a new co-memory cell. In the Co-LSTM unit, each sub-memory unit stores individual motion information, while the Co-LSTM unit selectively integrates and stores inter-related motion information between multiple interacting persons from the multiple sub-memory units via the cell gate and co-memory cell, respectively. Extensive experiments on several public datasets validate the effectiveness of the proposed H-LSTCM by comparison against baseline and state-of-the-art methods.

Posted Content
TL;DR: Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent, and outperforms DeiT-B by 2.3% on ImageNet.
Abstract: Within a Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but have difficulty capturing global representations. Within a visual transformer, the cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of both convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer is rooted in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAP for object detection and instance segmentation, respectively, demonstrating its great potential as a general backbone network. Code is available at this https URL.

Journal ArticleDOI
TL;DR: An overhead-aware resource allocation framework is proposed for wireless networks where reconfigurable intelligent surfaces are used to improve the communication performance; an overhead model is incorporated into the expressions of the system rate and energy efficiency.
Abstract: Reconfigurable intelligent surfaces have emerged as a promising technology for future wireless networks. Given that a large number of reflecting elements is typically used and that the surface has no signal processing capabilities, a major challenge is to cope with the overhead required to estimate the channel state information and to report the optimized phase shifts to the surface. This issue has not been addressed by previous works, which do not explicitly consider the overhead during the resource allocation phase. This work aims to fill this gap by developing an overhead-aware resource allocation framework for wireless networks in which reconfigurable intelligent surfaces are used to improve the communication performance. An overhead model is proposed and incorporated into the expressions of the system rate and energy efficiency, which are then optimized with respect to the phase shifts of the reconfigurable intelligent surface, the transmit and receive filters, and the power and bandwidth used for the communication and feedback phases. The bi-objective maximization of the rate and energy efficiency is investigated, too. The proposed framework characterizes the trade-off between optimized radio resource allocation policies and the related overhead in networks with reconfigurable intelligent surfaces.
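
The core trade-off can be illustrated with a deliberately simple toy model (not the paper's): estimating channels and reporting phase shifts consumes a slice of each frame that grows with the number of reflecting elements N, while the beamforming gain also grows with N, so an overhead-aware design has an interior optimum. All constants below are assumed values for illustration.

```python
import numpy as np

T, B = 10e-3, 1e6          # frame duration [s] and bandwidth [Hz] (assumed)
snr0, t_el = 0.01, 10e-6   # per-element SNR gain and per-element overhead time

N = np.arange(1, 1000)                       # number of RIS elements
t_oh = t_el * N                              # estimation/feedback overhead
rate = (1 - t_oh / T) * B * np.log2(1 + snr0 * N**2)   # data phase shrinks with N
print("overhead-aware optimum:", N[np.argmax(rate)], "elements")
```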

Proceedings ArticleDOI
Jianyuan Guo1, Kai Han1, Yunhe Wang1, Han Wu2, Xinghao Chen1, Chunjing Xu1, Chang Xu2 
01 Jun 2021
TL;DR: This paper presents a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector that is able to surpass the state-of-the-art distillation methods for object detection.
Abstract: Knowledge distillation is a widely used paradigm for inheriting information from a complicated teacher network into a compact student network while maintaining strong performance. Different from image classification, object detectors are much more sophisticated, with multiple loss functions in which the features that semantic information relies on are entangled. In this paper, we point out that the information in features derived from regions excluding objects is also essential for distilling the student detector, which is usually ignored in existing approaches. In addition, we elucidate that features from different regions should be assigned different importance during distillation. To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector. Specifically, two levels of decoupled features are processed for embedding useful information into the student, i.e., decoupled features from the neck and decoupled proposals from the classification head. Extensive experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection. For example, DeFeat improves ResNet50-based Faster R-CNN from 37.4% to 40.9% mAP, and improves ResNet50-based RetinaNet from 36.5% to 39.7% mAP on the COCO benchmark. Code will be released.
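
The decoupling idea — imitate teacher features separately inside and outside ground-truth boxes, with independent weights — can be sketched in a few lines. A minimal PyTorch illustration under assumed loss weights; DeFeat's proposal-level branch and exact normalization are omitted.

```python
import torch

def decoupled_feature_loss(f_student, f_teacher, obj_mask, w_obj=2.0, w_bg=4.0):
    """f_*: (B, C, H, W) neck features; obj_mask: (B, 1, H, W), 1 inside GT boxes."""
    bg_mask = 1.0 - obj_mask
    se = (f_student - f_teacher) ** 2                 # per-element imitation error
    loss_obj = (se * obj_mask).sum() / obj_mask.sum().clamp(min=1.0)
    loss_bg = (se * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return w_obj * loss_obj + w_bg * loss_bg          # background gets its own weight

fs, ft = torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.8).float()      # toy ground-truth-box mask
print(decoupled_feature_loss(fs, ft, mask))
```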

Proceedings ArticleDOI
18 Mar 2021
TL;DR: This paper proposes a two-stage model for diverse inpainting, in which the first stage generates multiple coarse results, each with a different structure, and the second stage refines each coarse result separately by augmenting texture.
Abstract: Given an incomplete image without additional constraints, image inpainting natively allows for multiple solutions as long as they appear plausible. Recently, multiple-solution inpainting methods have been proposed and have shown the potential of generating diverse results. However, these methods have difficulty ensuring the quality of each solution, e.g., they produce distorted structure and/or blurry texture. We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results, each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture. The proposed model is inspired by the hierarchical vector quantized variational auto-encoder (VQ-VAE), whose hierarchical architecture disentangles structural and textural information. In addition, the vector quantization in VQ-VAE enables autoregressive modeling of the discrete distribution over the structural information. Sampling from the distribution can easily generate diverse and high-quality structures, making up the first stage of our model. In the second stage, we propose a structural attention module inside the texture generation network, where the module utilizes the structural information to capture distant correlations. We further reuse the VQ-VAE to calculate two feature losses, which help improve structure coherence and texture realism, respectively. Experimental results on the CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images. Code and models are available at: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.

Journal ArticleDOI
TL;DR: A novel learning framework, termed Adversarial Reciprocal Point Learning (ARPL), is proposed to minimize the overlap between the known distribution and unknown distributions without loss of known classification accuracy; extensive experimental results indicate that the proposed method is significantly superior to existing approaches and achieves state-of-the-art performance.
Abstract: Open set recognition (OSR), which aims to simultaneously classify the seen classes and identify the unseen classes as unknown, is essential for reliable machine learning. The key challenge of OSR is how to simultaneously reduce the empirical classification risk on the labeled known data and the open space risk on the potential unknown data. To handle this challenge, we formulate the open space risk problem from the perspective of multi-class integration, and model the unexploited extra-class space with a novel concept, the Reciprocal Point. Following this, a novel Adversarial Reciprocal Point Learning framework is proposed to minimize the overlap between the known distribution and unknown distributions without loss of known classification accuracy. Specifically, each reciprocal point is learned from the extra-class space of the corresponding known category, and the confrontation among multiple known categories is employed to reduce the empirical classification risk. An adversarial margin constraint is proposed to reduce the open space risk by limiting the latent open space constructed by the reciprocal points. Moreover, an instantiated adversarial enhancement method is designed to generate diverse and confusing training samples. Extensive experimental results on various benchmark datasets indicate that the proposed method is significantly superior to existing approaches and achieves state-of-the-art performance.

Journal ArticleDOI
TL;DR: In this article, a general decoupling method based on a new perspective of common mode (CM) and differential mode (DM) cancellation is proposed for two closely spaced antennas, where the mutual coupling effect can be analyzed and solved by exciting them simultaneously with in-phase and out-of-phase signals.
Abstract: In this article, a general decoupling method based on a new perspective of common-mode (CM) and differential-mode (DM) cancellation is proposed. For two closely spaced antennas, the mutual coupling effect can be analyzed and solved by exciting them simultaneously with in-phase (CM) and out-of-phase (DM) signals. It is theoretically proved that, if the CM and DM impedances are the same, the mutual coupling effect between the two separated antennas can be totally eliminated. Therefore, we can solve the coupling problem by CM and DM impedance analysis and exploit the unique field properties of characteristic modes to assist in antenna decoupling in a physically intuitive way. To validate the feasibility of this method, two practical design examples, covering the decoupling of closely spaced dipole antennas and of planar inverted-F antennas, are presented. Both design examples demonstrate that the proposed method provides a systematic design guideline for antenna decoupling and achieves better decoupling performance than conventional decoupling techniques. We expect that the proposed decoupling scheme, with its simplified decoupling procedure, has great potential for application in antenna arrays and multi-input multi-output (MIMO) systems.
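
The cancellation condition follows from the standard even/odd-mode decomposition of a symmetric two-port; in the notation below (ours, not the article's), equal CM and DM impedances force the transmission coefficient, and hence the mutual coupling, to vanish:

```latex
\begin{align}
  S_{11} &= \tfrac{1}{2}\bigl(\Gamma_{\mathrm{CM}} + \Gamma_{\mathrm{DM}}\bigr),
  \qquad
  S_{21} = \tfrac{1}{2}\bigl(\Gamma_{\mathrm{CM}} - \Gamma_{\mathrm{DM}}\bigr),
  \\
  \Gamma_{m} &= \frac{Z_{m}-Z_{0}}{Z_{m}+Z_{0}},
  \quad m \in \{\mathrm{CM},\mathrm{DM}\}
  \quad\Longrightarrow\quad
  Z_{\mathrm{CM}} = Z_{\mathrm{DM}}
  \;\Rightarrow\;
  \Gamma_{\mathrm{CM}} = \Gamma_{\mathrm{DM}}
  \;\Rightarrow\;
  S_{21} = 0 .
\end{align}
```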

Posted Content
TL;DR: Transformer iN Transformer (TNT) treats local patches as "visual sentences", divides them further into smaller "visual words", and computes attention at both levels to encode the input as powerful features.
Abstract: The transformer is a new kind of neural architecture that encodes the input data as powerful features via the attention mechanism. Basically, visual transformers first divide the input images into several local patches and then calculate both their representations and their relationships. Since natural images are of high complexity with abundant detail and color information, the granularity of the patch division is not fine enough for excavating features of objects at different scales and locations. In this paper, we point out that the attention inside these local patches is also essential for building visual transformers with high performance, and we explore a new architecture, namely, Transformer iN Transformer (TNT). Specifically, we regard the local patches (e.g., 16×16) as "visual sentences" and further divide them into smaller patches (e.g., 4×4) as "visual words". The attention of each word is calculated with the other words in the given visual sentence at negligible computational cost. Features of both words and sentences are aggregated to enhance the representation ability. Experiments on several benchmarks demonstrate the effectiveness of the proposed TNT architecture, e.g., we achieve an 81.5% top-1 accuracy on ImageNet, which is about 1.7% higher than that of the state-of-the-art visual transformer with similar computational cost. The PyTorch code is available at this https URL, and the MindSpore code is at this https URL.
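
The sentence/word split is mechanical and worth seeing concretely. A minimal PyTorch sketch of the patch division only, with shapes following the abstract's example sizes (16×16 sentences, 4×4 words); the embedding layers and the inner/outer transformer blocks are omitted.

```python
import torch

def to_sentences_and_words(images, patch=16, word=4):
    """images: (B, C, H, W) -> sentences (B, N, C*patch*patch) and
    words (B*N, M, C*word*word), i.e. N sentences of M words each."""
    b, c, h, w = images.shape
    unfold = torch.nn.Unfold(kernel_size=patch, stride=patch)
    sent = unfold(images).transpose(1, 2)              # (B, N, C*16*16)
    n = sent.shape[1]
    inner = sent.reshape(b * n, c, patch, patch)       # each patch as a tiny image
    words = torch.nn.Unfold(kernel_size=word, stride=word)(inner).transpose(1, 2)
    return sent, words

s, w = to_sentences_and_words(torch.randn(2, 3, 224, 224))
print(s.shape, w.shape)  # torch.Size([2, 196, 768]) torch.Size([392, 16, 48])
```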

Posted Content
TL;DR: This paper introduces the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformer for long-range dependency extraction and achieves state-of-the-art performance.
Abstract: Weakly supervised object localization (WSOL) is a challenging problem: given only image category labels, it requires learning object localization models. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of the CNN, where the convolution operations produce local receptive fields and have difficulty capturing long-range feature dependencies among pixels. We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in the visual transformer for long-range dependency extraction. TS-CAM first splits an image into a sequence of patch tokens for spatial embedding, which produces attention maps of long-range visual dependency that avoid partial activation. TS-CAM then re-allocates category-related semantics to the patch tokens, enabling each of them to be aware of object categories. TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization. Experiments on the ILSVRC/CUB-200-2011 datasets show that TS-CAM outperforms its CNN-CAM counterparts by 7.1%/27.1% for WSOL, achieving state-of-the-art performance.
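
The final coupling step can be sketched as an element-wise product of the two maps the paper extracts. The inputs below are random placeholders standing in for the transformer's semantic-agnostic attention map and the re-allocated per-class semantic maps; the threshold is an assumed value, not the paper's.

```python
import torch
import torch.nn.functional as F

def ts_cam(attn_map, semantic_map, cls_idx, img_size=224):
    """attn_map: (h, w) semantic-agnostic; semantic_map: (K, h, w) per class."""
    cam = attn_map * semantic_map[cls_idx]            # element-wise coupling
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    cam = F.interpolate(cam[None, None], size=(img_size, img_size),
                        mode="bilinear", align_corners=False)[0, 0]
    return cam > 0.4 * cam.max()                      # binary localization mask

mask = ts_cam(torch.rand(14, 14), torch.rand(200, 14, 14), cls_idx=7)
print(mask.shape, mask.any())
```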

Posted Content
TL;DR: DetCo explores the contrasts between the global image and local image patches to learn discriminative representations for object detection, achieving state-of-the-art results.
Abstract: Unsupervised contrastive learning achieves great success in learning image representations with CNNs. Unlike most recent methods that focus on improving the accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between the global image and local image patches to learn discriminative representations for object detection. DetCo has several appealing benefits. (1) It is carefully designed by investigating the weaknesses of current self-supervised methods, which discard representations that are important for object detection. (2) DetCo builds hierarchical intermediate contrastive losses between the global image and local patches to improve object detection, while maintaining global representations for image recognition. Theoretical analysis shows that the local patches actually remove the contextual information of an image, improving the lower bound of mutual information for better contrastive learning. (3) Extensive experiments on PASCAL VOC, COCO and Cityscapes demonstrate that DetCo not only outperforms state-of-the-art methods on object detection, but also on segmentation, pose estimation, and 3D shape prediction, while remaining competitive on image classification. For example, on PASCAL VOC, DetCo-100ep achieves 57.4 mAP, which is on par with the result of MoCov2-800ep. Moreover, DetCo consistently outperforms the supervised method by 1.6/1.2/1.0 AP on Mask R-CNN-C4/FPN/RetinaNet with the 1x schedule. Code will be released at this https URL.

Proceedings ArticleDOI
26 Oct 2021
TL;DR: UltraGCN proposes an ultra-simplified formulation of GCNs that skips infinite layers of message passing for efficient recommendation; instead of explicit message passing, it directly approximates the limit of infinite-layer graph convolutions via a constraint loss.
Abstract: With the recent success of graph convolutional networks (GCNs), they have been widely applied to recommendation and have achieved impressive performance gains. The core of GCNs lies in the message passing mechanism used to aggregate neighborhood information. However, we observed that message passing largely slows down the convergence of GCNs during training, especially for large-scale recommender systems, which hinders their wide adoption. LightGCN makes an early attempt to simplify GCNs for collaborative filtering by omitting feature transformations and nonlinear activations. In this paper, we take one step further and propose an ultra-simplified formulation of GCNs (dubbed UltraGCN), which skips infinite layers of message passing for efficient recommendation. Instead of explicit message passing, UltraGCN resorts to directly approximating the limit of infinite-layer graph convolutions via a constraint loss. Meanwhile, UltraGCN allows for more appropriate edge weight assignments and flexible adjustment of the relative importance among different types of relationships. This finally yields a simple yet effective UltraGCN model, which is easy to implement and efficient to train. Experimental results on four benchmark datasets show that UltraGCN not only outperforms the state-of-the-art GCN models but also achieves more than 10x speedup over LightGCN.
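
The constraint-loss idea can be sketched as a degree-weighted link-prediction objective over observed user-item edges, standing in for the infinite-layer propagation limit. The weight formula below is our reading of the paper's beta coefficient and may differ in detail; negative sampling and the item-item graph terms are omitted.

```python
import torch

def constraint_loss(user_emb, item_emb, users, items, deg_u, deg_i):
    """users/items: (E,) index tensors of observed edges; deg_*: node degrees."""
    beta = (1.0 / deg_u[users]) * torch.sqrt(
        (deg_u[users] + 1.0) / (deg_i[items] + 1.0))   # degree-based edge weight
    scores = (user_emb[users] * item_emb[items]).sum(-1)  # e_u^T e_i per edge
    return -(beta * torch.nn.functional.logsigmoid(scores)).mean()

U, I, d = 100, 50, 16
ue = torch.randn(U, d, requires_grad=True)
ie = torch.randn(I, d, requires_grad=True)
edges_u, edges_i = torch.randint(0, U, (300,)), torch.randint(0, I, (300,))
du = torch.bincount(edges_u, minlength=U).float().clamp(min=1)
di = torch.bincount(edges_i, minlength=I).float().clamp(min=1)
loss = constraint_loss(ue, ie, edges_u, edges_i, du, di)
loss.backward()   # no message passing anywhere in the forward pass
```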

Journal ArticleDOI
TL;DR: In this paper, the authors provide a vision for scalable and trustworthy edge AI systems with integrated design of wireless communication strategies and decentralized machine learning models, as well as a holistic end-to-end system architecture to support edge AI.
Abstract: The thriving of artificial intelligence (AI) applications is driving the further evolution of wireless networks. It has been envisioned that 6G will be transformative and will revolutionize the evolution of wireless from "connected things" to "connected intelligence". However, state-of-the-art deep learning and big data analytics based AI systems require tremendous computation and communication resources, causing significant latency, energy consumption, network congestion, and privacy leakage in both the training and inference processes. By embedding model training and inference capabilities into the network edge, edge AI stands out as a disruptive technology for 6G that seamlessly integrates sensing, communication, computation, and intelligence, thereby improving the efficiency, effectiveness, privacy, and security of 6G networks. In this paper, we provide our vision for scalable and trustworthy edge AI systems with integrated design of wireless communication strategies and decentralized machine learning models. New design principles for wireless networks, service-driven resource allocation optimization methods, and a holistic end-to-end system architecture to support edge AI are described. Standardization, software and hardware platforms, and application scenarios are also discussed to facilitate the industrialization and commercialization of edge AI systems.

Journal ArticleDOI
TL;DR: This Perspective discusses the development, structure, and materials of skyrmions, reviews the recent progress in skyrmion devices for memory and logic applications, and discusses their challenges and prospects.
Abstract: Skyrmions have received considerable attention in various studies since the experimental observation in magnetic materials in 2009. Skyrmions, which are topological, particle-like localized structures, show significant fundamental research value in the field of physics and materials and are also regarded as novel information carriers that have the potential for use in developing high-density, low-power, and multi-functional spintronic devices. In this Perspective, we first overview the development, structure, and materials of skyrmions. Subsequently, we focus on the recent progress in skyrmion devices for memory and logic applications and discuss their challenges and prospects.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Zhang et al. as mentioned in this paper proposed a Context-Aware Biaffine Localizing Network (CBLN) which incorporates both local and global contexts into features of each start/end position for biaffin-based localization.
Abstract: This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. Previous works either compare pre-defined candidate segments with the query and select the best one by ranking, or directly regress the boundary timestamps of the target segment. In this paper, we propose a novel localization framework that scores all pairs of start and end indices within the video simultaneously with a biaffine mechanism. In particular, we present a Context-aware Biaffine Localizing Network (CBLN) which incorporates both local and global contexts into the features of each start/end position for biaffine-based localization. The local contexts from the adjacent frames help distinguish visually similar appearances, and the global contexts from the entire video contribute to reasoning about temporal relations. Besides, we also develop a multi-modal self-attention module to provide fine-grained query-guided video representations for this biaffine strategy. Extensive experiments show that our CBLN significantly outperforms the state of the art on three public datasets (ActivityNet Captions, TACoS, and Charades-STA), demonstrating the effectiveness of the proposed localization framework. The code is available at https://github.com/liudaizong/CBLN.
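
Biaffine scoring of all (start, end) pairs is compact enough to sketch. A minimal PyTorch illustration with bias-augmented features; the feature extraction and the multi-modal self-attention module are omitted, and the dimensions are illustrative, not CBLN's configuration.

```python
import torch
import torch.nn as nn

class BiaffineScorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim + 1, dim + 1) * 0.01)

    def forward(self, start_feat, end_feat):
        """start_feat, end_feat: (T, D). Returns (T, T) scores for all pairs."""
        ones = torch.ones(start_feat.shape[0], 1)
        s = torch.cat([start_feat, ones], dim=-1)   # bias-augmented start features
        e = torch.cat([end_feat, ones], dim=-1)     # bias-augmented end features
        return s @ self.W @ e.T                     # score[i, j] for span (i, j)

T = 64
scorer = BiaffineScorer(dim=256)
scores = scorer(torch.randn(T, 256), torch.randn(T, 256))
valid = torch.triu(torch.ones(T, T, dtype=torch.bool))    # spans need start <= end
i, j = divmod(scores.masked_fill(~valid, float("-inf")).argmax().item(), T)
print("predicted span:", (i, j))
```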

Journal ArticleDOI
Wen Wu1, Nan Chen1, Conghao Zhou1, Mushu Li1, Xuemin Shen1, Weihua Zhuang1, Xu Li2 
TL;DR: This paper proposes a two-layer constrained RL algorithm, named RAWS, which effectively reduces the system cost while satisfying QoS requirements with a high probability, as compared with benchmarks.
Abstract: In this paper, we investigate a radio access network (RAN) slicing problem for Internet of vehicles (IoV) services with different quality of service (QoS) requirements, in which multiple logically-isolated slices are constructed on a common roadside network infrastructure. A dynamic RAN slicing framework is presented to dynamically allocate radio spectrum and computing resources, and to distribute computation workloads for the slices. To obtain an optimal RAN slicing policy for accommodating the spatial-temporal dynamics of vehicle traffic density, we first formulate a constrained RAN slicing problem with the objective of minimizing the long-term system cost. This problem cannot be directly solved by traditional reinforcement learning (RL) algorithms due to the complicated coupled constraints among decisions. Therefore, we decouple the problem into a resource allocation subproblem and a workload distribution subproblem, and propose a two-layer constrained RL algorithm, named Resource Allocation and Workload diStribution (RAWS), to solve them. Specifically, an outer layer first makes the resource allocation decision via an RL algorithm, and then an inner layer makes the workload distribution decision via an optimization subroutine. Extensive trace-driven simulations show that RAWS effectively reduces the system cost while satisfying the QoS requirements with high probability, as compared with benchmarks.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, the authors introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches, and extensively evaluate their method on three synthetic and two real benchmarks where they show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame interpolation methods.
Abstract: State-of-the-art frame interpolation methods generate intermediate frames by inferring object motions in the image from consecutive key-frames. In the absence of additional information, first-order approximations, i.e., optical flow, must be used, but this choice restricts the types of motions that can be modeled, leading to errors in highly dynamic scenarios. Event cameras are novel sensors that address this limitation by providing auxiliary visual information in the blind-time between frames. They asynchronously measure per-pixel brightness changes and do this with high temporal resolution and low latency. Event-based frame interpolation methods typically adopt a synthesis-based approach, where predicted frame residuals are directly applied to the key-frames. However, while these approaches can capture non-linear motions, they suffer from ghosting and perform poorly in low-texture regions with few events. Thus, synthesis-based and flow-based approaches are complementary. In this work, we introduce Time Lens, a novel method that leverages the advantages of both. We extensively evaluate our method on three synthetic and two real benchmarks, where we show up to a 5.21 dB improvement in PSNR over state-of-the-art frame-based and event-based methods. Finally, we release a new large-scale dataset in highly dynamic scenarios, aimed at pushing the limits of existing methods.