
Showing papers by "Huawei" published in 2019


Journal ArticleDOI
TL;DR: In this article, the authors developed energy-efficient designs for both the transmit power allocation and the phase shifts of the surface reflecting elements subject to individual link budget guarantees for the mobile users.
Abstract: The adoption of a reconfigurable intelligent surface (RIS) for downlink multi-user communication from a multi-antenna base station is investigated in this paper. We develop energy-efficient designs for both the transmit power allocation and the phase shifts of the surface reflecting elements, subject to individual link budget guarantees for the mobile users. This leads to non-convex design optimization problems, which we tackle with two computationally affordable approaches that capitalize on alternating maximization, gradient descent search, and sequential fractional programming. Specifically, one algorithm employs gradient descent for obtaining the RIS phase coefficients and fractional programming for optimal transmit power allocation, while the second algorithm employs sequential fractional programming for the optimization of the RIS phase shifts. In addition, a realistic power consumption model for RIS-based systems is presented, and the performance of the proposed methods is analyzed in a realistic outdoor environment. In particular, our results show that the proposed RIS-based resource allocation methods can provide up to 300% higher energy efficiency than regular multi-antenna amplify-and-forward relaying.

1,967 citations
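
The alternating structure of these algorithms is easy to see in miniature. Below is a toy sketch, not the paper's algorithm: it alternates numerical gradient ascent on the RIS phases with a crude power update standing in for the fractional-programming step, on an invented channel model where each user is served by one BS antenna. All names (G, hr, energy_efficiency) and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 32, 3                     # BS antennas, RIS elements, users
G = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)   # BS -> RIS
hr = (rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))) / np.sqrt(2)  # RIS -> users
sigma2, P_fix = 1.0, 10.0              # noise power, static circuit power

def energy_efficiency(p, theta):
    """EE = sum-rate / total power; user k is served by BS antenna k (toy)."""
    H = hr @ np.diag(np.exp(1j * theta)) @ G   # effective K x M channel
    gains = np.abs(H[:, :K]) ** 2              # K x K toy precoding gains
    sig = np.diag(gains) * p
    intf = gains @ p - sig
    rate = np.sum(np.log2(1.0 + sig / (intf + sigma2)))
    return rate / (np.sum(p) + P_fix)

p = np.full(K, 1.0)                     # transmit powers
theta = rng.uniform(0, 2 * np.pi, N)    # RIS phase shifts
for it in range(50):                    # alternating maximization
    grad = np.zeros(N)                  # (a) numerical gradient ascent on phases
    for n in range(N):
        d = np.zeros(N); d[n] = 1e-4
        grad[n] = (energy_efficiency(p, theta + d)
                   - energy_efficiency(p, theta - d)) / 2e-4
    theta = np.mod(theta + 2.0 * grad, 2 * np.pi)
    for k in range(K):                  # (b) crude stand-in for the FP power step
        d = np.zeros(K); d[k] = 1e-4
        g = (energy_efficiency(p + d, theta)
             - energy_efficiency(p - d, theta)) / 2e-4
        p[k] = np.clip(p[k] + 0.5 * g, 1e-3, 10.0)

print(f"final EE (toy units): {energy_efficiency(p, theta):.3f}")
```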


Journal ArticleDOI
TL;DR: This paper overviews the current research efforts on smart radio environments, the enabling technologies to realize them in practice, the need for new communication-theoretic models for their analysis and design, and the long-term and open research issues to be solved towards their massive deployment.
Abstract: Future wireless networks are expected to constitute a distributed intelligent wireless communications, sensing, and computing platform, which will have the challenging requirement of interconnecting the physical and digital worlds in a seamless and sustainable manner. Currently, two main factors prevent wireless network operators from building such networks: (1) the lack of control of the wireless environment, whose impact on the radio waves cannot be customized, and (2) the current operation of wireless radios, which consume a lot of power because new signals are generated whenever data has to be transmitted. In this paper, we challenge the usual “more data needs more power and emission of radio waves” status quo, and motivate that future wireless networks necessitate a smart radio environment: a transformative wireless concept, where the environmental objects are coated with artificial thin films of electromagnetic and reconfigurable material (that are referred to as reconfigurable intelligent meta-surfaces), which are capable of sensing the environment and of applying customized transformations to the radio waves. Smart radio environments have the potential to provide future wireless networks with uninterrupted wireless connectivity, and with the capability of transmitting data without generating new signals but recycling existing radio waves. We will discuss, in particular, two major types of reconfigurable intelligent meta-surfaces applied to wireless networks. The first type of meta-surfaces will be embedded into, e.g., walls, and will be directly controlled by the wireless network operators via a software controller in order to shape the radio waves for, e.g., improving the network coverage. The second type of meta-surfaces will be embedded into objects, e.g., smart t-shirts with sensors for health monitoring, and will backscatter the radio waves generated by cellular base stations in order to report their sensed data to mobile phones. These functionalities will enable wireless network operators to offer new services without the emission of additional radio waves, but by recycling those already existing for other purposes. This paper overviews the current research efforts on smart radio environments, the enabling technologies to realize them in practice, the need for new communication-theoretic models for their analysis and design, and the long-term and open research issues to be solved towards their massive deployment. In a nutshell, this paper is focused on discussing how the availability of reconfigurable intelligent meta-surfaces will allow wireless network operators to redesign common and well-known network communication paradigms.

1,504 citations


Journal ArticleDOI
TL;DR: In this article, a comprehensive tutorial on the potential benefits and applications of UAVs in wireless communications is presented, and the important challenges and the fundamental tradeoffs in UAV-enabled wireless networks are thoroughly investigated.
Abstract: The use of flying platforms such as unmanned aerial vehicles (UAVs), popularly known as drones, is rapidly growing. In particular, with their inherent attributes such as mobility, flexibility, and adaptive altitude, UAVs admit several key potential applications in wireless systems. On the one hand, UAVs can be used as aerial base stations to enhance coverage, capacity, reliability, and energy efficiency of wireless networks. On the other hand, UAVs can operate as flying mobile terminals within a cellular network. Such cellular-connected UAVs can enable several applications ranging from real-time video streaming to item delivery. In this paper, a comprehensive tutorial on the potential benefits and applications of UAVs in wireless communications is presented. Moreover, the important challenges and the fundamental tradeoffs in UAV-enabled wireless networks are thoroughly investigated. In particular, the key UAV challenges such as 3D deployment, performance analysis, channel modeling, and energy efficiency are explored along with representative results. Then, open problems and potential research directions pertaining to UAV communications are introduced. Finally, various analytical frameworks and mathematical tools, such as optimization theory, machine learning, stochastic geometry, transport theory, and game theory are described. The use of such tools for addressing unique UAV problems is also presented. In a nutshell, this tutorial provides key guidelines on how to analyze, optimize, and design UAV-based wireless communication systems.

1,395 citations


Proceedings ArticleDOI
19 Apr 2019
TL;DR: CenterNet as discussed by the authors detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall by enriching information collected by both the top-left and bottom-right corners and providing more recognizable information from the central regions.
Abstract: In object detection, keypoint-based approaches often experience the drawback of a large number of incorrect object bounding boxes, arguably due to the lack of an additional assessment inside cropped regions. This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules, cascade corner pooling and center pooling, which enrich the information collected by the top-left and bottom-right corners and provide more recognizable information from the central regions. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, outperforming all existing one-stage detectors by at least 4.9%. Furthermore, with a faster inference speed than the top-ranked two-stage detectors, CenterNet demonstrates comparable performance to these detectors. Code is available at https://github.com/Duankaiwen/CenterNet.

1,199 citations
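
To make the triplet idea concrete, here is a minimal sketch, not the released code, of the validation step: a box proposed by a top-left/bottom-right corner pair is kept only when a center keypoint of the same class falls inside its central region. The fixed middle-third region and the simple score fusion are simplifying assumptions.

```python
# Validate a corner-pair box with a center keypoint (CenterNet's core idea).
def central_region(tlx, tly, brx, bry, scale=3.0):
    """Return the central sub-box (here the middle 1/3) of a candidate box."""
    w, h = brx - tlx, bry - tly
    cx, cy = tlx + w / 2.0, tly + h / 2.0
    return (cx - w / (2 * scale), cy - h / (2 * scale),
            cx + w / (2 * scale), cy + h / (2 * scale))

def validate_box(tl, br, centers):
    """tl, br: (x, y, score); centers: list of (x, y, score) center keypoints."""
    rx1, ry1, rx2, ry2 = central_region(tl[0], tl[1], br[0], br[1])
    for cx, cy, cs in centers:
        if rx1 <= cx <= rx2 and ry1 <= cy <= ry2:
            return (tl[2] + br[2] + cs) / 3.0   # accept: fuse the three scores
    return None                                  # reject: no supporting center

# Example: a 90x90 box supported by a center keypoint near its middle.
print(validate_box((10, 10, 0.9), (100, 100, 0.8), [(54, 57, 0.85)]))
```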


Posted Content
TL;DR: This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs, and builds the framework upon a representative one-stage keypoint-based detector named CornerNet, which improves both precision and recall.
Abstract: In object detection, keypoint-based approaches often suffer from a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions. This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%. Meanwhile, with a faster inference speed, CenterNet demonstrates quite comparable performance to the top-ranked two-stage detectors. Code is available at this https URL.

1,136 citations


Proceedings ArticleDOI
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu
17 May 2019
TL;DR: This paper utilizes both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE) which can take full advantage of lexical, syntactic, and knowledge information simultaneously, and is comparable with the state-of-the-art model BERT on other common NLP tasks.
Abstract: Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The code and datasets will be available in the future.

1,076 citations
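
As a rough illustration of the fusion idea, the sketch below combines a token representation with an aligned knowledge-graph entity embedding through learned projections, in the spirit of ERNIE's aggregator. The dimensions, the tanh fusion, and the alignment mask are assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class TokenEntityFusion(nn.Module):
    """Fuse token representations with aligned KG entity embeddings."""
    def __init__(self, d_token=768, d_entity=100):
        super().__init__()
        self.w_t = nn.Linear(d_token, d_token)
        self.w_e = nn.Linear(d_entity, d_token)

    def forward(self, token_h, entity_h, has_entity):
        # has_entity is 0 for tokens with no aligned KG entity
        return torch.tanh(self.w_t(token_h) + self.w_e(entity_h) * has_entity)

tokens = torch.randn(2, 16, 768)      # batch x seq x hidden
entities = torch.randn(2, 16, 100)    # aligned entity embeddings (e.g. TransE)
mask = torch.randint(0, 2, (2, 16, 1)).float()
print(TokenEntityFusion()(tokens, entities, mask).shape)  # (2, 16, 768)
```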


Journal ArticleDOI
TL;DR: In this paper, the authors proposed to integrate the Deep Reinforcement Learning techniques and Federated Learning framework with mobile edge systems for optimizing mobile edge computing, caching and communication, and designed the "In-Edge AI" framework in order to intelligently utilize the collaboration among devices and edge nodes to exchange the learning parameters for a better training and inference of the models, and thus to carry out dynamic system-level optimization and application-level enhancement while reducing the unnecessary system communication load.
Abstract: Recently, along with the rapid development of mobile communication technology, edge computing theory and techniques have been attracting more and more attention from global researchers and engineers. Edge computing can significantly bridge the gap between the capacity of the cloud and the requirements of devices at the network edge, and can thus accelerate content delivery and improve the quality of mobile services. In order to bring more intelligence to edge systems than traditional optimization methodology allows, and driven by current deep learning techniques, we propose to integrate Deep Reinforcement Learning techniques and the Federated Learning framework with mobile edge systems, for optimizing mobile edge computing, caching, and communication. We thus design the "In-Edge AI" framework to intelligently utilize the collaboration among devices and edge nodes to exchange learning parameters for better training and inference of the models, and to carry out dynamic system-level optimization and application-level enhancement while reducing unnecessary system communication load. "In-Edge AI" is evaluated and proved to have near-optimal performance with relatively low learning overhead, while the system is cognitive and adaptive to mobile communication systems. Finally, we discuss several related challenges and opportunities for unveiling the promising future of "In-Edge AI".

764 citations
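
The parameter-exchange loop at the heart of the federated part is easy to sketch. Below is a minimal federated-averaging round, assuming a toy linear model and equally weighted devices; it illustrates the kind of device/edge collaboration the framework describes, not the authors' implementation.

```python
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, target, lr=0.1, steps=5):
    """Device-side: start from the global weights, run a few SGD steps."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), target).backward()
        opt.step()
    return model.state_dict()

def federated_average(states, weights):
    """Edge-side: weighted average of device state dicts (FedAvg)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for s, w in zip(states, weights)) / sum(weights)
    return avg

global_model = nn.Linear(8, 1)
devices = [(torch.randn(32, 8), torch.randn(32, 1)) for _ in range(3)]
for rnd in range(10):                              # communication rounds
    states = [local_update(global_model, x, y) for x, y in devices]
    global_model.load_state_dict(federated_average(states, [32, 32, 32]))
```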


Proceedings ArticleDOI
Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
15 Jun 2019
TL;DR: He et al. as discussed by the authors proposed a filter pruning via geometric median (FPGM) method to compress CNN models by pruning filters with redundancy, rather than those with relatively less importance.
Abstract: Previous works utilized a "smaller-norm-less-important" criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with "relatively less" importance. When applied to two image classification benchmarks, our method validates its usefulness and strengths. Notably, on CIFAR-10, FPGM reduces more than 52% of FLOPs on ResNet-110 with even a 2.69% relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces more than 42% of FLOPs on ResNet-101 without any top-5 accuracy drop, which advances the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/filter-pruning-geometric-median

698 citations
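
The criterion itself fits in a few lines. The sketch below scores each filter in a layer by its total distance to all other filters and marks the lowest-scoring ones, those nearest the geometric median and hence most replaceable, for pruning; the layer shape and pruning ratio are illustrative.

```python
import torch

def fpgm_prune_indices(weight, ratio=0.3):
    """weight: (out_channels, in_channels, k, k) conv kernel tensor."""
    filters = weight.flatten(1)                   # one row per filter
    dist = torch.cdist(filters, filters, p=2)     # pairwise L2 distances
    total = dist.sum(dim=1)                       # distance to all other filters
    n_prune = int(ratio * weight.shape[0])
    return torch.argsort(total)[:n_prune]         # smallest = most redundant

w = torch.randn(64, 32, 3, 3)
print(fpgm_prune_indices(w))  # indices of filters to prune in this layer
```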


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images, which helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task.
Abstract: Human adaptability relies crucially on the ability to learn and merge knowledge from both supervised and unsupervised learning: the parents point out a few important concepts, but then the children fill in the gaps on their own. This is particularly effective because supervised learning can never be exhaustive, and learning autonomously thus allows the learner to discover invariances and regularities that help it generalize. In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. This secondary task helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task. Multiple experiments on the PACS, VLCS, Office-Home and digits datasets confirm our intuition and show that this simple method outperforms previous domain generalization and adaptation solutions. An ablation study further illustrates the inner workings of our approach.

678 citations
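
A minimal sketch of the joint objective follows: one shared backbone feeds a supervised classification head and an auxiliary head that predicts which precomputed patch permutation was applied to the image. The toy backbone, head sizes, and the weight alpha are assumptions; only the two-loss structure reflects the paper.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
object_head = nn.Linear(256, 7)       # 7 object classes (e.g. PACS)
jigsaw_head = nn.Linear(256, 31)      # 30 permutations + 1 for "unshuffled"
alpha = 0.7                           # weight of the self-supervised loss

images = torch.randn(8, 3, 64, 64)    # some of them patch-shuffled
labels = torch.randint(0, 7, (8,))
perm_ids = torch.randint(0, 31, (8,)) # which permutation was applied

feats = backbone(images)
loss = (nn.functional.cross_entropy(object_head(feats), labels)
        + alpha * nn.functional.cross_entropy(jigsaw_head(feats), perm_ids))
loss.backward()                       # both tasks update the shared backbone
```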


Journal ArticleDOI
TL;DR: This paper constitutes the first holistic tutorial on the development of ANN-based ML techniques tailored to the needs of future wireless networks and overviews how artificial neural networks (ANNs)-based ML algorithms can be employed for solving various wireless networking problems.
Abstract: In order to effectively provide ultra-reliable low-latency communications and pervasive connectivity for Internet of Things (IoT) devices, next-generation wireless networks can leverage intelligent, data-driven functions enabled by the integration of machine learning (ML) notions across the wireless core and edge infrastructure. In this context, this paper provides a comprehensive tutorial that overviews how artificial neural network (ANN)-based ML algorithms can be employed for solving various wireless networking problems. For this purpose, we first present a detailed overview of a number of key types of ANNs, including recurrent, spiking, and deep neural networks, that are pertinent to wireless networking applications. For each type of ANN, we present the basic architecture as well as specific examples that are particularly important and relevant to wireless network design. Such ANN examples include echo state networks, liquid state machines, and long short-term memory. Then, we provide an in-depth overview of the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality applications over wireless networks as well as edge computing and caching. For each individual application, we present the main motivation for using ANNs along with the associated challenges, and we also provide a detailed example for a use case scenario and outline future works that can be addressed using ANNs. In a nutshell, this paper constitutes the first holistic tutorial on the development of ANN-based ML techniques tailored to the needs of future wireless networks.

666 citations


Posted Content
TL;DR: A novel Ghost module is proposed to generate more feature maps from cheap operations: based on a set of intrinsic feature maps, a series of cheap linear transformations generates many ghost feature maps that fully reveal the information underlying the intrinsic features.
Abstract: Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying the intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative to convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g., 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at this https URL
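
A minimal PyTorch sketch of a Ghost module is shown below: an ordinary convolution produces the intrinsic feature maps, cheap depthwise convolutions generate the ghost maps, and the two are concatenated. The kernel sizes and the ratio-2 split are illustrative assumptions rather than the released GhostNet code.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, c_in, c_out, ratio=2, kernel=1, cheap_kernel=3):
        super().__init__()
        c_primary = c_out // ratio                 # intrinsic maps
        c_ghost = c_out - c_primary                # maps made by cheap ops
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(c_primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                # depthwise = cheap linear op
            nn.Conv2d(c_primary, c_ghost, cheap_kernel,
                      padding=cheap_kernel // 2, groups=c_primary, bias=False),
            nn.BatchNorm2d(c_ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        primary = self.primary(x)
        return torch.cat([primary, self.cheap(primary)], dim=1)

print(GhostModule(16, 32)(torch.randn(1, 16, 56, 56)).shape)  # (1, 32, 56, 56)
```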

Proceedings ArticleDOI
15 Jun 2019
TL;DR: The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods and shows promising results for future pose prediction.
Abstract: Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called the A-link inference module, to capture action-specific latent dependencies, i.e., actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e., structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton datasets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction.
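
The core operation can be sketched as a graph convolution whose adjacency mixes the fixed structural links with a learned actional-link matrix. In the sketch below the actional links are a free parameter for brevity; in the paper they come from the A-link inference module, so this is an illustration of the idea only.

```python
import torch
import torch.nn as nn

class ActionalStructuralGC(nn.Module):
    def __init__(self, c_in, c_out, n_joints, structural_adj):
        super().__init__()
        self.register_buffer("A_struct", structural_adj)             # fixed links
        self.A_act = nn.Parameter(torch.zeros(n_joints, n_joints))   # learned links
        self.theta = nn.Linear(c_in, c_out)

    def forward(self, x):                 # x: (batch, n_joints, c_in)
        A = self.A_struct + torch.softmax(self.A_act, dim=-1)
        return torch.relu(self.theta(A @ x))

n_joints = 25                             # e.g. NTU-RGB+D skeletons
adj = torch.eye(n_joints)                 # placeholder for the bone graph
layer = ActionalStructuralGC(3, 64, n_joints, adj)
print(layer(torch.randn(4, n_joints, 3)).shape)   # (4, 25, 64)
```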

Posted Content
TL;DR: A novel Transformer distillation method that is specially designed for knowledge distillation (KD) of Transformer-based models is proposed; by leveraging this new KD method, the abundant knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT.
Abstract: Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of Transformer-based models. By leveraging this new KD method, the abundant knowledge encoded in a large teacher BERT can be effectively transferred to a small student TinyBERT. Then, we introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture the general-domain as well as the task-specific knowledge in BERT. TinyBERT with 4 layers is empirically effective and achieves more than 96.8% of the performance of its teacher BERT-Base on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference. TinyBERT with 4 layers is also significantly better than 4-layer state-of-the-art baselines on BERT distillation, with only about 28% of their parameters and about 31% of their inference time. Moreover, TinyBERT with 6 layers performs on par with its teacher BERT-Base.
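
A minimal sketch of the per-layer distillation terms follows: MSE between teacher and student attention matrices, plus MSE between hidden states after a learned projection lifts the student's smaller hidden size to the teacher's. The layer mapping, sizes, and equal loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_teacher, d_student, heads, seq = 768, 312, 12, 128
proj = nn.Linear(d_student, d_teacher, bias=False)   # learned projection

def layer_distill_loss(attn_t, attn_s, hidden_t, hidden_s):
    """attn_*: (batch, heads, seq, seq); hidden_*: (batch, seq, d_*)."""
    attn_loss = nn.functional.mse_loss(attn_s, attn_t)
    hidden_loss = nn.functional.mse_loss(proj(hidden_s), hidden_t)
    return attn_loss + hidden_loss

attn_t = torch.softmax(torch.randn(2, heads, seq, seq), dim=-1)
attn_s = torch.softmax(torch.randn(2, heads, seq, seq), dim=-1)
h_t, h_s = torch.randn(2, seq, d_teacher), torch.randn(2, seq, d_student)
print(layer_distill_loss(attn_t, attn_s, h_t, h_s))
```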

Journal ArticleDOI
TL;DR: This paper surveys the relationship between edge intelligence and intelligent edge, covering the application scenarios of both, the practical implementation methods and enabling technologies, namely DL training and inference in the customized edge computing framework, and the challenges and future trends of more pervasive and fine-grained intelligence.
Abstract: Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people's lives, from face recognition to ambitious smart factories and cities, developments of artificial intelligence (especially deep learning, DL) based applications and services are thriving. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of "providing artificial intelligence for every person and every organization at everywhere". Thus, unleashing DL services using resources at the network edge near the data sources has emerged as a desirable solution. Therefore, edge intelligence, aiming to facilitate the deployment of DL services by edge computing, has received significant attention. In addition, DL, as the representative technique of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually beneficial edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely DL training and inference in the customized edge computing framework; 3) challenges and future trends of more pervasive and fine-grained intelligence. We believe that by consolidating information scattered across the communication, networking, and DL areas, this survey can help readers to understand the connections between enabling technologies while promoting further discussions on the fusion of edge intelligence and intelligent edge, i.e., Edge DL.

Posted Content
TL;DR: The fundamental differences with other technologies, the most important open research issues to tackle, and the reasons why the use of reconfigurable intelligent surfaces necessitates rethinking the communication-theoretic models currently employed in wireless networks are elaborated.
Abstract: The future of mobile communications looks exciting with the potential new use cases and challenging requirements of future 6th generation (6G) and beyond wireless networks. Since the beginning of the modern era of wireless communications, the propagation medium has been perceived as a randomly behaving entity between the transmitter and the receiver, which degrades the quality of the received signal due to the uncontrollable interactions of the transmitted radio waves with the surrounding objects. The recent advent of reconfigurable intelligent surfaces in wireless communications enables, on the other hand, network operators to control the scattering, reflection, and refraction characteristics of the radio waves, by overcoming the negative effects of natural wireless propagation. Recent results have revealed that reconfigurable intelligent surfaces can effectively control the wavefront, e.g., the phase, amplitude, frequency, and even polarization, of the impinging signals without the need of complex decoding, encoding, and radio frequency processing operations. Motivated by the potential of this emerging technology, the present article aims to provide the readers with a detailed overview and historical perspective on state-of-the-art solutions, and to elaborate on the fundamental differences with other technologies, the most important open research issues to tackle, and the reasons why the use of reconfigurable intelligent surfaces necessitates rethinking the communication-theoretic models currently employed in wireless networks. This article also explores the theoretical performance limits of reconfigurable intelligent surface-assisted communication systems using mathematical techniques and elaborates on the potential use cases of intelligent surfaces in 6G and beyond wireless networks.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a modular co-attention network (MCAN) is proposed, which consists of Modular Co-Attention (MCA) layers cascaded in depth.
Abstract: Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective `co-attention' model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the question-guided-attention of images jointly using a modular composition of two basic attention units. We quantitatively and qualitatively evaluate MCAN on the benchmark VQA-v2 dataset and conduct extensive ablation studies to explore the reasons behind MCAN's effectiveness. Experimental results demonstrate that MCAN significantly outperforms the previous state-of-the-art. Our best single model delivers 70.63% overall accuracy on the test-dev set.
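
The modular composition is straightforward to sketch with standard attention blocks: each MCA layer applies self-attention to the question words and to the image regions, then question-guided attention to the regions, and layers are cascaded in depth. The sketch below uses nn.MultiheadAttention and omits the feed-forward sublayers and normalization, so it is an illustration, not the released model.

```python
import torch
import torch.nn as nn

class MCALayer(nn.Module):
    def __init__(self, d=512, heads=8):
        super().__init__()
        self.sa_q = nn.MultiheadAttention(d, heads, batch_first=True)
        self.sa_v = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ga = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, q, v):            # q: question words, v: image regions
        q = q + self.sa_q(q, q, q)[0]   # question self-attention
        v = v + self.sa_v(v, v, v)[0]   # image self-attention
        v = v + self.ga(v, q, q)[0]     # question-guided attention on images
        return q, v

q, v = torch.randn(2, 14, 512), torch.randn(2, 100, 512)
for layer in [MCALayer() for _ in range(6)]:   # cascade MCA layers in depth
    q, v = layer(q, v)
print(q.shape, v.shape)
```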

Journal ArticleDOI
TL;DR: In this article, a novel concept of three-dimensional (3D) cellular networks that integrate drone base stations (drone-BSs) and cellular-connected drone users (drone-UEs) is introduced.
Abstract: In this paper, a novel concept of three-dimensional (3D) cellular networks, which integrate drone base stations (drone-BSs) and cellular-connected drone users (drone-UEs), is introduced. For this new 3D cellular architecture, a novel framework for network planning for drone-BSs and latency-minimal cell association for drone-UEs is proposed. For network planning, a tractable method for drone-BSs’ deployment based on the notion of truncated octahedron shapes is proposed, which ensures full coverage for a given space with a minimum number of drone-BSs. In addition, to characterize frequency planning in such 3D wireless networks, an analytical expression for the feasible integer frequency reuse factors is derived. Subsequently, an optimal 3D cell association scheme is developed for which the drone-UEs’ latency, considering transmission, computation, and backhaul delays, is minimized. To this end, first, the spatial distribution of the drone-UEs is estimated using a kernel density estimation method, and the parameters of the estimator are obtained using a cross-validation method. Then, according to the spatial distribution of drone-UEs and the locations of drone-BSs, the latency-minimal 3D cell association for drone-UEs is derived by exploiting tools from optimal transport theory. The simulation results show that the proposed approach reduces the latency of drone-UEs compared with the classical cell association approach that uses a signal-to-interference-plus-noise ratio (SINR) criterion. In particular, the proposed approach yields a reduction of up to 46% in the average latency compared with the SINR-based association. The results also show that the proposed latency-optimal cell association improves the spectral efficiency of a 3D wireless cellular network of drones.
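
The estimation step, fitting a kernel density estimator to drone-UE positions with a cross-validated bandwidth, can be sketched directly with scikit-learn; the synthetic 3D positions and the bandwidth grid below are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# 3D drone-UE positions (x, y, altitude) clustered around two hotspots
positions = np.vstack([rng.normal([0, 0, 60], 15, (200, 3)),
                       rng.normal([120, 80, 90], 25, (200, 3))])

# cross-validation picks the bandwidth that maximizes held-out log-likelihood
search = GridSearchCV(KernelDensity(kernel="gaussian"),
                      {"bandwidth": np.linspace(5, 50, 10)}, cv=5)
search.fit(positions)
kde = search.best_estimator_
print("bandwidth:", kde.bandwidth,
      "log-density at a hotspot:", kde.score_samples([[0, 0, 60]])[0])
```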

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Chen et al. as discussed by the authors proposed an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure; this brings two issues, namely heavier computational overheads and weaker search stability, which they solve using search space approximation and regularization, respectively.
Abstract: Recently, differentiable search methods have made major progress in reducing the computational costs of neural architecture search. However, these approaches often report lower accuracy in evaluating the searched architecture or transferring it to another dataset. This is arguably due to the large gap between the architecture depths in search and evaluation scenarios. In this paper, we present an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. This brings two issues, namely, heavier computational overheads and weaker search stability, which we solve using search space approximation and regularization, respectively. With a significantly reduced search time (~7 hours on a single GPU), our approach achieves state-of-the-art performance on both the proxy dataset (CIFAR10 or CIFAR100) and the target dataset (ImageNet). Code is available at https://github.com/chenxin061/pdarts
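
The progressive strategy can be sketched as a loop over stages in which the network grows deeper while the candidate operation set on each edge shrinks according to the architecture weights learned so far. In the toy sketch below the learned weights are random placeholders, so it illustrates only the search-space-approximation bookkeeping, not the actual search.

```python
import numpy as np

rng = np.random.default_rng(0)

def progressive_search(n_edges=14, stages=((5, 8), (11, 5), (17, 3))):
    """stages: (network depth in cells, candidate ops kept per edge)."""
    candidates = [list(range(8)) for _ in range(n_edges)]   # 8 initial ops
    for depth, n_keep in stages:
        # Stand-in for training a depth-cell supernet and learning the
        # architecture weights (alphas) of the remaining candidates.
        alphas = [rng.random(len(c)) for c in candidates]
        candidates = [[c[i] for i in np.argsort(a)[-n_keep:]]   # prune ops
                      for c, a in zip(candidates, alphas)]
        print(f"depth {depth}: kept {n_keep} ops per edge")
    return [c[-1] for c in candidates]       # final op per edge (toy choice)

print(progressive_search())
```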

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Zhang et al. as mentioned in this paper designed a loss term that enforces one simple type of geometric constraint, namely, virtual normal directions determined by three randomly sampled points in the reconstructed 3D space.
Abstract: Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress on evaluation metrics such as the pixel-wise relative error, most methods neglect the geometric constraints in the 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces one simple type of geometric constraint, namely, virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, we can considerably improve the depth prediction accuracy. Significantly, the byproduct of this predicted depth being sufficiently accurate is that we are now able to recover good 3D structures of the scene, such as the point cloud and surface normals, directly from the depth, eliminating the necessity of training new sub-models as was previously done. Experiments on two challenging benchmarks, NYU Depth-V2 and KITTI, demonstrate the effectiveness of our method and state-of-the-art performance.
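
A minimal sketch of the virtual-normal loss follows: lift the predicted and ground-truth depth maps to 3D point clouds with assumed camera intrinsics, sample random point triplets, and penalize the difference between the unit normals of the resulting triangles. The intrinsics, sampling count, and L1 penalty are illustrative assumptions.

```python
import torch

def unproject(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """depth: (H, W) -> point cloud (H*W, 3) in camera coordinates."""
    H, W = depth.shape
    cx = (W - 1) / 2 if cx is None else cx
    cy = (H - 1) / 2 if cy is None else cy
    v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                          torch.arange(W, dtype=depth.dtype), indexing="ij")
    x, y = (u - cx) * depth / fx, (v - cy) * depth / fy
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)

def virtual_normal_loss(depth_pred, depth_gt, n_triplets=1000):
    pc_pred, pc_gt = unproject(depth_pred), unproject(depth_gt)
    idx = torch.randint(0, pc_gt.shape[0], (n_triplets, 3))
    def normals(pc):
        a, b, c = pc[idx[:, 0]], pc[idx[:, 1]], pc[idx[:, 2]]
        n = torch.cross(b - a, c - a, dim=1)      # triangle normal
        return n / (n.norm(dim=1, keepdim=True) + 1e-8)
    return (normals(pc_pred) - normals(pc_gt)).abs().mean()  # L1 on normals

pred = torch.rand(48, 64) * 5 + 1
gt = torch.rand(48, 64) * 5 + 1
print(virtual_normal_loss(pred, gt))
```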

Posted Content
TL;DR: This paper presents an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure, and addresses the two issues this brings, namely heavier computational overheads and weaker search stability, using search space approximation and regularization, respectively.
Abstract: Recently, differentiable search methods have made major progress in reducing the computational costs of neural architecture search. However, these approaches often report lower accuracy in evaluating the searched architecture or transferring it to another dataset. This is arguably due to the large gap between the architecture depths in search and evaluation scenarios. In this paper, we present an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. This brings two issues, namely, heavier computational overheads and weaker search stability, which we solve using search space approximation and regularization, respectively. With a significantly reduced search time (~7 hours on a single GPU), our approach achieves state-of-the-art performance on both the proxy dataset (CIFAR10 or CIFAR100) and the target dataset (ImageNet). Code is available at this https URL.

Posted Content
TL;DR: A deep Modular Co-Attention Network (MCAN), consisting of Modular Co-Attention (MCA) layers cascaded in depth, is proposed; it is quantitatively and qualitatively evaluated on the benchmark VQA-v2 dataset and significantly outperforms the previous state-of-the-art models.
Abstract: Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective `co-attention' model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the guided-attention of images, jointly using a modular composition of two basic attention units. We quantitatively and qualitatively evaluate MCAN on the benchmark VQA-v2 dataset and conduct extensive ablation studies to explore the reasons behind MCAN's effectiveness. Experimental results demonstrate that MCAN significantly outperforms the previous state-of-the-art. Our best single model delivers 70.63% overall accuracy on the test-dev set. Code is available at this https URL.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work develops Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention, and proposes an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points.
Abstract: Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in the NLP domain, the self-attention transformer is introduced to consume the point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Besides, prior work uses heuristics that depend on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We therefore propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a "soft" continuous subset in the training phase, and a "hard" discrete subset in the test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets with lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, processing event camera streams as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset.
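
The sampling operation can be sketched with PyTorch's built-in Gumbel-Softmax: score every point, then draw soft selections during training (hard=False) and discrete ones at test time (hard=True). The sketch below samples with replacement and uses a linear scorer, both simplifications of the paper's GSS.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelSubsetSampler(nn.Module):
    def __init__(self, d=64, n_select=128):
        super().__init__()
        self.score = nn.Linear(d, 1)   # one learned score per point
        self.n_select = n_select

    def forward(self, x, hard=False):        # x: (batch, n_points, d)
        logits = self.score(x).squeeze(-1)
        # draw n_select soft (or one-hot) selections over the points
        sel = torch.stack([F.gumbel_softmax(logits, tau=0.5, hard=hard)
                           for _ in range(self.n_select)], dim=1)
        return sel @ x                        # (batch, n_select, d)

pts = torch.randn(2, 1024, 64)
sampler = GumbelSubsetSampler()
print(sampler(pts, hard=False).shape, sampler(pts, hard=True).shape)
```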

Posted Content
TL;DR: Partially-Connected Differentiable Architecture Search (PC-DARTS) as mentioned in this paper performs operation search in a subset of channels while bypassing the held out part in a shortcut, which alleviates the undesired inconsistency on selecting the edges of super-net caused by sampling different channels.
Abstract: Differentiable architecture search (DARTS) provided a fast solution in finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-network and searching for an optimal architecture. In this paper, we present a novel approach, namely Partially-Connected DARTS, by sampling a small part of the super-network to reduce the redundancy in exploring the network space, thereby performing a more efficient search without compromising performance. In particular, we perform operation search in a subset of channels while bypassing the held-out part in a shortcut. This strategy may suffer from an undesired inconsistency in selecting the edges of the super-net caused by sampling different channels. We alleviate it using edge normalization, which adds a new set of edge-level parameters to reduce uncertainty in the search. Thanks to the reduced memory cost, PC-DARTS can be trained with a larger batch size and, consequently, enjoys both faster speed and higher training stability. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search. Our code has been made available at: this https URL.
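
The partial-channel trick is easy to sketch: route only a 1/K fraction of the channels through the mixture of candidate operations and let the rest bypass through the shortcut before re-concatenation. The sketch below omits channel shuffling and edge normalization, and the three candidate ops are illustrative.

```python
import torch
import torch.nn as nn

class PartialChannelMixedOp(nn.Module):
    def __init__(self, channels, K=4):
        super().__init__()
        self.k = channels // K                    # channels actually searched
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(self.k, self.k, 3, padding=1, bias=False),
            nn.MaxPool2d(3, stride=1, padding=1)])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # arch weights

    def forward(self, x):
        x_search, x_bypass = x[:, :self.k], x[:, self.k:]
        w = torch.softmax(self.alpha, dim=0)
        mixed = sum(wi * op(x_search) for wi, op in zip(w, self.ops))
        return torch.cat([mixed, x_bypass], dim=1)  # memory cost ~ 1/K

op = PartialChannelMixedOp(64, K=4)
print(op(torch.randn(2, 64, 32, 32)).shape)         # (2, 64, 32, 32)
```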

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A variational technique is introduced to estimate the distribution of a newly proposed parameter, called channel saliency, based on which redundant channels can be removed from the model via a simple criterion, resulting in significant size reduction and computation savings.
Abstract: We propose a variational Bayesian scheme for pruning convolutional neural networks at the channel level. This idea is motivated by the fact that deterministic-value-based pruning methods are inherently improper and unstable. In a nutshell, a variational technique is introduced to estimate the distribution of a newly proposed parameter, called channel saliency; based on this, redundant channels can be removed from the model via a simple criterion. The advantages are two-fold: 1) our method conducts channel pruning without requiring a re-training stage, thus improving computational efficiency; 2) our method is implemented as a stand-alone module, called a variational pruning layer, which can be straightforwardly inserted into off-the-shelf deep learning packages without any special network design. Extensive experimental results demonstrate the effectiveness of our method: for CIFAR-10, we perform channel removal on different CNN models with up to 74% reduction, which results in significant size reduction and computation savings; for ImageNet, about 40% of the channels of ResNet-50 are removed without compromising accuracy.
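
A minimal sketch of the stand-alone layer follows: each channel is scaled by a stochastic saliency drawn by reparameterization from a learned Gaussian, and channels whose saliency concentrates near zero are marked redundant. The pruning rule and the omission of the KL regularizer are simplifications of the paper's scheme.

```python
import torch
import torch.nn as nn

class VariationalPruningLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(channels))
        self.log_sigma = nn.Parameter(torch.full((channels,), -3.0))

    def forward(self, x):                        # x: (batch, C, H, W)
        sigma = self.log_sigma.exp()
        gamma = self.mu + sigma * torch.randn_like(self.mu)  # reparameterize
        return x * gamma.view(1, -1, 1, 1)       # scale each channel

    def redundant_channels(self, thresh=0.1):
        # channels whose saliency is near zero with high probability
        return (self.mu.abs() + 3 * self.log_sigma.exp() < thresh).nonzero()

layer = VariationalPruningLayer(64)
print(layer(torch.randn(2, 64, 8, 8)).shape, layer.redundant_channels().shape)
```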

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A fine-grained feature imitation method is proposed that exploits the cross-location discrepancy of feature response: the discrepancy on near-object anchor locations reveals important information about how the teacher model tends to generalize.
Abstract: State-of-the-art CNN-based recognition models are often computationally prohibitive to deploy on low-end devices. A promising high-level approach to tackling this limitation is knowledge distillation, which lets a small student model mimic a cumbersome teacher model's output to obtain improved generalization. However, related methods mainly focus on the simple task of classification and do not consider complex tasks like object detection. We show that applying vanilla knowledge distillation to a detection model yields only a minor gain. To address the challenge of distilling knowledge in a detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response. Our intuition is that detectors care more about local regions near objects. Thus the discrepancy of feature response on the near-object anchor locations reveals important information about how the teacher model tends to generalize. We design a novel mechanism to estimate those locations and let the student model imitate the teacher on them to obtain enhanced performance. We first validate the idea on a lightweight toy detector which carries the simplest notion of current state-of-the-art anchor-based detection models: on the challenging KITTI dataset, our method generates up to a 15% boost of mAP for the student model compared to the non-imitated counterpart. We then extensively evaluate the method with the Faster R-CNN model under various scenarios on the common object detection benchmarks Pascal VOC and COCO, where imitation alleviates up to 74% of the performance drop of the student model compared to the teacher. Code is released at https://github.com/twangnh/Distilling-Object-Detectors
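
The imitation objective reduces to a masked regression, sketched below: a 1x1 adapter aligns student and teacher feature dimensions, and the squared distance is averaged only over a binary mask of near-object locations. The hand-placed mask stands in for the paper's anchor-IoU-based estimation.

```python
import torch
import torch.nn as nn

def imitation_loss(f_student, f_teacher, mask, adapter):
    """f_*: (B, C, H, W); mask: (B, 1, H, W) with 1 near objects."""
    diff = (adapter(f_student) - f_teacher) ** 2
    return (diff * mask).sum() / (mask.sum() * f_teacher.shape[1] + 1e-6)

B, C_s, C_t, H, W = 2, 128, 256, 38, 50
adapter = nn.Conv2d(C_s, C_t, 1)       # aligns student to teacher channels
mask = torch.zeros(B, 1, H, W)
mask[:, :, 10:20, 15:30] = 1.0         # stand-in for near-object anchor mask
loss = imitation_loss(torch.randn(B, C_s, H, W), torch.randn(B, C_t, H, W),
                      mask, adapter)
print(loss)
```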

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs) is proposed, where the pre-trained teacher network is regarded as a fixed discriminator and the generator is utilized for deriving training samples which can obtain the maximum response on the discriminator.
Abstract: Learning portable neural networks is essential for computer vision, so that pre-trained heavy deep models can be applied on edge devices such as mobile phones and micro sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models when we can directly access the training dataset. However, the training data for a given deep network are often unavailable due to practical problems (e.g., privacy, legal issues, and transmission), and the architecture of the given network is also unknown except for some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher network is regarded as a fixed discriminator, and the generator is utilized for deriving training samples which can obtain the maximum response on the discriminator. Then, an efficient network with smaller model size and computational complexity is trained using the generated data and the teacher network, simultaneously. Efficient student networks learned using the proposed Data-Free Learning (DFL) method achieve 92.22% and 74.47% accuracies without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark.
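
The training loop can be sketched in a few lines: the generator is pushed to produce samples on which the fixed teacher responds confidently, and the student is distilled on those samples. The one-hot generator loss below follows the spirit of the method, but the tiny models and the omission of the paper's other loss terms are simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # fixed, pretrained
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
generator = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
for p in teacher.parameters():
    p.requires_grad_(False)            # the teacher acts as a fixed discriminator

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):
    x = generator(torch.randn(32, 64)).view(32, 1, 28, 28)
    t_logits = teacher(x)
    # generator: make the teacher respond one-hot-like (maximum response)
    pseudo = t_logits.argmax(dim=1)
    g_loss = F.cross_entropy(t_logits, pseudo)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    # student: match the teacher on the generated samples (distillation)
    s_loss = F.kl_div(F.log_softmax(student(x.detach()), dim=1),
                      F.softmax(t_logits.detach(), dim=1),
                      reduction="batchmean")
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
```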

Proceedings ArticleDOI
27 May 2019
TL;DR: This paper presents a comprehensive evaluation study on automated log parsing, evaluating 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software and reports the results in terms of accuracy, robustness, and efficiency.
Abstract: Logs are imperative in the development and maintenance process of many software systems. They record detailed runtime information that allows developers and support engineers to monitor their systems and dissect anomalous behaviors and errors. The increasing scale and complexity of modern software systems, however, make the volume of logs explode. In many cases, the traditional way of manual log inspection becomes impractical. Many recent studies, as well as industrial tools, resort to powerful text search and machine learning-based analytics solutions. Due to the unstructured nature of logs, a first crucial step is to parse log messages into structured data for subsequent analysis. In recent years, automated log parsing has been widely studied in both academia and industry, producing a series of log parsers based on different techniques. To better understand the characteristics of these log parsers, in this paper, we present a comprehensive evaluation study on automated log parsing and further release the tools and benchmarks for easy reuse. More specifically, we evaluate 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. We report the benchmarking results in terms of accuracy, robustness, and efficiency, which are of practical importance when deploying automated log parsing in production. We also share the success stories and lessons learned in an industrial application at Huawei. We believe that our work could serve as the basis and provide valuable guidance to future research and deployment of automated log parsing.
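
The accuracy notion commonly used in this line of work can be sketched compactly: a log message counts as correctly parsed only if the set of messages grouped under its predicted template exactly matches the set grouped under its ground-truth template. The toy labels below are illustrative.

```python
from collections import defaultdict

def grouping_accuracy(pred, true):
    """A message is correct iff the set of messages sharing its predicted
    template equals the set sharing its ground-truth template."""
    def groups(labels):
        g = defaultdict(set)
        for i, lab in enumerate(labels):
            g[lab].add(i)
        return [g[lab] for lab in labels]     # group of each message
    return sum(gp == gt
               for gp, gt in zip(groups(pred), groups(true))) / len(true)

pred = ["A", "A", "B", "B"]      # parser merged two distinct templates
true = ["E1", "E1", "E2", "E3"]
print(grouping_accuracy(pred, true))   # 0.5: the last two messages are wrong
```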

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, an attention-guided unified network (AUNet) is proposed for panoptic segmentation, in which foreground objects provide complementary cues to assist background understanding, and two sources of attention are added to the background branch to provide object-level and pixel-level attention, respectively.
Abstract: This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level. Existing methods mostly dealt with these two problems separately, but in this paper we reveal the underlying relationship between them; in particular, FG objects provide complementary cues to assist BG understanding. Our approach, named the Attention-guided Unified Network (AUNet), is a unified framework with two branches for FG and BG segmentation simultaneously. Two sources of attention are added to the BG branch, namely, the RPN and the FG segmentation mask, to provide object-level and pixel-level attention, respectively. Our approach generalizes to different backbones with a consistent accuracy gain in both FG and BG segmentation, and also sets new state-of-the-art results on both the MS-COCO (46.5% PQ) and Cityscapes (59.0% PQ) benchmarks.

Posted Content
TL;DR: Simulation results show that RISs can outperform half-duplex relays with a small number of passive reflecting elements, while large RISs are needed to outperform full-duplex relays.
Abstract: This work focuses on the downlink of a single-cell multi-user system in which a base station (BS) equipped with M antennas communicates with K single-antenna users through a reconfigurable intelligent surface (RIS) installed in the line-of-sight (LoS) of the BS. The RIS is envisioned to offer unprecedented spectral efficiency gains by utilizing N passive reflecting elements that induce phase shifts on the impinging electromagnetic waves to smartly reconfigure the signal propagation environment. We study the minimum signal-to-interference-plus-noise ratio (SINR) achieved by the optimal linear precoder (OLP), which maximizes the minimum SINR subject to a given power constraint for any given RIS phase matrix, for the cases where the LoS channel matrix between the BS and the RIS is of rank one and of full rank. In the former scenario, the minimum SINR achieved by the RIS-assisted link is bounded by a quantity that goes to zero with K. For the high-rank scenario, we develop accurate deterministic approximations for the parameters of the asymptotically OLP, which are then utilized to optimize the RIS phase matrix. Simulation results show that RISs can outperform half-duplex relays with a small number of passive reflecting elements, while large RISs are needed to outperform full-duplex relays.
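
A plausible rendering of the max-min SINR problem described above, with assumed notation (precoders g_k, RIS phase matrix Theta, LoS channel H_1 from BS to RIS, channel h_{2,k} from RIS to user k, noise power sigma^2), is:

```latex
\begin{equation*}
\max_{\boldsymbol{\Theta},\, \{\mathbf{g}_k\}}\ \min_{k}\;
\frac{\big| \mathbf{h}_{2,k}^{\mathsf H} \boldsymbol{\Theta} \mathbf{H}_1 \mathbf{g}_k \big|^2}
     {\sum_{i \neq k} \big| \mathbf{h}_{2,k}^{\mathsf H} \boldsymbol{\Theta} \mathbf{H}_1 \mathbf{g}_i \big|^2 + \sigma^2}
\quad \text{s.t.}\quad \sum_{k=1}^{K} \lVert \mathbf{g}_k \rVert^2 \le P_{\max},\qquad
\big|[\boldsymbol{\Theta}]_{nn}\big| = 1,\ \forall n,
\end{equation*}
```

where the unit-modulus constraints on the diagonal of Theta capture the passive (phase-only) nature of the N reflecting elements.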

Proceedings ArticleDOI
01 Aug 2019
TL;DR: Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitative log anomalies simultaneously, which has not been done by any previous work.
Abstract: Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for timely identifying malfunctions of systems. However, manually detecting anomalies in logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than the semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model an unstructured log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitative log anomalies simultaneously, which has not been done by any previous work. Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log templates between periodic model retrainings. Our evaluation on two public production log datasets shows that LogAnomaly outperforms existing log-based anomaly detection methods.
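
The template2vec idea can be sketched as embedding a template by averaging its word vectors and matching new templates to known ones by cosine similarity, which is what lets the framework avoid false alarms on unseen-but-similar templates. The random word vectors below stand in for the paper's synonym/antonym-trained vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=16) for w in
         ["interface", "down", "up", "link", "failed", "recovered"]}

def template2vec(template):
    """Embed a log template as the mean of its word vectors."""
    vecs = [vocab[w] for w in template.split() if w in vocab]
    return np.mean(vecs, axis=0)

known = {t: template2vec(t) for t in ["interface down", "interface up"]}

def match_template(new_line, threshold=0.5):
    """Map a newly appearing template to the most similar known one."""
    v = template2vec(new_line)
    sims = {t: v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
            for t, u in known.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None   # None -> truly new

print(match_template("link down"))   # nearest known template, or None
```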