Showing papers by "Chang Xu published in 2019"

Evolutionary Generative Adversarial Networks

01 Oct 2019

TL;DR: A novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs) is proposed, where the pre-trained teacher networks are regarded as a fixed discriminator and the generator is utilized for deriving training samples which can obtain the maximum response on the discriminator.

Abstract: Learning portable neural networks is essential for computer vision so that pre-trained heavy deep models can be applied on edge devices such as mobile phones and micro sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models when the training dataset is directly accessible. However, training data for the given deep network are often unavailable due to practical problems (e.g., privacy, legal issues, and transmission), and the architecture of the given network is also unknown except for some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher networks are regarded as a fixed discriminator and the generator is utilized for deriving training samples which can obtain the maximum response on the discriminator. Then, an efficient network with smaller model size and computational complexity is trained using the generated data and the teacher network, simultaneously. Efficient student networks learned using the proposed Data-Free Learning (DFL) method achieve 92.22% and 74.47% accuracies without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark.

263 citations

Journal Article•DOI•

Chaoyue Wang¹, Chang Xu¹, Xin Yao², Dacheng Tao¹•Institutions (2)

University of Sydney¹, Southern University of Science and Technology²

28 Jan 2019-IEEE Transactions on Evolutionary Computation

TL;DR: E-GAN evolves a population of generators to play the adversarial game with the discriminator, where different adversarial training objectives are employed as mutation operations and each individual generator is updated based on these mutations.

Abstract: Generative adversarial networks (GANs) have been effective for learning generative models for real-world data. However, as generative tasks become more and more challenging, existing GANs (GAN and its variants) tend to suffer from different training problems such as instability and mode collapse. In this paper, we propose a novel GAN framework called evolutionary GANs (E-GANs) for stable GAN training and improved generative performance. Unlike existing GANs, which employ a predefined adversarial objective function alternately training a generator and a discriminator, we evolve a population of generators to play the adversarial game with the discriminator. Different adversarial training objectives are employed as mutation operations and each individual (i.e., generator candidate) is updated based on these mutations. Then, we devise an evaluation mechanism to measure the quality and diversity of generated samples, such that only well-performing generator(s) are preserved and used for further training. In this way, E-GAN overcomes the limitations of an individual adversarial training objective and always preserves the well-performing offspring, contributing to progress in, and the success of, GANs. Experiments on several datasets demonstrate that E-GAN achieves convincing generative performance and reduces the training problems inherent in existing GANs.

213 citations

Proceedings Article•DOI•

Self-Supervised Representation Learning by Rotation Feature Decoupling

Zeyu Feng¹, Chang Xu¹, Dacheng Tao¹•Institutions (1)

Packing Convolutional Neural Networks in the Frequency Domain

15 Jun 2019

TL;DR: A self-supervised learning method that focuses on beneficial properties of representation and their abilities in generalizing to real-world tasks and decouples the rotation discrimination from instance discrimination, which allows it to improve the rotation prediction by mitigating the influence of rotation label noise.

Abstract: We introduce a self-supervised learning method that focuses on beneficial properties of representation and their abilities in generalizing to real-world tasks. The method incorporates rotation invariance into the feature learning framework, one of many good and well-studied properties of visual representation, which is rarely appreciated or exploited by previous deep convolutional neural network based self-supervised representation learning methods. Specifically, our model learns a split representation that contains both rotation related and unrelated parts. We train neural networks by jointly predicting image rotations and discriminating individual instances. In particular, our model decouples the rotation discrimination from instance discrimination, which allows us to improve the rotation prediction by mitigating the influence of rotation label noise, as well as discriminate instances without regard to image rotations. The resulting feature has a better generalization ability for more various tasks. Experimental results show that our model outperforms current state-of-the-art methods on standard self-supervised feature learning benchmarks.

205 citations

Journal Article•DOI•

Yunhe Wang¹, Chang Xu², Chao Xu¹, Dacheng Tao²•Institutions (2)

Peking University¹, University of Sydney²

01 Oct 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A series of approaches for compressing and speeding up CNNs in the frequency domain, which focuses not only on smaller weights but on all the weights and their underlying connections, and explores a data-driven method for removing redundancies in both spatial and frequency domains.

Abstract: Deep convolutional neural networks (CNNs) are successfully used in a number of applications. However, their storage and computational requirements have largely prevented their widespread use on mobile devices. Here we present a series of approaches for compressing and speeding up CNNs in the frequency domain, which focus not only on smaller weights but on all the weights and their underlying connections. By treating convolution filters as images, we decompose their representations in the frequency domain as common parts (i.e., cluster centers) shared by other similar filters and their individual private parts (i.e., individual residuals). A large number of low-energy frequency coefficients in both parts can be discarded to produce high compression without significantly compromising accuracy. Furthermore, we explore a data-driven method for removing redundancies in both spatial and frequency domains, which allows us to discard more useless weights while keeping similar accuracies. After obtaining the optimal sparse CNN in the frequency domain, we relax the computational burden of convolution operations in CNNs by linearly combining the convolution responses of discrete cosine transform (DCT) bases. The compression and speed-up ratios of the proposed algorithm are thoroughly analyzed and evaluated on benchmark image datasets to demonstrate its superiority over state-of-the-art methods.
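The idea of discarding low-energy frequency coefficients can be illustrated with a minimal sketch. It is not the authors' algorithm: it skips the clustering into common and private parts, works on a single 1-D row instead of a 2-D filter, and the names `dct`, `idct`, `compress`, and `filter_row` are hypothetical.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II of a list of floats."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = math.sqrt(1.0 / n) * coeffs[0]
        for k in range(1, n):
            total += math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal

def compress(x, keep):
    """Keep the `keep` largest-magnitude DCT coefficients and zero the rest."""
    coeffs = dct(x)
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(ranked[:keep])
    sparse = [c if k in kept else 0.0 for k, c in enumerate(coeffs)]
    return sparse, idct(sparse)

# Illustrative values only: one row of a made-up filter.
filter_row = [1.0, 2.0, 3.0, 4.0]
sparse, approx = compress(filter_row, keep=2)
```

Here half of the coefficients are stored, and `approx` stays close to `filter_row` because the discarded coefficients carry little energy.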

164 citations

Posted Content•

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Yansong Gao¹, Chang Xu², Derui Wang³, Shiping Chen², Damith C. Ranasinghe⁴, Surya Nepal²•Institutions (4)

Nanjing University of Science and Technology¹, Commonwealth Scientific and Industrial Research Organisation², Swinburne University of Technology³, University of Adelaide⁴

18 Feb 2019-arXiv: Cryptography and Security

TL;DR: STRIP is a run-time trojan attack detection system based on adversarial perturbation, which intentionally perturbs the incoming input, for instance by superimposing various image patterns, and observes the randomness of predicted classes for perturbed inputs from a given deployed model.
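One way to quantify "randomness of predicted classes" is Shannon entropy over the class probabilities of each perturbed copy. The sketch below assumes that measure and assumes low entropy is the suspicious case; the summary above states neither. The prediction values and function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_entropy(prediction_sets):
    """Average entropy over the predictions for all perturbed copies of one input."""
    return sum(entropy(p) for p in prediction_sets) / len(prediction_sets)

def looks_trojaned(prediction_sets, threshold):
    """Flag an input whose predictions stay nearly constant under perturbation.

    `threshold` is a hypothetical detection boundary that would have to be
    calibrated on inputs known to be clean.
    """
    return mean_entropy(prediction_sets) < threshold

# Made-up model outputs for three perturbed copies of each input.
clean_like = [[0.5, 0.5], [0.25, 0.75], [0.5, 0.5]]
trojan_like = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
```

With these made-up numbers, the predictions for `trojan_like` never change, so its mean entropy is zero, while `clean_like` varies with the perturbation.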

97 citations

Posted Content•

CARS: Continuous Evolution for Efficient Neural Architecture Search

Zhaohui Yang¹, Yunhe Wang¹, Xinghao Chen¹, Boxin Shi², Chao Xu², Chunjing Xu², Qi Tian², Chang Xu³•Institutions (3)

Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer

11 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work develops an efficient continuous evolutionary approach for searching neural networks that provides a series of networks with the number of parameters ranging from 3.7M to 5.1M under mobile settings and surpasses those produced by the state-of-the-art methods on the benchmark ImageNet dataset.

Abstract: Searching techniques in most existing neural architecture search (NAS) algorithms are mainly dominated by differentiable methods for efficiency reasons. In contrast, we develop an efficient continuous evolutionary approach for searching neural networks. Architectures in the population that share parameters within one SuperNet in the latest generation will be tuned over the training dataset for a few epochs. The searching in the next evolution generation will directly inherit both the SuperNet and the population, which accelerates the optimal network generation. The non-dominated sorting strategy is further applied to preserve only results on the Pareto front for accurately updating the SuperNet. Several neural networks with different model sizes and performances will be produced after the continuous search with only 0.4 GPU days. As a result, our framework provides a series of networks with the number of parameters ranging from 3.7M to 5.1M under mobile settings. These networks surpass those produced by the state-of-the-art methods on the benchmark ImageNet dataset.
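The Pareto-front selection mentioned in the abstract can be sketched as follows. This is a minimal illustration that extracts only the first non-dominated front for two objectives where lower is better; it is not the sorting strategy used in the paper. The architecture names and objective values are made up.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on one.

    Objectives are tuples where lower is better, e.g. (parameters, error).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Return the names of individuals that no other individual dominates."""
    front = []
    for name, objectives in population.items():
        dominated = any(
            dominates(other, objectives)
            for other_name, other in population.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical population: (parameters in millions, error in percent).
population = {
    "net_a": (3.0, 30.0),
    "net_b": (4.0, 28.0),
    "net_c": (5.0, 27.0),
    "net_d": (5.0, 31.0),  # larger and less accurate than net_b
}
survivors = pareto_front(population)
```

Only `net_d` is discarded, since the other three each represent a different trade-off between size and error.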

97 citations

Journal Article•DOI•

Xinyuan Chen¹, Chang Xu², Xiaokang Yang¹, Li Song¹, Dacheng Tao²•Institutions (2)

Shanghai Jiao Tong University¹, University of Sydney²

04 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes adversarial gated networks (Gated GAN) to transfer multiple styles in a single model; the generative networks have three modules: an encoder, a gated transformer, and a decoder.

Abstract: Style transfer describes the rendering of an image semantic content as different artistic styles. Recently, generative adversarial networks (GANs) have emerged as an effective approach in style transfer by adversarially training the generator to synthesize convincing counterfeits. However, traditional GAN suffers from the mode collapse issue, resulting in unstable training and making style transfer quality difficult to guarantee. In addition, the GAN generator is only compatible with one style, so a series of GANs must be trained to provide users with choices to transfer more than one kind of style. In this paper, we focus on tackling these challenges and limitations to improve style transfer. We propose adversarial gated networks (Gated GAN) to transfer multiple styles in a single model. The generative networks have three modules: an encoder, a gated transformer, and a decoder. Different styles can be achieved by passing input images through different branches of the gated transformer. To stabilize training, the encoder and decoder are combined as an autoencoder to reconstruct the input images. The discriminative networks are used to distinguish whether the input image is a stylized or genuine image. An auxiliary classifier is used to recognize the style categories of transferred images, thereby helping the generative networks generate images in multiple styles. In addition, Gated GAN makes it possible to explore a new style by investigating styles learned from artists or genres. Our extensive experiments demonstrate the stability and effectiveness of the proposed model for multistyle transfer.

81 citations

Journal Article•DOI•

Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer

Xinyuan Chen¹, Chang Xu², Xiaokang Yang¹, Li Song¹, Dacheng Tao²•Institutions (2)

Shanghai Jiao Tong University¹, University of Sydney²

01 Feb 2019-IEEE Transactions on Image Processing

TL;DR: This paper proposes adversarial gated networks (Gated-GAN) to transfer multiple styles in a single model and makes it possible to explore a new style by investigating styles learned from artists or genres.

Abstract: Style transfer describes the rendering of an image’s semantic content as different artistic styles. Recently, generative adversarial networks (GANs) have emerged as an effective approach in style transfer by adversarially training the generator to synthesize convincing counterfeits. However, traditional GAN suffers from the mode collapse issue, resulting in unstable training and making style transfer quality difficult to guarantee. In addition, the GAN generator is only compatible with one style, so a series of GANs must be trained to provide users with choices to transfer more than one kind of style. In this paper, we focus on tackling these challenges and limitations to improve style transfer. We propose adversarial gated networks (Gated-GAN) to transfer multiple styles in a single model. The generative networks have three modules: an encoder, a gated transformer, and a decoder. Different styles can be achieved by passing input images through different branches of the gated transformer. To stabilize training, the encoder and decoder are combined as an auto-encoder to reconstruct the input images. The discriminative networks are used to distinguish whether the input image is a stylized or genuine image. An auxiliary classifier is used to recognize the style categories of transferred images, thereby helping the generative networks generate images in multiple styles. In addition, Gated-GAN makes it possible to explore a new style by investigating styles learned from artists or genres. Our extensive experiments demonstrate the stability and effectiveness of the proposed model for multi-style transfer.

78 citations

Posted Content•

Data-Free Learning of Student Networks.

Hanting Chen¹, Yunhe Wang², Chang Xu³, Zhaohui Yang¹, Chuanjian Liu², Boxin Shi¹, Chunjing Xu², Chao Xu¹, Qi Tian²•Institutions (3)

Co-Evolutionary Compression for Unpaired Image Translation

02 Apr 2019-arXiv: Learning

TL;DR: In this article, a Data-Free Learning (DAFL) method was proposed to learn efficient deep neural networks by exploiting generative adversarial networks (GANs), where the teacher network is regarded as a fixed discriminator and the generator is utilized for deriving training samples which can obtain the maximum response on the discriminator.

Abstract: Learning portable neural networks is essential for computer vision so that pre-trained heavy deep models can be applied on edge devices such as mobile phones and micro sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models when the training dataset is directly accessible. However, training data for the given deep network are often unavailable due to practical problems (e.g., privacy, legal issues, and transmission), and the architecture of the given network is also unknown except for some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher networks are regarded as a fixed discriminator and the generator is utilized for deriving training samples which can obtain the maximum response on the discriminator. Then, an efficient network with smaller model size and computational complexity is trained using the generated data and the teacher network, simultaneously. Efficient student networks learned using the proposed Data-Free Learning (DAFL) method achieve 92.22% and 74.47% accuracies using ResNet-18 without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark.

Proceedings Article•DOI•

Han Shu¹, Yunhe Wang¹, Xu Jia¹, Kai Han¹, Hanting Chen², Chunjing Xu¹, Qi Tian¹, Chang Xu³•Institutions (3)

Huawei¹, Peking University², University of Sydney³

01 Oct 2019

TL;DR: In this article, a co-evolutionary approach for reducing memory usage and FLOPs simultaneously was proposed for image-to-image translation, where generators for two image domains are encoded as two populations and synergistically optimized for investigating the most important convolution filters iteratively.

Abstract: Generative adversarial networks (GANs) have been successfully used for considerable computer vision tasks, especially the image-to-image translation. However, generators in these networks are of complicated architectures with large number of parameters and huge computational complexities. Existing methods are mainly designed for compressing and speeding-up deep neural networks in the classification task, and cannot be directly applied on GANs for image translation, due to their different objectives and training procedures. To this end, we develop a novel co-evolutionary approach for reducing their memory usage and FLOPs simultaneously. In practice, generators for two image domains are encoded as two populations and synergistically optimized for investigating the most important convolution filters iteratively. Fitness of each individual is calculated using the number of parameters, a discriminator-aware regularization, and the cycle consistency. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method for obtaining compact and effective generators.

Posted Content•

ReNAS: Relativistic Evaluation of Neural Architecture Search

Yixing Xu¹, Yunhe Wang¹, Kai Han¹, Yehui Tang¹, Shangling Jui¹, Chunjing Xu¹, Chang Xu²•Institutions (2)

Co-Evolutionary Compression for Unpaired Image Translation

30 Sep 2019-arXiv: Learning

TL;DR: This paper proposes a relativistic architecture performance predictor in NAS (ReNAS), encoding neural architectures into feature tensors and further refining the representations with the predictor, to determine which architecture would perform better instead of accurately predicting the absolute architecture performance.

Abstract: An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS). To save computational cost, most existing NAS algorithms train and evaluate intermediate neural architectures on a small proxy dataset with limited training epochs. But it is difficult to expect an accurate performance estimation of an architecture in such a coarse evaluation way. This paper advocates a new neural architecture evaluation scheme, which aims to determine which architecture would perform better instead of accurately predicting the absolute architecture performance. Therefore, we propose a relativistic architecture performance predictor in NAS (ReNAS). We encode neural architectures into feature tensors, and further refine the representations with the predictor. The proposed relativistic performance predictor can be deployed in discrete searching methods to search for the desired architectures without additional evaluation. Experimental results on the NAS-Bench-101 dataset suggest that sampling 424 (0.1% of the entire search space) neural architectures and their corresponding validation performance is already enough for learning an accurate architecture performance predictor. The accuracies of our searched neural architectures on the NAS-Bench-101 and NAS-Bench-201 datasets are higher than those of the state-of-the-art methods and show the superiority of the proposed method.
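The difference between relative and absolute evaluation can be made concrete with a small sketch. The pairwise-agreement score below is an illustrative measure chosen here, not the loss or metric from the paper, and all scores and accuracies are made up.

```python
from itertools import combinations

def pairwise_agreement(predicted, actual):
    """Fraction of architecture pairs that the predictor orders the same way as
    the true accuracies. Only the ordering matters, not the predicted values.
    """
    pairs = list(combinations(range(len(actual)), 2))
    agree = sum(
        (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical validation accuracies (percent) of three architectures.
actual = [90.0, 92.0, 94.0]
# Predictor scores on an arbitrary scale: far from the true values, same order.
well_ranked = [0.1, 0.5, 0.9]
# Scores that reverse the order.
badly_ranked = [0.9, 0.5, 0.1]
```

`well_ranked` is useless as an absolute accuracy estimate, yet it orders every pair correctly, which is all a search procedure needs to pick the better candidate.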

Posted Content•

Han Shu¹, Yunhe Wang¹, Xu Jia¹, Kai Han¹, Hanting Chen², Chunjing Xu¹, Qi Tian¹, Chang Xu³•Institutions (3)

Huawei¹, Peking University², University of Sydney³

25 Jul 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: A novel co-evolutionary approach for reducing their memory usage and FLOPs simultaneously and synergistically optimized for investigating the most important convolution filters iteratively is developed.

Proceedings Article•DOI•

Self-Supervised Representation Learning From Multi-Domain Data

Zeyu Feng¹, Chang Xu¹, Dacheng Tao¹•Institutions (1)

LegoNet: Efficient Convolutional Neural Networks with Lego Filters

01 Oct 2019

TL;DR: This work presents an information-theoretically motivated constraint for self-supervised representation learning from multiple related domains that has the benefit of decreasing the built-in bias of individual domains, as well as leveraging information and allowing knowledge transfer across multiple domains.

Abstract: We present an information-theoretically motivated constraint for self-supervised representation learning from multiple related domains. In contrast to previous self-supervised learning methods, our approach learns from multiple domains, which has the benefit of decreasing the built-in bias of individual domains, as well as leveraging information and allowing knowledge transfer across multiple domains. The proposed mutual information constraints encourage the neural network to extract common invariant information across domains and to preserve peculiar information of each domain simultaneously. We adopt tractable upper and lower bounds of mutual information to make the proposed constraints solvable. The learned representation is more unbiased and robust toward the input images. Extensive experimental results on both multi-domain and large-scale datasets demonstrate the necessity and advantage of multi-domain self-supervised learning with mutual information constraints. Representations learned in our framework on state-of-the-art methods achieve better performance than those learned on a single domain.

Proceedings Article•

Zhaohui Yang¹, Yunhe Wang², Chuanjian Liu², Hanting Chen¹, Chunjing Xu², Boxin Shi¹, Chao Xu³, Chang Xu⁴•Institutions (4)

Peking University¹, Huawei², University of Science and Technology of China³, University of Sydney⁴

24 May 2019

TL;DR: A split-transform-merge strategy for an efficient convolution by exploiting intermediate Lego feature maps is developed and it is suggested that an ordinary filter in the neural network can be upgraded to a sophisticated module as well.

Abstract: This paper aims to build efficient convolutional neural networks using a set of Lego filters. Many successful building blocks, e.g. inception and residual modules, have been designed to refresh state-of-the-art records of CNNs on visual recognition tasks. Beyond these high-level modules, we suggest that an ordinary filter in the neural network can be upgraded to a sophisticated module as well. Filter modules are established by assembling a shared set of Lego filters that are often of much lower dimensions. Weights in Lego filters and binary masks to stack Lego filters for these filter modules can be simultaneously optimized in an end-to-end manner as usual. Inspired by network engineering, we develop a split-transform-merge strategy for an efficient convolution by exploiting intermediate Lego feature maps. The compression and acceleration achieved by Lego Networks using the proposed Lego filters have been theoretically discussed. Experimental results on benchmark datasets and deep models demonstrate the advantages of the proposed Lego filters and their potential real-world applications on mobile devices.

Proceedings Article•DOI•

Image-Question-Answer Synergistic Network for Visual Dialog

Dalu Guo¹, Chang Xu¹, Dacheng Tao¹•Institutions (1)

Enhancing the Robustness of Neural Collaborative Filtering Systems Under Malicious Attacks

01 Jun 2019

TL;DR: A novel image-question-answer synergistic network to value the role of the answer for precise visual dialog and boosts the discriminative visual dialog model to achieve a new state-of-the-art of 57.88% normalized discounted cumulative gain.

Abstract: The image, question (combined with the history for de-referencing), and the corresponding answer are three vital components of visual dialog. Classical visual dialog systems integrate the image, question, and history to search for or generate the best matched answer, and so, this approach significantly ignores the role of the answer. In this paper, we devise a novel image-question-answer synergistic network to value the role of the answer for precise visual dialog. We extend the traditional one-stage solution to a two-stage solution. In the first stage, candidate answers are coarsely scored according to their relevance to the image and question pair. Afterward, in the second stage, answers with high probability of being correct are re-ranked by synergizing with image and question. On the Visual Dialog v1.0 dataset, the proposed synergistic network boosts the discriminative visual dialog model to achieve a new state-of-the-art of 57.88% normalized discounted cumulative gain. A generative visual dialog model equipped with the proposed technique also shows promising improvements.

Journal Article•DOI•

Yali Du¹, Meng Fang², Jinfeng Yi, Chang Xu³, Jun Cheng⁴, Dacheng Tao³•Institutions (4)

University of Technology, Sydney¹, Tencent², University of Sydney³, Chinese Academy of Sciences⁴

01 Mar 2019-IEEE Transactions on Multimedia

TL;DR: Through the investigation, the proposed defensive method can reduce the success rate of malicious user attacks and keep the prediction accuracy comparable to standard neural recommendation systems.

Abstract: Recommendation systems have become ubiquitous in online shopping in recent decades due to their power in reducing excessive choices of customers and industries. Recent collaborative filtering methods based on deep neural networks have been studied and show promising results due to their power in learning hidden representations for users and items. However, they have revealed vulnerabilities under malicious user attacks. With the knowledge of a collaborative filtering algorithm and its parameters, the performance of this recommendation system can be easily downgraded. Unfortunately, this problem is not addressed well, and the study on defending recommendation systems is insufficient. In this paper, we aim to improve the robustness of recommendation systems based on two concepts: stage-wise hints training and randomness. To protect a target model, we introduce noise layers in the training of a target model to increase its resistance to adversarial perturbations. To reduce the noise layers’ influence on model performance, we introduce intermediate layer outputs as hints from a teacher model to regularize the intermediate layers of a student target model. We consider white box attacks under which attackers have the knowledge of the target model. The generalizability and robustness properties of our method have been analytically inspected in experiments and discussions, and the computational cost is comparable to training a standard neural network-based collaborative filtering model. Through our investigation, the proposed defensive method can reduce the success rate of malicious user attacks and keep the prediction accuracy comparable to standard neural recommendation systems.

Posted Content•

Image-Question-Answer Synergistic Network for Visual Dialog

Dalu Guo¹, Chang Xu¹, Dacheng Tao¹•Institutions (1)

AdderNet: Do We Really Need Multiplications in Deep Learning?

26 Feb 2019-arXiv: Computation and Language

TL;DR: This paper proposes an image-question-answer synergistic network to value the role of the answer for precise visual dialog, which achieves state-of-the-art performance on Visual Dialog v1.0.

Abstract: The image, question (combined with the history for de-referencing), and the corresponding answer are three vital components of visual dialog. Classical visual dialog systems integrate the image, question, and history to search for or generate the best matched answer, and so, this approach significantly ignores the role of the answer. In this paper, we devise a novel image-question-answer synergistic network to value the role of the answer for precise visual dialog. We extend the traditional one-stage solution to a two-stage solution. In the first stage, candidate answers are coarsely scored according to their relevance to the image and question pair. Afterward, in the second stage, answers with high probability of being correct are re-ranked by synergizing with image and question. On the Visual Dialog v1.0 dataset, the proposed synergistic network boosts the discriminative visual dialog model to achieve a new state-of-the-art of 57.88% normalized discounted cumulative gain. A generative visual dialog model equipped with the proposed technique also shows promising improvements.

Posted Content•

Hanting Chen¹, Yunhe Wang², Chunjing Xu², Boxin Shi¹, Chao Xu¹, Qi Tian², Chang Xu³•Institutions (3)

Approximated Bilinear Modules for Temporal Modeling

31 Dec 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors proposed adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs.

Abstract: Compared with the cheap addition operation, the multiplication operation has much higher computational complexity. The widely-used convolutions in deep neural networks are in fact cross-correlations that measure the similarity between input feature and convolution filters, which involves massive multiplications between float values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input feature as the output response. The influence of this new similarity measure on the optimization of neural networks has been thoroughly analyzed. To achieve a better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolution layers. The codes are publicly available at: this https URL.
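The contrast between the two similarity measures can be sketched on a single flattened patch. The negative sign on the ℓ1 distance is an assumption made here so that a closer filter yields a larger response; the abstract does not spell it out. The function names and values are hypothetical, and the special back-propagation and adaptive learning rate are not shown.

```python
def adder_response(patch, filt):
    """Negative L1 distance between a flattened input patch and a filter.

    Uses only subtraction, absolute value, and addition: no multiplications.
    """
    return -sum(abs(x - f) for x, f in zip(patch, filt))

def conv_response(patch, filt):
    """Ordinary cross-correlation response, shown for comparison."""
    return sum(x * f for x, f in zip(patch, filt))

# Illustrative values only.
patch = [1.0, 2.0, 3.0]
filters = {
    "close": [1.0, 2.0, 2.5],
    "far": [-1.0, 0.0, 5.0],
}
scores = {name: adder_response(patch, f) for name, f in filters.items()}
```

Under this convention the response is at most zero and is largest when the filter matches the patch exactly.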

Proceedings Article•DOI•

Xinqi Zhu¹, Chang Xu¹, Langwen Hui², Cewu Lu², Dacheng Tao¹•Institutions (2)

University of Sydney¹, Shanghai Jiao Tong University²

01 Oct 2019

TL;DR: In this article, two-layer subnets are converted to temporal bilinear modules by adding an auxiliary branch, and snippet sampling and shifting inference are introduced to boost sparse-frame video classification performance.

Abstract: We consider two less-emphasized temporal properties of video: (1) temporal cues are fine-grained; (2) temporal modeling needs reasoning. To tackle both problems at once, we exploit approximated bilinear modules (ABMs) for temporal modeling. Two main points make the modules effective. First, two-layer MLPs can be seen as a constrained approximation of bilinear operations, and thus can be used to construct deep ABMs in existing CNNs while reusing pretrained parameters. Second, frame features can be divided into static and dynamic parts because of visual repetition in adjacent frames, which makes temporal modeling more efficient. Multiple ABM variants and implementations are investigated, from high performance to high efficiency. Specifically, we show how two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary branch. In addition, we introduce snippet sampling and shifting inference to boost sparse-frame video classification performance. Extensive ablation studies are conducted to show the effectiveness of the proposed techniques. Our models can outperform most state-of-the-art methods on the Something-Something v1 and v2 datasets without Kinetics pretraining, and are also competitive on other YouTube-like action recognition datasets. Our code is available at https://github.com/zhuxinqimac/abm-pytorch.

Posted Content•

Attribute Aware Pooling for Pedestrian Attribute Recognition

Kai Han¹, Yunhe Wang¹, Han Shu¹, Chuanjian Liu¹, Chunjing Xu¹, Chang Xu²•Institutions (2)

Positive-Unlabeled Compression on the Cloud

27 Jul 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper extends the strength of deep convolutional neural networks to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm that appropriately explores and exploits the correlations between attributes for pedestrian attribute recognition.

Abstract: This paper extends the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm. Existing vanilla CNNs cannot be straightforwardly applied to multi-attribute data because of the larger label space as well as attribute entanglement and correlations. We tackle these challenges, which hamper the development of CNNs for multi-attribute classification, by fully exploiting the correlation between different attributes. A multi-branch architecture is adopted for focusing on attributes at different regions. Besides the prediction based on each branch itself, the context information of each branch is employed for the decision as well. The attribute aware pooling is developed to integrate both kinds of information. Therefore, attributes that are indistinct or tangled with others can be accurately recognized by exploiting the context information. Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes for pedestrian attribute recognition.

Posted Content•

Yixing Xu¹, Yunhe Wang¹, Hanting Chen², Kai Han², Chunjing Xu³, Dacheng Tao⁴, Chang Xu⁴•Institutions (4)

Huawei¹, Peking University², Tsinghua University³, University of Sydney⁴

21 Sep 2019-arXiv: Learning

TL;DR: A novel positive-unlabeled (PU) setting is presented for compressing convolutional neural networks on the cloud when only a small portion of the original training data is available, together with a robust knowledge distillation (RKD) scheme that deals with the class imbalance problem of the newly augmented training examples.

Abstract: Many attempts have been made to extend the great success of convolutional neural networks (CNNs) achieved on high-end GPU servers to portable devices such as smart phones. Providing compression and acceleration services for deep learning models on the cloud is therefore significant and attractive for end users. However, existing network compression and acceleration approaches usually fine-tune the svelte model by requesting the entire original training data (e.g., ImageNet), which could be more cumbersome than the network itself and cannot be easily uploaded to the cloud. In this paper, we present a novel positive-unlabeled (PU) setting for addressing this problem. In practice, only a small portion of the original training set is required as positive examples, and more useful training examples can be obtained from the massive unlabeled data on the cloud through a PU classifier with an attention-based multi-scale feature extractor. We further introduce a robust knowledge distillation (RKD) scheme to deal with the class imbalance problem of these newly augmented training examples. The superiority of the proposed method is verified through experiments conducted on the benchmark models and datasets. We can use only 8% of uniformly selected data from ImageNet to obtain an efficient model with comparable performance to the baseline ResNet-34.

Proceedings Article•

Positive-Unlabeled Compression on the Cloud

Yixing Xu¹, Yunhe Wang¹, Hanting Chen², Kai Han², Chunjing Xu³, Dacheng Tao⁴, Chang Xu⁴•Institutions (4)

Huawei¹, Peking University², Tsinghua University³, University of Sydney⁴

01 Jan 2019

TL;DR: In this paper, a PU classifier with an attention-based multi-scale feature extractor is proposed to obtain useful training examples from massive unlabeled data on the cloud, and a robust knowledge distillation scheme deals with the class imbalance problem of these newly augmented training examples. Using only 8% of uniformly selected data from ImageNet, the method obtains an efficient model with comparable performance to the baseline ResNet-34.

Abstract: Many attempts have been made to extend the great success of convolutional neural networks (CNNs) achieved on high-end GPU servers to portable devices such as smart phones. Providing compression and acceleration services for deep learning models on the cloud is therefore significant and attractive for end users. However, existing network compression and acceleration approaches usually fine-tune the svelte model by requesting the entire original training data (e.g., ImageNet), which could be more cumbersome than the network itself and cannot be easily uploaded to the cloud. In this paper, we present a novel positive-unlabeled (PU) setting for addressing this problem. In practice, only a small portion of the original training set is required as positive examples, and more useful training examples can be obtained from the massive unlabeled data on the cloud through a PU classifier with an attention-based multi-scale feature extractor. We further introduce a robust knowledge distillation (RKD) scheme to deal with the class imbalance problem of these newly augmented training examples. The superiority of the proposed method is verified through experiments conducted on the benchmark models and datasets. We can use only 8% of uniformly selected data from ImageNet to obtain an efficient model with comparable performance to the baseline ResNet-34.

Proceedings Article•DOI•

Learning Instance-wise Sparsity for Accelerating Deep Models

Chuanjian Liu¹, Yunhe Wang¹, Kai Han¹, Chunjing Xu¹, Chang Xu²•Institutions (2)

Attribute Aware Pooling for Pedestrian Attribute Recognition.

01 Aug 2019

TL;DR: This work expects intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance, and takes coefficient of variation as a measure to select the layers that are appropriate for acceleration.

Abstract: Exploring deep convolutional neural networks of high efficiency and low memory usage is essential for a wide variety of machine learning tasks. Most existing approaches accelerate deep models by manipulating parameters or filters without data, e.g., pruning and decomposition. In contrast, we study this problem from a different perspective by respecting the differences between data. An instance-wise feature pruning is developed by identifying informative features for different instances. Specifically, by investigating a feature decay regularization, we expect intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance. During online inference, subtle features of input images extracted by intermediate layers of a well-trained neural network can be eliminated to accelerate the subsequent calculations. We further take the coefficient of variation as a measure to select the layers that are appropriate for acceleration. Extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method.
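
For readers unfamiliar with the measure, the coefficient of variation is the standard deviation divided by the mean. The sketch below only illustrates that definition. The helper names `coefficient_of_variation` and `rank_layers`, the per-layer statistics being compared, and the choice to rank layers from highest to lowest value are all assumptions made for this example; the abstract does not specify them.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / abs(mean)


def rank_layers(layer_stats):
    """Order layer names by coefficient of variation, highest first.

    `layer_stats` maps a (hypothetical) layer name to a list of
    per-instance statistics; the ranking direction is an assumption.
    """
    return sorted(
        layer_stats,
        key=lambda name: coefficient_of_variation(layer_stats[name]),
        reverse=True,
    )


stats = {"layer_a": [1, 1, 1, 1], "layer_b": [0, 0, 0, 4]}
print(rank_layers(stats))
```

Because the measure is normalized by the mean, it lets layers whose statistics live on different scales be compared directly.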

Proceedings Article•DOI•

Kai Han¹, Yunhe Wang¹, Han Shu¹, Chuanjian Liu¹, Chunjing Xu¹, Chang Xu²•Institutions (2)