
Showing papers by "Hong Liu published in 2019"


Proceedings ArticleDOI
Xia Li1, Zhisheng Zhong1, Jianlong Wu1, Yibo Yang1, Zhouchen Lin1, Hong Liu1 
01 Oct 2019
TL;DR: This paper formulates the attention mechanism in an expectation-maximization manner and iteratively estimates a much more compact set of bases upon which the attention maps are computed, yielding a module that is robust to input variance and efficient in memory and computation.
Abstract: Self-attention mechanism has been widely used for various tasks. It is designed to compute the representation of each position by a weighted sum of the features at all positions. Thus, it can capture long-range relations for computer vision tasks. However, it is computationally expensive, since the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and discards noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of the input and is also friendly in memory and computation. Moreover, we set up bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PASCAL Context, and COCO Stuff, on which we set new records.
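The EM iteration described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: function names, initialization, and the single-head setup are all assumptions; the point is only the alternation between an E-step (attention over K bases) and an M-step (re-estimating the bases), with K much smaller than the number of positions N.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def em_attention(X, K=8, T=3, seed=0):
    """Sketch of Expectation-Maximization Attention (EMA).

    X : (N, C) flattened feature map (N spatial positions, C channels).
    K : number of bases, with K << N -- the source of the low-rank
        reconstruction and of the memory/compute savings.
    T : number of EM iterations.
    """
    N, C = X.shape
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal((K, C))             # initial bases
    for _ in range(T):
        # E-step: responsibility of each basis for each position.
        Z = softmax(X @ mu.T, axis=1)            # (N, K)
        # M-step: re-estimate bases as responsibility-weighted means.
        mu = (Z.T @ X) / (Z.sum(axis=0)[:, None] + 1e-6)   # (K, C)
    # Reconstruct features from only K bases instead of all N positions.
    return Z @ mu
```

Because the output is `Z @ mu` with `Z` of shape (N, K) and `mu` of shape (K, C), its rank is at most K; this is the low-rank, denoised representation the abstract refers to, and the attention maps cost O(N·K) rather than O(N²).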

276 citations


Posted Content
Xia Li1, Zhisheng Zhong1, Jianlong Wu1, Yibo Yang1, Zhouchen Lin1, Hong Liu1 
TL;DR: In this article, the authors formulate the self-attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed.
Abstract: Self-attention mechanism has been widely used for various tasks. It is designed to compute the representation of each position by a weighted sum of the features at all positions. Thus, it can capture long-range relations for computer vision tasks. However, it is computationally expensive, since the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and discards noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of the input and is also friendly in memory and computation. Moreover, we set up bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PASCAL Context, and COCO Stuff, on which we set new records.

176 citations


Journal ArticleDOI
17 Jul 2019
TL;DR: A locality-preserved reconstruction term is introduced to infer the missing views such that all views can be naturally aligned and a consensus graph is adaptively learned and embedded via the reverse graph regularization to guarantee the common local structure of multiple views.
Abstract: Multi-view clustering aims to partition data collected from diverse sources based on the assumption that all views are complete. However, such a prior assumption is rarely satisfied in many real-world applications, resulting in the incomplete multi-view learning problem. Existing attempts at this problem still have the following limitations: 1) the underlying semantic information of the missing views is commonly ignored; 2) the local structure of the data is not well explored; 3) the importance of different views is not effectively evaluated. To address these issues, this paper proposes a Unified Embedding Alignment Framework (UEAF) for robust incomplete multi-view clustering. In particular, a locality-preserved reconstruction term is introduced to infer the missing views such that all views can be naturally aligned. A consensus graph is adaptively learned and embedded via the reverse graph regularization to guarantee the common local structure of multiple views, which in turn can further align the incomplete views and inferred views. Moreover, an adaptive weighting strategy is designed to capture the importance of different views. Extensive experimental results show that the proposed method can significantly improve the clustering performance in comparison with some state-of-the-art methods.
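The adaptive weighting idea in the abstract can be illustrated with a toy sketch. The inverse-square-root rule below is a common choice in multi-view objectives and is only an assumption here, not the paper's actual update; it merely shows the intent: views whose reconstruction/alignment error is large contribute less to the consensus.

```python
import numpy as np

def adaptive_view_weights(errors, eps=1e-8):
    """Toy adaptive view weighting (illustrative, not UEAF's update).

    errors : per-view reconstruction/alignment errors, shape (V,).
    Returns normalized weights; a view with higher error gets a
    smaller weight, so unreliable or badly incomplete views are
    down-weighted when learning the consensus graph.
    """
    errors = np.asarray(errors, dtype=float)
    w = 1.0 / (np.sqrt(errors) + eps)
    return w / w.sum()
```

For example, with per-view errors `[0.1, 0.4, 1.6]` the weights decrease monotonically, so the cleanest view dominates the consensus.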

116 citations


Posted Content
TL;DR: A new Attention-Guided Generative Adversarial Network (AttentionGAN) is proposed for the unpaired image-to-image translation task; it can identify the most discriminative semantic objects and minimize changes to unwanted parts for semantic manipulation problems without using extra data or models.
Abstract: State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data. Though the existing methods have achieved promising results, they still produce visual artifacts, being able to translate low-level information but not the high-level semantics of input images. One possible reason is that generators do not have the ability to perceive the most discriminative parts between the source and target domains, thus making the generated images of low quality. In this paper, we propose a new Attention-Guided Generative Adversarial Network (AttentionGAN) for the unpaired image-to-image translation task. AttentionGAN can identify the most discriminative foreground objects and minimize the change of the background. The attention-guided generators in AttentionGAN are able to produce attention masks, and then fuse the generation output with the attention masks to obtain high-quality target images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks with eight public datasets, demonstrating that the proposed method is effective in generating sharper and more realistic images compared with existing competitive models. The code is available at this https URL.
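The mask-based fusion step the abstract describes reduces to a simple convex blend. The sketch below is an assumption about the general scheme (the function name and tensor layout are illustrative, and AttentionGAN's generators produce several masks rather than one), but it shows why the background survives translation: wherever the attention mask is near zero, the output is the untouched input pixel.

```python
import numpy as np

def fuse_with_attention(x, content, mask):
    """Attention-guided fusion: translate only where the mask attends.

    x       : (H, W, C) source image.
    content : (H, W, C) generator output (translated content).
    mask    : (H, W, 1) attention mask in [0, 1]; high values mark the
              discriminative foreground, low values the background.
    Background regions (mask ~ 0) are copied from the input unchanged.
    """
    return mask * content + (1.0 - mask) * x
```

A discriminator restricted to attended regions then only judges `mask * image`, matching the attention-guided discriminator mentioned above.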

76 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper proposes Deep Symmetry Enhanced Network (DSEN), which can explicitly extract rotation-equivariant features from rain images, and designs a self-refining mechanism to remove accumulated rain streaks in a coarse-to-fine manner.
Abstract: Rain removal aims to remove the rain streaks from rain images. The state-of-the-art methods are mostly based on Convolutional Neural Networks (CNNs). However, as CNNs are not equivariant to object rotation, these methods are unsuitable for dealing with tilted rain streaks. To tackle this problem, we propose Deep Symmetry Enhanced Network (DSEN), which is able to explicitly extract rotation-equivariant features from rain images. In addition, we design a self-refining mechanism to remove the accumulated rain streaks in a coarse-to-fine manner. This mechanism reuses DSEN with a novel information link which passes the gradient flow to the higher stages. Extensive experiments on both synthetic and real-world rain images show that our self-refining DSEN yields the top performance.
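The rotation-equivariance property can be demonstrated with a minimal group-convolution-style sketch: correlate the image with the same filter at four 90° rotations. This is only an illustration of the property the abstract relies on, under assumed names and a plain NumPy correlation, not DSEN's actual architecture. Rotating the input by 90° rotates each response map and cyclically shifts the orientation channel, which is exactly what "equivariant" means here, and why tilted streaks are handled uniformly.

```python
import numpy as np

def correlate2d_valid(x, w):
    """Plain 'valid'-mode cross-correlation (no kernel flipping)."""
    k = w.shape[0]
    H, W = x.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def rot_equivariant_features(x, w):
    """Responses of one filter applied at four 90-degree rotations.

    Returns an array of shape (4, H', W'): one response map per
    orientation of the filter, so tilted structures (such as slanted
    rain streaks) activate some orientation channel equally well.
    """
    return np.stack([correlate2d_valid(x, np.rot90(w, k)) for k in range(4)])
```

The equivariance identity `f_k(rot90(x)) == rot90(f_{k-1}(x))` holds exactly for square inputs, which is what the test below checks.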

4 citations