Author

Zhibo Chen

Bio: Zhibo Chen is an academic researcher from the University of Science and Technology of China. The author has contributed to research on topics including image quality and computer science. The author has an h-index of 27 and has co-authored 344 publications receiving 3385 citations. Previous affiliations of Zhibo Chen include Sony Broadcast & Professional Research Laboratories and Microsoft.


Papers
Proceedings Article
14 Jun 2020
TL;DR: This work proposes an effective Relation-Aware Global Attention (RGA) module that captures global structural information for better attention learning: for each feature position, it stacks the relations (its pairwise correlations/affinities with all feature positions) together with the feature itself and learns the attention with a shallow convolutional model.
Abstract: For person re-identification (re-id), attention mechanisms have become attractive as they aim at strengthening discriminative features and suppressing irrelevant ones, which matches well the key of re-id, i.e., discriminative feature learning. Previous approaches typically learn attention using local convolutions, ignoring the mining of knowledge from global structure patterns. Intuitively, the affinities among spatial positions/nodes in the feature map provide clustering-like information and are helpful for inferring semantics and thus attention, especially for person images where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning. Specifically, for each feature position, in order to compactly grasp the structural information of global scope and local appearance information, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster scan order), and the feature itself together to learn the attention with a shallow convolutional model. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power and help achieve the state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.

354 citations
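
The relation stacking described in the RGA abstract above can be illustrated with a minimal sketch. It assumes plain dot-product affinities between positions; the learned embeddings and the shallow convolutional attention model of the paper are not reproduced. The function name `stack_relations` and the toy feature map are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as an assumed stand-in for the pairwise affinity."""
    return sum(x * y for x, y in zip(a, b))


def stack_relations(features: List[Vector]) -> List[Vector]:
    """For each position, concatenate its own feature with its affinities
    to all positions, visited in raster-scan order."""
    stacked = []
    for x_i in features:
        relations = [dot(x_i, x_j) for x_j in features]
        stacked.append(list(x_i) + relations)
    return stacked


# Hypothetical 2x2 feature map, flattened in raster-scan order, 2 channels each.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
stacked = stack_relations(feature_map)
```

Each stacked vector has C + N entries (here 2 + 4), so every position carries both its local appearance and a compact view of the global structure; an attention model would then be learned on top of these vectors.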

Proceedings Article
14 Jun 2020
TL;DR: The aim of this paper is to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains, and to enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features.
Abstract: Existing fully-supervised person re-identification (ReID) methods usually suffer from poor generalization capability caused by domain gaps. The key to solving this problem lies in filtering out identity-irrelevant interference and learning domain-invariant person representations. In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. To achieve this goal, we propose a simple yet effective Style Normalization and Restitution (SNR) module. Specifically, we filter out style variations (e.g., illumination, color contrast) by Instance Normalization (IN). However, such a process inevitably removes discriminative information. We propose to distill identity-relevant feature from the removed information and restitute it to the network to ensure high discrimination. For better disentanglement, we enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features. Extensive experiments demonstrate the strong generalization capability of our framework. Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks, and also show superiority on unsupervised domain adaptation.

276 citations
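
The normalize-then-restitute idea in the SNR abstract above can be sketched on a single feature channel. This is only an illustration under stated assumptions: the `gate` value is a fixed hypothetical number standing in for whatever learned mechanism selects identity-relevant information, and the dual causal loss is not modelled. The function names are hypothetical.

```python
import math
from typing import List


def instance_norm(channel: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one channel of one instance to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def normalize_and_restitute(channel: List[float], gate: float) -> List[float]:
    """Apply instance normalization, then add back a gated share of what it removed."""
    normalized = instance_norm(channel)
    # The residual is the information discarded by the normalization step.
    residual = [v - n for v, n in zip(channel, normalized)]
    return [n + gate * r for n, r in zip(normalized, residual)]


channel = [2.0, 4.0, 6.0, 8.0]          # hypothetical activations
style_free = normalize_and_restitute(channel, gate=0.0)
restored = normalize_and_restitute(channel, gate=1.0)
```

With `gate=0.0` the output is the purely normalized feature, and with `gate=1.0` the original feature is recovered; intermediate values restitute only part of the removed information.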

Proceedings Article
15 Jun 2019
TL;DR: Zhang et al. propose a two-stream network that consists of a main full image stream (MF-Stream) and a densely semantically-aligned guiding stream (DSAG-Stream).
Abstract: We propose a densely semantically aligned person re-identification (re-ID) framework. It fundamentally addresses the body misalignment problem caused by pose/viewpoint variations, imperfect person detection, occlusion, etc. By leveraging the estimation of the dense semantics of a person image, we construct a set of densely semantically aligned part images (DSAP-images), where the same spatial positions have the same semantics across different person images. We design a two-stream network that consists of a main full image stream (MF-Stream) and a densely semantically-aligned guiding stream (DSAG-Stream). The DSAG-Stream, with the DSAP-images as input, acts as a regulator to guide the MF-Stream to learn densely semantically aligned features from the original image. In the inference, the DSAG-Stream is discarded and only the MF-Stream is needed, which makes the inference system computationally efficient and robust. To the best of our knowledge, we are the first to make use of fine-grained semantics for addressing misalignment problems for re-ID. Our method achieves rank-1 accuracy of 78.9% (new protocol) on the CUHK03 dataset, 90.4% on the CUHK01 dataset, and 95.7% on the Market1501 dataset, outperforming state-of-the-art methods.

233 citations

Journal Article
TL;DR: A hybrid Unsymmetrical-cross Multi-hexagon-grid Search (UMHexagonS) algorithm is introduced, which effectively addresses false motion vector estimation caused by local minima and can save 30–50% of the computation compared with the Full Fractional-pel Search scheme.

196 citations

Journal Article
TL;DR: This paper proposes effective and efficient end-to-end convolutional neural network models for spatially super-resolving light field (LF) images; the models have an hourglass shape, which allows feature extraction to be performed at the low-resolution level to save both computational and memory costs.
Abstract: Light field (LF) photography is an emerging paradigm for capturing more immersive representations of the real world. However, arising from the inherent tradeoff between the angular and spatial dimensions, the spatial resolution of LF images captured by commercial micro-lens-based LF cameras is significantly constrained. In this paper, we propose effective and efficient end-to-end convolutional neural network models for spatially super-resolving LF images. Specifically, the proposed models have an hourglass shape, which allows feature extraction to be performed at the low-resolution level to save both the computational and memory costs. To fully make use of the 4D structure information of LF data in both the spatial and angular domains, we propose to use 4D convolution to characterize the relationship among pixels. Moreover, as an approximation of 4D convolution, we also propose to use spatial-angular separable (SAS) convolutions for more computationally and memory-efficient extraction of spatial-angular joint features. Extensive experimental results on 57 test LF images with various challenging natural scenes show significant advantages from the proposed models over the state-of-the-art methods. That is, an average PSNR gain of more than 3.0 dB and better visual quality are achieved, and our methods preserve the LF structure of the super-resolved LF images better, which is highly desirable for subsequent applications. In addition, the SAS convolution-based model can achieve three times speed up with only negligible reconstruction quality decrease when compared with the 4D convolution-based one. The source code of our method is available online.

138 citations
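
To make the cost argument for spatial-angular separable (SAS) convolution in the abstract above concrete, the following sketch compares weight counts. It assumes that a SAS layer is a 2D spatial convolution followed by a 2D angular convolution, ignores biases, and uses hypothetical channel counts and kernel size that are not taken from the paper.

```python
from typing import Optional


def conv4d_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights of one full 4D convolution with a k x k x k x k kernel."""
    return c_in * c_out * k ** 4


def sas_weights(c_in: int, c_out: int, k: int, c_mid: Optional[int] = None) -> int:
    """Weights of a k x k spatial convolution followed by a k x k angular one."""
    c_mid = c_out if c_mid is None else c_mid
    spatial = c_in * c_mid * k ** 2
    angular = c_mid * c_out * k ** 2
    return spatial + angular


# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
full = conv4d_weights(64, 64, 3)
separable = sas_weights(64, 64, 3)
ratio = full / separable
```

Under these assumptions the separable layer uses 4.5 times fewer weights than the 4D layer. This ratio is a parameter count only and should not be confused with the threefold speed-up reported in the abstract.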


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This work covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Proceedings Article
01 Jan 1994
TL;DR: The main focus in MUCKE is on cleaning large scale Web image corpora and on proposing image representations which are closer to the human interpretation of images.
Abstract: MUCKE aims to mine a large volume of images, to structure them conceptually and to use this conceptual structuring in order to improve large-scale image retrieval. The last decade witnessed important progress concerning low-level image representations. However, there are a number of problems which need to be solved in order to unleash the full potential of image mining in applications. The central problem with low-level representations is the mismatch between them and the human interpretation of image content. This problem can be instantiated, for instance, by the incapability of existing descriptors to capture spatial relationships between the concepts represented or by their incapability to convey an explanation of why two images are similar in a content-based image retrieval framework. We start by assessing existing local descriptors for image classification and by proposing to use co-occurrence matrices to better capture spatial relationships in images. The main focus in MUCKE is on cleaning large-scale Web image corpora and on proposing image representations which are closer to the human interpretation of images. Consequently, we introduce methods which tackle these two problems and compare results to state-of-the-art methods. Note: some aspects of this deliverable are withheld at this time as they are pending review. Please contact the authors for a preview.

2,134 citations

Reference Entry
15 Oct 2004

2,118 citations