Author

Chunjie Zhang

Bio: Chunjie Zhang is an academic researcher from Beijing Jiaotong University. The author has contributed to research on topics including Contextual image classification and Feature (computer vision). The author has an h-index of 22 and has co-authored 94 publications receiving 1,518 citations. Previous affiliations of Chunjie Zhang include Sun Yat-sen University and the Chinese Academy of Sciences.


Papers
Proceedings ArticleDOI
20 Jun 2011
TL;DR: An image classification framework by leveraging the non-negative sparse coding, low-rank and sparse matrix decomposition techniques (LR-Sc+SPM), which achieves or outperforms the state-of-the-art results on several benchmarks.
Abstract: We propose an image classification framework by leveraging the non-negative sparse coding, low-rank and sparse matrix decomposition techniques (LR-Sc+SPM). First, we propose a new non-negative sparse coding along with max pooling and spatial pyramid matching method (Sc+SPM) to extract local features' information in order to represent images, where non-negative sparse coding is used to encode local features. Max pooling along with spatial pyramid matching (SPM) is then utilized to get the feature vectors to represent images. Second, motivated by the observation that images of the same class often contain correlated (or common) items and specific (or noisy) items, we propose to leverage the low-rank and sparse matrix recovery technique to decompose the feature vectors of images per class into a low-rank matrix and a sparse error matrix. To incorporate the common and specific attributes into the image representation, we still adopt the idea of sparse coding to recode the Sc+SPM representation of each image. In particular, we collect the columns of both matrices as the bases and use the coding parameters as the updated image representation by learning them through locality-constrained linear coding (LLC). Finally, a linear SVM classifier is leveraged for the final classification. Experimental results show that the proposed method achieves or outperforms the state-of-the-art results on several benchmarks.
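The per-class low-rank plus sparse decomposition described above can be realized with robust PCA. The following is a minimal sketch, not the authors' implementation: it recovers a low-rank matrix L and a sparse error matrix S from a matrix X whose columns are the Sc+SPM vectors of one class, using standard inexact augmented Lagrangian iterations; the defaults for lam and mu are common heuristics assumed here.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Elementwise soft thresholding: prox of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(X, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Decompose X into low-rank L plus sparse S (inexact ALM for robust PCA)."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / (np.abs(X).sum() + 1e-12)
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    norm_x = np.linalg.norm(X, "fro") + 1e-12
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)          # low-rank update
        S = shrink(X - L + Y / mu, lam / mu)       # sparse update
        R = X - L - S                              # residual
        Y = Y + mu * R                             # dual update
        if np.linalg.norm(R, "fro") / norm_x < tol:
            break
    return L, S

# Columns of L and S would then be collected as bases for the LLC recoding step.
```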

181 citations

Journal ArticleDOI
TL;DR: This work learns a cross-modality bridging dictionary for the deep and complete understanding of a vast quantity of web images and proposes a knowledge-based concept transferring algorithm to discover the underlying relations of different categories.
Abstract: The understanding of web images has been a hot research topic in both the artificial intelligence and multimedia content analysis domains. Web images are composed of various complex foregrounds and backgrounds, which makes the design of an accurate and robust learning algorithm a challenging task. To address this problem, we first learn a cross-modality bridging dictionary for the deep and complete understanding of a vast quantity of web images. The proposed algorithm projects the visual features into a semantic concept probability distribution, which can construct a global semantic description for images while preserving the local geometric structure. To discover and model the occurrence patterns within and between categories, multi-task learning is introduced to formulate the objective function with a Capped-$\ell_{1}$ penalty, which can obtain the optimal solution with a higher probability and outperform traditional convex function-based methods. Second, we propose a knowledge-based concept transferring algorithm to discover the underlying relations of different categories. This transfer of distribution probabilities among categories brings a more robust global feature representation and enables the image semantic representation to generalize better as the scenario becomes larger. Experimental comparisons and performance discussion with classical methods on the ImageNet, Caltech-256, SUN397, and Scene15 datasets show the effectiveness of our proposed method on three traditional image understanding tasks.
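The Capped-$\ell_{1}$ penalty mentioned above bounds the per-coefficient penalty at a cap, so large coefficients are not over-penalized. Below is a minimal single-task sketch of the idea on a least-squares problem, not the paper's multi-task formulation: it uses a simple reweighting heuristic in the spirit of majorize-minimize, and the values of lam and theta are arbitrary choices for illustration.

```python
import numpy as np

def soft_threshold(w, tau):
    """Elementwise prox of a (weighted) l1 penalty."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def capped_l1_value(w, lam, theta):
    """Capped-l1 penalty: lam * sum_i min(|w_i|, theta)."""
    return lam * np.minimum(np.abs(w), theta).sum()

def capped_l1_least_squares(X, y, lam=0.1, theta=0.5, n_outer=10, n_inner=200):
    """Approximately minimize 0.5*||Xw - y||^2 + lam * sum_i min(|w_i|, theta):
    coordinates already above the cap are left unpenalized in each outer round,
    the rest are handled with proximal gradient (soft thresholding)."""
    n, d = X.shape
    w = np.zeros(d)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)   # 1 / Lipschitz constant
    for _ in range(n_outer):
        active = (np.abs(w) < theta).astype(float)      # still-penalized coordinates
        for _ in range(n_inner):
            grad = X.T @ (X @ w - y)
            w = soft_threshold(w - step * grad, step * lam * active)
        # capped_l1_value(w, lam, theta) can be monitored here
    return w
```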

169 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: A simple yet effective Two-Step Quantization (TSQ) framework is proposed, decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes, with a sparse quantization method proposed for code learning.
Abstract: Every bit matters in the hardware design of quantized neural networks. However, extremely-low-bit representation usually causes a large accuracy drop. Thus, how to train extremely-low-bit neural networks with high accuracy is of central importance. Most existing network quantization approaches learn transformations (low-bit weights) as well as encodings (low-bit activations) simultaneously. This tight coupling makes the optimization problem difficult, and thus prevents the network from learning optimal representations. In this paper, we propose a simple yet effective Two-Step Quantization (TSQ) framework, by decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes. For the first step, we propose the sparse quantization method for code learning. The second step can be formulated as a non-linear least square regression problem with low-bit constraints, which can be solved efficiently in an iterative manner. Extensive experiments on the CIFAR-10 and ILSVRC-12 datasets demonstrate that the proposed TSQ is effective and outperforms the state-of-the-art by a large margin. In particular, for 2-bit activation and ternary weight quantization of AlexNet, the accuracy of our TSQ drops only about 0.5 points compared with the full-precision counterpart, outperforming the current state-of-the-art by more than 5 points.
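As a rough illustration of the two decoupled steps, the sketch below first applies a sparse low-bit quantizer to activations (step 1: code learning) and then fits a ternary weight vector with a floating-point scale to reproduce the learned codes (step 2: transformation learning). The thresholding heuristics (a percentile-based sparsity threshold, and the 0.7-times-mean rule borrowed from Ternary Weight Networks) are stand-ins, not the optimization procedure of the paper.

```python
import numpy as np

def sparse_quantize_activations(a, n_levels=3, sparsity=0.5):
    """Step 1 (sketch): zero out the smallest activations, then uniformly
    quantize the survivors to n_levels non-negative levels (2-bit style)."""
    a = np.maximum(a, 0.0)                            # assume post-ReLU activations
    thr = np.quantile(a, sparsity)                    # sparsity threshold (assumed heuristic)
    kept = np.where(a > thr, a, 0.0)
    scale = kept.max() / n_levels if kept.max() > 0 else 1.0
    codes = np.round(kept / scale)                    # integer codes in {0, ..., n_levels}
    return codes * scale                              # dequantized low-bit activations

def fit_ternary_weights(X, z, delta_ratio=0.7):
    """Step 2 (sketch): given inputs X (n, d) and learned codes z (n,), pick a
    ternary weight pattern by thresholding a least-squares seed and fit the
    floating-point scale alpha in closed form for that fixed pattern."""
    w0, *_ = np.linalg.lstsq(X, z, rcond=None)        # full-precision seed
    delta = delta_ratio * np.mean(np.abs(w0))         # TWN-style threshold (assumption)
    t = np.where(w0 > delta, 1.0, np.where(w0 < -delta, -1.0, 0.0))
    Xt = X @ t
    alpha = (z @ Xt) / (Xt @ Xt + 1e-12)              # optimal scale for this pattern
    return alpha, t

# Example: quantize activations of a fake layer and refit one ternary "neuron".
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
w_true = rng.standard_normal(64)
z = sparse_quantize_activations(X @ w_true)
alpha, t = fit_ternary_weights(X, z)
```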

130 citations

Journal ArticleDOI
TL;DR: A novel Parallel Down-up Fusion network (PDF-Net) for salient object detection (SOD) in optical remote sensing images (RSIs), which takes full advantage of the in-path low- and high-level features and cross-path multi-resolution features to distinguish diversely scaled salient objects and suppress the cluttered backgrounds.

73 citations

Journal ArticleDOI
TL;DR: This paper proposes not to directly use the activations of fully connected layers of a 3-D CNN as the video feature, but to use selective convolutional layer activations to form a discriminative descriptor for video.
Abstract: 3-D convolutional neural networks (3-D CNNs) have been established as a powerful tool to simultaneously learn features from both spatial and temporal dimensions, which makes them well suited to video-based action recognition. In this paper, we propose not to directly use the activations of the fully connected layers of a 3-D CNN as the video feature, but to use selective convolutional layer activations to form a discriminative descriptor for a video. The features on the convolutional layers are pooled under the guidance of body joint positions. Two schemes of mapping body joints into convolutional feature maps for pooling are discussed. The body joint positions can be obtained from any off-the-shelf skeleton estimation algorithm. The usefulness of the body joint guided feature pooling under inaccurate skeleton estimation is systematically evaluated. To make the model end-to-end and avoid relying on any sophisticated body joint detection algorithm, we further propose a two-stream bilinear model which can learn the guidance from the body joints and capture the spatio-temporal features simultaneously. In this model, the body joint guided feature pooling is conveniently formulated as a bilinear product operation. Experimental results on three real-world datasets demonstrate the effectiveness of body joint guided pooling, which achieves promising performance.
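One way to map body joints into a convolutional feature map is to send each joint's pixel coordinates to the nearest cell of the map and gather the feature vector there; the sketch below illustrates that nearest-neighbor variant as an assumption, not necessarily either of the paper's two schemes.

```python
import numpy as np

def joint_guided_pool(feature_map, joints_xy, frame_size, aggregate="max"):
    """Pool a conv feature map at body joint locations.

    feature_map : (C, H, W) activations of one convolutional layer (one frame)
    joints_xy   : (J, 2) joint coordinates (x, y) in original frame pixels
    frame_size  : (frame_w, frame_h) of the input video frame
    aggregate   : "max" pools over joints, anything else concatenates per-joint vectors
    """
    C, H, W = feature_map.shape
    frame_w, frame_h = frame_size
    per_joint = []
    for x, y in joints_xy:
        # nearest-neighbor mapping from frame pixels to feature-map cells
        fx = min(int(round(x / frame_w * (W - 1))), W - 1)
        fy = min(int(round(y / frame_h * (H - 1))), H - 1)
        per_joint.append(feature_map[:, fy, fx])
    per_joint = np.stack(per_joint)                 # (J, C)
    if aggregate == "max":
        return per_joint.max(axis=0)                # (C,)
    return per_joint.reshape(-1)                    # (J * C,)

# Example: 256-channel 14x14 map, 15 joints detected on a 224x224 frame.
fmap = np.random.rand(256, 14, 14)
joints = np.random.rand(15, 2) * 224
desc = joint_guided_pool(fmap, joints, frame_size=(224, 224))
```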

67 citations


Cited by
Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known about mud-dominated calciclastic submarine fan systems in particular. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveal a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density densite mudstones, which are characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

01 Jan 2006

3,012 citations

Posted Content
TL;DR: The history of person re-identification and its relationship with image classification and instance retrieval is introduced and two new re-ID tasks which are much closer to real-world applications are described and discussed.
Abstract: Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance. It aims at spotting a person of interest in other cameras. In the early days, hand-crafted algorithms and small-scale evaluation were predominantly reported. Recent years have witnessed the emergence of large-scale datasets and deep learning systems which make use of large data volumes. Considering different tasks, we classify most current re-ID methods into two classes, i.e., image-based and video-based; in both tasks, hand-crafted and deep learning systems will be reviewed. Moreover, two new re-ID tasks which are much closer to real-world applications are described and discussed, i.e., end-to-end re-ID and fast re-ID in very large galleries. This paper: 1) introduces the history of person re-ID and its relationship with image classification and instance retrieval; 2) surveys a broad selection of the hand-crafted systems and the large-scale methods in both image- and video-based re-ID; 3) describes critical future directions in end-to-end re-ID and fast retrieval in large galleries; and 4) finally briefs some important yet under-developed issues.

984 citations

Journal ArticleDOI
TL;DR: A comprehensive overview of sparse representation is provided, and an experimental comparative study of the surveyed sparse representation algorithms is presented, which sufficiently reveals the potential of sparse representation theory.
Abstract: Sparse representation has attracted much attention from researchers in the fields of signal processing, image processing, computer vision, and pattern recognition. Sparse representation also has a good reputation in both theoretical research and practical applications. Many different algorithms have been proposed for sparse representation. The main purpose of this paper is to provide a comprehensive study and an updated review of sparse representation and to supply guidance for researchers. The taxonomy of sparse representation methods can be studied from various viewpoints. For example, in terms of the different norm minimizations used in the sparsity constraints, the methods can be roughly categorized into five groups: 1) sparse representation with $l_{0}$-norm minimization; 2) sparse representation with $l_{p}$-norm ($0 < p < 1$) minimization; 3) sparse representation with $l_{1}$-norm minimization; 4) sparse representation with $l_{2,1}$-norm minimization; and 5) sparse representation with $l_{2}$-norm minimization. In this paper, a comprehensive overview of sparse representation is provided. The available sparse representation algorithms can also be empirically categorized into four groups: 1) greedy strategy approximation; 2) constrained optimization; 3) proximity algorithm-based optimization; and 4) homotopy algorithm-based sparse representation. The rationales of the different algorithms in each category are analyzed and a wide range of sparse representation applications are summarized, which could sufficiently reveal the potential of sparse representation theory. In particular, an experimental comparative study of these sparse representation algorithms is presented.
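As a concrete instance of the "proximity algorithm-based optimization" group for $l_{1}$-norm minimization, the sketch below runs plain ISTA (iterative shrinkage-thresholding) to find a sparse code of a signal over a given dictionary; the dictionary, signal, and regularization weight are made up purely for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(D, y, lam=0.1, n_iter=500):
    """Solve min_x 0.5*||D x - y||^2 + lam*||x||_1 by iterative shrinkage-thresholding."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)   # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * D.T @ (D @ x - y), step * lam)
    return x

# Example: recover a sparse code over a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
x_true = np.zeros(256)
x_true[rng.choice(256, 5, replace=False)] = rng.standard_normal(5)
y = D @ x_true
x_hat = ista(D, y, lam=0.05)
```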

925 citations

Journal ArticleDOI
TL;DR: A scalable distance-driven feature learning framework based on the deep neural network for person re-identification that achieves very promising results and outperforms other state-of-the-art approaches.

748 citations