scispace - formally typeset
Search or ask a question
Author

Peixi Peng

Other affiliations: Chinese Academy of Sciences
Bio: Peixi Peng is an academic researcher from Peking University. The author has contributed to research in topics: Computer science & Discriminative model. The author has an hindex of 10, co-authored 33 publications receiving 689 citations. Previous affiliations of Peixi Peng include Chinese Academy of Sciences.

Papers
More filters
Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents an multi-task dictionary learning method which is able to learn a dataset-shared but target-data-biased representation, and demonstrates that the method significantly outperforms the state-of-the-art.
Abstract: Most existing person re-identification (Re-ID) approaches follow a supervised learning framework, in which a large number of labelled matching pairs are required for training. This severely limits their scalability in realworld applications. To overcome this limitation, we develop a novel cross-dataset transfer learning approach to learn a discriminative representation. It is unsupervised in the sense that the target dataset is completely unlabelled. Specifically, we present an multi-task dictionary learning method which is able to learn a dataset-shared but targetdata-biased representation. Experimental results on five benchmark datasets demonstrate that the method significantly outperforms the state-of-the-art.

408 citations

Journal ArticleDOI
TL;DR: A novel learning framework, termed Adversarial Reciprocal Point Learning (ARPL), is proposed to minimize the overlap of known distribution and unknown distributions without loss of known classification accuracy and extensive experimental results indicate that the proposed method is significantly superior to other existing approaches and achieves state-of-the-art performance.
Abstract: Open set recognition (OSR), aiming to simultaneously classify the seen classes and identify the unseen classes as unknown, is essential for reliable machine learning. The key challenge of OSR is how to reduce the empirical classification risk on the labeled known data and the open space risk on the potential unknown data simultaneously. To handle the challenge, we formulate the open space risk problem from the perspective of multi-class integration, and model the unexploited extra-class space with a novel concept Reciprocal Point. Follow this, a novel Adversarial Reciprocal Point Learning framework is proposed to minimize the overlap of known distribution and unknown distributions without loss of known classification accuracy. Specifically, each reciprocal point is learned by the extra-class space with the corresponding known category, and the confrontation among multiple known categories are employed to reduce the empirical classification risk. An adversarial margin constraint is proposed to reduce the open space risk by limiting the latent open space constructed by reciprocal points. Moreover, an instantiated adversarial enhancement method is designed to generate diverse and confusing training samples. Extensive experimental results on various benchmark datasets indicate that the proposed method is significantly superior to existing approaches and achieves state-of-the-art performance.

107 citations

Journal ArticleDOI
TL;DR: It is argued that learning the latent attributes jointly with user-defined semantic attributes not only leads to better representation but also helps semantic attribute prediction.
Abstract: A number of vision problems such as zero-shot learning and person re-identification can be considered as cross-class transfer learning problems. As mid-level semantic properties shared cross different object classes, attributes have been studied extensively for knowledge transfer across classes. Most previous attribute learning methods focus only on human-defined/nameable semantic attributes, whilst ignoring the fact there also exist undefined/latent shareable visual properties, or latent attributes. These latent attributes can be either discriminative or non-discriminative parts depending on whether they can contribute to an object recognition task. In this work, we argue that learning the latent attributes jointly with user-defined semantic attributes not only leads to better representation but also helps semantic attribute prediction. A novel dictionary learning model is proposed which decomposes the dictionary space into three parts corresponding to semantic, latent discriminative and latent background attributes respectively. Such a joint attribute learning model is then extended by following a multi-task transfer learning framework to address a more challenging unsupervised domain adaptation problem, where annotations are only available on an auxiliary dataset and the target dataset is completely unlabelled. Extensive experiments show that the proposed models, though being linear and thus extremely efficient to compute, produce state-of-the-art results on both zero-shot learning and person re-identification.

95 citations

Book ChapterDOI
23 Aug 2020
TL;DR: A new concept, Reciprocal Point, which is the potential representation of the extra-class space corresponding to each known category, which can indirectly introduce the unknown information into the learner with only known classes, so as to learn more compact and discriminative representations.
Abstract: Open set recognition is an emerging research area that aims to simultaneously classify samples from predefined classes and identify the rest as ‘unknown’. In this process, one of the key challenges is to reduce the risk of generalizing the inherent characteristics of numerous unknown samples learned from a small amount of known data. In this paper, we propose a new concept, Reciprocal Point, which is the potential representation of the extra-class space corresponding to each known category. The sample can be classified to known or unknown by the otherness with reciprocal points. To tackle the open set problem, we offer a novel open space risk regularization term. Based on the bounded space constructed by reciprocal points, the risk of unknown is reduced through multi-category interaction. The novel learning framework called Reciprocal Point Learning (RPL), which can indirectly introduce the unknown information into the learner with only known classes, so as to learn more compact and discriminative representations. Moreover, we further construct a new large-scale challenging aircraft dataset for open set recognition: Aircraft 300 (Air-300). Extensive experiments on multiple benchmark datasets indicate that our framework is significantly superior to other existing approaches and achieves state-of-the-art performance on standard open set benchmarks.

82 citations

Proceedings ArticleDOI
Ding Feifei1, Peixi Peng1, Yangru Huang1, Mengyue Geng1, Yonghong Tian1 
12 Oct 2020
TL;DR: A novel latent part detection (LPD) model is proposed to locate the latent facial part which is robust to mask wearing, and the latent part is further used to extract discriminative features.
Abstract: This paper focuses on a novel task named masked faces recognition (MFR), which aims to match masked faces with common faces and is important especially during the global outbreak of COVID-19. It is challenging to identify masked faces for two main reasons. Firstly, there is no large-scale training data and test data with ground truth for MFR. Collecting and annotating millions of masked faces is labor-consuming. Secondly, since most facial cues are occluded by mask, it is necessary to learn representations which are both discriminative and robust to mask wearing. To handle the first challenge, this paper collects two datasets designed for MFR: MFV with 400 pairs of 200 identities for verification, and MFI which contains 4,916 images of 669 identities for identification. As is known, a robust face recognition model needs images of millions of identities to train, and hundreds of identities is far from enough. Hence, MFV and MFI are only considered as test datasets to evaluate algorithms. Besides, a data augmentation method for training data is introduced to automatically generate synthetic masked face images from existing common face datasets. In addition, a novel latent part detection (LPD) model is proposed to locate the latent facial part which is robust to mask wearing, and the latent part is further used to extract discriminative features. The proposed LPD model is trained in an end-to-end manner and only utilizes the original and synthetic training data. Experimental results on MFV, MFI and synthetic masked LFW demonstrate that LPD model generalizes well on both realistic and synthetic masked data and outperforms other methods by a large margin.

61 citations


Cited by
More filters
Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work proves the core reason Siamese trackers still have accuracy gap comes from the lack of strict translation invariance, and proposes a new model architecture to perform depth-wise and layer-wise aggregations, which not only improves the accuracy but also reduces the model size.
Abstract: Siamese network based trackers formulate tracking as convolutional feature cross-correlation between target template and searching region. However, Siamese trackers still have accuracy gap compared with state-of-the-art algorithms and they cannot take advantage of feature from deep networks, such as ResNet-50 or deeper. In this work we prove the core reason comes from the lack of strict translation invariance. By comprehensive theoretical analysis and experimental validations, we break this restriction through a simple yet effective spatial aware sampling strategy and successfully train a ResNet-driven Siamese tracker with significant performance gain. Moreover, we propose a new model architecture to perform depth-wise and layer-wise aggregations, which not only further improves the accuracy but also reduces the model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which obtains currently the best results on four large tracking benchmarks, including OTB2015, VOT2018, UAV123, and LaSOT. Our model will be released to facilitate further studies based on this problem.

1,478 citations

Posted Content
TL;DR: The history of person re-identification and its relationship with image classification and instance retrieval is introduced and two new re-ID tasks which are much closer to real-world applications are described and discussed.
Abstract: Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance. It aims at spotting a person of interest in other cameras. In the early days, hand-crafted algorithms and small-scale evaluation were predominantly reported. Recent years have witnessed the emergence of large-scale datasets and deep learning systems which make use of large data volumes. Considering different tasks, we classify most current re-ID methods into two classes, i.e., image-based and video-based; in both tasks, hand-crafted and deep learning systems will be reviewed. Moreover, two new re-ID tasks which are much closer to real-world applications are described and discussed, i.e., end-to-end re-ID and fast re-ID in very large galleries. This paper: 1) introduces the history of person re-ID and its relationship with image classification and instance retrieval; 2) surveys a broad selection of the hand-crafted systems and the large-scale methods in both image- and video-based re-ID; 3) describes critical future directions in end-to-end re-ID and fast retrieval in large galleries; and 4) finally briefs some important yet under-developed issues.

984 citations

Journal ArticleDOI
TL;DR: An attribute-person recognition (APR) network is proposed, a multi-task network which learns a re-ID embedding and at the same time predicts pedestrian attributes, and demonstrates that by learning a more discriminative representation, APR achieves competitive re-IDs performance compared with the state-of-the-art methods.

762 citations

Proceedings ArticleDOI
25 Sep 2017
TL;DR: Zhang et al. as mentioned in this paper proposed a Pose-driven Deep Convolutional (PDC) model to learn improved feature extraction and matching models from end to end, which explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts.
Abstract: Feature extraction and matching are two crucial components in person Re-Identification (ReID). The large pose deformations and the complex view variations exhibited by the captured person images significantly increase the difficulty of learning and matching of the features from person images. To overcome these difficulties, in this work we propose a Pose-driven Deep Convolutional (PDC) model to learn improved feature extraction and matching models from end to end. Our deep architecture explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts. To match the features from global human body and local body parts, a pose driven feature weighting sub-network is further designed to learn adaptive feature fusions. Extensive experimental analyses and results on three popular datasets demonstrate significant performance improvements of our model over all published state-of-the-art methods.

655 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a simple yet effective human part-aligned representation for handling the body part misalignment problem, and shows state-of-the-art results over standard datasets, Market-1501,CUHK03, CUHK01 and VIPeR.
Abstract: In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras. We propose a simple yet effective human part-aligned representation for handling the body part misalignment problem. Our approach decomposes the human body into regions (parts) which are discriminative for person matching, accordingly computes the representations over the regions, and aggregates the similarities computed between the corresponding regions of a pair of probe and gallery images as the overall matching score. Our formulation, inspired by attention models, is a deep neural network modeling the three steps together, which is learnt through minimizing the triplet loss function without requiring body part labeling information. Unlike most existing deep learning algorithms that learn a global or spatial partition-based local representation, our approach performs human body partition, and thus is more robust to pose changes and various human spatial distributions in the person bounding box. Our approach shows state-of-the-art results over standard datasets, Market-1501, CUHK03, CUHK01 and VIPeR. 1

653 citations