Proceedings ArticleDOI

Volume structured ordinal features with background similarity measure for video face recognition

TL;DR: The proposed method not only encodes jointly the local spatial and temporal information, but also extracts the most discriminative facial dynamic information while trying to discard spatio-temporal features related to intra-personal variations.
Abstract: Several studies have shown the benefits of using spatio-temporal information for video face recognition. However, most existing spatio-temporal representations do not capture the local discriminative information present in human faces. In this paper we introduce a new local spatio-temporal descriptor, based on structured ordinal features, for video face recognition. The proposed method not only jointly encodes the local spatial and temporal information, but also extracts the most discriminative facial dynamic information while discarding spatio-temporal features related to intra-personal variations. In addition, a similarity measure based on a set of background samples is proposed for use with our descriptor and is shown to boost its performance. Extensive experiments conducted on the recent and challenging YouTube Faces database demonstrate the good performance of our proposal, achieving state-of-the-art results.
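The descriptor described above builds on ordinal comparisons of region averages over a video volume. As a rough illustration of the underlying idea (not the authors' exact VSOF operator; the block size, radius and neighbourhood layout here are arbitrary assumptions), an 8-bit ordinal code for one space-time location could be computed as:

```python
import numpy as np

def ordinal_volume_code(volume, t, y, x, block=2, radius=3):
    """Toy ordinal code for one voxel location: compare the mean of a
    central space-time block against 8 neighbouring blocks placed at
    `radius` in the spatial plane. Returns an 8-bit integer code.
    (Illustrative only -- the paper's VSOF operator is more elaborate.)"""
    def block_mean(tc, yc, xc):
        return volume[tc:tc+block, yc:yc+block, xc:xc+block].mean()
    center = block_mean(t, y, x)
    offsets = [(-radius, -radius), (-radius, 0), (-radius, radius), (0, radius),
               (radius, radius), (radius, 0), (radius, -radius), (0, -radius)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if block_mean(t, y + dy, x + dx) > center:   # ordinal comparison
            code |= 1 << bit
    return code
```

Because only the ordering of block means matters, not their absolute values, such codes are robust to monotonic illumination changes.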
Citations
Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work revisits both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network.
Abstract: In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to date, an identity-labeled dataset of four million facial images belonging to more than 4,000 identities. The learned representations, coupling the accurate model-based alignment with the large facial database, generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.
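The 27% figure in the abstract above is a relative error reduction, which is easy to check. A small sketch; the prior accuracy of 96.33% used below is a hypothetical figure chosen purely for illustration, not taken from this page:

```python
def relative_error_reduction(acc_new, acc_old):
    """Relative reduction in classification error when accuracy
    improves from acc_old to acc_new (both given in percent)."""
    err_new, err_old = 100.0 - acc_new, 100.0 - acc_old
    return (err_old - err_new) / err_old

# DeepFace reports 97.35% on LFW; with a hypothetical prior state of
# the art of 96.33%, the error drops by roughly 28% -- consistent with
# the "more than 27%" claim.
reduction = relative_error_reduction(97.35, 96.33)
```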

6,132 citations


Additional excerpts

  • ...VSOF+OSS [23] 79....


Proceedings ArticleDOI
23 Jun 2014
TL;DR: The proposed DDML trains a deep neural network which learns a set of hierarchical nonlinear transformations to project face pairs into the same feature subspace, under which the distance of each positive face pair is less than a smaller threshold and that of each negative pair is higher than a larger threshold.
Abstract: This paper presents a new discriminative deep metric learning (DDML) method for face verification in the wild. Different from existing metric learning-based face verification methods which aim to learn a Mahalanobis distance metric to maximize the inter-class variations and minimize the intra-class variations, simultaneously, the proposed DDML trains a deep neural network which learns a set of hierarchical nonlinear transformations to project face pairs into the same feature subspace, under which the distance of each positive face pair is less than a smaller threshold and that of each negative pair is higher than a larger threshold, respectively, so that discriminative information can be exploited in the deep network. Our method achieves very competitive face verification performance on the widely used LFW and YouTube Faces (YTF) datasets.
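The dual-threshold constraint described in the abstract (positive pairs closer than a smaller threshold, negative pairs farther than a larger one) can be expressed as a hinge-style penalty on pair distances. A minimal sketch, assuming distances have already been computed in the learned feature space; the threshold values and the exact loss form are assumptions, not the paper's formulation:

```python
import numpy as np

def pairwise_hinge_loss(distances, is_positive, t_small=1.0, t_large=2.0):
    """Hinge-style penalty enforcing the constraint DDML describes:
    each positive pair's distance should fall below t_small, and each
    negative pair's distance should exceed t_large (t_small < t_large).
    A sketch of the constraint, not the paper's exact objective."""
    d = np.asarray(distances, dtype=float)
    pos = np.asarray(is_positive, dtype=bool)
    loss_pos = np.maximum(0.0, d - t_small)[pos].sum()   # pull positives in
    loss_neg = np.maximum(0.0, t_large - d)[~pos].sum()  # push negatives out
    return loss_pos + loss_neg
```

Pairs already satisfying their threshold contribute zero, so only violating pairs drive the network's updates.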

730 citations


Cites methods from "Volume structured ordinal features ..."

  • ...These compared methods include Matched Background Similarity (MBGS) [34], APEM [21], STFRD+PMML [6], MBGS+SVM [37], VSOF+OSS (Adaboost) [24], and PHL+SILD [16]....


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This NAN is trained with a standard classification or verification loss without any extra supervision signal, and it is found that it automatically learns to advocate high-quality face images while repelling low-quality ones such as blurred, occluded and improperly exposed faces.
Abstract: This paper presents a Neural Aggregation Network (NAN) for video face recognition. The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition. The whole network is composed of two modules. The feature embedding module is a deep Convolutional Neural Network (CNN) which maps each face image to a feature vector. The aggregation module consists of two attention blocks which adaptively aggregate the feature vectors to form a single feature inside the convex hull spanned by them. Due to the attention mechanism, the aggregation is invariant to the image order. Our NAN is trained with a standard classification or verification loss without any extra supervision signal, and we found that it automatically learns to advocate high-quality face images while repelling low-quality ones such as blurred, occluded and improperly exposed faces. The experiments on the IJB-A, YouTube Faces and Celebrity-1000 video face recognition benchmarks show that it consistently outperforms naive aggregation methods and achieves state-of-the-art accuracy.
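The attention-based aggregation described above (order-invariant weights forming a point in the convex hull of the input features) amounts to a softmax-weighted average. A minimal single-query sketch; the real NAN stacks two attention blocks with learned parameters:

```python
import numpy as np

def attention_aggregate(features, q):
    """Aggregate a variable-length set of per-frame feature vectors into
    one vector via softmax attention. The weights are positive and sum
    to one, so the result lies in the convex hull of the inputs and is
    invariant to frame order. `q` stands in for one attention block's
    learned query; a simplified sketch of the NAN aggregation module.
    features: (n, d) array, q: (d,) array."""
    scores = features @ q                  # one relevance score per frame
    w = np.exp(scores - scores.max())      # numerically stable softmax
    w /= w.sum()
    return w @ features                    # convex combination of frames
```

With a zero query the weights are uniform and the result is simply the mean of the frames, i.e. naive average pooling is a special case.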

323 citations


Cites background from "Volume structured ordinal features ..."

  • ...Video face recognition has caught more and more attention from the community in recent years [29, 17, 30, 5, 20, 18, 19, 21, 10, 26, 23]....


Posted Content
TL;DR: This preprint presents the Neural Aggregation Network (NAN) for video face recognition, whose aggregation module consists of two attention blocks that adaptively aggregate per-frame feature vectors into a single feature inside the convex hull spanned by them.
Abstract: This paper presents a Neural Aggregation Network (NAN) for video face recognition. The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition. The whole network is composed of two modules. The feature embedding module is a deep Convolutional Neural Network (CNN) which maps each face image to a feature vector. The aggregation module consists of two attention blocks which adaptively aggregate the feature vectors to form a single feature inside the convex hull spanned by them. Due to the attention mechanism, the aggregation is invariant to the image order. Our NAN is trained with a standard classification or verification loss without any extra supervision signal, and we found that it automatically learns to advocate high-quality face images while repelling low-quality ones such as blurred, occluded and improperly exposed faces. The experiments on IJB-A, YouTube Face, Celebrity-1000 video face recognition benchmarks show that it consistently outperforms naive aggregation methods and achieves the state-of-the-art accuracy.

291 citations

Journal ArticleDOI
TL;DR: A discriminative deep multi-metric learning method to jointly learn multiple neural networks, under which the correlation of different features of each sample is maximized, and the distance of each positive pair is reduced and that of each negative pair is enlarged.
Abstract: This paper presents a new discriminative deep metric learning (DDML) method for face and kinship verification in wild conditions. While metric learning has achieved reasonably good performance in face and kinship verification, most existing metric learning methods aim to learn a single Mahalanobis distance metric to maximize the inter-class variations and minimize the intra-class variations, which cannot capture the nonlinear manifold on which face images usually lie. To address this, we propose a DDML method to train a deep neural network to learn a set of hierarchical nonlinear transformations to project face pairs into the same latent feature space, under which the distance of each positive pair is reduced and that of each negative pair is enlarged. To better use the commonality of multiple feature descriptors to make all the features more robust for face and kinship verification, we develop a discriminative deep multi-metric learning method to jointly learn multiple neural networks, under which the correlation of different features of each sample is maximized, and the distance of each positive pair is reduced and that of each negative pair is enlarged. Extensive experimental results show that our proposed methods achieve acceptable results in both face and kinship verification.

264 citations

References
Proceedings ArticleDOI
07 Sep 2009
TL;DR: It is demonstrated that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings, and that the ranking of most methods is consistent across different datasets.
Abstract: Local space-time features have recently become a popular video representation for action recognition. Several methods for feature localization and description have been proposed in the literature and promising recognition results were demonstrated for a number of action classes. The comparison of existing methods, however, is often limited given the different experimental settings used. The purpose of this paper is to evaluate and compare previously proposed space-time features in a common experimental setup. In particular, we consider four different feature detectors and six local feature descriptors and use a standard bag-of-features SVM approach for action recognition. We investigate the performance of these methods on a total of 25 action classes distributed over three datasets with varying difficulty. Among interesting conclusions, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. We also demonstrate a consistent ranking for the majority of methods over different datasets and discuss their advantages and limitations.
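The dense regular sampling that this evaluation found to outperform interest-point detectors simply extracts space-time cuboids on a fixed grid. A minimal sketch; cuboid size and stride are illustrative assumptions, not values from the paper:

```python
import numpy as np

def dense_cuboids(video, size=(8, 16, 16), stride=(4, 8, 8)):
    """Regularly sample space-time cuboids from a (T, H, W) video --
    the dense-sampling baseline, as opposed to detector-driven
    interest points. Returns one flattened cuboid per row."""
    T, H, W = video.shape
    st, sy, sx = size
    dt, dy, dx = stride
    cubes = [video[t:t+st, y:y+sy, x:x+sx].ravel()
             for t in range(0, T - st + 1, dt)
             for y in range(0, H - sy + 1, dy)
             for x in range(0, W - sx + 1, dx)]
    return np.stack(cubes)
```

Each cuboid would then be turned into a local descriptor (HOG/HOF etc.) and quantized in a bag-of-features pipeline.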

1,485 citations


"Volume structured ordinal features ..." refers background in this paper

  • ...Local spatio-temporal descriptors have become very popular for human action recognition in videos [17]....


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A comprehensive database of labeled videos of faces in challenging, uncontrolled conditions, the ‘YouTube Faces’ database, is presented along with benchmark pair-matching tests, and a novel set-to-set similarity measure, the Matched Background Similarity (MBGS), is described.
Abstract: Recognizing faces in unconstrained videos is a task of mounting importance. While obviously related to face recognition in still images, it has its own unique characteristics and algorithmic requirements. Over the years several methods have been suggested for this problem, and a few benchmark data sets have been assembled to facilitate its study. However, there is a sizable gap between the actual application needs and the current state of the art. In this paper we make the following contributions. (a) We present a comprehensive database of labeled videos of faces in challenging, uncontrolled conditions (i.e., ‘in the wild’), the ‘YouTube Faces’ database, along with benchmark, pair-matching tests. (b) We employ our benchmark to survey and compare the performance of a large variety of existing video face recognition techniques. Finally, (c) we describe a novel set-to-set similarity measure, the Matched Background Similarity (MBGS). This similarity is shown to considerably improve performance on the benchmark tests.
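The MBGS idea is to judge set-to-set similarity relative to a pool of background samples rather than in absolute terms. The sketch below is a deliberately simplified, classifier-free stand-in: the original MBGS trains a discriminative classifier (e.g. an SVM) against each set's matched background, whereas here the matched background only normalises a nearest-neighbour score:

```python
import numpy as np

def matched_background_similarity(set1, set2, background, k=5):
    """Simplified, classifier-free take on the MBGS idea: score set2
    against set1, but subtract how similar set1 is to its k nearest
    'background' samples, so generic frame content (lighting, pose,
    backdrop) contributes less. All inputs are (n, d) arrays of
    L2-normalised frame descriptors. Illustrative stand-in only --
    the original trains a discriminative classifier instead."""
    direct = (set1 @ set2.T).max(axis=1).mean()        # set1 -> set2 match
    bg_sims = set1 @ background.T                      # set1 vs pool
    matched = np.sort(bg_sims, axis=1)[:, -k:].mean()  # k nearest bg samples
    return direct - matched
```

Intuitively, a high score requires set2 to match set1 better than generic background faces do.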

1,423 citations


"Volume structured ordinal features ..." refers background or methods in this paper

  • ...Recently, the publicly available YouTube Faces database [19] was introduced, containing more than 3425 videos of 1595 subjects obtained from YouTube, with significant variations in expression, illumination, pose, resolution and background....


  • ...Three LBP variants were used as descriptors in [19] and many state-of-the-art methods were tested, including all-frames comparisons, pose-based methods, algebraic methods, non-algebraic set methods and Matched Background Similarity (MBGS) methods....


  • ...Similar to [19], the ROC curve was obtained for all splits together, using the average recognition rates....


  • ...Recently, an alternative approach which combines both strategies has emerged [19, 20]....


  • ...In our experiments we follow the protocol defined for the YouTube Faces database [19], where 5000 randomly chosen video pairs are used....


Book
21 Jun 2011
TL;DR: Computer Vision Using Local Binary Patterns provides a detailed description of the LBP methods and their variants both in spatial and spatiotemporal domains and provides an excellent overview as to how texture methods can be utilized for solving different kinds of computer vision and image analysis problems.
Abstract: The recent emergence of Local Binary Patterns (LBP) has led to significant progress in applying texture methods to various computer vision problems and applications. The focus of this research has broadened from 2D textures to 3D textures and spatiotemporal (dynamic) textures. Also, where texture was once utilized for applications such as remote sensing, industrial inspection and biomedical image analysis, the introduction of LBP-based approaches has provided outstanding results in problems relating to face and activity analysis, with future scope for face and facial expression recognition, biometrics, visual surveillance and video analysis. Computer Vision Using Local Binary Patterns provides a detailed description of the LBP methods and their variants both in spatial and spatiotemporal domains. This comprehensive reference also provides an excellent overview as to how texture methods can be utilized for solving different kinds of computer vision and image analysis problems. Source codes of the basic LBP algorithms, demonstrations, some databases and a comprehensive LBP bibliography can be found on an accompanying web site. Topics include: local binary patterns and their variants in spatial and spatiotemporal domains; texture classification and segmentation; description of interest regions; applications in image retrieval and 3D recognition; recognition and segmentation of dynamic textures; background subtraction; recognition of actions; face analysis using still images and image sequences; visual speech recognition; and LBP in various applications. Written by pioneers of LBP, this book is an essential resource for researchers, professional engineers and graduate students in computer vision, image analysis and pattern recognition. The book will also be of interest to all those who work with specific applications of machine vision.
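For reference, the basic LBP operator the book is built around is only a few lines: each interior pixel is described by thresholding its 8 neighbours against the centre value, yielding an 8-bit code. A minimal dense implementation:

```python
import numpy as np

def lbp_codes(image):
    """Basic 3x3 Local Binary Pattern: each interior pixel is encoded
    by comparing its 8 neighbours against the centre pixel, giving an
    8-bit code per pixel. Returns an (H-2, W-2) array of codes."""
    img = np.asarray(image, dtype=float)
    c = img[1:-1, 1:-1]                       # centre pixels
    # neighbour offsets, clockwise from top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        n = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        codes |= (n >= c).astype(np.uint8) << bit
    return codes
```

Histograms of these codes over image regions form the texture descriptors used throughout the LBP literature.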

641 citations


"Volume structured ordinal features ..." refers background in this paper

  • ...One of them is the Extended set of Volume Local Binary Patterns (EVLBP) [5], which is an extension of the very successful Local Binary Pattern (LBP) operator [14] to the case of videos....


Book ChapterDOI
Shengcai Liao, Xiangxin Zhu, Zhen Lei, Lun Zhang, Stan Z. Li
27 Aug 2007
TL;DR: Experiments on Face Recognition Grand Challenge (FRGC) ver2.0 database show that the proposed MB-LBP method significantly outperforms other LBP based face recognition algorithms.
Abstract: In this paper, we propose a novel representation, called Multiscale Block Local Binary Pattern (MB-LBP), and apply it to face recognition. The Local Binary Pattern (LBP) has been proved to be effective for image representation, but it is too local to be robust. In MB-LBP, the computation is done based on average values of block subregions, instead of individual pixels. In this way, the MB-LBP code presents several advantages: (1) it is more robust than LBP; (2) it encodes not only microstructures but also macrostructures of image patterns, and hence provides a more complete image representation than the basic LBP operator; and (3) MB-LBP can be computed very efficiently using integral images. Furthermore, in order to reflect the uniform appearance of MB-LBP, we redefine the uniform patterns via statistical analysis. Finally, AdaBoost learning is applied to select the most effective uniform MB-LBP features and construct face classifiers. Experiments on the Face Recognition Grand Challenge (FRGC) ver2.0 database show that the proposed MB-LBP method significantly outperforms other LBP-based face recognition algorithms.
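The block-comparison idea of MB-LBP can be sketched directly: replace basic LBP's pixel comparisons with comparisons of s×s block means. This is only an illustrative sketch; the paper additionally computes block means via integral images for efficiency and redefines uniform patterns statistically:

```python
import numpy as np

def mb_lbp_code(image, y, x, s=3):
    """Multiscale Block LBP at one position: compare the mean of the
    central s-by-s block against its 8 surrounding s-by-s blocks,
    instead of comparing single pixels as basic LBP does. (y, x) is
    the top-left corner of the centre block. Illustrative sketch."""
    img = np.asarray(image, dtype=float)
    def bmean(yy, xx):
        return img[yy:yy+s, xx:xx+s].mean()
    c = bmean(y, x)
    offs = [(-s, -s), (-s, 0), (-s, s), (0, s),
            (s, s), (s, 0), (s, -s), (0, -s)]
    code = 0
    for bit, (dy, dx) in enumerate(offs):
        if bmean(y + dy, x + dx) >= c:
            code |= 1 << bit
    return code
```

Averaging over blocks before comparing is what makes MB-LBP capture macrostructure and resist pixel-level noise.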

633 citations


"Volume structured ordinal features ..." refers background in this paper

  • ...Different from Multi-block LBP [10], SOF compares not only adjacent regions but also neighboring regions of some size at a given radius....


Journal ArticleDOI
TL;DR: A recently proposed distributed neural system for face perception, with minor modifications, can accommodate the psychological findings with moving faces.

466 citations


"Volume structured ordinal features ..." refers background in this paper

  • ...Psychophysical and neural studies suggest that humans use both spatial and dynamic information when recognizing moving faces [13]....
