Video-based face recognition via joint sparse representation

doi:10.1109/FG.2013.6553787

Proceedings Article

Yi-Chen Chen¹, Vishal M. Patel¹, Sumit Shekhar¹, Rama Chellappa¹, P. Jonathon Phillips

University of Maryland, College Park¹

22 Apr 2013-pp 1-8

TL;DR: A novel multivariate sparse representation method for video-to-video face recognition that simultaneously takes into account correlations as well as coupling information among the video frames, and that is modified to be robust in the presence of noise and occlusion.

Abstract: In video-based face recognition, a key challenge is in exploiting the extra information available in a video; e.g., face, body, and motion identity cues. In addition, different video sequences of the same subject may contain variations in resolution, illumination, pose, and facial expressions. These variations contribute to the challenges in designing an effective video-based face-recognition algorithm. We propose a novel multivariate sparse representation method for video-to-video face recognition. Our method simultaneously takes into account correlations as well as coupling information among the video frames. Our method jointly represents all the video data by a sparse linear combination of training data. In addition, we modify our model so that it is robust in the presence of noise and occlusion. Furthermore, we kernelize the algorithm to handle the non-linearities present in video data. Numerous experiments using unconstrained video sequences show that our method is effective and performs significantly better than many state-of-the-art video-based face recognition algorithms in the literature.
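
One way to read "jointly represents all the video data by a sparse linear combination of training data" is as a row-sparse coding problem: the frames of a probe video form the columns of `Y`, the training samples form the columns of a dictionary `D`, and all frames are encouraged to select the same few training samples. The sketch below is an illustration under that assumption and is not the authors' algorithm. It uses a row-norm (ℓ1/ℓ2) penalty solved by plain proximal gradient descent, with a minimum-residual decision rule borrowed from sparse-representation classification. The function names, the weight `lam`, and the iteration count are hypothetical choices.

```python
import numpy as np

def row_soft_threshold(X, tau):
    """Shrink every row of X toward zero in l2 norm (prox of tau * sum of row norms)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

def joint_sparse_code(D, Y, lam=0.1, n_iter=500):
    """Proximal gradient for min_X 0.5*||Y - D X||_F^2 + lam * sum_i ||X[i, :]||_2."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = row_soft_threshold(X - step * grad, step * lam)
    return X

def classify(D, labels, Y, lam=0.1):
    """Assign the whole frame set Y to the class with the smallest joint residual."""
    X = joint_sparse_code(D, Y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(Y - D[:, labels == c] @ X[labels == c, :])
                 for c in classes]
    return classes[int(np.argmin(residuals))], X
```

Because the penalty acts on whole rows of `X`, a training sample is either used by all frames or by none, which is one simple way to couple the frames of a video. The robust and kernelized variants mentioned in the abstract are not covered by this sketch.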

Citations

Journal Article

KinectFaceDB: A Kinect Database for Face Recognition

Rui Min¹, Neslihan Kose², Jean-Luc Dugelay²

University of North Carolina at Chapel Hill¹, Institut Eurécom²

28 Jul 2014

TL;DR: This paper presents the first publicly available face database based on the Kinect sensor, conducts benchmark evaluations on it using standard face recognition methods, and demonstrates the gain in performance when integrating the depth data with the RGB data via score-level fusion.

Abstract: The recent success of emerging RGB-D cameras such as the Kinect sensor depicts a broad prospect of 3-D data-based computer applications. However, due to the lack of a standard testing database, it is difficult to evaluate how the face recognition technology can benefit from this up-to-date imaging sensor. In order to establish the connection between the Kinect and face recognition research, in this paper, we present the first publicly available face database (i.e., KinectFaceDB, online at http://rgb-d.eurecom.fr) based on the Kinect sensor. The database consists of different data modalities (well-aligned and processed 2-D, 2.5-D, 3-D, and video-based face data) and multiple facial variations. We conducted benchmark evaluations on the proposed database using standard face recognition methods, and demonstrated the gain in performance when integrating the depth data with the RGB data via score-level fusion. We also compared the 3-D images of Kinect (from the KinectFaceDB) with the traditional high-quality 3-D scans (from the FRGC database) in the context of face biometrics, which reveals the imperative needs of the proposed database for face recognition research.
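
Score-level fusion, as mentioned in the abstract, combines the match scores produced separately by an RGB matcher and a depth matcher. The abstract does not say which fusion rule was used, so the following is only a minimal sketch of one common choice, assumed here for illustration: min-max normalization followed by a weighted sum. The function names and the weight `w_rgb` are hypothetical.

```python
import numpy as np

def min_max_normalize(scores):
    """Map scores to [0, 1]; a constant score vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def fuse_scores(rgb_scores, depth_scores, w_rgb=0.5):
    """Weighted sum of normalized similarity scores (higher means a better match)."""
    fused = (w_rgb * min_max_normalize(rgb_scores)
             + (1.0 - w_rgb) * min_max_normalize(depth_scores))
    return fused, int(np.argmax(fused))

# Similarity of one probe against three gallery identities, per modality.
fused, best = fuse_scores([0.2, 0.9, 0.85], [0.1, 0.3, 0.8])
```

Normalizing first matters because the two matchers may produce scores on different scales; without it, one modality could dominate the sum regardless of the weight.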

257 citations

Book Chapter

Eigen-PEP for Video Face Recognition

Haoxiang Li¹, Gang Hua¹, Xiaohui Shen², Zhe Lin², Jonathan Brandt²

Stevens Institute of Technology¹, Adobe Systems²

01 Nov 2014

TL;DR: The Eigen-PEP model is presented, built upon the recent success of the probabilistic elastic part (PEP) model, which produces an intermediate high dimensional, part-based, and pose-invariant representation of a face subject.

Abstract: To effectively solve the problem of large scale video face recognition, we argue for a comprehensive, compact, and yet flexible representation of a face subject. It shall comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It shall also be flexible to be incrementally updated, incorporating new or retiring obsolete observations. In search for such a representation, we present the Eigen-PEP that is built upon the recent success of the probabilistic elastic part (PEP) model. It first integrates the information from relevant video sources by a part-based average pooling through the PEP model, which produces an intermediate high dimensional, part-based, and pose-invariant representation. We then compress the intermediate representation through principal component analysis, and only a number of principal eigen dimensions are kept (as small as 100). We evaluate the Eigen-PEP representation both for video-based face verification and identification on the YouTube Faces Dataset and a new Celebrity-1000 video face dataset, respectively. On YouTube Faces, we further improve the state-of-the-art recognition accuracy. On Celebrity-1000, we lead the competing baselines by a significant margin while offering a scalable solution that is linear with respect to the number of subjects.
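
The two-stage idea in this abstract, pooling information over all frames of a video and then compressing with principal component analysis, can be illustrated in a few lines. The sketch below deliberately omits the PEP model: it assumes each frame has already been turned into a fixed-length descriptor and replaces the part-based pooling with a plain per-video mean. All function names are hypothetical.

```python
import numpy as np

def pool_video(frame_features):
    """Average-pool per-frame descriptors (n_frames x dim) into a single vector."""
    return np.asarray(frame_features, dtype=float).mean(axis=0)

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of the rows of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def compress(x, mean, components):
    """Project a pooled descriptor onto the retained principal directions."""
    return (x - mean) @ components.T
```

Since the pooled vector is a mean, it can be updated incrementally when frames are added or retired, which matches the flexibility the abstract asks for; the PCA basis would be fit once on a training set of pooled descriptors.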

134 citations

Cites methods from "Video-based face recognition via jo..."

...Previous work on the video face recognition includes methods representing the video data by linear combination of the training data [8, 9], utilizing probabilistic methods to exploit the intrinsic manifolds [10–12], etc....

Journal Article

Simultaneous Feature and Dictionary Learning for Image Set Based Face Recognition

Jiwen Lu¹, Gang Wang², Jie Zhou¹

Tsinghua University¹, Alibaba Group²

08 Jun 2017-IEEE Transactions on Image Processing

TL;DR: To better exploit the nonlinearity of face samples from different image sets, a deep SFDL (D-SFDL) method is proposed by jointly learning hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance.

Abstract: In this paper, we propose a simultaneous feature and dictionary learning (SFDL) method for image set-based face recognition, where each training and testing example contains a set of face images, which were captured from different variations of pose, illumination, expression, resolution, and motion. While a variety of feature learning and dictionary learning methods have been proposed in recent years and some of them have been successfully applied to image set-based face recognition, most of them learn features and dictionaries for facial image sets individually, which may not be powerful enough because some discriminative information for dictionary learning may be compromised in the feature learning stage if they are applied sequentially, and vice versa. To address this, we propose a SFDL method to learn discriminative features and dictionaries simultaneously from raw face pixels so that discriminative information from facial image sets can be jointly exploited by a one-stage learning procedure. To better exploit the nonlinearity of face samples from different image sets, we propose a deep SFDL (D-SFDL) method by jointly learning hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance. Extensive experimental results on five widely used face data sets clearly show that our SFDL and D-SFDL achieve performance that is very competitive with, or even better than, the state of the art.

117 citations

Cites background or methods from "Video-based face recognition via jo..."

...Image set based face recognition has attracted increasing interest in computer vision in recent years [41,33,27,12,15,2,11,7,23,38,3,17,37,4,6,5,30,19,29]....

...However, most existing dictionary-based image set based face recognition methods are unsupervised [4,6,5], which are not discriminative enough to classify face sets....

...[4,6,5] presented a dictionary-based approach for image set based face recognition by building one dictionary for each face image set and using these dictionaries to measure the similarity of face image sets....

...There has been a number of work on image set based face recognition over the past decade [41,33,27,12,15,2,11,7,23,38,3,17,37,4,6,5,30,19], and dictionary-based methods have achieved state-of-the-art performance [4,6,5] because the pose, illumination and expression information in face image sets can be implicitly...

...Image Set Based Face Recognition: Over the past recent years, we have witnessed a considerable interest in developing new methods for image set based face recognition [33,27,12,15,2,11,23,38,36,3,17,8,4,6,5,30]....

Proceedings Article

Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets

Wen Wang¹, Ruiping Wang¹, Zhiwu Huang¹, Shiguang Shan¹, Xilin Chen¹

Chinese Academy of Sciences¹

07 Jun 2015

TL;DR: The proposed method, Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG), is evaluated by face identification and verification tasks on four most challenging and largest databases, YouTube Celebrities, COX, YouTube Face DB and Point-and-Shoot Challenge, to demonstrate its superiority over the state-of-the-art.

Abstract: This paper presents a method named Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG) to solve the problem of face recognition with image sets. Our goal is to capture the underlying data distribution in each set and thus facilitate more robust classification. To this end, we represent image set as Gaussian Mixture Model (GMM) comprising a number of Gaussian components with prior probabilities and seek to discriminate Gaussian components from different classes. In the light of information geometry, the Gaussians lie on a specific Riemannian manifold. To encode such Riemannian geometry properly, we investigate several distances between Gaussians and further derive a series of provably positive definite probabilistic kernels. Through these kernels, a weighted Kernel Discriminant Analysis is finally devised which treats the Gaussians in GMMs as samples and their prior probabilities as sample weights. The proposed method is evaluated by face identification and verification tasks on four most challenging and largest databases, YouTube Celebrities, COX, YouTube Face DB and Point-and-Shoot Challenge, to demonstrate its superiority over the state-of-the-art.

109 citations

Cites background from "Video-based face recognition via jo..."

...For instance, video-based dictionary [11] and joint sparse representation [12] generalize the works of sparse representation and dictionary learning from still image based to video-based face recognition....

Journal Article

Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets

Wen Wang¹, Ruiping Wang¹, Zhiwu Huang², Shiguang Shan¹, Xilin Chen¹

Chinese Academy of Sciences¹, ETH Zurich²

01 Jan 2018-IEEE Transactions on Image Processing

Abstract: To address the problem of face recognition with image sets, we aim to capture the underlying data distribution in each set and thus facilitate more robust classification. To this end, we represent image set as the Gaussian mixture model (GMM) comprising a number of Gaussian components with prior probabilities and seek to discriminate Gaussian components from different classes. Since in the light of information geometry, the Gaussians lie on a specific Riemannian manifold, this paper presents a method named discriminant analysis on Riemannian manifold of Gaussian distributions (DARG). We investigate several distance metrics between Gaussians and accordingly two discriminative learning frameworks are presented to meet the geometric and statistical characteristics of the specific manifold. The first framework derives a series of provably positive definite probabilistic kernels to embed the manifold to a high-dimensional Hilbert space, where conventional discriminant analysis methods developed in Euclidean space can be applied, and a weighted Kernel discriminant analysis is devised which learns discriminative representation of the Gaussian components in GMMs with their prior probabilities as sample weights. Alternatively, the other framework extends the classical graph embedding method to the manifold by utilizing the distance metrics between Gaussians to construct the adjacency graph, and hence the original manifold is embedded to a lower-dimensional and discriminative target manifold with the geometric structure preserved and the interclass separability maximized. The proposed method is evaluated by face identification and verification tasks on four most challenging and largest databases, YouTube Celebrities, COX, YouTube Face DB, and Point-and-Shoot Challenge, to demonstrate its superiority over the state-of-the-art.

106 citations

Cites background from "Video-based face recognition via jo..."

...For instance, video-based dictionary [11] and joint sparse representation [12] generalize the works of sparse representation and dictionary learning from still image based to video-based face recognition....

References

Journal Article

Robust Face Recognition via Sparse Representation

John Wright¹, Allen Y. Yang², Arvind Ganesh¹, S. Shankar Sastry², Yi Ma¹

University of Illinois at Urbana–Champaign¹, University of California, Berkeley²

01 Feb 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.

Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.
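
The occlusion argument in this abstract can be made concrete: if the error is sparse in the pixel basis, a test sample can be modeled as `y ≈ A x + e` with both `x` and `e` sparse, which amounts to sparse coding over the training matrix `A` extended by an identity block. The sketch below follows that idea but makes its own assumptions: it solves a lasso-style relaxation with plain ISTA rather than the exact ℓ1 program, and the names, the weight `lam`, and the iteration count are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise shrinkage, the proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def src_classify(A, labels, y, lam=0.05, n_iter=1000):
    """Sparse-representation classification with an identity block for pixel errors."""
    m, n = A.shape
    B = np.hstack([A, np.eye(m)])             # y ~ A x + e, with x and e both sparse
    step = 1.0 / (np.linalg.norm(B, 2) ** 2)  # ISTA step from the Lipschitz constant
    w = np.zeros(n + m)
    for _ in range(n_iter):
        w = soft_threshold(w - step * (B.T @ (B @ w - y)), step * lam)
    x, e = w[:n], w[n:]
    classes = np.unique(labels)
    # Residual of the error-corrected sample using only the atoms of each class.
    residuals = [np.linalg.norm(y - e - A[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))], x, e
```

The decision rule compares how well each class alone explains the sample after the estimated error `e` has been subtracted, so a few corrupted pixels do not have to be explained by training faces of the wrong identity.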

9,658 citations

Journal Article

K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation

Michal Aharon¹, Michael Elad¹, Alfred M. Bruckstein¹

Technion – Israel Institute of Technology¹

01 Nov 2006-IEEE Transactions on Signal Processing

TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.

Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.
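
The alternation described in this abstract, sparse coding with the current dictionary followed by an atom-by-atom update that also refreshes the matching coefficients, can be sketched compactly. The version below is a simplified illustration, not the reference implementation: it assumes orthogonal matching pursuit as the pursuit method, initializes atoms from randomly chosen training signals, and leaves unused atoms untouched. Function and parameter names are hypothetical.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit using at most k atoms."""
    residual, support = y.copy(), []
    x, coef = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def ksvd(Y, n_atoms, k, n_iter=10, seed=0):
    """Alternate sparse coding and rank-1 atom updates on the signals in Y's columns."""
    rng = np.random.default_rng(seed)
    # Initialise atoms with randomly chosen, normalised training signals.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding stage: code every signal with the current dictionary.
        X = np.column_stack([omp(D, y, k) for y in Y.T])
        # Dictionary update stage: refit one atom and its coefficients at a time.
        for j in range(n_atoms):
            users = np.flatnonzero(X[j])
            if users.size == 0:
                continue
            X[j, users] = 0.0
            E = Y[:, users] - D @ X[:, users]  # error of these signals without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0]
    return D, X
```

Restricting each update to the signals that already use the atom keeps the sparsity pattern fixed, so the rank-1 SVD step can only lower the representation error for the current codes.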

8,905 citations

Journal Article

Model selection and estimation in regression with grouped variables

Ming Yuan¹, Yi Lin²

Georgia Institute of Technology¹, University of Wisconsin-Madison²

01 Feb 2006-Journal of the Royal Statistical Society: Series B (Statistical Methodology)

TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.

Abstract: We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
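
The way a group penalty selects or drops whole factors can be seen in its proximal operator, which shrinks each group of coefficients as a block. The sketch below assumes the common form of the group lasso penalty, `tau * sum_g sqrt(p_g) * ||beta_g||_2` with `p_g` the group size, and shows only this one building block rather than any of the fitting algorithms studied in the paper. The function name and arguments are hypothetical.

```python
import numpy as np

def group_soft_threshold(beta, groups, tau):
    """Proximal operator of tau * sum_g sqrt(p_g) * ||beta_g||_2."""
    out = np.zeros_like(beta, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        thresh = tau * np.sqrt(idx.sum())
        if norm > thresh:
            # The group survives: shrink it radially toward zero.
            out[idx] = (1.0 - thresh / norm) * beta[idx]
        # Otherwise the whole group stays at zero, i.e. the factor is dropped.
    return out

beta = np.array([3.0, 4.0, 0.1, 0.1, 0.1])
groups = np.array([0, 0, 1, 1, 1])
shrunk = group_soft_threshold(beta, groups, tau=1.0)
```

In this example the first group has a large norm and is only shrunk, while the second group falls below its threshold and is removed as a unit, which is the behavior that distinguishes factor selection from selecting individual variables.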

7,400 citations

"Video-based face recognition via jo..." refers methods in this paper

...A dictionary is learned with the minimum representation error under a strict sparseness constraint....

Journal Article

Face recognition: A literature survey

W. Zhao¹, Rama Chellappa², P. J. Phillips³, Azriel Rosenfeld²

Sarnoff Corporation¹, University of Maryland, College Park², National Institute of Standards and Technology³

01 Dec 2003-ACM Computing Surveys

TL;DR: In this paper, the authors provide an up-to-date critical survey of still- and video-based face recognition research, and provide some insights into the studies of machine recognition of faces.

Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations

Journal Article

The group lasso for logistic regression

Lukas Meier, Sara van de Geer, Peter Bühlmann

01 Feb 2008-Journal of the Royal Statistical Society: Series B (Statistical Methodology)

TL;DR: The group lasso is extended to logistic regression models, and an efficient algorithm is presented that is especially suitable for high dimensional problems and can also be applied to generalized linear models to solve the corresponding convex optimization problem.

Abstract: The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.

1,709 citations