Proceedings Article

On rank aggregation for face recognition from videos

TL;DR: A video based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images to facilitate matching two videos with large variations is presented.
Abstract: Face recognition from still face images suffers from intrapersonal variations caused by pose, illumination, and expression, which degrade performance. Videos, on the other hand, provide abundant information that can be leveraged to compensate for the limitations of still face images and enhance face recognition performance. This paper presents a video based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images. The video signature embeds diverse intra-personal and temporal variations across multiple frames and thus facilitates matching two videos with large variations. Two videos are matched by comparing their discriminative signatures using the Kendall tau distance measure. A performance comparison with the benchmark results and a commercial face recognition system on the publicly available YouTube faces database shows the efficacy of the proposed video based face recognition algorithm.
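To make the matching step concrete, here is a minimal sketch of the classic Kendall tau distance between two ranked lists. It assumes both video signatures are full rankings of the same dictionary images; the abstract does not say how the paper handles partial or truncated lists, and the names `kendall_tau_distance`, `normalized_kendall_tau`, and the `img_*` identifiers are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(list_a, list_b):
    """Count the item pairs that the two rankings place in opposite order."""
    if len(set(list_a)) != len(list_a) or len(list_a) != len(list_b) \
            or set(list_a) != set(list_b):
        raise ValueError("both rankings must be permutations of the same items")
    pos_b = {item: rank for rank, item in enumerate(list_b)}
    discordant = 0
    for x, y in combinations(list_a, 2):
        # x precedes y in list_a; the pair is discordant if list_b flips it.
        if pos_b[x] > pos_b[y]:
            discordant += 1
    return discordant


def normalized_kendall_tau(list_a, list_b):
    """Scale the distance to [0, 1]; 0 means identical order, 1 means reversed."""
    n = len(list_a)
    if n < 2:
        return 0.0
    return kendall_tau_distance(list_a, list_b) / (n * (n - 1) / 2)


# Hypothetical signatures: ordered lists of dictionary image identifiers.
sig_1 = ["img_3", "img_1", "img_4", "img_2"]
sig_2 = ["img_1", "img_3", "img_4", "img_2"]
print(kendall_tau_distance(sig_1, sig_2))
```

A smaller distance means the two signatures rank the dictionary images more similarly, which is the intuition behind treating them as the same identity.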
Citations
Journal Article
TL;DR: A video-based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images from a large dictionary, which embeds diverse intra-personal variations and facilitates matching two videos with large variations.
Abstract: Due to widespread applications, availability of large intra-personal variations in video and limited information content in still images, video-based face recognition has gained significant attention. Unlike still face images, videos provide abundant information that can be leveraged to address variations in pose, illumination, and expression as well as enhance the face recognition performance. This paper presents a video-based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images from a large dictionary. A three-stage approach is proposed for optimizing ranked lists across multiple video frames and fusing them into a single composite ordered list to compute the video signature. This signature embeds diverse intra-personal variations and facilitates matching two videos with large variations. For matching two videos, a discounted cumulative gain measure is utilized, which uses the ranking of images in the video signature as well as the usefulness of images in characterizing the individual in the video. The efficacy of the proposed algorithm is demonstrated on the YouTube faces database and the MBGC v2 video challenge database, which comprise different types of video-based face recognition challenges such as matching still face images with videos and matching videos with videos. Performance comparison with the benchmark results on both the databases and a commercial face recognition system shows the efficiency of the proposed algorithm for video-based face recognition.
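As a rough illustration of the discounted cumulative gain idea mentioned in this abstract, the sketch below computes the generic textbook DCG of a ranked list. It is not the paper's exact matching measure: the relevance ("usefulness") values are made-up placeholders, and how the paper derives them or compares two signatures is not stated here.

```python
import math


def dcg(relevances):
    """Generic DCG: each item's relevance is discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


# Hypothetical usefulness scores of the images in a signature, in ranked order.
signature_relevance = [3.0, 2.0, 0.0, 1.0]
print(round(dcg(signature_relevance), 4))
```

The logarithmic discount means that useful images near the top of the signature contribute more than equally useful images further down.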

46 citations


Cites background or methods from "On rank aggregation for face recogn..."

  • ...[7] proposed to compute a video signature as an ordered list of still face images from a large dictionary....

    [...]

  • ...However, unlike the proposed algorithm, existing algorithm [7] does not optimize every ranked list before fusion which results in lower performance....

    [...]

  • ...gorithm [7] for all the three matching scenarios i....

    [...]

  • ...[7] Rank aggregation YouTube Faces [42] 78....

    [...]

  • ...[7] is also a rank aggregation based approach that combines multiple ranked lists for a video using Markov chain...

    [...]

Proceedings Article
01 Sep 2013
TL;DR: This work demonstrates that all three COTS matchers individually are superior to previously published face recognition results on the unconstrained YouTube Faces database and achieves a 20% improvement in accuracy over previously published results.
Abstract: Face recognition in video is becoming increasingly important due to the abundance of video data captured by surveillance cameras, mobile devices, Internet uploads, and other sources. Given the aggregate of facial information contained in a video (i.e., a sequence of face images or frames), video-based face recognition solutions can potentially alleviate classic challenges caused by variations in pose, illumination, and expression. However, with this increased focus on the development of algorithms specifically crafted for video-based face recognition, it is important to establish a baseline for the accuracy using state-of-the-art still image matchers. Note that most commercial-off-the-shelf (COTS) offerings are still limited to single frame matching. In order to measure the accuracy of COTS face recognition systems on video data, we first investigate the effectiveness of multi-frame score-level fusion and analyze the consistency across three COTS face matchers. We demonstrate that all three COTS matchers individually are superior to previously published face recognition results on the unconstrained YouTube Faces database. Further, fusion of scores from the three COTS matchers achieves a 20% improvement in accuracy over previously published results. We encourage the use of these results as a competitive baseline for video-to-video face matching on the YouTube Faces database.
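A minimal sketch of what multi-frame score-level fusion can look like follows. It assumes a still-image matcher has already produced a similarity score for every frame pair between two videos; the abstract does not name the fusion rules studied, so the mean and max rules, the function name `fuse_scores`, and the score values are illustrative assumptions.

```python
from statistics import mean


def fuse_scores(score_matrix, rule="mean"):
    """Fuse frame-pair similarity scores into one video-to-video score."""
    flat = [score for row in score_matrix for score in row]
    if not flat:
        raise ValueError("score_matrix must contain at least one score")
    if rule == "mean":
        return mean(flat)
    if rule == "max":
        return max(flat)
    raise ValueError(f"unknown fusion rule: {rule}")


# Hypothetical scores: rows are frames of video A, columns are frames of video B.
scores = [[0.2, 0.8],
          [0.4, 0.6]]
print(fuse_scores(scores, "mean"), fuse_scores(scores, "max"))
```

Mean fusion rewards consistent agreement across frames, while max fusion lets a single well-matched frame pair dominate.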

39 citations


Cites methods from "On rank aggregation for face recogn..."

  • ...The accuracies of the proposed COTS fusion schemes are benchmarked against Wolf et al.’s Matched Background Similarity (MBGS) [24], Li et al.’s Adaptive Probabilistic Elastic Matching (APEM) Fusion [15], Cui et al.’s Spatio-Temporal Face Region Descriptor Pairwise-constrained Multiple Metric Learning (STFRD+PMML) [5], and Bhatt et al.’s method which we call Rank Aggregation [3]....

    [...]

  • ...All three COTS face matchers and fusion of three matchers significantly outperform previous methods: Rank Aggregation [3], APEM Fusion [15], and STFRD+PMML [5]....

    [...]

  • ..., the interpupillary distances remain the same) between the images used here and those used by [3, 15, 24]....

    [...]

  • ...’s method which we call Rank Aggregation [3]....

    [...]

Proceedings Article
04 May 2015
TL;DR: This paper proposes a video-based face recognition method which improves upon the sparse representation framework with an intelligent and adaptive sparse dictionary that updates the current probe image into the training matrix based on continuously monitoring the probe video through a novel confidence criterion and a Bayesian inference scheme.
Abstract: Sparse representation-based face recognition has gained considerable attention recently due to its robustness against illumination and occlusion. Recognizing faces from videos has become a topic of importance to alleviate the limit of information content in still images. However, the sparse recognition framework is not applicable to video-based face recognition due to its sensitivity towards pose and alignment changes. In this paper, we propose a video-based face recognition method which improves upon the sparse representation framework. Our key contribution is an intelligent and adaptive sparse dictionary that updates the current probe image into the training matrix based on continuously monitoring the probe video through a novel confidence criterion and a Bayesian inference scheme. Due to this novel approach, our method is robust to pose and alignment and hence can be used to recognize faces from unconstrained videos successfully. Moreover, in a moving scene, camera angle, illumination and other imaging conditions may change quickly, leading to a loss in accuracy. In such situations, it is impractical to re-enroll the individual and re-train the classifiers on a continuous basis. Our novel approach addresses these practical issues. Experimental results on the well-known YouTube Face database demonstrate the effectiveness of our method.

7 citations


Cites methods from "On rank aggregation for face recogn..."

  • ...In [15], Markov chain-based rank aggregation technique was used to calculate a video signature as an ordered set of frame images....

    [...]
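The Markov chain-based rank aggregation mentioned in the excerpt above can be sketched as follows. This is an MC4-style chain from the general rank-aggregation literature, not necessarily the chain used in the cited work: from the current item, a candidate is drawn uniformly, and the chain moves to it only if a majority of the lists rank it higher. The damping (teleport) term and the names `markov_chain_aggregate` and `frame_*` are assumptions added for illustration.

```python
def markov_chain_aggregate(ranked_lists, damping=0.85, iterations=200):
    """Fuse several full rankings of the same items into one ordered list."""
    items = sorted(ranked_lists[0])
    if any(sorted(lst) != items for lst in ranked_lists):
        raise ValueError("every list must rank the same set of items")
    n = len(items)
    positions = [{item: r for r, item in enumerate(lst)} for lst in ranked_lists]

    # Transition matrix: from item i, move to j (chosen with probability 1/n)
    # only when a strict majority of the lists rank j above i; otherwise stay.
    P = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(items):
        for b, j in enumerate(items):
            if a == b:
                continue
            wins = sum(1 for pos in positions if pos[j] < pos[i])
            if 2 * wins > len(ranked_lists):
                P[a][b] = 1.0 / n
        P[a][a] = 1.0 - sum(P[a])

    # Power iteration with uniform teleportation so the chain is ergodic.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [(1.0 - damping) / n] * n
        for a in range(n):
            for b in range(n):
                nxt[b] += damping * pi[a] * P[a][b]
        pi = nxt

    # Items with higher stationary probability are ranked first.
    order = sorted(range(n), key=lambda k: (-pi[k], items[k]))
    return [items[k] for k in order]


# Hypothetical per-frame ranked lists over the same three dictionary images.
frame_lists = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(markov_chain_aggregate(frame_lists))
```

Probability mass flows toward items that most lists prefer, so sorting by the stationary distribution yields a consensus ordering.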

Patent
31 Dec 2015
TL;DR: A method uses a dictionary including a target collection defined by images that are known with a defined level of certainty to include a subject and an imposter collection defined by images of individuals other than the subject.
Abstract: The method includes a dictionary including a target collection defined by images that are known with a defined level of certainty to include a subject and an imposter collection defined by images of individuals other than the subject. In the method, images of an area are captured over a period of time. In respect of each image: a matching calculation is carried out, based upon a comparison of the image captured with the images in the dictionary to result in a measure of confidence that the subject is in the area; and an inference determination is made to replace one of the target collection images with a further image that is known with the defined level of certainty, the determination being a function of the measure of confidence resultant from the captured image, the measure resultant from one or more previously captured images and the associated capture times.

5 citations

Journal Article
TL;DR: A novel approach for recognizing faces in videos with high recognition rate that embeds diverse intra-personal variations such as poses and expressions, facilitates matching two videos with large variations, and exhibits significant performance improvement when compared with the existing techniques.
Abstract: This paper proposes a novel approach for recognizing faces in videos with high recognition rate. Initially, the feature vector based on Normalized Local Binary Patterns is obtained for the face region. A set of training and testing videos is used in this face recognition procedure. Each frame in the query video is matched with the signature of the faces in the database using Euclidean distance and a rank list is formed. Each ranked list is clustered and its reliability is analyzed for re-ranking. Multiple re-ranked lists of the query video are fused together to form a video signature. This video signature embeds diverse intra-personal variations such as poses and expressions, and facilitates matching two videos with large variations. For matching two videos, their composite ranked lists are compared using a Kendall Tau distance measure. The developed methods are deployed on the YouTube and ChokePoint videos, and they exhibit significant performance improvement owing to their novel approach when compared with the existing techniques.
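The per-frame ranking step described in this abstract (matching a frame against database entries by Euclidean distance to form a rank list) might look like the sketch below. The feature vectors are placeholder numbers rather than real Normalized Local Binary Pattern features, and `rank_dictionary` and the `id_*` keys are hypothetical names.

```python
import math


def rank_dictionary(query_feature, dictionary):
    """Return dictionary keys ordered from nearest to farthest from the query."""
    return sorted(
        dictionary,
        # Sort by Euclidean distance; break ties by name for determinism.
        key=lambda name: (math.dist(query_feature, dictionary[name]), name),
    )


# Placeholder 2-D features standing in for real face descriptors.
database = {
    "id_a": [0.0, 0.0],
    "id_b": [1.0, 1.0],
    "id_c": [3.0, 0.0],
}
frame_feature = [0.9, 1.2]
print(rank_dictionary(frame_feature, database))
```

Running this once per frame gives one ranked list per frame, which is the input a later re-ranking and fusion stage would consume.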

2 citations

References
Proceedings Article
20 Jun 2011
TL;DR: It is shown that by introducing within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, the geometrical structure of data can be exploited.
Abstract: A convenient way of dealing with image sets is to represent them as points on Grassmannian manifolds. While several recent studies explored the applicability of discriminant analysis on such manifolds, the conventional formalism of discriminant analysis suffers from not considering the local structure of the data. We propose a discriminant analysis approach on Grassmannian manifolds, based on a graph-embedding framework. We show that by introducing within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, the geometrical structure of data can be exploited. Experiments on several image datasets (PIE, BANCA, MoBo, ETH-80) show that the proposed algorithm obtains considerable improvements in discrimination accuracy, in comparison to three recent methods: Grassmann Discriminant Analysis (GDA), Kernel GDA, and the kernel version of Affine Hull Image Set Distance. We further propose a Grassmannian kernel, based on canonical correlation between subspaces, which can increase discrimination accuracy when used in combination with previous Grassmannian kernels.

300 citations


"On rank aggregation for face recogn..." refers background in this paper

  • ...Index Terms— Video based face recognition, Rank aggregation, Dictionary based face recognition...

    [...]

Book Chapter
07 Oct 2012
TL;DR: This work introduces the concept of video-dictionaries for face recognition, which generalizes the work in sparse representation and dictionaries for faces in still images and performs significantly better than many competitive video-based face recognition algorithms.
Abstract: The main challenge in recognizing faces in video is effectively exploiting the multiple frames of a face and the accompanying dynamic signature. One prominent method is based on extracting joint appearance and behavioral features. A second method models a person by temporal correlations of features in a video. Our approach introduces the concept of video-dictionaries for face recognition, which generalizes the work in sparse representation and dictionaries for faces in still images. Video-dictionaries are designed to implicitly encode temporal, pose, and illumination information. We demonstrate our method on the Face and Ocular Challenge Series (FOCS) Video Challenge, which consists of unconstrained video sequences. We show that our method is efficient and performs significantly better than many competitive video-based face recognition algorithms.

153 citations


"On rank aggregation for face recogn..." refers background in this paper

  • ...The challenges and limitations of still face recognition drive the research in video based face recognition....

    [...]

Proceedings Article
23 Aug 2004
TL;DR: The paper poses video-to-video face recognition as a dynamical system identification and classification problem and uses an autoregressive and moving average (ARMA) model to represent such a system.
Abstract: The paper poses video-to-video face recognition as a dynamical system identification and classification problem. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of ARMA model is based on its ability to take care of the change in appearance while modeling the dynamics of pose, expression, etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression and illumination variation in the video data used for experiments.

129 citations

Journal Article
TL;DR: The proposed manifold-manifold distance (MMD) method is applied to the task of face recognition with image sets, where identification is achieved by seeking the minimum MMD from the probe to the gallery of image sets.
Abstract: In this paper, we address the problem of classifying image sets for face recognition, where each set contains images belonging to the same subject and typically covering large variations. By modeling each image set as a manifold, we formulate the problem as the computation of the distance between two manifolds, called manifold-manifold distance (MMD). Since an image set can come in three pattern levels, point, subspace, and manifold, we systematically study the distance among the three levels and formulate them in a general multilevel MMD framework. Specifically, we express a manifold by a collection of local linear models, each depicted by a subspace. MMD is then converted to integrate the distances between pairs of subspaces from one of the involved manifolds. We theoretically and experimentally study several configurations of the ingredients of MMD. The proposed method is applied to the task of face recognition with image sets, where identification is achieved by seeking the minimum MMD from the probe to the gallery of image sets. Our experiments demonstrate that, as a general set similarity measure, MMD consistently outperforms other competing nondiscriminative methods and is also promisingly comparable to the state-of-the-art discriminative methods.

118 citations


"On rank aggregation for face recogn..." refers background in this paper

  • ...Index Terms— Video based face recognition, Rank aggregation, Dictionary based face recognition...

    [...]

Journal Article
TL;DR: A broad and deep review of recently proposed methods for overcoming the difficulties encountered in unconstrained settings is presented and connections between the ways in which humans and current algorithms recognize faces are drawn.
Abstract: Driven by key law enforcement and commercial applications, research on face recognition from video sources has intensified in recent years. The ensuing results have demonstrated that videos possess unique properties that allow both humans and automated systems to perform recognition accurately in difficult viewing conditions. However, significant research challenges remain as most video-based applications do not allow for controlled recordings. In this survey, we categorize the research in this area and present a broad and deep review of recently proposed methods for overcoming the difficulties encountered in unconstrained settings. We also draw connections between the ways in which humans and current algorithms recognize faces. An overview of the most popular and difficult publicly available face video databases is provided to complement these discussions. Finally, we cover key research challenges and opportunities that lie ahead for the field as a whole.

115 citations


"On rank aggregation for face recogn..." refers background in this paper

  • ...Index Terms— Video based face recognition, Rank aggregation, Dictionary based face recognition...

    [...]