On rank aggregation for face recognition from videos

doi:10.1109/ICIP.2013.6738616


Proceedings Article


Himanshu Bhatt¹, Richa Singh¹, Mayank Vatsa¹•Institutions (1)

Indraprastha Institute of Information Technology¹

01 Sep 2013-pp 2993-2997

TL;DR: A video based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images to facilitate matching two videos with large variations is presented.


Abstract: Face recognition from still face images degrades under intra-personal variations caused by pose, illumination, and expression. Videos, on the other hand, provide abundant information that can be leveraged to compensate for the limitations of still face images and enhance face recognition performance. This paper presents a video based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images. The video signature embeds diverse intra-personal and temporal variations across multiple frames and thus facilitates matching two videos with large variations. Two videos are matched by comparing their discriminative signatures using the Kendall tau distance measure. Performance comparison with the benchmark results and a commercial face recognition system on the publicly available YouTube Faces database shows the efficacy of the proposed video based face recognition algorithm.
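
To make the matching step concrete, here is a minimal sketch of the standard Kendall tau distance between two ordered lists. It assumes both signatures rank exactly the same set of distinct items; the abstract does not say how the paper handles partial lists or weighting, so this is not the authors' exact measure. The function names and the example image IDs are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(list_a, list_b):
    """Count the item pairs that the two rankings place in opposite order."""
    if (len(list_a) != len(list_b)
            or len(set(list_a)) != len(list_a)
            or set(list_a) != set(list_b)):
        raise ValueError("both lists must rank the same distinct items")
    # Position of every item in the second ranking.
    pos_b = {item: rank for rank, item in enumerate(list_b)}
    discordant = 0
    for x, y in combinations(list_a, 2):
        # x precedes y in list_a; the pair is discordant if y precedes x in list_b.
        if pos_b[x] > pos_b[y]:
            discordant += 1
    return discordant


def normalized_kendall_tau_distance(list_a, list_b):
    """Scale the distance to [0, 1]: 0 for identical order, 1 for reversed order."""
    n = len(list_a)
    if n < 2:
        return 0.0
    return kendall_tau_distance(list_a, list_b) / (n * (n - 1) / 2)


# Hypothetical signatures: ordered lists of dictionary image IDs.
signature_1 = ["img3", "img1", "img4", "img2"]
signature_2 = ["img1", "img3", "img4", "img2"]
print(kendall_tau_distance(signature_1, signature_2))
```

A smaller distance means the two videos rank the dictionary images more similarly, which is the intuition behind comparing signatures this way.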


Citations


Journal Article

On Recognizing Faces in Videos Using Clustering-Based Re-Ranking and Fusion


Himanshu Bhatt¹, Richa Singh¹, Mayank Vatsa¹•Institutions (1)

Indraprastha Institute of Information Technology¹

01 Jul 2014-IEEE Transactions on Information Forensics and Security

TL;DR: A video-based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images from a large dictionary; the signature embeds diverse intra-personal variations and facilitates matching two videos with large variations.


Abstract: Due to widespread applications, the availability of large intra-personal variations in video, and the limited information content in still images, video-based face recognition has gained significant attention. Unlike still face images, videos provide abundant information that can be leveraged to address variations in pose, illumination, and expression as well as enhance face recognition performance. This paper presents a video-based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images from a large dictionary. A three-stage approach is proposed for optimizing ranked lists across multiple video frames and fusing them into a single composite ordered list to compute the video signature. This signature embeds diverse intra-personal variations and facilitates matching two videos with large variations. For matching two videos, a discounted cumulative gain measure is utilized, which uses the ranking of images in the video signature as well as the usefulness of images in characterizing the individual in the video. The efficacy of the proposed algorithm is demonstrated on the YouTube Faces database and the MBGC v2 video challenge database, which cover different video-based face recognition scenarios such as matching still face images with videos and matching videos with videos. Performance comparison with the benchmark results on both databases and a commercial face recognition system shows the efficiency of the proposed algorithm for video-based face recognition.
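
As a rough illustration of the discounted cumulative gain (DCG) idea mentioned above, the sketch below uses the textbook formula, in which the item at rank i contributes its relevance divided by log2(i + 1). The abstract does not specify how the paper derives the "usefulness" of each image, so the relevance values here are made-up placeholders and the function name is hypothetical.

```python
import math


def dcg(relevances):
    """Textbook DCG: relevance at rank i (1-based) is discounted by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


# Placeholder usefulness scores for the images of a signature, in rank order.
useful_first = [3, 2, 1]
useful_last = [1, 2, 3]
print(dcg(useful_first), dcg(useful_last))
```

Because of the logarithmic discount, a list that places its most useful images near the top scores higher than the same images in the opposite order.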


46 citations

Cites background or methods from "On rank aggregation for face recogn..."

...[7] proposed to compute a video signature as an ordered list of still face images from a large dictionary....
[...]
...However, unlike the proposed algorithm, existing algorithm [7] does not optimize every ranked list before fusion which results in lower performance....
[...]
...gorithm [7] for all the three matching scenarios i....
[...]
...[7] Rank aggregation YouTube Faces [42] 78....
[...]
...[7] is also a rank aggregation based approach that combines multiple ranked lists for a video using Markov chain...
[...]

Proceedings Article

Video-to-video face matching: Establishing a baseline for unconstrained face recognition


Lacey Best-Rowden¹, Brendan Klare², Joshua C. Klontz², Anil K. Jain¹•Institutions (2)

Michigan State University¹, Noblis²

01 Sep 2013

TL;DR: This work demonstrates that all three COTS matchers individually are superior to previously published face recognition results on the unconstrained YouTube Faces database and achieves a 20% improvement in accuracy over previously published results.


Abstract: Face recognition in video is becoming increasingly important due to the abundance of video data captured by surveillance cameras, mobile devices, Internet uploads, and other sources. Given the aggregate of facial information contained in a video (i.e., a sequence of face images or frames), video-based face recognition solutions can potentially alleviate classic challenges caused by variations in pose, illumination, and expression. However, with this increased focus on the development of algorithms specifically crafted for video-based face recognition, it is important to establish a baseline for the accuracy using state-of-the-art still image matchers. Note that most commercial-off-the-shelf (COTS) offerings are still limited to single frame matching. In order to measure the accuracy of COTS face recognition systems on video data, we first investigate the effectiveness of multi-frame score-level fusion and analyze the consistency across three COTS face matchers. We demonstrate that all three COTS matchers individually are superior to previously published face recognition results on the unconstrained YouTube Faces database. Further, fusion of scores from the three COTS matchers achieves a 20% improvement in accuracy over previously published results. We encourage the use of these results as a competitive baseline for video-to-video face matching on the YouTube Faces database.
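
Multi-frame score-level fusion can be sketched as follows: a still-image matcher scores every frame pair between two videos, and the resulting score matrix is reduced to one video-to-video score. The abstract does not state which fusion rules were evaluated, so the mean, max, and min rules below are common choices shown as assumptions; the function name and the scores are hypothetical.

```python
def fuse_frame_scores(score_matrix, rule="mean"):
    """Reduce a frames-by-frames similarity matrix to one video-level score."""
    scores = [score for row in score_matrix for score in row]
    if not scores:
        raise ValueError("score_matrix must contain at least one score")
    if rule == "mean":
        return sum(scores) / len(scores)
    if rule == "max":
        return max(scores)
    if rule == "min":
        return min(scores)
    raise ValueError(f"unknown fusion rule: {rule}")


# Hypothetical similarity scores: rows are frames of video A, columns frames of video B.
frame_scores = [
    [0.2, 0.4],
    [0.6, 0.8],
]
print(fuse_frame_scores(frame_scores, "mean"))
print(fuse_frame_scores(frame_scores, "max"))
```

Fusing scores from several different matchers would additionally require putting their scores on a common scale first, which this sketch does not attempt.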


39 citations

Cites methods from "On rank aggregation for face recogn..."

...The accuracies of the proposed COTS fusion schemes are benchmarked against Wolf et al.’s Matched Background Similarity (MBGS) [24], Li et al.’s Adaptive Probabilistic Elastic Matching (APEM) Fusion [15], Cui et al.’s Spatio-Temporal Face Region Descriptor Pairwise-constrained Multiple Metric Learning (STFRD+PMML) [5], and Bhatt et al.’s method which we call Rank Aggregation [3]....
[...]
...All three COTS face matchers and fusion of three matchers significantly outperform previous methods: Rank Aggregation [3], APEM Fusion [15], and STFRD+PMML [5]....
[...]
..., the interpupillary distances remain the same) between the images used here and those used by [3, 15, 24]....
[...]
...’s method which we call Rank Aggregation [3]....
[...]

Proceedings Article

On video based face recognition through adaptive sparse dictionary


Naimul Mefraz Khan¹, Xiaoming Nan¹, Azhar Quddus, Edward Rosales¹, Ling Guan¹•Institutions (1)

Ryerson University¹

04 May 2015

TL;DR: This paper proposes a video-based face recognition method which improves upon the sparse representation framework with an intelligent and adaptive sparse dictionary that updates the current probe image into the training matrix based on continuously monitoring the probe video through a novel confidence criterion and a Bayesian inference scheme.


Abstract: Sparse representation-based face recognition has gained considerable attention recently due to its robustness against illumination and occlusion. Recognizing faces from videos has become a topic of importance to alleviate the limit of information content in still images. However, the sparse recognition framework is not applicable to video-based face recognition due to its sensitivity towards pose and alignment changes. In this paper, we propose a video-based face recognition method which improves upon the sparse representation framework. Our key contribution is an intelligent and adaptive sparse dictionary that updates the current probe image into the training matrix based on continuously monitoring the probe video through a novel confidence criterion and a Bayesian inference scheme. Due to this novel approach, our method is robust to pose and alignment and hence can be used to recognize faces from unconstrained videos successfully. Moreover, in a moving scene, camera angle, illumination, and other imaging conditions may change quickly, leading to a loss in accuracy. In such situations, it is impractical to re-enroll the individual and re-train the classifiers on a continuous basis. Our novel approach addresses these practical issues. Experimental results on the well-known YouTube Faces database demonstrate the effectiveness of our method.


7 citations

Cites methods from "On rank aggregation for face recogn..."

...In [15], a Markov chain-based rank aggregation technique was used to calculate a video signature as an ordered set of frame images....
[...]

Patent

System for video based face recognition using an adaptive dictionary


Azhar Quddus, Xiaoming Nan, Naimul Mefraz Khan, Ling Guan

31 Dec 2015

TL;DR: A dictionary is used that includes a target collection, defined by images that are known with a defined level of certainty to include a subject, and an imposter collection, defined by images of individuals other than the subject.


Abstract: The method includes a dictionary including a target collection defined by images that are known with a defined level of certainty to include a subject and an imposter collection defined by images of individuals other than the subject. In the method, images of an area are captured over a period of time. In respect of each image: a matching calculation is carried out, based upon a comparison of the image captured with the images in the dictionary to result in a measure of confidence that the subject is in the area; and an inference determination is made to replace one of the target collection images with a further image that is known with the defined level of certainty, the determination being a function of the measure of confidence resultant from the captured image, the measure resultant from one or more previously captured images and the associated capture times.


5 citations

Journal Article

Ranking, clustering and fusing the normalized LBP temporal facial features for face recognition in video sequences


P. Ithaya Rani¹, T. Hari Prasath²•Institutions (2)

Mepco Schlenk Engineering College¹, Kalasalingam University²

01 Mar 2018-Multimedia Tools and Applications

TL;DR: A novel approach for recognizing faces in videos with a high recognition rate is proposed; its video signature embeds diverse intra-personal variations such as pose and expression, which facilitates matching two videos with large variations, and it exhibits significant performance improvement over existing techniques.


Abstract: This paper proposes a novel approach for recognizing faces in videos with a high recognition rate. Initially, a feature vector based on Normalized Local Binary Patterns is obtained for the face region. A set of training and testing videos is used in this face recognition procedure. Each frame in the query video is matched with the signatures of the faces in the database using Euclidean distance, and a ranked list is formed. Each ranked list is clustered and its reliability is analyzed for re-ranking. Multiple re-ranked lists of the query video are fused together to form a video signature. This video signature embeds diverse intra-personal variations such as pose and expression and facilitates matching two videos with large variations. For matching two videos, their composite ranked lists are compared using a Kendall tau distance measure. The developed methods are applied to the YouTube and ChokePoint videos, and they exhibit significant performance improvement over existing techniques.
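
The first step of this pipeline, forming a ranked list for one query frame by Euclidean distance, can be sketched as below. This covers only the ranking step, not the clustering, re-ranking, or fusion stages, whose details the abstract does not give. The feature vectors stand in for the normalized LBP features, and the function name and gallery IDs are hypothetical.

```python
import math


def rank_gallery(query_feature, gallery_features):
    """Return gallery IDs ordered from nearest to farthest from the query frame."""
    distances = {
        gallery_id: math.dist(query_feature, feature)
        for gallery_id, feature in gallery_features.items()
    }
    # Ties are broken by ID so the ranking is deterministic.
    return sorted(distances, key=lambda gallery_id: (distances[gallery_id], gallery_id))


# Toy 2-D features standing in for normalized LBP histograms.
gallery = {
    "face_a": [0.0, 0.0],
    "face_b": [3.0, 4.0],
    "face_c": [1.0, 0.0],
}
print(rank_gallery([0.0, 0.0], gallery))
```

Running this once per frame of the query video yields the multiple ranked lists that the later stages re-rank and fuse into a single signature.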


2 citations

References


Proceedings Article

Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching


Mehrtash Harandi¹, Conrad Sanderson¹, Sareh Shirazi¹, Brian C. Lovell¹•Institutions (1)

NICTA¹

20 Jun 2011

TL;DR: It is shown that by introducing within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, the geometrical structure of data can be exploited.


Abstract: A convenient way of dealing with image sets is to represent them as points on Grassmannian manifolds. While several recent studies explored the applicability of discriminant analysis on such manifolds, the conventional formalism of discriminant analysis suffers from not considering the local structure of the data. We propose a discriminant analysis approach on Grassmannian manifolds, based on a graph-embedding framework. We show that by introducing within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, the geometrical structure of data can be exploited. Experiments on several image datasets (PIE, BANCA, MoBo, ETH-80) show that the proposed algorithm obtains considerable improvements in discrimination accuracy, in comparison to three recent methods: Grassmann Discriminant Analysis (GDA), Kernel GDA, and the kernel version of Affine Hull Image Set Distance. We further propose a Grassmannian kernel, based on canonical correlation between subspaces, which can increase discrimination accuracy when used in combination with previous Grassmannian kernels.


300 citations

"On rank aggregation for face recogn..." refers background in this paper

...Index Terms— Video based face recognition, Rank aggregation, Dictionary based face recognition...
[...]

Book Chapter

Dictionary-based face recognition from video


Yi-Chen Chen¹, Vishal M. Patel¹, P. Jonathon Phillips², Rama Chellappa¹•Institutions (2)

University of Maryland, College Park¹, National Institute of Standards and Technology²

07 Oct 2012

TL;DR: This work introduces the concept of video-dictionaries for face recognition, which generalizes the work in sparse representation and dictionaries for faces in still images and performs significantly better than many competitive video-based face recognition algorithms.


Abstract: The main challenge in recognizing faces in video is effectively exploiting the multiple frames of a face and the accompanying dynamic signature. One prominent method is based on extracting joint appearance and behavioral features. A second method models a person by temporal correlations of features in a video. Our approach introduces the concept of video-dictionaries for face recognition, which generalizes the work in sparse representation and dictionaries for faces in still images. Video-dictionaries are designed to implicitly encode temporal, pose, and illumination information. We demonstrate our method on the Face and Ocular Challenge Series (FOCS) Video Challenge, which consists of unconstrained video sequences. We show that our method is efficient and performs significantly better than many competitive video-based face recognition algorithms.


153 citations

"On rank aggregation for face recogn..." refers background in this paper

...The challenges and limitations of still face recognition drive the research in video based face recognition....
[...]

Proceedings Article

A system identification approach for video-based face recognition


Gaurav Aggarwal¹, Amit K. Roy Chowdhury¹, Rama Chellappa¹•Institutions (1)

University of Maryland, College Park¹

23 Aug 2004

TL;DR: The paper poses video-to-video face recognition as a dynamical system identification and classification problem and uses an autoregressive and moving average (ARMA) model to represent such a system.


Abstract: The paper poses video-to-video face recognition as a dynamical system identification and classification problem. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of the ARMA model is based on its ability to account for changes in appearance while modeling the dynamics of pose, expression, etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression, and illumination variation in the video data used for experiments.


129 citations

Journal Article

Manifold–Manifold Distance and its Application to Face Recognition With Image Sets


Ruiping Wang, Shiguang Shan, Xilin Chen, Qionghai Dai¹, Wen Gao²•Institutions (2)

Tsinghua University¹, Peking University²

01 Oct 2012-IEEE Transactions on Image Processing

TL;DR: The proposed manifold-manifold distance (MMD) method is applied to the task of face recognition with image sets, where identification is achieved by seeking the minimum MMD from the probe to the gallery of image sets.


Abstract: In this paper, we address the problem of classifying image sets for face recognition, where each set contains images belonging to the same subject and typically covering large variations. By modeling each image set as a manifold, we formulate the problem as the computation of the distance between two manifolds, called manifold-manifold distance (MMD). Since an image set can come in three pattern levels, point, subspace, and manifold, we systematically study the distance among the three levels and formulate them in a general multilevel MMD framework. Specifically, we express a manifold by a collection of local linear models, each depicted by a subspace. MMD is then converted to integrate the distances between pairs of subspaces from one of the involved manifolds. We theoretically and experimentally study several configurations of the ingredients of MMD. The proposed method is applied to the task of face recognition with image sets, where identification is achieved by seeking the minimum MMD from the probe to the gallery of image sets. Our experiments demonstrate that, as a general set similarity measure, MMD consistently outperforms other competing nondiscriminative methods and is also promisingly comparable to the state-of-the-art discriminative methods.


118 citations

"On rank aggregation for face recogn..." refers background in this paper

...Index Terms— Video based face recognition, Rank aggregation, Dictionary based face recognition...
[...]

Journal Article

Face recognition from video: a review


Jeremiah R. Barr¹, Kevin W. Bowyer¹, Patrick J. Flynn¹, Soma Biswas¹•Institutions (1)

University of Notre Dame¹

26 Nov 2012-International Journal of Pattern Recognition and Artificial Intelligence

TL;DR: A broad and deep review of recently proposed methods for overcoming the difficulties encountered in unconstrained settings is presented and connections between the ways in which humans and current algorithms recognize faces are drawn.


Abstract: Driven by key law enforcement and commercial applications, research on face recognition from video sources has intensified in recent years. The ensuing results have demonstrated that videos possess unique properties that allow both humans and automated systems to perform recognition accurately in difficult viewing conditions. However, significant research challenges remain as most video-based applications do not allow for controlled recordings. In this survey, we categorize the research in this area and present a broad and deep review of recently proposed methods for overcoming the difficulties encountered in unconstrained settings. We also draw connections between the ways in which humans and current algorithms recognize faces. An overview of the most popular and difficult publicly available face video databases is provided to complement these discussions. Finally, we cover key research challenges and opportunities that lie ahead for the field as a whole.


115 citations