Proceedings ArticleDOI

FaceSurv: A Benchmark Video Dataset for Face Detection and Recognition Across Spectra and Resolutions

TL;DR: The proposed FaceSurv database contains over 142K face images, spread across videos captured in both visible and near-infrared spectra, offering a plethora of challenges common to surveillance settings.
Abstract: Existing face recognition algorithms achieve high recognition performance for frontal face images with good illumination and close proximity to the imaging device. However, most existing algorithms fail to perform equally well in surveillance scenarios, where videos are captured across varying resolutions and spectra. In surveillance settings, cameras are usually placed far away from the subjects, resulting in variations across pose, illumination, occlusion, and resolution. Current video datasets used for face recognition are often captured in constrained environments and thus fail to simulate real-world scenarios. In this paper, we present the FaceSurv database featuring 252 subjects in 460 videos. The proposed dataset contains over 142K face images, spread across videos captured in both the visible and near-infrared spectra. Each video contains a group of individuals walking from 36 ft towards the imaging device, offering a plethora of challenges common to surveillance settings. A benchmark experimental protocol and baseline results are reported with state-of-the-art algorithms for face detection and recognition. It is our assertion that the availability of such a challenging database will facilitate the development of robust face recognition systems relevant to real-world surveillance scenarios.
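
As a rough, hedged illustration of the kind of closed-set identification protocol such a benchmark reports, the Python sketch below computes rank-k identification rates (a CMC curve) from precomputed gallery and probe embeddings. The cosine-similarity matcher, the toy data, and all variable names are assumptions for illustration; this is not the official FaceSurv evaluation code.

    # Illustrative rank-k identification (CMC) evaluation over precomputed
    # embeddings. Cosine similarity and the toy data are placeholders, not the
    # official FaceSurv benchmark protocol.
    import numpy as np

    def cmc_curve(gallery_feats, gallery_ids, probe_feats, probe_ids, max_rank=20):
        # L2-normalise so that a dot product equals cosine similarity.
        g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
        p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
        order = np.argsort(-(p @ g.T), axis=1)           # best gallery match first
        hits = gallery_ids[order] == probe_ids[:, None]  # correct-identity mask per rank
        first_hit = hits.argmax(axis=1)                  # rank index of first correct match
        return np.array([(first_hit <= k).mean() for k in range(max_rank)])

    # Toy usage: 5 gallery identities, 50 probes, random 128-D features.
    rng = np.random.default_rng(0)
    gallery, g_ids = rng.normal(size=(5, 128)), np.arange(5)
    probes, p_ids = rng.normal(size=(50, 128)), rng.integers(0, 5, size=50)
    print(cmc_curve(gallery, g_ids, probes, p_ids, max_rank=5))
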
Citations
Journal ArticleDOI
TL;DR: The authors present a comprehensive analysis of face recognition (FR) systems that leverage different types of DL techniques, summarizing 171 recent contributions and discussing improvement ideas and current and future trends in FR.
Abstract: In recent years, researchers have proposed many deep learning (DL) methods for various tasks, and particularly face recognition (FR) made an enormous leap using these techniques. Deep FR systems benefit from the hierarchical architecture of the DL methods to learn discriminative face representation. Therefore, DL techniques significantly improve state-of-the-art performance on FR systems and encourage diverse and efficient real-world applications. In this paper, we present a comprehensive analysis of various FR systems that leverage the different types of DL techniques, and for the study, we summarize 171 recent contributions from this area. We discuss the papers related to different algorithms, architectures, loss functions, activation functions, datasets, challenges, improvement ideas, current and future trends of DL-based FR systems. We provide a detailed discussion of various DL methods to understand the current state-of-the-art, and then we discuss various activation and loss functions for the methods. Additionally, we summarize different datasets used widely for FR tasks and discuss challenges related to illumination, expression, pose variations, and occlusion. Finally, we discuss improvement ideas, current and future trends of FR tasks.

39 citations

Journal ArticleDOI
TL;DR: A truly large-scale Surveillance Face Re-ID benchmark (SurvFace) is introduced, characterised by native low resolution, motion blur, uncontrolled poses, varying occlusion, poor illumination, and background clutter, with facial images captured and detected under realistic surveillance scenarios.

26 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A novel noise-tolerant deep metric learning algorithm, termed Density Aware Metric Learning, forces the model to learn embeddings that are pulled towards the densest region of each class's cluster, leading to faster convergence and higher generalizability.
Abstract: Deep metric learning algorithms have been utilized to learn discriminative and generalizable models which are effective for classifying unseen classes. In this paper, a novel noise-tolerant deep metric learning algorithm is proposed. The proposed method, termed Density Aware Metric Learning, forces the model to learn embeddings that are pulled towards the densest region of the cluster for each class. This is achieved by iteratively shifting the estimate of the center towards the dense region of the cluster, thereby leading to faster convergence and higher generalizability. In addition, the approach is robust to noisy samples in the training data, often present as outliers. Detailed experiments and analysis on two challenging cross-modal face recognition databases and two popular object recognition databases exhibit the efficacy of the proposed approach. It has superior convergence, requires less training time, and yields better accuracies than several popular deep metric learning methods.
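
To make the center-shifting idea above concrete, here is a minimal NumPy sketch in which a class-center estimate is iteratively pulled towards the densest region of the class cluster. The k-nearest-neighbour density weighting and all names are assumptions for illustration, not the paper's exact Density Aware Metric Learning formulation.

    # Minimal sketch of a density-aware class-center update: the center estimate
    # is pulled towards samples lying in denser regions of the class cluster.
    # The k-NN density weighting is an assumption for illustration only.
    import numpy as np

    def density_aware_center(class_feats, k=5, n_iters=3):
        # Density score: inverse of the mean distance to the k nearest neighbours.
        d = np.linalg.norm(class_feats[:, None] - class_feats[None, :], axis=-1)
        knn_dist = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
        density = 1.0 / (knn_dist + 1e-8)

        center = class_feats.mean(axis=0)  # start from the plain mean
        for _ in range(n_iters):
            # Re-weight samples by density and proximity to the current center,
            # shifting the estimate towards the densest region of the cluster.
            prox = 1.0 / (np.linalg.norm(class_feats - center, axis=1) + 1e-8)
            w = density * prox
            center = (w[:, None] * class_feats).sum(axis=0) / w.sum()
        return center

    # Toy usage: a dense cluster plus a few outliers; the density-aware center
    # stays closer to the dense region than the arithmetic mean does.
    rng = np.random.default_rng(1)
    feats = np.vstack([rng.normal(0, 0.1, size=(50, 16)),
                       rng.normal(3, 0.1, size=(3, 16))])  # 3 outliers
    print(np.linalg.norm(density_aware_center(feats)), np.linalg.norm(feats.mean(axis=0)))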

13 citations


Cites background or methods from "FaceSurv: A Benchmark Video Dataset..."

  • ...Comparisons have been performed with the vanilla triplet and quadruplet losses (their variants for cross-modal matching are implemented for the SCface and FaceSurv databases)....


  • ...Table 2: Summarizing the results of face identification on the FaceSurv [9] database....


  • ...It outperforms the vanilla triplet and the quadruplet losses and their variants on the FaceSurv database (Table 2) as well....


  • ...FaceSurv [9] is a video face database where the subjects walk towards the camera from a distance of about 10 meters....


  • ...The proposed algorithm is evaluated on the SCface [8] and FaceSurv [9] datasets for cross-modal face matching, and on the CIFAR10 [14] and STL-10 [5] datasets for object recognition....


Journal ArticleDOI
31 Mar 2020
TL;DR: This paper proposes a Subclass Heterogeneity Aware Loss (SHEAL) to train a deep convolutional neural network model such that it produces embeddings suitable for heterogeneous face recognition, both single and multiple heterogeneities.
Abstract: One of the most challenging scenarios of face recognition is matching images in presence of multiple covariates such as cross-spectrum and cross-resolution. In this paper, we propose a Subclass Heterogeneity Aware Loss (SHEAL) to train a deep convolutional neural network model such that it produces embeddings suitable for heterogeneous face recognition, both single and multiple heterogeneities. The performance of the proposed SHEAL function is evaluated on four databases in terms of the recognition performance as well as convergence in time and epochs. We observe that SHEAL not only yields state-of-the-art results for the most challenging case of Cross-Spectral Cross-Resolution face recognition, it also achieves excellent performance on homogeneous face recognition.
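
For orientation only, the sketch below shows a generic way to train embeddings across heterogeneities: genuine/impostor pairs are formed across spectrum and resolution (e.g. a visible high-resolution gallery image against a near-infrared low-resolution probe) and scored with a plain contrastive loss. It is a hedged stand-in, not the paper's SHEAL loss, whose subclass-aware formulation is not reproduced here.

    # Generic heterogeneity-aware pairing with a plain contrastive loss.
    # This is an illustrative stand-in, not the SHEAL formulation.
    import numpy as np

    def contrastive_loss(emb_a, emb_b, same_identity, margin=1.0):
        # emb_a / emb_b: embeddings of the two sides of each heterogeneous pair
        # (e.g. VIS high-resolution gallery vs NIR low-resolution probe).
        dist = np.linalg.norm(emb_a - emb_b, axis=1)
        pos = same_identity * dist ** 2                                 # pull genuine pairs together
        neg = (1 - same_identity) * np.maximum(margin - dist, 0.0) ** 2 # push impostors apart
        return float(np.mean(pos + neg))

    # Toy usage: 4 cross-spectrum, cross-resolution pairs; label 1 = same identity.
    rng = np.random.default_rng(2)
    vis_hr = rng.normal(size=(4, 64))   # e.g. visible, high-resolution side
    nir_lr = rng.normal(size=(4, 64))   # e.g. near-infrared, low-resolution side
    labels = np.array([1, 0, 1, 0])
    print(contrastive_loss(vis_hr, nir_lr, labels))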

11 citations


Cites background or methods from "FaceSurv: A Benchmark Video Dataset..."

  • ...CMC curves for Cross-Resolution Face Recognition (CR-FR), Cross-Spectral Face Recognition (CS-FR) and Cross-Spectral Cross-Resolution Face Recognition (CSCR-FR) on the SCface [10], FaceSurv [11] and CASIA NIR-VIS 2....


  • ...challenging databases, namely SCface [10], FaceSurv [11], CASIA NIR-VIS 2....


  • ...Images are taken from the SCface [10] and FaceSurv [11] databases....


  • ...On the FaceSurv [11] database we have performed CSCR-FR and CR-FR on two different probe resolutions, namely 48 × 48 and 64 × 64....


  • ...CASIA NIR-VIS 2.0 [12] and FaceSurv [11] databases, extensive comparisons have been performed with recent deep metric learning methods and state-of-the-art heterogeneous face recognition methods....


Journal ArticleDOI
TL;DR: A Supervised Resolution Enhancement and Recognition Network (SUPREAR-NET) is proposed that transforms a low-resolution probe image into a high-resolution one without corrupting the useful class-specific information of the face image, followed by effective matching with the gallery using a trained discriminative model.
Abstract: Heterogeneous face recognition is a challenging problem where the probe and gallery images belong to different modalities such as low and high resolution, or the visible and near-infrared spectrum. A Generative Adversarial Network (GAN) enables us to learn an image-to-image transformation model for enhancing the resolution of a face image. Such a model would be helpful in a heterogeneous face recognition scenario. However, unsupervised GAN-based transformation methods in their native formulation might alter useful discriminative information in the transformed face images. This affects the performance of face recognition algorithms when applied to the transformed images. We propose a Supervised Resolution Enhancement and Recognition Network (SUPREAR-NET), which does not corrupt the useful class-specific information of the face image and transforms a low-resolution probe image into a high-resolution one, followed by effective matching with the gallery using a trained discriminative model. We show results for cross-resolution face recognition on three datasets including the FaceSurv face dataset, containing poor-quality low-resolution videos captured at a standoff distance of up to 10 meters from the camera. On the FaceSurv, NIST MEDS and CMU MultiPIE datasets, the proposed algorithm outperforms recent unsupervised and supervised GAN algorithms.
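
The two-stage inference flow described above (enhance the low-resolution probe, then embed and match it against a high-resolution gallery) can be sketched as follows. The pixel-repeat upsampler and the random-projection embedder are crude stand-ins for the learned generator and the trained discriminative model; none of this is the SUPREAR-NET implementation.

    # Hedged sketch of a two-stage pipeline: enhance the low-resolution probe,
    # then embed and match against a high-resolution gallery. The upsampler and
    # embedder below are placeholders, not the SUPREAR-NET networks.
    import numpy as np

    def upsample(img, factor=4):
        # Stand-in for the resolution-enhancement generator: pixel repetition.
        return np.kron(img, np.ones((factor, factor)))

    def embed(img, proj):
        # Stand-in for the discriminative embedding network: a fixed random projection.
        v = proj @ img.ravel()
        return v / np.linalg.norm(v)

    rng = np.random.default_rng(3)
    proj = rng.normal(size=(128, 64 * 64))  # maps 64x64 face crops to 128-D vectors

    gallery = {sid: embed(rng.random((64, 64)), proj) for sid in range(5)}  # high-res gallery
    probe_lr = rng.random((16, 16))          # low-resolution probe face

    probe_emb = embed(upsample(probe_lr, 4), proj)          # enhance, then embed
    scores = {sid: float(probe_emb @ g) for sid, g in gallery.items()}
    print(max(scores, key=scores.get), scores)              # identity of the best match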

3 citations

References
Journal ArticleDOI
TL;DR: A face detection framework capable of processing images extremely rapidly while achieving high detection rates is described; implemented on a conventional desktop, it runs at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
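
A minimal sketch of the integral image idea follows: after a single cumulative-sum pass, the sum over any rectangle (and hence any Haar-like feature built from rectangles) can be read off with four lookups. This is an illustrative NumPy version, not the authors' implementation.

    # Illustrative integral image: one cumulative pass, then any rectangle sum
    # is four lookups, which is what makes Haar-like features cheap to evaluate.
    import numpy as np

    def integral_image(img):
        # ii[y, x] = sum of img over all pixels above and to the left, with a
        # leading zero row/column so the corner lookups need no special cases.
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        return ii

    def rect_sum(ii, y0, x0, y1, x1):
        # Sum of img[y0:y1, x0:x1] via the four-corner identity.
        return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

    img = np.arange(25).reshape(5, 5)
    ii = integral_image(img)
    assert rect_sum(ii, 1, 1, 4, 4) == img[1:4, 1:4].sum()
    print(rect_sum(ii, 1, 1, 4, 4))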

13,037 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
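
The cascade's early-rejection behaviour can be sketched as below: cheap early stages discard many candidate windows so that later, more expensive stages run on far fewer regions. The stage thresholds and the random per-stage scores are placeholders, not trained AdaBoost stages.

    # Toy sketch of attentional-cascade rejection: a window is dropped at the
    # first stage whose score falls below threshold; only windows that survive
    # every stage reach the final decision. Scores/thresholds are placeholders.
    import numpy as np

    def cascade_classify(stage_scores, thresholds):
        for score, thr in zip(stage_scores, thresholds):
            if score < thr:      # early rejection: stop spending computation here
                return False
        return True              # survived all stages: report as a face candidate

    rng = np.random.default_rng(4)
    thresholds = [0.1, 0.3, 0.5]             # progressively stricter stages
    windows = rng.random((1000, 3))          # toy per-stage scores for 1000 windows
    kept = [i for i, w in enumerate(windows) if cascade_classify(w, thresholds)]
    print(f"{len(kept)} of {len(windows)} windows pass every stage")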

10,592 citations


"FaceSurv: A Benchmark Video Dataset..." refers background or methods in this paper

  • ...Face detection results are reported with current state-of-the-art face detection algorithms, namely Viola-Jones [24], Fast Face Detector [16], Tiny Face Detector [10], and Single Shot Scale-Invariant Face Detector [31]....


  • ...For experiments with the commercial matchers, the annotated face images of the FaceSurv dataset are used as probe images; for VGGFace and LightCNN29, on the other hand, the Viola-Jones [24] face detector was used on the annotated frames to detect faces in all the probe videos....


  • ...[24] P. Viola and M. J. Jones....


  • ...In order to perform baselining on the proposed dataset, we compute face detection results using popular as well as state-of-the-art face detectors, namely the Viola-Jones Face Detector [24], Fast Face Detector [16], Tiny Face Detector [10], and Single Shot Scale-Invariant Face Detector (S3FD) [31]....


  • ...A highly effective and popular technique for face detection was presented by Viola and Jones [24], wherein Haar-like features are extracted and adaptive boosting is used to train a cascade of classifiers....


Proceedings ArticleDOI
01 Jan 2015
TL;DR: It is shown how a very large-scale dataset can be assembled by a combination of automation and human in the loop, and the trade-off between data purity and time is discussed.
Abstract: The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end-to-end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large-scale training datasets. We make two contributions: first, we show how a very large-scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade-off between data purity and time; second, we traverse through the complexities of deep network training and face recognition to present methods and procedures to achieve comparable state-of-the-art results on the standard LFW and YTF face benchmarks.
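
A hedged sketch of the automation-plus-human-in-the-loop curation idea described above: automatically collected candidate images for an identity are split by a classifier score into auto-accepted, auto-rejected, and an uncertain band sent for human review, which is where the purity-versus-time trade-off lives. The thresholds and the scoring are assumptions for illustration, not the VGGFace pipeline.

    # Illustrative triage of automatically collected candidate images: keep the
    # high-confidence ones, drop the low-confidence ones, and queue the rest for
    # human review. Thresholds and scores are placeholders.
    import numpy as np

    def triage(scores, keep_thr=0.9, drop_thr=0.3):
        keep = np.where(scores >= keep_thr)[0]                            # accept automatically
        drop = np.where(scores < drop_thr)[0]                             # reject automatically
        review = np.where((scores >= drop_thr) & (scores < keep_thr))[0]  # human in the loop
        return keep, review, drop

    rng = np.random.default_rng(5)
    scores = rng.random(1000)   # toy per-image "same identity" classifier scores
    keep, review, drop = triage(scores)
    print(len(keep), "kept,", len(review), "sent to annotators,", len(drop), "dropped")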

5,308 citations


"FaceSurv: A Benchmark Video Dataset..." refers background in this paper

  • ...We present results of baseline experiments using two commercial matchers, namely COTS-A and Verilook [1] (COTS-B), and two state-of-the-art deep convolutional networks, namely LightCNN29 [27] and VGGFace [20], on both face recognition scenarios (CR and CS-CR)....


  • ...Similarly, baseline face recognition results have been reported with two Commercial Off-The-Shelf systems, namely COTS-A and Verilook (COTS-B), and two state-of-the-art deep convolutional networks, namely LightCNN29 and VGGFace....


  • ...For experiments with the commercial matchers, the annotated face images of the FaceSurv dataset are used as probe images; for VGGFace and LightCNN29, on the other hand, the Viola-Jones [24] face detector was used on the annotated frames to detect faces in all the probe videos....


  • ...COTS-A and Verilook [1] (COTS-B), and two state-of-the-art deep convolutional networks, namely LightCNN29 [27] and VGGFace [20], on both face recognition scenarios (CR and CS-CR)....


  • ...For overall video, LightCNN [27] and VGGFace perform better for CS-CR face recognition as compared to the COTS....


Journal ArticleDOI
TL;DR: In this article, the authors categorize and evaluate face detection algorithms and discuss relevant issues such as data collection, evaluation metrics and benchmarking, and conclude with several promising directions for future research.
Abstract: Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of its 3D position, orientation and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

3,894 citations


"FaceSurv: A Benchmark Video Dataset..." refers background in this paper

  • ...It is a crucial step for many face analysis applications such as face alignment, face tracking, face recognition, and face verification [28]....


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A comprehensive database of labeled videos of faces in challenging, uncontrolled conditions, the ‘YouTube Faces’ database, is presented along with benchmark pair-matching tests, and a novel set-to-set similarity measure, the Matched Background Similarity (MBGS), is described.
Abstract: Recognizing faces in unconstrained videos is a task of mounting importance. While obviously related to face recognition in still images, it has its own unique characteristics and algorithmic requirements. Over the years several methods have been suggested for this problem, and a few benchmark data sets have been assembled to facilitate its study. However, there is a sizable gap between the actual application needs and the current state of the art. In this paper we make the following contributions. (a) We present a comprehensive database of labeled videos of faces in challenging, uncontrolled conditions (i.e., ‘in the wild’), the ‘YouTube Faces’ database, along with benchmark, pair-matching tests. (b) We employ our benchmark to survey and compare the performance of a large variety of existing video face recognition techniques. Finally, (c) we describe a novel set-to-set similarity measure, the Matched Background Similarity (MBGS). This similarity is shown to considerably improve performance on the benchmark tests.
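
For context on set-to-set matching of face tracks, the sketch below scores two tracks by the mean of the pairwise cosine similarities between their frame embeddings. This is a generic baseline for illustration only, not the Matched Background Similarity (MBGS) measure proposed in the paper.

    # Simple set-to-set similarity baseline for two face tracks: mean pairwise
    # cosine similarity between frame embeddings. Not the MBGS measure.
    import numpy as np

    def set_to_set_similarity(track_a, track_b):
        a = track_a / np.linalg.norm(track_a, axis=1, keepdims=True)
        b = track_b / np.linalg.norm(track_b, axis=1, keepdims=True)
        return float((a @ b.T).mean())   # average over all frame pairs

    rng = np.random.default_rng(6)
    track_a = rng.normal(size=(30, 256))  # 30 frames, 256-D embedding per frame
    track_b = rng.normal(size=(45, 256))
    print(set_to_set_similarity(track_a, track_b))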

1,423 citations


"FaceSurv: A Benchmark Video Dataset..." refers background in this paper

  • ...Face in Action [8]   VIS       Single     180    6470
       YouTube Faces [25]   VIS       Single     1595   3425
       ChokePoint [26]      VIS       Single     54     48
       PaSC [3]             VIS       Single     265    2802
       SN-Flip [2]          VIS       Multiple   190    28
       McGillFaces [6]      VIS       Single     60     60
       CrowdFaceDB [7]      VIS       Multiple   257    385
       CSCRV [22]           VIS&NIR   Multiple   160    193
       IJB-S [13]           VIS       Multiple   202    350*
       Proposed FaceSurv    VIS&NIR   Multiple   252    460...
