
Showing papers on "Object-class detection published in 2011"


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments that ORB is two orders of magnitude faster than SIFT while performing as well in many situations.
Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.

8,702 citations
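ORB descriptors are binary strings compared with the Hamming distance, which is what makes matching so much cheaper than with floating-point descriptors such as SIFT. A minimal sketch of brute-force Hamming matching (plain Python, toy descriptors packed as integers; illustrative only, not the paper's implementation):

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors packed as ints."""
    return bin(d1 ^ d2).count("1")

def match_descriptors(query, train, max_dist=64):
    """Brute-force nearest-neighbour matching of binary descriptors.

    Returns (query_index, train_index) pairs whose distance is at most
    max_dist; the 64-bit threshold here is an arbitrary example value.
    """
    matches = []
    for qi, q in enumerate(query):
        best = min(range(len(train)), key=lambda ti: hamming(q, train[ti]))
        if hamming(q, train[best]) <= max_dist:
            matches.append((qi, best))
    return matches
```

In practice the descriptors are 256-bit strings, and the XOR-plus-popcount comparison maps directly onto hardware instructions, which is where the claimed speedup comes from.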


Proceedings ArticleDOI
01 Nov 2011
TL;DR: AFLW provides a large-scale collection of images gathered from Flickr, exhibiting a large variety in face appearance as well as general imaging and environmental conditions, and is well suited to train and test algorithms for multi-view face detection, facial landmark localization and face pose estimation.
Abstract: Face alignment is a crucial step in face recognition tasks. Especially, using landmark localization for geometric face normalization has shown to be very effective, clearly improving the recognition results. However, no adequate databases exist that provide a sufficient number of annotated facial landmarks. The databases are either limited to frontal views, provide only a small number of annotated images or have been acquired under controlled conditions. Hence, we introduce a novel database overcoming these limitations: Annotated Facial Landmarks in the Wild (AFLW). AFLW provides a large-scale collection of images gathered from Flickr, exhibiting a large variety in face appearance (e.g., pose, expression, ethnicity, age, gender) as well as general imaging and environmental conditions. In total 25,993 faces in 21,997 real-world images are annotated with up to 21 landmarks per image. Due to the comprehensive set of annotations AFLW is well suited to train and test algorithms for multi-view face detection, facial landmark localization and face pose estimation. Further, we offer a rich set of tools that ease the integration of other face databases and associated annotations into our joint framework.

1,033 citations


Proceedings ArticleDOI
TL;DR: This work presents a novel approach based on analyzing facial image textures for detecting whether there is a live person in front of the camera or a face print, and analyzes the texture of the facial images using multi-scale local binary patterns (LBP).
Abstract: Current face biometric systems are vulnerable to spoofing attacks. A spoofing attack occurs when a person tries to masquerade as someone else by falsifying data and thereby gaining illegitimate access. Inspired by image quality assessment, characterization of printing artifacts, and differences in light reflection, we propose to approach the problem of spoofing detection from a texture analysis point of view. Indeed, face prints usually contain printing quality defects that can be well detected using texture features. Hence, we present a novel approach based on analyzing facial image textures for detecting whether there is a live person in front of the camera or a face print. The proposed approach analyzes the texture of the facial images using multi-scale local binary patterns (LBP). Compared to many previous works, our proposed approach is robust, computationally fast and does not require user cooperation. In addition, the texture features that are used for spoofing detection can also be used for face recognition. This provides a unique feature space for coupling spoofing detection and face recognition. Extensive experimental analysis on a publicly available database showed excellent results compared to existing works.

628 citations
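The multi-scale LBP features at the core of this approach reduce, at a single scale, to the basic 3x3 LBP operator: threshold each neighbour at the centre value and pack the bits into a code. An illustrative sketch (plain Python, nested-list grayscale images; the paper's multi-scale variant adds larger radii and uniform-pattern binning):

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP: threshold the 8 neighbours at the centre value
    and pack the results clockwise into one byte."""
    center = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1], img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over the image interior; such
    histograms are what the live-vs-print classifier would consume."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```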


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work presents a method for detecting 3D objects using multi-modalities based on an efficient representation of templates that capture the different modalities, and shows in many experiments on commodity hardware that it significantly outperforms state-of-the-art methods on single modalities.
Abstract: We present a method for detecting 3D objects using multi-modalities. While it is generic, we demonstrate it on the combination of an image and a dense depth map which give complementary object information. It works in real-time, under heavy clutter, does not require a time consuming training stage, and can handle untextured objects. It is based on an efficient representation of templates that capture the different modalities, and we show in many experiments on commodity hardware that our approach significantly outperforms state-of-the-art methods on single modalities.

611 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: An efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an ‘ideal’ face is proposed.
Abstract: In video based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur and focus change over the sequence. Additionally, inaccuracies in face localisation can also introduce scale and alignment variations. Using all face images, including images of poor quality, can actually degrade face recognition performance. While one solution is to use only the ‘best’ images, current face selection techniques are incapable of simultaneously handling all of the abovementioned issues. We propose an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an ‘ideal’ face. Image characteristics that affect recognition are taken into account, including variations in geometric alignment (shift, rotation and scale), sharpness, head pose and cast shadows. Experiments on FERET and PIE datasets show that the proposed algorithm is able to identify images which are simultaneously the most frontal, aligned, sharp and well illuminated. Further experiments on a new video surveillance dataset (termed ChokePoint) show that the proposed method provides better face subsets than existing face selection techniques, leading to significant improvements in recognition accuracy.

314 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work focuses on the first layers of a category-independent object detection cascade, in which a large number of windows are sampled from an objectness prior and then discriminatively filtered, reducing the candidate windows by an order of magnitude.
Abstract: Cascades are a popular framework to speed up object detection systems. Here we focus on the first layers of a category independent object detection cascade in which we sample a large number of windows from an objectness prior, and then discriminatively learn to filter these candidate windows by an order of magnitude. We make a number of contributions to cascade design that substantially improve over the state of the art: (i) our novel objectness prior gives much higher recall than competing methods, (ii) we propose objectness features that give high performance with very low computational cost, and (iii) we make use of a structured output ranking approach to learn highly effective, but inexpensive linear feature combinations by directly optimizing cascade performance. Thorough evaluation on the PASCAL VOC data set shows consistent improvement over the current state of the art, and over alternative discriminative learning strategies.

203 citations


Book
28 Jul 2011
TL;DR: This chapter discusses Morphable Models for Training a Component-Based Face Recognition System, a Unified Approach for Analysis and Synthesis of Images and Multimodal Biometrics: Augmenting Face with Other Cues.
Abstract: Preface PART I: THE BASICS Chapter 1: A Guided Tour for Face Processing Chapter 2: Eigenface and Beyond Chapter 3: Statistical Evaluation of Face Recognition Systems PART II: FACE MODELING Chapter 4: 3D Morphable Face Model: A Unified Approach for Analysis and Synthesis of Images Chapter 5: Expression-Invariant Three-Dimensional Face Recognition Chapter 6: 3D Face Modeling from Monocular Video Sequences Chapter 7: Face Modeling by Information Maximization Chapter 8: Face Recognition by Human Chapter 9: Predicting Face Recognition Success for Humans Chapter 10: Distributed Representation of Faces and Objects PART III: ADVANCED METHODS Chapter 11: On the Effect of Illumination and Face Recognition Chapter 12: Modeling Illumination Variation with Spherical Harmonics Chapter 13: Multi-Subregion Based Probabilistic Approach Toward Pose-Invariant Face Recognition Chapter 14:Morphable Models for Training a Component-Based Face Recognition System Chapter 15: Model-Based Face Modeling and Tracking with Application to Video Conferencing Chapter 16: 3D and MultiModal 3D & 2D Face Recognition Chapter 17: Beyond One Still Image: Face Recognition from multiple Still Images or Video Sequence Chapter 18: Subset Modeling of Face Localization Error, Occlusion and Expression Chapter 19: Real-Time Robust Face and Facial Features Detection with Information-Based Maximum Discrimination Chapter 20: Current Landscape of Thermal Infrared Face Recognition Chapter 21: Multimodal Biometrics: Augmenting Face with Other Cues

184 citations


Proceedings ArticleDOI
29 Dec 2011
TL;DR: Tests conducted on large databases show good improvements of classification accuracy as well as true positive and false positive rates compared to the state-of-the-art.
Abstract: Spoofing face recognition systems with photos or videos of someone else is not difficult. Sometimes, all one needs is to display a picture on a laptop monitor or a printed photograph to the biometric system. In order to detect this kind of spoof, in this paper we present a solution that works with either printed or LCD-displayed photographs, even under bad illumination conditions, without extra devices or user involvement. Tests conducted on large databases show good improvements in classification accuracy as well as true positive and false positive rates compared to the state-of-the-art.

173 citations


Journal ArticleDOI
TL;DR: It is argued that both large- and small-scale features of a face image are important for face restoration and recognition, and it is suggested that illumination normalization should be performed mainly on the large-scale features rather than on the original face image.
Abstract: A face image can be represented by a combination of large- and small-scale features. It is well-known that the variations of illumination mainly affect the large-scale features (low-frequency components), and not so much the small-scale features. Therefore, in relevant existing methods only the small-scale features are extracted as illumination-invariant features for face recognition, while the large-scale intrinsic features are always ignored. In this paper, we argue that both large- and small-scale features of a face image are important for face restoration and recognition. Moreover, we suggest that illumination normalization should be performed mainly on the large-scale features of a face image rather than on the original face image. A novel method of normalizing both the Small- and Large-scale (S&L) features of a face image is proposed. In this method, a single face image is first decomposed into large- and small-scale features. After that, illumination normalization is mainly performed on the large-scale features, and only a minor correction is made on the small-scale features. Finally, a normalized face image is generated by combining the processed large- and small-scale features. In addition, an optional visual compensation step is suggested for improving the visual quality of the normalized image. Experiments on CMU-PIE, Extended Yale B, and FRGC 2.0 face databases show that by using the proposed method significantly better recognition performance and visual results can be obtained as compared to related state-of-the-art methods.

143 citations
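The large-/small-scale decomposition idea can be sketched in a few lines: smooth the image to get the large-scale (low-frequency) layer, and take the residual as the small-scale layer, so the two sum back to the original. A simple box blur stands in here for whatever filter the paper actually uses; this is an illustration, not the S&L method itself:

```python
def box_blur(img, k=1):
    """Mean filter with a (2k+1)x(2k+1) window, clamped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - k), min(h, r + k + 1))
                    for cc in range(max(0, c - k), min(w, c + k + 1))]
            out[r][c] = sum(vals) / len(vals)
    return out

def decompose(img, k=1):
    """Split an image into a large-scale (smoothed) layer and a
    small-scale (residual) layer; large + small reconstructs the input."""
    large = box_blur(img, k)
    small = [[img[r][c] - large[r][c] for c in range(len(img[0]))]
             for r in range(len(img))]
    return large, small
```

Illumination normalization would then operate mostly on `large`, with only a minor correction on `small`, before recombining the two layers.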


Proceedings ArticleDOI
TL;DR: This paper describes an anti-spoofing solution based on a set of low-level feature descriptors capable of distinguishing between ‘live’ and ‘spoof’ images and videos, and explores both spatial and temporal information to learn distinctive characteristics between the two classes.
Abstract: Personal identity verification based on biometrics has received increasing attention since it allows reliable authentication through intrinsic characteristics, such as face, voice, iris, fingerprint, and gait. Particularly, face recognition techniques have been used in a number of applications, such as security surveillance, access control, crime solving, law enforcement, among others. To strengthen the results of verification, biometric systems must be robust against spoofing attempts with photographs or videos, which are two common ways of bypassing a face recognition system. In this paper, we describe an anti-spoofing solution based on a set of low-level feature descriptors capable of distinguishing between ‘live’ and ‘spoof’ images and videos. The proposed method explores both spatial and temporal information to learn distinctive characteristics between the two classes. Experiments conducted to validate our solution with datasets containing images and videos show results comparable to state-of-the-art approaches.

129 citations


Proceedings ArticleDOI
05 Jun 2011
TL;DR: This paper presents an approach for object detection utilizing sparse scene flow, which does not rely on object classes and allows for a robust detection of dynamic objects in traffic scenes.
Abstract: Modern driver assistance systems such as collision avoidance or intersection assistance need reliable information on the current environment. Extracting such information from camera-based systems is a complex and challenging task for inner city traffic scenarios. This paper presents an approach for object detection utilizing sparse scene flow. For consecutive stereo images taken from a moving vehicle, corresponding interest points are extracted. Thus, for every interest point, disparity and optical flow values are known and consequently, scene flow can be calculated. Adjacent interest points describing a similar scene flow are considered to belong to one rigid object. The proposed method does not rely on object classes and allows for a robust detection of dynamic objects in traffic scenes. Leading vehicles are continuously detected for several frames. Oncoming objects are detected within five frames after their appearance.
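The grouping step above can be sketched as follows: adjacent interest points whose scene flow vectors are similar are merged into one rigid-object hypothesis. This is an illustrative plain-Python version with made-up thresholds; the paper's association criteria operate on full 3D scene flow and are more elaborate:

```python
def group_by_flow(points, max_pos_dist=20.0, max_flow_dist=1.0):
    """Greedy grouping: points that are spatially close and share a
    similar flow vector get the same cluster label.

    Each point is ((x, y), (vx, vy)); thresholds are example values.
    """
    labels = [-1] * len(points)
    next_label = 0
    for i, ((xi, yi), (vxi, vyi)) in enumerate(points):
        for j in range(i):
            (xj, yj), (vxj, vyj) = points[j]
            if (abs(xi - xj) <= max_pos_dist and abs(yi - yj) <= max_pos_dist
                    and abs(vxi - vxj) <= max_flow_dist
                    and abs(vyi - vyj) <= max_flow_dist):
                labels[i] = labels[j]  # join the earlier point's object
                break
        if labels[i] == -1:
            labels[i] = next_label  # start a new object hypothesis
            next_label += 1
    return labels
```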

Journal ArticleDOI
01 May 2011
TL;DR: This paper analyzes the individual performance of those public classifiers for face detection, presenting their pros and cons with the aim of defining a baseline for other approaches to the face detection problem.
Abstract: The human face provides useful information during interaction; therefore, any system integrating Vision-Based Human Computer Interaction requires fast and reliable face and facial feature detection. Different approaches have focused on this ability but only open source implementations have been extensively used by researchers. A good example is the Viola–Jones object detection framework, which has been used frequently, particularly in the context of facial processing. The OpenCV community shares a collection of public domain classifiers for the face detection scenario. However, these classifiers have been trained in different conditions and with different data but rarely tested on the same datasets. In this paper, we try to fill that gap by analyzing the individual performance of all those public classifiers, presenting their pros and cons with the aim of defining a baseline for other approaches. Solid comparisons will also help researchers to choose a specific classifier for their particular scenario. The experimental setup also describes some heuristics to increase the facial feature detection rate while reducing the face false detection rate.

Proceedings ArticleDOI
TL;DR: Both video and static analysis are performed in order to employ complementary information about motion, texture and liveness and consequently to obtain a more robust classification of 2-D face spoofing attacks.
Abstract: We faced the problem of detecting 2-D face spoofing attacks performed by placing a printed photo of a real user in front of the camera. For this type of attack it is not possible to rely just on the face movements as a clue of vitality because the attacker can easily simulate such a case, and also because real users often show a “low vitality” during the authentication session. In this paper, we perform both video and static analysis in order to employ complementary information about motion, texture and liveness and consequently to obtain a more robust classification.

Journal ArticleDOI
TL;DR: This work presents a generic, flexible parallel architecture, which is suitable for all ranges of object detection applications and image sizes, and implements the AdaBoost-based detection algorithm, considered one of the most efficient object detection algorithms.
Abstract: Real-time object detection is becoming necessary for a wide number of applications related to computer vision and image processing, security, bioinformatics, and several other areas. Existing software implementations of object detection algorithms are constrained in small-sized images and rely on favorable conditions in the image frame to achieve real-time detection frame rates. Efforts to design hardware architectures have yielded encouraging results, yet are mostly directed towards a single application, targeting specific operating environments. Consequently, there is a need for hardware architectures capable of detecting several objects in large image frames, and which can be used under several object detection scenarios. In this work, we present a generic, flexible parallel architecture, which is suitable for all ranges of object detection applications and image sizes. The architecture implements the AdaBoost-based detection algorithm, which is considered one of the most efficient object detection algorithms. Through both field-programmable gate array emulation and large-scale implementation, and register transfer level synthesis and simulation, we illustrate that the architecture can detect objects in large images (up to 1024 × 768 pixels) with frame rates that can vary between 64 and 139 fps for various applications and input image frame sizes.
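AdaBoost-based detectors of the kind this architecture implements rest on the integral image, which turns any rectangle sum, and hence any Haar-like feature, into a constant-time operation. A software sketch of that primitive (plain Python; the paper realises this in hardware, so this is only a reference model of the arithmetic):

```python
def integral_image(img):
    """Summed-area table with an extra zero row/column, so any
    rectangle sum is four lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c)."""
    return ii[r + h][c + w] - ii[r][c + w] - ii[r + h][c] + ii[r][c]

def haar_two_rect_vertical(ii, r, c, h, w):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)
```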

Proceedings ArticleDOI
Gary Overett1, Lars Petersson1
05 Jun 2011
TL;DR: The aim of this research is to find features capable of driving further improvements atop a preexisting detection framework used commercially to detect traffic signs on the scale of entire national road networks (1000's of kilometres of video).
Abstract: In this paper we present two variant formulations of the well-known Histogram of Oriented Gradients (HOG) features and provide a comparison of these features on a large scale sign detection problem. The aim of this research is to find features capable of driving further improvements atop a preexisting detection framework used commercially to detect traffic signs on the scale of entire national road networks (1000's of kilometres of video). We assume the computationally efficient framework of a cascade of boosted weak classifiers. Rather than comparing features on the general problem of detection we compare their merits in the final stages of a cascaded detection problem where a feature's ability to reduce error is valued more highly than computational efficiency. Results show the benefit of the two new features on a New Zealand speed sign detection problem. We also note the importance of using non-sign training and validation instances taken from the same video data that contains the training and validation positives. This is attributed to the potential for the more powerful HOG features to overfit on specific local patterns which may be present in alternative video data.
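For reference, a HOG-style cell histogram can be sketched in a few lines. This is the textbook formulation with central differences and hard binning, not either of the paper's two variant formulations:

```python
import math

def hog_cell_histogram(img, bins=9):
    """Unsigned-gradient orientation histogram for one cell.

    Uses simple central differences and hard binning (no bilinear
    interpolation, no block normalisation) for clarity.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r + 1][c] - img[r - 1][c]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return hist
```

In a cascaded detector like the one described, such histograms (or the paper's variants of them) would feed the boosted weak classifiers in the late cascade stages.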

Journal ArticleDOI
TL;DR: This paper addresses the task of efficient object class detection by means of the Hough transform and demonstrates PRISM's flexibility with two complementary implementations: a generatively trained Gaussian Mixture Model and a discriminatively trained histogram approach.
Abstract: This paper addresses the task of efficient object class detection by means of the Hough transform. This approach has been made popular by the Implicit Shape Model (ISM) and has been adopted many times. Although ISM exhibits robust detection performance, its probabilistic formulation is unsatisfactory. The PRincipled Implicit Shape Model (PRISM) overcomes these problems by interpreting Hough voting as a dual implementation of linear sliding-window detection. It thereby gives a sound justification to the voting procedure and imposes minimal constraints. We demonstrate PRISM's flexibility by two complementary implementations: a generatively trained Gaussian Mixture Model as well as a discriminatively trained histogram approach. Both systems achieve state-of-the-art performance. Detections are found by gradient-based or branch and bound search, respectively. The latter greatly benefits from PRISM's feature-centric view. It thereby avoids the unfavourable memory trade-off and any on-line pre-processing of the original Efficient Subwindow Search (ESS). Moreover, our approach takes account of the features' scale value while ESS does not. Finally, we show how to avoid soft-matching and spatial pyramid descriptors during detection without losing their positive effect. This makes algorithms simpler and faster. Both are possible if the object model is properly regularised and we discuss a modification of SVMs which allows for doing so.
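The Hough-voting core that PRISM reinterprets can be sketched as a simple accumulator: each feature casts a weighted vote for an object centre, and detections are accumulator peaks. A toy version (the feature offsets and weights below are hypothetical; PRISM itself works with learned, scale-aware weights and branch-and-bound search):

```python
def hough_vote(features, grid_w, grid_h):
    """Accumulate centre votes. Each feature is ((x, y), (dx, dy), w),
    voting with weight w for centre (x + dx, y + dy)."""
    acc = [[0.0] * grid_w for _ in range(grid_h)]
    for (x, y), (dx, dy), w in features:
        cx, cy = x + dx, y + dy
        if 0 <= cx < grid_w and 0 <= cy < grid_h:
            acc[cy][cx] += w
    return acc

def best_detection(acc):
    """Return (x, y, score) of the strongest accumulator peak."""
    best = (0, 0, acc[0][0])
    for y, row in enumerate(acc):
        for x, v in enumerate(row):
            if v > best[2]:
                best = (x, y, v)
    return best
```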

Journal ArticleDOI
TL;DR: This work shows that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly learned in a multiplicative form of two kernel functions.
Abstract: Object detection is challenging when the object class exhibits large within-class variations. In this work, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly learned in a multiplicative form of two kernel functions. Model training is accomplished via standard SVM learning. When the foreground object masks are provided in training, the detectors can also produce object segmentations. A tracking-by-detection framework to recover foreground state in video sequences is also proposed with our model. The advantages of our method are demonstrated on tasks of object detection, view angle estimation, and tracking. Our approach compares favorably to existing methods on hand and vehicle detection tasks. Quantitative tracking results are given on sequences of moving vehicles and human faces.
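The multiplicative form can be illustrated with a product of two kernels, one on window appearance and one on the foreground state (pose). The RBF choice and the gamma values here are illustrative stand-ins, not the paper's actual kernels:

```python
import math

def rbf(u, v, gamma):
    """Gaussian RBF kernel on vectors given as tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def multiplicative_kernel(x1, p1, x2, p2, gamma_x=0.5, gamma_p=0.5):
    """Joint kernel: product of an appearance kernel on the windows (x)
    and a pose kernel on the state (p), as in the multiplicative form
    the abstract describes."""
    return rbf(x1, x2, gamma_x) * rbf(p1, p2, gamma_p)
```

Because the product is itself a valid kernel, training reduces to standard SVM learning over the joint (appearance, pose) examples.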

PatentDOI
TL;DR: In this article, a spatio-temporal map (STM) is generated from a slice of video data representing a history of pixel data corresponding to a scan line to detect objects in the video data.
Abstract: Systems and methods for detecting and tracking objects, such as motor vehicles, within video data. The systems and methods analyze video data, for example, to count objects, determine object speeds, and track the path of objects without relying on the detection and identification of background data within the captured video data. The detection system uses one or more scan lines to generate a spatio-temporal map. A spatio-temporal map is a time progression of a slice of video data representing a history of pixel data corresponding to a scan line. The detection system detects objects in the video data based on intersections of lines within the spatio-temporal map. Once the detection system has detected an object, the detection system may record the detection for counting purposes, display an indication of the object in association with the video data, determine the speed of the object, etc.
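The spatio-temporal map itself is easy to sketch: one row per frame, each row taken from a fixed scan line. The toy version below uses a crude per-column change detector as an object cue, standing in for the patent's line-intersection analysis:

```python
def spatio_temporal_map(frames, scan_row):
    """Stack the pixels under one horizontal scan line across frames:
    row t of the STM is scan line `scan_row` of frame t."""
    return [list(frame[scan_row]) for frame in frames]

def changed_columns(stm, threshold=10):
    """Columns whose intensity varies over time; a crude cue that an
    object crossed the scan line at that position."""
    cols = []
    for c in range(len(stm[0])):
        column = [row[c] for row in stm]
        if max(column) - min(column) > threshold:
            cols.append(c)
    return cols
```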

Journal ArticleDOI
TL;DR: The motivations for organizing this special section were to better address the challenges of face recognition in real-world scenarios, to promote systematic research and evaluation of promising methods and systems, to provide a snapshot of where the authors are in this domain, and to stimulate discussion about future directions.
Abstract: The motivations for organizing this special section were to better address the challenges of face recognition in real-world scenarios, to promote systematic research and evaluation of promising methods and systems, to provide a snapshot of where we are in this domain, and to stimulate discussion about future directions. We solicited original contributions of research on all aspects of real-world face recognition, including: the design of robust face similarity features and metrics; robust face clustering and sorting algorithms; novel user interaction models and face recognition algorithms for face tagging; novel applications of web face recognition; novel computational paradigms for face recognition; challenges in large scale face recognition tasks, e.g., on the Internet; face recognition with contextual information; face recognition benchmarks and evaluation methodology for moderately controlled or uncontrolled environments; and video face recognition. We received 42 original submissions, four of which were rejected without review; the other 38 papers entered the normal review process. Each paper was reviewed by three reviewers who are experts in their respective topics. More than 100 expert reviewers have been involved in the review process. The papers were equally distributed among the guest editors. A final decision for each paper was made by at least two guest editors assigned to it. To avoid conflict of interest, no guest editor submitted any papers to this special section.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work investigates how to learn a universal multi-view age estimator by harnessing unlabeled web videos, a publicly available labeled frontal face corpus, and zero or more non-frontal faces with age labels.
Abstract: Many existing techniques for analyzing face images assume that the faces are nearly frontal. Generalizing to non-frontal faces is often difficult, due to a dearth of ground truth for non-frontal faces and also to the inherent challenges in handling pose variations. In this work, we investigate how to learn a universal multi-view age estimator by harnessing 1) unlabeled web videos, 2) a publicly available labeled frontal face corpus, and 3) zero or more non-frontal faces with age labels. First, a large diverse human-involved video corpus is collected from an online video sharing website. Then, multi-view face detection and tracking are performed to build a large set of frontal-vs-profile face bundles, each of which is from the same tracking sequence, and thus exhibiting the same age. These unlabeled face bundles constitute the so-called video context, and the parametric multi-view age estimator is trained by 1) enforcing the face-to-age relation for the partially labeled faces, 2) imposing the consistency of the predicted ages for the non-frontal and frontal faces within each face bundle, and 3) mutually constraining the multi-view age models with the spatial correspondence priors derived from the face bundles. Our multi-view age estimator performs well on a realistic evaluation dataset that contains faces under varying poses, and whose ground truth age was manually annotated.

Proceedings ArticleDOI
07 Oct 2011
TL;DR: A face recognition system is developed based on a combination of four individual techniques, namely Principal Component Analysis (PCA), Discrete Cosine Transform (DCT), Template Matching using Correlation (Corr) and Partitioned Iterative Function System (PIFS), fusing the scores of all four techniques in a single face recognition system.
Abstract: The objective of face recognition involves the extraction of different features of the human face from the face image for discriminating it from other persons. It is the problem of searching a face in a reference database to find the matches for a given face. The purpose is to find the face that has the highest similarity with a given face in the database. Many face recognition algorithms have been developed and used in applications such as access control and surveillance. For enhancing the performance and accuracy of a biometric face recognition system, we use a multi-algorithmic approach, wherein a combination of four different individual face recognition techniques is used. In this paper, we develop a face recognition system based on one combination of four individual techniques, namely Principal Component Analysis (PCA), Discrete Cosine Transform (DCT), Template Matching using Correlation (Corr) and Partitioned Iterative Function System (PIFS). We fuse the scores of all of these four techniques in a single face recognition system. We perform a comparative study of the face recognition rate of this system at two precision levels, namely at Top-5 and at Top-10 IDs. We evaluate it on the standard ORL face database. Experimentally, we find that the recognition rate of the PCA-DCT technique is better than that of the individual PCA and DCT techniques, and the recognition rate of the PCA-DCT-Corr technique is better than that of the PCA-DCT technique. Overall, we find that the system based on the combination of all four individual techniques performs best.
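Score-level fusion of several matchers can be sketched as min-max normalisation followed by a weighted sum over the per-identity scores. This generic scheme is only an illustration; the abstract does not spell out the paper's exact fusion rule:

```python
def min_max_normalise(scores):
    """Scale a list of matcher scores to the [0, 1] range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(score_lists, weights=None):
    """Weighted-sum fusion of per-matcher score lists.

    Each inner list holds one score per gallery identity (e.g. one list
    each for PCA, DCT, Corr and PIFS); higher fused score = better match.
    """
    if weights is None:
        weights = [1.0] * len(score_lists)
    norm = [min_max_normalise(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(w * ns[i] for w, ns in zip(weights, norm)) for i in range(n)]
```

Normalising first matters because the four matchers produce scores on incompatible scales; without it one matcher would dominate the sum.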

Journal Article
TL;DR: Face detection based on AdaBoost and a cascade classifier, combined effectively with a CamShift-based tracking algorithm, is proposed; the method can successfully detect and track faces under scale variation, view variation, or occlusion.
Abstract: Face detection based on AdaBoost and a cascade classifier is studied in this paper. Haar-like features, the computing method of the integral image, the detailed definition of micro-structural features, and their description modes are discussed. The computing methods of four different kinds of rectangle features are also given. Moreover, a color histogram serves as the tracking feature, and face detection based on AdaBoost is combined effectively with a tracking algorithm based on CamShift. To evaluate the performance of the proposed method, experiments have been conducted on video sequences. Experimental results show the proposed method can successfully perform detection and tracking under scale variation, view variation, or occlusion.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: An improved segmentation algorithm for face detection in color images with multiple faces and skin tone regions is proposed, which ingeniously uses a novel skin color model, RGB-HS-CbCr for the detection of human faces.
Abstract: Human face detection has become a major field of interest in current research because there is no deterministic algorithm to find face(s) in a given image. Further the algorithms that exist are very much specific to the kind of images they would take as input and detect faces. The problem is to detect faces in the given, colored group photograph. In this paper, an improved segmentation algorithm for face detection in color images with multiple faces and skin tone regions is proposed. Algorithm ingeniously uses a novel skin color model, RGB-HS-CbCr for the detection of human faces. Skin regions are extracted using a set of bounding rules based on the skin color distribution obtained from a training set. The segmented face regions are further classified using a parallel combination of simple morphological operations. Experimental results on a large photo data set have demonstrated that the proposed model is able to achieve good detection success rates for near-frontal faces of varying orientations, skin color and background environment.

01 Jan 2011
TL;DR: Experimental results show that the proposed three-stage scheme for real-time reliable face detection has good performance in the face detection of faces in various poses, faces in skin color-like backgrounds, faces under varying illumination, and faces of various races.
Abstract: A three-stage scheme for real-time reliable face detection is presented. The proposed three-stage scheme is a feature-based method that is mainly based on skin color and facial features. Skin regions are obtained using a YCbCr skin-color model in the first stage. In the second stage, a face template measure is used to obtain face candidates and then a suitable face box is used to effectively remove non-face regions from the face candidates. Finally, facial features are measured to detect faces from face candidates in the third stage. Experimental results show that the proposed method has good performance in the face detection of faces in various poses, faces in skin color-like backgrounds, faces under varying illumination, and faces of various races.

Journal ArticleDOI
TL;DR: This paper presents an innovative three dimensional occlusion detection and restoration strategy for the recognition of three dimensional faces partially occluded by unforeseen, extraneous objects and demonstrates the robustness and feasibility of the approach.
Abstract: This paper presents an innovative three dimensional occlusion detection and restoration strategy for the recognition of three dimensional faces partially occluded by unforeseen, extraneous objects. The detection method considers occlusions as local deformations of the face that correspond to perturbations in a space designed to represent non-occluded faces. Once detected, occlusions represent missing information, or "holes" in the faces. The restoration module exploits the information provided by the non-occluded part of the face to recover the whole face, using an appropriate basis for the space in which non-occluded faces lie. The restoration strategy does not depend on the method used to detect occlusions and can also be applied to restore faces in the presence of noise and missing pixels due to acquisition inaccuracies. The strategy has been experimented on the occluded acquisitions taken from the Bosphorus 3D face database. A method for the generation of real-looking occlusions is also presented. Artificial occlusions, applied to the UND database, allowed for an in-depth analysis of the capabilities of our approach. Experimental results demonstrate the robustness and feasibility of our approach.
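The restoration step described here can be sketched as a least-squares projection: fit the observed (non-occluded) entries of a face to a subspace learned from non-occluded faces, then read the missing entries off the reconstruction. This is a generic gappy-subspace sketch of the idea, not the paper's exact formulation; all names are mine:

```python
import numpy as np

def restore(face, known, basis, mean):
    """Fill the entries of `face` where `known` is False, using a subspace
    learned from non-occluded faces. `basis` is (d, k) with columns spanning
    that subspace; `mean` is the training mean. Coefficients are fit by
    least squares on the observed entries only."""
    coeffs, *_ = np.linalg.lstsq(basis[known], face[known] - mean[known],
                                 rcond=None)
    restored = face.copy()
    restored[~known] = mean[~known] + basis[~known] @ coeffs
    return restored
```

Because only the observed rows of the basis enter the fit, the same routine also handles holes caused by noise or acquisition dropouts, matching the abstract's claim that restoration is independent of how the occlusion was detected.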

25 Oct 2011
TL;DR: A primal-sketch-based set of image tokens is proposed that is used for object representation and detection and top-down information is introduced based on an efficient method for the evaluation of the likelihood of hypothesized part locations.
Abstract: A combination of techniques that is becoming increasingly popular is the construction of part-based object representations using the outputs of interest-point detectors. Our contributions in this paper are twofold: first, we propose a primal-sketch-based set of image tokens that are used for object representation and detection. Second, top-down information is introduced based on an efficient method for the evaluation of the likelihood of hypothesized part locations. This allows us to use graphical model techniques to complement bottom-up detection, by proposing and finding the parts of the object that were missed by the front-end feature detection stage. Detection results for four object categories validate the merits of this joint top-down and bottom-up approach.

Journal ArticleDOI
TL;DR: The proposed nose tip detection method does not require training and does not rely on any particular model, and it can deal with both frontal and non-frontal poses, and is quite fast, requiring only seconds to process an image of 100-200 pixels with a MATLAB implementation.

Proceedings ArticleDOI
TL;DR: A new face image database, called Near-Infrared Face Recognition at a Distance Database (NFRAD-DB), is introduced and Rank-1 identification accuracy of 28 percent was achieved from the proposed method compared to 18 percent rank-1 accuracy of a state of the art face recognition system, FaceVACS.
Abstract: Face recognition at a distance is gaining wide attention in order to augment the surveillance systems with face recognition capability. However, face recognition at a distance in nighttime has not yet received adequate attention considering the increased security threats at nighttime. We introduce a new face image database, called Near-Infrared Face Recognition at a Distance Database (NFRAD-DB). Images in NFRAD-DB are collected at a distance of up to 60 meters with 50 different subjects using a near-infrared camera, a telescope, and near-infrared illuminator. We provide face recognition performance using FaceVACS, DoG-SIFT, and DoG-MLBP representations. The face recognition test consisted of NIR images of these 50 subjects at 60 meters as probe and visible images at 1 meter with additional mug shot images of 10,000 subjects as gallery. Rank-1 identification accuracy of 28 percent was achieved from the proposed method compared to 18 percent rank-1 accuracy of a state of the art face recognition system, FaceVACS. These recognition results are encouraging given this challenging matching problem due to the illumination pattern and insufficient brightness in NFRAD images.

Proceedings ArticleDOI
10 Jul 2011
TL;DR: A novel method for the detection of seat belt in a monitoring image which contains the full scene information of the moving car, based on the direction information measure in the HSV color space is presented.
Abstract: This paper presents a novel method for the detection of seat belts in a monitoring image that contains the full scene information of the moving car. First, the driver area is located based on the vehicle outline. Then the potential seat belt edges are detected by an effective algorithm based on the direction information measure in the HSV color space, and the result is finally obtained by further verification of the edges. Experiments demonstrate that the method performs well even on noisy images.
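The abstract does not define its "direction information measure", but edge-direction methods of this kind start from per-pixel gradient orientation; seat-belt edges then show up as long runs of near-constant diagonal direction. A hedged NumPy stand-in for that first step, on a single channel:

```python
import numpy as np

def gradient_direction(channel):
    """Per-pixel gradient magnitude and direction (radians) via central
    differences. A generic stand-in for the paper's direction information
    measure, whose exact definition is not given in the abstract."""
    gy, gx = np.gradient(channel.astype(np.float64))
    return np.hypot(gx, gy), np.arctan2(gy, gx)
```

In the paper's setting this would be applied to channels of the HSV image, with the verification stage filtering out edges whose direction or extent is inconsistent with a belt.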

Proceedings ArticleDOI
06 Mar 2011
TL;DR: It is argued that depth and context can improve frontal face detection, in turn improving the ability of robots to interact with humans, and supported this claim with encouraging preliminary experimental results.
Abstract: The information available to a robot through a variety of sensors and contextual awareness is rich and unique. In this paper, we have argued that depth and context can improve frontal face detection, in turn improving the ability of robots to interact with humans, and supported this claim with encouraging preliminary experimental results. As future work, we will attempt to apply the same concepts to the much more difficult problem of detecting faces in profile, further expanding the population with which a robot can interact.