Showing papers on "Object-class detection" published in 2013


Proceedings ArticleDOI
02 Dec 2013
TL;DR: This paper lifts two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location, and shows their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.
Abstract: While 3D object representations are being revived in the context of multi-view object class detection and scene understanding, they have not yet attained widespread use in fine-grained categorization. State-of-the-art approaches achieve remarkable performance when training data is plentiful, but they are typically tied to flat, 2D representations that model objects as a collection of unconnected views, limiting their ability to generalize across viewpoints. In this paper, we therefore lift two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location. In extensive experiments on existing and newly proposed datasets, we show that our 3D object representations outperform their state-of-the-art 2D counterparts for fine-grained categorization and demonstrate their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.

2,662 citations


Proceedings ArticleDOI
01 Aug 2013
TL;DR: This work introduces a real-world benchmark data set for traffic sign detection together with carefully chosen evaluation metrics, baseline results, and a web-interface for comparing approaches, and presents the best-performing algorithms of the IJCNN competition.
Abstract: Real-time detection of traffic signs, the task of pinpointing a traffic sign's location in natural images, is a challenging computer vision task of high industrial relevance. Various algorithms have been proposed, and advanced driver assistance systems supporting detection and recognition of traffic signs have reached the market. Despite the many competing approaches, there is no clear consensus on what the state-of-the-art in this field is. This can be attributed to the lack of comprehensive, unbiased comparisons of those methods. We aim at closing this gap by the “German Traffic Sign Detection Benchmark” presented as a competition at IJCNN 2013 (International Joint Conference on Neural Networks). We introduce a real-world benchmark data set for traffic sign detection together with carefully chosen evaluation metrics, baseline results, and a web-interface for comparing approaches. In our evaluation, we separate sign detection from classification, but still measure the performance on relevant categories of signs to allow for benchmarking specialized solutions. The considered baseline algorithms represent some of the most popular detection approaches such as the Viola-Jones detector based on Haar features and a linear classifier relying on HOG descriptors. Further, a recently proposed problem-specific algorithm exploiting shape and color in a model-based Hough-like voting scheme is evaluated. Finally, we present the best-performing algorithms of the IJCNN competition.

717 citations
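
A minimal sketch of the HOG-plus-linear-classifier baseline family evaluated here, using scikit-image and scikit-learn; the window size, stride, and HOG parameters below are illustrative assumptions, not the benchmark's actual settings.

    # Minimal sketch of a HOG + linear SVM sign detector (illustrative only).
    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def hog_descriptor(window):
        # window: grayscale patch, e.g. 32x32, values in [0, 1]
        return hog(window, orientations=9, pixels_per_cell=(4, 4),
                   cells_per_block=(2, 2), block_norm='L2-Hys')

    def train_detector(pos_windows, neg_windows):
        # Learn a linear decision boundary over HOG features of cropped
        # sign (positive) and background (negative) windows.
        X = np.array([hog_descriptor(w) for w in pos_windows + neg_windows])
        y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
        return LinearSVC(C=0.01).fit(X, y)

    def scan(image, clf, win=32, stride=4):
        # Sliding-window scoring; non-maximum suppression would follow.
        detections = []
        for y0 in range(0, image.shape[0] - win, stride):
            for x0 in range(0, image.shape[1] - win, stride):
                f = hog_descriptor(image[y0:y0 + win, x0:x0 + win])
                s = clf.decision_function([f])[0]
                if s > 0:
                    detections.append((x0, y0, win, s))
        return detections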


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel method to decompose an image into large-scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation; it outperforms 18 alternative methods and is computationally more efficient.
Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large-scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel-accurate annotations. The experimental results show that the proposed method outperforms 18 alternative methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.

566 citations
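
For reference, the mean absolute error metric behind the 25.2% figure above is simple to compute; a minimal sketch:

    # MAE between a (normalized) saliency map and a pixel-accurate mask.
    import numpy as np

    def mae(saliency_map, gt_mask):
        s = saliency_map.astype(np.float64)
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)  # scale to [0, 1]
        g = (gt_mask > 0).astype(np.float64)
        return np.abs(s - g).mean()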


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a novel approach to both learning and detecting local contour-based representations for mid-level features called sketch tokens, which achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA and PASCAL, respectively.
Abstract: We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand-drawn contours in images. Patches of human-generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200x faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA and PASCAL, respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.

436 citations
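
The learning stage described above (cluster hand-drawn contour patches into token classes, then train a random forest on image features) can be sketched as follows; the patch size, cluster count, and forest size are illustrative assumptions.

    # Sketch of sketch-token learning (illustrative parameters).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier

    def learn_sketch_tokens(contour_patches, image_features, n_tokens=150):
        # contour_patches: (N, P) flattened hand-drawn contour patches
        # image_features:  (N, D) low-level features of the image patches
        tokens = KMeans(n_clusters=n_tokens, n_init=10).fit(contour_patches)
        forest = RandomForestClassifier(n_estimators=25)
        forest.fit(image_features, tokens.labels_)  # token class per patch
        return tokens, forest

    # At test time, forest.predict_proba(features) gives per-patch token
    # probabilities; summing over token classes yields contour strength.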


Proceedings ArticleDOI
23 Jun 2013
TL;DR: Locality-sensitive hashing as discussed by the authors replaces the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all the filter responses in time independent of the size of the filter bank.
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times - four orders of magnitude - when compared with performing the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.

371 citations
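
The hashing idea can be illustrated with a simpler random-projection (sign) LSH stand-in, where a fixed number of table probes retrieves the filters likely to respond strongly on a patch; the actual system uses winner-take-all hashing, so this is not the authors' exact scheme.

    # Illustrative LSH index over a filter bank: probing cost is fixed,
    # independent of the number of filters (sign-LSH stand-in for WTA).
    import numpy as np
    from collections import defaultdict

    class DotProductLSH:
        def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
            rng = np.random.default_rng(seed)
            self.planes = [rng.standard_normal((n_bits, dim))
                           for _ in range(n_tables)]
            self.tables = [defaultdict(list) for _ in range(n_tables)]

        def _key(self, planes, v):
            return ((planes @ v) > 0).tobytes()  # sign pattern as hash key

        def index_filters(self, filters):
            for i, f in enumerate(filters):      # filters: (F, dim)
                for planes, table in zip(self.planes, self.tables):
                    table[self._key(planes, f)].append(i)

        def probe(self, patch):
            # Candidate filters with likely-high response on `patch`.
            candidates = set()
            for planes, table in zip(self.planes, self.tables):
                candidates.update(table.get(self._key(planes, patch), []))
            return candidates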


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The extracted primary object regions are then used to build object models for optimized video segmentation; the approach outperforms both unsupervised and supervised state-of-the-art methods.
Abstract: In this paper, we propose a novel approach to extract primary object segments in videos in the 'object proposal' domain. The extracted primary object regions are then used to build object models for optimized video segmentation. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion-based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background. The proposed approach is evaluated using several challenging benchmark videos and it outperforms both unsupervised and supervised state-of-the-art methods.

354 citations
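
The motion scoring function described above can be sketched with OpenCV: proposals whose boundaries sit on strong optical-flow discontinuities score high. The exact normalization in the paper may differ.

    # Boundary flow-gradient score for one object proposal (illustrative).
    import cv2
    import numpy as np

    def motion_score(flow, proposal_mask):
        # flow: HxWx2 (e.g. from cv2.calcOpticalFlowFarneback)
        # proposal_mask: HxW uint8 binary mask of the proposal
        mag = np.linalg.norm(flow, axis=2).astype(np.float32)
        gx = cv2.Sobel(mag, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(mag, cv2.CV_32F, 0, 1)
        grad = np.sqrt(gx ** 2 + gy ** 2)
        kernel = np.ones((3, 3), np.uint8)
        boundary = proposal_mask - cv2.erode(proposal_mask, kernel)
        return grad[boundary > 0].mean()  # mean flow gradient on boundary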


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel and very efficient method for generic object detection based on a randomized version of Prim's algorithm, using the connectivity graph of an image's superpixels, with weights modelling the probability that neighbouring superpixels belong to the same object.
Abstract: Generic object detection is the challenging task of proposing windows that localize all the objects in an image, regardless of their classes. Such detectors have recently been shown to benefit many applications such as speeding-up class-specific object detection, weakly supervised learning of object detectors and object discovery. In this paper, we introduce a novel and very efficient method for generic object detection based on a randomized version of Prim's algorithm. Using the connectivity graph of an image's superpixels, with weights modelling the probability that neighbouring superpixels belong to the same object, the algorithm generates random partial spanning trees with large expected sum of edge weights. Object localizations are proposed as bounding boxes of those partial trees. Our method has several benefits compared to the state-of-the-art. Thanks to the efficiency of Prim's algorithm, it samples proposals very quickly: 1000 proposals are obtained in about 0.7s. With proposals bound to superpixel boundaries yet diversified by randomization, it yields very high detection rates and windows that tightly fit objects. In extensive experiments on the challenging PASCAL VOC 2007 and 2012 and SUN2012 benchmark datasets, we show that our method improves over state-of-the-art competitors for a wide range of evaluation scenarios.

340 citations
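
One randomized-Prim's sampling round can be sketched as below: grow a partial spanning tree over the superpixel graph, preferring high-weight edges, and emit the bounding box of the covered superpixels. The stopping rule here is a simplified placeholder.

    # One proposal from a randomized partial spanning tree (illustrative).
    import random

    def sample_proposal(adj, boxes, stop_prob=0.2):
        # adj:   {node: [(neighbor, weight), ...]}, weight = P(same object)
        # boxes: {node: (x0, y0, x1, y1)} box of each superpixel
        tree = {random.choice(list(adj))}
        while random.random() > stop_prob:
            frontier = [(w, u) for v in tree
                        for (u, w) in adj[v] if u not in tree]
            if not frontier:
                break
            r, acc = random.uniform(0, sum(w for w, _ in frontier)), 0.0
            for w, u in frontier:       # roulette-wheel pick: high-weight
                acc += w                # edges are added more often
                if acc >= r:
                    tree.add(u)
                    break
        xs0, ys0, xs1, ys1 = zip(*(boxes[v] for v in tree))
        return min(xs0), min(ys0), max(xs1), max(ys1)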


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper proposes a new learning-based face representation, the face identity-preserving (FIP) features, learned with a deep network that combines feature extraction layers and a reconstruction layer; it significantly outperforms state-of-the-art face recognition methods.
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.

320 citations
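
The encode-then-reconstruct structure described above can be sketched as a small network: an encoder produces the FIP features and a reconstruction layer maps them to the canonical view. Layer sizes are illustrative assumptions; the paper's architecture is more elaborate.

    # Minimal sketch of the FIP idea in PyTorch (illustrative sizes).
    import torch.nn as nn

    class FIPNet(nn.Module):
        def __init__(self, img_dim=96 * 96, fip_dim=1024):
            super().__init__()
            self.encoder = nn.Sequential(        # feature extraction layers
                nn.Linear(img_dim, 4096), nn.ReLU(),
                nn.Linear(4096, fip_dim), nn.ReLU())
            self.decoder = nn.Linear(fip_dim, img_dim)  # reconstruction layer

        def forward(self, x):
            fip = self.encoder(x)           # identity-preserving features
            return fip, self.decoder(fip)   # canonical-view reconstruction

    # Training pairs (x, x_canonical) share an identity; a reconstruction
    # loss such as ((recon - x_canonical) ** 2).mean() pulls every pose and
    # illumination of a person toward one canonical target.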


Proceedings ArticleDOI
01 Jan 2013
TL;DR: It is shown that, without any application-specific modification, existing methods for pedestrian detection and for digit and face classification can reach performance in the range of 95% to 99% of the perfect solution.
Abstract: Traffic sign recognition has been a recurring application domain for visual object detection. The public datasets have only recently reached large enough size and variety to enable proper empirical studies. We revisit the topic by showing how modern methods perform on two large detection and classification datasets (thousands of images, tens of categories) captured in Belgium and Germany. We show that, without any application-specific modification, existing methods for pedestrian detection and for digit and face classification can reach performance in the range of 95% to 99% of the perfect solution. We show detailed experiments and discuss the trade-offs of different options. Our top-performing methods use modern variants of HOG features for detection, and sparse representations for classification.

296 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work presents a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, with hand images taken under various illumination conditions; the analysis highlights the effectiveness of sparse features and the importance of modeling global illumination.
Abstract: We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding hand-object manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition, ego-centric videos present new challenges such as rapid changes in illumination, significant camera motion and complex hand-object manipulations. To quantify the challenges and performance in this new domain, we present a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, with hand images taken under various illumination conditions. Using both our dataset and a publicly available ego-centric indoor dataset, we give an extensive analysis of detection performance using a wide range of local appearance features. Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination. We propose a modeling strategy based on our findings and show that our model outperforms several baseline approaches.

262 citations


Proceedings ArticleDOI
04 Jun 2013
TL;DR: A component-based face coding approach for liveness detection that makes good use of micro differences between genuine and fake faces is proposed, achieving the best liveness detection performance on three databases.
Abstract: Spoofing attacks mainly include printing artifacts, electronic screens and ultra-realistic face masks or models. In this paper, we propose a component-based face coding approach for liveness detection. The proposed method consists of four steps: (1) locating the components of the face; (2) coding the low-level features respectively for all the components; (3) deriving the high-level face representation by pooling the codes with weights derived from the Fisher criterion; (4) feeding the concatenated histograms from all components to a classifier for identification. The proposed framework makes good use of micro differences between genuine faces and fake faces. Meanwhile, the inherent appearance differences among different components are retained. Extensive experiments on three published standard databases demonstrate that the method achieves the best liveness detection performance on all three databases.
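
The Fisher-criterion weights used in step (3) can be sketched directly: a code dimension that separates genuine from fake faces well (large between-class gap, small within-class spread) receives a large pooling weight.

    # Fisher-criterion pooling weights (illustrative).
    import numpy as np

    def fisher_weights(codes_genuine, codes_fake, eps=1e-8):
        # codes_*: (N, D) code responses per class
        mu_g, mu_f = codes_genuine.mean(0), codes_fake.mean(0)
        var_g, var_f = codes_genuine.var(0), codes_fake.var(0)
        return (mu_g - mu_f) ** 2 / (var_g + var_f + eps)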

Proceedings ArticleDOI
Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton van den Hengel
01 Dec 2013
TL;DR: This work models an image as a hypergraph that uses a set of hyperedges to capture the contextual properties of image pixels or regions, turning salient object detection into the problem of finding salient vertices and hyperedges.
Abstract: Salient object detection aims to locate objects that capture human attention within images. Previous approaches often pose this as a problem of image contrast analysis. In this work, we model an image as a hypergraph that utilizes a set of hyperedges to capture the contextual properties of image pixels or regions. As a result, the problem of salient object detection becomes one of finding salient vertices and hyperedges in the hypergraph. The main advantage of hypergraph modeling is that it takes into account each pixel's (or region's) affinity with its neighborhood as well as its separation from the image background. Furthermore, we propose an alternative approach based on center-versus-surround contextual contrast analysis, which performs salient object detection by optimizing a cost-sensitive support vector machine (SVM) objective function. Experimental results on four challenging datasets demonstrate the effectiveness of the proposed approaches against the state-of-the-art approaches to salient object detection.
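
A cost-sensitive SVM of the kind used in the center-versus-surround variant can be set up with per-class misclassification costs; this scikit-learn call is an illustrative stand-in, not the authors' solver.

    # Cost-sensitive SVM: missing a salient region costs more than a
    # false alarm (weights are illustrative).
    from sklearn.svm import SVC

    svm = SVC(kernel='rbf', class_weight={1: 5.0, 0: 1.0})
    # svm.fit(region_features, region_is_salient)
    # saliency_scores = svm.decision_function(test_features)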

Proceedings ArticleDOI
25 Jun 2013
TL;DR: The Point-and-Shoot Face Recognition Challenge (PaSC) is introduced, featuring 9,376 still images of 293 people balanced with respect to distance to the camera, alternative sensors, frontal versus not-frontal views, and varying location.
Abstract: Inexpensive “point-and-shoot” camera technology has combined with social network technology to give the general population a motivation to use face recognition technology. Users expect a lot; they want to snap pictures, shoot videos, upload, and have their friends, family and acquaintances more-or-less automatically recognized. Despite the apparent simplicity of the problem, face recognition in this context is hard. Roughly speaking, failure rates in the 4 to 8 out of 10 range are common. In contrast, error rates drop to roughly 1 in 1,000 for well controlled imagery. To spur advancement in face and person recognition this paper introduces the Point-and-Shoot Face Recognition Challenge (PaSC). The challenge includes 9,376 still images of 293 people balanced with respect to distance to the camera, alternative sensors, frontal versus not-frontal views, and varying location. There are also 2,802 videos for 265 people: a subset of the 293. Verification results are presented for public baseline algorithms and a commercial algorithm for three cases: comparing still images to still images, videos to videos, and still images to videos.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper starts from the template-based approach built on the LINE2D/LINEMOD representation and extends it in two ways: the templates are learned in a discriminative fashion, and a cascade-based scheme speeds up detection.
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.

Journal ArticleDOI
TL;DR: A comprehensive review with comparisons of available techniques for detecting human beings in surveillance videos is presented; the characteristics of a few benchmark datasets as well as future research directions on human detection are also discussed.
Abstract: Detecting human beings accurately in a visual surveillance system is crucial for diverse application areas including abnormal event detection, human gait characterization, congestion analysis, person identification, gender classification and fall detection for elderly people. The first step of the detection process is to detect an object which is in motion. Object detection could be performed using background subtraction, optical flow and spatio-temporal filtering techniques. Once detected, a moving object could be classified as a human being using shape-based, texture-based or motion-based features. A comprehensive review with comparisons of available techniques for detecting human beings in surveillance videos is presented in this paper. The characteristics of a few benchmark datasets as well as future research directions on human detection are also discussed.
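
The first step mentioned above, detecting the object in motion, is commonly done with background subtraction; a minimal OpenCV sketch (parameters are illustrative):

    # Moving-object mask via background subtraction (illustrative).
    import cv2

    subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                    varThreshold=16)

    def moving_object_mask(frame):
        fg = subtractor.apply(frame)       # per-pixel foreground estimate
        fg = cv2.medianBlur(fg, 5)         # suppress speckle noise
        # MOG2 marks shadows as 127; keep only confident foreground (255).
        _, fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)
        return fg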

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work presents a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning, and can detect faces under challenging conditions without explicitly modeling their variations.
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions; it outperforms the previous state-of-the-art on the VOC 2010 test set by 4%.
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model "blends" between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM. We demonstrate the effectiveness of our approach on PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms the Dalal & Triggs detector on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on the VOC 2010 test set by 4%.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper evaluates and compares models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs, and derives insights that can aid further developments in tackling the occlusion challenge.
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise - instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A method that produces tentative object segmentation masks to suppress background clutter in the features, significantly improving object detection, complemented by contextual features in the form of a full-image FV descriptor and an inter-category rescoring mechanism.
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: An efficient clustering framework designed specifically for face clustering in videos, exploiting the fact that faces in adjacent frames of the same face track are very similar, is introduced; it is applicable to other clustering algorithms to significantly reduce the computational cost.
Abstract: In this paper, we focus on face clustering in videos. Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. These constraints can be effectively incorporated into a generative clustering model based on Hidden Markov Random Fields (HMRFs). Within the HMRF model, the pairwise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. The parameters for both the unary and the pairwise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. We further introduce an efficient clustering framework designed specifically for face clustering in videos, exploiting the fact that faces in adjacent frames of the same face track are very similar. The framework is applicable to other clustering algorithms to significantly reduce the computational cost. Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the-art algorithms.
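
How the pairwise constraints fall out of face tracks can be sketched simply: faces in the same track must share a label, and faces from tracks that co-occur in a frame cannot.

    # Must-link / cannot-link constraints from face tracks (illustrative).
    from itertools import combinations

    def track_constraints(tracks):
        # tracks: one list of (frame_index, face_id) pairs per face track
        must_link, cannot_link = [], []
        for track in tracks:
            must_link += [(a, b)
                          for (_, a), (_, b) in combinations(track, 2)]
        for t1, t2 in combinations(tracks, 2):
            frames1 = {f for f, _ in t1}
            if any(f in frames1 for f, _ in t2):   # co-occur in a frame
                cannot_link += [(a, b) for _, a in t1 for _, b in t2]
        return must_link, cannot_link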

Proceedings ArticleDOI
04 Jun 2013
TL;DR: A method to automatically detect the presence of makeup in face images by extracting a feature vector that captures the shape, texture and color characteristics of the input face and employing a classifier to determine the presence or absence of makeup.
Abstract: Facial makeup has the ability to alter the appearance of a person. Such an alteration can degrade the accuracy of automated face recognition systems, as well as that of methods estimating age and beauty from faces. In this work, we design a method to automatically detect the presence of makeup in face images. The proposed algorithm extracts a feature vector that captures the shape, texture and color characteristics of the input face, and employs a classifier to determine the presence or absence of makeup. Besides extracting features from the entire face, the algorithm also considers portions of the face pertaining to the left eye, right eye, and mouth. Experiments on two datasets consisting of 151 subjects (600 images) and 125 subjects (154 images), respectively, suggest that makeup detection rates of up to 93.5% (at a false positive rate of 1%) can be obtained using the proposed approach. Further, an adaptive pre-processing scheme that exploits knowledge of the presence or absence of facial makeup to improve the matching accuracy of a face matcher is presented.

Proceedings ArticleDOI
04 Jun 2013
TL;DR: A novel face liveness detection approach counters spoofing attacks by recovering sparse 3D facial structure: facial landmarks are detected, key frames are selected, and a Support Vector Machine is trained to distinguish genuine from fake faces.
Abstract: Face recognition, which is security-critical, has been widely deployed in our daily life. However, traditional face recognition technologies in practice can be spoofed easily, for example, by using a simple printed photo. In this paper, we propose a novel face liveness detection approach to counter spoofing attacks by recovering sparse 3D facial structure. Given a face video or several images captured from more than two viewpoints, we detect facial landmarks and select key frames. Then, the sparse 3D facial structure can be recovered from the selected key frames. Finally, a Support Vector Machine (SVM) classifier is trained to distinguish genuine and fake faces. Compared with previous works, the proposed method has the following advantages. First, it gives perfect liveness detection results, which meets the security requirement of face biometric systems. Second, it is independent of cameras and systems, working well across different devices. Experiments with genuine faces versus planar photo faces and warped photo faces demonstrate the superiority of the proposed method over state-of-the-art liveness detection methods.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals.
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular l1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the l1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
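
The reduction at the heart of MSSRC can be sketched as follows: encode the mean of the track's features once against the gallery dictionary and classify by per-identity residual. scikit-learn's Lasso serves as an illustrative stand-in for the paper's l1 solver.

    # Mean-of-track sparse coding and residual classification (sketch).
    import numpy as np
    from sklearn.linear_model import Lasso

    def mssrc(track_features, dictionary, labels, alpha=0.01):
        # track_features: (T, D) per-frame features of one face track
        # dictionary:     (D, M) columns are gallery still-image faces
        # labels:         (M,) identity of each dictionary column
        mean_face = track_features.mean(0)
        x = Lasso(alpha=alpha, max_iter=5000).fit(dictionary,
                                                  mean_face).coef_
        residuals = {}
        for ident in np.unique(labels):
            xi = np.where(labels == ident, x, 0.0)  # this identity's coeffs
            residuals[ident] = np.linalg.norm(mean_face - dictionary @ xi)
        best = min(residuals, key=residuals.get)
        return best, residuals[best]  # threshold residual to reject unknowns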

Journal Article
TL;DR: This research work empirically evaluates face recognition that considers both shape and texture information to represent face images, based on Local Binary Patterns, for person-independent face recognition.
Abstract: The face of a human being conveys a lot of information about the identity and emotional state of the person. Face recognition is an interesting and challenging problem, and impacts important applications in many areas such as identification for law enforcement, authentication for banking and security system access, and personal identification, among others. Our research work mainly consists of three parts, namely face representation, feature extraction and classification. Face representation determines how to model a face and fixes the successive algorithms of detection and recognition. The most useful and unique features of the face image are extracted in the feature extraction phase. In classification, the face image is compared with the images from the database. In our research work, we empirically evaluate face recognition that considers both shape and texture information to represent face images, based on Local Binary Patterns, for person-independent face recognition. The face area is first divided into small regions from which Local Binary Pattern (LBP) histograms are extracted and concatenated into a single feature vector. This feature vector forms an efficient representation of the face and is used to measure similarities between images.
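
The region-wise LBP representation described above can be sketched with scikit-image; the grid size and LBP parameters are illustrative assumptions.

    # Concatenated regional LBP histograms as a face descriptor (sketch).
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_face_descriptor(face, grid=(7, 7), P=8, R=1):
        codes = local_binary_pattern(face, P, R, method='uniform')
        n_bins = P + 2                  # uniform patterns + 'other' bin
        gh, gw = face.shape[0] // grid[0], face.shape[1] // grid[1]
        hists = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                region = codes[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                h, _ = np.histogram(region, bins=n_bins, range=(0, n_bins))
                hists.append(h / (h.sum() + 1e-8))
        return np.concatenate(hists)    # compare with e.g. chi-square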

Journal ArticleDOI
TL;DR: A comprehensive survey of the recent technical achievements in object class detection research, covering different aspects of the research, including core techniques: appearance modeling, localization strategies, and supervised classification methods.
Abstract: Object class detection, also known as category-level object detection, has become one of the most focused areas in computer vision in the new century. This article attempts to provide a comprehensive survey of the recent technical achievements in this area of research. More than 270 major publications are included in this survey covering different aspects of the research, which include: (i) problem description: key tasks and challenges; (ii) core techniques: appearance modeling, localization strategies, and supervised classification methods; (iii) evaluation issues: approaches, metrics, standard datasets, and state-of-the-art results; and (iv) new development: particularly new approaches and applications motivated by the recent boom of social images. Finally, in retrospect of what has been achieved so far, the survey also discusses what the future may hold for object class detection research.

Journal ArticleDOI
TL;DR: A new descriptor, named flip-invariant SIFT (or F-SIFT), is proposed and shown to preserve the original properties of SIFT while being tolerant to flips; in video copy detection it also leads to a more than 50% savings in computational cost.
Abstract: Scale-invariant feature transform (SIFT) feature has been widely accepted as an effective local keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoint, or symmetric patterns of objects. This paper proposes a new descriptor, named flip-invariant SIFT (or F-SIFT), that preserves the original properties of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and detection. In copy detection, a framework, which smartly indices the flip properties of F-SIFT for rapid filtering and weak geometric checking, is proposed. F-SIFT not only significantly improves the detection accuracy of SIFT, but also leads to a more than 50% savings in computational cost. In object recognition, we demonstrate the superiority of F-SIFT in dealing with flip transformation by comparing it to seven other descriptors. In object detection, we further show the ability of F-SIFT in describing symmetric objects. Consistent improvement across different kinds of keypoint detectors is observed for F-SIFT over the original SIFT.
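
The flip-normalization step of F-SIFT can be sketched as follows: estimate the dominant curl of the patch's gradient field and mirror the patch when the curl is negative, so a descriptor computed afterwards agrees between an image and its flipped copy. The curl estimate here is a simplified stand-in for the paper's formulation.

    # Flip normalization before descriptor computation (illustrative).
    import numpy as np

    def flip_normalize(patch):
        gy, gx = np.gradient(patch.astype(np.float64))
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        # Signed curl of the gradient field about the patch center.
        curl = ((xx - cx) * gy - (yy - cy) * gx).sum()
        if curl < 0:
            patch = patch[:, ::-1]      # horizontal flip
        return patch                    # then compute SIFT as usual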

Proceedings ArticleDOI
01 Dec 2013
TL;DR: An unsupervised detector adaptation algorithm to adapt any offline trained face detector to a specific collection of images, and hence achieve better accuracy, is proposed.
Abstract: We propose an unsupervised detector adaptation algorithm to adapt any offline trained face detector to a specific collection of images, and hence achieve better accuracy. The core of our detector adaptation algorithm is a probabilistic elastic part (PEP) model, which is offline trained with a set of face examples. It produces a statistically aligned part based face representation, namely the PEP representation. To adapt a general face detector to a collection of images, we compute the PEP representations of the candidate detections from the general face detector, and then train a discriminative classifier with the top positives and negatives. Then we re-rank all the candidate detections with this classifier. This way, a face detector tailored to the statistics of the specific image collection is adapted from the original detector. We present extensive results on three datasets with two state-of-the-art face detectors. The significant improvement of detection accuracy over these state-of-the-art face detectors strongly demonstrates the efficacy of the proposed face detector adaptation algorithm.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A large-scale study on the ImageNet Large Scale Visual Recognition Challenge data, inspired by the recent work of Hoiem et al., shows that this dataset provides many of the same detection challenges as the PASCAL VOC.
Abstract: The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as the PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent test bed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons about the current object detection methods and propose a number of insights for designing the next generation of object detectors.

Proceedings ArticleDOI
TL;DR: A novel liveness detection method, based on the 3D structure of the face, that allows a biometric system to distinguish a real face from a photo, increasing the overall performance of the system and reducing its vulnerability.
Abstract: In recent years face recognition systems have been applied in various useful applications, such as surveillance, access control, criminal investigations, law enforcement, and others. However face biometric systems can be highly vulnerable to spoofing attacks where an impostor tries to bypass the face recognition system using a photo or video sequence. In this paper a novel liveness detection method, based on the 3D structure of the face, is proposed. Processing the 3D curvature of the acquired data, the proposed approach allows a biometric system to distinguish a real face from a photo, increasing the overall performance of the system and reducing its vulnerability. In order to test the real capability of the methodology a 3D face database has been collected simulating spoofing attacks, therefore using photographs instead of real faces. The experimental results show the effectiveness of the proposed approach.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work proposes to tackle object detection in images withstanding significant clutter and occlusion by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes.
Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Notably, our proposal leverages the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, even when dealing with scarcely textured objects.