Showing papers on "Object-class detection" published in 2013


Proceedings ArticleDOI
02 Dec 2013
TL;DR: This paper lifts two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location, and shows their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.
Abstract: While 3D object representations are being revived in the context of multi-view object class detection and scene understanding, they have not yet attained widespread use in fine-grained categorization. State-of-the-art approaches achieve remarkable performance when training data is plentiful, but they are typically tied to flat, 2D representations that model objects as a collection of unconnected views, limiting their ability to generalize across viewpoints. In this paper, we therefore lift two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location. In extensive experiments on existing and newly proposed datasets, we show that our 3D object representations outperform their state-of-the-art 2D counterparts for fine-grained categorization and demonstrate their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.

2,662 citations


Proceedings ArticleDOI
01 Aug 2013
TL;DR: This work introduces a real-world benchmark data set for traffic sign detection together with carefully chosen evaluation metrics, baseline results, and a web-interface for comparing approaches, and presents the best-performing algorithms of the IJCNN competition.
Abstract: Real-time detection of traffic signs, the task of pinpointing a traffic sign's location in natural images, is a challenging computer vision task of high industrial relevance. Various algorithms have been proposed, and advanced driver assistance systems supporting detection and recognition of traffic signs have reached the market. Despite the many competing approaches, there is no clear consensus on what the state-of-the-art in this field is. This can be attributed to the lack of comprehensive, unbiased comparisons of those methods. We aim at closing this gap by the “German Traffic Sign Detection Benchmark” presented as a competition at IJCNN 2013 (International Joint Conference on Neural Networks). We introduce a real-world benchmark data set for traffic sign detection together with carefully chosen evaluation metrics, baseline results, and a web-interface for comparing approaches. In our evaluation, we separate sign detection from classification, but still measure the performance on relevant categories of signs to allow for benchmarking specialized solutions. The considered baseline algorithms represent some of the most popular detection approaches such as the Viola-Jones detector based on Haar features and a linear classifier relying on HOG descriptors. Further, a recently proposed problem-specific algorithm exploiting shape and color in a model-based Hough-like voting scheme is evaluated. Finally, we present the best-performing algorithms of the IJCNN competition.

717 citations
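
A minimal sketch of the HOG-plus-linear-classifier baseline family evaluated here, using scikit-image and scikit-learn; the window size, stride, and HOG parameters below are illustrative assumptions, not the benchmark's actual settings.

    # Minimal sketch of a HOG + linear SVM sign detector (illustrative only).
    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def hog_descriptor(window):
        # window: grayscale patch, e.g. 32x32, values in [0, 1]
        return hog(window, orientations=9, pixels_per_cell=(4, 4),
                   cells_per_block=(2, 2), block_norm='L2-Hys')

    def train_detector(pos_windows, neg_windows):
        # Learn a linear decision boundary over HOG features of cropped
        # sign (positive) and background (negative) windows.
        X = np.array([hog_descriptor(w) for w in pos_windows + neg_windows])
        y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
        return LinearSVC(C=0.01).fit(X, y)

    def scan(image, clf, win=32, stride=4):
        # Sliding-window scoring; non-maximum suppression would follow.
        detections = []
        for y0 in range(0, image.shape[0] - win, stride):
            for x0 in range(0, image.shape[1] - win, stride):
                f = hog_descriptor(image[y0:y0 + win, x0:x0 + win])
                s = clf.decision_function([f])[0]
                if s > 0:
                    detections.append((x0, y0, win, s))
        return detections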


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel method to decompose an image into large-scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation; it outperforms 18 alternative methods and is computationally more efficient.
Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large-scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel-accurate annotations. The experimental results show that the proposed method outperforms 18 alternative methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.

566 citations
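
For reference, the mean absolute error metric behind the 25.2% figure above is simple to compute; a minimal sketch:

    # MAE between a (normalized) saliency map and a pixel-accurate mask.
    import numpy as np

    def mae(saliency_map, gt_mask):
        s = saliency_map.astype(np.float64)
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)  # scale to [0, 1]
        g = (gt_mask > 0).astype(np.float64)
        return np.abs(s - g).mean()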


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a novel approach to both learning and detecting local contour-based representations for mid-level features called sketch tokens, which achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA and PASCAL, respectively.
Abstract: We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand-drawn contours in images. Patches of human-generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200x faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA and PASCAL, respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.

436 citations
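
The learning stage described above (cluster hand-drawn contour patches into token classes, then train a random forest on image features) can be sketched as follows; the patch size, cluster count, and forest size are illustrative assumptions.

    # Sketch of sketch-token learning (illustrative parameters).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier

    def learn_sketch_tokens(contour_patches, image_features, n_tokens=150):
        # contour_patches: (N, P) flattened hand-drawn contour patches
        # image_features:  (N, D) low-level features of the image patches
        tokens = KMeans(n_clusters=n_tokens, n_init=10).fit(contour_patches)
        forest = RandomForestClassifier(n_estimators=25)
        forest.fit(image_features, tokens.labels_)  # token class per patch
        return tokens, forest

    # At test time, forest.predict_proba(features) gives per-patch token
    # probabilities; summing over token classes yields contour strength.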


Proceedings ArticleDOI
23 Jun 2013
TL;DR: Locality-sensitive hashing as discussed by the authors replaces the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all the filter responses in time independent of the size of the filter bank.
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times - four orders of magnitude - when compared with performing the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.

371 citations
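
The hashing idea can be illustrated with a simpler random-projection (sign) LSH stand-in, where a fixed number of table probes retrieves the filters likely to respond strongly on a patch; the actual system uses winner-take-all hashing, so this is not the authors' exact scheme.

    # Illustrative LSH index over a filter bank: probing cost is fixed,
    # independent of the number of filters (sign-LSH stand-in for WTA).
    import numpy as np
    from collections import defaultdict

    class DotProductLSH:
        def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
            rng = np.random.default_rng(seed)
            self.planes = [rng.standard_normal((n_bits, dim))
                           for _ in range(n_tables)]
            self.tables = [defaultdict(list) for _ in range(n_tables)]

        def _key(self, planes, v):
            return ((planes @ v) > 0).tobytes()  # sign pattern as hash key

        def index_filters(self, filters):
            for i, f in enumerate(filters):      # filters: (F, dim)
                for planes, table in zip(self.planes, self.tables):
                    table[self._key(planes, f)].append(i)

        def probe(self, patch):
            # Candidate filters with likely-high response on `patch`.
            candidates = set()
            for planes, table in zip(self.planes, self.tables):
                candidates.update(table.get(self._key(planes, patch), []))
            return candidates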


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The extracted primary object regions are then used to build object models for optimized video segmentation; the approach outperforms both unsupervised and supervised state-of-the-art methods.
Abstract: In this paper, we propose a novel approach to extract primary object segments in videos in the 'object proposal' domain. The extracted primary object regions are then used to build object models for optimized video segmentation. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion-based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background. The proposed approach is evaluated using several challenging benchmark videos and it outperforms both unsupervised and supervised state-of-the-art methods.

354 citations
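
The motion scoring function described above can be sketched with OpenCV: proposals whose boundaries sit on strong optical-flow discontinuities score high. The exact normalization in the paper may differ.

    # Boundary flow-gradient score for one object proposal (illustrative).
    import cv2
    import numpy as np

    def motion_score(flow, proposal_mask):
        # flow: HxWx2 (e.g. from cv2.calcOpticalFlowFarneback)
        # proposal_mask: HxW uint8 binary mask of the proposal
        mag = np.linalg.norm(flow, axis=2).astype(np.float32)
        gx = cv2.Sobel(mag, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(mag, cv2.CV_32F, 0, 1)
        grad = np.sqrt(gx ** 2 + gy ** 2)
        kernel = np.ones((3, 3), np.uint8)
        boundary = proposal_mask - cv2.erode(proposal_mask, kernel)
        return grad[boundary > 0].mean()  # mean flow gradient on boundary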


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel and very efficient method for generic object detection based on a randomized version of Prim's algorithm, using the connectivity graph of an image's superpixels, with weights modelling the probability that neighbouring superpixels belong to the same object.
Abstract: Generic object detection is the challenging task of proposing windows that localize all the objects in an image, regardless of their classes. Such detectors have recently been shown to benefit many applications such as speeding-up class-specific object detection, weakly supervised learning of object detectors and object discovery. In this paper, we introduce a novel and very efficient method for generic object detection based on a randomized version of Prim's algorithm. Using the connectivity graph of an image's superpixels, with weights modelling the probability that neighbouring superpixels belong to the same object, the algorithm generates random partial spanning trees with large expected sum of edge weights. Object localizations are proposed as bounding boxes of those partial trees. Our method has several benefits compared to the state-of-the-art. Thanks to the efficiency of Prim's algorithm, it samples proposals very quickly: 1000 proposals are obtained in about 0.7s. With proposals bound to superpixel boundaries yet diversified by randomization, it yields very high detection rates and windows that tightly fit objects. In extensive experiments on the challenging PASCAL VOC 2007 and 2012 and SUN2012 benchmark datasets, we show that our method improves over state-of-the-art competitors for a wide range of evaluation scenarios.

340 citations
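
One randomized-Prim's sampling round can be sketched as below: grow a partial spanning tree over the superpixel graph, preferring high-weight edges, and emit the bounding box of the covered superpixels. The stopping rule here is a simplified placeholder.

    # One proposal from a randomized partial spanning tree (illustrative).
    import random

    def sample_proposal(adj, boxes, stop_prob=0.2):
        # adj:   {node: [(neighbor, weight), ...]}, weight = P(same object)
        # boxes: {node: (x0, y0, x1, y1)} box of each superpixel
        tree = {random.choice(list(adj))}
        while random.random() > stop_prob:
            frontier = [(w, u) for v in tree
                        for (u, w) in adj[v] if u not in tree]
            if not frontier:
                break
            r, acc = random.uniform(0, sum(w for w, _ in frontier)), 0.0
            for w, u in frontier:       # roulette-wheel pick: high-weight
                acc += w                # edges are added more often
                if acc >= r:
                    tree.add(u)
                    break
        xs0, ys0, xs1, ys1 = zip(*(boxes[v] for v in tree))
        return min(xs0), min(ys0), max(xs1), max(ys1)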


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper proposes a new learning-based face representation, the face identity-preserving (FIP) features, learned with a deep network that combines feature extraction layers and a reconstruction layer; it significantly outperforms state-of-the-art face recognition methods.
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.

320 citations
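
The encode-then-reconstruct structure described above can be sketched as a small network: an encoder produces the FIP features and a reconstruction layer maps them to the canonical view. Layer sizes are illustrative assumptions; the paper's architecture is more elaborate.

    # Minimal sketch of the FIP idea in PyTorch (illustrative sizes).
    import torch.nn as nn

    class FIPNet(nn.Module):
        def __init__(self, img_dim=96 * 96, fip_dim=1024):
            super().__init__()
            self.encoder = nn.Sequential(        # feature extraction layers
                nn.Linear(img_dim, 4096), nn.ReLU(),
                nn.Linear(4096, fip_dim), nn.ReLU())
            self.decoder = nn.Linear(fip_dim, img_dim)  # reconstruction layer

        def forward(self, x):
            fip = self.encoder(x)           # identity-preserving features
            return fip, self.decoder(fip)   # canonical-view reconstruction

    # Training pairs (x, x_canonical) share an identity; a reconstruction
    # loss such as ((recon - x_canonical) ** 2).mean() pulls every pose and
    # illumination of a person toward one canonical target.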


Proceedings ArticleDOI
01 Jan 2013
TL;DR: It is shown that, without any application-specific modification, existing methods for pedestrian detection and for digit and face classification can reach performance in the range of 95% to 99% of the perfect solution.
Abstract: Traffic sign recognition has been a recurring application domain for visual object detection. The public datasets have only recently reached large enough size and variety to enable proper empirical studies. We revisit the topic by showing how modern methods perform on two large detection and classification datasets (thousands of images, tens of categories) captured in Belgium and Germany. We show that, without any application-specific modification, existing methods for pedestrian detection and for digit and face classification can reach performance in the range of 95% to 99% of the perfect solution. We show detailed experiments and discuss the trade-offs of different options. Our top-performing methods use modern variants of HOG features for detection, and sparse representations for classification.

296 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work presents a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, with hand images taken under various illumination conditions; the analysis highlights the effectiveness of sparse features and the importance of modeling global illumination.
Abstract: We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding hand-object manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition, ego-centric videos present new challenges such as rapid changes in illumination, significant camera motion and complex hand-object manipulations. To quantify the challenges and performance in this new domain, we present a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, with hand images taken under various illumination conditions. Using both our dataset and a publicly available ego-centric indoor dataset, we give an extensive analysis of detection performance using a wide range of local appearance features. Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination. We propose a modeling strategy based on our findings and show that our model outperforms several baseline approaches.

262 citations


Proceedings ArticleDOI
04 Jun 2013
TL;DR: A component-based face coding approach for liveness detection that makes good use of micro differences between genuine and fake faces is proposed, achieving the best liveness detection performance on three databases.
Abstract: Spoofing attacks mainly include printing artifacts, electronic screens and ultra-realistic face masks or models. In this paper, we propose a component-based face coding approach for liveness detection. The proposed method consists of four steps: (1) locating the components of the face; (2) coding the low-level features respectively for all the components; (3) deriving the high-level face representation by pooling the codes with weights derived from the Fisher criterion; (4) feeding the concatenated histograms from all components to a classifier for identification. The proposed framework makes good use of micro differences between genuine faces and fake faces. Meanwhile, the inherent appearance differences among different components are retained. Extensive experiments on three published standard databases demonstrate that the method achieves the best liveness detection performance on all three databases.
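
The Fisher-criterion weights used in step (3) can be sketched directly: a code dimension that separates genuine from fake faces well (large between-class gap, small within-class spread) receives a large pooling weight.

    # Fisher-criterion pooling weights (illustrative).
    import numpy as np

    def fisher_weights(codes_genuine, codes_fake, eps=1e-8):
        # codes_*: (N, D) code responses per class
        mu_g, mu_f = codes_genuine.mean(0), codes_fake.mean(0)
        var_g, var_f = codes_genuine.var(0), codes_fake.var(0)
        return (mu_g - mu_f) ** 2 / (var_g + var_f + eps)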

Proceedings ArticleDOI
Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton van den Hengel
01 Dec 2013
TL;DR: This work models an image as a hypergraph that uses a set of hyperedges to capture the contextual properties of image pixels or regions, turning salient object detection into the problem of finding salient vertices and hyperedges.
Abstract: Salient object detection aims to locate objects that capture human attention within images. Previous approaches often pose this as a problem of image contrast analysis. In this work, we model an image as a hypergraph that utilizes a set of hyperedges to capture the contextual properties of image pixels or regions. As a result, the problem of salient object detection becomes one of finding salient vertices and hyperedges in the hypergraph. The main advantage of hypergraph modeling is that it takes into account each pixel's (or region's) affinity with its neighborhood as well as its separation from the image background. Furthermore, we propose an alternative approach based on center-versus-surround contextual contrast analysis, which performs salient object detection by optimizing a cost-sensitive support vector machine (SVM) objective function. Experimental results on four challenging datasets demonstrate the effectiveness of the proposed approaches against the state-of-the-art approaches to salient object detection.
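
A cost-sensitive SVM of the kind used in the center-versus-surround variant can be set up with per-class misclassification costs; this scikit-learn call is an illustrative stand-in, not the authors' solver.

    # Cost-sensitive SVM: missing a salient region costs more than a
    # false alarm (weights are illustrative).
    from sklearn.svm import SVC

    svm = SVC(kernel='rbf', class_weight={1: 5.0, 0: 1.0})
    # svm.fit(region_features, region_is_salient)
    # saliency_scores = svm.decision_function(test_features)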

Proceedings ArticleDOI
25 Jun 2013
TL;DR: The Point-and-Shoot Face Recognition Challenge (PaSC) is introduced, featuring 9,376 still images of 293 people balanced with respect to distance to the camera, alternative sensors, frontal versus not-frontal views, and varying location.
Abstract: Inexpensive “point-and-shoot” camera technology has combined with social network technology to give the general population a motivation to use face recognition technology. Users expect a lot; they want to snap pictures, shoot videos, upload, and have their friends, family and acquaintances more-or-less automatically recognized. Despite the apparent simplicity of the problem, face recognition in this context is hard. Roughly speaking, failure rates in the 4 to 8 out of 10 range are common. In contrast, error rates drop to roughly 1 in 1,000 for well controlled imagery. To spur advancement in face and person recognition this paper introduces the Point-and-Shoot Face Recognition Challenge (PaSC). The challenge includes 9,376 still images of 293 people balanced with respect to distance to the camera, alternative sensors, frontal versus not-frontal views, and varying location. There are also 2,802 videos for 265 people: a subset of the 293. Verification results are presented for public baseline algorithms and a commercial algorithm for three cases: comparing still images to still images, videos to videos, and still images to videos.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper starts from the template-based approach built on the LINE2D/LINEMOD representation and extends it in two ways: the templates are learned in a discriminative fashion, and a cascade-based scheme speeds up detection.
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.

Journal ArticleDOI
TL;DR: A comprehensive review with comparisons of available techniques for detecting human beings in surveillance videos is presented; the characteristics of a few benchmark datasets as well as future research directions on human detection are also discussed.
Abstract: Detecting human beings accurately in a visual surveillance system is crucial for diverse application areas including abnormal event detection, human gait characterization, congestion analysis, person identification, gender classification and fall detection for elderly people. The first step of the detection process is to detect an object which is in motion. Object detection could be performed using background subtraction, optical flow and spatio-temporal filtering techniques. Once detected, a moving object could be classified as a human being using shape-based, texture-based or motion-based features. A comprehensive review with comparisons of available techniques for detecting human beings in surveillance videos is presented in this paper. The characteristics of a few benchmark datasets as well as future research directions on human detection are also discussed.
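
The first step mentioned above, detecting the object in motion, is commonly done with background subtraction; a minimal OpenCV sketch (parameters are illustrative):

    # Moving-object mask via background subtraction (illustrative).
    import cv2

    subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                    varThreshold=16)

    def moving_object_mask(frame):
        fg = subtractor.apply(frame)       # per-pixel foreground estimate
        fg = cv2.medianBlur(fg, 5)         # suppress speckle noise
        # MOG2 marks shadows as 127; keep only confident foreground (255).
        _, fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)
        return fg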

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work presents a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning, and can detect faces under challenging conditions without explicitly modeling their variations.
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions; it outperforms the previous state-of-the-art on the VOC 2010 test set by 4%.
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model "blends" between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM. We demonstrate the effectiveness of our approach on PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms the Dalal & Triggs detector on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on the VOC 2010 test set by 4%.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper evaluates and compares models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs, and derives insights that can aid further developments in tackling the occlusion challenge.
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise - instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A method that produces tentative object segmentation masks to suppress background clutter in the features, significantly improving object detection, complemented by contextual features in the form of a full-image FV descriptor and an inter-category rescoring mechanism.
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: An efficient clustering framework designed specifically for face clustering in videos, exploiting the fact that faces in adjacent frames of the same face track are very similar, is introduced; it is applicable to other clustering algorithms to significantly reduce the computational cost.
Abstract: In this paper, we focus on face clustering in videos. Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. These constraints can be effectively incorporated into a generative clustering model based on Hidden Markov Random Fields (HMRFs). Within the HMRF model, the pairwise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. The parameters for both the unary and the pairwise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. We further introduce an efficient clustering framework designed specifically for face clustering in videos, exploiting the fact that faces in adjacent frames of the same face track are very similar. The framework is applicable to other clustering algorithms to significantly reduce the computational cost. Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the-art algorithms.
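
How the pairwise constraints fall out of face tracks can be sketched simply: faces in the same track must share a label, and faces from tracks that co-occur in a frame cannot.

    # Must-link / cannot-link constraints from face tracks (illustrative).
    from itertools import combinations

    def track_constraints(tracks):
        # tracks: one list of (frame_index, face_id) pairs per face track
        must_link, cannot_link = [], []
        for track in tracks:
            must_link += [(a, b)
                          for (_, a), (_, b) in combinations(track, 2)]
        for t1, t2 in combinations(tracks, 2):
            frames1 = {f for f, _ in t1}
            if any(f in frames1 for f, _ in t2):   # co-occur in a frame
                cannot_link += [(a, b) for _, a in t1 for _, b in t2]
        return must_link, cannot_link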

Proceedings ArticleDOI
04 Jun 2013
TL;DR: A method to automatically detect the presence of makeup in face images by extracting a feature vector that captures the shape, texture and color characteristics of the input face and employing a classifier to determine the presence or absence of makeup.
Abstract: Facial makeup has the ability to alter the appearance of a person. Such an alteration can degrade the accuracy of automated face recognition systems, as well as that of methods estimating age and beauty from faces. In this work, we design a method to automatically detect the presence of makeup in face images. The proposed algorithm extracts a feature vector that captures the shape, texture and color characteristics of the input face, and employs a classifier to determine the presence or absence of makeup. Besides extracting features from the entire face, the algorithm also considers portions of the face pertaining to the left eye, right eye, and mouth. Experiments on two datasets consisting of 151 subjects (600 images) and 125 subjects (154 images), respectively, suggest that makeup detection rates of up to 93.5% (at a false positive rate of 1%) can be obtained using the proposed approach. Further, an adaptive pre-processing scheme that exploits knowledge of the presence or absence of facial makeup to improve the matching accuracy of a face matcher is presented.

Proceedings ArticleDOI
04 Jun 2013
TL;DR: A novel face liveness detection approach counters spoofing attacks by recovering sparse 3D facial structure: facial landmarks are detected, key frames are selected, and a Support Vector Machine is trained to distinguish genuine from fake faces.
Abstract: Face recognition, which is security-critical, has been widely deployed in our daily life. However, traditional face recognition technologies in practice can be spoofed easily, for example, by using a simple printed photo. In this paper, we propose a novel face liveness detection approach to counter spoofing attacks by recovering sparse 3D facial structure. Given a face video or several images captured from more than two viewpoints, we detect facial landmarks and select key frames. Then, the sparse 3D facial structure can be recovered from the selected key frames. Finally, a Support Vector Machine (SVM) classifier is trained to distinguish genuine and fake faces. Compared with previous works, the proposed method has the following advantages. First, it gives perfect liveness detection results, which meets the security requirement of face biometric systems. Second, it is independent of cameras and systems, working well across different devices. Experiments with genuine faces versus planar photo faces and warped photo faces demonstrate the superiority of the proposed method over state-of-the-art liveness detection methods.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals.
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular l1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the l1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
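
The reduction at the heart of MSSRC can be sketched as follows: encode the mean of the track's features once against the gallery dictionary and classify by per-identity residual. scikit-learn's Lasso serves as an illustrative stand-in for the paper's l1 solver.

    # Mean-of-track sparse coding and residual classification (sketch).
    import numpy as np
    from sklearn.linear_model import Lasso

    def mssrc(track_features, dictionary, labels, alpha=0.01):
        # track_features: (T, D) per-frame features of one face track
        # dictionary:     (D, M) columns are gallery still-image faces
        # labels:         (M,) identity of each dictionary column
        mean_face = track_features.mean(0)
        x = Lasso(alpha=alpha, max_iter=5000).fit(dictionary,
                                                  mean_face).coef_
        residuals = {}
        for ident in np.unique(labels):
            xi = np.where(labels == ident, x, 0.0)  # this identity's coeffs
            residuals[ident] = np.linalg.norm(mean_face - dictionary @ xi)
        best = min(residuals, key=residuals.get)
        return best, residuals[best]  # threshold residual to reject unknowns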

Journal Article
TL;DR: This research work empirically evaluates face recognition that considers both shape and texture information to represent face images, based on Local Binary Patterns, for person-independent face recognition.
Abstract: The face of a human being conveys a lot of information about the identity and emotional state of the person. Face recognition is an interesting and challenging problem, and impacts important applications in many areas such as identification for law enforcement, authentication for banking and security system access, and personal identification, among others. Our research work mainly consists of three parts, namely face representation, feature extraction and classification. Face representation determines how to model a face and fixes the successive algorithms of detection and recognition. The most useful and unique features of the face image are extracted in the feature extraction phase. In classification, the face image is compared with the images from the database. In our research work, we empirically evaluate face recognition that considers both shape and texture information to represent face images, based on Local Binary Patterns, for person-independent face recognition. The face area is first divided into small regions from which Local Binary Pattern (LBP) histograms are extracted and concatenated into a single feature vector. This feature vector forms an efficient representation of the face and is used to measure similarities between images.
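
The region-wise LBP representation described above can be sketched with scikit-image; the grid size and LBP parameters are illustrative assumptions.

    # Concatenated regional LBP histograms as a face descriptor (sketch).
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_face_descriptor(face, grid=(7, 7), P=8, R=1):
        codes = local_binary_pattern(face, P, R, method='uniform')
        n_bins = P + 2                  # uniform patterns + 'other' bin
        gh, gw = face.shape[0] // grid[0], face.shape[1] // grid[1]
        hists = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                region = codes[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                h, _ = np.histogram(region, bins=n_bins, range=(0, n_bins))
                hists.append(h / (h.sum() + 1e-8))
        return np.concatenate(hists)    # compare with e.g. chi-square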

Journal ArticleDOI
TL;DR: A comprehensive survey of the recent technical achievements in object class detection research, covering different aspects of the research, including core techniques: appearance modeling, localization strategies, and supervised classification methods.
Abstract: Object class detection, also known as category-level object detection, has become one of the most focused areas in computer vision in the new century. This article attempts to provide a comprehensive survey of the recent technical achievements in this area of research. More than 270 major publications are included in this survey covering different aspects of the research, which include: (i) problem description: key tasks and challenges; (ii) core techniques: appearance modeling, localization strategies, and supervised classification methods; (iii) evaluation issues: approaches, metrics, standard datasets, and state-of-the-art results; and (iv) new development: particularly new approaches and applications motivated by the recent boom of social images. Finally, in retrospect of what has been achieved so far, the survey also discusses what the future may hold for object class detection research.

Journal ArticleDOI
TL;DR: A new descriptor, named flip-invariant SIFT (or F-SIFT), is proposed and shown to preserve the original properties of SIFT while being tolerant to flips; in video copy detection it also leads to a more than 50% savings in computational cost.
Abstract: Scale-invariant feature transform (SIFT) feature has been widely accepted as an effective local keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoint, or symmetric patterns of objects. This paper proposes a new descriptor, named flip-invariant SIFT (or F-SIFT), that preserves the original properties of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and detection. In copy detection, a framework, which smartly indices the flip properties of F-SIFT for rapid filtering and weak geometric checking, is proposed. F-SIFT not only significantly improves the detection accuracy of SIFT, but also leads to a more than 50% savings in computational cost. In object recognition, we demonstrate the superiority of F-SIFT in dealing with flip transformation by comparing it to seven other descriptors. In object detection, we further show the ability of F-SIFT in describing symmetric objects. Consistent improvement across different kinds of keypoint detectors is observed for F-SIFT over the original SIFT.
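
The flip-normalization step of F-SIFT can be sketched as follows: estimate the dominant curl of the patch's gradient field and mirror the patch when the curl is negative, so a descriptor computed afterwards agrees between an image and its flipped copy. The curl estimate here is a simplified stand-in for the paper's formulation.

    # Flip normalization before descriptor computation (illustrative).
    import numpy as np

    def flip_normalize(patch):
        gy, gx = np.gradient(patch.astype(np.float64))
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        # Signed curl of the gradient field about the patch center.
        curl = ((xx - cx) * gy - (yy - cy) * gx).sum()
        if curl < 0:
            patch = patch[:, ::-1]      # horizontal flip
        return patch                    # then compute SIFT as usual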

Proceedings ArticleDOI
01 Dec 2013
TL;DR: An unsupervised detector adaptation algorithm to adapt any offline trained face detector to a specific collection of images, and hence achieve better accuracy, is proposed.
Abstract: We propose an unsupervised detector adaptation algorithm to adapt any offline trained face detector to a specific collection of images, and hence achieve better accuracy. The core of our detector adaptation algorithm is a probabilistic elastic part (PEP) model, which is offline trained with a set of face examples. It produces a statistically aligned part based face representation, namely the PEP representation. To adapt a general face detector to a collection of images, we compute the PEP representations of the candidate detections from the general face detector, and then train a discriminative classifier with the top positives and negatives. Then we re-rank all the candidate detections with this classifier. This way, a face detector tailored to the statistics of the specific image collection is adapted from the original detector. We present extensive results on three datasets with two state-of-the-art face detectors. The significant improvement of detection accuracy over these state-of-the-art face detectors strongly demonstrates the efficacy of the proposed face detector adaptation algorithm.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A large-scale study on the ImageNet Large Scale Visual Recognition Challenge data, inspired by the recent work of Hoiem et al., shows that this dataset provides many of the same detection challenges as the PASCAL VOC.
Abstract: The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as the PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent test bed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons about the current object detection methods and propose a number of insights for designing the next generation of object detectors.

Proceedings ArticleDOI
TL;DR: A novel liveness detection method, based on the 3D structure of the face, that allows a biometric system to distinguish a real face from a photo, increasing the overall performance of the system and reducing its vulnerability.
Abstract: In recent years face recognition systems have been applied in various useful applications, such as surveillance, access control, criminal investigations, law enforcement, and others. However face biometric systems can be highly vulnerable to spoofing attacks where an impostor tries to bypass the face recognition system using a photo or video sequence. In this paper a novel liveness detection method, based on the 3D structure of the face, is proposed. Processing the 3D curvature of the acquired data, the proposed approach allows a biometric system to distinguish a real face from a photo, increasing the overall performance of the system and reducing its vulnerability. In order to test the real capability of the methodology a 3D face database has been collected simulating spoofing attacks, therefore using photographs instead of real faces. The experimental results show the effectiveness of the proposed approach.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work proposes to tackle object detection in images withstanding significant clutter and occlusion by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes.
Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Notably, our proposal leverages the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, even when dealing with scarcely textured objects.