The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

/pdf/imagenet-large-scale-visual-recognition-challenge-caky5nmmix.pdf

ImageNet Large Scale Visual Recognition Challenge

Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

Pattern Recognition and Machine Learning

We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by C1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.

/pdf/robust-face-recognition-via-sparse-representation-4bfqv6y74o.pdf

Robust Face Recognition via Sparse Representation

In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to-date, an identity labeled dataset of four million facial images belonging to more than 4, 000 identities. The learned representations coupling the accurate model-based alignment with the large facial database generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.

/pdf/deepface-closing-the-gap-to-human-level-performance-in-face-3wcmadzjn3.pdf

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed

Face Description with Local Binary Patterns: Application to Face Recognition

In this work, we present a novel approach to face recognition which considers both shape and texture information to represent face images. The face area is first divided into small regions from which Local Binary Pattern (LBP) histograms are extracted and concatenated into a single, spatially enhanced feature histogram efficiently representing the face image. The recognition is performed using a nearest neighbour classifier in the computed feature space with Chi square as a dissimilarity measure. Extensive experiments clearly show the superiority of the proposed scheme over all considered methods (PCA, Bayesian Intra/extrapersonal Classifier and Elastic Bunch Graph Matching) on FERET tests which include testing the robustness of the method against different facial expressions, lighting and aging of the subjects. In addition to its efficiency, the simplicity of the proposed method allows for very fast feature extraction.

/pdf/face-recognition-with-local-binary-patterns-530w9v6pt6.pdf

Face Recognition with Local Binary Patterns

The recent emergence of Local Binary Patterns (LBP) has led to significant progress in applying texture methods to various computer vision problems and applications. The focus of this research has broadened from 2D textures to 3D textures and spatiotemporal (dynamic) textures. Also, where texture was once utilized for applications such as remote sensing, industrial inspection and biomedical image analysis, the introduction of LBP-based approaches have provided outstanding results in problems relating to face and activity analysis, with future scope for face and facial expression recognition, biometrics, visual surveillance and video analysis. Computer Vision Using Local Binary Patterns provides a detailed description of the LBP methods and their variants both in spatial and spatiotemporal domains. This comprehensive reference also provides an excellent overview as to how texture methods can be utilized for solving different kinds of computer vision and image analysis problems. Source codes of the basic LBP algorithms, demonstrations, some databases and a comprehensive LBP bibliography can be found from an accompanying web site. Topics include: local binary patterns and their variants in spatial and spatiotemporal domains, texture classification and segmentation, description of interest regions, applications in image retrieval and 3D recognition - Recognition and segmentation of dynamic textures, background subtraction, recognition of actions, face analysis using still images and image sequences, visual speech recognition and LBP in various applications. Written by pioneers of LBP, this book is an essential resource for researchers, professional engineers and graduate students in computer vision, image analysis and pattern recognition. The book will also be of interest to all those who work with specific applications of machine vision.

/pdf/computer-vision-using-local-binary-patterns-4ea9qycdmn.pdf

Computer Vision Using Local Binary Patterns

In this paper, we propose Local Binary Pattern Histogram Fourier features (LBP-HF), a novel rotation invariant image descriptor computed from discrete Fourier transforms of local binary pattern (LBP) histograms Unlike most other histogram based invariant texture descriptors which normalize rotation locally, the proposed invariants are constructed globally for the whole region to be described In addition to being rotation invariant, the LBP-HF features retain the highly discriminative nature of LBP histograms In the experiments, it is shown that these features outperform non-invariant and earlier version of rotation invariant LBP and the MR8 descriptor in texture classification, material categorization and face recognition tests

/pdf/rotation-invariant-image-description-with-local-binary-2tk4xx7h8o.pdf

Rotation Invariant Image Description with Local Binary Pattern Histogram Fourier Features

In this paper, we propose a novel approach to compute rotation-invariant features from histograms of local noninvariant patterns. We apply this approach to both static and dynamic local binary pattern (LBP) descriptors. For static-texture description, we present LBP histogram Fourier (LBP-HF) features, and for dynamic-texture recognition, we present two rotation-invariant descriptors computed from the LBPs from three orthogonal planes (LBP-TOP) features in the spatiotemporal domain. LBP-HF is a novel rotation-invariant image descriptor computed from discrete Fourier transforms of LBP histograms. The approach can be also generalized to embed any uniform features into this framework, and combining the supplementary information, e.g., sign and magnitude components of the LBP, together can improve the description ability. Moreover, two variants of rotation-invariant descriptors are proposed to the LBP-TOP, which is an effective descriptor for dynamic-texture recognition, as shown by its recent success in different application problems, but it is not rotation invariant. In the experiments, it is shown that the LBP-HF and its extensions outperform noninvariant and earlier versions of the rotation-invariant LBP in the rotation-invariant texture classification. In experiments on two dynamic-texture databases with rotations or view variations, the proposed video features can effectively deal with rotation variations of dynamic textures (DTs). They also are robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.

Timo Ahonen

Papers

Face Description with Local Binary Patterns: Application to Face Recognition

Face Recognition with Local Binary Patterns

Computer Vision Using Local Binary Patterns

Rotation Invariant Image Description with Local Binary Pattern Histogram Fourier Features

Rotation-Invariant Image and Video Description With Local Binary Pattern Features