Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2010"
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI--SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
TL;DR: A novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images, which outperforms all others submitted so far for four out of the six data sets.
Abstract: This paper proposes a novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these before using visibility constraints to filter away false matches. The keys to the performance of the proposed algorithm are effective techniques for enforcing local photometric consistency and global visibility constraints. Simple but effective methods are also proposed to turn the resulting patch model into a mesh which can be further refined by an algorithm that enforces both photometric consistency and regularization constraints. The proposed approach automatically detects and discards outliers and obstacles and does not require any initialization in the form of a visual hull, a bounding box, or valid depth ranges. We have tested our algorithm on various data sets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and "crowded" scenes where moving obstacles appear in front of a static structure of interest. A quantitative evaluation on the Middlebury benchmark  shows that the proposed method outperforms all others submitted so far for four out of the six data sets.
TL;DR: A probabilistic method, called the Coherent Point Drift (CPD) algorithm, is introduced for both rigid and nonrigid point set registration and a fast algorithm is introduced that reduces the method computation complexity to linear.
Abstract: Point set registration is a key component in many computer vision tasks. The goal of point set registration is to assign correspondences between two sets of points and to recover the transformation that maps one point set to the other. Multiple factors, including an unknown nonrigid spatial transformation, large dimensionality of point set, noise, and outliers, make the point set registration a challenging problem. We introduce a probabilistic method, called the Coherent Point Drift (CPD) algorithm, for both rigid and nonrigid point set registration. We consider the alignment of two point sets as a probability density estimation problem. We fit the Gaussian mixture model (GMM) centroids (representing the first point set) to the data (the second point set) by maximizing the likelihood. We force the GMM centroids to move coherently as a group to preserve the topological structure of the point sets. In the rigid case, we impose the coherence constraint by reparameterization of GMM centroid locations with rigid parameters and derive a closed form solution of the maximization step of the EM algorithm in arbitrary dimensions. In the nonrigid case, we impose the coherence constraint by regularizing the displacement field and using the variational calculus to derive the optimal transformation. We also introduce a fast algorithm that reduces the method computation complexity to linear. We test the CPD algorithm for both rigid and nonrigid transformations in the presence of noise, outliers, and missing points, where CPD shows accurate results and outperforms current state-of-the-art methods.
TL;DR: From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition and the usefulness of invariance is category-specific.
Abstract: Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used for feature extraction at salient points. To increase illumination invariance and discriminative power, color descriptors have been proposed. Because many different descriptors exist, a structured overview is required of color invariant descriptors in the context of image category recognition. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors (software to compute the color descriptors from this paper is available from http://www.colordescriptors.com) in a structured way. The analytical invariance properties of color descriptors are explored, using a taxonomy based on invariance properties with respect to photometric transformations, and tested experimentally using a data set with known illumination conditions. In addition, the distinctiveness of color descriptors is assessed experimentally using two benchmarks, one from the image domain and one from the video domain. From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition. The results further reveal that, for light intensity shifts, the usefulness of invariance is category-specific. Overall, when choosing a single descriptor and no prior knowledge about the data set and object and scene categories is available, the OpponentSIFT is recommended. Furthermore, a combined set of color descriptors outperforms intensity-based SIFT and improves category recognition by 8 percent on the PASCAL VOC 2007 and by 7 percent on the Mediamill Challenge.
TL;DR: A new heuristic for feature detection is presented and, using machine learning, a feature detector is derived from this which can fully process live PAL video using less than 5 percent of the available processing time.
Abstract: The repeatability and efficiency of a corner detector determines how likely it is to be useful in a real-world application. The repeatability is important because the same scene viewed from different positions should yield features which correspond to the same real-world 3D locations. The efficiency is important because this determines whether the detector combined with further processing can operate at frame rate. Three advances are described in this paper. First, we present a new heuristic for feature detection and, using machine learning, we derive a feature detector from this which can fully process live PAL video using less than 5 percent of the available processing time. By comparison, most other detectors cannot even operate at frame rate (Harris detector 115 percent, SIFT 195 percent). Second, we generalize the detector, allowing it to be optimized for repeatability, with little loss of efficiency. Third, we carry out a rigorous comparison of corner detectors based on the above repeatability criterion applied to 3D scenes. We show that, despite being principally constructed for speed, on these stringent tests, our heuristic detector significantly outperforms existing feature detectors. Finally, the comparison demonstrates that using machine learning produces significant improvements in repeatability, yielding a detector that is both very fast and of very high quality.
TL;DR: A linear-time line segment detector that gives accurate results, a controlled number of false detections, and requires no parameter tuning is proposed.
Abstract: We propose a linear-time line segment detector that gives accurate results, a controlled number of false detections, and requires no parameter tuning. This algorithm is tested and compared to state-of-the-art algorithms on a wide set of natural images.
TL;DR: This review shows that, despite their apparent simplicity, the development of a general eye detection technique involves addressing many challenges, requires further theoretical developments, and is consequently of interest to many other domains problems in computer vision and beyond.
Abstract: Despite active research and significant progress in the last 30 years, eye detection and tracking remains challenging due to the individuality of eyes, occlusion, variability in scale, location, and light conditions. Data on eye location and details of eye movements have numerous applications and are essential in face detection, biometric identification, and particular human-computer interaction tasks. This paper reviews current progress and state of the art in video-based eye detection and tracking in order to identify promising techniques as well as issues to be further addressed. We present a detailed review of recent eye models and techniques for eye detection and tracking. We also survey methods for gaze estimation and compare them based on their geometric properties and reported accuracies. This review shows that, despite their apparent simplicity, the development of a general eye detection technique involves addressing many challenges, requires further theoretical developments, and is consequently of interest to many other domains problems in computer vision and beyond.
TL;DR: An EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using a local image descriptor, DAISY, which is very efficient to compute densely and robust against many photometric and geometric transformations.
Abstract: In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired from earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. It is important to note that our approach is the first algorithm that attempts to estimate dense depth maps from wide-baseline image pairs, and we show that it is a good one at that with many experiments for depth estimation accuracy, occlusion detection, and comparing it against other descriptors on laser-scanned ground truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations and our experiments support our claim to being robust against these.
TL;DR: This paper analyzes the statistical properties, bias and variance, of the k-fold cross-validation classification error estimator (k-cv) and proposes a novel theoretical decomposition of the variance considering its sources of variance: sensitivity to changes in the training set and sensitivity to changed folds.
Abstract: In the machine learning field, the performance of a classifier is usually measured in terms of prediction error. In most real-world problems, the error cannot be exactly calculated and it must be estimated. Therefore, it is important to choose an appropriate estimator of the error. This paper analyzes the statistical properties, bias and variance, of the k-fold cross-validation classification error estimator (k-cv). Our main contribution is a novel theoretical decomposition of the variance of the k-cv considering its sources of variance: sensitivity to changes in the training set and sensitivity to changes in the folds. The paper also compares the bias and variance of the estimator for different values of k. The experimental study has been performed in artificial domains because they allow the exact computation of the implied quantities and we can rigorously specify the conditions of experimentation. The experimentation has been performed for two classifiers (naive Bayes and nearest neighbor), different numbers of folds, sample sizes, and training sets coming from assorted probability distributions. We conclude by including some practical recommendation on the use of k-fold cross validation.
TL;DR: This work considers factorizations of the form X = FGT, and focuses on algorithms in which G is restricted to containing nonnegative entries, but allowing the data matrix X to have mixed signs, thus extending the applicable range of NMF methods.
Abstract: We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FGT, we focus on algorithms in which G is restricted to containing nonnegative entries, but allowing the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
TL;DR: This work divides the problem of detecting pedestrians from images into different processing steps, each with attached responsibilities, and separates the different proposed methods with respect to each processing stage, favoring a comparative viewpoint.
Abstract: Advanced driver assistance systems (ADASs), and particularly pedestrian protection systems (PPSs), have become an active research area aimed at improving traffic safety. The major challenge of PPSs is the development of reliable on-board pedestrian detection systems. Due to the varying appearance of pedestrians (e.g., different clothes, changing size, aspect ratio, and dynamic shape) and the unstructured environment, it is very difficult to cope with the demanded robustness of this kind of system. Two problems arising in this research area are the lack of public benchmarks and the difficulty in reproducing many of the proposed methods, which makes it difficult to compare the approaches. As a result, surveying the literature by enumerating the proposals one--after-another is not the most useful way to provide a comparative point of view. Accordingly, we present a more convenient strategy to survey the different approaches. We divide the problem of detecting pedestrians from images into different processing steps, each with attached responsibilities. Then, the different proposed methods are analyzed and classified with respect to each processing stage, favoring a comparative viewpoint. Finally, discussion of the important topics is presented, putting special emphasis on the future needs and challenges.
TL;DR: Experimental results on the Brodatz and KTH-TIPS2-a texture databases show that WLD impressively outperforms the other widely used descriptors (e.g., Gabor and SIFT), and experimental results on human face detection also show a promising performance comparable to the best known results onThe MIT+CMU frontal face test set, the AR face data set, and the CMU profile test set.
Abstract: Inspired by Weber's Law, this paper proposes a simple, yet very powerful and robust local descriptor, called the Weber Local Descriptor (WLD). It is based on the fact that human perception of a pattern depends not only on the change of a stimulus (such as sound, lighting) but also on the original intensity of the stimulus. Specifically, WLD consists of two components: differential excitation and orientation. The differential excitation component is a function of the ratio between two terms: One is the relative intensity differences of a current pixel against its neighbors, the other is the intensity of the current pixel. The orientation component is the gradient orientation of the current pixel. For a given image, we use the two components to construct a concatenated WLD histogram. Experimental results on the Brodatz and KTH-TIPS2-a texture databases show that WLD impressively outperforms the other widely used descriptors (e.g., Gabor and SIFT). In addition, experimental results on human face detection also show a promising performance comparable to the best known results on the MIT+CMU frontal face test set, the AR face data set, and the CMU profile test set.
TL;DR: A novel approach of face identification by formulating the pattern recognition problem in terms of linear regression, using a fundamental concept that patterns from a single-object class lie on a linear subspace, and introducing a novel Distance-based Evidence Fusion (DEF) algorithm.
Abstract: In this paper, we present a novel approach of face identification by formulating the pattern recognition problem in terms of linear regression. Using a fundamental concept that patterns from a single-object class lie on a linear subspace, we develop a linear model representing a probe image as a linear combination of class-specific galleries. The inverse problem is solved using the least-squares method and the decision is ruled in favor of the class with the minimum reconstruction error. The proposed Linear Regression Classification (LRC) algorithm falls in the category of nearest subspace classification. The algorithm is extensively evaluated on several standard databases under a number of exemplary evaluation protocols reported in the face recognition literature. A comparative study with state-of-the-art algorithms clearly reflects the efficacy of the proposed approach. For the problem of contiguous occlusion, we propose a Modular LRC approach, introducing a novel Distance-based Evidence Fusion (DEF) algorithm. The proposed methodology achieves the best results ever reported for the challenging problem of scarf occlusion.
TL;DR: Compared with existing algorithms, KRR leads to a better generalization than simply storing the examples as has been done in existing example-based algorithms and results in much less noisy images.
Abstract: This paper proposes a framework for single-image super-resolution. The underlying idea is to learn a map from input low-resolution images to target high-resolution images based on example pairs of input and output images. Kernel ridge regression (KRR) is adopted for this purpose. To reduce the time complexity of training and testing for KRR, a sparse solution is found by combining the ideas of kernel matching pursuit and gradient descent. As a regularized solution, KRR leads to a better generalization than simply storing the examples as has been done in existing example-based algorithms and results in much less noisy images. However, this may introduce blurring and ringing artifacts around major edges as sharp changes are penalized severely. A prior model of a generic image class which takes into account the discontinuity property of images is adopted to resolve this problem. Comparison with existing algorithms shows the effectiveness of the proposed method.
TL;DR: It is demonstrated that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model, and the proposed model performs consistently.
Abstract: This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
TL;DR: The complete state-of-the-art techniques in the face image-based age synthesis and estimation topics are surveyed, including existing models, popular algorithms, system performances, technical difficulties, popular face aging databases, evaluation protocols, and promising future directions are provided.
Abstract: Human age, as an important personal trait, can be directly inferred by distinct patterns emerging from the facial appearance. Derived from rapid advances in computer graphics and machine vision, computer-based age synthesis and estimation via faces have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as forensic art, electronic customer relationship management, security control and surveillance monitoring, biometrics, entertainment, and cosmetology. Age synthesis is defined to rerender a face image aesthetically with natural aging and rejuvenating effects on the individual face. Age estimation is defined to label a face image automatically with the exact age (year) or the age group (year range) of the individual face. Because of their particularity and complexity, both problems are attractive yet challenging to computer-based application system designers. Large efforts from both academia and industry have been devoted in the last a few decades. In this paper, we survey the complete state-of-the-art techniques in the face image-based age synthesis and estimation topics. Existing models, popular algorithms, system performances, technical difficulties, popular face aging databases, evaluation protocols, and promising future directions are also provided with systematic discussions.
TL;DR: This paper shows that formulating the problem in a naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust, and it scales well as the number of classes grows.
Abstract: While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well as the number of classes grows. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image data sets containing very significant perspective changes.
TL;DR: The scope of the proposed algorithm goes beyond image analysis and it has the potential to be used for a wide variety of problems for structured prediction problems, including high-level vision and medical image segmentation problems.
Abstract: The notion of using context information for solving high-level vision and medical image segmentation problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with an image appearance model, remains mostly unknown. The current literature using Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) often involves specific algorithm design in which the modeling and computing stages are studied in isolation. In this paper, we propose a learning algorithm, auto-context. Given a set of training images and their corresponding label maps, we first learn a classifier on local image patches. The discriminative probability (or classification confidence) maps created by the learned classifier are then used as context information, in addition to the original image patches, to train a new classifier. The algorithm then iterates until convergence. Auto-context integrates low-level and context information by fusing a large number of low-level appearance features with context and implicit shape information. The resulting discriminative algorithm is general and easy to implement. Under nearly the same parameter settings in training, we apply the algorithm to three challenging vision applications: foreground/background segregation, human body configuration estimation, and scene region labeling. Moreover, context also plays a very important role in medical/brain images where the anatomical structures are mostly constrained to relatively fixed positions. With only some slight changes resulting from using 3D instead of 2D features, the auto-context algorithm applied to brain MRI image segmentation is shown to outperform state-of-the-art algorithms specifically designed for this domain. Furthermore, the scope of the proposed algorithm goes beyond image analysis and it has the potential to be used for a wide variety of problems for structured prediction problems.
TL;DR: Experimental results confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy for validating the learning of domain adaptation classifiers when no true labels for the target--domain instances are available.
Abstract: This paper addresses pattern classification in the framework of domain adaptation by considering methods that solve problems in which training data are assumed to be available only for a source domain different (even if related) from the target domain of (unlabeled) test data. Two main novel contributions are proposed: 1) a domain adaptation support vector machine (DASVM) technique which extends the formulation of support vector machines (SVMs) to the domain adaptation framework and 2) a circular indirect accuracy assessment strategy for validating the learning of domain adaptation classifiers when no true labels for the target--domain instances are available. Experimental results, obtained on a series of two-dimensional toy problems and on two real data sets related to brain computer interface and remote sensing applications, confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy.
TL;DR: The Minutia Cylinder-Code is introduced, a novel representation based on 3D data structures (called cylinders), built from minutiae distances and angles and the feasibility of obtaining a very effective fingerprint recognition implementation for light architectures is demonstrated.
Abstract: In this paper, we introduce the Minutia Cylinder-Code (MCC): a novel representation based on 3D data structures (called cylinders), built from minutiae distances and angles. The cylinders can be created starting from a subset of the mandatory features (minutiae position and direction) defined by standards like ISO/IEC 19794-2 (2005). Thanks to the cylinder invariance, fixed-length, and bit-oriented coding, some simple but very effective metrics can be defined to compute local similarities and to consolidate them into a global score. Extensive experiments over FVC2006 databases prove the superiority of MCC with respect to three well-known techniques and demonstrate the feasibility of obtaining a very effective (and interoperable) fingerprint recognition implementation for light architectures.
TL;DR: A set of kinematic features that are derived from the optical flow for human action recognition in videos, including divergence, vorticity, symmetric and antisymmetric flow fields, and third principal invariant of rate of rotation tensor is proposed.
Abstract: We propose a set of kinematic features that are derived from the optical flow for human action recognition in videos. The set of kinematic features includes divergence, vorticity, symmetric and antisymmetric flow fields, second and third principal invariants of flow gradient and rate of strain tensor, and third principal invariant of rate of rotation tensor. Each kinematic feature, when computed from the optical flow of a sequence of images, gives rise to a spatiotemporal pattern. It is then assumed that the representative dynamics of the optical flow are captured by these spatiotemporal patterns in the form of dominant kinematic trends or kinematic modes. These kinematic modes are computed by performing principal component analysis (PCA) on the spatiotemporal volumes of the kinematic features. For classification, we propose the use of multiple instance learning (MIL) in which each action video is represented by a bag of kinematic modes. Each video is then embedded into a kinematic-mode-based feature space and the coordinates of the video in that space are used for classification using the nearest neighbor algorithm. The qualitative and quantitative results are reported on the benchmark data sets.
TL;DR: The main purpose of this paper is to announce the availability of the UBIRIS.v2 database, a multisession iris images database which singularly contains data captured in the visible wavelength, at-a-distance and on on-the-move.
Abstract: The iris is regarded as one of the most useful traits for biometric recognition and the dissemination of nationwide iris-based recognition systems is imminent. However, currently deployed systems rely on heavy imaging constraints to capture near infrared images with enough quality. Also, all of the publicly available iris image databases contain data correspondent to such imaging constraints and therefore are exclusively suitable to evaluate methods thought to operate on these type of environments. The main purpose of this paper is to announce the availability of the UBIRIS.v2 database, a multisession iris images database which singularly contains data captured in the visible wavelength, at-a-distance (between four and eight meters) and on on-the-move. This database is freely available for researchers concerned about visible wavelength iris recognition and will be useful in accessing the feasibility and specifying the constraints of this type of biometric recognition.
TL;DR: The algorithm is inspired by biological mechanisms of motion-based perceptual grouping and extends a discriminant formulation of center-surround saliency previously proposed for static imagery, and yields a robust, versatile, and fully unsupervised spatiotemporal saliency algorithm, applicable to scenes with highly dynamic backgrounds and moving cameras.
Abstract: A spatiotemporal saliency algorithm based on a center-surround framework is proposed. The algorithm is inspired by biological mechanisms of motion-based perceptual grouping and extends a discriminant formulation of center-surround saliency previously proposed for static imagery. Under this formulation, the saliency of a location is equated to the power of a predefined set of features to discriminate between the visual stimuli in a center and a surround window, centered at that location. The features are spatiotemporal video patches and are modeled as dynamic textures, to achieve a principled joint characterization of the spatial and temporal components of saliency. The combination of discriminant center-surround saliency with the modeling power of dynamic textures yields a robust, versatile, and fully unsupervised spatiotemporal saliency algorithm, applicable to scenes with highly dynamic backgrounds and moving cameras. The related problem of background subtraction is treated as the complement of saliency detection, by classifying nonsalient (with respect to appearance and motion dynamics) points in the visual field as background. The algorithm is tested for background subtraction on challenging sequences, and shown to substantially outperform various state-of-the-art techniques. Quantitatively, its average error rate is almost half that of the closest competitor.
TL;DR: A 3D aging modeling technique is proposed and it is shown how it can be used to compensate for the age variations to improve the face recognition performance.
Abstract: One of the challenges in automatic face recognition is to achieve temporal invariance. In other words, the goal is to come up with a representation and matching scheme that is robust to changes due to facial aging. Facial aging is a complex process that affects both the 3D shape of the face and its texture (e.g., wrinkles). These shape and texture changes degrade the performance of automatic face recognition systems. However, facial aging has not received substantial attention compared to other facial variations due to pose, lighting, and expression. We propose a 3D aging modeling technique and show how it can be used to compensate for the age variations to improve the face recognition performance. The aging modeling technique adapts view-invariant 3D face models to the given 2D face aging database. The proposed approach is evaluated on three different databases (i.g., FG-NET, MORPH, and BROWNS) using FaceVACS, a state-of-the-art commercial face recognition engine.
TL;DR: On the FRVT 2006 and the ICE 2006 data sets, recognition performance was comparable for high-resolution frontal face, 3D face, and the iris images and the best performing algorithms were more accurate than humans on unfamiliar faces.
Abstract: This paper describes the large-scale experimental results from the Face Recognition Vendor Test (FRVT) 2006 and the Iris Challenge Evaluation (ICE) 2006. The FRVT 2006 looked at recognition from high-resolution still frontal face images and 3D face images, and measured performance for still frontal face images taken under controlled and uncontrolled illumination. The ICE 2006 evaluation reported verification performance for both left and right irises. The images in the ICE 2006 intentionally represent a broader range of quality than the ICE 2006 sensor would normally acquire. This includes images that did not pass the quality control software embedded in the sensor. The FRVT 2006 results from controlled still and 3D images document at least an order-of-magnitude improvement in recognition performance over the FRVT 2002. The FRVT 2006 and the ICE 2006 compared recognition performance from high-resolution still frontal face images, 3D face images, and the single-iris images. On the FRVT 2006 and the ICE 2006 data sets, recognition performance was comparable for high-resolution frontal face, 3D face, and the iris images. In an experiment comparing human and algorithms on matching face identity across changes in illumination on frontal face images, the best performing algorithms were more accurate than humans on unfamiliar faces.
TL;DR: A robust subspace separation scheme is developed that deals with practical issues in a unified mathematical framework and gives surprisingly good performance in the presence of the three types of pathological trajectories mentioned above.
Abstract: In this paper, we study the problem of segmenting tracked feature point trajectories of multiple moving objects in an image sequence. Using the affine camera model, this problem can be cast as the problem of segmenting samples drawn from multiple linear subspaces. In practice, due to limitations of the tracker, occlusions, and the presence of nonrigid objects in the scene, the obtained motion trajectories may contain grossly mistracked features, missing entries, or corrupted entries. In this paper, we develop a robust subspace separation scheme that deals with these practical issues in a unified mathematical framework. Our methods draw strong connections between lossy compression, rank minimization, and sparse representation. We test our methods extensively on the Hopkins155 motion segmentation database and other motion sequences with outliers and missing data. We compare the performance of our methods to state-of-the-art motion segmentation methods based on expectation-maximization and spectral clustering. For data without outliers or missing information, the results of our methods are on par with the state-of-the-art results and, in many cases, exceed them. In addition, our methods give surprisingly good performance in the presence of the three types of pathological trajectories mentioned above. All code and results are publicly available at http://perception.csl.uiuc.edu/coding/motion/.
TL;DR: A photometric stereo method designed for surfaces with spatially-varying BRDFs, including surfaces with both varying diffuse and specular properties, yielding accurate rerenderings under novel lighting conditions for a wide variety of objects.
Abstract: This paper describes a photometric stereo method designed for surfaces with spatially-varying BRDFs, including surfaces with both varying diffuse and specular properties. Our optimization-based method builds on the observation that most objects are composed of a small number of fundamental materials by constraining each pixel to be representable by a combination of at most two such materials. This approach recovers not only the shape but also material BRDFs and weight maps, yielding accurate rerenderings under novel lighting conditions for a wide variety of objects. We demonstrate examples of interactive editing operations made possible by our approach.
TL;DR: This paper considers feature selection for data classification in the presence of a huge number of irrelevant features, and proposes a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy.
Abstract: This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
TL;DR: This model represents faces in each age group by a hierarchical And-or graph, in which And nodes decompose a face into parts to describe details crucial for age perception and Or nodes represent large diversity of faces by alternative selections.
Abstract: In this paper, we present a compositional and dynamic model for face aging. The compositional model represents faces in each age group by a hierarchical And-or graph, in which And nodes decompose a face into parts to describe details (e.g., hair, wrinkles, etc.) crucial for age perception and Or nodes represent large diversity of faces by alternative selections. Then a face instance is a transverse of the And-or graph-parse graph. Face aging is modeled as a Markov process on the parse graph representation. We learn the parameters of the dynamic model from a large annotated face data set and the stochasticity of face aging is modeled in the dynamics explicitly. Based on this model, we propose a face aging simulation and prediction algorithm. Inversely, an automatic age estimation algorithm is also developed under this representation. We study two criteria to evaluate the aging results using human perception experiments: (1) the accuracy of simulation: whether the aged faces are perceived of the intended age group, and (2) preservation of identity: whether the aged faces are perceived as the same person. Quantitative statistical analysis validates the performance of our aging model and age estimation algorithm.
TL;DR: This paper introduces a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training, and investigates several measures of graph connectivity with the aim of identifying those best suited for WSD.
Abstract: Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most ?important? node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influences WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.