
Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2002"


Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator representation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, together with a method for combining multiple operators for multiresolution analysis.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator representation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity, as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.
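As a concrete illustration, the rotation-invariant "uniform" operator described above can be sketched in a few lines: a circular neighborhood is thresholded at the center pixel, patterns with at most two 0/1 transitions are coded by their number of ones, and all other patterns share a single "non-uniform" code. This is a minimal sketch using nearest-neighbor sampling rather than the paper's bilinear interpolation; the function name is ours.

```python
import numpy as np

def lbp_riu2(image, P=8, R=1):
    """Rotation-invariant 'uniform' LBP: threshold a circular neighborhood
    at the center pixel; patterns with at most two 0/1 transitions are coded
    by their number of ones, all others share the single code P + 1."""
    h, w = image.shape
    angles = 2 * np.pi * np.arange(P) / P
    dy, dx = -R * np.sin(angles), R * np.cos(angles)
    codes = np.full((h, w), P + 1, dtype=int)
    for y in range(R, h - R):
        for x in range(R, w - R):
            center = image[y, x]
            bits = [int(image[round(y + dy[p]), round(x + dx[p])] >= center)
                    for p in range(P)]
            transitions = sum(bits[p] != bits[(p + 1) % P] for p in range(P))
            codes[y, x] = sum(bits) if transitions <= 2 else P + 1
    # texture descriptor: occurrence histogram of the P + 2 possible codes
    hist = np.bincount(codes[R:h - R, R:w - R].ravel(), minlength=P + 2)
    return codes, hist
```

On a constant patch every neighbor equals the center, so every interior pixel gets the uniform all-ones code P, and the histogram of these codes is the texture feature the abstract refers to.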

14,245 citations


Journal ArticleDOI
TL;DR: It is proved the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and the delineation of arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and to the robust M-estimators of location is also established. Algorithms for two low-level vision tasks - discontinuity-preserving smoothing and image segmentation - are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
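A minimal sketch of the recursive mean shift procedure with a Gaussian kernel (the function name and default parameters are our illustrative choices): each step moves the estimate to the kernel-weighted mean of the data, and the iteration stops at a stationary point of the underlying density estimate.

```python
import numpy as np

def mean_shift(points, x0, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Recursive mean shift with a Gaussian kernel: repeatedly move x to
    the kernel-weighted mean of the data; the iteration stops at a
    stationary point (a mode) of the kernel density estimate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d2 = ((points - x) ** 2).sum(axis=1)          # squared distances
        w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian weights
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new
```

Starting the procedure from different seeds and grouping seeds that converge to the same mode is exactly the clustering use described in the abstract.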

11,727 citations


Journal ArticleDOI
TL;DR: This paper presents work on computing shape models that are computationally fast and invariant to basic transformations like translation, scaling, and rotation, and proposes shape matching using a feature called the shape context, which is descriptive of the shape of the object.
Abstract: We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by: (1) solving for correspondences between points on the two shapes; (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits, and the COIL data set.
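The shape context descriptor itself is easy to sketch: for a chosen point, histogram the positions of all remaining points in log-polar bins. The bin counts and radial edges below are illustrative choices, not the paper's exact configuration, and the function name is ours.

```python
import numpy as np

def shape_context(points, i, n_r=5, n_theta=12):
    """Shape context of point i: a log-polar histogram of where the
    remaining points lie relative to it (radii normalized by their mean
    for scale invariance)."""
    pts = np.asarray(points, dtype=float)
    rel = np.delete(pts, i, axis=0) - pts[i]
    r = np.linalg.norm(rel, axis=1)
    theta = np.arctan2(rel[:, 1], rel[:, 0]) % (2 * np.pi)
    r = r / r.mean()  # normalize radii by the mean distance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.searchsorted(r_edges, r) - 1, 0, n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hist = np.zeros((n_r, n_theta))
    for rb, tb in zip(r_bin, t_bin):
        hist[rb, tb] += 1
    return hist
```

Comparing such histograms (e.g. with a chi-square cost) between all point pairs of two shapes yields the cost matrix whose optimal assignment gives the correspondences described above.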

6,693 citations


Journal ArticleDOI
TL;DR: This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
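For reference, the Lloyd iteration that the filtering algorithm accelerates can be sketched as follows. This is the plain O(nk)-per-iteration version without the kd-tree pruning; names and defaults are ours.

```python
import numpy as np

def lloyd_kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iteration (the step the filtering algorithm speeds up
    with a kd-tree): assign each point to its nearest center, then move
    each center to the mean of its assigned points, until nothing changes."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)                      # assignment step
        new = np.array([points[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])  # update step
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

The filtering algorithm obtains the same fixed point but prunes candidate centers per kd-tree cell, which is why it gets faster as cluster separation grows.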

5,288 citations


Journal ArticleDOI
TL;DR: In this article, the authors categorize and evaluate face detection algorithms and discuss relevant issues such as data collection, evaluation metrics and benchmarking, and conclude with several promising directions for future research.
Abstract: Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of its 3D position, orientation and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

3,894 citations


Journal ArticleDOI
TL;DR: The novelty of the approach is that it does not use a model selection criterion to choose one among a set of preestimated candidate models; instead, it seamlessly integrates estimation and model selection in a single algorithm.
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach.
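The component-annihilation idea can be sketched for a 1-D Gaussian mixture. This is a simplified sketch in the spirit of the algorithm, not the paper's exact MML-based update; the function name and the quantile-based initialization are ours.

```python
import numpy as np

def unsupervised_gmm_1d(x, k_init=10, iters=200):
    """Component-annihilating EM sketch for a 1-D Gaussian mixture: the
    weight update subtracts N/2 from each component's responsibility mass
    (N = free parameters per component, here 2), so starved components are
    driven to zero weight and removed -- estimation and model selection
    happen in a single loop instead of comparing preestimated models."""
    mu = np.quantile(x, np.linspace(0.05, 0.95, k_init))  # spread-out init
    var = np.full(k_init, np.var(x))
    w = np.full(k_init, 1.0 / k_init)
    N = 2.0  # free parameters per component (mean and variance)
    for _ in range(iters):
        # E-step: responsibilities under the current mixture
        p = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        mass = r.sum(axis=0)
        # annihilating weight update: weak components go to zero
        w = np.maximum(0.0, mass - N / 2)
        w /= w.sum()
        if (w == 0).any():  # drop dead components and continue
            keep = w > 0
            mu, var, w = mu[keep], var[keep], w[keep] / w[keep].sum()
            continue
        # M-step for the surviving components
        mu = (r * x[:, None]).sum(axis=0) / mass
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / mass + 1e-6
    return w, mu, var
```

Starting with more components than needed and letting the weight update kill the superfluous ones is also what makes the method insensitive to initialization.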

2,182 citations


Journal ArticleDOI
TL;DR: A face detection algorithm for color images in the presence of varying lighting conditions as well as complex backgrounds is proposed. Based on a novel lighting compensation technique and a nonlinear color transformation, this method detects skin regions over the entire image and generates face candidates based on the spatial arrangement of these skin patches.
Abstract: Human face detection plays an important role in applications such as video surveillance, human computer interface, face recognition, and face image database management. We propose a face detection algorithm for color images in the presence of varying lighting conditions as well as complex backgrounds. Based on a novel lighting compensation technique and a nonlinear color transformation, our method detects skin regions over the entire image and then generates face candidates based on the spatial arrangement of these skin patches. The algorithm constructs eye, mouth, and boundary maps for verifying each face candidate. Experimental results demonstrate successful face detection over a wide range of facial variations in color, position, scale, orientation, 3D pose, and expression in images from several photo collections (both indoors and outdoors).
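As a toy illustration of the skin-detection step, a pixel can be flagged as a skin candidate when its chroma falls in a fixed Cb/Cr box after an RGB-to-YCbCr conversion. The thresholds below are common rule-of-thumb values, not the paper's lighting-compensated nonlinear color model, and the function name is ours.

```python
def skin_mask_ycbcr(pixels):
    """Flag skin-candidate pixels by a fixed chroma box in YCbCr space.
    The 77-127 / 133-173 thresholds are rule-of-thumb values, not the
    paper's model."""
    mask = []
    for r, g, b in pixels:
        # RGB -> chroma channels (ITU-R BT.601 coefficients)
        cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
        cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
        mask.append(77 <= cb <= 127 and 133 <= cr <= 173)
    return mask
```

The resulting binary mask is the raw material for the grouping step: connected skin patches whose spatial arrangement is face-like become candidates for the eye/mouth/boundary-map verification.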

2,075 citations


Journal ArticleDOI
TL;DR: Results indicating that querying for images using Blobworld produces higher precision than does querying using color and texture histograms of the entire image in cases where the image contains distinctive objects are presented.
Abstract: Retrieving images from large and varied collections using image content as a key is a challenging and important problem We present a new image representation that provides a transformation from the raw pixel data to a small set of image regions that are coherent in color and texture This "Blobworld" representation is created by clustering pixels in a joint color-texture-position feature space The segmentation algorithm is fully automatic and has been run on a collection of 10,000 natural images We describe a system that uses the Blobworld representation to retrieve images from this collection An important aspect of the system is that the user is allowed to view the internal representation of the submitted image and the query results Similar systems do not offer the user this view into the workings of the system; consequently, query results from these systems can be inexplicable, despite the availability of knobs for adjusting the similarity metrics By finding image regions that roughly correspond to objects, we allow querying at the level of objects rather than global image properties We present results indicating that querying for images using Blobworld produces higher precision than does querying using color and texture histograms of the entire image in cases where the image contains distinctive objects

1,574 citations


Journal ArticleDOI
TL;DR: An unsupervised feature selection algorithm suitable for data sets large in both dimension and size, based on measuring similarity between features whereby redundancy therein is removed; the method requires no search and is therefore fast.
Abstract: In this article, we describe an unsupervised feature selection algorithm suitable for data sets, large in both dimension and size. The method is based on measuring similarity between features whereby redundancy therein is removed. This does not need any search and, therefore, is fast. A new feature similarity measure, called maximum information compression index, is introduced. The algorithm is generic in nature and has the capability of multiscale representation of data sets. The superiority of the algorithm, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. It is also demonstrated how redundancy and information loss in feature selection can be quantified with an entropy measure.
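The maximum information compression index itself is compact: for a pair of features it is the smallest eigenvalue of their 2x2 covariance matrix, which is zero exactly when the features are perfectly linearly dependent, i.e. maximally redundant. A sketch (the function name is ours):

```python
import numpy as np

def mici(x, y):
    """Maximum information compression index of two features: the smallest
    eigenvalue of their 2x2 covariance matrix. It is zero exactly when the
    features are linearly dependent (fully redundant), and it is invariant
    to rotation of the feature pair."""
    return np.linalg.eigvalsh(np.cov(x, y))[0]  # eigvalsh sorts ascending
```

Feature selection then amounts to repeatedly discarding, within each neighborhood of similar features, those whose MICI to a retained feature is smallest, which is why no combinatorial search is needed.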

1,432 citations


Journal ArticleDOI
TL;DR: This work derives a sequence of analytical results which show that the reconstruction constraints provide less and less useful information as the magnification factor increases, and proposes a super-resolution algorithm which attempts to recognize local features in the low-resolution images and then enhances their resolution in an appropriate manner.
Abstract: Nearly all super-resolution algorithms are based on the fundamental constraints that the super-resolution image should generate low resolution input images when appropriately warped and down-sampled to model the image formation process. (These reconstruction constraints are normally combined with some form of smoothness prior to regularize their solution.) We derive a sequence of analytical results which show that the reconstruction constraints provide less and less useful information as the magnification factor increases. We also validate these results empirically and show that, for large enough magnification factors, any smoothness prior leads to overly smooth results with very little high-frequency content. Next, we propose a super-resolution algorithm that uses a different kind of constraint in addition to the reconstruction constraints. The algorithm attempts to recognize local features in the low-resolution images and then enhances their resolution in an appropriate manner. We call such a super-resolution algorithm a hallucination or recogstruction algorithm. We tried our hallucination algorithm on two different data sets, frontal images of faces and printed Roman text. We obtained significantly better results than existing reconstruction-based algorithms, both qualitatively and in terms of RMS pixel error.

1,418 citations


Journal ArticleDOI
TL;DR: The developments of the last 20 years in the area of vision for mobile robot navigation are surveyed and the cases of navigation using optical flows, using methods from the appearance-based paradigm, and by recognition of specific objects in the environment are discussed.
Abstract: Surveys the developments of the last 20 years in the area of vision for mobile robot navigation. Two major components of the paper deal with indoor navigation and outdoor navigation. For each component, we have further subdivided our treatment of the subject on the basis of structured and unstructured environments. For indoor robots in structured environments, we have dealt separately with the cases of geometrical and topological models of space. For unstructured environments, we have discussed the cases of navigation using optical flows, using methods from the appearance-based paradigm, and by recognition of specific objects in the environment.

Journal ArticleDOI
TL;DR: This article evaluates the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I.
Abstract: In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn's index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
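One of the four indices, the Davies-Bouldin index, can be sketched directly from its definition (lower is better; the function name is ours):

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies-Bouldin index: for each cluster, take the worst-case ratio
    (s_i + s_j) / d(c_i, c_j) of summed within-cluster scatter to
    between-center distance, then average over clusters. Lower values
    indicate a better partition."""
    labels = np.asarray(labels)
    ks = np.unique(labels)
    centers = np.array([points[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(points[labels == k] - centers[i], axis=1).mean()
                        for i, k in enumerate(ks)])
    total = 0.0
    for i in range(len(ks)):
        total += max((scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
                     for j in range(len(ks)) if j != i)
    return total / len(ks)
```

Evaluating such an index over partitions with k = 2, ..., 10 and picking the best value is the model-selection procedure the article studies.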

Journal ArticleDOI
TL;DR: The authors describe how the effects of aging on facial appearance can be explained using learned age transformations and present experimental results to show that reasonably accurate estimates of age can be made for unseen images.
Abstract: The process of aging causes significant alterations in the facial appearance of individuals. When compared with other sources of variation in face images, appearance variation due to aging displays some unique characteristics. Changes in facial appearance due to aging can even affect discriminatory facial features, resulting in deterioration of the ability of humans and machines to identify aged individuals. We describe how the effects of aging on facial appearance can be explained using learned age transformations and present experimental results to show that reasonably accurate estimates of age can be made for unseen images. We also show that we can improve our results by taking into account the fact that different individuals age in different ways and by considering the effect of lifestyle. Our proposed framework can be used for simulating aging effects on new face images in order to predict how an individual might look in the future or how he/she used to look in the past. The methodology presented has also been used for designing a face recognition system robust to aging variation. In this context, the perceived age of the subjects in the training and test images is normalized before the training and classification procedure so that aging variation is eliminated. Experimental results demonstrate that, when age normalization is used, the performance of our face recognition system can be improved.

Journal ArticleDOI
TL;DR: A probabilistic approach that is able to compensate for imprecisely localized, partially occluded, and expression-variant faces even when only one single training sample per class is available to the system.
Abstract: The classical way of attempting to solve the face (or object) recognition problem is by using large and representative data sets. In many applications, though, only one sample per class is available to the system. In this contribution, we describe a probabilistic approach that is able to compensate for imprecisely localized, partially occluded, and expression-variant faces even when only one single training sample per class is available to the system. To solve the localization problem, we find the subspace (within the feature space, e.g., eigenspace) that represents this error for each of the training images. To resolve the occlusion problem, each face is divided into k local regions which are analyzed in isolation. In contrast with other approaches where a simple voting space is used, we present a probabilistic method that analyzes how "good" a local match is. To make the recognition system less sensitive to the differences between the facial expression displayed on the training and the testing images, we weight the results obtained on each local area on the basis of how much of this local area is affected by the expression displayed on the current test image.

Journal ArticleDOI
TL;DR: This work states that the FVC2000 protocol, databases, and results will be useful to all practitioners in the field not only as a benchmark for improving methods, but also for enabling an unbiased evaluation of algorithms.
Abstract: Reliable and accurate fingerprint recognition is a challenging pattern recognition problem, requiring algorithms robust in many contexts. The FVC2000 competition attempted to establish the first common benchmark, allowing companies and academic institutions to unambiguously compare performance and track improvements in their fingerprint recognition algorithms. Three databases were created using different state-of-the-art sensors and a fourth database was artificially generated; 11 algorithms were extensively tested on the four data sets. We believe that the FVC2000 protocol, databases, and results will be useful to all practitioners in the field not only as a benchmark for improving methods, but also for enabling an unbiased evaluation of algorithms.

Journal ArticleDOI
TL;DR: Formulas for the classification error for the following fusion methods: average, minimum, maximum, median, majority vote, and oracle are given.
Abstract: We look at a single point in feature space, two classes, and L classifiers estimating the posterior probability for class ω_1. Assuming that the estimates are independent and identically distributed (normal or uniform), we give formulas for the classification error for the following fusion methods: average, minimum, maximum, median, majority vote, and oracle.
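The fusion rules under study are straightforward to state in code. The sketch below combines L posterior estimates and decides for the class whose combined estimate exceeds 0.5; the oracle rule is omitted because it requires the true label (it is correct whenever any individual classifier is).

```python
import statistics

def fuse(estimates, rule):
    """Fuse L classifiers' posterior estimates for one class with a fixed
    rule; the fused decision is that class when the result exceeds 0.5."""
    if rule == "majority":
        # vote on the individual hard decisions rather than the estimates
        return sum(e > 0.5 for e in estimates) > len(estimates) / 2
    if rule == "average":
        v = sum(estimates) / len(estimates)
    elif rule == "minimum":
        v = min(estimates)
    elif rule == "maximum":
        v = max(estimates)
    elif rule == "median":
        v = statistics.median(estimates)
    else:
        raise ValueError(rule)
    return v > 0.5
```

The paper's contribution is analytical: closed-form error rates for these rules under i.i.d. normal or uniform estimation error, of which the code above is only the decision mechanism.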

Journal ArticleDOI
TL;DR: The paper describes how this tracking system has been extended to provide a general framework for tracking in complex configurations and a visual servoing system constructed using this framework is presented together with results showing the accuracy of the tracker.
Abstract: Presents a framework for three-dimensional model-based tracking. Graphical rendering technology is combined with constrained active contour tracking to create a robust wire-frame tracking system. It operates in real time at video frame rate (25 Hz) on standard hardware. It is based on an internal CAD model of the object to be tracked which is rendered using a binary space partition tree to perform hidden line removal. A Lie group formalism is used to cast the motion computation problem into simple geometric terms so that tracking becomes a simple optimization problem solved by means of iterative reweighted least squares. A visual servoing system constructed using this framework is presented together with results showing the accuracy of the tracker. The paper then describes how this tracking system has been extended to provide a general framework for tracking in complex configurations. The adjoint representation of the group is used to transform measurements into common coordinate frames. The constraints are then imposed by means of Lagrange multipliers. Results from a number of experiments performed using this framework are presented and discussed.

Journal ArticleDOI
TL;DR: A set of real-world problems is compared to random labelings of points, and it is found that real problems contain structures in this measurement space that are significantly different from the random sets.
Abstract: We studied a number of measures that characterize the difficulty of a classification problem, focusing on the geometrical complexity of the class boundary. We compared a set of real-world problems to random labelings of points and found that real problems contain structures in this measurement space that are significantly different from the random sets. Distributions of problems in this space show that there exist at least two independent factors affecting a problem's difficulty. We suggest using this space to describe a classifier's domain of competence. This can guide static and dynamic selection of classifiers for specific problems as well as subproblems formed by confinement, projection, and transformations of the feature vectors.
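One simple measure of this kind is Fisher's discriminant ratio for a single feature: the squared distance between the class means divided by the sum of the class variances, with larger values indicating an easier (more separable) problem along that feature. A sketch (the function name is ours):

```python
import statistics

def fisher_ratio(class_a, class_b):
    """Fisher's discriminant ratio for one feature: squared distance
    between class means over the sum of class variances; larger values
    mean the classes are easier to separate along this feature."""
    mu_a, mu_b = statistics.mean(class_a), statistics.mean(class_b)
    var_a, var_b = statistics.pvariance(class_a), statistics.pvariance(class_b)
    return (mu_a - mu_b) ** 2 / (var_a + var_b)
```

Collecting several such measures per problem places each data set as a point in the "measurement space" the abstract describes, where real problems separate from random labelings.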

Journal ArticleDOI
TL;DR: Nonlinear support vector machines are investigated for appearance-based gender classification with low-resolution "thumbnail" faces processed from the FERET (FacE REcognition Technology) face database, demonstrating robustness and stability with respect to scale and the degree of facial detail.
Abstract: Nonlinear support vector machines (SVMs) are investigated for appearance-based gender classification with low-resolution "thumbnail" faces processed from 1,755 images from the FERET (FacE REcognition Technology) face database. The performance of SVMs (3.4% error) is shown to be superior to traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, nearest-neighbor) as well as more modern techniques, such as radial basis function (RBF) classifiers and large ensemble-RBF networks. Furthermore, the difference in classification performance with low-resolution "thumbnails" (21×12 pixels) and the corresponding higher-resolution images (84×48 pixels) was found to be only 1%, thus demonstrating robustness and stability with respect to scale and the degree of facial detail.

Journal ArticleDOI
TL;DR: A new method of calculating mutual information between input and class variables based on the Parzen window is proposed, and this is applied to a feature selection algorithm for classification problems.
Abstract: Mutual information is a good indicator of relevance between variables, and has been used as a measure in several feature selection algorithms. However, calculating the mutual information is difficult, and the performance of a feature selection algorithm depends on the accuracy of the estimated mutual information. In this paper, we propose a new method of calculating mutual information between input and class variables based on the Parzen window, and we apply this to a feature selection algorithm for classification problems.
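A sketch of the idea for a scalar feature and discrete class labels (the function name and bandwidth are our illustrative choices): estimate p(x|c) with Gaussian Parzen windows over each class's samples, then compute I(X;C) = H(C) - H(C|X) by averaging the conditional entropy of the class posterior over the sample.

```python
import math

def parzen_mi(x, y, h=0.5):
    """Parzen-window estimate of I(X;C) between a scalar feature x and
    discrete labels y: class-conditional densities are kernel estimates
    centered on the samples of each class, and I = H(C) - H(C|X) is
    computed by averaging the posterior entropy over the sample."""
    n = len(x)
    classes = sorted(set(y))
    prior = {c: sum(1 for t in y if t == c) / n for c in classes}

    def kernel(u):  # Gaussian Parzen window with bandwidth h
        return math.exp(-u * u / (2 * h * h))

    def p_joint(xi, c):  # p(x, c) from the class-c samples
        return sum(kernel(xi - xj) for xj, cj in zip(x, y) if cj == c) / n

    h_c = -sum(p * math.log2(p) for p in prior.values())
    h_c_given_x = 0.0
    for xi in x:  # sample average of the conditional entropy H(C|X=xi)
        joint = [p_joint(xi, c) for c in classes]
        px = sum(joint)
        post = [j / px for j in joint]
        h_c_given_x -= sum(p * math.log2(p) for p in post if p > 0) / n
    return h_c - h_c_given_x
```

Features are then ranked (or greedily selected) by this estimate, which is the use the abstract describes.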

Journal ArticleDOI
TL;DR: The DDMCMC paradigm provides a unifying framework in which the roles of many existing segmentation algorithms are revealed as either realizing Markov chain dynamics or computing importance proposal probabilities, and generalizes these segmentation methods in a principled way.
Abstract: This paper presents a computational paradigm called Data-Driven Markov Chain Monte Carlo (DDMCMC) for image segmentation in the Bayesian statistical framework. The paper contributes to image segmentation in four aspects. First, it designs efficient and well-balanced Markov Chain dynamics to explore the complex solution space and, thus, achieves a nearly global optimal solution independent of initial segmentations. Second, it presents a mathematical principle and a K-adventurers algorithm for computing multiple distinct solutions from the Markov chain sequence and, thus, it incorporates intrinsic ambiguities in image segmentation. Third, it utilizes data-driven (bottom-up) techniques, such as clustering and edge detection, to compute importance proposal probabilities, which drive the Markov chain dynamics and achieve tremendous speedup in comparison to the traditional jump-diffusion methods. Fourth, the DDMCMC paradigm provides a unifying framework in which the roles of many existing segmentation algorithms, such as edge detection, clustering, region growing, split-merge, snake/balloon, and region competition, are revealed as either realizing Markov chain dynamics or computing importance proposal probabilities. Thus, the DDMCMC paradigm combines and generalizes these segmentation methods in a principled way. The DDMCMC paradigm adopts seven parametric and nonparametric image models for intensity and color at various regions. We test the DDMCMC paradigm extensively on both color and gray-level images and some results are reported in this paper.

Journal ArticleDOI
TL;DR: A very efficient algorithm is proposed that extracts singular points from the high-resolution directional field of fingerprints and provides a consistent binary decision that is not based on postprocessing steps like applying a threshold on a continuous resemblance measure for singular points.
Abstract: The first subject of the paper is the estimation of a high resolution directional field of fingerprints. Traditional methods are discussed and a method, based on principal component analysis, is proposed. The method not only computes the direction in any pixel location, but its coherence as well. It is proven that this method provides exactly the same results as the "averaged square-gradient method" that is known from literature. Undoubtedly, the existence of a completely different equivalent solution increases the insight into the problem's nature. The second subject of the paper is singular point detection. A very efficient algorithm is proposed that extracts singular points from the high-resolution directional field. The algorithm is based on the Poincare index and provides a consistent binary decision that is not based on postprocessing steps like applying a threshold on a continuous resemblance measure for singular points. Furthermore, a method is presented to estimate the orientation of the extracted singular points. The accuracy of the methods is illustrated by experiments on a live-scanned fingerprint database.
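The Poincaré index computation at the heart of the singular point detector can be sketched as follows: sum the wrapped differences of the orientation angles (defined modulo π) around a closed loop of pixels; a core yields +1/2, a delta -1/2, and a regular point 0. The function below is a generic sketch, not the paper's exact implementation on the high-resolution directional field.

```python
import math

def poincare_index(angles):
    """Poincare index of a closed loop of ridge orientations (radians,
    defined modulo pi): the net rotation around the loop divided by 2*pi.
    A core yields +0.5, a delta -0.5, and a regular point 0."""
    total = 0.0
    for a, b in zip(angles, angles[1:] + angles[:1]):
        d = b - a
        while d > math.pi / 2:    # wrap differences into (-pi/2, pi/2]
            d -= math.pi
        while d <= -math.pi / 2:
            d += math.pi
        total += d
    return total / (2 * math.pi)
```

The paper's contribution is that thresholding is unnecessary: because the index is quantized to multiples of 1/2, the detection decision is inherently binary.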

Journal ArticleDOI
TL;DR: Experimental results suggest that using invariant features decreases the probability of being trapped in a local minimum and may be an effective solution for difficult range image registration problems where the scene is very small compared to the model.
Abstract: Investigates the use of Euclidean invariant features in a generalization of iterative closest point (ICP) registration of range images. Pointwise correspondences are chosen as the closest point with respect to a weighted linear combination of positional and feature distances. It is shown that, under ideal noise-free conditions, correspondences formed using this distance function are correct more often than correspondences formed using the positional distance alone. In addition, monotonic convergence to at least a local minimum is shown to hold for this method. When noise is present, a method that automatically sets the optimal relative contribution of features and positions is described. This method trades off the error in feature values due to noise against the error in positions due to misalignment. Experimental results suggest that using invariant features decreases the probability of being trapped in a local minimum and may be an effective solution for difficult range image registration problems where the scene is very small compared to the model.
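The modified correspondence step can be sketched directly: for each model point, pick the scene point minimizing a weighted combination of squared positional and squared feature distance. The single weight alpha below is a simplification of the paper's automatically tuned trade-off, and the names are ours.

```python
def closest_with_features(p, p_feat, scene, scene_feats, alpha=1.0):
    """Correspondence step of feature-augmented ICP: choose the scene point
    minimizing squared positional distance plus alpha times squared feature
    distance; alpha trades feature error against positional error."""
    def cost(q, q_feat):
        pos = sum((a - b) ** 2 for a, b in zip(p, q))
        feat = sum((a - b) ** 2 for a, b in zip(p_feat, q_feat))
        return pos + alpha * feat
    return min(range(len(scene)), key=lambda i: cost(scene[i], scene_feats[i]))
```

With alpha = 0 this reduces to standard ICP's nearest-point rule; a positive alpha lets a distinctive feature value override a slightly closer but wrong positional match, which is what reduces the incidence of local minima.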

Journal ArticleDOI
TL;DR: This paper presents the first example of a general system for autonomous localization using active vision, enabled here by a high-performance stereo head, addressing such issues as uncertainty-based measurement selection, automatic map-maintenance, and goal-directed steering.
Abstract: An active approach to sensing can provide the focused measurement capability over a wide field of view which allows correctly formulated simultaneous localization and map-building (SLAM) to be implemented with vision, permitting repeatable long-term localization using only naturally occurring, automatically-detected features. In this paper, we present the first example of a general system for autonomous localization using active vision, enabled here by a high-performance stereo head, addressing such issues as uncertainty-based measurement selection, automatic map-maintenance, and goal-directed steering. We present varied real-time experiments in a complex environment.

Journal ArticleDOI
TL;DR: In this paper, the authors address the problem of fingerprint individuality by quantifying the amount of information available in minutiae features to establish a correspondence between two fingerprint images, and derive an expression which estimates the probability of a false correspondence between minutia-based representations from two arbitrary fingerprints belonging to different fingers.
Abstract: Fingerprint identification is based on two basic premises: (1) persistence and (2) individuality. We address the problem of fingerprint individuality by quantifying the amount of information available in minutiae features to establish a correspondence between two fingerprint images. We derive an expression which estimates the probability of a false correspondence between minutiae-based representations from two arbitrary fingerprints belonging to different fingers. Our results show that (1) contrary to the popular belief, fingerprint matching is not infallible and leads to some false associations, (2) while there is an overwhelming amount of discriminatory information present in the fingerprints, the strength of the evidence degrades drastically with noise in the sensed fingerprint images, (3) the performance of the state-of-the-art automatic fingerprint matchers is not even close to the theoretical limit, and (4) because automatic fingerprint verification systems based on minutiae use only a part of the discriminatory information present in the fingerprints, it may be desirable to explore additional complementary representations of fingerprints for automatic matching.

Journal ArticleDOI
Danny Barash1
TL;DR: In this paper, the relationship between bilateral filtering and anisotropic diffusion is examined, and adaptive smoothing is extended to make it consistent, thus enabling a unified viewpoint that relates nonlinear digital image filters and the nonlinear diffusion equation.
Abstract: In this paper, the relationship between bilateral filtering and anisotropic diffusion is examined. The bilateral filtering approach represents a large class of nonlinear digital image filters. We first explore the connection between anisotropic diffusion and adaptive smoothing, and then the connection between adaptive smoothing and bilateral filtering. Previously, adaptive smoothing was considered to be an inconsistent approximation to the nonlinear diffusion equation. We extend adaptive smoothing to make it consistent, thus enabling a unified viewpoint that relates nonlinear digital image filters and the nonlinear diffusion equation.
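As a concrete reference point for the class of filters the paper relates to diffusion, a bilateral filter weights each neighbor by both spatial closeness and intensity similarity, so averaging stops at large jumps. A minimal 1D sketch (parameter values are illustrative):

```python
import math

def bilateral_1d(signal, sigma_s=2.0, sigma_r=0.2, radius=3):
    """1D bilateral filter: each output sample is a normalized average of
    its neighbors, weighted by spatial distance AND intensity difference,
    so noise is smoothed while edges are preserved."""
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                 * math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

# A noisy step edge: the filter flattens the noise but keeps the step.
step = [0.0, 0.05, -0.03, 0.02, 1.0, 0.97, 1.04, 1.0]
filtered = bilateral_1d(step)
```

The intensity term (sigma_r) is what distinguishes this from plain Gaussian smoothing, and it is exactly the ingredient the paper connects to the edge-stopping behavior of anisotropic diffusion.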

Journal ArticleDOI
TL;DR: Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared; two are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively.
Abstract: The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as those of the head, convey additional information. Humans integrate speech cues from many sources, and this improves intelligibility, especially when the acoustic signal is degraded. The paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape or shape and appearance, respectively. The third, bottom-up, method uses a nonlinear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multitalker visual speech recognition task of isolated letters.
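The top-down parameterizations above rest on ordinary principal component analysis of stacked contour coordinates. A minimal sketch of that step using power iteration on synthetic shape vectors (the data and dimensionality are stand-ins, not the paper's lip contours):

```python
import random

def top_component(X, iters=200):
    """Leading principal component of the row-vectors in X, found by
    power iteration on the mean-centered covariance."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    v = [1.0] * d
    for _ in range(iters):
        # w = (Xc^T Xc) v, computed as Xc^T (Xc v)
        proj = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        w = [sum(proj[i] * Xc[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def project(X, mean, v):
    """1D PCA feature: signed coordinate of each shape along v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v)))
            for row in X]

# Synthetic "shape vectors" that vary mostly along direction (1, 2, 0.5, 0).
random.seed(0)
X = [[t * c + 0.01 * random.gauss(0, 1) for c in (1.0, 2.0, 0.5, 0.0)]
     for t in [random.gauss(0, 1) for _ in range(50)]]
mean, v = top_component(X)
features = project(X, mean, v)
```

In the lipreading setting, each row would hold the concatenated (x, y) coordinates of the fitted inner and outer lip contours, and a handful of such components per frame become the observation vectors for the hidden Markov models.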

Journal ArticleDOI
TL;DR: This research demonstrates that the LEM, together with the proposed generic line-segment Hausdorff distance measure, provides a new method for face coding and recognition.
Abstract: The automatic recognition of human faces presents a significant challenge to the pattern recognition research community. Typically, human faces are very similar in structure, with minor differences from person to person; they all fall within one class of "human face". Furthermore, changing lighting conditions, facial expressions, and pose variations further complicate face recognition, making it one of the difficult problems in pattern analysis. This paper proposes a novel concept: namely, that faces can be recognized using a line edge map (LEM). The LEM, a compact face feature, is generated for face coding and recognition. A thorough investigation of the proposed concept is conducted which covers all aspects of human face recognition, i.e., face recognition under (1) controlled/ideal conditions and size variations, (2) varying lighting conditions, (3) varying facial expressions, and (4) varying pose. The system performance is also compared with the eigenface method, one of the best face recognition techniques, and with reported experimental results of other methods. A face pre-filtering technique is proposed to speed up the search process. It is very encouraging to find that the proposed face recognition technique has performed better than the eigenface method in most of the comparison experiments. This research demonstrates that the LEM, together with the proposed generic line-segment Hausdorff distance measure, provides a new method for face coding and recognition.
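The Hausdorff distance underlying the paper's line-segment measure is, for plain point sets, the largest of all nearest-neighbor distances between the two sets. A minimal point-set sketch (the paper's generalization to line segments adds orientation and displacement terms not shown here):

```python
import math

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of the distance from a to its
    nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two 2D point sets:
    small only when every point of each set is near the other set."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
shifted = [(0.5, 0), (1.5, 0), (0.5, 1), (1.5, 1)]
d = hausdorff(square, shifted)  # every point is 0.5 from its counterpart
```

Matching edge maps rather than raw pixels is what gives the method its robustness to lighting: illumination changes the gray levels, but the geometry of the edge set, and hence this distance, changes far less.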

Journal ArticleDOI
TL;DR: This work applies the multiresolution wavelet transform to extract the waveletface and performs linear discriminant analysis on waveletfaces to reinforce discriminant power.
Abstract: Feature extraction, discriminant analysis, and classification rules are three crucial issues for face recognition. We present hybrid approaches to handle the three issues together. For feature extraction, we apply the multiresolution wavelet transform to extract the waveletface. We also perform linear discriminant analysis on waveletfaces to reinforce discriminant power. During classification, the nearest feature plane (NFP) and nearest feature space (NFS) classifiers are explored for robust decisions in the presence of wide facial variations. Their relationships to conventional nearest neighbor and nearest feature line classifiers are demonstrated. In the experiments, the discriminant waveletface incorporated with the NFS classifier achieves the best face recognition performance.
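A "waveletface" in this scheme is essentially the low-frequency subband of a wavelet decomposition of the face image. A minimal sketch of one 2D Haar decomposition level as a stand-in (the paper's choice of wavelet and decomposition depth may differ):

```python
def haar_level(img):
    """One 2D Haar level: split an even-sized grayscale image (list of
    rows) into half-size approximation (LL) and detail (LH, HL, HH)
    subbands. Repeating this on LL yields a multiresolution pyramid."""
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 4)  # local average
            lh_row.append((a + b - c - d) / 4)  # horizontal-edge detail
            hl_row.append((a - b + c - d) / 4)  # vertical-edge detail
            hh_row.append((a - b - c + d) / 4)  # diagonal detail
        LL.append(ll_row)
        LH.append(lh_row)
        HL.append(hl_row)
        HH.append(hh_row)
    return LL, LH, HL, HH

# A flat image keeps all its energy in the approximation subband.
flat = [[5.0] * 4 for _ in range(4)]
LL, LH, HL, HH = haar_level(flat)
```

The low-frequency LL subband retains the coarse facial structure at a fraction of the original dimensionality, which is what makes the subsequent linear discriminant analysis tractable and stable.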

Journal ArticleDOI
TL;DR: A fuzzy logic approach, UFM (unified feature matching), for region-based image retrieval, which greatly reduces the influence of inaccurate segmentation and provides a very intuitive quantification.
Abstract: This paper proposes a fuzzy logic approach, UFM (unified feature matching), for region-based image retrieval. In our retrieval system, an image is represented by a set of segmented regions, each of which is characterized by a fuzzy feature (fuzzy set) reflecting color, texture, and shape properties. As a result, an image is associated with a family of fuzzy features corresponding to regions. Fuzzy features naturally characterize the gradual transition between regions (blurry boundaries) within an image and incorporate the segmentation-related uncertainties into the retrieval algorithm. The resemblance of two images is then defined as the overall similarity between two families of fuzzy features and quantified by a similarity measure, the UFM measure, which integrates properties of all the regions in the images. Compared with similarity measures based on individual regions and on all regions with crisp-valued feature representations, the UFM measure greatly reduces the influence of inaccurate segmentation and provides a very intuitive quantification. The UFM has been implemented as a part of our experimental SIMPLIcity image retrieval system. The performance of the system is illustrated using examples from an image database of about 60,000 general-purpose images.
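To make the "fuzzy feature" idea concrete: a region can be represented by a membership function that peaks at the region's feature centroid and decays with feature-space distance, and two regions compared by how strongly their memberships overlap. A hedged sketch with a Cauchy-form membership (the functional form, parameters, and midpoint evaluation here are illustrative assumptions, not the paper's exact definitions):

```python
def membership(x, center, width=1.0, alpha=2.0):
    """Cauchy-form fuzzy membership: 1 at the region's feature
    centroid, decaying smoothly with distance in feature space."""
    dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center)) ** 0.5
    return 1.0 / (1.0 + (dist / width) ** alpha)

def fuzzy_similarity(c1, c2, width=1.0, alpha=2.0):
    """Similarity of two fuzzy features: the peak of the pointwise min
    of their memberships. For this symmetric form the peak is attained
    at the midpoint of the two centroids."""
    mid = [(a + b) / 2 for a, b in zip(c1, c2)]
    return min(membership(mid, c1, width, alpha),
               membership(mid, c2, width, alpha))

identical = fuzzy_similarity([0.2, 0.4], [0.2, 0.4])  # exact overlap
far_apart = fuzzy_similarity([0.0, 0.0], [4.0, 0.0])  # weak overlap
```

Because the memberships decay gradually, a region whose centroid shifts slightly under a different segmentation still overlaps strongly with its counterpart, which is how the fuzzy formulation absorbs segmentation errors that would break a crisp region-to-region match.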