
Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 1998"


Journal ArticleDOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

10,525 citations
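The center-surround mechanism behind the saliency map can be sketched in a few lines (a hypothetical simplification, not the authors' implementation, which uses Gaussian pyramids over color, intensity, and orientation channels): coarse-scale local averages are subtracted from the fine-scale image, the absolute differences are accumulated across scales into a single saliency map, and the map's maximum gives the first attended location.

```python
def box_downsample(img, k):
    """Average-pool a 2D grid (list of lists) by an integer factor k."""
    h, w = len(img) // k, len(img[0]) // k
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = sum(img[i * k + di][j * k + dj]
                    for di in range(k) for dj in range(k))
            out[i][j] = s / (k * k)
    return out

def saliency_map(img, scales=(2, 4)):
    """Center-surround saliency: |fine - coarse| accumulated over scales."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for k in scales:
        coarse = box_downsample(img, k)
        for i in range(h):
            for j in range(w):
                sal[i][j] += abs(img[i][j] - coarse[i // k][j // k])
    return sal

def most_salient(img):
    """Location that a winner-take-all selection would attend first."""
    sal = saliency_map(img)
    return max(((i, j) for i in range(len(sal)) for j in range(len(sal[0]))),
               key=lambda p: sal[p[0]][p[1]])
```

A single bright pixel on a dark background is the canonical conspicuous location: `most_salient` returns its coordinates, since it differs most from its coarse-scale surround.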


Journal ArticleDOI
Tin Kam Ho1
TL;DR: A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.
Abstract: Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.

5,984 citations
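The random subspace construction can be illustrated with a toy sketch (hypothetical code; one-level decision stumps stand in for full trees): each member of the forest is trained using only a pseudorandomly chosen subset of feature dimensions, and the forest predicts by majority vote.

```python
import random

def train_stump(X, y, dims):
    """Best single-feature threshold rule restricted to the allowed dims."""
    best_acc, best = -1, None
    for d in dims:
        for t in set(x[d] for x in X):
            for sign in (1, 0):           # allow either polarity
                pred = [sign if x[d] > t else 1 - sign for x in X]
                acc = sum(p == yi for p, yi in zip(pred, y))
                if acc > best_acc:
                    best_acc, best = acc, (d, t, sign)
    return best

def stump_predict(stump, x):
    d, t, sign = stump
    return sign if x[d] > t else 1 - sign

def random_subspace_forest(X, y, n_trees=5, subspace=2, seed=1):
    """Each stump sees only a random subset of feature dimensions."""
    rng = random.Random(seed)
    n_dims = len(X[0])
    return [train_stump(X, y, rng.sample(range(n_dims), subspace))
            for _ in range(n_trees)]

def forest_predict(forest, x):
    """Majority vote over the forest."""
    votes = sum(stump_predict(s, x) for s in forest)
    return 1 if votes * 2 > len(forest) else 0
```

Because each tree fits the training data within its own subspace, the forest keeps perfect training accuracy while the vote smooths out individual trees' quirks on new points.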


Journal ArticleDOI
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions-the sum rule-outperforms other classifier combination schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.

5,670 citations
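The sum rule itself is simple to state in code (a minimal sketch, assuming each classifier outputs a posterior estimate per class): combined scores are the sums of the per-classifier posteriors, and the decision is the class with the largest combined score.

```python
import math

def combine_sum(posteriors):
    """Sum rule: add class posteriors across classifiers, pick the argmax."""
    n_classes = len(posteriors[0])
    scores = [sum(p[c] for p in posteriors) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)

def combine_product(posteriors):
    """Product rule, for comparison: multiply posteriors across classifiers."""
    n_classes = len(posteriors[0])
    scores = [math.prod(p[c] for p in posteriors) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)
```

Note that the sum rule can disagree with a plain majority vote: with posteriors `[0.6, 0.4]`, `[0.3, 0.7]`, `[0.55, 0.45]`, two of three classifiers prefer class 0, but the summed evidence (1.45 vs. 1.55) favors class 1 because the dissenting classifier is the most confident.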


Journal ArticleDOI
TL;DR: A neural network-based upright frontal face detection system that arbitrates between multiple networks to improve performance over a single network, and a straightforward procedure for aligning positive face examples for training.
Abstract: We present a neural network-based upright frontal face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting nonface training examples, which must be chosen to span the entire space of nonface images. Simple heuristics, such as using the fact that faces rarely overlap in images, can further improve the accuracy. Comparisons with several other state-of-the-art face detection systems are presented, showing that our system has comparable performance in terms of detection and false-positive rates.

4,105 citations
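The bootstrap procedure for collecting negatives can be sketched abstractly (hypothetical toy code: a one-dimensional scalar feature and a midpoint threshold stand in for image windows and the neural network): train on a small negative set, scan known nonface data, and feed any false detections back in as new negatives.

```python
def train(pos, neg):
    """Toy stand-in for network training: threshold halfway between
    the weakest positive and the strongest negative seen so far."""
    return (min(pos) + max(neg)) / 2.0

def detect(t, x):
    """Declare 'face' when the scalar feature exceeds the threshold."""
    return x > t

def bootstrap_train(pos, nonface_pool, n_rounds=5):
    neg = nonface_pool[:1]            # begin with a tiny negative set
    t = train(pos, neg)
    for _ in range(n_rounds):
        # false detections on known nonface data become new negatives
        false_pos = [x for x in nonface_pool if detect(t, x) and x not in neg]
        if not false_pos:
            break
        neg.extend(false_pos)
        t = train(pos, neg)
    return t
```

The point of the loop is that hard negatives select themselves: only the nonface examples the current detector actually confuses with faces are added, so there is no need to hand-pick a training set that spans the whole space of nonface images.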


Journal ArticleDOI
TL;DR: A fast fingerprint enhancement algorithm is presented, which can adaptively improve the clarity of ridge and valley structures of input fingerprint images based on the estimated local ridge orientation and frequency.
Abstract: In order to ensure that the performance of an automatic fingerprint identification/verification system will be robust with respect to the quality of input fingerprint images, it is essential to incorporate a fingerprint enhancement algorithm in the minutiae extraction module. We present a fast fingerprint enhancement algorithm, which can adaptively improve the clarity of ridge and valley structures of input fingerprint images based on the estimated local ridge orientation and frequency. We have evaluated the performance of the image enhancement algorithm using the goodness index of the extracted minutiae and the accuracy of an online fingerprint verification system. Experimental results show that incorporating the enhancement algorithm improves both the goodness index and the verification accuracy.

2,212 citations


Journal ArticleDOI
TL;DR: An example-based learning approach for locating vertical frontal views of human faces in complex scenes and shows empirically that the distance metric adopted for computing difference feature vectors, and the "nonface" clusters included in the distribution-based model, are both critical for the success of the system.
Abstract: We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face" and "nonface" model clusters. At each image location, a difference feature vector is computed between the local image pattern and the distribution-based model. A trained classifier determines, based on the difference feature vector measurements, whether or not a human face exists at the current image location. We show empirically that the distance metric we adopt for computing difference feature vectors, and the "nonface" clusters we include in our distribution-based model, are both critical for the success of our system.

2,013 citations


Journal ArticleDOI
TL;DR: Two real-time hidden Markov model-based systems for recognizing sentence-level continuous American sign language (ASL) using a single camera to track the user's unadorned hands are presented.
Abstract: We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American sign language (ASL) using a single camera to track the user's unadorned hands. The first system observes the user from a desk mounted camera and achieves 92 percent word accuracy. The second system mounts the camera in a cap worn by the user and achieves 98 percent accuracy (97 percent with an unrestricted grammar). Both experiments use a 40-word lexicon.

1,350 citations


Journal ArticleDOI
TL;DR: This work develops a computationally efficient method for handling the geometric distortions produced by changes in pose and combines geometry and illumination into an algorithm that tracks large image regions using no more computation than would be required to track with no accommodation for illumination changes.
Abstract: As an object moves through the field of view of a camera, the images of the object may change dramatically. This is not simply due to the translation of the object across the image plane; complications arise due to the fact that the object undergoes changes in pose relative to the viewing camera, in illumination relative to light sources, and may even become partially or fully occluded. We develop an efficient general framework for object tracking, which addresses each of these complications. We first develop a computationally efficient method for handling the geometric distortions produced by changes in pose. We then combine geometry and illumination into an algorithm that tracks large image regions using no more computation than would be required to track with no accommodation for illumination changes. Finally, we augment these methods with techniques from robust statistics and treat occluded regions on the object as statistical outliers. Experimental results are given to demonstrate the effectiveness of our methods.

1,261 citations


Journal ArticleDOI
TL;DR: By analyzing the scale-space behavior of a model line profile, it is shown how the bias that is induced by asymmetrical lines can be removed and the algorithm not only returns the precise subpixel line position, but also the width of the line for each line point, also with subpixel accuracy.
Abstract: The extraction of curvilinear structures is an important low-level operation in computer vision that has many applications. Most existing operators use a simple model for the line that is to be extracted, i.e., they do not take into account the surroundings of a line. This leads to the undesired consequence that the line will be extracted in the wrong position whenever a line with different lateral contrast is extracted. In contrast, the algorithm proposed in this paper uses an explicit model for lines and their surroundings. By analyzing the scale-space behavior of a model line profile, it is shown how the bias that is induced by asymmetrical lines can be removed. Furthermore, the algorithm not only returns the precise subpixel line position, but also the width of the line for each line point, also with subpixel accuracy.

1,200 citations


Journal ArticleDOI
TL;DR: The stochastic model allows us to learn a string-edit distance function from a corpus of examples and is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

897 citations
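The untrained Levenshtein distance that serves as the paper's baseline is the standard dynamic program below; the stochastic version replaces the unit costs with learned edit probabilities.

```python
def levenshtein(a, b):
    """Untrained edit distance: unit cost for insert, delete, substitute."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                       # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[m][n]
```

For example, "kitten" becomes "sitting" in three edits (two substitutions and one insertion), so the distance is 3.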


Journal ArticleDOI
TL;DR: The proposed system does not require feature extraction and performs recognition on images regarded as points of a space of high dimension without estimating pose, indicating that SVMs are well-suited for aspect-based recognition.
Abstract: Support vector machines (SVMs) have been recently proposed as a new technique for pattern recognition. Intuitively, given a set of points which belong to either of two classes, a linear SVM finds the hyperplane leaving the largest possible fraction of points of the same class on the same side, while maximizing the distance of either class from the hyperplane. The hyperplane is determined by a subset of the points of the two classes, named support vectors, and has a number of interesting theoretical properties. In this paper, we use linear SVMs for 3D object recognition. We illustrate the potential of SVMs on a database of 7200 images of 100 different objects. The proposed system does not require feature extraction and performs recognition on images regarded as points of a space of high dimension without estimating pose. The excellent recognition rates achieved in all the performed experiments indicate that SVMs are well-suited for aspect-based recognition.

Journal ArticleDOI
TL;DR: A Bayesian treatment is provided, integrating over uncertainty in y and in the parameters that control the Gaussian process prior; the necessary integration over y is carried out using Laplace's approximation, and the method is generalized to multiclass problems (m>2) using the softmax function.
Abstract: We consider the problem of assigning an input vector to one of m classes by predicting P(c|x) for c=1,...,m. For a two-class problem, the probability of class one given x is estimated by σ(y(x)), where σ(y) = 1/(1 + e^(-y)). A Gaussian process prior is placed on y(x), and is combined with the training data to obtain predictions for new x points. We provide a Bayesian treatment, integrating over uncertainty in y and in the parameters that control the Gaussian process prior; the necessary integration over y is carried out using Laplace's approximation. The method is generalized to multiclass problems (m>2) using the softmax function. We demonstrate the effectiveness of the method on a number of datasets.
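The two-class link function and its multiclass generalization can be written down directly (a minimal sketch of the probability model only; the Gaussian process prior and the Laplace approximation are omitted):

```python
import math

def sigmoid(y):
    """Two-class link: P(class 1 | x) = 1 / (1 + exp(-y(x)))."""
    return 1.0 / (1.0 + math.exp(-y))

def softmax(ys):
    """Multiclass generalization: P(c | x) = exp(y_c) / sum_k exp(y_k)."""
    m = max(ys)                          # subtract max for numerical stability
    exps = [math.exp(y - m) for y in ys]
    z = sum(exps)
    return [e / z for e in exps]
```

A latent value of y = 0 corresponds to complete uncertainty (probability 0.5), and the softmax reduces to the sigmoid construction when m = 2.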

Journal ArticleDOI
TL;DR: A prototype biometrics system which integrates faces and fingerprints is developed which overcomes the limitations of face recognition systems as well as fingerprint verification systems and operates in the identification mode with an admissible response time.
Abstract: An automatic personal identification system based solely on fingerprints or faces is often not able to meet the system performance requirements. We have developed a prototype biometrics system which integrates faces and fingerprints. The system overcomes the limitations of face recognition systems as well as fingerprint verification systems. The integrated prototype system operates in the identification mode with an admissible response time. The identity established by the system is more reliable than the identity established by a face recognition system. In addition, the proposed decision fusion scheme enables performance improvement by integrating multiple cues with different confidence measures. Experimental results demonstrate that our system performs very well. It meets the response time as well as the accuracy requirements.

Journal ArticleDOI
TL;DR: Experiments show that the proposed measure of dissimilarity, which uses the linearly interpolated intensity functions surrounding the pixels, alleviates the problem of sampling with little additional computational overhead.
Abstract: Because of image sampling, traditional measures of pixel dissimilarity can assign a large value to two corresponding pixels in a stereo pair, even in the absence of noise and other degrading effects. We propose a measure of dissimilarity that is provably insensitive to sampling because it uses the linearly interpolated intensity functions surrounding the pixels. Experiments on real images show that our measure alleviates the problem of sampling with little additional computational overhead.
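A sampling-insensitive dissimilarity in this spirit can be sketched as follows (hypothetical code following a symmetric interval construction; scanline intensities are plain lists): each pixel's value is compared against the range of linearly interpolated intensities within half a pixel of the other pixel, and the distance is zero whenever the value falls inside that range.

```python
def half_neighborhood(I, j):
    """Min and max of the linearly interpolated intensity within half a
    pixel of position j (clamped at the ends of the scanline)."""
    lo = (I[j] + I[max(j - 1, 0)]) / 2.0
    hi = (I[j] + I[min(j + 1, len(I) - 1)]) / 2.0
    return min(lo, hi, I[j]), max(lo, hi, I[j])

def dissimilarity(IL, i, IR, j):
    """Sampling-insensitive pixel dissimilarity (symmetric form):
    distance from one pixel's value to the interval spanned by the
    interpolated intensities around the other pixel."""
    rmin, rmax = half_neighborhood(IR, j)
    d_lr = max(0.0, IL[i] - rmax, rmin - IL[i])
    lmin, lmax = half_neighborhood(IL, i)
    d_rl = max(0.0, IR[j] - lmax, lmin - IR[j])
    return min(d_lr, d_rl)
```

With a ramp sampled half a pixel apart in the two images, the plain absolute difference between corresponding pixels is large even though the underlying signals match; the interpolated measure correctly returns zero.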

Journal ArticleDOI
TL;DR: In this paper, the authors propose a novel method for image corner detection based on the curvature scale-space (CSS) representation; the method is robust to noise and is argued to perform better than existing corner detectors.
Abstract: This paper describes a novel method for image corner detection based on the curvature scale-space (CSS) representation. The first step is to extract edges from the original image using a Canny detector (1986). The corner points of an image are defined as points where image edges have their maxima of absolute curvature. The corner points are detected at a high scale of the CSS and tracked through multiple lower scales to improve localization. This method is very robust to noise, and we believe that it performs better than the existing corner detectors. An improvement to the Canny edge detector's response to 45° and 135° edges is also proposed. Furthermore, the CSS detector can provide additional point features (curvature zero-crossings of image edge contours) in addition to the traditional corners.

Journal ArticleDOI
TL;DR: Local scale control is shown to be important for the estimation of blur in complex images, where the potential for interference between nearby edges of very different blur scale requires that estimates be made at the minimum reliable scale.
Abstract: We show that knowledge of sensor properties and operator norms can be exploited to define a unique, locally computable minimum reliable scale for local estimation at each point in the image. This method for local scale control is applied to the problem of detecting and localizing edges in images with shallow depth of field and shadows. We show that edges spanning a broad range of blur scales and contrasts can be recovered accurately by a single system with no input parameters other than the second moment of the sensor noise. A natural dividend of this approach is a measure of the thickness of contours which can be used to estimate focal and penumbral blur. Local scale control is shown to be important for the estimation of blur in complex images, where the potential for interference between nearby edges of very different blur scale requires that estimates be made at the minimum reliable scale.

Journal ArticleDOI
TL;DR: These measures serve as a general tool for image matching that are applicable to other vision problems such as motion estimation and texture-based image retrieval and suggest the superiority of ordinal measures over existing techniques under nonideal conditions.
Abstract: We present ordinal measures of association for image correspondence in the context of stereo. Linear correspondence measures like correlation and the sum of squared difference between intensity distributions are known to be fragile. Ordinal measures which are based on relative ordering of intensity values in windows-rank permutations-have demonstrable robustness. By using distance metrics between two rank permutations, ordinal measures are defined. These measures are independent of absolute intensity scale and invariant to monotone transformations of intensity values like gamma variation between images. We have developed simple algorithms for their efficient implementation. Experiments suggest the superiority of ordinal measures over existing techniques under nonideal conditions. These measures serve as a general tool for image matching that are applicable to other vision problems such as motion estimation and texture-based image retrieval.
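One simple ordinal measure in this spirit (a hypothetical simplification; the paper develops more refined distance metrics between rank permutations) counts the pixel pairs whose relative intensity ordering disagrees between two windows. Because only orderings are used, the measure is unchanged by any monotone intensity transform, such as gamma variation between images.

```python
def ranks(window):
    """Rank permutation of intensity values (0 = smallest)."""
    order = sorted(range(len(window)), key=window.__getitem__)
    r = [0] * len(window)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r

def kendall_distance(a, b):
    """Number of pixel pairs whose relative intensity order disagrees
    between the two windows (a Kendall-tau-style disagreement count)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (ra[i] - ra[j]) * (rb[i] - rb[j]) < 0)
```

Squaring every intensity (a monotone transform) leaves the distance at zero, while reversing the ordering makes every pair disagree.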

Journal ArticleDOI
TL;DR: A new algorithm for error-correcting subgraph isomorphism detection from a set of model graphs to an unknown input graph is proposed based on a compact representation of the model graphs that can be combined with a future cost estimation method that greatly improves its run-time performance.
Abstract: We propose a new algorithm for error-correcting subgraph isomorphism detection from a set of model graphs to an unknown input graph. The algorithm is based on a compact representation of the model graphs. This representation is derived from the set of model graphs in an off-line preprocessing step. The main advantage of the proposed representation is that common subgraphs of different model graphs are represented only once. Therefore, at run time, given an unknown input graph, the computational effort of matching the common subgraphs for each model graph onto the input graph is done only once. Consequently, the new algorithm is only sublinearly dependent on the number of model graphs. Furthermore, the new algorithm can be combined with a future cost estimation method that greatly improves its run-time performance.

Journal ArticleDOI
TL;DR: It is shown that the color of an impinging light plane can be identified from the image of the illuminated scene, even with colorful scenes, despite the fact that it does not rely on spatial color sequences.
Abstract: In range sensing with time-multiplexed structured light, there is a trade-off between accuracy, robustness, and the acquisition period. In this paper, a novel structured light method is described. Adaptation of the number and form of the projection patterns to the characteristics of the scene takes place as part of the acquisition process. Noise margins are matched to the actual noise level, thus reducing the number of projection patterns to the necessary minimum. Color is used for light plane labeling. The dimension of the pattern space is thus increased without raising the number of projection patterns. It is shown that the color of an impinging light plane can be identified from the image of the illuminated scene, even with colorful scenes. Identification is local and does not rely on spatial color sequences. The suggested approach has been implemented and the theoretical results are supported by experiments.

Journal ArticleDOI
TL;DR: A unified approach to handling moving object detection in both 2D and 3D scenes is described, with a strategy to gracefully bridge the gap between those two extremes, based on a stratification of the moving object detection problem into scenarios which gradually increase in their complexity.
Abstract: The detection of moving objects is important in many tasks. Previous approaches to this problem can be broadly divided into two classes: 2D algorithms which apply when the scene can be approximated by a flat surface and/or when the camera is only undergoing rotations and zooms, and 3D algorithms which work well only when significant depth variations are present in the scene and the camera is translating. We describe a unified approach to handling moving object detection in both 2D and 3D scenes, with a strategy to gracefully bridge the gap between those two extremes. Our approach is based on a stratification of the moving object detection problem into scenarios which gradually increase in their complexity. We present a set of techniques that match the above stratification. These techniques progressively increase in their complexity, ranging from 2D techniques to more complex 3D techniques. Moreover, the computations required for the solution to the problem at one complexity level become the initial processing step for the solution at the next complexity level. We illustrate these techniques using examples from real-image sequences.

Journal ArticleDOI
TL;DR: A method is described for selecting the optimal focus measure with respect to gray-level noise from a given set of focus measures in passive autofocusing and depth-from-focus applications based on two new metrics that have been defined for estimating the noise-sensitivity of different focus measures.
Abstract: A method is described for selecting the optimal focus measure with respect to gray-level noise from a given set of focus measures in passive autofocusing and depth-from-focus applications. The method is based on two new metrics that have been defined for estimating the noise-sensitivity of different focus measures. The first metric-the autofocusing uncertainty measure (AUM)-is useful in understanding the relation between gray-level noise and the resulting error in lens position for autofocusing. The second metric-autofocusing root-mean-square error (ARMS error)-is an improved metric closely related to AUM. AUM and ARMS error metrics are based on a theoretical noise sensitivity analysis of focus measures, and they are related by a monotonic expression. The theoretical results are validated by actual and simulation experiments. For a given camera, the optimally accurate focus measure may change from one object to the other depending on their focused images. Therefore, selecting the optimal focus measure from a given set involves computing all focus measures in the set.

Journal ArticleDOI
TL;DR: A Bayesian-based methodology is presented which automatically penalizes overcomplex models being fitted to unknown data and is able to select an "optimal" number of components in the model and so partition data sets.
Abstract: A Bayesian-based methodology is presented which automatically penalizes overcomplex models being fitted to unknown data. We show that, with a Gaussian mixture model, the approach is able to select an "optimal" number of components in the model and so partition data sets. The performance of the Bayesian method is compared to other methods of optimal model selection and found to give good results. The methods are tested on synthetic and real data sets.

Journal ArticleDOI
TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.
Abstract: Concerns the extraction of rotation invariant texture features and the use of such features in script identification from document images. Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures. These features are then used in an attempt to solve a practical but hitherto mostly overlooked problem in document image processing: the identification of the script of a machine printed document. Automatic script and language recognition is an essential front-end process for the efficient and correct use of OCR and language translation products in a multilingual environment. Six languages (Chinese, English, Greek, Russian, Persian, and Malayalam) are chosen to demonstrate the potential of such a texture-based approach in script identification.

Journal ArticleDOI
R.A. Morano1, Cengizhan Ozturk1, R. Conn1, Stephen Dubin1, S. Zietz1, J. Nissano1 
TL;DR: This work solves the correspondence problem in active stereo vision using a novel pseudorandom coded structured light (SL) scheme that performs well in the presence of occlusion.
Abstract: We solve the correspondence problem in active stereo vision using a novel pseudorandom coded structured light (SL) scheme. This coding scheme performs well in the presence of occlusion. In settings where color coding is feasible, 3D information can be obtained using a single image.

Journal ArticleDOI
TL;DR: It is found that there is an inherent limitation in the precision of computing the Zernike moments due to the geometric nature of a circular domain.
Abstract: We give a detailed analysis of the accuracy of Zernike moments in terms of their discretization errors and the reconstruction power. It is found that there is an inherent limitation in the precision of computing the Zernike moments due to the geometric nature of a circular domain. This is explained by relating the accuracy issue to a celebrated problem in analytic number theory of evaluating the lattice points within a circle.
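The lattice-point connection can be made concrete: the number of integer grid points falling inside a circle of radius r only approximates the circle's area πr², and that discrepancy (the Gauss circle problem) is the geometric source of the discretization error. A small sketch:

```python
import math

def lattice_points_in_circle(r):
    """Count integer points (x, y) with x^2 + y^2 <= r^2."""
    count = 0
    for x in range(-r, r + 1):
        # for each column x, y ranges over |y| <= sqrt(r^2 - x^2)
        ymax = math.isqrt(r * r - x * x)
        count += 2 * ymax + 1
    return count

def circle_discrepancy(r):
    """Gap between the lattice count and the true area pi * r^2."""
    return lattice_points_in_circle(r) - math.pi * r * r
```

For r = 1 the count is 5 (the origin plus the four axis points) against an area of about 3.14; the relative discrepancy shrinks as r grows but never vanishes, mirroring the inherent precision limit of moments computed over a discretized circular domain.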

Journal ArticleDOI
TL;DR: A revised definition of optical flow is proposed to overcome shortcomings in interpreting optical flow merely as a geometric transformation field and leads to a general framework for the investigation of problems in dynamic scene analysis, based on the integration and unified treatment of both geometric and radiometric cues in time-varying imagery.
Abstract: Optical flow has been commonly defined as the apparent motion of image brightness patterns in an image sequence. In this paper, we propose a revised definition to overcome shortcomings in interpreting optical flow merely as a geometric transformation field. The new definition is a complete representation of geometric and radiometric variations in dynamic imagery. We argue that this is more consistent with the common interpretation of optical flow induced by various scene events. This leads to a general framework for the investigation of problems in dynamic scene analysis, based on the integration and unified treatment of both geometric and radiometric cues in time-varying imagery. We discuss selected models, including the generalized dynamic image model, for the estimation of optical flow. We show how various types of 3D scene information are encoded in, and thus may be extracted from, the geometric and radiometric components of optical flow. We provide selected examples based on experiments with real images.

Journal ArticleDOI
TL;DR: Describes a complete system for the recognition of off-line handwriting, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness.
Abstract: Describes a complete system for the recognition of off-line handwriting. Preprocessing techniques are described, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness. Representation of the image is discussed and the skeleton and stroke features used are described. A recurrent neural network is used to estimate probabilities for the characters represented in the skeleton. The operation of the hidden Markov model that calculates the best word in the lexicon is also described. Issues of vocabulary choice, rejection, and out-of-vocabulary word recognition are discussed.

Journal ArticleDOI
TL;DR: An analytic-to-holistic approach which can identify faces at different perspective variations is proposed, and it is shown that this approach can achieve a similar level of performance from different viewing directions of a face.
Abstract: We propose an analytic-to-holistic approach which can identify faces at different perspective variations. The database for the test consists of 40 frontal-view faces. The first step is to locate 15 feature points on a face. A head model is proposed, and the rotation of the face can be estimated using geometrical measurements. The positions of the feature points are adjusted so that their corresponding positions for the frontal view are approximated. These feature points are then compared with the feature points of the faces in a database using a similarity transform. In the second step, we set up windows for the eyes, nose, and mouth. These feature windows are compared with those in the database by correlation. Results show that this approach can achieve a similar level of performance from different viewing directions of a face. Under different perspective variations, the overall recognition rates are over 84 percent and 96 percent for the first and the first three likely matched faces, respectively.

Journal ArticleDOI
TL;DR: This work derives the features for image representation which are invariant with respect to blur regardless of the degradation PSF provided that it is centrally symmetric, and proves that there exist two classes of such features: the first one in the spatial domain and the second in the frequency domain.
Abstract: Analysis and interpretation of an image which was acquired by a nonideal imaging system is the key problem in many application areas. The observed image is usually corrupted by blurring, spatial degradations, and random noise. Classical methods like blind deconvolution try to estimate the blur parameters and to restore the image. We propose an alternative approach. We derive the features for image representation which are invariant with respect to blur regardless of the degradation PSF provided that it is centrally symmetric. As we prove in the paper, there exist two classes of such features: the first one in the spatial domain and the second one in the frequency domain. We also derive so-called combined invariants, which are invariant to composite geometric and blur degradations. Knowing these features, we can recognize objects in the degraded scene without any restoration.

Journal ArticleDOI
TL;DR: A prototype system for automatically registering and integrating multiple views of objects from range data and the results can then be used to construct geometric models of the objects.
Abstract: Automatic 3D object model construction is important in applications ranging from manufacturing to entertainment, since CAD models of existing objects may be either unavailable or unusable. We describe a prototype system for automatically registering and integrating multiple views of objects from range data. The results can then be used to construct geometric models of the objects. New techniques for handling key problems such as robust estimation of transformations relating multiple views and seamless integration of registered data to form an unbroken surface have been proposed and implemented in the system. Experimental results on real surface data acquired using a digital interferometric sensor as well as a laser range scanner demonstrate the good performance of our system.