Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 1997"
[...]
TL;DR: A face recognition algorithm that is insensitive to large variations in lighting direction and facial expression is developed; based on Fisher's linear discriminant, it produces well-separated classes in a low-dimensional subspace, even under severe variations in lighting and facial expression.
Abstract: We develop a face recognition algorithm which is insensitive to large variations in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face, under varying illumination but fixed pose, lie in a 3D linear subspace of the high-dimensional image space, if the face is a Lambertian surface without shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's linear discriminant and produces well-separated classes in a low-dimensional subspace, even under severe variation in lighting and facial expressions. The eigenface technique, another method based on linearly projecting the image space to a low-dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed "Fisherface" method has error rates that are lower than those of the eigenface technique for tests on the Harvard and Yale face databases.
11,674 citations
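The discriminant step at the heart of the Fisherface method can be sketched in a few lines of numpy. The toy 5-D "pixel" vectors and two synthetic subjects below stand in for real face images, and the PCA pre-projection the full method uses (to keep the within-class scatter nonsingular) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
# two synthetic "subjects", 20 images each, in a toy 5-D pixel space
X1 = rng.normal(0.0, 1.0, (20, 5)) + np.array([3.0, 0, 0, 0, 0])
X2 = rng.normal(0.0, 1.0, (20, 5)) - np.array([3.0, 0, 0, 0, 0])
X = np.vstack([X1, X2])
y = np.array([0] * 20 + [1] * 20)

mu = X.mean(axis=0)
Sw = np.zeros((5, 5))                      # within-class scatter
Sb = np.zeros((5, 5))                      # between-class scatter
for c in (0, 1):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    d = (mc - mu)[:, None]
    Sb += len(Xc) * (d @ d.T)

# Fisher direction: leading eigenvector of Sw^{-1} Sb
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
w = np.real(vecs[:, np.argmax(np.real(vals))])

z = X @ w                                  # 1-D discriminant projection
sep = abs(z[y == 0].mean() - z[y == 1].mean()) / z[y == 0].std()
```

The separation ratio `sep` comes out well above what a random projection would give, which is the point of the Fisher criterion: maximize between-class scatter relative to within-class scatter.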
[...]
TL;DR: Pfinder is a real-time system for tracking people and interpreting their behavior that uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions.
Abstract: Pfinder is a real-time system for tracking people and interpreting their behavior. It runs at 10 Hz on a standard SGI Indy computer, and has performed reliably on thousands of people in many different physical locations. The system uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions. Pfinder has been successfully used in a wide range of applications including wireless interfaces, video databases, and low-bandwidth coding.
4,229 citations
[...]
TL;DR: A system for recognizing human faces from single images out of a large database containing one image per person, based on a Gabor wavelet transform, which is constructed from a small set of sample image graphs.
Abstract: We present a system for recognizing human faces from single images out of a large database containing one image per person. Faces are represented by labeled graphs, based on a Gabor wavelet transform. Image graphs of new faces are extracted by an elastic graph matching process and can be compared by a simple similarity function. The system differs from the preceding one (Lades et al., 1993) in three respects. Phase information is used for accurate node positioning. Object-adapted graphs are used to handle large rotations in depth. Image graph extraction is based on a novel data structure, the bunch graph, which is constructed from a small set of sample image graphs.
2,902 citations
[...]
TL;DR: This work studies the problem of choosing an optimal feature set for land-use classification based on SAR satellite images using four different texture models, and shows that pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy.
Abstract: A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection algorithm, proposed by Pudil et al. (1994), dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in the classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations.
2,136 citations
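The greedy core of sequential forward selection can be sketched as below; the floating variant (SFFS) adds conditional backward steps after each inclusion, which this toy leaves out. The synthetic 6-feature data and the leave-one-out 1-NN criterion are stand-ins, not the paper's SAR features:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 6))       # six candidate features; only 0 and 1 matter
X[:, 0] += 2.5 * y
X[:, 1] -= 2.0 * y

def nn_accuracy(feats):
    # selection criterion: leave-one-out 1-NN accuracy on the chosen subset
    Z = X[:, feats]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return (y[D.argmin(axis=1)] == y).mean()

selected, remaining = [], list(range(6))
for _ in range(2):                # plain forward steps (no floating back-steps)
    best = max(remaining, key=lambda f: nn_accuracy(selected + [f]))
    selected.append(best)
    remaining.remove(best)
print(sorted(selected))  # → [0, 1]
```

On this toy problem the greedy search recovers exactly the two informative features and ignores the four noise dimensions.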
[...]
TL;DR: A survey of the literature on vision-based interpretation of hand gestures for human-computer interaction, organized by the methods used for modeling, analyzing, and recognizing gestures, and contrasting 3D hand models with appearance-based approaches.
Abstract: The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures. We survey the literature on visual interpretation of hand gestures in the context of its role in HCI. This discussion is organized on the basis of the method used for modeling, analyzing, and recognizing gestures. Important differences in the gesture interpretation approaches arise depending on whether a 3D model of the human hand or an image appearance model of the human hand is used. 3D hand models offer a way of more elaborate modeling of hand gestures but lead to computational hurdles that have not been overcome given the real-time requirements of HCI. Appearance-based models lead to computationally efficient "purposive" approaches that work well under constrained situations but seem to lack the generality desirable for HCI. We also discuss implemented gestural systems as well as other potential applications of vision-based gesture recognition. Although the current progress is encouraging, further theoretical as well as computational advances are needed before gestures can be widely used for HCI. We discuss directions of future research in gesture recognition, including its integration with other natural modes of human-computer interaction.
1,906 citations
[...]
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.
1,732 citations
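The voting idea behind this kind of retrieval can be sketched as follows. The 2-D "descriptors", tolerance, and five-image database are all made up for illustration, and the semilocal constraints and indexing structure of the actual system are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy database: each image is a bag of local invariant descriptors (2-D here)
db = {i: rng.normal(i, 0.3, (30, 2)) for i in range(5)}   # 5 images

def retrieve(query_desc, tol=0.5):
    # each query descriptor votes for every image holding a close match;
    # the image with the most votes wins
    votes = {i: 0 for i in db}
    for qd in query_desc:
        for i, descs in db.items():
            if np.min(np.linalg.norm(descs - qd, axis=1)) < tol:
                votes[i] += 1
    return max(votes, key=votes.get)

# query built from image 3, partially occluded (half its descriptors, noised)
query = db[3][:15] + rng.normal(0, 0.05, (15, 2))
print(retrieve(query))  # → 3
```

Because voting only needs some descriptors to match, the query still retrieves the right image under partial visibility, which mirrors the robustness the abstract reports.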
[...]
TL;DR: This paper shows that by preceding the eight-point algorithm with a very simple normalization (translation and scaling) of the coordinates of the matched points, results are obtained comparable with the best iterative algorithms.
Abstract: The fundamental matrix is a basic tool in the analysis of scenes taken with two uncalibrated cameras, and the eight-point algorithm is a frequently cited method for computing the fundamental matrix from a set of eight or more point matches. It has the advantage of simplicity of implementation. The prevailing view is, however, that it is extremely susceptible to noise and hence virtually useless for most purposes. This paper challenges that view, by showing that by preceding the algorithm with a very simple normalization (translation and scaling) of the coordinates of the matched points, results are obtained comparable with the best iterative algorithms. This improved performance is justified by theory and verified by extensive experiments on real images.
1,625 citations
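The normalization-plus-eight-point recipe is compact enough to sketch end to end. The camera rotation, translation, and point layout below are invented for a noiseless synthetic test:

```python
import numpy as np

def normalize(pts):
    # Hartley normalization: centroid to the origin, mean distance sqrt(2)
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T

def eight_point(x1, x2):
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # each correspondence contributes one row of the linear system A f = 0
    A = np.column_stack([p2[:, :1] * p1, p2[:, 1:2] * p1, p1])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt     # enforce rank two
    return T2.T @ F @ T1                        # undo the normalization

# synthetic noiseless pair: camera 1 at the origin, camera 2 rotated about y
rng = np.random.default_rng(1)
a = 0.3
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
Xw = rng.uniform(-1, 1, (20, 3)) + np.array([0, 0, 6.0])
x1 = Xw[:, :2] / Xw[:, 2:]
X2 = Xw @ R.T + t
x2 = X2[:, :2] / X2[:, 2:]

F = eight_point(x1, x2)
h1 = np.column_stack([x1, np.ones(20)])
h2 = np.column_stack([x2, np.ones(20)])
res = np.abs(np.einsum('ij,jk,ik->i', h2, F / np.linalg.norm(F), h1))
```

The epipolar residuals `res` are essentially zero on this clean data; the paper's point is that the same normalization keeps the linear estimate well conditioned under noise.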
[...]
TL;DR: An unsupervised technique for visual learning is presented, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition and is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects.
Abstract: We present an unsupervised technique for visual learning, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a mixture-of-Gaussians model (for multimodal distributions). Those probability densities are then used to formulate a maximum-likelihood estimation framework for visual search and target detection for automatic object recognition and coding. Our learning technique is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects, such as hands.
1,617 citations
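A minimal sketch of the unimodal (single Gaussian) case: fit an eigenspace by SVD, then score points with an in-subspace Mahalanobis term plus a residual term for the energy outside the subspace. The 10-D correlated Gaussian training set is a toy stand-in for face images:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy "face" training set: 200 samples in 10-D with correlated coordinates
A = rng.normal(size=(10, 10))
X = rng.normal(size=(200, 10)) @ A.T

mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
lam = (S ** 2) / len(X)           # eigenvalues of the sample covariance
V = Vt[:k].T                      # principal subspace
rho = lam[k:].mean()              # average variance outside the subspace

def neg_log_density(x):
    d = x - mu
    y = V.T @ d                   # in-subspace coefficients
    dffs = d @ d - y @ y          # residual energy outside the subspace
    return 0.5 * ((y ** 2 / lam[:k]).sum() + dffs / rho)
```

Points near the training mean score low and points far away score high, which is what lets the density act as a detector for the modeled object class.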
[...]
TL;DR: An improved version of the minutia extraction algorithm proposed by Ratha et al. (1995), which is much faster and more reliable, is implemented for extracting features from an input fingerprint image captured with an online inkless scanner and an alignment-based elastic matching algorithm has been developed.
Abstract: Fingerprint verification is one of the most reliable personal identification methods. However, manual fingerprint verification is incapable of meeting today's increasing performance requirements. An automatic fingerprint identification system (AFIS) is needed. This paper describes the design and implementation of an online fingerprint verification system which operates in two stages: minutia extraction and minutia matching. An improved version of the minutia extraction algorithm proposed by Ratha et al. (1995), which is much faster and more reliable, is implemented for extracting features from an input fingerprint image captured with an online inkless scanner. For minutia matching, an alignment-based elastic matching algorithm has been developed. This algorithm is capable of finding the correspondences between minutiae in the input image and the stored template without resorting to exhaustive search and has the ability of adaptively compensating for the nonlinear deformations and inexact pose transformations between fingerprints. The system has been tested on two sets of fingerprint images captured with inkless scanners. The verification accuracy is found to be acceptable. Typically, a complete fingerprint verification procedure takes, on average, about eight seconds on a SPARC 20 workstation. These experimental results show that our system meets the response time requirements of online verification with high accuracy.
1,334 citations
[...]
TL;DR: Evaluating the sensitivity of image representations to changes in illumination, as well as viewpoint and facial expression, indicated that none of the representations considered is sufficient by itself to overcome image variations because of a change in the direction of illumination.
Abstract: A face recognition system must recognize a face from a novel image despite the variations between images of the same face. A common approach to overcoming image variations because of changes in the illumination conditions is to use image representations that are relatively insensitive to these variations. Examples of such representations are edge maps, image intensity derivatives, and images convolved with 2D Gabor-like filters. Here we present an empirical study that evaluates the sensitivity of these representations to changes in illumination, as well as viewpoint and facial expression. Our findings indicated that none of the representations considered is sufficient by itself to overcome image variations because of a change in the direction of illumination. Similar results were obtained for changes due to viewpoint and expression. Image representations that emphasized the horizontal features were found to be less sensitive to changes in the direction of illumination. However, systems based only on such representations failed to recognize up to 20 percent of the faces in our database. Humans performed considerably better under the same conditions. We discuss possible reasons for this superiority and alternative methods for overcoming illumination effects in recognition.
1,099 citations
[...]
TL;DR: In this article, the authors present a technique for constructing random fields from a set of training samples, where each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data.
Abstract: We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing.
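The weight-training step can be illustrated on a tiny discrete model. The paper uses improved iterative scaling; the sketch below substitutes plain gradient ascent on the log-likelihood, which reaches the same moment-matching fixed point on this toy four-symbol problem (features and empirical distribution are invented):

```python
import numpy as np

# four-symbol alphabet, two binary features per symbol
F = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], float)
emp = np.array([0.4, 0.3, 0.2, 0.1])      # empirical distribution
target = emp @ F                          # empirical feature expectations

# fit p(x) ∝ exp(sum_i w_i f_i(x)); matching the feature expectations
# minimizes the KL divergence from the empirical distribution
w = np.zeros(2)                           # one weight per feature
for _ in range(2000):
    p = np.exp(F @ w)
    p /= p.sum()
    w += 0.5 * (target - p @ F)           # gradient: E_emp[f] - E_model[f]

p = np.exp(F @ w)
p /= p.sum()
```

At convergence the model's feature expectations equal the empirical ones, the defining property of the maximum-entropy solution.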
[...]
TL;DR: A method for combining classifiers that uses estimates of each individual classifier's local accuracy in small regions of feature space surrounding an unknown test sample; an empirical evaluation on five real data sets confirms the validity of this approach.
Abstract: This paper presents a method for combining classifiers that uses estimates of each individual classifier's local accuracy in small regions of feature space surrounding an unknown test sample. An empirical evaluation using five real data sets confirms the validity of our approach compared to some other combination of multiple classifiers algorithms. We also suggest a methodology for determining the best mix of individual classifiers.
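The local-accuracy idea can be sketched with two deliberately complementary toy classifiers: for each test point, estimate each classifier's accuracy on the k nearest training samples and let the locally better one decide (data, classifiers, and k are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))             # training samples
y = (X[:, 0] > 0).astype(int)                # true labels

def clf_a(x): return np.zeros(len(x), int)   # always 0: right only for x <= 0
def clf_b(x): return np.ones(len(x), int)    # always 1: right only for x > 0

pa, pb = clf_a(X), clf_b(X)                  # classifier outputs on the training set

def dcs_predict(x_test, k=10):
    out = []
    for x in x_test:
        nn = np.argsort(np.abs(X[:, 0] - x[0]))[:k]   # k nearest training points
        acc_a = (pa[nn] == y[nn]).mean()              # local accuracy estimates
        acc_b = (pb[nn] == y[nn]).mean()
        best = clf_a if acc_a >= acc_b else clf_b
        out.append(best(x[None])[0])
    return np.array(out)

print(dcs_predict(np.array([[-0.5], [0.5]])))  # → [0 1]
```

Each region of feature space is routed to the classifier that is locally accurate there, even though neither classifier is good globally.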
[...]
TL;DR: A computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion.
Abstract: We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the facial action coding system (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate, representation of human facial expressions that we call FACS+. Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.
[...]
TL;DR: A compact parametrized model of facial appearance which takes into account all sources of variability and can be used for tasks such as image coding, person identification, 3D pose recovery, gender recognition, and expression recognition is described.
Abstract: Face images are difficult to interpret because they are highly variable. Sources of variability include individual appearance, 3D pose, facial expression, and lighting. We describe a compact parametrized model of facial appearance which takes into account all these sources of variability. The model represents both shape and gray-level appearance, and is created by performing a statistical analysis over a training set of face images. A robust multiresolution search algorithm is used to fit the model to faces in new images. This allows the main facial features to be located, and a set of shape and gray-level appearance parameters to be recovered. A good approximation to a given face can be reconstructed using less than 100 of these parameters. This representation can be used for tasks such as image coding, person identification, 3D pose recovery, gender recognition, and expression recognition. Experimental results are presented for a database of 690 face images obtained under widely varying conditions of 3D pose, lighting, and facial expression. The system performs well on all the tasks listed above.
[...]
TL;DR: This work proposes an original technique, based on ridge line following, where the minutiae are extracted directly from gray scale images, and results achieved are compared with those obtained through some methods based on image binarization.
Abstract: Most automatic systems for fingerprint comparison are based on minutiae matching. Minutiae are essentially terminations and bifurcations of the ridge lines that constitute a fingerprint pattern. Automatic minutiae detection is an extremely critical process, especially in low-quality fingerprints, where noise and contrast deficiency can produce pixel configurations similar to minutiae or hide real minutiae. Several approaches have been proposed in the literature; although rather different from each other, all these methods transform fingerprint images into binary images. In this work we propose an original technique, based on ridge line following, where the minutiae are extracted directly from gray scale images. The results achieved are compared with those obtained through some methods based on image binarization. Despite its greater conceptual complexity, the proposed method performs better in terms of both efficiency and robustness.
[...]
TL;DR: A new method for evaluating edge detection algorithms is presented and applied to measure the relative performance of algorithms by Canny, Nalwa-Binford, Iverson-Zucker, Bergholm, and Rothwell, and the results agree with visual evaluations of the edge images.
Abstract: A new method for evaluating edge detection algorithms is presented and applied to measure the relative performance of algorithms by Canny, Nalwa-Binford, Iverson-Zucker, Bergholm, and Rothwell. The basic measure of performance is a visual rating score which indicates the perceived quality of the edges for identifying an object. The process of evaluating edge detection algorithms with this performance measure requires the collection of a set of gray-scale images, optimizing the input parameters for each algorithm, conducting visual evaluation experiments and applying statistical analysis methods. The novel aspect of this work is the use of a visual task and real images of complex scenes in evaluating edge detectors. The method is appealing because, by definition, the results agree with visual evaluations of the edge images.
[...]
TL;DR: A deterministic annealing approach to pairwise clustering is described which shares the robustness properties of maximum entropy inference and the resulting Gibbs probability distributions are estimated by mean-field approximation.
Abstract: Partitioning a data set and extracting hidden structure from the data arises in different application areas of pattern recognition, speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping which extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness properties of maximum entropy inference. The resulting Gibbs probability distributions are estimated by mean-field approximation. A new structure-preserving algorithm to cluster dissimilarity data and to simultaneously embed these data in a Euclidean vector space is discussed which can be used for dimensionality reduction and data visualization. The suggested embedding algorithm which outperforms conventional approaches has been implemented to analyze dissimilarity data from protein analysis and from linguistics. The algorithm for pairwise data clustering is used to segment textured images.
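The annealing mechanism is easiest to see on central (vector) clustering, a simpler relative of the pairwise mean-field scheme in the paper: soft Gibbs assignments at a temperature T, alternated with mean updates, while T is lowered. The 2-D two-cluster data and cooling schedule below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, (50, 2)),     # two well-separated groups
               rng.normal(3, 0.5, (50, 2))])

K = 2
mu = X.mean(axis=0) + rng.normal(0, 0.01, (K, 2))   # nearly coincident starts
for T in (8.0, 4.0, 2.0, 1.0, 0.5, 0.25):           # cooling schedule
    for _ in range(20):
        d2 = ((X[:, None] - mu[None]) ** 2).sum(-1)
        d2 -= d2.min(axis=1, keepdims=True)          # numerical stability
        p = np.exp(-d2 / T)
        p /= p.sum(axis=1, keepdims=True)            # soft Gibbs assignments
        p_col = p.sum(axis=0)[:, None]
        mu = (p.T @ X) / p_col                       # mean-field mean update
```

At high temperature the means coincide at the centroid; as T drops below the critical value the solution splits and the two true clusters emerge, which is the symmetry-breaking behavior deterministic annealing relies on.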
[...]
TL;DR: This work has shown that the paraperspective factorization method can be applied to a much wider range of motion scenarios, including image sequences containing motion toward the camera and aerial image sequences of terrain taken from a low-altitude airplane.
Abstract: The factorization method, first developed by Tomasi and Kanade (1992), recovers both the shape of an object and its motion from a sequence of images, using many images and tracking many feature points to obtain highly redundant feature position information. The method robustly processes the feature trajectory information using singular value decomposition (SVD), taking advantage of the linear algebraic properties of orthographic projection. However, an orthographic formulation limits the range of motions the method can accommodate. Paraperspective projection, first introduced by Ohta et al. (1981), is a projection model that closely approximates perspective projection by modeling several effects not modeled under orthographic projection, while retaining linear algebraic properties. Our paraperspective factorization method can be applied to a much wider range of motion scenarios, including image sequences containing motion toward the camera and aerial image sequences of terrain taken from a low-altitude airplane.
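The rank-3 core of the orthographic factorization (the method this paper generalizes to paraperspective) fits in a short sketch; the metric-upgrade step that resolves the affine ambiguity is omitted, and the cameras and points are synthetic:

```python
import numpy as np

def Rx(a): return np.array([[1, 0, 0],
                            [0, np.cos(a), -np.sin(a)],
                            [0, np.sin(a), np.cos(a)]])
def Ry(b): return np.array([[np.cos(b), 0, np.sin(b)],
                            [0, 1, 0],
                            [-np.sin(b), 0, np.cos(b)]])

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 20))                  # 20 3-D feature points
rows = []
for i in range(6):                            # 6 orthographic views
    M = (Rx(0.15 * i) @ Ry(0.2 * i))[:2]      # 2x3 orthographic camera
    t = rng.normal(size=(2, 1))               # image-plane translation
    rows.append(M @ P + t)
W = np.vstack(rows)                           # 2F x P measurement matrix

W0 = W - W.mean(axis=1, keepdims=True)        # subtract per-row centroids
U, S, Vt = np.linalg.svd(W0)
M_hat = U[:, :3] * S[:3]                      # motion, up to a 3x3 ambiguity
S_hat = Vt[:3]                                # shape, up to the same ambiguity
```

With noiseless orthographic data the registered measurement matrix has rank exactly 3, so the truncated SVD reconstructs it perfectly; with noise, the same truncation is the least-squares rank-3 fit.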
[...]
TL;DR: A comparative study of six well-known pruning methods with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation, and an objective evaluation of the tendency to overprune/underprune observed in each method is made.
Abstract: In this paper, we address the problem of retrospectively pruning decision trees induced from data, according to a top-down approach. This problem has received considerable attention in the areas of pattern recognition and machine learning, and many distinct methods have been proposed in literature. We make a comparative study of six well-known pruning methods with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation. Comments on the characteristics of each method are empirically supported. In particular, a wide experimentation performed on several data sets leads us to opposite conclusions on the predictive accuracy of simplified trees from some drawn in the literature. We attribute this divergence to differences in experimental designs. Finally, we prove and make use of a property of the reduced error pruning method to obtain an objective evaluation of the tendency to overprune/underprune observed in each method.
[...]
TL;DR: This work proposes a new and general surface representation scheme for recognizing objects with free-form (sculpted) surfaces, and introduces the shape spectrum of an object, a novel concept, within the framework of COSMOS for object view grouping and matching.
Abstract: We address the problem of representing and recognizing 3D free-form objects when (1) the object viewpoint is arbitrary, (2) the objects may vary in shape and complexity, and (3) no restrictive assumptions are made about the types of surfaces on the object. We assume that a range image of a scene is available, containing a view of a rigid 3D object without occlusion. We propose a new and general surface representation scheme for recognizing objects with free-form (sculpted) surfaces. In this scheme, an object is described concisely in terms of maximal surface patches of constant shape index. The maximal patches that represent the object are mapped onto the unit sphere via their orientations, and aggregated via shape spectral functions. Properties such as surface area, curvedness, and connectivity, which are required to capture local and global information, are also built into the representation. The scheme yields a meaningful and rich description useful for object recognition. A novel concept, the shape spectrum of an object is also introduced within the framework of COSMOS for object view grouping and matching. We demonstrate the generality and the effectiveness of our scheme using real range images of complex objects.
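The shape-index quantity that defines the constant-shape-index patches can be sketched directly from principal curvatures. Note the sign convention varies between papers; under the convention below, convex spherical caps map to 0, saddles to 0.5, and concave cups to 1:

```python
import numpy as np

def shape_index(k1, k2):
    # continuous shape index from principal curvatures (assumes k1 >= k2);
    # convention here: caps -> 0.0, saddles -> 0.5, cups -> 1.0
    return 0.5 - (1 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

labels = ['cap', 'ridge', 'saddle', 'valley', 'cup']
def category(k1, k2):
    # quantize the continuous index into five canonical surface types
    return labels[min(4, int(shape_index(k1, k2) * 5))]

print(category(1.0, 1.0), category(1.0, -1.0), category(-1.0, -1.0))  # → cap saddle cup
```

Grouping neighboring surface points whose index falls in the same bin is what yields the maximal patches of constant shape index the representation is built from.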
[...]
TL;DR: A technique which is based on elastic matching of sketched templates over the shapes in the images to evaluate similarity ranks and is integrated with arrangements to provide scale invariance and take into account spatial relationships between objects in multi-object queries.
Abstract: Effective image retrieval by content from database requires that visual image properties are used instead of textual labels to properly index and recover pictorial data. Retrieval by shape similarity, given a user-sketched template is particularly challenging, owing to the difficulty to derive a similarity measure that closely conforms to the common perception of similarity by humans. In this paper, we present a technique which is based on elastic matching of sketched templates over the shapes in the images to evaluate similarity ranks. The degree of matching achieved and the elastic deformation energy spent by the sketch to achieve such a match are used to derive a measure of similarity between the sketch and the images in the database and to rank images to be displayed. The elastic matching is integrated with arrangements to provide scale invariance and take into account spatial relationships between objects in multi-object queries. Examples from a prototype system are expounded with considerations about the effectiveness of the approach and comparative performance analysis.
[...]
TL;DR: For linear object classes, it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views and preliminary evidence that the technique can effectively "rotate" high-resolution face images from a single 2D view is shown.
Abstract: The need to generate new views of a 3D object from a single real image arises in several fields, including graphics and object recognition. While the traditional approach relies on the use of 3D models, simpler techniques are applicable under restricted conditions. The approach exploits image transformations that are specific to the relevant object class, and learnable from example views of other "prototypical" objects of the same class. In this paper, we introduce such a technique by extending the notion of linear class proposed by the authors (1992). For linear object classes, it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and then show preliminary evidence that the technique can effectively "rotate" high-resolution face images from a single 2D view.
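The exactness claim for linear object classes can be checked numerically: learn the view-1 to view-2 map from prototype pairs by least squares, then apply it to a novel object that is a linear combination of the prototypes. The "rendering matrices" and 8-pixel views below are invented toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
q, d = 5, 8                      # 5 prototypes, 8-pixel toy "images"
shapes = rng.normal(size=(q, 3))
A1 = rng.normal(size=(d, 3))     # hypothetical linear rendering, pose 1
A2 = rng.normal(size=(d, 3))     # hypothetical linear rendering, pose 2
V1 = shapes @ A1.T               # prototype views in pose 1
V2 = shapes @ A2.T               # prototype views in pose 2

# learn the pose-1 -> pose-2 transformation from the prototype pairs
L, *_ = np.linalg.lstsq(V1, V2, rcond=None)

# a novel object of the same linear class: a combination of prototype shapes
c = rng.normal(size=q)
new_v1 = (c @ shapes) @ A1.T
pred_v2 = new_v1 @ L             # "rotate" the novel view with the learned map
true_v2 = (c @ shapes) @ A2.T
```

Because both views are linear in the underlying shape, the map learned from prototypes transfers exactly to any member of the class, matching the paper's claim for linear object classes.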
[...]
TL;DR: A simplified model of a pushbroom sensor (the linear Pushbroom model) is introduced, which has the advantage of computational simplicity while at the same time giving very accurate results compared with the full orbiting push broom model.
Abstract: Modeling and analyzing pushbroom sensors commonly used in satellite imagery is difficult and computationally intensive due to the motion of an orbiting satellite with respect to the rotating Earth, and the nonlinearity of the mathematical model involving orbital dynamics. In this paper, a simplified model of a pushbroom sensor (the linear pushbroom model) is introduced. It has the advantage of computational simplicity while at the same time giving very accurate results compared with the full orbiting pushbroom model. Besides remote sensing, the linear pushbroom model is also useful in many other imaging applications. Simple noniterative methods are given for solving the major standard photogrammetric problems for the linear pushbroom model: computation of the model parameters from ground-control points; determination of relative model parameters from image correspondences between two images; and scene reconstruction given image correspondences and ground-control points. The linear pushbroom model leads to theoretical insights that are approximately valid for the full model as well. The epipolar geometry of linear pushbroom cameras is investigated and shown to be totally different from that of a perspective camera. Nevertheless, a matrix analogous to the fundamental matrix of perspective cameras is shown to exist for linear pushbroom sensors. From this it is shown that a scene is determined up to an affine transformation from two views with linear pushbroom cameras.
[...]
TL;DR: A simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε is presented; its simplicity makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent.
Abstract: The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine ε in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available.
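A simplified version of the projection-search idea: keep the points sorted along one axis, take the slab within ε of the query on that axis as candidates, trim the slab coordinate by coordinate, then check the survivors exactly. The data, dimensions, and planted neighbor are invented; the paper's actual data structure is more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.uniform(0.2, 0.8, 12)              # query in 12-D
X = rng.uniform(0, 1, (10000, 12))
X[0] = q + 0.01                            # plant a known near neighbor
eps = 0.15

order = np.argsort(X[:, 0])                # sorted once, reused per query
xs = X[order, 0]

def nn_within_eps(q, eps):
    # slab of candidates whose first coordinate is within eps of the query
    lo = np.searchsorted(xs, q[0] - eps)
    hi = np.searchsorted(xs, q[0] + eps)
    cand = order[lo:hi]
    # trim axis by axis: any point within eps in Euclidean distance is
    # within eps on every coordinate, so no true neighbor is discarded
    for d in range(1, X.shape[1]):
        cand = cand[np.abs(X[cand, d] - q[d]) <= eps]
    if len(cand) == 0:
        return None
    dists = np.linalg.norm(X[cand] - q, axis=1)
    best = np.argmin(dists)
    return (int(cand[best]), dists[best]) if dists[best] <= eps else None

idx, dist = nn_within_eps(q, eps)
print(idx)  # → 0 (the planted neighbor)
```

The slab plus per-axis trimming prunes almost all of the 10,000 points before any exact distance is computed, which is where the speedup over brute force comes from.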
[...]
TL;DR: It is found that the partial differential equations given by gradient descent on the learned potential U(I; Λ, S) are essentially reaction-diffusion equations, where the usual energy terms produce anisotropic diffusion, while the inverted energy terms produce reaction associated with pattern formation, enhancing preferred image features.
Abstract: This article addresses two important themes in early visual computation: it presents a novel theory for learning the universal statistics of natural images, and it proposes a general framework for designing reaction-diffusion equations for image processing. We studied the statistics of natural images, including their scale-invariant properties; generic prior models were then learned to duplicate the observed statistics, based on minimax entropy theory. The resulting Gibbs distributions have potentials of the form U(I; Λ, S) = Σ_{α=1}^{K} Σ_{x,y} λ^(α)((F^(α) ∗ I)(x, y)), with S = {F^(1), F^(2), ..., F^(K)} being a set of filters and Λ = {λ^(1)(·), λ^(2)(·), ..., λ^(K)(·)} the potential functions. The learned Gibbs distributions confirm and improve the form of existing prior models such as the line process, but, in contrast to all previous models, inverted potentials were found to be necessary. We find that the partial differential equations given by gradient descent on U(I; Λ, S) are essentially reaction-diffusion equations, where the usual energy terms produce anisotropic diffusion, while the inverted energy terms produce reaction associated with pattern formation, enhancing preferred image features. We illustrate how these models can be used for texture pattern rendering, denoising, image enhancement, and clutter removal by careful choice of both prior and data models of this type, incorporating the appropriate features.
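The diffusion half of such equations can be sketched with classic Perona–Malik-style anisotropic diffusion (the pattern-forming reaction terms from inverted potentials are omitted, and the step image, noise level, and conductance parameters are invented):

```python
import numpy as np

def anisotropic_diffusion(I, steps=80, k=0.2, dt=0.2):
    # gradient-descent-style smoothing: the conductance g falls off at
    # strong edges, so flat regions diffuse while boundaries are preserved
    I = I.astype(float).copy()
    g = lambda d: np.exp(-(d / k) ** 2)
    for _ in range(steps):
        n = np.roll(I, -1, 0) - I; n[-1] = 0      # neighbor differences,
        s = np.roll(I, 1, 0) - I;  s[0] = 0       # zero flux at the borders
        e = np.roll(I, -1, 1) - I; e[:, -1] = 0
        w = np.roll(I, 1, 1) - I;  w[:, 0] = 0
        I += dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
    return I

rng = np.random.default_rng(0)
step = np.zeros((32, 32)); step[:, 16:] = 1.0     # ideal vertical edge
noisy = step + rng.normal(0, 0.05, step.shape)
clean = anisotropic_diffusion(noisy)
```

Small noise differences see conductance near 1 and are smoothed away, while the unit-height edge sees conductance near 0 and survives, which is the "anisotropic diffusion" behavior the abstract attributes to the ordinary energy terms.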
[...]
TL;DR: The main conclusion of the study is that the active process of graph-editing outperforms the alternatives in terms of its ability to effectively control a large population of contaminating clutter.
Abstract: This paper describes a Bayesian framework for performing relational graph matching by discrete relaxation. Our basic aim is to draw on this framework to provide a comparative evaluation of a number of contrasting approaches to relational matching. Broadly speaking, there are two main aspects to this study. First, we focus on the issue of how relational inexactness may be quantified. We illustrate that several popular relational distance measures can be recovered as specific limiting cases of the Bayesian consistency measure. The second aspect of our comparison concerns the way in which structural inexactness is controlled. We investigate three different realizations of the matching process which draw on contrasting control models. The main conclusion of our study is that the active process of graph-editing outperforms the alternatives in terms of its ability to effectively control a large population of contaminating clutter.
[...]
TL;DR: A state-based technique for the representation and recognition of gesture is presented, using techniques for computing a prototype trajectory of an ensemble of trajectories and for defining configuration states along the prototype and for recognizing gestures from an unsegmented, continuous stream of sensor data.
Abstract: A state-based technique for the representation and recognition of gesture is presented. We define a gesture to be a sequence of states in a measurement or configuration space. For a given gesture, these states are used to capture both the repeatability and variability evidenced in a training set of example trajectories. Using techniques for computing a prototype trajectory of an ensemble of trajectories, we develop methods for defining configuration states along the prototype and for recognizing gestures from an unsegmented, continuous stream of sensor data. The approach is illustrated by application to a range of gesture-related sensory data: the two-dimensional movements of a mouse input device, the movement of the hand measured by a magnetic spatial position and orientation sensor, and, lastly, the changing eigenvector projection coefficients computed from an image sequence.
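The state-based idea can be sketched as a simple finite-state machine (a minimal illustration, not the paper's method: each state is a hand-picked point on the prototype with an acceptance radius capturing training variability, and recognition means the unsegmented stream visits the states in order):

```python
def make_states(prototype, radius):
    """A gesture model: sample points along a prototype trajectory, each
    with an acceptance radius (capturing training variability)."""
    return [(p, radius) for p in prototype]

def recognize(states, stream):
    """Advance through the states in order as the unsegmented 2D stream
    visits each one; return the time index at which the gesture completes,
    or None if it never does."""
    k = 0
    for t, (x, y) in enumerate(stream):
        (cx, cy), r = states[k]
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
            k += 1
            if k == len(states):
                return t
    return None
```

Because the matcher only advances on state entry, it tolerates arbitrary clutter between states, which is what makes recognition from a continuous, unsegmented stream possible.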
[...]
TL;DR: Experimental results prove that the approach using the variable duration outperforms the method using fixed duration in terms of both accuracy and speed.
Abstract: A fast method of handwritten word recognition suitable for real-time applications is presented in this paper. Preprocessing, segmentation, and feature extraction are implemented using a chain-code representation of the word contour. Dynamic matching between characters of a lexicon entry and segment(s) of the input word image is used to rank the lexicon entries in order of best match. A variable duration for each character is defined and used during the matching. Experimental results show that our approach using variable duration outperforms the method using fixed duration in terms of both accuracy and speed. The entire recognition process takes about 200 msec on a single SPARC-10 platform, and a recognition accuracy of 96.8 percent is achieved for a lexicon size of 10, on a database of postal words captured at 212 dpi.
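The dynamic matching with variable duration can be illustrated by a toy dynamic program (scalar segment features and a per-character model value are stand-ins for the paper's chain-code features; the 1..max_dur span is the "variable duration"):

```python
def word_cost(word, segments, model, max_dur=2):
    """Minimum total cost of aligning each character of `word` to 1..max_dur
    consecutive segments, by dynamic programming over segment positions.
    `model[c]` is a toy per-character feature; segment features are summed
    over the span and compared against it."""
    INF = float("inf")
    n = len(segments)
    # dp[i][j]: best cost of matching word[:i] to segments[:j]
    dp = [[INF] * (n + 1) for _ in range(len(word) + 1)]
    dp[0][0] = 0.0
    for i, ch in enumerate(word):
        for j in range(n + 1):
            if dp[i][j] == INF:
                continue
            for d in range(1, max_dur + 1):   # variable character duration
                if j + d > n:
                    break
                span = sum(segments[j:j + d])
                cost = dp[i][j] + abs(span - model[ch])
                if cost < dp[i + 1][j + d]:
                    dp[i + 1][j + d] = cost
    return dp[len(word)][n]

def rank_lexicon(lexicon, segments, model):
    """Rank lexicon entries by best match to the segmented input word."""
    return sorted(lexicon, key=lambda w: word_cost(w, segments, model))
```

Allowing each character to absorb a variable number of segments is what lets the matcher recover from over-segmentation of the word image.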
[...]
TL;DR: This work has developed techniques for distinguishing which language is represented in an image of text using a technique based on character shape codes, a representation of Latin text that is inexpensive to compute.
Abstract: Most document recognition work to date has been performed on English text. Because of the large overlap of the character sets found in English and major Western European languages such as French and German, some extensions of the basic English capability to those languages have taken place. However, automatic language identification prior to optical character recognition is not commonly available and adds utility to such systems. Languages and their scripts have attributes that make it possible to determine the language of a document automatically. Detection of the values of these attributes requires the recognition of particular features of the document image and, in the case of languages using Latin-based symbols, the character syntax of the underlying language. We have developed techniques for distinguishing which language is represented in an image of text. This work is restricted to a small but important subset of the world's languages. The method first classifies the script into two broad classes: Han-based and Latin-based. This classification is based on the spatial relationships of features related to the upward concavities in character structures. Language identification within the Han script class (Chinese, Japanese, Korean) is performed by analysis of the distribution of optical density in the text images. We handle 23 Latin-based languages using a technique based on character shape codes, a representation of Latin text that is inexpensive to compute.
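The character-shape-code idea for the Latin-based languages can be sketched as follows (a loose illustration, not the paper's code set: letters collapse to coarse vertical-extent classes, and a text is classified by comparing its class-frequency profile against per-language profiles):

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape_code(ch):
    """Collapse a Latin character to a coarse shape class: ascender/capital,
    descender, or x-height; None for non-letters."""
    if ch.isupper() or ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "D"
    if ch.isalpha():
        return "x"
    return None

def profile(text):
    """Normalized frequency of each shape class in a text."""
    codes = [c for c in (shape_code(ch) for ch in text) if c]
    return {c: codes.count(c) / len(codes) for c in set(codes)}

def classify(text, profiles):
    """Pick the language whose shape profile is closest (L1 distance)."""
    p = profile(text)
    def dist(q):
        keys = set(p) | set(q)
        return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)
    return min(profiles, key=lambda lang: dist(profiles[lang]))
```

Shape codes are cheap precisely because they can be measured from the image (vertical extent of connected components) without performing full character recognition first.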
[...]
TL;DR: A measure which combines the relative fidelity and efficiency of a curve segmentation is described, and this measure is used to compare the application of 23 algorithms to a curve first used by Teh and Chin (1989).
Abstract: Given the enormous number of available methods for finding polygonal approximations to curves, techniques are required to assess different algorithms. Some of the standard approaches are shown to be unsuitable if the approximations contain varying numbers of lines. Instead, we suggest assessing an algorithm's results relative to an optimal polygon, and describe a measure which combines the relative fidelity and efficiency of a curve segmentation. We use this measure to compare the application of 23 algorithms to a curve first used by Teh and Chin (1989); their integral square errors (ISEs) are assessed relative to the optimal ISE. In addition, using an example of pose estimation, it is shown how goal-directed evaluation can be used to select an appropriate assessment criterion.
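A combined measure along these lines can be written down directly (the formulas below are a plausible reconstruction of such a fidelity/efficiency merit, and `optimal_ise_for` / `optimal_lines_for` are hypothetical callables that would be supplied by an optimal-polygon solver):

```python
import math

def merit(ise, n_lines, optimal_ise_for, optimal_lines_for):
    """Combined figure of merit for a polygonal approximation, relative to
    the optimal polygon:
      fidelity   = E_opt / E * 100   (error vs. the optimal polygon with
                                      the same number of lines)
      efficiency = M_opt / M * 100   (lines vs. the fewest lines an optimal
                                      polygon needs to reach this error)
      merit      = sqrt(fidelity * efficiency)
    An optimal approximation scores 100 on all three."""
    fidelity = 100.0 * optimal_ise_for(n_lines) / ise
    efficiency = 100.0 * optimal_lines_for(ise) / n_lines
    return math.sqrt(fidelity * efficiency)
```

Normalizing against the optimal polygon is what makes the measure comparable across approximations with different numbers of lines, which is exactly where raw ISE breaks down.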