
Showing papers on "Feature vector published in 1997"


Journal ArticleDOI
TL;DR: Three kinds of algorithms that learn axis-parallel rectangles to solve the multiple instance problem are described and compared, giving 89% correct predictions on a musk odor prediction task.

2,767 citations


Book ChapterDOI
08 Oct 1997
TL;DR: A new method for performing a nonlinear form of Principal Component Analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
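
A minimal NumPy sketch of the kernel-PCA idea summarized above (a generic reconstruction, not the authors' code): a polynomial kernel matrix is centered in feature space and the training points are projected onto the leading nonlinear components.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=2):
    """Nonlinear PCA via a polynomial kernel (minimal sketch)."""
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree                 # polynomial kernel matrix
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)         # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, idx], eigvals[idx]
    alphas = alphas / np.sqrt(lambdas)            # normalize expansion coefficients
    return Kc @ alphas                            # projections of the training data

# toy usage: two concentric rings become separable along the leading components
theta = np.linspace(0, 2 * np.pi, 200)
X = np.vstack([np.c_[np.cos(theta), np.sin(theta)],
               3 * np.c_[np.cos(theta), np.sin(theta)]])
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # (400, 2)
```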

2,223 citations


Journal ArticleDOI
TL;DR: A new approach to shape recognition based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity, and a comparison with artificial neural networks methods is presented.
Abstract: We explore a new approach to shape recognition based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement of several local topographic codes (or tags), which are in themselves too primitive and common to be informative about shape. All the discriminating power derives from relative angles and distances among the tags. The important attributes of the queries are a natural partial ordering corresponding to increasing structure and complexity; semi-invariance, meaning that most shapes of a given class will answer the same way to two queries that are successive in the ordering; and stability, since the queries are not based on distinguished points and substructures. No classifier based on the full feature set can be evaluated, and it is impossible to determine a priori which arrangements are informative. Our approach is to select informative features and build tree classifiers at the same time by inductive learning. In effect, each tree provides an approximation to the full posterior where the features chosen depend on the branch that is traversed. Due to the number and nature of the queries, standard decision tree construction based on a fixed-length feature vector is not feasible. Instead we entertain only a small random sample of queries at each node, constrain their complexity to increase with tree depth, and grow multiple trees. The terminal nodes are labeled by estimates of the corresponding posterior distribution over shape classes. An image is classified by sending it down every tree and aggregating the resulting distributions. The method is applied to classifying handwritten digits and synthetic linear and nonlinear deformations of three hundred LaTeX symbols. State-of-the-art error rates are achieved on the National Institute of Standards and Technology database of digits. The principal goal of the experiments on LaTeX symbols is to analyze invariance, generalization error and related issues, and a comparison with artificial neural networks methods is presented in this context.

1,214 citations


Proceedings ArticleDOI
17 Jun 1997
TL;DR: This paper shows that a new variant of the k-d tree search algorithm makes indexing in higher-dimensional spaces practical, and is integrated into a fully developed recognition system, which is able to detect complex objects in real, cluttered scenes in just a few seconds.
Abstract: Shape indexing is a way of making rapid associations between features detected in an image and object models that could have produced them. When model databases are large, the use of high-dimensional features is critical, due to the improved level of discrimination they can provide. Unfortunately, finding the nearest neighbour to a query point rapidly becomes inefficient as the dimensionality of the feature space increases. Past indexing methods have used hash tables for hypothesis recovery, but only in low-dimensional situations. In this paper we show that a new variant of the k-d tree search algorithm makes indexing in higher-dimensional spaces practical. This Best Bin First, or BBF search is an approximate algorithm which finds the nearest neighbour for a large fraction of the queries, and a very close neighbour in the remaining cases. The technique has been integrated into a fully developed recognition system, which is able to detect complex objects in real, cluttered scenes in just a few seconds.
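
A simplified Python reconstruction of a Best-Bin-First style search (not the authors' implementation): a standard k-d tree is built, and leaves are visited in order of increasing distance to the query's bin boundaries via a priority queue, stopping after a fixed number of leaf checks.

```python
import heapq
import numpy as np

class Node:
    def __init__(self, point=None, dim=None, split=None, left=None, right=None):
        self.point, self.dim, self.split, self.left, self.right = point, dim, split, left, right

def build(points, depth=0):
    if len(points) == 1:
        return Node(point=points[0])
    dim = depth % points.shape[1]
    points = points[points[:, dim].argsort()]
    mid = len(points) // 2
    return Node(dim=dim, split=points[mid, dim],
                left=build(points[:mid], depth + 1),
                right=build(points[mid:], depth + 1))

def bbf_search(root, q, max_leaves=50):
    """Approximate nearest neighbour: visit at most max_leaves leaf bins."""
    best, best_d = None, np.inf
    heap = [(0.0, 0, root)]           # (lower bound on distance, tie-breaker, node)
    counter, leaves = 1, 0
    while heap and leaves < max_leaves:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            continue
        if node.point is not None:    # leaf: check the stored point
            d = np.linalg.norm(q - node.point)
            if d < best_d:
                best, best_d = node.point, d
            leaves += 1
            continue
        near, far = (node.left, node.right) if q[node.dim] <= node.split else (node.right, node.left)
        heapq.heappush(heap, (bound, counter, near)); counter += 1
        heapq.heappush(heap, (abs(q[node.dim] - node.split), counter, far)); counter += 1
    return best, best_d

pts = np.random.rand(10000, 16)       # 16-dimensional feature vectors
tree = build(pts)
print(bbf_search(tree, np.random.rand(16), max_leaves=100))
```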

1,044 citations


Journal ArticleDOI
TL;DR: The discriminatory power of various human facial features is studied, and an efficient projection-based feature extraction and classification scheme for Automatic Face Recognition (AFR) is proposed.
Abstract: In this paper the discriminatory power of various human facial features is studied and a new scheme for Automatic Face Recognition (AFR) is proposed. Using Linear Discriminant Analysis (LDA) of different aspects of human faces in spatial domain, we first evaluate the significance of visual information in different parts/features of the face for identifying the human subject. The LDA of faces also provides us with a small set of features that carry the most relevant information for classification purposes. The features are obtained through eigenvector analysis of scatter matrices with the objective of maximizing between-class and minimizing within-class variations. The result is an efficient projection-based feature extraction and classification scheme for AFR. Soft decisions made based on each of the projections are combined, using probabilistic or evidential approaches to multisource data analysis. For medium-sized databases of human faces, good classification accuracy is achieved using very low-dimensional feature vectors.
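
A minimal NumPy sketch of the scatter-matrix LDA step described above (a generic formulation, not the authors' exact pipeline): the eigenvectors of Sw^{-1} Sb give the low-dimensional discriminant feature vectors.

```python
import numpy as np

def lda_features(X, y, n_components):
    """Fisher discriminant projection: maximize between-class,
    minimize within-class scatter (minimal sketch)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * diff @ diff.T
    # eigenvectors of Sw^{-1} Sb; a small ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    W = evecs[:, order].real
    return X @ W                        # low-dimensional feature vectors

# toy usage with random stand-ins for face vectors of 3 subjects
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, size=(20, 50)) for i in range(3)])
y = np.repeat(np.arange(3), 20)
print(lda_features(X, y, n_components=2).shape)   # (60, 2)
```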

892 citations


Proceedings ArticleDOI
26 Oct 1997
TL;DR: Experimental results show that the image retrieval precision increases considerably by using the proposed integration approach, and the relevance feedback technique from the IR domain is used in content-based image retrieval to demonstrate the effectiveness of this conversion.
Abstract: Technology advances in the areas of image processing (IP) and information retrieval (IR) have evolved separately for a long time. However, successful content-based image retrieval systems require the integration of the two. There is an urgent need to develop integration mechanisms that link the image retrieval model to the text retrieval model, such that the well-established text retrieval techniques can be utilized. Approaches for converting image feature vectors (IF domain) to weighted-term vectors (IR domain) are proposed in this paper. Furthermore, the relevance feedback technique from the IR domain is used in content-based image retrieval to demonstrate the effectiveness of this conversion. Experimental results show that the image retrieval precision increases considerably by using the proposed integration approach.
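
A rough Python sketch of the idea (a hedged simplification, not the paper's exact IF-to-IR mapping): image feature components are rescaled into non-negative "term weights", and a standard Rocchio-style relevance-feedback update from the IR literature refines the query from the user's relevant and non-relevant picks.

```python
import numpy as np

def to_term_weights(f):
    """Map an image feature vector to a non-negative, normalized
    weighted-term vector (a stand-in for the paper's IF->IR conversion)."""
    w = f - f.min()
    return w / (np.linalg.norm(w) + 1e-12)

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio relevance-feedback update on weighted-term vectors."""
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return np.clip(q, 0, None)          # keep term weights non-negative

# toy usage: score by dot product, then refine the query from feedback
rng = np.random.default_rng(1)
db = [to_term_weights(rng.random(64)) for _ in range(100)]   # 64-dim image features
q = to_term_weights(rng.random(64))
top = np.argsort([float(q @ d) for d in db])[::-1][:5]
q = rocchio(q, relevant=[db[top[0]]], nonrelevant=[db[top[4]]])
```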

815 citations


Journal ArticleDOI
TL;DR: The paper demonstrates a successful application of PDBNN to face recognition on two public (FERET and ORL) databases and one in-house (SCR) database, and elaborates experimental results on all three, including recognition accuracies as well as false rejection and false acceptance rates.
Abstract: This paper proposes a face recognition system based on probabilistic decision-based neural networks (PDBNN). With technological advances in microelectronics and vision systems, high-performance automatic techniques for biometric recognition are now becoming economically feasible. Among all the biometric identification methods, face recognition has attracted much attention in recent years because it has the potential to be the most nonintrusive and user-friendly. The PDBNN face recognition system consists of three modules: First, a face detector finds the location of a human face in an image. Then an eye localizer determines the positions of both eyes in order to generate meaningful feature vectors. The proposed facial region contains the eyebrows, eyes, and nose, but excludes the mouth (eyeglasses are allowed). Lastly, the third module is a face recognizer. The PDBNN can be effectively applied to all three modules. It adopts a hierarchical network structure with nonlinear basis functions and a competitive credit-assignment scheme. The paper demonstrates a successful application of PDBNN to face recognition on two public (FERET and ORL) databases and one in-house (SCR) database. Regarding performance, experimental results on the three databases, including recognition accuracies as well as false rejection and false acceptance rates, are elaborated. As to processing speed, the whole recognition process (including PDBNN processing for eye localization, feature extraction, and classification) takes approximately one second on a Sparc10, without using a hardware accelerator or co-processor.

637 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the effectiveness of several shape measures for content-based similarity retrieval of images, including outline-based features (chain-code-based string features, Fourier descriptors, UNL Fourier features), region-based features (invariant moments, Zernike moments, pseudo-Zernike moments), and combined features.
Abstract: A great deal of work has been done on the evaluation of information retrieval systems for alphanumeric data. The same cannot be said about the newly emerging multimedia and image database systems. One of the central concerns in these systems is the automatic characterization of image content and the retrieval of images based on similarity of image content. In this paper, we discuss the effectiveness of several shape measures for content-based similarity retrieval of images. The different shape measures we have implemented include outline-based features (chain-code-based string features, Fourier descriptors, UNL Fourier features), region-based features (invariant moments, Zernike moments, pseudo-Zernike moments), and combined features (invariant moments & Fourier descriptors, invariant moments & UNL Fourier features). Given an image, all these shape feature measures (vectors) are computed automatically, and the feature vector can either be used for retrieval or stored in the database for future queries. We have tested all of the above shape features for image retrieval on a database of 500 trademark images. The average retrieval efficiency, computed over a set of fifteen representative queries, is presented for all the methods. The output of a sample shape similarity query using all the features is also shown.
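
As a concrete example of one of the outline-based measures listed above, here is a short NumPy sketch of Fourier descriptors (a generic formulation, not necessarily the exact variant used in the paper): the closed boundary is treated as a complex signal, and the magnitudes of its low-order Fourier coefficients, normalized for scale, form a translation-, rotation-, and scale-invariant shape feature vector.

```python
import numpy as np

def fourier_descriptors(boundary, n_coeffs=16):
    """boundary: (N, 2) array of ordered (x, y) points on a closed contour."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # complex boundary signal
    F = np.fft.fft(z)
    F[0] = 0                                   # drop DC term -> translation invariance
    mags = np.abs(F)
    mags = mags / (mags[1] + 1e-12)            # scale invariance; dropping phase gives rotation/start-point invariance
    return mags[1:n_coeffs + 1]                # low-order descriptors as the feature vector

# toy usage: an ellipse and a box-like contour give clearly different descriptors
t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
ellipse = np.c_[2 * np.cos(t), np.sin(t)]
box = np.c_[np.sign(np.cos(t)), np.sin(t)]
print(np.linalg.norm(fourier_descriptors(ellipse) - fourier_descriptors(box)))
```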

416 citations


Patent
28 Mar 1997
TL;DR: In this article, a schema is defined as a specific collection of primitives and a specific schema implies a specific set of visual features to be processed and a corresponding feature vector to be used for content-based similarity scoring.
Abstract: A system and method for content-based search and retrieval of visual objects. A base visual information retrieval (VIR) engine utilizes a set of universal primitives to operate on the visual objects. An extensible VIR engine allows custom, modular primitives to be defined and registered. A custom primitive addresses domain specific problems and can utilize any image understanding technique. Object attributes can be extracted over the entire image or over only a portion of the object. A schema is defined as a specific collection of primitives. A specific schema implies a specific set of visual features to be processed and a corresponding feature vector to be used for content-based similarity scoring. A primitive registration interface registers custom primitives and facilitates storing of an analysis function and a comparison function to a schema table. A heterogeneous comparison allows objects analyzed by different schemas to be compared if at least one primitive is in common between the schemas. A threshold-based comparison is utilized to improve performance of the VIR engine. A distance between two feature vectors is computed in any of the comparison processes so as to generate a similarity score.
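
A hypothetical Python sketch of the schema/primitive arrangement described above (class names such as ColorPrimitive are invented for illustration and are not taken from the patent): each primitive contributes an analysis and a comparison function, and a schema combines the per-primitive distances into one similarity score.

```python
import numpy as np

class Primitive:
    """One registered primitive: extracts part of the feature vector and compares it."""
    name = "base"
    def analyze(self, image):            # image: HxWx3 array
        raise NotImplementedError
    def compare(self, f1, f2):           # smaller distance = more similar
        return float(np.linalg.norm(f1 - f2))

class ColorPrimitive(Primitive):          # hypothetical universal primitive
    name = "color"
    def analyze(self, image):
        return image.reshape(-1, 3).mean(axis=0)

class TexturePrimitive(Primitive):        # hypothetical universal primitive
    name = "texture"
    def analyze(self, image):
        g = image.mean(axis=2)
        return np.array([np.abs(np.diff(g, axis=0)).mean(),
                         np.abs(np.diff(g, axis=1)).mean()])

class Schema:
    """A specific collection of primitives and their weights."""
    def __init__(self, primitives, weights):
        self.primitives, self.weights = primitives, weights
    def analyze(self, image):
        return {p.name: p.analyze(image) for p in self.primitives}
    def score(self, fv1, fv2):
        # weighted distance over the primitives the two feature vectors share
        common = [p for p in self.primitives if p.name in fv1 and p.name in fv2]
        return sum(self.weights[p.name] * p.compare(fv1[p.name], fv2[p.name]) for p in common)

schema = Schema([ColorPrimitive(), TexturePrimitive()], {"color": 0.6, "texture": 0.4})
rng = np.random.default_rng(0)
a, b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
print(schema.score(schema.analyze(a), schema.analyze(b)))
```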

379 citations


Journal ArticleDOI
TL;DR: It is concluded that in the light of the vast hardware resources available in the ventral stream of the primate visual system relative to those exercised here, the appealingly simple feature-space conjecture remains worthy of serious consideration as a neurobiological model.
Abstract: Severe architectural and timing constraints within the primate visual system support the conjecture that the early phase of object recognition in the brain is based on a feedforward feature-extraction hierarchy. To assess the plausibility of this conjecture in an engineering context, a difficult three-dimensional object recognition domain was developed to challenge a pure feedforward, receptive-field-based recognition model called SEEMORE. SEEMORE is based on 102 viewpoint-invariant nonlinear filters that as a group are sensitive to contour, texture, and color cues. The visual domain consists of 100 real objects of many different types, including rigid (shovel), nonrigid (telephone cord), and statistical (maple leaf cluster) objects and photographs of complex scenes. Objects were individually presented in color video images under normal room lighting conditions. Based on 12 to 36 training views, SEEMORE was required to recognize unnormalized test views of objects that could vary in position, orientation in the image plane and in depth, and scale (factor of 2); for nonrigid objects, recognition was also tested under gross shape deformations. Correct classification performance on a test set consisting of 600 novel object views was 97 percent (chance was 1 percent) and was comparable for the subset of 15 nonrigid objects. Performance was also measured under a variety of image degradation conditions, including partial occlusion, limited clutter, color shift, and additive noise. Generalization behavior and classification errors illustrated the emergence of several striking natural shape categories that are not explicitly encoded in the dimensions of the feature space. It is concluded that in the light of the vast hardware resources available in the ventral stream of the primate visual system relative to those exercised here, the appealingly simple feature-space conjecture remains worthy of serious consideration as a neurobiological model.

371 citations


01 Jan 1997
TL;DR: WBIIS as mentioned in this paper applies a Daubechies' wavelet transform for each of the three opponent color components, and the wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors.
Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10000 general-purpose images, the best 100 matches were found in 3.3 seconds.

Proceedings Article
01 Dec 1997
TL;DR: The isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima.
Abstract: Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure - the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy - highly nonlinear transformations in the original observation space - to be computed with simple linear operations in feature space.
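
A compact NumPy/SciPy sketch of the isomap procedure summarized above (a generic reconstruction, not the authors' code): build a k-nearest-neighbor graph, approximate geodesic distances by graph shortest paths, and embed them with classical MDS.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=8, n_components=2):
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise Euclidean distances
    # k-NN graph: keep distances to the k nearest neighbors, infinity elsewhere
    G = np.full((n, n), np.inf)
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    geo = shortest_path(G, method="D", directed=False)           # geodesic distance estimates
    # classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (geo ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:n_components]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))

# toy usage: a noisy arc is unrolled onto (roughly) one dominant coordinate
t = np.linspace(0, np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * np.random.randn(200, 2)
print(isomap(X, n_neighbors=6, n_components=2).shape)
```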

Proceedings Article
27 Jul 1997
TL;DR: Some approaches to flame recognition are described, including a prototype system, Smokey, which builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message.
Abstract: Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message. A training set of 720 messages was used by Quinlan's C4.5 decision-tree generator to determine feature-based rules that were able to correctly categorize 64% of the flames and 98% of the non-flames in a separate test set of 460 messages. Additional techniques for greater accuracy and user customization are also discussed.
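
A hedged sketch of the general approach, with a handful of toy features and scikit-learn's CART trees standing in for Smokey's 47-element vector and Quinlan's C4.5 (none of this is the original system):

```python
import re
import numpy as np
from sklearn.tree import DecisionTreeClassifier

INSULTS = {"idiot", "stupid", "moron", "loser"}          # toy lexicon, not Smokey's

def sentence_vector(sentence):
    """Tiny stand-in for Smokey's 47 syntactic/semantic features."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return np.array([
        sum(w in INSULTS for w in words),                    # insult words
        sentence.count("!"),                                 # exclamations
        sum(1 for w in sentence.split() if w.isupper()),     # shouted (all-caps) words
        int(bool(re.search(r"\byou\b", sentence.lower()))),  # second-person address
    ], dtype=float)

def message_vector(message):
    sents = re.split(r"[.!?]+", message)
    return np.sum([sentence_vector(s) for s in sents if s.strip()], axis=0)

# toy training data (the real system used 720 labeled messages)
msgs = ["You are a stupid idiot!", "Thanks, that was a helpful answer.",
        "What a MORON you are!!", "Could you share the source code?"]
labels = [1, 0, 1, 0]                                    # 1 = flame
X = np.vstack([message_vector(m) for m in msgs])
clf = DecisionTreeClassifier(max_depth=3).fit(X, labels)
print(clf.predict([message_vector("you LOSER!")]))
```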

Proceedings ArticleDOI
20 Apr 1997
TL;DR: This paper presents an experimental evaluation of adaptive and non-adaptive visual servoing in 3, 6 and 12 degrees of freedom (DOF), comparing it to traditional joint feedback control, and finds that a trust-region-based adaptive visual feedback controller is very robust and that redundant visual information is valuable.
Abstract: We present an experimental evaluation of adaptive and non-adaptive visual servoing in 3, 6 and 12 degrees of freedom (DOF), comparing it to traditional joint feedback control. While the purpose of experiments in most other work has been to show that the particular algorithm presented indeed also works in practice, we do not focus on the algorithm but rather on properties important to visual servoing in general. Our main results are: positioning of a 6 axis PUMA 762 arm is up to 5 times more precise under visual control than under joint control; positioning of a Utah/MIT dextrous hand is better under visual control than under joint control by a factor of 2; and a trust-region-based adaptive visual feedback controller is very robust. For m tracked visual features the algorithm can successfully estimate online the m×3 (m ≥ 3) image Jacobian (J) without any prior information, while carrying out a 3 DOF manipulation task. For 6 and higher DOF manipulation, a rough initial estimate of J is beneficial. We also verified that redundant visual information is valuable. Errors due to imprecise tracking and goal specification were reduced as the number of visual features, m, was increased. Furthermore highly redundant systems allow us to detect outliers in the feature vector and deal with partial occlusion.
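
The core of the uncalibrated case is the online estimate of the image Jacobian J relating joint motion to feature motion. Below is a minimal NumPy sketch of a plain secant (Broyden-style) update together with a damped least-squares servo step; this is a simpler stand-in for the trust-region scheme the paper actually evaluates.

```python
import numpy as np

def broyden_update(J, dq, df, eps=1e-9):
    """Rank-one secant update of the image Jacobian estimate.
    dq: joint displacement, df: observed feature displacement."""
    denom = float(dq @ dq) + eps
    return J + np.outer(df - J @ dq, dq) / denom

def servo_step(J, f, f_goal, gain=0.5, damping=1e-3):
    """Damped least-squares step toward the feature-space goal."""
    e = f_goal - f
    JtJ = J.T @ J + damping * np.eye(J.shape[1])
    return gain * np.linalg.solve(JtJ, J.T @ e)

# toy simulation: unknown true linear mapping f = A q
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))             # 6 feature coordinates, 3 DOF
q = np.zeros(3); f_goal = A @ rng.normal(size=3)
J = np.eye(6, 3)                        # rough initial estimate
for _ in range(50):
    f = A @ q
    dq = servo_step(J, f, f_goal)
    J = broyden_update(J, dq, A @ (q + dq) - f)
    q = q + dq
print(np.linalg.norm(A @ q - f_goal))   # feature error typically shrinks over the iterations
```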

Book ChapterDOI
17 Sep 1997
TL;DR: Video-based recognition of isolated signs is proposed; concentrating on the manual parameters of sign language, the system aims for signer-dependent recognition of 262 different signs.
Abstract: This paper is concerned with the video-based recognition of isolated signs. Concentrating on the manual parameters of sign language, the system aims for the signer-dependent recognition of 262 different signs. For hidden Markov modelling, a sign is considered a doubly stochastic process, represented by an unobservable state sequence. The observations emitted by the states are regarded as feature vectors that are extracted from video frames. The system achieves recognition rates of up to 94%.

Patent
02 Apr 1997
TL;DR: In this paper, the authors present techniques for improving manufacturing process control based on inspection of manufactured items at intermediate process steps, using clustering and binning of defect data. The techniques are not specific to semiconductor wafers and may be generalized to any manufacturing process.
Abstract: Techniques for improving manufacturing process control based on inspection of manufactured items at intermediate process steps, based on clustering and binning of defect data. Additionally, the defect data produced by inspection machines is used to improve manufacturing process control, specifically semiconductor manufacturing process control. Examples described here relate specifically to semiconductor wafers, but may be generalized to any manufacturing process.

Posted Content
TL;DR: In this paper, a variation of Littlestone's Winnow algorithm is applied to text categorization, and the results show that it performs well in this domain.
Abstract: Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature -- text categorization. We argue that these algorithms -- which categorize documents by learning a linear separator in the feature space -- have a few properties that make them ideal for this domain. We then show that a quantum leap in performance is achieved when we further modify the algorithms to better address some of the specific characteristics of the domain. In particular, we demonstrate (1) how variation in document length can be tolerated by either normalizing feature weights or by using negative weights, (2) the positive effect of applying a threshold range in training, (3) alternatives in considering feature frequency, and (4) the benefits of discarding features while training. Overall, we present an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
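
A short NumPy sketch of positive Winnow as used for text categorization above (the standard textbook formulation with multiplicative promotion/demotion, not necessarily the exact variant the authors tuned):

```python
import numpy as np

def train_winnow(X, y, promotion=2.0, demotion=0.5, threshold=None, epochs=10):
    """Positive Winnow: multiplicative updates on mistakes.
    X: binary (n_docs, n_features) matrix of active features, y: 0/1 labels."""
    n, d = X.shape
    theta = threshold if threshold is not None else d / 2.0
    w = np.ones(d)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(w @ xi >= theta)
            if pred == 1 and yi == 0:          # false positive: demote active features
                w[xi > 0] *= demotion
            elif pred == 0 and yi == 1:        # false negative: promote active features
                w[xi > 0] *= promotion
    return w, theta

# toy usage with a sparse bag-of-words style binary matrix
rng = np.random.default_rng(0)
X = (rng.random((200, 1000)) < 0.02).astype(float)
relevant = np.arange(10)                        # 10 "topic" words decide the label
y = (X[:, relevant].sum(axis=1) > 0).astype(int)
w, theta = train_winnow(X, y)
print(np.mean((X @ w >= theta).astype(int) == y))   # training accuracy
```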

Proceedings ArticleDOI
01 Jun 1997
TL;DR: This paper presents a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces, which provides an almost linear speed-up and a constant scale-up, and outperforms the Hilbert approach by a factor of up to 5.
Abstract: Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique - in contrast to other declustering methods - guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.

Journal ArticleDOI
TL;DR: The application of deformable templates to recognition of handprinted digits shows that there does exist a good low-dimensional representation space and methods to reduce the computational requirements, the primary limiting factor, are discussed.
Abstract: We investigate the application of deformable templates to recognition of handprinted digits. Two characters are matched by deforming the contour of one to fit the edge strengths of the other, and a dissimilarity measure is derived from the amount of deformation needed, the goodness of fit of the edges, and the interior overlap between the deformed shapes. Classification using the minimum dissimilarity results in recognition rates up to 99.25 percent on a 2,000 character subset of NIST Special Database 1. Additional experiments on independent test data were done to demonstrate the robustness of this method. Multidimensional scaling is also applied to the 2,000×2,000 proximity matrix, using the dissimilarity measure as a distance, to embed the patterns as points in low-dimensional spaces. A nearest neighbor classifier is applied to the resulting pattern matrices. The classification accuracies obtained in the derived feature space demonstrate that there does exist a good low-dimensional representation space. Methods to reduce the computational requirements, the primary limiting factor of this method, are discussed.

Proceedings ArticleDOI
10 Mar 1997
TL;DR: The paper describes a technique for automatic estimation of crowd density, which is a part of the problem of automatic crowd monitoring, using texture information based on grey level transition probabilities on digitised images.
Abstract: Human beings perceive images through their properties, like colour, shape, size, and texture. Texture is a fertile source of information about the physical environment. Images of low-density crowds tend to present coarse textures, while images of dense crowds tend to present fine textures. The paper describes a technique for automatic estimation of crowd density, which is a part of the problem of automatic crowd monitoring, using texture information based on grey-level transition probabilities in digitised images. Crowd density feature vectors are extracted from such images and used by a self-organising neural network which is responsible for the crowd density estimation. Results for the estimation of the number of people in a specific area of Liverpool Street railway station in London (UK) are presented.
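
The grey-level transition probabilities mentioned above are essentially a co-occurrence (GLCM) matrix. A small NumPy sketch of extracting such a texture feature vector from a quantized image follows (generic GLCM statistics, not necessarily the paper's exact feature set):

```python
import numpy as np

def glcm_features(gray, levels=8, dx=1, dy=0):
    """Grey-level co-occurrence (transition probability) texture features."""
    q = np.floor(gray.astype(float) / 256.0 * levels).astype(int).clip(0, levels - 1)
    H, W = q.shape
    a = q[0:H - dy, 0:W - dx]                    # reference pixels
    b = q[dy:H, dx:W]                            # neighbors at offset (dx, dy)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)
    p = glcm / glcm.sum()                        # transition probabilities
    i, j = np.indices((levels, levels))
    return np.array([
        (p * (i - j) ** 2).sum(),                # contrast (fine texture -> high)
        (p ** 2).sum(),                          # energy / uniformity
        -(p[p > 0] * np.log(p[p > 0])).sum(),    # entropy
        (p / (1.0 + np.abs(i - j))).sum(),       # homogeneity
    ])

# toy usage: a coarse vs. a fine random texture give different feature vectors
rng = np.random.default_rng(0)
coarse = np.kron(rng.integers(0, 256, (16, 16)), np.ones((8, 8)))
fine = rng.integers(0, 256, (128, 128))
print(glcm_features(coarse))
print(glcm_features(fine))
```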

Journal ArticleDOI
TL;DR: The proposed adaptive segmentation method uses local color information to estimate the membership probability in the object and background classes, respectively, and is applied to the recognition and localization of human hands in color camera images of complex laboratory scenes.
Abstract: With the availability of more powerful computers it is nowadays possible to perform pixel-based operations on real camera images even in the full color space. New adaptive classification tools like neural networks make it possible to develop special-purpose object detectors that can segment arbitrary objects in real images with a complex distribution in the feature space after training with one or several previously labeled image(s). The paper focuses on a detailed comparison of a neural approach based on local linear maps (LLMs) with a classifier based on normal distributions. The proposed adaptive segmentation method uses local color information to estimate the membership probability in the object and background classes, respectively. The method is applied to the recognition and localization of human hands in color camera images of complex laboratory scenes.
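
A minimal NumPy sketch of the normal-distribution classifier that the LLM approach is compared against (a generic Gaussian pixel-color model, not the paper's implementation): fit one Gaussian each to labelled object and background pixel colors, then compute per-pixel posterior membership probabilities.

```python
import numpy as np

def fit_gaussian(pixels):
    """pixels: (N, 3) color samples; returns mean and covariance."""
    mu = pixels.mean(axis=0)
    cov = np.cov(pixels.T) + 1e-6 * np.eye(3)
    return mu, cov

def log_gauss(pixels, mu, cov):
    d = pixels - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (np.einsum("ni,ij,nj->n", d, inv, d) + logdet + 3 * np.log(2 * np.pi))

def object_probability(image, obj_model, bg_model, prior=0.5):
    """Per-pixel posterior probability of the object class."""
    px = image.reshape(-1, 3).astype(float)
    lo = log_gauss(px, *obj_model) + np.log(prior)
    lb = log_gauss(px, *bg_model) + np.log(1 - prior)
    post = 1.0 / (1.0 + np.exp(lb - lo))
    return post.reshape(image.shape[:2])

# toy usage: skin-like object pixels vs. darker background pixels
rng = np.random.default_rng(0)
obj = fit_gaussian(rng.normal([200, 140, 120], 15, (500, 3)))
bg = fit_gaussian(rng.normal([80, 90, 100], 30, (500, 3)))
img = rng.normal([80, 90, 100], 30, (64, 64, 3))
img[20:40, 20:40] = rng.normal([200, 140, 120], 15, (20, 20, 3))
mask = object_probability(img, obj, bg) > 0.5
print(mask[30, 30], mask[5, 5])        # expect True inside the patch, False outside
```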

Proceedings ArticleDOI
07 May 1997
TL;DR: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases that performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms.
Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. When tested on a database of more than 10,000 general-purpose images, WBIIS is much faster and more accurate than traditional algorithms.
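
A hedged sketch of the indexing and two-step matching using PyWavelets (a generic reconstruction from the description above, not the WBIIS code, and it assumes the pywt package is available): a Daubechies decomposition per color channel, the coarsest coefficients and their variances as the feature vector, and a crude variance pre-filter before the full feature-vector match.

```python
import numpy as np
import pywt

def wbiis_like_features(image, wavelet="db8", level=3):
    """image: HxWx3 array (ideally in an opponent color space).
    Returns (per-channel variances, concatenated coarse coefficients)."""
    variances, coarse = [], []
    for c in range(3):
        coeffs = pywt.wavedec2(image[:, :, c], wavelet, level=level)
        approx = coeffs[0]                       # lowest-frequency band
        variances.append(approx.var())
        coarse.append(approx.ravel())
    return np.array(variances), np.concatenate(coarse)

def query(db, q, var_tolerance=0.5, top_k=5):
    """Two-step search: variance pre-filter, then feature-vector distance."""
    qv, qf = q
    candidates = [i for i, (v, _) in enumerate(db)
                  if np.all(np.abs(v - qv) <= var_tolerance * (qv + 1e-9))]
    dists = [(np.linalg.norm(db[i][1] - qf), i) for i in candidates]
    return [i for _, i in sorted(dists)[:top_k]]

# toy usage with random images standing in for a database
rng = np.random.default_rng(0)
db = [wbiis_like_features(rng.random((128, 128, 3))) for _ in range(50)]
print(query(db, db[7]))                          # the query image itself should rank first
```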

Patent
Stephane H. Maes1
TL;DR: In this paper, feature vectors representing each of a plurality of overlapping frames of an arbitrary, text independent speech signal are computed and compared to vector parameters and variances stored as codewords in one or more codebooks corresponding to each of enrolled users to provide speaker dependent information for speech recognition and ambiguity resolution.
Abstract: Feature vectors representing each of a plurality of overlapping frames of an arbitrary, text independent speech signal are computed and compared to vector parameters and variances stored as codewords in one or more codebooks corresponding to each of one or more enrolled users to provide speaker dependent information for speech recognition and/or ambiguity resolution. Other information such as aliases and preferences of each enrolled user may also be enrolled and stored, for example, in a database. Correspondence of the feature vectors may be ranked by closeness of correspondence to a codeword entry and the number of frames corresponding to each codebook are accumulated or counted to identify a potential enrolled speaker. The differences between the parameters of the feature vectors and codewords in the codebooks can be used to identify a new speaker and an enrollment procedure can be initiated. Continuous authorization and access control can be carried out based on any utterance either by verification of the authorization of a speaker of a recognized command or comparison with authorized commands for the recognized speaker. Text independence also permits coherence checks to be carried out for commands to validate the recognition process.
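
A small NumPy sketch of the frame-counting idea described above (a generic VQ-codebook speaker identifier, not the patented system): each enrolled speaker has a codebook of feature-vector centroids, each frame of the test utterance votes for the codebook with the nearest codeword, and the speaker with the most frames wins.

```python
import numpy as np

def train_codebook(frames, n_codewords=16, iters=20, seed=0):
    """Simple k-means vector quantization of a speaker's training frames."""
    rng = np.random.default_rng(seed)
    cb = frames[rng.choice(len(frames), n_codewords, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((frames[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
        for k in range(n_codewords):
            if np.any(assign == k):
                cb[k] = frames[assign == k].mean(axis=0)
    return cb

def identify(frames, codebooks):
    """Count, per enrolled speaker, the frames whose nearest codeword is theirs."""
    dists = np.stack([((frames[:, None, :] - cb[None]) ** 2).sum(-1).min(axis=1)
                      for cb in codebooks])          # (n_speakers, n_frames)
    votes = np.bincount(dists.argmin(axis=0), minlength=len(codebooks))
    return int(votes.argmax()), votes

# toy usage: two "speakers" with different feature statistics
rng = np.random.default_rng(1)
spk0 = rng.normal(0.0, 1.0, (400, 12))          # e.g. 12-dim cepstral-like frames
spk1 = rng.normal(2.0, 1.0, (400, 12))
books = [train_codebook(spk0), train_codebook(spk1)]
test = rng.normal(2.0, 1.0, (100, 12))          # unlabeled utterance from speaker 1
print(identify(test, books))                     # expect speaker index 1
```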

Patent
Edith Cohen1, David D. Lewis1
02 Jan 1997
TL;DR: In this article, a random-sampling-based retrieval system is proposed that can identify, for any given query vector, those instance vectors that have large dot products with the query, while avoiding explicit computation of all dot products.
Abstract: The invention is an improved retrieval system and method. Many pattern recognition tasks, including estimation, classification, and the finding of similar objects, make use of linear models. For example, many text retrieval systems represent queries as linear functions, and retrieve documents whose vector representation has a high dot product with the query. The fundamental operation in such tasks is the computation of the dot product between a query vector and a large database of instance vectors. Often instance vectors which have high dot products with the query are of interest. The invention relates to a random sampling based retrieval system that can identify, for any given query vector, those instance vectors which have large dot products, while avoiding explicit computation of all dot products.

Proceedings Article
01 Jan 1997
TL;DR: This work studies three mistake-driven learning algorithms for a typical task of this nature -- text categorization and presents an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
Abstract: Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature -- text categorization. We argue that these algorithms -- which categorize documents by learning a linear separator in the feature space -- have a few properties that make them ideal for this domain. We then show that a quantum leap in performance is achieved when we further modify the algorithms to better address some of the specific characteristics of the domain. In particular, we demonstrate (1) how variation in document length can be tolerated by either normalizing feature weights or by using negative weights, (2) the positive effect of applying a threshold range in training, (3) alternatives in considering feature frequency, and (4) the benefits of discarding features while training. Overall, we present an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.

Proceedings Article
01 Jan 1997
TL;DR: This article presented three unsupervised learning algorithms that are able to distinguish among the known senses (i.e., as defined in some dictionary) of a word, based only on features that can be automatically extracted from untagged text.
Abstract: This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns rather than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high dimensional feature set.

Proceedings ArticleDOI
17 Jun 1997
TL;DR: An algorithm for tracking non-rigid, moving objects in a sequence of colored images, which were recorded by a non-stationary camera is presented, which remarkably simplifies the correspondence problem and also ensures a robust tracking behaviour.
Abstract: In this contribution we present an algorithm for tracking non-rigid, moving objects in a sequence of colored images, which were recorded by a non-stationary camera. The application background is vision-based driving assistance in the inner city. In an initial step, object parts are determined by a divisive clustering algorithm, which is applied to all pixels in the first image of the sequence. The feature space is defined by the color and position of a pixel. For each new image the clusters of the previous image are adapted iteratively by a parallel k-means clustering algorithm. Instead of tracking single points, edges, or areas over a sequence of images, only the centroids of the clusters are tracked. The proposed method remarkably simplifies the correspondence problem and also ensures a robust tracking behaviour.
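
A minimal NumPy sketch of the tracking scheme described above (a generic reconstruction, not the authors' system): pixels are embedded in a color-plus-position feature space, clustered once, and for each new frame the k-means iteration is warm-started from the previous frame's centroids, so only the centroids need to be tracked.

```python
import numpy as np

def pixel_features(frame, pos_weight=0.5):
    """Stack (r, g, b, x, y) per pixel; position is weighted against color."""
    h, w, _ = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.column_stack([frame.reshape(-1, 3),
                            pos_weight * xs.ravel()[:, None],
                            pos_weight * ys.ravel()[:, None]]).astype(float)

def kmeans_step(feats, centroids, iters=5):
    """A few k-means iterations, warm-started from the previous centroids."""
    for _ in range(iters):
        d = ((feats[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(assign == k):
                centroids[k] = feats[assign == k].mean(axis=0)
    return centroids

# toy usage: a bright square moving over a dark background across frames
rng = np.random.default_rng(0)
def make_frame(offset):
    f = rng.normal(40, 5, (60, 60, 3))
    f[20:30, offset:offset + 10] = rng.normal(200, 5, (10, 10, 3))
    return f

feats0 = pixel_features(make_frame(5))
centroids = np.stack([feats0[feats0[:, 0].argmin()],   # a dark background pixel
                      feats0[feats0[:, 0].argmax()]])  # a bright pixel inside the square
centroids = kmeans_step(feats0, centroids, iters=10)
for offset in range(10, 30, 5):                        # subsequent frames
    centroids = kmeans_step(pixel_features(make_frame(offset)), centroids)
    print(centroids[1, 3:] / 0.5)    # the bright cluster's (x, y) centroid follows the square
```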