Showing papers on "Feature vector" published in 1996


Proceedings Article
03 Dec 1996
TL;DR: This work compares support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space, and expects that SVR will have advantages in high-dimensional spaces because SVR optimization does not depend on the dimensionality of the input space.
Abstract: A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high-dimensional spaces because SVR optimization does not depend on the dimensionality of the input space.
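
To make the regression setup concrete, here is a minimal sketch of ε-insensitive support vector regression using scikit-learn's SVR; the kernel, C, and epsilon values are illustrative assumptions, not the settings used in the paper.

```python
# Sketch of epsilon-insensitive support vector regression (SVR).
# Hyperparameters are illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))              # 1-D inputs
y = np.sinc(X).ravel() + rng.normal(0, 0.1, 200)   # noisy target

# RBF kernel: the optimization cost depends on the number of samples,
# not on the input dimensionality -- the property the abstract highlights.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)
print("support vectors:", model.support_vectors_.shape[0])
print("prediction at 0:", model.predict([[0.0]]))
```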

4,009 citations


Book ChapterDOI
15 Apr 1996
TL;DR: A face recognition algorithm which is insensitive to gross variation in lighting direction and facial expression is developed and the proposed “Fisherface” method has error rates that are significantly lower than those of the Eigenface technique when tested on the same database.
Abstract: We develop a face recognition algorithm which is insensitive to gross variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face under varying illumination direction lie in a 3-D linear subspace of the high dimensional feature space — if the face is a Lambertian surface without self-shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's Linear Discriminant and produces well separated classes in a low-dimensional subspace even under severe variation in lighting and facial expressions. The Eigenface technique, another method based on linearly projecting the image space to a low dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed “Fisherface” method has error rates that are significantly lower than those of the Eigenface technique when tested on the same database.
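
A minimal sketch of the Fisherface pipeline described above, assuming row-vectorized images: a PCA step first reduces to at most N − c dimensions so the within-class scatter matrix is nonsingular, then Fisher's Linear Discriminant projects to a low-dimensional subspace. Data shapes and component counts are toy assumptions.

```python
# Minimal sketch of the Fisherface pipeline: PCA to avoid a singular
# within-class scatter matrix, then Fisher's Linear Discriminant.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fisherfaces(images, labels, n_classes):
    """images: (n_samples, n_pixels) row-vectorized face images."""
    n_samples = images.shape[0]
    # Reduce to at most n_samples - n_classes dimensions so the
    # within-class scatter matrix used by LDA is nonsingular.
    pca = PCA(n_components=min(n_samples - n_classes, images.shape[1]))
    reduced = pca.fit_transform(images)
    lda = LinearDiscriminantAnalysis(n_components=n_classes - 1)
    projected = lda.fit_transform(reduced, labels)
    return pca, lda, projected

# Toy data: 3 "subjects", 10 tiny 8x8 "images" each.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 10)
X = rng.normal(size=(30, 64)) + labels[:, None]
pca, lda, Z = fisherfaces(X, labels, n_classes=3)
print(Z.shape)  # (30, 2): classes separated in a 2-D subspace
```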

2,428 citations


Proceedings ArticleDOI
07 May 1996
TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.
Abstract: In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous attempts at solving analytically the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously-recorded or "stereo" recordings of clean and degraded speech. In this work we introduce the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors, or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case HMM statistics are modified using the VTS formulation. Both approaches use only the actual speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database. Artificial white Gaussian noise is added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.
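
As a hedged reconstruction of the usual first-order VTS formulation (the notation x, n, h for clean speech, noise, and channel in the log-spectral domain is an assumption consistent with standard treatments, not quoted from the paper), the expansion can be written as:

```latex
% Log-spectral environment model: observed speech y from clean speech x,
% additive noise n, and linear channel h (all log filterbank vectors):
\[
  y = x + h + g(x,n,h), \qquad g(x,n,h) = \log\!\bigl(1 + e^{\,n-x-h}\bigr)
\]
% First-order vector Taylor series around the means (\mu_x, \mu_n), with
% G = \partial y / \partial x evaluated at the expansion point, gives the
% compensated Gaussian statistics in closed form:
\[
  \mu_y \approx \mu_x + h + g(\mu_x,\mu_n,h), \qquad
  \Sigma_y \approx G\,\Sigma_x\,G^{\top} + (I - G)\,\Sigma_n\,(I - G)^{\top}
\]
```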

480 citations


Proceedings Article
03 Dec 1996
TL;DR: This work combines two techniques on a pattern recognition problem: the "virtual support vector" method, which improves generalization performance by incorporating known invariances of the problem, and the "reduced set" method, which improves test-phase speed and is applicable to any support vector machine.
Abstract: Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion for ill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in test phase, of SVMs are of increasing interest. In this paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the "virtual support vector" method) does so by incorporating known invariances of the problem. This method reduces the error rate on 10,000 NIST test digit images from 1.4% to 1.0%. The method for improving the speed (the "reduced set" method) does so by approximating the support vector decision surface. We apply this method to achieve a factor of fifty speedup in test phase over the virtual support vector machine. The combined approach yields a machine which is both 22 times faster than the original machine, and which has better generalization performance, achieving 1.1% error. The virtual support vector method is applicable to any SVM problem with known invariances. The reduced set method is applicable to any support vector machine.
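
A hedged sketch of the virtual support vector idea on a small digit set: train an SVM, keep its support vectors, add invariance-transformed (one-pixel shifted) copies of them, and retrain. Dataset, kernel, and shift amounts are assumptions; the reduced set approximation is not shown.

```python
# Sketch of the "virtual support vector" method: retrain an SVM on its
# support vectors plus invariance-transformed copies of them.
import numpy as np
from scipy.ndimage import shift
from sklearn import datasets, svm

digits = datasets.load_digits()                  # 8x8 digit images
X, y = digits.data / 16.0, digits.target

base = svm.SVC(kernel="poly", degree=3).fit(X, y)
sv, sv_y = base.support_vectors_, y[base.support_]

# Virtual examples: shift each support vector image by one pixel in four
# directions (a known invariance of digit images).
virtuals, virtual_y = [], []
for dx, dy in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
    for img, label in zip(sv, sv_y):
        shifted = shift(img.reshape(8, 8), (dy, dx), order=0, cval=0.0)
        virtuals.append(shifted.ravel())
        virtual_y.append(label)

X2 = np.vstack([sv, np.asarray(virtuals)])
y2 = np.concatenate([sv_y, virtual_y])
vsv = svm.SVC(kernel="poly", degree=3).fit(X2, y2)
print("original SVs:", len(sv), "-> retrained on", len(X2), "examples")
```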

413 citations


01 Jan 1996
TL;DR: It is argued that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures, and it is shown how the use of vector Taylor series in combination with a Maximum Likelihood formulation produces dramatic improvements in recognition accuracy.
Abstract: The accuracy of speech recognition systems degrades severely when the systems are operated in adverse acoustical environments. In recent years many approaches have been developed to address the problem of robust speech recognition, using feature-normalization algorithms, microphone arrays, representations based on human hearing, and other approaches. Nevertheless, to date the improvement in recognition accuracy afforded by such algorithms has been limited, in part because of inadequacies in the mathematical models used to characterize the acoustical degradation. This thesis begins with a study of the reasons why speech recognition systems degrade in noise, using Monte Carlo simulation techniques. From observations about these simulations we propose a simple and yet effective model of how the environment affects the parameters used to characterize speech recognition systems and their input. The proposed model of environment degradation is applied to two different approaches to environmental compensation, data-driven methods and model-based methods. Data-driven methods learn how a noisy environment affects the characteristics of incoming speech from direct comparisons of speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods use a mathematical model of the environment and attempt to use samples of the degraded speech to estimate the parameters of the model. In this thesis we argue that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures. The representation we develop for data-driven compensation approaches can be applied both to incoming feature vectors and to the stored statistical models used by speech recognition systems. These two approaches to data-driven compensation are referred to as RATZ and STAR, respectively. Finally, we introduce a new approach to model-based compensation with a solution based on vector Taylor series, referred to as the VTS algorithms. The proposed compensation algorithms are evaluated in a series of experiments measuring recognition accuracy for speech from the ARPA Wall Street Journal database that is corrupted by additive noise that is artificially injected at various signal-to-noise ratios (SNRs). For any particular SNR, the upper bound on recognition accuracy provided by practical compensation algorithms is the recognition accuracy of a system trained with noisy data at that SNR. The RATZ, VTS, and STAR algorithms achieve this bound at global SNRs as low as 15, 10, and 5 dB, respectively. The experimental results also demonstrate that the recognition error rate obtained using the algorithms proposed in this thesis is significantly better than what could be achieved using the previous state of the art. We include a small number of experimental results that indicate that the improvements in recognition accuracy provided by our approaches extend to degraded speech recorded in natural environments as well. We also introduce a generic formulation of the environment compensation problem and its solution via vector Taylor series. We show how the use of vector Taylor series in combination with a Maximum Likelihood formulation produces dramatic improvements in recognition accuracy.

337 citations


Proceedings Article
William W. Cohen
04 Aug 1996
TL;DR: It is argued that many decision tree and rule learning algorithms can be easily extended to set-valued features, and it is shown by example that many real-world learning problems can be efficiently and naturally represented with set-valued features.
Abstract: In most learning systems examples are represented as fixed-length "feature vectors", the components of which are either real numbers or nominal values. We propose an extension of the feature-vector representation that allows the value of a feature to be a set of strings; for instance, to represent a small white and black dog with the nominal features size and species and the set-valued feature color, one might use a feature vector with size=small, species=canis-familiaris and color={white, black}. Since we make no assumptions about the number of possible set elements, this extension of the traditional feature-vector representation is closely connected to Blum's "infinite attribute" representation. We argue that many decision tree and rule learning algorithms can be easily extended to set-valued features. We also show by example that many real-world learning problems can be efficiently and naturally represented with set-valued features; in particular, text categorization problems and problems that arise in propositionalizing first-order representations lend themselves to set-valued features.
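
A minimal sketch of the membership test a decision tree or rule learner can apply to a set-valued feature, using the paper's dog example; the record layout and helper names are hypothetical.

```python
# Sketch of testing set-valued features: a binary split asks whether a
# given string is an element of the feature's value set.
examples = [
    {"size": "small", "species": "canis-familiaris",
     "color": {"white", "black"}, "label": "dog"},
    {"size": "small", "species": "felis-catus",
     "color": {"black"}, "label": "cat"},
]

def split_on_membership(examples, feature, element):
    """Partition examples by the test `element in example[feature]`."""
    true_branch = [e for e in examples if element in e[feature]]
    false_branch = [e for e in examples if element not in e[feature]]
    return true_branch, false_branch

with_white, without_white = split_on_membership(examples, "color", "white")
print(len(with_white), len(without_white))  # 1 1
```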

281 citations


Dissertation
01 Jan 1996
TL;DR: This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries, and proposes an active learning formulation for function approximation, and shows that the active example selection strategy learns its target with fewer data samples than random sampling.
Abstract: Object and pattern detection is a classical computer vision problem with many potential applications, ranging from automatic target recognition to image-based industrial inspection tasks in assembly lines. While there have been some successful object and pattern detection systems in the past, most such systems handle only specific rigid objects or patterns that can be accurately described by fixed geometric models or pictorial templates. This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. Some examples of such object and pattern classes include human faces, aerial views of structured terrain features like volcanoes, localized material defect signatures in industrial parts, certain tissue anomalies in medical images, and instances of a given digit or character, which may be written or printed in many different styles. The thesis consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. We also discuss some pertinent learning issues, including ideas on virtual example generation and example selection. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. We show that this is indeed the case by demonstrating the technique on two other pattern detection/recognition problems. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. Active learning is an area of research that investigates how a learner can intelligently select future training examples to get better approximation results with less data. We propose an active learning formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. Finally, we simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.

254 citations


Proceedings ArticleDOI
TL;DR: The VAMSplit R-tree provided better overall performance than all competing structures the authors tested in main memory and secondary memory applications, with modest improvements relative to optimized k-d tree variants.
Abstract: Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found are most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
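
A hedged reconstruction of the split heuristic underlying the VAM k-d tree: recursively split on the dimension of maximum variance at an approximate median. Leaf size, tie handling, and the exact median approximation are assumptions; the VAMSplit R-tree applies a similar split top-down when packing pages.

```python
# Sketch of a variance/approximate-median (VAM) split for a k-d tree.
import numpy as np

def vam_kd_tree(points, leaf_size=16):
    if len(points) <= leaf_size:
        return {"leaf": points}
    dim = int(np.argmax(points.var(axis=0)))   # max-variance dimension
    order = np.argsort(points[:, dim])
    mid = len(points) // 2                      # approximate median
    return {
        "dim": dim,
        "threshold": float(points[order[mid], dim]),
        "left": vam_kd_tree(points[order[:mid]], leaf_size),
        "right": vam_kd_tree(points[order[mid:]], leaf_size),
    }

tree = vam_kd_tree(np.random.default_rng(2).normal(size=(1000, 20)))
print(tree["dim"], tree["threshold"])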

227 citations


Proceedings ArticleDOI
14 Oct 1996
TL;DR: Ten different feature vectors are tested in a gesture recognition task which utilizes 3D data gathered in real-time from stereo video cameras, and HMMs for learning and recognition of gestures.
Abstract: Ten different feature vectors are tested in a gesture recognition task which utilizes 3D data gathered in real-time from stereo video cameras, and HMMs for learning and recognition of gestures. Results indicate velocity features are superior to positional features, and partial rotational invariance is sufficient for good performance.
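
As an illustration of the feature family the results favored, a velocity feature can be computed as the frame-to-frame difference of the tracked 3-D positions; the frame-difference form and absence of smoothing are assumptions.

```python
# Sketch of velocity features derived from tracked 3-D positions.
import numpy as np

def velocity_features(positions):
    """positions: (T, 3) array of 3-D positions over T frames.
    Returns (T-1, 3) frame-difference velocities, the kind of feature
    the experiments found superior to raw positions."""
    return np.diff(positions, axis=0)

track = np.cumsum(np.random.default_rng(3).normal(size=(50, 3)), axis=0)
print(velocity_features(track).shape)  # (49, 3)
```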

211 citations


Proceedings ArticleDOI
C. Podilchuk, Xiaoyu Zhang
07 May 1996
TL;DR: An automatic face recognition system which is VQ-based is described and the effects of feature selection, feature dimensionality and codebook size on recognition performance in the VQ framework are examined.
Abstract: Face recognition has many applications ranging from security access to video indexing by content. We describe an automatic face recognition system which is VQ-based and examine the effects of feature selection, feature dimensionality and codebook size on recognition performance in the VQ framework. In particular, we examine DCT-based feature vectors in such a system. DCT-based feature vectors have the additional appeal that the recognition can be performed directly on the bitstream of compressed images which are DCT-based. The system described consists of three parts: a preprocessing step to segment the face, the feature selection process and the classification. Recognition rates for a database of 500 images show promising results.
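
A minimal sketch of DCT-based feature extraction of the kind examined here: a 2-D DCT of the segmented face, keeping the low-frequency top-left coefficients as the feature vector; classification would then match this vector against VQ codebook entries. Block size and coefficient count are assumptions.

```python
# Sketch of DCT-based feature vectors for VQ face recognition.
import numpy as np
from scipy.fftpack import dct

def dct_features(face, k=8):
    """face: 2-D grayscale array. Returns k*k low-frequency DCT coeffs."""
    coeffs = dct(dct(face, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:k, :k].ravel()

face = np.random.default_rng(4).normal(size=(64, 64))
print(dct_features(face).shape)  # (64,)
```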

198 citations


Journal ArticleDOI
Seong-Whan Lee
TL;DR: A new scheme for off-line recognition of totally unconstrained handwritten numerals using a simple multilayer cluster neural network trained with the backpropagation algorithm is proposed, and it is shown that the use of genetic algorithms avoids the problem of finding local minima in training the multilayer cluster neural network with the gradient descent technique, and improves the recognition rates.
Abstract: In this paper, we propose a new scheme for off-line recognition of totally unconstrained handwritten numerals using a simple multilayer cluster neural network trained with the backpropagation algorithm, and show that the use of genetic algorithms avoids the problem of finding local minima in training the multilayer cluster neural network with the gradient descent technique, and improves the recognition rates. In the proposed scheme, Kirsch masks are adopted for extracting feature vectors and a three-layer cluster neural network with five independent subnetworks is developed for classifying similar numerals efficiently. In order to verify the performance of the proposed multilayer cluster neural network, experiments with the handwritten numeral database of Concordia University of Canada, that of the Electro-Technical Laboratory of Japan, and that of the Electronics and Telecommunications Research Institute of Korea were performed. For the case of determining the initial weights using a genetic algorithm, 97.10%, 99.12%, and 99.40% correct recognition rates were obtained, respectively.
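
A hedged sketch of Kirsch-mask feature extraction: the numeral image is convolved with the eight directional Kirsch masks and the maximum directional response is kept per pixel. The pooling into the final feature vector is an assumption; the paper's exact feature layout may differ.

```python
# Sketch of Kirsch-mask directional feature extraction.
import numpy as np
from scipy.signal import convolve2d

base = np.array([[5, 5, 5],
                 [-3, 0, -3],
                 [-3, -3, -3]])   # Kirsch "north" mask

def rotate_mask(m):
    """Rotate the 8 border entries of a 3x3 mask one step clockwise."""
    r = m.copy()
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    ring = [m[i] for i in idx]
    for (i, j), v in zip(idx, ring[-1:] + ring[:-1]):
        r[i, j] = v
    return r

masks = [base]
for _ in range(7):                # the 8 compass directions
    masks.append(rotate_mask(masks[-1]))

def kirsch_features(image):
    responses = [convolve2d(image, m, mode="same") for m in masks]
    return np.max(responses, axis=0).ravel()  # max directional response

img = np.random.default_rng(5).random((16, 16))
print(kirsch_features(img).shape)  # (256,)
```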

Proceedings ArticleDOI
03 Oct 1996
TL;DR: This paper examines a maximum a-posteriori decoding strategy for feature-based recognizers and develops a normalization criterion that is useful for a segment-based Viterbi or A* search.
Abstract: Most current speech recognizers use an observation space which is based on a temporal sequence of "frames" (e.g. Mel-cepstra). There is another class of recognizer which further processes these frames to produce a segment-based network, and represents each segment by fixed-dimensional "features". In such feature-based recognizers, the observation space takes the form of a temporal network of feature vectors, so that a single segmentation of an utterance uses a subset of all possible feature vectors. In this paper, we examine a maximum a-posteriori decoding strategy for feature-based recognizers and develop a normalization criterion that is useful for a segment-based Viterbi or A* search. We report experimental results for the task of phonetic recognition on the TIMIT corpus, where we achieved context-independent and context-dependent (using diphones) results on the core test set of 64.1% and 69.5% respectively.

Proceedings ArticleDOI
07 May 1996
TL;DR: Though experimental results are preliminary, performance improvements over the BBN modified Gaussian Bayes decision system have been obtained on the Switchboard corpus.
Abstract: A novel approach to speaker identification is presented. The technique, based on Vapnik's (1995) work with support vectors, is exciting for several reasons. The support vector method is a discriminative approach, modeling the boundaries between speakers' voices directly in some feature space rather than by the difficult intermediate step of estimating speaker densities. Most importantly, support vector discriminant classifiers are unique in that they separate training data while keeping discriminating power low, thereby reducing test errors. As a result, it is possible to build useful classifiers with many more parameters than training points. Furthermore, Vapnik's theory suggests which class of discriminating functions should be used given the amount of training data by being able to determine bounds on the expected number of test errors. Support vector classifiers are efficient to compute compared to other discriminant functions. Though experimental results are preliminary, performance improvements over the BBN modified Gaussian Bayes decision system have been obtained on the Switchboard corpus.

Patent
Rakesh Agrawal, W. Equitz, Christos Faloutsos, Myron D. Flickner, Arun N. Swami
28 Feb 1996
TL;DR: In this article, a high dimensional indexing method is disclosed which takes a set of objects that can be viewed as N-dimensional data vectors and builds an index which treats the objects like k-dimensional points.
Abstract: A high dimensional indexing method is disclosed which takes a set of objects that can be viewed as N-dimensional data vectors and builds an index which treats the objects like k-dimensional points. The method first defines and applies a set of feature extraction functions that admit some similarity measure for each of the stored objects in the database. The feature vector is then transformed in a manner such that the similarity measure is preserved and that the information of the feature vector v is concentrated in only a few coefficients. The entries of the feature vectors are truncated such that the entries which contribute little on the average to the information of the transformed vectors are removed. An index based on the truncated feature vectors is subsequently built using a point access method (PAM). A preliminary similarity search can then be conducted on the set of truncated transformed vectors using the previously created index to retrieve the qualifying records. A second search on the previously retrieved set of vectors is used to eliminate the false positives and to get the results of the desired similarity search.
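
A minimal sketch of the disclosed two-stage search, under the assumption of a DCT as the distance-preserving, energy-compacting transform: filter candidates with truncated-vector distances (which lower-bound true distances for an orthonormal transform, so no qualifying record is missed), then re-rank exactly to remove false positives. The point access method (PAM) index is replaced here by a linear scan for brevity.

```python
# Sketch of two-stage similarity search on truncated transformed vectors.
import numpy as np
from scipy.fftpack import dct

rng = np.random.default_rng(6)
db = rng.normal(size=(10_000, 64))            # stored feature vectors
db_t = dct(db, axis=1, norm="ortho")[:, :8]   # truncated transform

def search(query, radius, k_keep=8):
    q_t = dct(query, norm="ortho")[:k_keep]
    # Stage 1: cheap filter in the 8-D truncated space; the truncated
    # distance lower-bounds the true distance, so nothing is missed.
    candidates = np.where(np.linalg.norm(db_t - q_t, axis=1) <= radius)[0]
    # Stage 2: exact distances on the candidates remove false positives.
    exact = np.linalg.norm(db[candidates] - query, axis=1)
    return candidates[exact <= radius]

print(len(search(db[0] + 0.01 * rng.normal(size=64), radius=1.0)))
```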

Journal ArticleDOI
TL;DR: The article is devoted to the feature-based recognition of blurred images acquired by a linear shift-invariant imaging system against an image database and a set of symmetric blur invariants based on image moments is introduced.
Abstract: The article is devoted to the feature-based recognition of blurred images acquired by a linear shift-invariant imaging system against an image database. The proposed approach consists of describing images by features that are invariant with respect to blur and recognizing images in the feature space. The PSF identification and image restoration are not required. A set of symmetric blur invariants based on image moments is introduced. A numerical experiment is presented to illustrate the utilization of the invariants for blurred image recognition. Robustness of the features is also briefly discussed.

Proceedings ArticleDOI
07 May 1996
TL;DR: The paper shows that performance improvements in recognition accuracy can be obtained by including data derived from a speaker's lip images in the construction of composite feature vectors and a hidden Markov model structure which allows for asynchrony between the audio and visual components.
Abstract: There is a requirement in many human-machine interactions to provide accurate automatic speech recognition in the presence of high levels of interfering noise. The paper shows that improvements in recognition accuracy can be obtained by including data derived from a speaker's lip images. We describe the combination of the audio and visual data in the construction of composite feature vectors and a hidden Markov model structure which allows for asynchrony between the audio and visual components. These ideas are applied to a speaker-dependent recognition task involving a small vocabulary and subject to interfering noise. The recognition results obtained using composite vectors and cross-product models are compared with those based on an audio-only feature vector. The benefit of this approach is shown to be an increased performance over a very wide range of noise levels.

Patent
Ziad F. Elghazzawi
15 Mar 1996
TL;DR: In this article, a method and apparatus for classifying an input as belonging to one of a plurality of predefined classes, comprises: (a) developing a pluralityof feature values and forming a feature vector from the feature values which is representative of the input, applying the feature vector to a knowledge-base comprising predictivity curves for each class of said plurality of classes, and developing the plurality of predictivity values for each feature, each predictivity value being indicative of a likelihood of an input belonging to a respective one of each of said classes based on the feature value.
Abstract: A method and apparatus for classifying an input as belonging to one of a plurality of predefined classes, comprises: (a) developing a plurality of feature values and forming a feature vector from the feature values which is representative of the input, (b) applying the feature vector to a knowledge-base comprising predictivity curves for each class of said plurality of classes, and developing a plurality of predictivity values for each feature, each predictivity value being indicative of a likelihood of the input belonging to a respective one of each of said classes based on the feature value; (c) combining the predictivity values developed for each of the features for each class to generate a total predictivity value for each class; and (d) generating a determination of class based upon the total predictivity values generated by the prior step.

Journal ArticleDOI
TL;DR: Several speech features are considered as potential stress-sensitive relayers using a previously established stressed speech database (SUSAS) and a neural network-based classifier is formulated based on an extended delta-bar-delta learning rule.
Abstract: It is well known that the variability in speech production due to task-induced stress contributes significantly to loss in speech processing algorithm performance. If an algorithm could be formulated that detects the presence of stress in speech, then such knowledge could be used to monitor speaker state, improve the naturalness of speech coding algorithms, or increase the robustness of speech recognizers. The goal in this study is to consider several speech features as potential stress-sensitive relayers using a previously established stressed speech database (SUSAS). The following speech parameters are considered: mel, delta-mel, delta-delta-mel, auto-correlation-mel, and cross-correlation-mel cepstral parameters. Next, an algorithm for speaker-dependent stress classification is formulated for the 11 stress conditions: angry, clear, cond50, cond70, fast, Lombard, loud, normal, question, slow, and soft. It is suggested that additional feature variations beyond neutral conditions reflect the perturbation of vocal tract articulator movement under stressed conditions. Given a robust set of features, a neural network-based classifier is formulated based on an extended delta-bar-delta learning rule. The performance is considered for the following three test scenarios: monopartition (nontargeted) and tripartition (both nontargeted and targeted) input feature vectors.

Journal ArticleDOI
TL;DR: The development of fuzzy algorithms for learning vector quantization (FALVQ) are presented, which are derived by minimizing the weighted sum of the squared Euclidean distances between an input vector and the weight vectors of a competitive learning Vector quantization network, which represent the prototypes.
Abstract: This paper presents the development of fuzzy algorithms for learning vector quantization (FALVQ). These algorithms are derived by minimizing the weighted sum of the squared Euclidean distances between an input vector, which represents a feature vector, and the weight vectors of a competitive learning vector quantization (LVQ) network, which represent the prototypes. This formulation leads to competitive algorithms, which allow each input vector to attract all prototypes. The strength of attraction between each input and the prototypes is determined by a set of membership functions, which can be selected on the basis of specific criteria. A gradient-descent-based learning rule is derived for a general class of admissible membership functions which satisfy certain properties. The FALVQ 1, FALVQ 2, and FALVQ 3 families of algorithms are developed by selecting admissible membership functions with different properties. The proposed algorithms are tested and evaluated using the IRIS data set. The efficiency of the proposed algorithms is also illustrated by their use in codebook design required for image compression based on vector quantization.
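
A hedged sketch of a membership-weighted competitive update in the spirit of FALVQ, in which every prototype is attracted to each input with strength given by a membership function of the distances; the inverse-distance membership used below is an illustrative assumption, not one of the paper's admissible FALVQ families.

```python
# Sketch of a FALVQ-style update: all prototypes are attracted to each
# input, weighted by memberships derived from squared distances.
import numpy as np

def falvq_like_step(prototypes, x, lr=0.05, eps=1e-9):
    d = np.sum((prototypes - x) ** 2, axis=1)   # squared distances
    u = 1.0 / (d + eps)
    u /= u.sum()                                # memberships sum to 1
    # Each prototype moves toward x, weighted by its membership.
    return prototypes + lr * u[:, None] * (x - prototypes)

rng = np.random.default_rng(7)
protos = rng.normal(size=(4, 2))
for x in rng.normal(size=(500, 2)):
    protos = falvq_like_step(protos, x)
print(protos)
```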

Patent
Masaki Souma, Kenji Nagao
12 Dec 1996
TL;DR: In this paper, a feature extraction system for statistically analyzing a set of samples of feature vectors to calculate a feature being an index for a pattern identification, which is capable of identifying confusing data with a high robustness.
Abstract: A feature extraction system for statistically analyzing a set of samples of feature vectors to calculate a feature serving as an index for pattern identification, which is capable of identifying confusing data with high robustness. In this system, a storage section stores a feature vector inputted through an input section, and a neighborhood vector selection section selects a specific feature vector from the feature vectors existing in the storage section. The specific feature vector is a neighborhood vector close in distance to the feature vector stored in the storage section. Further, the system is equipped with a feature vector space production section for outputting a partial vector space. The partial vector space is made to maximize the local scattering property of the feature vectors when they are orthogonally projected to that space.

Patent
Hirohiko Sagawa, Masahiro Abe
09 Sep 1996
TL;DR: In this paper, a continuous sign language recognition apparatus and input apparatus which employ expressions of template patterns of sign-language word as well as the template pattern of sign language word to realize high speed and highly accurate signlanguage recognition.
Abstract: A continuous sign-language recognition apparatus and input apparatus which employ expressions of template patterns of sign-language word as well as the template patterns of sign-language word to realize high speed and highly accurate sign-language recognition. Each of component patterns constituting a template pattern of a sign-language word is expressed by a feature vector representing the pattern when the pattern is a static pattern, or by a time-series pattern of a feature vector when the pattern is a dynamic pattern. Also, using the template patterns of sign-language word, different matching methods are applied to the static and dynamic patterns for recognizing each component pattern, and the respective results are integrated on the basis of a temporal overlap of the respective component patterns to continuously recognize sign-language words.

PatentDOI
TL;DR: A system and method for noninvasively detecting coronary artery disease that utilizes a vasodilator drug to increase the signal-to-noise ratio of an acoustic signal that represents diastolic heart sounds of a patient.

Journal ArticleDOI
TL;DR: An extended version of Kohonen's LVQ algorithm, called Distinction Sensitive Learning Vector Quantization (DSLVQ), is introduced which overcomes a major problem of LVQ, the dependency on proper pre-processing methods for scaling and feature selection.

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed scheme for off-line recognition of large-set handwritten characters in the framework of stochastic models, the first-order hidden Markov models (HMMs), is very promising for the recognition of large-set handwritten characters with numerous variations.

Book ChapterDOI
15 Apr 1996
TL;DR: A novel framework, based on maximum likelihood, for training models to recognise simple spatial-motion events, such as those described by the verbs pick up, put down, push, pull, drop, and throw, and classifying novel observations into previously trained classes is presented.
Abstract: This paper presents a novel framework, based on maximum likelihood, for training models to recognise simple spatial-motion events, such as those described by the verbs pick up, put down, push, pull, drop, and throw, and classifying novel observations into previously trained classes. The model that we employ does not presuppose prior recognition or tracking of 3D object pose, shape, or identity. We describe our general framework for using maximum-likelihood techniques for visual event classification, the details of the generative model that we use to characterise observations as instances of event types, and the implemented computational techniques used to support training and classification for this generative model. We conclude by illustrating the operation of our implementation on a small example.

Proceedings ArticleDOI
12 Nov 1996
TL;DR: It is shown that for even moderately large databases (in fact, only 1856 texture images), these approaches do not scale well for exact retrieval, but as a browsing tool, these dimensionality reduction techniques hold much promise.
Abstract: The management of large image databases poses several interesting and challenging problems. These problems range from ingesting the data and extracting meta-data to the efficient storage and retrieval of the data. Of particular interest are the retrieval methods and user interactions with an image database during browsing. In image databases, the response to a given query is not an exact well-defined set; rather, the user poses a query and expects a set of responses that should contain many possible candidates from which the user chooses the answer set. We first present the browsing model in Alexandria, a digital library for maps and satellite images. Because the system is designed for content-based retrieval, the relevant information in an image is encoded in the form of a multi-dimensional feature vector. Various techniques have been previously proposed for the efficient retrieval of such vectors by reducing their dimensionality. We show that for even moderately large databases (in fact, only 1856 texture images), these approaches do not scale well for exact retrieval. However, as a browsing tool, these dimensionality reduction techniques hold much promise.

Journal Article
TL;DR: This paper formally distinguishes three types of features: primary, contextual, and irrelevant features, and formally defines what it means for one feature to be context-sensitive to another feature.
Abstract: A large body of research in machine learning is concerned with supervised learning from examples. The examples are typically represented as vectors in a multi-dimensional feature space (also known as attribute-value descriptions). A teacher partitions a set of training examples into a finite number of classes. The task of the learning algorithm is to induce a concept from the training examples. In this paper, we formally distinguish three types of features: primary, contextual, and irrelevant features. We also formally define what it means for one feature to be context-sensitive to another feature. Context-sensitive features complicate the task of the learner and potentially impair the learner's performance. Our formal definitions make it possible for a learner to automatically identify context-sensitive features. After context-sensitive features have been identified, there are several strategies that the learner can employ for managing the features; however, a discussion of these strategies is outside of the scope of this paper. The formal definitions presented here correct a flaw in previously proposed definitions. We discuss the relationship between our work and a formal definition of relevance.

Patent
Tsuhan Chen, Mehmet Reha Civanlar
28 Oct 1996
TL;DR: Audio and video data of the individual speaking at least one selected phrase are obtained; a feature vector is formed which incorporates both the audio features and the video features, and the individual is authenticated if the feature vector and the stored feature vector form a match within a prescribed threshold.
Abstract: A method and apparatus is provided for determining the authenticity of an individual. In accordance with the method, audio and video data of the individual speaking at least one selected phrase is obtained. Identifying audio features and video features are then extracted from the audio data and the video data, respectively. A feature vector is formed which incorporates both the audio features and the video features. The feature vector is compared to a stored feature vector of a validated user speaking the same selected phrase. The individual is authenticated if the feature vector and the stored feature vector form a match within a prescribed threshold.

Journal ArticleDOI
TL;DR: The proposed feature space is compared to those generated by tissue-parameter-weighted images, principal component images, and angle images, demonstrating its superiority for feature extraction and scene segmentation and its relationship with discriminant analysis is discussed.
Abstract: This paper presents development and application of a feature extraction method for magnetic resonance imaging (MRI), without explicit calculation of tissue parameters. A three-dimensional (3-D) feature space representation of the data is generated in which normal tissues are clustered around prespecified target positions and abnormalities are clustered elsewhere. This is accomplished by a linear minimum mean square error transformation of categorical data to target positions. From the 3-D histogram (cluster plot) of the transformed data, clusters are identified and regions of interest (ROI's) for normal and abnormal tissues are defined. These ROI's are used to estimate signature (prototype) vectors for each tissue type which in turn are used to segment the MRI scene. The proposed feature space is compared to those generated by tissue-parameter-weighted images, principal component images, and angle images, demonstrating its superiority for feature extraction and scene segmentation. Its relationship with discriminant analysis is discussed. The method and its performance are illustrated using a computer simulation and MRI images of an egg phantom and a human brain.
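
A minimal sketch of the core step under toy assumptions: an affine linear minimum mean square error transform, fit here by ordinary least squares, that maps labeled tissue feature vectors to prespecified 3-D target positions so that normal tissues cluster at those targets.

```python
# Sketch of a linear MMSE transform mapping multi-channel MRI feature
# vectors of labeled normal tissues to prespecified 3-D target positions.
import numpy as np

rng = np.random.default_rng(8)
n_per = 100
# Toy "tissue signatures" in a 4-channel MRI feature space.
means = rng.normal(size=(3, 4)) * 5
X = np.vstack([m + rng.normal(scale=0.3, size=(n_per, 4)) for m in means])

# Prespecified target positions: one 3-D corner per normal tissue type.
targets = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
T = np.repeat(targets, n_per, axis=0)

# Affine LMMSE transform: minimize ||[X 1] W - T||^2 over W.
Xa = np.hstack([X, np.ones((X.shape[0], 1))])
W, *_ = np.linalg.lstsq(Xa, T, rcond=None)
Y = Xa @ W   # transformed data clusters around the target positions
print(np.round(Y[:5], 2))
```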

Proceedings ArticleDOI
TL;DR: The authors present an algorithm that provides automated, fast signal recognition once implemented in a real-time hardware system, using wavelet preprocessing, feature extraction and compact signal representation, and simple but effective statistical pattern matching.
Abstract: We present, in this paper, a wavelet-based acoustic signal analysis to remotely recognize military vehicles using their sound intercepted by acoustic sensors. Since expedited signal recognition is imperative in many military and industrial situations, we developed an algorithm that provides automated, fast signal recognition once implemented in a real-time hardware system. This algorithm consists of wavelet preprocessing, feature extraction and compact signal representation, and a simple but effective statistical pattern matching. The current status of the algorithm does not require any training. The training is replaced by human selection of reference signals (e.g., squeak or engine exhaust sound) distinctive to each individual vehicle based on human perception. This allows a fast archiving of any new vehicle type in the database once the signal is collected. The wavelet preprocessing provides time-frequency multiresolution analysis using the discrete wavelet transform (DWT). Within each resolution level, feature vectors are generated from statistical parameters and the energy content of the wavelet coefficients. After applying our algorithm to the intercepted acoustic signals, the resultant feature vectors are compared with the reference vehicle feature vectors in the database using statistical pattern matching to determine the type of vehicle from which the signal originated. Certainly, statistical pattern matching can be replaced by an artificial neural network (ANN); however, the ANN would require training data sets and time to train the net. Unfortunately, this is not always possible in many real-world situations, especially collecting data sets from unfriendly ground vehicles to train the ANN. Our methodology using wavelet preprocessing and statistical pattern matching provides robust acoustic signal recognition. We also present an example of vehicle recognition using acoustic signals collected from two different military ground vehicles. In this paper, we will not present the mathematics involved in this research. Instead, the focus of this paper is on the application of various techniques used to achieve our goal of successful recognition.
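
A hedged sketch of the described pipeline: DWT decomposition, per-level statistical and energy features, and nearest-reference statistical matching. The wavelet family, decomposition depth, chosen statistics, and the plain Euclidean matcher are assumptions; requires the PyWavelets package (pywt).

```python
# Sketch of wavelet-based acoustic feature extraction and matching.
import numpy as np
import pywt

def wavelet_features(signal, wavelet="db4", level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for c in coeffs:  # one set of statistics per resolution level
        feats += [np.mean(c), np.std(c), np.sum(c ** 2)]  # stats + energy
    return np.array(feats)

def classify(signal, references):
    """references: dict vehicle_name -> reference feature vector."""
    f = wavelet_features(signal)
    return min(references, key=lambda k: np.linalg.norm(f - references[k]))

rng = np.random.default_rng(9)
refs = {"vehicle_a": wavelet_features(rng.normal(size=4096)),
        "vehicle_b": wavelet_features(2 + rng.normal(size=4096))}
print(classify(2 + rng.normal(size=4096), refs))  # likely "vehicle_b"
```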