
Showing papers on "Feature (machine learning)" published in 2000


Journal ArticleDOI
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations


Journal ArticleDOI
TL;DR: The primary goal of pattern recognition is supervised or unsupervised classification; among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been the most intensively studied and used in practice.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. ...

4,307 citations


Journal ArticleDOI
TL;DR: A textbook introduction to statistical pattern recognition, covering density estimation, linear and nonlinear discriminant analysis (both neural-network and statistical methods), classification trees, feature selection and extraction, and clustering.
Abstract: Introduction to statistical pattern recognition * Estimation * Density estimation * Linear discriminant analysis * Nonlinear discriminant analysis - neural networks * Nonlinear discriminant analysis - statistical methods * Classification trees * Feature selection and extraction * Clustering * Additional topics * Measures of dissimilarity * Parameter estimation * Linear algebra * Data * Probability theory.

2,082 citations


Journal ArticleDOI
01 Nov 2000
TL;DR: The issues of posterior probability estimation, the link between neural and conventional classifiers, learning and generalization tradeoff in classification, the feature variable selection, as well as the effect of misclassification costs are examined.
Abstract: Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, learning and generalization tradeoff in classification, the feature variable selection, as well as the effect of misclassification costs are examined. Our purpose is to provide a synthesis of the published research in this area and stimulate further research interests and efforts in the identified topics.

1,737 citations


Proceedings Article
Mark Hall1
29 Jun 2000
TL;DR: In this article, a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems is described; it often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees.
Abstract: Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.

1,511 citations
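The merit heuristic at the heart of such correlation-based filters can be sketched in a few lines (a minimal illustration of the standard CFS-style score; the function name is ours, and plain correlation magnitudes stand in for whatever feature-goodness measure the algorithm actually uses):

```python
import math

def cfs_merit(feat_class_corrs, feat_feat_corrs):
    """Correlation-based merit of a feature subset of size k:
    merit = k * avg|feature-class corr| / sqrt(k + k*(k-1)*avg|feature-feature corr|).
    High merit rewards features correlated with the class but not with each other."""
    k = len(feat_class_corrs)
    r_cf = sum(abs(r) for r in feat_class_corrs) / k
    r_ff = (sum(abs(r) for r in feat_feat_corrs) / len(feat_feat_corrs)
            if feat_feat_corrs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# A mutually redundant subset scores lower than an independent subset
# with the same class correlations, which is the filter's whole point.
```

A filter built on this score would search over candidate subsets (e.g. best-first search), keeping the highest-merit subset found.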


Proceedings ArticleDOI
05 Jun 2000
TL;DR: A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Abstract: Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling. By training the network to generate the subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, we achieve relative error rate reductions of 35% or more on the multicondition Aurora noisy continuous digits task.

803 citations
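The tandem recipe — posteriors from a discriminatively trained network, post-processed into decorrelated features for a Gaussian-mixture back end — can be sketched as below (a hedged illustration: log compression followed by PCA decorrelation is the generic tandem treatment, not necessarily this paper's exact transformation):

```python
import numpy as np

def tandem_features(posteriors, n_keep=None, eps=1e-10):
    """Turn per-frame subword posterior estimates (frames x classes) into
    'tandem' features for a Gaussian-mixture system: log-compress to undo
    the softmax-like compression, mean-normalize, then rotate with PCA so
    the resulting dimensions are decorrelated."""
    logp = np.log(posteriors + eps)
    logp -= logp.mean(axis=0, keepdims=True)
    cov = np.cov(logp, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # strongest components first
    feats = logp @ eigvecs[:, order]
    return feats[:, :n_keep] if n_keep else feats
```

The decorrelation matters because the back-end GMMs typically use diagonal covariances, which model decorrelated inputs far better.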


Journal ArticleDOI
TL;DR: A speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments and is demonstrated on a large multispeaker database of continuously spoken digits.
Abstract: This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: a visual module; an acoustic module; and a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This task is performed with an appearance-based lip model that is learned from example images. Visual speech features are represented by contour information of the lips and grey-level information of the mouth area. The acoustic module extracts noise-robust features from the audio signal. Finally the sensor fusion module is responsible for the joint temporal modeling of the acoustic and visual feature streams and is realized using multistream hidden Markov models (HMMs). The multistream method allows the definition of different temporal topologies and levels of stream integration and hence enables the modeling of temporal dependencies more accurately than traditional approaches. We present two different methods to learn the asynchrony between the two modalities and how to incorporate them in the multistream models. The superior performance for the proposed system is demonstrated on a large multispeaker database of continuously spoken digits. On a recognition task at 15 dB acoustic signal-to-noise ratio (SNR), acoustic perceptual linear prediction (PLP) features lead to 56% error rate, noise robust RASTA-PLP (relative spectra) acoustic features to 7.2% error rate and combined noise robust acoustic features and visual features to 2.5% error rate.

620 citations


Book
01 Jan 2000
TL;DR: An edited volume on biometric recognition, with chapters on fingerprint feature processing, minutiae extraction, and ridge filtering, and on neural-network approaches to face recognition and facial expression analysis.
Abstract: Introduction to Fingerprint Recognition, U. Halici, L.C. Jain, and A. Erol Fingerprint Feature Processing Techniques and Poroscopy, A.R. Roddy and J.D. Stosz Fingerprint Sub-Classification: A Neural Network Approach, G.A. Drets and H.G. Leljecstroem A Gabor Filter-Based Method for Fingerprint Identification, Y. Hamamoto Minutiae Extraction and Filtering from Gray-Scale Images, D. Maio and D. Maltoni Feature Selective Filtering for Ridge Extraction, A. Erol, U. Halici, and G. Ongun Introduction to Face Recognition, A.J. Howell Neural Networks for Face Recognition, A.S. Pandya and R.R. Szabo Face Unit Radial Basis Function Networks, A.J. Howell Face Recognition from Correspondence Maps, R.P. Wurtz Face Recognition by Elastic Bunch Graph Matching, L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg Facial Expression Synthesis Using Radial Basis Function Networks, I. King and X.Q. Li Recognition of Facial Expressions and Its Application to Human Computer Interaction, T. Onisawa and S. Kitazake

332 citations


Proceedings Article
30 Jun 2000
TL;DR: This paper shows how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables, and uses this result as the basis for an algorithm that approximates the Bayesian posterior of a feature.
Abstract: In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables. This allows us to compute, for a given ordering, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov chain Monte Carlo (MCMC) method, but over orderings rather than over network structures. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior "landscape". We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.

280 citations
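The ordering-space sampler can be sketched with a toy Metropolis chain (an illustration only: `log_score` here is a caller-supplied stand-in for the closed-form log marginal likelihood of the data given an ordering that the paper derives):

```python
import math
import random

def mcmc_over_orderings(variables, log_score, n_steps=2000, seed=0):
    """Metropolis sampling over orderings of variables (not over network
    structures): propose swapping two positions, accept with probability
    min(1, exp(new_score - current_score))."""
    rng = random.Random(seed)
    order = list(variables)
    cur = log_score(order)
    samples = []
    for _ in range(n_steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a transposition
        new = log_score(order)
        if new >= cur or rng.random() < math.exp(new - cur):
            cur = new                                # accept the proposal
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        samples.append(tuple(order))
    return samples

# The posterior of a structural feature (e.g. "a precedes b") is then
# estimated as its frequency among the sampled orderings.
```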


Journal ArticleDOI
TL;DR: FSS-EBNA is an evolutionary, population-based, randomized feature subset selection algorithm that can be executed when domain knowledge is not available; it uses Bayesian networks to factorize the probability distribution of the best solutions in each generation of the search.

238 citations


Journal ArticleDOI
Stan Z. Li1
TL;DR: The results show that the NFL-based method produces consistently better results than the NN-based and other methods.
Abstract: A method is presented for content-based audio classification and retrieval. It is based on a new pattern classification method called the nearest feature line (NFL). In the NFL, information provided by multiple prototypes per class is explored. This contrasts to the nearest neighbor (NN) classification in which the query is compared to each prototype individually. Regarding audio representation, perceptual and cepstral features and their combinations are considered. Extensive experiments are performed to compare various classification methods and feature sets. The results show that the NFL-based method produces consistently better results than the NN-based and other methods. A system resulting from this work has achieved the error rate of 9.78%, as compared to that of 18.34% of a compelling existing system, as tested on a common audio database.
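The geometry behind the nearest feature line is easy to state in code (a small sketch with names of our own choosing): the query is projected onto the line through each pair of same-class prototypes, and the class owning the closest such line wins.

```python
import numpy as np

def feature_line_distance(x, p1, p2):
    """Distance from query x to the feature line through prototypes p1, p2."""
    d = p2 - p1
    t = np.dot(x - p1, d) / np.dot(d, d)   # projection parameter along the line
    proj = p1 + t * d
    return np.linalg.norm(x - proj)

def nfl_classify(x, prototypes_by_class):
    """Assign x the class whose best feature line lies closest.
    prototypes_by_class: dict mapping label -> list of prototype vectors."""
    best_label, best_dist = None, np.inf
    for label, protos in prototypes_by_class.items():
        for i in range(len(protos)):
            for j in range(i + 1, len(protos)):
                dist = feature_line_distance(x, protos[i], protos[j])
                if dist < best_dist:
                    best_label, best_dist = label, dist
    return best_label
```

The lines interpolate (and extrapolate) between prototypes, which is how multiple prototypes per class get pooled, in contrast to NN's one-at-a-time comparison.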

Journal ArticleDOI
TL;DR: This paper reports experiments on three phonological feature systems: the Sound Pattern of English (SPE) system, a multi-valued (MV) feature system which uses traditional phonetic categories such as manner, place, etc., and Government Phonology, which uses a set of structured primes.

Patent
26 Oct 2000
TL;DR: In this paper, an audio recognition peripheral system consisting of a feature extractor and a vector processor is presented; the extracted audio recognition features are transmitted to a programmable processor and processed in accordance with an audio recognition algorithm.
Abstract: The present invention includes a novel audio recognition peripheral system and method. The audio recognition peripheral system comprises an audio recognition peripheral and a programmable processor such as a microprocessor or microcontroller. In one embodiment, the audio recognition peripheral includes a feature extractor and a vector processor. The feature extractor receives an audio signal and extracts recognition features. The extracted audio recognition features are transmitted to the programmable processor and processed in accordance with an audio recognition algorithm. During execution of the audio recognition algorithm, the programmable processor signals the audio recognition peripheral to perform vector operations. Thus, computationally intensive recognition operations are advantageously offloaded to the peripheral.

Journal ArticleDOI
TL;DR: It is argued that a fruitful direction for future research may lie in weighing information about facial features together with localized image features in order to provide a better mechanism for feature selection.

Journal ArticleDOI
TL;DR: The developed feature extraction system takes as input a STEP file defining the geometry and topology of a part and generates as output a STEP file with form-feature information in AP224 format for form-feature-based process planning.

Journal ArticleDOI
TL;DR: Both Bayesian classifiers and neural networks are employed to test the efficiency of the proposed feature, and the achieved identification success using a long word exceeds 95%.

Patent
Kari Laurila1, Jilei Tian1
24 Oct 2000
TL;DR: In this paper, a method is presented for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of the extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.

Proceedings ArticleDOI
27 Sep 2000
TL;DR: A new integrated method is presented to recognize the emotional expressions of humans using both voices and facial expressions, with feature parameters from thermal images used in addition to visible images and trained by neural networks for recognition.
Abstract: A new integrated method is presented to recognize the emotional expressions of humans using both voices and facial expressions. For voices, we use such prosodic parameters as pitch signals, energy, and their derivatives, which are trained by a hidden Markov model for recognition. For facial expressions, we use feature parameters from thermal images in addition to visible images, which are trained by neural networks for recognition. The thermal images are observed by infrared rays, which are not influenced by lighting conditions. The total recognition rates show better performance than those obtained from each single-modality experiment. The results are compared with recognition by human questionnaire.

Proceedings Article
29 Jun 2000
TL;DR: The work of Principe et al. is extended to mutual information between continuous multidimensional variables and discrete-valued class labels, and Renyi’s quadratic entropy is used.
Abstract: We present feature transformations useful for exploratory data analysis or for pattern recognition. Transformations are learned from example data sets by maximizing the mutual information between transformed data and their class labels. We make use of Renyi’s quadratic entropy, and we extend the work of Principe et al. to mutual information between continuous multidimensional variables and discrete-valued class labels.
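A reason for choosing Renyi's quadratic entropy is that its Parzen-window estimate has a closed form as a double sum over sample pairs, sketched below (an illustrative estimator with a fixed Gaussian kernel width; the transformation learning in the paper then optimizes a mutual-information objective built from such terms):

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Parzen-window estimate of Renyi's quadratic entropy
    H2 = -log ∫ p(x)^2 dx.  With Gaussian kernels of width sigma the
    integral reduces exactly to a double sum of Gaussians of width
    sqrt(2)*sigma evaluated at pairwise sample differences."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    diffs = x[:, None, :] - x[None, :, :]
    sq = np.sum(diffs ** 2, axis=-1)
    var = 2.0 * sigma ** 2           # convolution of two kernels of width sigma
    kernel = np.exp(-sq / (2.0 * var)) / ((2.0 * np.pi * var) ** (d / 2))
    return -np.log(kernel.mean())
```

Because the estimate is a smooth function of the samples, it can be differentiated with respect to the parameters of a feature transformation, which is what makes it usable as a training criterion.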

Journal ArticleDOI
01 Jul 2000
TL;DR: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role, which is the underlying philosophy of the Devanagari document recognition system described in this work.
Abstract: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.

Proceedings Article
29 Apr 2000
TL;DR: It is argued strongly that an independent morphological dictionary is preferable to more annotated data under the severe data-sparseness conditions of inflectionally rich languages.
Abstract: Part-of-speech tagging for English seems to have reached the human level of error, but full morphological tagging for inflectionally rich languages, such as Romanian, Czech, or Hungarian, is still an open problem, and the results are far from satisfactory. This paper presents results obtained by using a universalized exponential feature-based model for five such languages. It focuses on the data sparseness issue, which is especially severe for such languages (all the more so because there are no extensive annotated data for those languages). In conclusion, we argue strongly that the use of an independent morphological dictionary is preferable to more annotated data under such circumstances.

Journal ArticleDOI
TL;DR: A supervised algorithm for word sense disambiguation based on hierarchies of decision lists that supports a useful degree of conditional branching while minimizing the training data fragmentation typical of decision trees.
Abstract: This paper describes a supervised algorithm for word sense disambiguation based on hierarchies of decision lists. This algorithm supports a useful degree of conditional branching while minimizing the training data fragmentation typical of decision trees. Classifications are based on a rich set of collocational, morphological and syntactic contextual features, extracted automatically from training data and weighted sensitive to the nature of the feature and feature class. The algorithm is evaluated comprehensively in the SENSEVAL framework, achieving the top performance of all participating supervised systems on the 36 test words where training data is available.
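A single (non-hierarchical) decision list of the kind being stacked here can be sketched as follows (an illustrative reconstruction with smoothed log-likelihood-ratio scoring; the rich feature extraction and the hierarchical branching of the actual system are omitted):

```python
import math
from collections import defaultdict

def train_decision_list(examples, alpha=0.1):
    """Build a flat decision list: each (feature, sense) rule is scored by a
    smoothed log-likelihood ratio, and rules are sorted strongest-first.
    examples: iterable of (feature_set, sense) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    senses = set()
    for features, sense in examples:
        senses.add(sense)
        for f in set(features):
            counts[f][sense] += 1
    rules = []
    for f, by_sense in counts.items():
        for sense in senses:
            p = by_sense.get(sense, 0.0) + alpha
            q = sum(v for s, v in by_sense.items() if s != sense) + alpha
            llr = math.log(p / q)
            if llr > 0:                      # keep only rules that favor a sense
                rules.append((llr, f, sense))
    rules.sort(reverse=True)
    return rules

def classify(rules, features, default=None):
    """The first (strongest) matching rule decides the sense."""
    feats = set(features)
    for llr, f, sense in rules:
        if f in feats:
            return sense
    return default
```

Because only the single strongest matching rule fires, the classifier avoids the data fragmentation of a full decision tree while still conditioning on context.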

Journal ArticleDOI
TL;DR: This work describes how to model the appearance of a 3-D object using multiple views, learn such a model from training images, and use the model for object recognition, and demonstrates that OLIVER is capable of learning to recognize complex objects in cluttered images, while acquiring models that represent those objects using relatively few views.
Abstract: We describe how to model the appearance of a 3-D object using multiple views, learn such a model from training images, and use the model for object recognition. The model uses probability distributions to describe the range of possible variation in the object's appearance. These distributions are organized on two levels. Large variations are handled by partitioning training images into clusters corresponding to distinctly different views of the object. Within each cluster, smaller variations are represented by distributions characterizing uncertainty in the presence, position, and measurements of various discrete features of appearance. Many types of features are used, ranging in abstraction from edge segments to perceptual groupings and regions. A matching procedure uses the feature uncertainty information to guide the search for a match between model and image. Hypothesized feature pairings are used to estimate a viewpoint transformation taking account of feature uncertainty. These methods have been implemented in an object recognition system, OLIVER. Experiments show that OLIVER is capable of learning to recognize complex objects in cluttered images, while acquiring models that represent those objects using relatively few views.

Proceedings ArticleDOI
30 Aug 2000
TL;DR: SVM architectures for multi-class classification problems are discussed, in particular binary trees of SVMs are considered to solve the multi-class problem.
Abstract: Support vector machines (SVM) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. In this paper SVM architectures for multi-class classification problems are discussed, in particular we consider binary trees of SVMs to solve the multi-class problem. Numerical results for different classifiers on a benchmark data set of handwritten digits are presented.
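The tree-of-binary-classifiers architecture can be sketched independently of the SVM itself (a hedged illustration: a nearest-centroid rule stands in for a trained SVM at each internal node so the sketch stays self-contained; the routing logic is what the tree contributes):

```python
import numpy as np

class BinaryTreeClassifier:
    """Multi-class classification with a binary tree of two-class decision
    nodes, as in tree-of-SVMs schemes: each internal node separates two
    groups of classes, and a query is routed down to a single-class leaf."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.root = self._build(X, y, sorted(set(y.tolist())))
        return self

    def _build(self, X, y, labels):
        if len(labels) == 1:
            return labels[0]                       # leaf: a single class
        left, right = labels[: len(labels) // 2], labels[len(labels) // 2:]
        node = {
            # Centroids of the two class groups stand in for an SVM here.
            "cl": X[np.isin(y, left)].mean(axis=0),
            "cr": X[np.isin(y, right)].mean(axis=0),
            "left": self._build(X, y, left),
            "right": self._build(X, y, right),
        }
        return node

    def predict_one(self, x):
        node = self.root
        while isinstance(node, dict):
            go_left = (np.linalg.norm(x - node["cl"])
                       <= np.linalg.norm(x - node["cr"]))
            node = node["left"] if go_left else node["right"]
        return node
```

For K classes, a query passes through only about log2(K) binary decisions instead of the K (one-vs-rest) or K(K-1)/2 (one-vs-one) evaluations of the flat schemes.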

Journal ArticleDOI
TL;DR: A large database of hand-labeled fluent speech is used to compute the mutual information between a phonetic classification variable and one spectral feature variable in the time–frequency plane, and the joint mutual information (JMI) between the phonetic classification variable and two feature variables in the time–frequency plane.

Dissertation
01 Jan 2000
TL;DR: A Bayesian architecture is shown to generalize a significant number of previous recognition approaches, solving some of the most challenging problems faced by these: joint modeling of color and texture, objective guidelines for controlling the trade-off between feature transformation and feature representation, and unified support for local and global queries without requiring image segmentation.
Abstract: This thesis presents a unified solution to visual recognition and learning in the context of visual information retrieval. Realizing that the design of an effective recognition architecture requires careful consideration of the interplay between feature selection, feature representation, and similarity function, we start by searching for a performance criteria that can simultaneously guide the design of all three components. A natural solution is to formulate visual recognition as a decision theoretical problem, where the goal is to minimize the probability of retrieval error. This leads to a Bayesian architecture that is shown to generalize a significant number of previous recognition approaches, solving some of the most challenging problems faced by these: joint modeling of color and texture, objective guidelines for controlling the trade-off between feature transformation and feature representation, and unified support for local and global queries without requiring image segmentation. The new architecture is shown to perform well on color, texture, and generic image databases, providing a good trade-off between retrieval accuracy, invariance, perceptual relevance of similarity judgments, and complexity. Because all that is needed to perform optimal Bayesian decisions is the ability to evaluate beliefs on the different hypothesis under consideration, a Bayesian architecture is not restricted to visual recognition. On the contrary, it establishes a universal recognition language (the language of probabilities) that provides a computational basis for the integration of information from multiple content sources and modalities. In result, it becomes possible to build retrieval systems that can simultaneously account for text, audio, video, or any other content modalities. Since the ability to learn follows from the ability to integrate information over time, this language is also conducive to the design of learning algorithms. 
We show that learning is, indeed, an important asset for visual information retrieval by designing both short and long-term learning mechanisms. Over short time scales (within a retrieval session), learning is shown to assure faster convergence to the desired target images. Over long time scales (between retrieval sessions), it allows the retrieval system to tailor itself to the preferences of particular users. In both cases, all the necessary computations are carried out through Bayesian belief propagation algorithms that, although optimal in a decision-theoretic sense, are extremely simple, intuitive, and easy to implement. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

Journal ArticleDOI
TL;DR: A new technique for the recognition of Arabic text using the C4.5 machine learning system that can generalize over the large degree of variation between different fonts and writing styles; recognition rules can be constructed from examples.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: It is shown that after a non-linear transformation, a number of features can be effectively used in a HMM-based recognition system.
Abstract: We evaluate the performance of several feature sets on the Aurora task as defined by ETSI. We show that after a non-linear transformation, a number of features can be effectively used in an HMM-based recognition system. The non-linear transformation is computed using a neural network which is discriminatively trained on the phonetically labeled (forcibly aligned) training data. A combination of the non-linearly transformed PLP (perceptual linear prediction coefficients), MSG (modulation filtered spectrogram) and TRAP (temporal pattern) features yields a 63% improvement in error rate as compared to baseline mel-frequency cepstral coefficient features. The use of the non-linearly transformed RASTA-like features, with system parameters scaled down to take into account the ETSI-imposed memory and latency constraints, still yields a 40% improvement in error rate.

01 Sep 2000
TL;DR: In this article, a method for content-based audio classification and retrieval is presented based on a new pattern classification method called the nearest feature line (NFL). In the NFL, information provided by multiple prototypes per class is explored. This contrasts to the nearest neighbor (NN) classification in which the query is compared to each prototype individually.
Abstract: A method is presented for content-based audio classification and retrieval. It is based on a new pattern classification method called the nearest feature line (NFL). In the NFL, information provided by multiple prototypes per class is explored. This contrasts to the nearest neighbor (NN) classification in which the query is compared to each prototype individually. Regarding audio representation, perceptual and cepstral features and their combinations are considered. Extensive experiments are performed to compare various classification methods and feature sets. The results show that the NFL-based method produces consistently better results than the NN-based and other methods. A system resulting from this work has achieved an error rate of 9.78%, as compared to the 18.34% of a compelling existing system, as tested on a common audio database.

Journal ArticleDOI
TL;DR: In this hybrid system of NN and MBR, the feature weight set, which is calculated from the trained neural network, plays the core role in connecting both learning strategies, and the explanation for prediction can be given by obtaining and presenting the most similar examples from the case base.
Abstract: We propose a hybrid prediction system of neural network and memory-based learning. Neural network (NN) and memory-based reasoning (MBR) are frequently applied to data mining with various objectives. They have common advantages over other learning strategies. NN and MBR can be directly applied to classification and regression without additional transformation mechanisms. They also have strength in learning the dynamic behavior of the system over a period of time. Unfortunately, they have shortcomings when applied to data mining tasks. Though the neural network is considered as one of the most powerful and universal predictors, the knowledge representation of NN is unreadable to humans, and this "black box" property restricts the application of NN to data mining problems, which require proper explanations for the prediction. On the other hand, MBR suffers from the feature-weighting problem. When MBR measures the distance between cases, some input features should be treated as more important than other features. Feature weighting should be executed prior to prediction in order to provide the information on the feature importance. In our hybrid system of NN and MBR, the feature weight set, which is calculated from the trained neural network, plays the core role in connecting both learning strategies, and the explanation for prediction can be given by obtaining and presenting the most similar examples from the case base. Moreover, the proposed system has advantages in the typical data mining problems such as scalability to large datasets, high dimensions, and adaptability to dynamic situations. Experimental results show that the hybrid system has a high potential in solving data mining problems.
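The memory-based half of such a hybrid can be sketched as weighted nearest-neighbor retrieval (an illustration; the weight vector is assumed to come from the trained neural network, e.g. via a sensitivity analysis, which is not shown here):

```python
import numpy as np

def weighted_knn_predict(query, cases, labels, weights, k=3):
    """Memory-based reasoning with per-feature weights: distance is a
    weighted Euclidean metric, prediction is a majority vote over the k
    most similar stored cases, and those cases double as the explanation
    that a bare neural network cannot provide."""
    cases = np.asarray(cases, float)
    diffs = (cases - np.asarray(query, float)) * np.sqrt(np.asarray(weights, float))
    dists = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for idx in nearest:
        votes[labels[idx]] = votes.get(labels[idx], 0) + 1
    prediction = max(votes, key=votes.get)
    return prediction, nearest          # nearest case indices explain the call
```

Changing the weight vector changes which stored cases count as "similar", which is exactly the role the trained network's feature weights play in the hybrid.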