
Showing papers on "Feature (machine learning)" published in 1997


Journal ArticleDOI
TL;DR: A new model of recognition memory is reported; it is placed within, and introduces, a more elaborate theory being developed to predict the phenomena of explicit and implicit, and episodic and generic, memory.
Abstract: A new model of recognition memory is reported. This model is placed within, and introduces, a more elaborate theory that is being developed to predict the phenomena of explicit and implicit, and episodic and generic, memory. The recognition model is applied to basic findings, including phenomena that pose problems for extant models: the list-strength effect (e.g., Ratcliff, Clark, & Shiffrin, 1990), the mirror effect (e.g., Glanzer & Adams, 1990), and the normal-ROC slope effect (e.g., Ratcliff, McKoon, & Tindall, 1994). The model assumes storage of separate episodic images for different words, each image consisting of a vector of feature values. Each image is an incomplete and error-prone copy of the studied vector. For the simplest case, it is possible to calculate the probability that a test item is “old,” and it is assumed that a default “old” response is given if this probability is greater than .5. It is demonstrated that this model and its more complete and realistic versions produce excellent qualitative predictions.
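
The odds calculation the abstract alludes to is compact enough to illustrate. Below is a toy Python rendering of a REM-style "old" probability, under simplifying assumptions that are not taken from the paper: every feature of every studied word is stored exactly once, feature values follow a geometric distribution with parameter g, copies are correct with probability c, and priors are equal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_feat = 8, 20
g, c = 0.4, 0.7   # assumed toy values: feature-value prior g, copying accuracy c

# Each studied word is a vector of feature values; its stored episodic image
# is an error-prone copy: each feature is copied correctly with probability c,
# otherwise replaced by a freshly sampled (noise) value.
words = rng.geometric(g, size=(n_words, n_feat))
noise = rng.geometric(g, size=(n_words, n_feat))
images = np.where(rng.random((n_words, n_feat)) < c, words, noise)

def p_old(probe):
    """Probability that the probe was studied, assuming equal priors."""
    chance = g * (1.0 - g) ** (images - 1)             # P(match by luck)
    lam = np.where(images == probe,
                   (c + (1.0 - c) * chance) / chance,  # evidence from a match
                   1.0 - c)                            # evidence from a mismatch
    odds = lam.prod(axis=1).mean()                     # averaged likelihood ratio
    return odds / (1.0 + odds)                         # respond "old" if > .5

print(p_old(words[0]))                       # a studied word: typically above .5
print(p_old(rng.geometric(g, size=n_feat)))  # an unstudied word: typically below .5
```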

850 citations


Journal ArticleDOI
22 May 1997-Nature
TL;DR: In this paper, the authors show that the degree of specificity depends on the difficulty of the training conditions, and that the pattern of specificities maps onto the patterns of receptive field selectivities along the visual pathway.
Abstract: Practising simple visual tasks leads to a dramatic improvement in performing them. This learning is specific to the stimuli used for training. We show here that the degree of specificity depends on the difficulty of the training conditions. We find that the pattern of specificities maps onto the pattern of receptive field selectivities along the visual pathway. With easy conditions, learning generalizes across orientation and retinal position, matching the spatial generalization of higher visual areas. As task difficulty increases, learning becomes more specific with respect to both orientation and position, matching the fine spatial retinotopy exhibited by lower areas. Consequently, we enjoy the benefits of learning generalization when possible, and of fine-grained but specific training when necessary. The dynamics of learning show a corresponding feature. Improvement begins with easy cases (when the subject is allowed long processing times) and only subsequently proceeds to harder cases. This learning cascade implies that easy conditions guide the learning of hard ones. Taken together, the specificity and dynamics suggest that learning proceeds as a countercurrent along the cortical hierarchy. Improvement begins at higher generalizing levels, which, in turn, direct harder-condition learning to the subdomain of their lower-level inputs. As predicted by this reverse hierarchy model, learning can be effective using only difficult trials, but on condition that learning onset has previously been enabled. A single prolonged presentation suffices to initiate learning. We call this single-encounter enabling effect 'eureka'.

734 citations


Journal ArticleDOI
TL;DR: It is shown that neural networks can, in fact, represent and classify structured patterns, and that all the supervised networks developed for the classification of sequences can, on the whole, be generalized to structures.
Abstract: Standard neural networks and statistical methods are usually believed to be inadequate when dealing with complex structures because of their feature-based approach. In fact, feature-based approaches usually fail to give satisfactory solutions because of the sensitivity of the approach to the a priori selection of the features, and the incapacity to represent any specific information on the relationships among the components of the structures. However, we show that neural networks can, in fact, represent and classify structured patterns. The key idea underpinning our approach is the use of the so-called "generalized recursive neuron", which is essentially a generalization to structures of a recurrent neuron. By using generalized recursive neurons, all the supervised networks developed for the classification of sequences, such as backpropagation through time networks, real-time recurrent networks, simple recurrent networks, recurrent cascade correlation networks, and neural trees can, on the whole, be generalized to structures. The results obtained by some of the above networks (with generalized recursive neurons) on the classification of logic terms are presented.
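
As an illustration of the generalized-recursive-neuron idea, the sketch below encodes a small labeled tree into a fixed-size vector using untrained random weights. The dimensions, the maximum outdegree of two, and the tanh nonlinearity are assumptions made for the example; a real system would train these weights by backpropagation through structure.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_labels, max_kids = 8, 4, 2        # assumed sizes; max outdegree of 2

W_label = rng.normal(scale=0.1, size=(dim, n_labels))
W_child = rng.normal(scale=0.1, size=(dim, max_kids * dim))

def one_hot(i):
    v = np.zeros(n_labels)
    v[i] = 1.0
    return v

def encode(tree):
    """Recursively encode (label, children) into a fixed-size vector: one
    weight matrix for the node label, another for the concatenated child
    encodings, which is the recurrent neuron generalized to structures."""
    label, children = tree
    kids = [encode(ch) for ch in children]
    kids += [np.zeros(dim)] * (max_kids - len(kids))   # pad absent children
    return np.tanh(W_label @ one_hot(label) + W_child @ np.concatenate(kids))

# a logic-term-like structure: f(a, g(b))
term = (0, [(1, []), (2, [(3, [])])])
print(encode(term))            # one fixed-size code for the whole structure
```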

569 citations


Book
20 Feb 1997
TL;DR: A textbook covering the fundamentals of pattern recognition, introductory and advanced neural networks, and neural, feature, and data engineering.
Abstract: Part I FUNDAMENTALS OF PATTERN RECOGNITION 0. Basic Concepts of Pattern Recognition 1. Decision Theoretic Algorithms 2. Structural Pattern Recognition Part II INTRODUCTORY NEURAL NETWORKS 3. Artificial Neural Network Structures 4. Supervised Training via Error Backpropagation: Derivations 5. Acceleration and Stabilization of Supervised Gradient Training of MLPs Part III ADVANCED FUNDAMENTALS OF NEURAL NETWORKS 6. Supervised Training via Strategic Search 7. Advances in Network Algorithms for Recognition 8. Using Hopfield Recurrent Neural Networks Part IV NEURAL, FEATURE, AND DATA ENGINEERING 9. Neural Engineering and Testing of FANNs 10. Feature and Data Engineering

375 citations


Proceedings Article
27 Jul 1997
TL;DR: Some approaches to flame recognition are described, including a prototype system, Smokey, which builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message.
Abstract: Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message. A training set of 720 messages was used by Quinlan's C4.5 decision-tree generator to determine feature-based rules that were able to correctly categorize 64% of the flames and 98% of the non-flames in a separate test set of 460 messages. Additional techniques for greater accuracy and user customization are also discussed.
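
A hedged sketch of the message-level pipeline follows. The four features and the insult lexicon are illustrative stand-ins (Smokey's actual 47 features are not reproduced here), summing per-sentence vectors is one plausible combination rule, and scikit-learn's DecisionTreeClassifier stands in for Quinlan's C4.5.

```python
import re
import numpy as np
from sklearn.tree import DecisionTreeClassifier   # stand-in for Quinlan's C4.5

INSULTS = {"idiot", "stupid", "moron"}             # toy lexicon, not Smokey's

def sentence_features(sent):
    """Four illustrative syntax/semantics features per sentence."""
    words = re.findall(r"[a-z']+", sent.lower())
    return np.array([
        sum(w in INSULTS for w in words),   # insult vocabulary
        words.count("you"),                 # second-person address
        sent.count("!"),                    # exclamations
        len(words),                         # sentence length
    ], dtype=float)

def message_features(msg):
    """Combine per-sentence vectors within a message (here, by summing)."""
    sents = [s for s in re.split(r"[.!?]+", msg) if s.strip()]
    return np.sum([sentence_features(s) for s in sents], axis=0)

msgs = ["You are an idiot! Go away.", "Thanks, that patch fixed the build."]
X = np.stack([message_features(m) for m in msgs])
clf = DecisionTreeClassifier().fit(X, [1, 0])      # 1 = flame, 0 = non-flame
print(clf.predict(X))
```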

287 citations


Journal ArticleDOI
TL;DR: This paper describes an approach for integrating a large number of context-dependent features into a semi-automated tool that provides a learning algorithm for selecting and combining groupings of the data, where groupings can be induced by highly specialized features.

271 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that varying the order of category learning induces the creation of different features that change the perceptual appearance and the featural representation of identical category exemplars, as a consequence of categorizing and representing objects.
Abstract: Many theories of object recognition and categorization claim that complex objects are represented in terms of characteristic features. The origin of these features has been neglected in theories of object categorization. Do they form a fixed and independent set that exists before experience with objects, or are features progressively extracted and developed as an organism categorizes its world? This paper maintains that features can be flexibly learned, as a consequence of categorizing and representing objects. All three experiments reported in this paper used categories of unfamiliar computer-synthesized two-dimensional objects ("Martian cells"). The results showed that varying order of category learning induced the creation of different features that changed the perceptual appearance and the featural representation of identical category exemplars. Network simulations supported a flexible, rather than a fixed feature interpretation of the data.

247 citations


Journal ArticleDOI
TL;DR: A series of experiments documents that face recognition is strongly dependent on the original spatial filter values, whereas object recognition shows strong invariance to these values, even when distinguishing among objects that are as similar as faces.
Abstract: A number of behavioural phenomena distinguish the recognition of faces and objects, even when members of a set of objects are highly similar. Because faces have the same parts in approximately the same relations, individuation of faces typically requires specification of the metric variation in a holistic and integral representation of the facial surface. The direct mapping of a hypercolumn-like pattern of activation onto a representation layer that preserves relative spatial filter values in a two-dimensional (2D) coordinate space, as proposed by C. von der Malsburg and his associates, may account for many of the phenomena associated with face recognition. An additional refinement, in which each column of filters (termed a 'jet') is centred on a particular facial feature (or fiducial point), allows selectivity of the input into the holistic representation to avoid incorporation of occluding or nearby surfaces. The initial hypercolumn representation also characterizes the first stage of object perception, but the image variation for objects at a given location in a 2D coordinate space may be too great to yield sufficient predictability directly from the output of spatial kernels. Consequently, objects can be represented by a structural description specifying qualitative (typically, non-accidental) characterizations of an object's parts, the attributes of the parts, and the relations among the parts, largely based on orientation and depth discontinuities (as shown by Hummel & Biederman). A series of experiments on the name priming or physical matching of complementary images (in the Fourier domain) of objects and faces documents that whereas face recognition is strongly dependent on the original spatial filter values, evidence from object recognition indicates strong invariance to these values, even when distinguishing among objects that are as similar as faces.

235 citations


Journal ArticleDOI
TL;DR: The application of deformable templates to recognition of handprinted digits shows that a good low-dimensional representation space does exist; methods to reduce the computational requirements, the primary limiting factor, are discussed.
Abstract: We investigate the application of deformable templates to recognition of handprinted digits. Two characters are matched by deforming the contour of one to fit the edge strengths of the other, and a dissimilarity measure is derived from the amount of deformation needed, the goodness of fit of the edges, and the interior overlap between the deformed shapes. Classification using the minimum dissimilarity results in recognition rates up to 99.25 percent on a 2,000 character subset of NIST Special Database 1. Additional experiments on independent test data were done to demonstrate the robustness of this method. Multidimensional scaling is also applied to the 2,000 × 2,000 proximity matrix, using the dissimilarity measure as a distance, to embed the patterns as points in low-dimensional spaces. A nearest neighbor classifier is applied to the resulting pattern matrices. The classification accuracies obtained in the derived feature space demonstrate that there does exist a good low-dimensional representation space. Methods to reduce the computational requirements, the primary limiting factor of this method, are discussed.
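
The embedding-plus-classification step is easy to sketch. Below, a synthetic dissimilarity matrix stands in for the deformable-template distances; scikit-learn's MDS embeds it as points in a low-dimensional space, and a 1-nearest-neighbor classifier is applied to the embedded points, as the abstract describes.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)            # toy stand-in for digit classes

# Synthetic dissimilarities: small within a class, larger across classes,
# made symmetric with a zero diagonal, like the paper's proximity matrix.
D = (labels[:, None] != labels[None, :]) + 0.3 * rng.random((60, 60))
D = (D + D.T) / 2.0
np.fill_diagonal(D, 0.0)

# Embed the proximity matrix as points in a low-dimensional space ...
emb = MDS(n_components=5, dissimilarity="precomputed",
          random_state=0).fit_transform(D)

# ... and apply a nearest-neighbor classifier in the derived feature space.
knn = KNeighborsClassifier(n_neighbors=1).fit(emb[::2], labels[::2])
print(knn.score(emb[1::2], labels[1::2]))    # held-out accuracy
```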

200 citations


Proceedings ArticleDOI
07 Sep 1997
TL;DR: A new machine learning algorithm, VFI5 (Voting Feature Intervals), for the diagnosis of cardiac arrhythmia from standard 12-lead ECG recordings is presented, and it is indicated that it outperforms other standard algorithms such as Naive Bayesian and Nearest Neighbor classifiers.
Abstract: A new machine learning algorithm for the diagnosis of cardiac arrhythmia from standard 12-lead ECG recordings is presented. The algorithm is called VFI5, for Voting Feature Intervals. VFI5 is a supervised and inductive learning algorithm for inducing classification knowledge from examples. The input to VFI5 is a training set of records. Each record contains clinical measurements from ECG signals and some other information such as sex, age, and weight, along with the decision of an expert cardiologist. The knowledge representation is based on a recent technique called Feature Intervals, where a concept is represented by the projections of the training cases on each feature separately. Classification in VFI5 is based on majority voting among the class predictions made by each feature separately. Comparisons indicate that VFI5 outperforms other standard algorithms such as the Naive Bayesian and Nearest Neighbor classifiers.
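
A minimal sketch of the voting-feature-intervals idea, not the authors' exact construction: each feature's axis is cut into quantile intervals (five here, an arbitrary choice), each interval stores a class distribution estimated from the training projections, and classification sums the per-feature votes.

```python
import numpy as np

class VFISketch:
    """Simplified voting feature intervals: quantile intervals per feature,
    class-distribution votes per interval, majority vote across features."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.edges_, self.votes_ = [], []
        for f in range(X.shape[1]):
            edges = np.quantile(X[:, f], np.linspace(0, 1, 6))  # 5 intervals
            idx = np.clip(np.searchsorted(edges, X[:, f]) - 1, 0, 4)
            counts = np.zeros((5, self.classes_.size))
            for i, cls in zip(idx, y):
                counts[i, np.searchsorted(self.classes_, cls)] += 1.0
            counts /= counts.sum(axis=1, keepdims=True).clip(min=1.0)
            self.edges_.append(edges)
            self.votes_.append(counts)
        return self

    def predict(self, X):
        total = np.zeros((len(X), self.classes_.size))
        for f, (edges, counts) in enumerate(zip(self.edges_, self.votes_)):
            idx = np.clip(np.searchsorted(edges, X[:, f]) - 1, 0, 4)
            total += counts[idx]          # each feature votes separately
        return self.classes_[total.argmax(axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)             # only feature 0 is informative
print((VFISketch().fit(X, y).predict(X) == y).mean())
```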

195 citations


Journal ArticleDOI
TL;DR: Experimental results show that combining the classifiers increases the reliability of the recognition results, which is the distinguishing feature of this work.

Patent
28 Jan 1997
TL;DR: In this article, the positions and velocities of the speech organs (2, 3, 4) as speech is articulated can be defined for each acoustic speech unit (20) by simultaneously recording EM wave reflections and acoustic speech information.
Abstract: By simultaneously recording EM wave reflections (21) and acoustic speech information (24), the positions and velocities of the speech organs (2, 3, 4) as speech is articulated can be defined for each acoustic speech unit (20). Well defined time frames and feature vectors (6, 7, 8, 9) describing the speech, to the degree required, can be formed. Such feature vectors (6, 7, 8, 9) can uniquely characterize the speech unit (20) being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit (20) recognition, and organ mechanical parameters can be determined.

Journal ArticleDOI
TL;DR: In this chapter, the artificial neural networks (ANNs) approach as another solution for the modulation recognition process is studied in some detail, and it is suggested that the ANN approach to the modulation recognition process may have better performance than the decision-theoretic approach.

Journal ArticleDOI
TL;DR: An object recognition approach based on concurrent coarse-and-fine matching using a multilayer Hopfield neural network is presented, which reinforces the usual intralayer matching process in the conventional single-layer Hopfield network in order to compute the most consistent model-object match across several resolution levels.
Abstract: An object recognition approach based on concurrent coarse-and-fine matching using a multilayer Hopfield neural network is presented. The proposed network consists of several cascaded single-layer Hopfield networks, each encoding object features at a distinct resolution, with bidirectional interconnections linking adjacent layers. The interconnection weights between nodes associating adjacent layers are structured to favor node pairs for which model translation and rotation, when viewed at the two corresponding resolutions, are consistent. This interlayer feedback feature of the algorithm reinforces the usual intralayer matching process in the conventional single-layer Hopfield network in order to compute the most consistent model-object match across several resolution levels. The performance of the algorithm is demonstrated for test images containing single objects, and multiple occluded objects. These results are compared with recognition results obtained using a single-layer Hopfield network.

Journal ArticleDOI
TL;DR: Evidence is presented indicating that, in some domains, normal (Gaussian) distributions are more accurate than uniform distributions for modeling feature fluctuations, which motivates the development of new maximum-likelihood and MAP recognition formulations which are based on normal feature models.
Abstract: This paper examines statistical approaches to model-based object recognition. Evidence is presented indicating that, in some domains, normal (Gaussian) distributions are more accurate than uniform distributions for modeling feature fluctuations. This motivates the development of new maximum-likelihood and MAP recognition formulations which are based on normal feature models. These formulations lead to an expression for the posterior probability of the pose and correspondences given an image. Several avenues are explored for specifying a recognition hypothesis. In the first approach, correspondences are included as a part of the hypotheses. Search for solutions may be ordered as a combinatorial search in correspondence space, or as a search over pose space, where the same criterion can equivalently be viewed as a robust variant of chamfer matching. In the second approach, correspondences are not viewed as being a part of the hypotheses. This leads to a criterion that is a smooth function of pose that is amenable to local search by continuous optimization methods. The criterion is also suitable for optimization via the Expectation-Maximization (EM) algorithm, which alternates between pose refinement and re-estimation of correspondence probabilities until convergence is obtained. Recognition experiments are described using these criteria with features derived from video images and from synthetic range images.
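
The EM variant is straightforward to sketch for the simplest possible pose, a pure 2D translation. In the toy below (point counts, noise scale, and the fixed variance are assumptions), the E-step computes soft correspondence probabilities under a Gaussian feature model and the M-step re-estimates the translation from them.

```python
import numpy as np

rng = np.random.default_rng(0)
model = rng.uniform(0, 10, size=(6, 2))        # model feature locations
true_t = np.array([3.0, -2.0])                 # unknown pose (translation only)
image = model + true_t + rng.normal(scale=0.1, size=model.shape)
rng.shuffle(image)                             # correspondences are unknown

t, sigma2 = np.zeros(2), 1.0                   # assumed fixed feature variance
for _ in range(100):
    # E-step: soft correspondence probabilities under the normal feature model
    d2 = ((image[:, None, :] - (model + t)[None, :, :]) ** 2).sum(axis=2)
    resp = np.exp(-d2 / (2.0 * sigma2))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the pose as a responsibility-weighted average offset
    diffs = image[:, None, :] - model[None, :, :]
    t = (resp[..., None] * diffs).sum(axis=(0, 1)) / resp.sum()

print(t, "vs true", true_t)                    # t should approach true_t
```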

Journal ArticleDOI
TL;DR: Several algorithms for preprocessing, feature extraction, pre-classification, and main classification, including a modified Bayes classifier and the subspace method for robust main classification, are experimentally compared with the aim of improving the accuracy of handwritten Japanese character recognition.

Proceedings Article
25 Aug 1997
TL;DR: This work uses techniques from statistical pattern recognition to efficiently separate the feature words or discriminants from the noise words at each node of the taxonomy, and builds a multi-level classifier that has a small model size and is very fast.
Abstract: We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as internet directories, digital libraries, and patent databases, enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy. We use techniques from statistical pattern recognition to efficiently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. However, owing to the use of context-sensitive features, the classifier is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!.
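
A hedged sketch of a single taxonomy node: mutual information stands in for the paper's discriminant score, and scikit-learn's SelectKBest plus a multinomial naive Bayes model play the roles of feature-word selection and the per-node classifier. A full system would train one such classifier per internal node and route documents down the taxonomy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def node_classifier(docs, child_labels, k=5):
    """One taxonomy node: keep the k words that best discriminate among this
    node's children (the 'feature' words) and ignore the rest (noise words)."""
    return make_pipeline(
        CountVectorizer(),
        SelectKBest(mutual_info_classif, k=k),   # stand-in discriminant score
        MultinomialNB(),
    ).fit(docs, child_labels)

# a toy node with two children
docs = ["patent claim on a transistor gate", "transistor fabrication claims",
        "recipe for sourdough bread", "bread proofing and baking times"]
node = node_classifier(docs, ["patents", "patents", "cooking", "cooking"])
print(node.predict(["a claim about a gate oxide"]))
```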

Journal ArticleDOI
TL;DR: An automatic facial expression recognition system is described that consists of two parts, facial feature extraction and facial expression recognition; it applies the point distribution model and the gray-level model to find the facial features.

Posted Content
TL;DR: An experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text using McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm.
Abstract: This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high-dimensional feature set.

Proceedings Article
01 Jan 1997
TL;DR: This paper presents three unsupervised learning algorithms that are able to distinguish among the known senses (i.e., as defined in some dictionary) of a word, based only on features that can be automatically extracted from untagged text.
Abstract: This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high-dimensional feature set.

1 Introduction

Statistical methods for natural language processing are often dependent on the availability of costly knowledge sources such as manually annotated text or semantic networks. This limits the applicability of such approaches to domains where this hard-to-acquire knowledge is already available. This paper presents three unsupervised learning algorithms that are able to distinguish among the known senses (i.e., as defined in some dictionary) of a word, based only on features that can be automatically extracted from untagged text. The object of unsupervised learning is to determine the class membership of each observation (i.e., each object to be classified) in a sample without using training examples of correct classifications. We discuss three algorithms, McQuitty's similarity analysis (McQuitty, 1966), Ward's minimum-variance method (Ward, 1963), and the EM algorithm (Dempster, Laird, and Rubin, 1977), that can be used to distinguish among the known senses of an ambiguous word without the aid of disambiguated examples. The EM algorithm produces maximum likelihood estimates of the parameters of a probabilistic model, where that model has been specified in advance. Both Ward's and McQuitty's methods are agglomerative clustering algorithms that form classes of unlabeled observations that minimize their respective distance measures between class members. The rest of this paper is organized as follows. First, we present introductions to Ward's and McQuitty's methods (Section 2) and the EM algorithm (Section 3). We discuss the thirteen words (Section 4) and the three feature sets (Section 5) used in our experiments. We present our experimental results (Section 6) and close with a discussion of related work (Section 7).

2 Agglomerative Clustering

In general, clustering methods rely on the assumption that classes occupy distinct regions in the feature space. The distance between two points in a multi-dimensional space can be measured using any of a wide variety of metrics (see, e.g., Devijver and Kittler, 1982). Observations are grouped in the manner that minimizes the distance between the members of each class. Ward's and McQuitty's methods are agglomerative clustering algorithms that differ primarily in how they compute the distance between clusters. All such algorithms begin by placing each observation in a unique cluster, i.e. a cluster of one. The two closest clusters are merged to form a new cluster that replaces the two merged clusters. Merging of the two closest clusters continues until only some specified number of clusters remain. However, our data does not immediately lend itself to a distance-based interpretation. Our features represent part-of-speech (POS) tags, morphological characteristics, and word co-occurrence; such features are nominal and their values do not have scale. Given a POS feature, for example, we could choose noun = 1, verb = 2, adjective = 3, and adverb = 4. That adverb is represented by a larger number than noun is purely coincidental and implies nothing about the relationship between nouns and adverbs. Thus, before we employ either clustering algorithm ...
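
The clustering side of the comparison is easy to reproduce in outline. In the sketch below, random vectors stand in for the automatically identifiable features; SciPy's "weighted" linkage is WPGMA, i.e. McQuitty's similarity analysis, and method="ward" on the same call gives Ward's minimum-variance method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Random vectors stand in for feature vectors (POS tags, co-occurrence, ...)
# describing the contexts of one ambiguous word with two underlying senses.
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 10)),
               rng.normal(3.0, 1.0, size=(20, 10))])

Z = linkage(X, method="weighted")   # SciPy's WPGMA = McQuitty's method
sense = fcluster(Z, t=2, criterion="maxclust")
print(np.bincount(sense)[1:])       # cluster sizes; method="ward" for Ward
```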

Journal ArticleDOI
TL;DR: The simulations show that results yielded by the methods described in this paper are better not only than those of the individual classifiers but also than those obtained by combining multiple classifiers with the same feature.
Abstract: In practical applications of pattern recognition, there are often different features extracted from the raw data to be recognized. Combining multiple classifiers with different features is viewed as a general problem in various application areas of pattern recognition. In this paper, a systematic investigation has been made and the possible solutions are classified into three frameworks, i.e. linear opinion pools, winner-take-all and evidential reasoning. For combining multiple classifiers with different features, a novel method is presented in the framework of linear opinion pools, and a modified training algorithm for an associative switch is also proposed in the framework of winner-take-all. In the framework of evidential reasoning, several typical methods are briefly reviewed for use. All the aforementioned methods have already been applied to text-independent speaker identification. The simulations show that results yielded by the methods described in this paper are better not only than those of the individual classifiers but also than those obtained by combining multiple classifiers with the same feature. This indicates that combining multiple classifiers with different features is an effective way to attack the problem of text-independent speaker identification.
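
A minimal sketch of the linear-opinion-pool framework: two classifiers are trained on different feature sets (synthetic data here), and their class posteriors are combined as a weighted average. The fixed equal weights are an assumption for the example; the paper presents a method for choosing the combination.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
views = [slice(0, 4), slice(4, 8)]       # two different "feature sets"
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# One classifier per feature set, then a linear opinion pool over posteriors.
clfs = [LogisticRegression(max_iter=1000).fit(Xtr[:, v], ytr) for v in views]
weights = [0.5, 0.5]                     # fixed equal pool weights (assumed)
pooled = sum(w * clf.predict_proba(Xte[:, v])
             for w, clf, v in zip(weights, clfs, views))
print((pooled.argmax(axis=1) == yte).mean())
```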

Journal ArticleDOI
Se June Hong1
TL;DR: A new approach to deriving classification rules or decision trees from examples finds each feature's "obligation" to the class discrimination in the context of other features, offering a powerful alternative to the traditional methods.
Abstract: Deriving classification rules or decision trees from examples is an important problem. When there are too many features, discarding weak features before the derivation process is highly desirable. When there are numeric features, they need to be discretized for the rule generation. We present a new approach to these problems. Traditional techniques make use of feature merits based on either the information theoretic, or the statistical correlation between each feature and the class. We instead assign merits to features by finding each feature's "obligation" to the class discrimination in the context of other features. The merits are then used to rank the features, select a feature subset, and discretize the numeric variables. Experience with benchmark example sets demonstrates that the new approach is a powerful alternative to the traditional methods. This paper concludes by posing some new technical issues that arise from this approach.
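
A simplified, hedged sketch of a context-sensitive feature merit in the spirit of the paper, not its exact formula: a feature earns credit in proportion to its share of the difference between an example and its nearest counter-examples from other classes, so merit reflects discrimination in the context of the other features.

```python
import numpy as np

def contextual_merit(X, y, k=3):
    """Credit each feature for separating close counter-class pairs."""
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    merit = np.zeros(X.shape[1])
    for i in range(len(X)):
        other = np.flatnonzero(y != y[i])                # counter-class examples
        d = np.abs(X[other] - X[i])                      # per-feature differences
        for j in other[np.argsort(d.sum(axis=1))[:k]]:   # the closest ones
            diff = np.abs(X[j] - X[i])
            merit += diff / max(diff.sum(), 1e-12)       # share credit by difference
    return merit

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = (X[:, 0] > 0.5).astype(int)     # only feature 0 determines the class
print(contextual_merit(X, y))       # feature 0 should earn the largest merit
```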

Journal ArticleDOI
TL;DR: In this paper, a new approach for the recognition of control chart patterns (CCPs) is described, which uses features extracted from a CCP, instead of the unprocessed CCP data or its statistical properties, for the recognition task.
Abstract: This paper describes a new approach for the recognition of control chart patterns (CCPs). The approach uses features extracted from a CCP instead of the unprocessed CCP data or its statistical properties for the recognition task. These features represent the shape of the CCP explicitly. The approach has two main steps: (1) extraction of features and (2) recognition of patterns. A set of CCP feature extraction procedures are described in the paper. The extracted features are recognized using heuristics, induction and neural network techniques. The paper presents the results of analysing several hundred control chart patterns and gives a comparison with those reported in previous work.
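
A sketch of the feature-extraction step with four illustrative shape features; these are assumptions standing in for the paper's own feature set, which is not reproduced here, but they represent the shape of the series explicitly in the same spirit.

```python
import numpy as np

def ccp_features(x):
    """Illustrative explicit shape features for a control chart pattern."""
    t = np.arange(len(x))
    coeffs = np.polyfit(t, x, 1)
    resid = x - np.polyval(coeffs, t)
    return np.array([
        coeffs[0],                                 # least-squares slope (trend)
        x.std(),                                   # spread
        np.sum(np.diff(np.sign(resid)) != 0),      # trend-line crossings (cycles)
        x.max() - x.min(),                         # range (shifts, trends)
    ])

rng = np.random.default_rng(0)
trend = 0.1 * np.arange(60) + rng.normal(size=60)           # upward trend
cyclic = np.sin(np.arange(60) / 3.0) + rng.normal(scale=0.3, size=60)
print(ccp_features(trend))
print(ccp_features(cyclic))
```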

Proceedings Article
08 Jul 1997
TL;DR: This paper presents as a baseline an information-gain-weighted CBL algorithm and applies it to three data sets from natural language processing (NLP) with skewed class distributions, and presents two CBL algorithms designed to improve the performance of minority class predictions.
Abstract: This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
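
A hedged sketch of the path-based weighting: here the frequency with which each feature is tested along the test case's decision-tree path stands in for the paper's path-specific information-gain values, and the resulting vector would feed into weighted case retrieval.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

def path_weights(x):
    """Weight features by how often the test case's tree path tests them
    (a stand-in for the paper's path-specific information-gain values)."""
    nodes = tree.decision_path(x.reshape(1, -1)).indices
    feats = tree.tree_.feature[nodes]
    feats = feats[feats >= 0]                      # leaf nodes are marked -2
    w = np.bincount(feats, minlength=X.shape[1]).astype(float)
    return w / w.sum()

print(path_weights(X[0]))   # test-case-specific weights for case retrieval
```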

Journal ArticleDOI
TL;DR: An active object recognition strategy is presented that combines an attention mechanism for focusing the search for a 3D object in a 2D image with a viewpoint control strategy for disambiguating recovered object features.

01 Jan 1997
TL;DR: Diet is described, an algorithm that directs search through a space of discrete weights using cross-validation error as its evaluation function, and shows that, for many data sets, there is an advantage to weighting features, but that increasing the number of possible weights beyond two has very little benefit and sometimes degrades performance.
Abstract: Nearest-neighbor algorithms are known to depend heavily on their distance metric. In this paper, we investigate the use of a weighted Euclidean metric in which the weight for each feature comes from a small set of options. We describe Diet, an algorithm that directs search through a space of discrete weights using cross-validation error as its evaluation function. Although a large set of possible weights can reduce the learner's bias, it can also lead to increased variance and overfitting. Our empirical study shows that, for many data sets, there is an advantage to weighting features, but that increasing the number of possible weights beyond two (zero and one) has very little benefit and sometimes degrades performance.
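
A sketch of a Diet-like search, with two hedges: the search here is plain greedy hill-climbing rather than whatever ordering the authors use, and the weighted Euclidean metric is implemented by rescaling features ahead of a 1-NN classifier.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def cv_error(X, y, w):
    """Cross-validated 1-NN error under a weighted Euclidean metric,
    implemented by scaling each feature by its weight."""
    clf = make_pipeline(FunctionTransformer(lambda Z: Z * w),
                        KNeighborsClassifier(n_neighbors=1))
    return 1.0 - cross_val_score(clf, X, y, cv=5).mean()

def diet_like(X, y, options=(0.0, 1.0)):
    """Greedy hill-climbing over per-feature weights drawn from `options`."""
    w = np.ones(X.shape[1])
    best, improved = cv_error(X, y, w), True
    while improved:
        improved = False
        for f in range(X.shape[1]):
            for v in options:
                trial = w.copy()
                trial[f] = v
                err = cv_error(X, y, trial)
                if err < best:
                    best, w, improved = err, trial, True
    return w, best

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # features 2-5 are pure noise
print(diet_like(X, y))                    # noise features should get weight 0
```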

Book ChapterDOI
25 Jul 1997
TL;DR: This paper describes a comprehensive set of techniques for learning local feature weights and evaluates them on a case-base for conflict resolution in air traffic control, showing how introspective learning of feature weights improves retrieval and how it can be used to determine context-sensitive local weights.
Abstract: We can learn a lot about what features are important for retrieval by comparing similar cases in a case-base. We can determine which features are important in predicting outcomes and we can assign weights to features accordingly. In the same manner we can discover which features are important in specific contexts and determine localised feature weights that are specific to individual cases. In this paper we describe a comprehensive set of techniques for learning local feature weights and we evaluate these techniques on a case-base for conflict resolution in air traffic control. We show how introspective learning of feature weights improves retrieval and how it can be used to determine context sensitive local weights. We also show that introspective learning does not work well in case-bases containing only pivotal cases because there is no redundancy to be exploited.

Journal ArticleDOI
TL;DR: This paper examines four current theoretical approaches to the representation and recognition of visual objects: structural descriptions, geometric constraints, multidimensional feature spaces and shape-space approximation.

Proceedings Article
01 Jan 1997
TL;DR: A new and simple approach to compensating for speech-recognizer degradations is presented, which uses mel-filter-bank (MFB) magnitudes as input features and missing feature theory to dynamically modify the probability computations performed in Hidden Markov Model recognizers.
Abstract: Speech recognizers trained with quiet wide-band speech degrade dramatically with high-pass, low-pass, and notch filtering, with noise, and with interruptions of the speech input. A new and simple approach to compensate for these degradations is presented which uses mel-filter-bank (MFB) magnitudes as input features and missing feature theory to dynamically modify the probability computations performed in Hidden Markov Model recognizers. When the identity of features missing due to filtering or masking is provided, recognition accuracy on a large talker-independent digit recognition task often rises from below 50% to above 95%. These promising results suggest future work to continuously estimate SNR's within MFB bands for dynamic adaptation of speech recognizers.
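
The marginalization trick is simple for diagonal Gaussians: integrating out a known-missing dimension just removes its term from the log-likelihood sum. The sketch below scores two hypothetical states on a three-band observation with the middle band flagged missing; the means and unit variances are made up for illustration, and a real recognizer would apply this inside the HMM state-likelihood computation.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-state means for three mel-filter-bank (MFB) bands.
means = np.array([[1.0, 5.0, 2.0],
                  [4.0, 1.0, 3.0]])
stds = np.ones_like(means)                 # unit variances, for illustration

def log_likelihoods(obs, present):
    """Score each state using only the features flagged as present: for a
    diagonal Gaussian, marginalising a missing dimension simply drops its
    term from the log-likelihood sum."""
    ll = norm.logpdf(obs[None, :], means, stds)     # (states, bands)
    return (ll * present).sum(axis=1)

obs = np.array([1.2, 0.0, 2.1])            # middle band lost to filtering
print(log_likelihoods(obs, present=np.array([1.0, 0.0, 1.0])))
```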

Journal ArticleDOI
TL;DR: Two new neuro-fuzzy schemes, one for classification and one for clustering problems, are proposed that compare quite well with the existing techniques, and in addition offer the advantages of one-pass learning and online adaptation.
Abstract: In this paper, we propose two new neuro-fuzzy schemes, one for classification and one for clustering problems. The classification scheme is based on Simpson's fuzzy min-max method (1992, 1993) and relaxes some assumptions he makes. This enables our scheme to handle mutually nonexclusive classes. The neuro-fuzzy clustering scheme is a multiresolution algorithm that is modeled after the mechanics of human pattern recognition. We also present data from an exhaustive comparison of these techniques with neural, statistical, machine learning, and other traditional approaches to pattern recognition applications. The data sets used for comparisons include those from the machine learning repository at the University of California, Irvine. We find that our proposed schemes compare quite well with the existing techniques, and in addition offer the advantages of one-pass learning and online adaptation.
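
A sketch of the fuzzy min-max building block: the membership of a point in a hyperbox, equal to 1 inside the box and decaying with distance outside it. The functional form below is one common variant; Simpson's exact membership function and the min-max learning rule differ in detail.

```python
import numpy as np

def hyperbox_membership(x, vmin, vmax, gamma=4.0):
    """1 inside the box [vmin, vmax]; decays with distance outside it.
    gamma controls the fuzziness of the boundary."""
    below = np.maximum(0.0, vmin - x)      # shortfall under the min point
    above = np.maximum(0.0, x - vmax)      # excess over the max point
    return float(np.mean(np.maximum(0.0, 1.0 - gamma * (below + above))))

vmin, vmax = np.array([0.2, 0.3]), np.array([0.5, 0.6])
print(hyperbox_membership(np.array([0.3, 0.4]), vmin, vmax))  # inside: 1.0
print(hyperbox_membership(np.array([0.7, 0.4]), vmin, vmax))  # outside: < 1
```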