
Showing papers on "Feature (machine learning) published in 1999"


01 Apr 1999
TL;DR: This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems and performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases.
Abstract: Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.
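The subset-merit heuristic behind such correlation-based filters can be sketched as follows; this is an illustrative CFS-style score (the function name and subset handling are assumptions, not the paper's exact algorithm):

```python
import numpy as np

def subset_merit(X, y, subset):
    """Heuristic merit of a feature subset: high average feature-class
    correlation, low average feature-feature correlation (a CFS-style
    score; an illustrative sketch, not the paper's code)."""
    k = len(subset)
    # mean absolute feature-class correlation
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # mean absolute pairwise feature-feature correlation
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A search over subsets would then keep features that raise this merit, which penalizes redundant features even when each is individually predictive.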

1,653 citations


Proceedings Article
01 Jan 1999
TL;DR: A formulation of the SVM is proposed that enables a multi-class pattern recognition problem to be solved in a single optimisation and a similar generalization of linear programming machines is proposed.
Abstract: The solution of binary classification problems using support vector machines (SVMs) is well developed, but multi-class problems with more than two classes have typically been solved by combining independently produced binary classifiers. We propose a formulation of the SVM that enables a multi-class pattern recognition problem to be solved in a single optimisation. We also propose a similar generalization of linear programming machines. We report experiments using benchmark datasets in which these two methods achieve a reduction in the number of support vectors and kernel calculations needed. 1. k-Class Pattern Recognition. The k-class pattern recognition problem is to construct a decision function given ℓ iid (independent and identically distributed) samples (points) of an unknown function, typically with noise: (x_1, y_1), ..., (x_ℓ, y_ℓ), where each x_i, i = 1, ..., ℓ, is a vector of length d and y_i ∈ {1, ..., k} represents the class of the sample. A natural loss function is the number of mistakes made. 2. Solving k-Class Problems with Binary SVMs. For the binary pattern recognition problem (the case k = 2), the support vector approach is well developed [3, 5]. The classical approach to solving k-class pattern recognition problems is to treat the problem as a collection of binary classification problems. In the one-versus-rest method one constructs k classifiers, one for each class. The nth classifier constructs a hyperplane between class n and the k - 1 other classes. A particular point is assigned to the class for which the distance from the margin, in the positive direction (i.e. in the direction in which class "one" lies rather than class "rest"), is maximal. This method has been used widely. (ESANN'1999 proceedings, European Symposium on Artificial Neural Networks, Bruges (Belgium), 21-23 April 1999, D-Facto public., ISBN 2-600049-9-X, pp. 219-224.)
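The one-versus-rest decision rule described in the abstract can be sketched as follows, assuming k already-trained linear classifiers stacked as rows of a hypothetical weight matrix W with biases b:

```python
import numpy as np

def one_vs_rest_predict(W, b, X):
    """One-versus-rest decision rule: each of the k rows of W scores
    "class n" against the rest; a point is assigned to the class whose
    signed decision value is largest. W, b are assumed pre-trained."""
    scores = X @ W.T + b          # (n_samples, k) decision values
    return np.argmax(scores, axis=1)
```

The single-optimisation formulation the paper proposes replaces this train-k-machines-then-argmax scheme with one joint problem, but the prediction step is still an argmax over per-class decision values.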

873 citations


Journal ArticleDOI
TL;DR: Assessment of the approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts, using a new probabilistically motivated error metric.
Abstract: This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features that use adaptive language models in a novel way to detect broad changes of topic, and cue-word features that detect occurrences of specific words, which may be domain-specific, that tend to be used near segment boundaries. Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative assessment of the relative contributions of the different feature types, as well as a comparison with decision trees and previously proposed text segmentation algorithms.
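A metric of the kind the abstract describes, commonly known as Pk, can be sketched as follows; the exact formulation in the paper may differ, and the 0/1 boundary lists and window size are assumptions for illustration:

```python
def pk(ref, hyp, k):
    """P_k segmentation error sketch: slide a window of width k over the
    text and count positions where reference and hypothesis disagree on
    whether the two window ends fall in the same segment. ref/hyp are
    0/1 boundary lists (1 = a segment boundary follows this position)."""
    n = len(ref)
    errors = 0
    for i in range(n - k):
        same_ref = sum(ref[i:i + k]) == 0   # no boundary inside the window
        same_hyp = sum(hyp[i:i + k]) == 0
        errors += same_ref != same_hyp
    return errors / (n - k)
```

Because near-misses only disagree for a few window positions, this score penalizes slightly misplaced boundaries less than completely missed or spurious ones, which is what lets it trade off precision and recall smoothly.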

728 citations


Journal ArticleDOI
TL;DR: It is shown that feature sets based upon the short-time Fourier transform, the wavelet transform, and the wavelet packet transform provide an effective representation for classification, provided that they are subject to an appropriate form of dimensionality reduction.

625 citations


Proceedings ArticleDOI
08 Feb 1999
TL;DR: In this paper, a nonlinear form of principal component analysis (PCA) is proposed to perform polynomial feature extraction in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
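The kernel-PCA procedure the abstract outlines can be sketched as follows; the polynomial kernel and the double-centering step follow the standard formulation, with illustrative parameter names:

```python
import numpy as np

def kernel_pca(X, n_components, degree=2):
    """Kernel PCA sketch: polynomial kernel matrix, centering in feature
    space, eigendecomposition, and projection of the training points.
    Parameter names are illustrative, not from the paper."""
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree                 # polynomial kernel matrix
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # center in feature space
    vals, vecs = np.linalg.eigh(Kc)               # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]        # sort descending
    alphas = vecs[:, :n_components] / np.sqrt(np.maximum(vals[:n_components], 1e-12))
    return Kc @ alphas                            # projections of training data
```

The point of the kernel trick here is that the feature space (e.g. all d-pixel products) is never constructed explicitly; only the n x n kernel matrix is needed.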

430 citations


Book
01 Sep 1999
TL;DR: The authors consolidate a wealth of information previously scattered in disparate articles, journals, and edited volumes, explaining both the theory of neuro-fuzzy computing and the latest methodologies for performing different pattern recognition tasks in the neuro-fuzzy network.
Abstract: From the Publisher: The authors consolidate a wealth of information previously scattered in disparate articles, journals, and edited volumes, explaining both the theory of neuro-fuzzy computing and the latest methodologies for performing different pattern recognition tasks in the neuro-fuzzy network - classification, feature evaluation, rule generation, knowledge extraction, and hybridization. Special emphasis is given to the integration of neuro-fuzzy methods with rough sets and genetic algorithms (GAs) to ensure more efficient recognition systems.

282 citations


Journal ArticleDOI
TL;DR: A hidden Markov model-based approach designed to recognize off-line unconstrained handwritten words for large vocabularies is described; experiments show it can be successfully used for handwritten word recognition.
Abstract: Describes a hidden Markov model-based approach designed to recognize off-line unconstrained handwritten words for large vocabularies. After preprocessing, a word image is segmented into letters or pseudoletters and represented by two feature sequences of equal length, each consisting of an alternating sequence of shape-symbols and segmentation-symbols, which are both explicitly modeled. The word model is made up of the concatenation of appropriate letter models consisting of elementary HMMs and an HMM-based interpolation technique is used to optimally combine the two feature sets. Two rejection mechanisms are considered depending on whether or not the word image is guaranteed to belong to the lexicon. Experiments carried out on real-life data show that the proposed approach can be successfully used for handwritten word recognition.

243 citations


Book
02 Mar 1999
TL;DR: Topics covered include decision functions; classification by distance functions and clustering; classification using the statistical approach; feature selection; fuzzy classification and pattern recognition; syntactic pattern recognition; and neural nets and pattern classification.
Abstract: Decision functions; classification by distance functions and clustering; classification using the statistical approach; feature selection; fuzzy classification and pattern recognition; syntactic pattern recognition; neural nets and pattern classification.

232 citations


Journal ArticleDOI
Nei Kato1, M. Suzuki, Shinichiro Omachi1, Hirotomo Aso1, Yoshiaki Nemoto1 
TL;DR: A precise system for handwritten Chinese and Japanese character recognition using transformation based on partial inclination detection (TPID) and city block distance with deviation and asymmetric Mahalanobis distance (AMD) are presented.
Abstract: This paper presents a precise system for handwritten Chinese and Japanese character recognition. Before extracting the directional element feature (DEF) from each character image, transformation based on partial inclination detection (TPID) is used to reduce undesired effects of degraded images. In the recognition process, city block distance with deviation (CBDD) and asymmetric Mahalanobis distance (AMD) are proposed for rough classification and fine classification. With this recognition system, the recognition rate on the ETL9B database reaches 99.42%.

216 citations


Proceedings Article
01 Jan 1999
TL;DR: A new composite similarity metric is presented that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units and is evaluated against standard information retrieval techniques, establishing that the new method is more effective in identifying closely related textual units.
Abstract: We present a new composite similarity metric that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units. Several potential features are investigated and an optimal combination is selected via machine learning. We discuss a more restrictive definition of similarity than traditional, document-level and information retrieval-oriented, notions of similarity, and motivate it by showing its relevance to the multi-document text summarization problem. Results from our system are evaluated against standard information retrieval techniques, establishing that the new method is more effective in identifying closely related textual units.

205 citations


Journal ArticleDOI
01 Aug 1999
TL;DR: A neural-based crowd estimation system for surveillance in complex scenes at an underground station platform is presented, with a learning phase based on a proposed hybrid of least-squares and global search algorithms that provides both a global search characteristic and fast convergence.
Abstract: A neural-based crowd estimation system for surveillance in complex scenes at an underground station platform is presented. Estimation is carried out by extracting a set of significant features from sequences of images. These feature indexes are modeled by a neural network to estimate the crowd density. The learning phase is based on our proposed hybrid of least-squares and global search algorithms, which provides both a global search characteristic and fast convergence. Promising experimental results are obtained in terms of accuracy and real-time response capability to alert operators automatically.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition, presents results on several standard vowel and phonetic classification tasks, and shows better performance than Gaussian mixture classifiers.
Abstract: Support vector machines (SVMs) represent a new approach to pattern classification which has attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of structural risk minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other nonlinear classifiers such as artificial neural networks and k-nearest neighbors. This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition. We present results on several standard vowel and phonetic classification tasks and show better performance than Gaussian mixture classifiers. We also present an analysis of the difficulties we foresee in applying SVMs to continuous speech recognition problems.

Journal ArticleDOI
TL;DR: The shapes and firing rates of motor unit action potentials (MUAPs) in an electromyographic signal provide an important source of information for the diagnosis of neuromuscular disorders and two different pattern recognition techniques are presented.
Abstract: The shapes and firing rates of motor unit action potentials (MUAPs) in an electromyographic (EMG) signal provide an important source of information for the diagnosis of neuromuscular disorders. In order to extract this information from EMG signals recorded at low to moderate force levels, it is required: i) to identify the MUAPs composing the EMG signal, ii) to classify MUAPs with similar shapes, and iii) to decompose the superimposed MUAP waveforms into their constituent MUAPs. For the classification of MUAPs, two different pattern recognition techniques are presented: i) an artificial neural network (ANN) technique based on unsupervised learning, using a modified version of the self-organizing feature maps (SOFM) algorithm and learning vector quantization (LVQ), and ii) a statistical pattern recognition technique based on the Euclidean distance. A total of 1213 MUAPs obtained from 12 normal subjects, 13 subjects suffering from myopathy, and 15 subjects suffering from motor neuron disease were analyzed. The success rate for the ANN technique was 97.6% and for the statistical technique 95.3%. For the decomposition of the superimposed waveforms, a technique is presented that uses cross-correlation for MUAP alignment and a combination of Euclidean distance and area measures to classify the decomposed waveforms. The success rate for the decomposition procedure was 90%.
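The Euclidean-distance classification step can be sketched as a nearest-prototype rule; the dict-of-prototypes representation is an assumption for illustration, not the paper's exact grouping procedure:

```python
import numpy as np

def classify_euclidean(waveform, prototypes):
    """Assign a MUAP waveform to the class whose prototype (e.g. the mean
    waveform of the class) is nearest in Euclidean distance. `prototypes`
    maps class label -> prototype vector; an illustrative sketch."""
    return min(prototypes, key=lambda c: np.linalg.norm(waveform - prototypes[c]))
```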

Journal ArticleDOI
TL;DR: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences and good results have been obtained in speaker-independent speech recognition.
Abstract: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences. The novelty is to associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node. Dynamic time warping is used to obtain time-normalized distances between sequences with different lengths. Starting with random initialization, ordered feature sequence maps then ensue, and Learning Vector Quantization can be used to fine tune the prototype sequences for optimal class separation. The resulting SOM models, the prototype sequences, can then be used for the recognition as well as synthesis of patterns. Good results have been obtained in speaker-independent speech recognition.
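The dynamic-time-warping distance used above to compare feature sequences of different lengths can be sketched as follows; scalar sequences and the absolute-difference local cost are simplifying assumptions (the paper's SOM nodes hold vector sequences):

```python
import math

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping: minimal alignment cost between two sequences
    of possibly different lengths, allowing elements to be stretched
    (matched to several elements of the other sequence)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because warping absorbs differences in speaking rate, a prototype sequence can match a faster or slower utterance of the same word at low cost.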

Journal ArticleDOI
Zhengyou Zhang1
TL;DR: Experiments show that facial expression recognition is mainly a low frequency process, and a spatial resolution of 64 pixels × 64 pixels is probably enough to represent the space of facial expressions.
Abstract: In this paper, we report our experiments on feature-based facial expression recognition within an architecture based on a two-layer perceptron. We investigate the use of two types of features extracted from face images: the geometric positions of a set of fiducial points on a face, and a set of multiscale and multiorientation Gabor wavelet coefficients at these points. They can be used either independently or jointly. The recognition performance with different types of features has been compared, which shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, we have also studied the desired number of hidden units, i.e. the appropriate dimension to represent a facial expression in order to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions. Then, we have investigated the importance of each individual fiducial point to facial expression recognition. Sensitivity analysis reveals that points on cheeks and on forehead carry little useful information. After discarding them, not only the computational efficiency increases, but also the generalization performance slightly improves. Finally, we have studied the significance of image scales. Experiments show that facial expression recognition is mainly a low frequency process, and a spatial resolution of 64 pixels × 64 pixels is probably enough.

Journal ArticleDOI
TL;DR: A new approach to computer-supported diagnosis of skin tumors in dermatology is presented, using neural networks with error back-propagation as the learning paradigm to optimize the classification performance of the neural classifiers.

Patent
Junji Kashioka1, Satoshi Naoi1
24 Jun 1999
TL;DR: In this article, a character pattern is extracted from image data read from a document, listing, etc., and discriminated between a hand-written character and a typed character by a hand written/typed character discrimination unit.
Abstract: A character pattern is extracted from image data read from a document, listing, etc., and discriminated between a hand-written character and a typed character by a hand-written/typed character discrimination unit. The hand-written/typed character discrimination unit obtains, from the character pattern, N feature vectors containing features indicating at least the complexity and the linearity of the character pattern, and discriminates the character pattern between a hand-written character and a typed character using the feature vectors. A character recognition unit performs a character recognizing process based on the result of discriminating whether the character data is a hand-written character or a typed character. As features of the above described character pattern, the variance of line widths, the variance of character positions, etc. can also be used.

Patent
Masayoshi Okamoto1
26 Apr 1999
TL;DR: In this paper, an imaginary stroke is used to link from the ending point of each actual stroke of an input handwritten character to the starting point of the subsequent actual stroke thereof to form a single line.
Abstract: In accordance with the present character recognition method, an imaginary stroke is used to link from the ending point of each actual stroke of an input handwritten character to the starting point of the subsequent actual stroke thereof to form a single line. Then a feature level is detected for specifying the position of a turn of the single line and the direction and angle of the turn at the position. According to the detected position of the turn, the detected feature level is patterned on input mesh memories which are in turn compared with a previously formed dictionary mesh memory to calculate the resemblance of the input handwritten character to each handwritten character in a dictionary database. The handwritten character in the dictionary database that has the closest, calculated resemblance is recognized as the input handwritten character. According to the present method, an imaginary stroke added to an input handwritten character also allows correct recognition of a character with each stroke written cursively.

Journal ArticleDOI
TL;DR: The self-organizing hierarchical optimal subspace learning and inference framework (SHOSLIF) system uses the theories of optimal linear projection for optimal feature derivation and a hierarchical structure to achieve logarithmic retrieval complexity.
Abstract: A self-organizing framework for object recognition is described. We describe a hierarchical database structure for image retrieval. The self-organizing hierarchical optimal subspace learning and inference framework (SHOSLIF) system uses the theories of optimal linear projection for optimal feature derivation and a hierarchical structure to achieve logarithmic retrieval complexity. A space-tessellation tree is generated using the most expressive features (MEF) and most discriminating features (MDF) at each level of the tree. The major characteristics of the analysis include: (1) avoiding the limitation of global linear features by deriving a recursively better-fitted set of features for each of the recursively subdivided sets of training samples; (2) generating a smaller tree whose cell boundaries separate the samples along the class boundaries better than the principal component analysis, thereby giving a better generalization capability (i.e., better recognition rate in a disjoint test); (3) accelerating the retrieval using a tree structure for data pruning, utilizing a different set of discriminant features at each level of the tree. We allow for perturbations in the size and position of objects in the images through learning. We demonstrate the technique on a large image database of widely varying real-world objects taken in natural settings, and show the applicability of the approach for variability in position, size, and 3D orientation. This paper concentrates on the hierarchical partitioning of the feature spaces.

Journal ArticleDOI
TL;DR: A new approach to combine multiple features in handwriting recognition based on two ideas: feature selection-based combination and class dependent features that are effective in separating pattern classes and the new feature vector derived from a combination of two types of such features further improves the recognition rate.
Abstract: In this paper, we propose a new approach to combine multiple features in handwriting recognition based on two ideas: feature selection-based combination and class dependent features. A nonparametric method is used for feature evaluation, and the first part of this paper is devoted to the evaluation of features in terms of their class separation and recognition capabilities. In the second part, multiple feature vectors are combined to produce a new feature vector. Based on the fact that a feature has different discriminating powers for different classes, a new scheme of selecting and combining class-dependent features is proposed. In this scheme, a class is considered to have its own optimal feature vector for discriminating itself from the other classes. Using an architecture of modular neural networks as the classifier, a series of experiments were conducted on unconstrained handwritten numerals. The results indicate that the selected features are effective in separating pattern classes and the new feature vector derived from a combination of two types of such features further improves the recognition rate.

Journal ArticleDOI
TL;DR: Performance benefits are demonstrated from incorporating a linear trajectory description, and additionally from modelling variability in the mid-point parameter; theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs are presented.

Journal ArticleDOI
TL;DR: Tested on five real-world databases, the MLP provides the highest classification accuracy at the cost of deforming the data structure, whereas the linear models preserve the structure but usually with inferior accuracy.

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Three procedures, based on the curvature coefficient, biquadratic interpolation and gradient vector interpolation, are proposed for calculating the curvatures of the equi-gray-scale curves of an input image.
Abstract: Studies the use of curvature in addition to the gradient of gray-scale character images in order to improve the accuracy of handwritten numeral recognition. Three procedures, based on the curvature coefficient, biquadratic interpolation and gradient vector interpolation, are proposed for calculating the curvature of the equi-gray-scale curves of an input image. The efficiency of the feature vector is tested by recognition experiments for the handwritten numeral database IPTP CDROM1, which is a ZIP code database provided by the Institute for Posts and Telecommunications Policy (IPTP). The experimental results show the usefulness of the curvature feature, and a recognition rate of 99.40%, which is the highest that has ever been reported for this database, is achieved.

Journal ArticleDOI
TL;DR: An artificial neural network-based model employing a pattern discrimination algorithm to recognise unnatural control chart patterns is presented; it achieves superior ARL performance while also accurately identifying the type of the unnatural pattern.

Journal ArticleDOI
TL;DR: An approach to constructing a kernel function which takes into account some domain knowledge about a problem and thus diminishes the number of noisy parameters in high dimensional feature space is suggested and its application to Texture Recognition is described.

Journal ArticleDOI
TL;DR: This paper has developed a view matching technique based on an eigenspace approximation to the generalized Hausdorff measure that achieves the compact storage and fast indexing that are the main advantages of eigenspace view matching techniques, while also being tolerant of partial occlusion and background clutter.
Abstract: View-based recognition methods, such as those using eigenspace techniques, have been successful for a number of recognition tasks. Such approaches, however, are somewhat limited in their ability to recognize objects that are partly hidden from view or occur against cluttered backgrounds. In order to address these limitations, we have developed a view matching technique based on an eigenspace approximation to the generalized Hausdorff measure. This method achieves compact storage and fast indexing that are the main advantages of eigenspace view matching techniques, while also being tolerant of partial occlusion and background clutter. The method applies to binary feature maps, such as intensity edges, rather than directly to intensity images.
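The occlusion tolerance comes from replacing the max in the directed Hausdorff distance with a rank (quantile) statistic; a minimal sketch, with the `frac` parameter and the 2-D point-set representation as assumptions for illustration:

```python
import numpy as np

def partial_hausdorff(A, B, frac=0.8):
    """Partial (rank-based) directed Hausdorff distance between two
    feature point sets: instead of the max over points of A of the
    distance to the nearest point of B, take the `frac`-quantile,
    tolerating a fraction of occluded/outlier points. A, B: (n, 2)."""
    # nearest-neighbour distance from every point of A to the set B
    d = np.min(np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2), axis=1)
    d_sorted = np.sort(d)
    k = max(0, int(np.ceil(frac * len(d))) - 1)   # index of the quantile
    return d_sorted[k]
```

With frac = 1.0 this reduces to the classical directed Hausdorff distance; lowering frac discounts the worst-matching points, e.g. edge pixels hidden by an occluder.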

Book ChapterDOI
01 Aug 1999
TL;DR: The first genetic programming variant weights data records for calculating the classification error and modifies the weights during the run, whereby the algorithm defines its own fitness function in an on-line fashion, giving higher weights to 'hard' records.
Abstract: In this paper we report the results of a comparative study on different variations of genetic programming applied to binary data classification problems. The first genetic programming variant weights data records for calculating the classification error and modifies the weights during the run. Thereby the algorithm defines its own fitness function in an on-line fashion, giving higher weights to 'hard' records. Another novel feature we study is the atomic representation, where 'Booleanization' of data is performed not at the root but at the leaves of the trees, and only Boolean functions are used in the trees' body. As a third aspect, we look at generational and steady-state models in combination with both features.

Journal ArticleDOI
TL;DR: The authors explored the integration of facial features and house parts to form holistic representations of complete objects and found that facial features used in the matching task contribute differentially to CPAs across varying probe delays but with a similar pattern to that found in the recognition task.
Abstract: We explore the integration of facial features and house parts to form holistic representations of complete objects. In Experiments 1, 2, and 3, we test for evidence of the holistic representation of houses and faces. We do so by testing for a complete-over-part probe advantage (CPA) in 2AFC recognition and matching tasks. We present evidence consistent with holistic features being represented for both types of stimuli. In Experiments 4 and 5, we examine the effect further with faces. Experiment 4 shows that facial features used in the matching task contribute differentially to CPAs across varying probe delays, but with a similar pattern to that found in the recognition task (Experiment 1). Experiment 5 shows that CPAs are mandatory and cannot be removed by precueing with the probe type or the name of the feature to be probed.

Journal ArticleDOI
TL;DR: The results suggest that recognition across feature variations is based on an averaging mechanism, whereas recognition across viewpoint variations is based on an approximation mechanism.
Abstract: The prototype effect in face recognition refers to a tendency to recognize the face corresponding to the central value of a series of seen faces, even when this central value or prototype has not been seen. Five experiments investigated the extension and limits of this phenomenon. In all the experiments, participants saw a series of faces, each one in two or more different versions or exemplars, and then performed a recognition test, including seen and unseen exemplars and the unseen prototype face. In Experiment 1, a strong prototype effect for variations in feature location was demonstrated in oldness ratings and in a standard old/new recognition test. Experiments 2A and 2B compared the prototype effect for variations in feature location and variations in head angle and showed that, for the latter, the prototype effect was weaker and more dependent on similarity than for the former. These results suggest that recognition across feature variations is based on an averaging mechanism, whereas recognition across viewpoint variations is based on an approximation mechanism. Experiments 3A and 3B examined the limits of the prototype effect using a face morphing technique that allows a systematic manipulation of face similarity. The results indicated that, as the similarity between face exemplars decreases to the level of similarity between the faces of different individuals, the prototype effect starts to disappear. At the same time, the prototype effect may originate false memories of faces that were never seen.

Journal ArticleDOI
TL;DR: This work focuses only on one aspect of pattern recognition, feature analysis, and discusses various methods using fuzzy logic, neural networks and genetic algorithms for feature ranking, selection and extraction including structure preserving dimensionality reduction.