Showing papers on "Linear classifier published in 1995"


Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
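
A minimal modern sketch of the soft-margin idea described above, using scikit-learn as a stand-in (the data set, kernel degree, and C value are illustrative, not the paper's original implementation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-group problem standing in for the OCR benchmark data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="poly" plays the role of the non-linear (polynomial) input
# transformation; C is the soft-margin penalty that handles
# non-separable training data.
clf = SVC(kernel="poly", degree=3, C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```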

37,861 citations


Patent
David Dolan Lewis
07 Jun 1995
TL;DR: In this paper, a supervised learning system and an annotation system are operated cooperatively to produce a classification vector which can be used to classify documents with respect to a defined class, where the degree of relevance annotation represents the degree to which the document belongs to the defined class.
Abstract: A method and apparatus for training a text classifier is disclosed. A supervised learning system and an annotation system are operated cooperatively to produce a classification vector which can be used to classify documents with respect to a defined class. The annotation system automatically annotates documents with a degree of relevance annotation to produce machine annotated data. The degree of relevance annotation represents the degree to which the document belongs to the defined class. This machine annotated data is used as input to the supervised learning system. In addition to the machine annotated data, the supervised learning system can also receive manually annotated data and/or a user request. The machine annotated data, along with the manually annotated data and/or the user request, are used by the supervised learning system to produce a classification vector. In one embodiment, the supervised learning system comprises a relevance feedback mechanism. The relevance feedback mechanism is operated cooperatively with the annotation system for multiple iterations until a classification vector of acceptable accuracy is produced. The classification vector produced by the invention is the result of a combination of supervised and unsupervised learning.
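
The patent describes its relevance feedback mechanism only abstractly; the sketch below uses the classical Rocchio update as one plausible instance (the weights alpha, beta, gamma and the data are illustrative) to show how iterated feedback over annotated documents can yield a classification vector:

```python
import numpy as np

def rocchio_update(w, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Move the classification vector toward the mean of relevant document
    # vectors and away from the mean of non-relevant ones.
    w = alpha * w
    if len(relevant):
        w = w + beta * relevant.mean(axis=0)
    if len(nonrelevant):
        w = w - gamma * nonrelevant.mean(axis=0)
    return w

rng = np.random.default_rng(0)
docs = rng.random((10, 50))          # term-weight vectors for 10 documents
relevance = rng.random(10) > 0.5     # thresholded degree-of-relevance annotations
w = np.zeros(50)
for _ in range(5):                   # iterate until accuracy is acceptable
    w = rocchio_update(w, docs[relevance], docs[~relevance])
scores = docs @ w                    # classify documents by score against the vector
```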

164 citations


Proceedings Article
J. Bala, J. Huang, H. Vafaie, K. De Jong, Harry Wechsler
20 Aug 1995
TL;DR: A hybrid learning methodology that integrates genetic algorithms (GAs) and decision tree learning (ID3) in order to evolve optimal subsets of discriminatory features for robust pattern classification is introduced.
Abstract: This paper introduces a hybrid learning methodology that integrates genetic algorithms (GAs) and decision tree learning (ID3) in order to evolve optimal subsets of discriminatory features for robust pattern classification. A GA is used to search the space of all possible subsets of a large set of candidate discrimination features. For a given feature subset, ID3 is invoked to produce a decision tree. The classification performance of the decision tree on unseen data is used as a measure of fitness for the given feature set, which, in turn, is used by the GA to evolve better feature sets. This GA-ID3 process iterates until a feature subset is found with satisfactory classification performance. Experimental results are presented which illustrate the feasibility of our approach on difficult problems involving recognizing visual concepts in satellite and facial image data. The results also show improved classification performance and reduced description complexity when compared against standard methods for feature selection.
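
A compact sketch of the wrapper loop under stated assumptions: binary masks encode feature subsets, each mask's fitness is the cross-validated accuracy of a decision tree trained on the selected features, and scikit-learn's CART stands in for ID3 (population size, crossover scheme, and mutation rate are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=30, n_informative=5, random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    # Accuracy of a tree built on the selected features measures the subset.
    if not mask.any():
        return 0.0
    tree = DecisionTreeClassifier(random_state=0)
    return cross_val_score(tree, X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.5          # random initial feature subsets
for _ in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]       # keep the fittest subsets
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        child ^= rng.random(X.shape[1]) < 0.02      # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[int(np.argmax([fitness(m) for m in pop]))]
print("selected features:", np.flatnonzero(best))
```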

139 citations


Journal ArticleDOI
TL;DR: This work shows that, in the authors' pattern classification problem, using a feature selection step reduced the number of features used, reduced the processing time requirements, and gave results comparable to the full set of features.
Abstract: In pattern classification problems, the choice of variables to include in the feature vector is a difficult one. The authors have investigated the use of stepwise discriminant analysis as a feature selection step in the problem of segmenting digital chest radiographs. In this problem, locally calculated features are used to classify pixels into one of several anatomic classes. The feature selection step was used to choose a subset of features which gave performance equivalent to the entire set of candidate features, while utilizing less computational resources. The impact of using the reduced/selected feature set on classifier performance is evaluated for two classifiers: a linear discriminator and a neural network. The results from the reduced/selected feature set were compared to that of the full feature set as well as a randomly selected reduced feature set. The results of the different feature sets were also compared after applying an additional postprocessing step which used a rule-based spatial information heuristic to improve the classification results. This work shows that, in the authors' pattern classification problem, using a feature selection step reduced the number of features used, reduced the processing time requirements, and gave results comparable to the full set of features.
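
A rough sketch of the feature-selection step, using greedy forward selection with a linear discriminant as a modern stand-in for stepwise discriminant analysis (the classical procedure adds and removes variables via Wilks' lambda F-tests; the data and subset size here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

# Synthetic stand-in for the locally calculated pixel features.
X, y = make_classification(n_samples=400, n_features=25, random_state=0)

lda = LinearDiscriminantAnalysis()
selector = SequentialFeatureSelector(lda, n_features_to_select=8, direction="forward")
selector.fit(X, y)
print("kept features:", selector.get_support(indices=True))
```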

134 citations


Book ChapterDOI
03 Oct 1995
TL;DR: This paper is concerned with the problem of characterization of classification algorithms and involves generation of different kinds of models, which include regression and rule models, piecewise linear models (model trees), and instance-based models.
Abstract: This paper is concerned with the problem of characterization of classification algorithms. The aim is to determine under what circumstances a particular classification algorithm is applicable. The method used involves generation of different kinds of models. These include regression and rule models, piecewise linear models (model trees), and instance-based models. These are generated automatically on the basis of dataset characteristics and given test results. The lack of data is compensated for by various types of preprocessing. The models obtained are characterized by quantifying their predictive capability, and the best models are identified.

129 citations


01 Jan 1995
TL;DR: It is shown that there exists a good linear classifier that is better than the Nearest Mean classifier for sample sizes for which Fisher's linear discriminant cannot be used.
Abstract: The generalization of linear classifiers is considered for training sample sizes smaller than the feature dimensionality. It is shown that there exists a good linear classifier that is better than the Nearest Mean classifier for sample sizes for which Fisher's linear discriminant cannot be used. The use and performance of this small sample size classifier is illustrated by some examples.
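
One standard construction of such a small-sample linear classifier is the pseudo-Fisher discriminant, assumed here for illustration: with fewer training samples than features the pooled scatter matrix is singular, so the Moore-Penrose pseudo-inverse replaces the ordinary inverse; the Nearest Mean classifier corresponds to replacing the scatter matrix by the identity.

```python
import numpy as np

def pseudo_fisher(X0, X1):
    # Fisher direction w = S_w^{-1} (m1 - m0), with pinv standing in for
    # the inverse because S_w is singular when n_samples < n_features.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    b = -w @ (m0 + m1) / 2        # threshold midway between the projected means
    return w, b

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(10, 50))   # 10 samples, 50 features: n < d
X1 = rng.normal(0.5, 1.0, size=(10, 50))
w, b = pseudo_fisher(X0, X1)
predict = lambda X: (X @ w + b > 0).astype(int)
```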

67 citations


Journal ArticleDOI
TL;DR: A procedure is proposed for processing images and detecting objects in a sliding-window mode that allows for an efficient realization in terms of both computational complexity and detection quality.
Abstract: A procedure is proposed for processing images and detecting objects in a sliding-window mode that allows for an efficient realization in terms of both computational complexity and detection quality. The main stages of data transformation are reported: preliminary image processing, recursive calculation of local features, generation of the field of values of the discriminant function, and object localization. The parameter set-up algorithm is developed and realized as the training of a linear classifier. An example is presented that shows the efficiency of the developed detection procedure.
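
A toy sketch of the sliding-window stage, assuming raw pixel values as the local features and an illustrative window size and threshold (the paper's preliminary processing and recursive feature calculation are not reproduced):

```python
import numpy as np

def detection_field(image, w, b, win=8):
    # Evaluate the linear discriminant at every window position to build
    # the field of discriminant-function values.
    H, W = image.shape
    field = np.empty((H - win + 1, W - win + 1))
    for i in range(field.shape[0]):
        for j in range(field.shape[1]):
            patch = image[i:i + win, j:j + win].ravel()  # local features
            field[i, j] = w @ patch + b                  # discriminant value
    return field

rng = np.random.default_rng(0)
image = rng.random((64, 64))
w, b = rng.random(64), -30.0          # illustrative trained classifier
field = detection_field(image, w, b)
detections = np.argwhere(field > 0)   # thresholded object localization
```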

29 citations


01 May 1995
TL;DR: In this article, the classification accuracy of three neural network classifiers on a satellite image-based pattern classification problem was evaluated, and the performance of the classifiers was analyzed for generalization capability and stability of results.
Abstract: This paper evaluates the classification accuracy of three neural network classifiers on a satellite image-based pattern classification problem. The neural network classifiers used include two types of the Multi-Layer-Perceptron (MLP) and the Radial Basis Function Network. A normal (conventional) classifier is used as a benchmark to evaluate the performance of neural network classifiers. The satellite image consists of 2,460 pixels selected from a section (270 x 360) of a Landsat-5 TM scene from the city of Vienna and its northern surroundings. In addition to evaluation of classification accuracy, the neural classifiers are analysed for generalization capability and stability of results. Best overall results (in terms of accuracy and convergence time) are provided by the MLP-1 classifier with weight elimination. It has a small number of parameters and requires no problem-specific system of initial weight values. Its in-sample classification error is 7.87% and its out-of-sample classification error is 10.24% for the problem at hand. Four classes of simulations serve to illustrate the properties of the classifier in general and the stability of the result with respect to control parameters, and on the training time, the gradient descent control term, initial parameter conditions, and different training and testing sets.
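
The abstract does not spell out the penalty; the sketch below assumes the standard weight-elimination term of Weigend et al., added to the classification loss so that gradient descent drives superfluous weights toward zero:

```python
import numpy as np

def weight_elimination_penalty(weights, w0=1.0, lam=1e-3):
    # Penalty r/(1+r) with r = (w/w0)^2: near-quadratic for small weights,
    # saturating for large ones, so small weights are pruned away.
    r = (weights / w0) ** 2
    return lam * np.sum(r / (1.0 + r))

def penalty_grad(weights, w0=1.0, lam=1e-3):
    # Gradient of the penalty, to be added to the loss gradient during
    # each gradient-descent update.
    r = (weights / w0) ** 2
    return lam * (2 * weights / w0**2) / (1.0 + r) ** 2
```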

29 citations


Proceedings ArticleDOI
14 Aug 1995
TL;DR: This paper presents a signature verification system based on 84 personally selected parameters: an initial parameter set is first reduced globally by statistical methods, and a personalized selection is then performed on the reduced set using several norms and a linear classifier.
Abstract: This paper presents a signature verification system based on 84 personally selected parameters. An initial set of 300 parameters is first reduced globally by statistical methods. A personalized parameter selection is then performed on the reduced parameter set using several norms and a linear classifier. The differing personalized parameter sets of individual signers confirm that parameter selection should be personalized. Verification results for personalized parameters are given for different norms and compared to results achieved with a function-based system. A method for combining both approaches, and the results achieved with this extended system, are presented as well.

26 citations


Journal ArticleDOI
TL;DR: This paper evaluates the classification accuracy of three neural network classifiers on a satellite image-based pattern classification problem and finds best overall results (in terms of accuracy and convergence time) are provided by the MLP-1 classifier with weight elimination.
Abstract: This paper evaluates the classification accuracy of three neural network classifiers on a satellite image-based pattern classification problem. The neural network classifiers used include two types of the Multi-Layer-Perceptron (MLP) and the Radial Basis Function Network. A normal (conventional) classifier is used as a benchmark to evaluate the performance of neural network classifiers. The satellite image consists of 2,460 pixels selected from a section (270 x 360) of a Landsat-5 TM scene from the city of Vienna and its northern surroundings. In addition to evaluation of classification accuracy, the neural classifiers are analysed for generalization capability and stability of results. Best overall results (in terms of accuracy and convergence time) are provided by the MLP-1 classifier with weight elimination. It has a small number of parameters and requires no problem-specific system of initial weight values. Its in-sample classification error is 7.87% and its out-of-sample classification error is 10.24% for the problem at hand. Four classes of simulations serve to illustrate the properties of the classifier in general and the stability of the result with respect to control parameters, and on the training time, the gradient descent control term, initial parameter conditions, and different training and testing sets.

24 citations


Proceedings ArticleDOI
05 Aug 1995
TL;DR: Results from classifier trials show that object classification using the hybrid classifier can be done as accurately as using the minimum-distance classifier, but at lower computational expense.
Abstract: We address the problem of autonomously classifying objects from the sounds they make when struck, and present results from different attempts to classify various items. We extract the two most significant spikes in the frequency domain as features, and show that accurate object classification based on these features is possible. Two techniques are discussed: a minimum-distance classifier and a hybrid minimum-distance/decision-tree classifier. Results from classifier trials show that object classification using the hybrid classifier can be done as accurately as using the minimum-distance classifier, but at lower computational expense.
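
A minimal sketch of the described pipeline on synthetic signals: the two strongest spectral peaks serve as a 2-D feature, and an unknown sound is assigned to the class with the nearest mean feature vector (the signals and class means here are illustrative):

```python
import numpy as np

def two_peak_features(signal, rate):
    # Extract the two most significant spikes in the frequency domain.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / rate)
    top2 = np.argsort(spectrum)[-2:]
    return np.sort(freqs[top2])

def min_distance_classify(x, class_means):
    # Minimum-distance classifier: nearest class mean wins.
    d = [np.linalg.norm(x - m) for m in class_means]
    return int(np.argmin(d))

rate = 8000
t = np.arange(0, 0.1, 1.0 / rate)
ping = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(two_peak_features(ping, rate))   # approximately [440., 1200.]
```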

Journal ArticleDOI
01 Jan 1995
TL;DR: It is shown that minimizing the window criterion function yields a linear classifier that minimizes the probability of misclassification (i.e., the "error rate"), but window training may produce a local minimum that exceeds the global minimum error rate.
Abstract: Window training, based on an extended form of stochastic approximation, offers a means of producing linear classifiers that minimize the probability of misclassification of statistically generated data. Associated with window training is a window criterion function. We show that minimizing the window criterion function yields a linear classifier that minimizes the probability of misclassification (i.e., the "error rate"). However window training may produce a local minimum that exceeds the global minimum error rate. We show that this defect does not occur in the error-correcting perceptron. The criterion minimized by that training procedure is "convex"; i.e., the perceptron criterion has only one local minimum. Consequently we recommend that window training be preceded by perceptron training, the perceptron training producing a decision surface which the window training process will move to a position that is likely to be globally optimum.
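
For reference, the error-correcting perceptron update whose criterion is convex, recommended above as the starting point before window training (a toy NumPy version assuming labels in {-1, +1} and a bias column appended to X):

```python
import numpy as np

def perceptron_train(X, y, epochs=100, lr=1.0):
    # Error-correcting rule: update the weight vector only on
    # misclassified samples, moving the decision surface toward them.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (w @ x) <= 0:      # misclassified: correct the error
                w += lr * t * x
    return w
```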

Proceedings ArticleDOI
01 Jan 1995
TL;DR: This work applies a minimum classification error training algorithm for simultaneous design of feature extractor and pattern classifier, and demonstrates some of its properties and advantages.
Abstract: Recently, a minimum classification error training algorithm has been proposed for minimizing the misclassification probability based on a given set of training samples using a generalized probabilistic descent method. This algorithm is a type of discriminative learning algorithm, but it approaches the objective of minimum classification error in a more direct manner than the conventional discriminative training algorithms. We apply this algorithm for simultaneous design of the feature extractor and pattern classifier, and demonstrate some of its properties and advantages.

Journal ArticleDOI
TL;DR: A scheme for unsupervised probabilistic time series classification is detailed that utilizes autocorrelation terms as discriminatory features and employs the Volterra Connectionist Model to transform the multi-dimensional feature information of each training vector to a one-dimensional classification space.

Proceedings ArticleDOI
09 May 1995
TL;DR: An algorithm for classification task dependent multiscale feature extraction that focuses on dimensionality reduction of the feature space subject to maximum preservation of classification information is suggested.
Abstract: An algorithm for classification-task-dependent multiscale feature extraction is suggested. The algorithm focuses on dimensionality reduction of the feature space subject to maximum preservation of classification information. It has been shown that, for classification tasks, class-separability-based features are appropriate alternatives to features selected based on energy and entropy criteria. Application of this idea to feature extraction from multiscale wavelet packets is presented. At each level of decomposition an optimal linear transform that preserves class separabilities and results in a reduced-dimensional feature space is obtained. Classification and feature extraction are performed at each scale, and the resulting "soft decisions" are integrated across scales. The suggested scheme can also be applied to other orthogonal or non-orthogonal multiscale transforms, e.g. the local cosine transform or the Gabor transform. The suggested algorithm has been tested on classification and segmentation of some radar target signatures as well as textured and document images.

Proceedings ArticleDOI
09 May 1995
TL;DR: This builds on previous work but introduces new techniques which are used to exploit the acoustic and phonetic differences between the languages in the OGI Multi-language Telephone Speech Corpus.
Abstract: Language identification experiments have been carried out on language pairs taken from seven of the languages in the OGI Multi-language Telephone Speech Corpus. This builds on previous work but introduces new techniques which are used to exploit the acoustic and phonetic differences between the languages. Subword hidden Markov models for the pair of languages are matched to unknown utterances, resulting in three measures: the acoustic match, the phoneme frequencies and frequency histograms. Each of these measures gives 80 to 90% accuracy in discriminating language pairs. However, these multiple knowledge sources are also combined to give improved results. Majority decision, logistic regression and a linear classifier were compared as data fusion techniques. The linear classifier performed the best, giving an average accuracy of 89 to 93% on the pairs from the seven languages.

Proceedings ArticleDOI
27 Nov 1995
TL;DR: This paper reports the application of Kohonen's self-organising map (SOM) network to the classification of lithology from well log data, using the learning vector quantization (LVQ) algorithm to train the network under supervised learning.
Abstract: This paper reports the application of Kohonen's self-organising map (SOM) network to the classification of lithology from well log data. The well log data are classified into nodes according to a pre-defined grid arrangement. The learning vector quantization (LVQ) algorithm is then applied to train the network under supervised learning. After the network is trained, it is used as the classification model for subsequent data. Results obtained from example studies show the proposed method to be fast and accurate.
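
A sketch of the supervised fine-tuning step, assuming the basic LVQ1 variant of the update (the learning rate is illustrative): the winning prototype moves toward a sample of its own class and away from samples of other classes:

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, label, lr=0.05):
    # Find the nearest codebook vector and pull it toward (or push it
    # away from) the sample depending on whether the class labels match.
    winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    sign = 1.0 if proto_labels[winner] == label else -1.0
    prototypes[winner] += sign * lr * (x - prototypes[winner])
    return prototypes
```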

Proceedings ArticleDOI
29 Nov 1995
TL;DR: A simple model of a learning machine that evolves is proposed: a perceptron-like learning machine obtains a proper set of feature-detecting cells through mating, mutation, and natural selection.
Abstract: We propose a simple model of a learning machine that evolves. When a classification problem is given, a perceptron-like learning machine obtains a proper set of feature-detecting cells through mating, mutation, and natural selection. Computer simulations showed the expected results. This is one of our attempts to approach evolutionary systems in the real world.

Proceedings Article
27 Nov 1995
TL;DR: A new learning algorithm is developed for the design of statistical classifiers minimizing the rate of misclassification and is demonstrated to substantially outperform other design methods on several benchmark examples, while often retaining design complexity comparable to, or only moderately greater than that of strict descent-based methods.
Abstract: A new learning algorithm is developed for the design of statistical classifiers minimizing the rate of misclassification. The method, which is based on ideas from information theory and analogies to statistical physics, assigns data to classes in probability. The distributions are chosen to minimize the expected classification error while simultaneously enforcing the classifier's structure and a level of "randomness" measured by Shannon's entropy. Achievement of the classifier structure is quantified by an associated cost. The constrained optimization problem is equivalent to the minimization of a Helmholtz free energy, and the resulting optimization method is a basic extension of the deterministic annealing algorithm that explicitly enforces structural constraints on assignments while reducing the entropy and expected cost with temperature. In the limit of low temperature, the error rate is minimized directly and a hard classifier with the requisite structure is obtained. This learning algorithm can be used to design a variety of classifier structures. The approach is compared with standard methods for radial basis function design and is demonstrated to substantially outperform other design methods on several benchmark examples, while often retaining design complexity comparable to, or only moderately greater than that of strict descent-based methods.
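
A small sketch of the core annealing mechanism under simplifying assumptions: class assignments are Gibbs distributions over per-class costs, and lowering the temperature T hardens them toward minimum-cost assignments; the structural constraints and free-energy optimization of the full method are not modeled here.

```python
import numpy as np

def soft_assignments(costs, T):
    # costs: (n_samples, n_classes) classification costs.
    # Gibbs distribution over classes; entropy shrinks as T decreases.
    logits = -costs / T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# As T -> 0 the distribution concentrates on the minimum-cost class,
# recovering a hard classifier.
costs = np.array([[0.2, 0.5], [0.9, 0.1]])
for T in (1.0, 0.1, 0.01):
    print(T, soft_assignments(costs, T).round(3))
```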

Proceedings ArticleDOI
27 Nov 1995
TL;DR: This paper extends the work of Zhang and Giardino (1992) and reports on the use of machine learning classifiers to obtain the minimum sample size for ground-based data surveys, comparing the magnitude of sample sizes required for backpropagation neural networks (NN) and instance-based learning (IBL) to reach the same classification accuracy on unseen data.
Abstract: Environmental scientists prefer to construct spatial information system (SIS) decision support from the smallest possible data set. This is due to the considerable cost of ground-based surveys for data collection. This paper extends the work of Zhang and Giardino (1992) and Eklund et al. (1994) and reports on the use of machine learning classifiers to obtain the minimum sample size for ground-based data surveys. The study proposes a method to assess ground-based data collection using machine learning classifiers. In this domain, the inductive learning program C4.5 was used to verify that a high-performance classifier, with better than 95% classification accuracy on unseen data, can be constructed using 235 sample points in the study area. We compare this result to the magnitude of sample sizes required for backpropagation neural networks (NN) and instance-based learning (IBL) to reach the same classification accuracy on unseen data. We examine the reasons for, and implications of, these variations in classification accuracy in this domain.

Proceedings Article
27 Nov 1995
TL;DR: In this paper, a neural network classifier for the I1000 chip is described, which optically reads the E13B font characters at the bottom of checks, and the weights of the classifier are found using the active set method, similar to Vapnik's separating hyperplane algorithm.
Abstract: This paper describes a neural network classifier for the I1000 chip, which optically reads the E13B font characters at the bottom of checks. The first layer of the neural network is a hardware linear classifier which recognizes the characters in this font. A second software neural layer is implemented on an inexpensive microprocessor to clean up the results of the first layer. The hardware linear classifier is mathematically specified using constraints and an optimization principle. The weights of the classifier are found using the active set method, similar to Vapnik's separating hyperplane algorithm. In 7.5 minutes of SPARC 2 time, the method solves for 1523 Lagrange multipliers, which is equivalent to training on a data set of approximately 128,000 examples. The resulting network performs quite well: when tested on a test set of 1500 real checks, it has a 99.995% character accuracy rate.

Proceedings ArticleDOI
10 Sep 1995
TL;DR: A strategy for neural network training that combines supervised competitive and gradient descent learning is applied to classify high-resolution electrograms, giving an increase in classification accuracy of about 10%.
Abstract: A strategy for neural network training is described. It combines supervised competitive and gradient descent learning. This algorithm is then applied to classify high-resolution electrograms. The combined approach gives an increase in classification accuracy of about 10%. Nevertheless, the results show that more elaborate feature extraction methods have to be considered.

Journal ArticleDOI
TL;DR: The cooperative learning neural network classifies the remote sensing data more accurately than methods using a single-step multi-layer neural network, a maximum likelihood classifier, or fuzzy set reasoning.
Abstract: The maximum likelihood classifier that is often used in the classification of satellite images assumes the distribution of each class to be Gaussian. Such a linear classifier can classify correctly when the classification probabilities of the classes are mutually exclusive. Remotely sensed data, however, often belong to several classes and are not linearly separable. To improve the classification accuracy of non-linearly separable data, the application of single-step multi-layer back-propagation neural networks has been studied by many researchers. In this paper, multi-step multi-layer neural networks, so-called cooperative learning neural networks, are proposed to classify non-linearly separable satellite data. The cooperative learning neural network consists of extraction networks for each class and a unification network which unifies the extracted values. The unification network is also used for the unification of different environments such as time-series data or neighboring regions. The result of classifying LANDSAT TM data of Nagoya city using the cooperative learning neural network is presented. The classified image is compared with detailed digital land cover information (TDT-112) and with the images classified using a single-step multi-layer neural network, a maximum likelihood classifier, and fuzzy set reasoning. The comparison shows that the cooperative learning neural network classifies the remote sensing data more accurately than the other methods.

Proceedings ArticleDOI
27 Nov 1995
TL;DR: This paper shows that the convex regions induced by a decision tree with linear decision function cannot be represented by linear membership functions as suggested in the literature, and derives explicit expressions for the membership functions of these subregions.
Abstract: In this paper we show that the convex regions induced by a decision tree with linear decision function cannot be represented by linear membership functions as suggested in the literature. It appears that a faithful representation is only possible for subregions. We derive explicit expressions for the membership functions of these subregions. This approximation can be used to initialise a one-hidden-layer neural net.

Journal ArticleDOI
TL;DR: Simple examples clearly demonstrate that highly consistent data lead to solution nonattainability in neural networks utilizing a logistic sigmoid function.
Abstract: Simple examples clearly demonstrate that highly consistent data lead to solution nonattainability in neural networks utilizing a logistic sigmoid function. Solution attainability requires a high degree of inconsistency. Bounds are obtained on the optimal value of the mean-square error of a one-layer neural network, in terms of the minimum number of misclassifications obtained from three linear classification problems, and conditions are given that imply solution attainability and nonattainability.

Journal ArticleDOI
TL;DR: Using leave-one-out cross-validation, the prediction accuracy of the model is found to be about the same as that of the expert, which allows application of the classification model to a large class of quality improvement studies.