
Showing papers on "Linear classifier published in 2004"


Proceedings ArticleDOI
04 Jul 2004
TL;DR: This paper proposes to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs, and demonstrates the versatility and effectiveness of the method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.
Abstract: Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs such as multiple dependent output variables and structured output spaces. We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem. We demonstrate the versatility and effectiveness of our method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.
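The core idea is a joint feature map Ψ(x, y) over input-output pairs, with prediction by maximizing ⟨w, Ψ(x, y)⟩ over outputs. Below is a minimal sketch of that idea for the plain multiclass case; it uses a structured-perceptron update rather than the paper's cutting-plane QP solver, and all data, dimensions, and hyperparameters are illustrative.

```python
import numpy as np

def joint_features(x, y, n_classes):
    """Psi(x, y): stack x into the weight block corresponding to class y."""
    psi = np.zeros(n_classes * x.shape[0])
    psi[y * x.shape[0]:(y + 1) * x.shape[0]] = x
    return psi

def predict(w, x, n_classes):
    """Inference: argmax over outputs of <w, Psi(x, y)>."""
    scores = [w @ joint_features(x, y, n_classes) for y in range(n_classes)]
    return int(np.argmax(scores))

def train(X, Y, n_classes, epochs=20):
    """Structured-perceptron training on the joint representation."""
    w = np.zeros(n_classes * X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            y_hat = predict(w, x, n_classes)
            if y_hat != y:  # move toward the correct output, away from the wrong one
                w += joint_features(x, y, n_classes) - joint_features(x, y_hat, n_classes)
    return w

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)  # labels in {0, 1, 2}
w = train(X, Y, n_classes=3)
```

For structured outputs such as parse trees or alignments, the same template applies, but the argmax in `predict` is replaced by a combinatorial inference procedure over the output space.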

1,446 citations


Proceedings ArticleDOI
Tong Zhang
04 Jul 2004
TL;DR: Stochastic gradient descent algorithms on regularized forms of linear prediction methods, related to online algorithms such as the perceptron, are studied, and numerical rates of convergence for such algorithms are obtained.
Abstract: Linear prediction methods, such as least squares for regression, and logistic regression and support vector machines for classification, have been extensively used in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as the perceptron, is both efficient and very simple to implement. We obtain numerical rates of convergence for such algorithms, and discuss their implications. Experiments on text data are provided to demonstrate the numerical and statistical consequences of our theoretical findings.
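As a concrete instance of the class of methods studied, here is a minimal sketch of SGD on the L2-regularized logistic loss; the decaying step-size schedule and all constants are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def sgd_logistic(X, y, lam=1e-3, eta0=0.5, epochs=10):
    """SGD on the L2-regularized logistic loss; y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)   # decaying step size
            margin = y[i] * (w @ X[i])
            # gradient of log(1 + exp(-margin)) plus the regularizer
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w -= eta * grad
    return w

# toy usage on separable-ish synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=200))
w = sgd_logistic(X, y)
```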

1,182 citations


Journal ArticleDOI
TL;DR: Both the SVM and LS-SVM classifier with RBF kernel in combination with standard cross-validation procedures for hyperparameter selection achieve comparable test set performances, consistently very good when compared to a variety of methods described in the literature.
Abstract: In Support Vector Machines (SVMs), the solution of the classification problem is characterized by a (convex) quadratic programming (QP) problem. In a modified version of SVMs, called Least Squares SVM classifiers (LS-SVMs), a least squares cost function is proposed so as to obtain a linear set of equations in the dual space. While the SVM classifier has a large margin interpretation, the LS-SVM formulation is related in this paper to a ridge regression approach for classification with binary targets and to Fisher's linear discriminant analysis in the feature space. Multiclass categorization problems are represented by a set of binary classifiers using different output coding schemes. While regularization is used to control the effective number of parameters of the LS-SVM classifier, the sparseness property of SVMs is lost due to the choice of the 2-norm. Sparseness can be imposed in a second stage by gradually pruning the support value spectrum and optimizing the hyperparameters during the sparse approximation procedure. In this paper, twenty public domain benchmark datasets are used to evaluate the test set performance of LS-SVM classifiers with linear, polynomial and radial basis function (RBF) kernels. Both the SVM and LS-SVM classifier with RBF kernel in combination with standard cross-validation procedures for hyperparameter selection achieve comparable test set performances. These SVM and LS-SVM performances are consistently very good when compared to a variety of methods described in the literature including decision tree based algorithms, statistical algorithms and instance based learning methods. We show on ten UCI datasets that the LS-SVM sparse approximation procedure can be successfully applied.
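The practical appeal of LS-SVMs is that training reduces to a single linear solve. A minimal sketch of that KKT system, following the standard Suykens-style formulation with an RBF kernel (the kernel width and regularization constant are arbitrary illustrations):

```python
import numpy as np

def rbf(X1, X2, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sample sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM KKT system: one linear solve instead of a QP.
    Labels y must be in {-1, +1}."""
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma   # 2-norm cost: every alpha is nonzero
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                   # bias b, support values alpha

def lssvm_predict(X, y, alpha, b, Xtest, sigma=1.0):
    return np.sign(rbf(Xtest, X, sigma) @ (alpha * y) + b)

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + X[:, 1])
b, alpha = lssvm_train(X, y)
print(lssvm_predict(X, y, alpha, b, X)[:5], y[:5])
```

Because of the 2-norm cost, every training point receives a nonzero support value, which is exactly the loss of sparseness the abstract notes; pruning the smallest |alpha| values is the sparse approximation step described.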

698 citations


Proceedings ArticleDOI
B. Froba, A. Ernst
17 May 2004
TL;DR: Illumination invariant local structure features for object detection, computed with a modified census transform that enhances the original work of Zabih and Woodfill, are introduced, together with an efficient four-stage classifier for rapid detection.
Abstract: Illumination variation is a big problem in object recognition, which usually requires a costly compensation prior to classification. It would be desirable to have an image-to-image transform which uncovers only the structure of an object for efficient matching. In this context the contribution of our work is two-fold. First, we introduce illumination invariant local structure features for object detection. For an efficient computation we propose a modified census transform which enhances the original work of Zabih and Woodfill. We show some shortcomings and how to get over them with the modified version. Secondly, we introduce an efficient four-stage classifier for rapid detection. Each single stage classifier is a linear classifier, which consists of a set of feature lookup-tables. We show that the first stage, which evaluates only 20 features, filters out more than 99% of all background positions. Thus, the classifier structure is much simpler than previously described multi-stage approaches, while having similar capabilities. The combination of illumination invariant features together with a simple classifier leads to a real-time system on standard computers (60 ms, image size: 288×384, 2 GHz Pentium). Detection results are presented on two commonly used databases in this field, namely the MIT+CMU set of 130 images and the BioID set of 1526 images. We achieve detection rates of more than 90% with a very low false positive rate of 10^-7 %. We also provide a demo program that can be found on the Internet at http://www.iis.fraunhofer.de/bv/biometrie/download/.
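For intuition, a small sketch of a modified census transform in the spirit described: each pixel of a 3×3 neighbourhood is compared to the neighbourhood mean (rather than to the centre pixel, as in Zabih and Woodfill's original transform), yielding a 9-bit structure index per position. Details of the authors' exact implementation may differ.

```python
import numpy as np

def modified_census_transform(img):
    """9-bit modified census transform of a grayscale image.
    Each of the 9 pixels in a 3x3 window is compared against the window
    mean; the resulting bits form an index in [0, 511]."""
    img = img.astype(np.float64)
    H, W = img.shape
    out = np.zeros((H - 2, W - 2), dtype=np.uint16)
    # the nine shifted views of the image, one per window position
    shifts = [img[r:r + H - 2, c:c + W - 2] for r in range(3) for c in range(3)]
    mean = sum(shifts) / 9.0
    for bit, s in enumerate(shifts):
        out |= (s > mean).astype(np.uint16) << bit
    return out

# toy usage: the transform cancels additive/multiplicative illumination shifts
img = np.arange(100, dtype=np.float64).reshape(10, 10)
assert np.array_equal(modified_census_transform(img),
                      modified_census_transform(2 * img + 30))
```

Each stage of the cascade can then be a linear classifier over lookup tables indexed by these 9-bit codes, which is what makes the detector so cheap at run time.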

534 citations


Proceedings ArticleDOI
Michael Gamon
23 Aug 2004
TL;DR: It is demonstrated that it is possible to perform automatic sentiment classification in the very noisy domain of customer feedback data by using large feature vectors in combination with feature reduction and the addition of deep linguistic analysis features to a set of surface level word n-gram features contributes consistently to classification accuracy.
Abstract: We demonstrate that it is possible to perform automatic sentiment classification in the very noisy domain of customer feedback data. We show that by using large feature vectors in combination with feature reduction, we can train linear support vector machines that achieve high classification accuracy on data that present classification challenges even for a human annotator. We also show that, surprisingly, the addition of deep linguistic analysis features to a set of surface level word n-gram features contributes consistently to classification accuracy in this domain.
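A hedged sketch of the surface-level part of such a system, using scikit-learn as a stand-in toolchain; the toy corpus, the chi-squared selector, and all parameters are illustrative, not the paper's setup (which also adds deep linguistic analysis features).

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# hypothetical toy corpus; real experiments would use customer feedback data
docs = ["great product, works well", "terrible support, very disappointed",
        "love it", "waste of money", "good value", "awful experience"]
labels = [1, 0, 1, 0, 1, 0]

clf = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 3), binary=True)),  # large surface n-gram vectors
    ("select", SelectKBest(chi2, k=10)),                           # feature reduction
    ("svm", LinearSVC(C=1.0)),                                     # linear SVM classifier
])
clf.fit(docs, labels)
print(clf.predict(["very disappointed with this product"]))
```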

490 citations


Proceedings ArticleDOI
04 Jul 2004
TL;DR: This paper introduces a margin based feature selection criterion and applies it to measure the quality of sets of features, devising novel selection algorithms for multi-class classification problems and providing a theoretical generalization bound.
Abstract: Feature selection is the task of choosing a small set out of a given set of features that capture the relevant properties of the data. In the context of supervised classification problems the relevance is determined by the given labels on the training data. A good choice of features is a key for building compact and accurate classifiers. In this paper we introduce a margin based feature selection criterion and apply it to measure the quality of sets of features. Using margins we devise novel selection algorithms for multi-class classification problems and provide a theoretical generalization bound. We also study the well known Relief algorithm and show that it resembles a gradient ascent over our margin criterion. We apply our new algorithm to various datasets and show that our new Simba algorithm, which directly optimizes the margin, outperforms Relief.
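For reference, the Relief-style margin computation the paper builds on can be sketched in a few lines: each sample's nearest hit (same class) and nearest miss (other class) determine a per-feature margin contribution. Simba differs in that it optimizes this margin directly by gradient ascent; the sketch below shows only the Relief-like weighting, on toy data.

```python
import numpy as np

def relief_weights(X, y):
    """Relief-style margin-based feature weights: features that separate a
    point from its nearest miss more than from its nearest hit score high."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)   # L1 distances to all points
        dist[i] = np.inf                      # exclude the point itself
        same, diff = (y == y[i]), (y != y[i])
        nearhit = X[np.where(same)[0][np.argmin(dist[same])]]
        nearmiss = X[np.where(diff)[0][np.argmin(dist[diff])]]
        w += np.abs(X[i] - nearmiss) - np.abs(X[i] - nearhit)
    return w

# toy usage: only the first two features carry the label
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(np.round(relief_weights(X, y), 1))  # weights for features 0 and 1 dominate
```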

439 citations


Journal ArticleDOI
TL;DR: The proposed stand-alone Newton method can handle classification problems in very high dimensional spaces, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032.
Abstract: A fast Newton method, that suppresses input space features, is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.

304 citations


Proceedings ArticleDOI
22 Aug 2004
TL;DR: The aim here is to create a classification system in which the training model can adapt quickly to changes in the underlying data stream; an on-demand classification process is proposed which can dynamically select the appropriate window of past training data to build the classifier.
Abstract: Current models of the classification problem do not effectively handle bursts of particular classes coming in at different times. In fact, the current model of the classification problem simply concentrates on methods for one-pass classification modeling of very large data sets. Our model for data stream classification views the data stream classification problem from the point of view of a dynamic approach in which simultaneous training and testing streams are used for dynamic classification of data sets. This model reflects real life situations effectively, since it is desirable to classify test streams in real time over an evolving training and test stream. The aim here is to create a classification system in which the training model can adapt quickly to the changes of the underlying data stream. In order to achieve this goal, we propose an on-demand classification process which can dynamically select the appropriate window of past training data to build the classifier. The empirical results indicate that the system maintains a high classification accuracy in an evolving data stream, while providing an efficient solution to the classification task.
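A hedged sketch of the on-demand idea: keep the training stream in chunks, fit a candidate classifier on each suffix window, and pick the window whose model best predicts the most recent labelled batch. The chunking, the SGD learner, and the selection rule here are illustrative simplifications, not the paper's exact method.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def best_window_model(train_chunks, recent_X, recent_y):
    """Fit a model on each candidate suffix of the training stream and keep
    the one scoring best on the most recent labelled batch.
    Assumes every chunk is an (X, y) pair containing both classes."""
    best, best_acc = None, -1.0
    for start in range(len(train_chunks)):
        X = np.vstack([c[0] for c in train_chunks[start:]])
        y = np.concatenate([c[1] for c in train_chunks[start:]])
        model = SGDClassifier(loss="log_loss").fit(X, y)
        acc = model.score(recent_X, recent_y)
        if acc > best_acc:
            best, best_acc = model, acc
    return best, best_acc

# toy usage: three chunks with drifting feature distribution
rng = np.random.default_rng(0)
chunks = []
for shift in [0.0, 1.0, 2.0]:
    Xc = rng.normal(size=(50, 3)) + shift
    yc = (Xc[:, 0] > shift).astype(int)
    chunks.append((Xc, yc))
recent_X, recent_y = chunks[-1]
model, acc = best_window_model(chunks, recent_X, recent_y)
```

Under drift, a short recent window often wins; when the stream is stable, the longest window wins because more data reduces variance, which is the trade-off the on-demand selection navigates.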

292 citations


01 Jan 2004
TL;DR: The problem of choosing between the two methods of linear discriminant analysis and logistic regression is considered, and some guidelines for proper choice are set.
Abstract: Two of the most widely used statistical methods for analyzing categorical outcome variables are linear discriminant analysis and logistic regression. While both are appropriate for the development of linear classification models, linear discriminant analysis makes more assumptions about the underlying data. Hence, it is assumed that logistic regression is the more flexible and more robust method in case of violations of these assumptions. In this paper we consider the problem of choosing between the two methods, and set some guidelines for proper choice. The comparison between the methods is based on several measures of predictive accuracy. The performance of the methods is studied by simulations. We start with an example where all the assumptions of linear discriminant analysis are satisfied and observe the impact of changes regarding the sample size, covariance matrix, Mahalanobis distance and direction of distance between group means. Next, we compare the robustness of the methods towards categorisation and non-normality of explanatory variables in a closely controlled way. We show that the results of LDA and LR are close whenever the normality assumptions are not too badly violated, and set some guidelines for recognizing these situations. We discuss the inappropriateness of LDA in all other cases.
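A small simulation of the kind described, under conditions where LDA's assumptions hold exactly (shared-covariance Gaussian classes); the sample sizes and mean separation are arbitrary illustrations.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# two Gaussian classes with a shared identity covariance: LDA is exactly correct here
n, d, delta = 200, 5, 1.5
X = np.vstack([rng.normal(size=(n, d)),
               rng.normal(size=(n, d)) + delta / np.sqrt(d)])
y = np.repeat([0, 1], n)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("LR", LogisticRegression(max_iter=1000))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```

In this regime the two accuracies are nearly identical, matching the paper's finding; LDA's advantage (and LR's robustness) only shows up as the assumptions are perturbed, e.g. by categorising or skewing the explanatory variables.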

279 citations


Journal ArticleDOI
TL;DR: A systematic benchmarking study compares linear versions of standard classification and dimensionality reduction techniques with their non-linear counterparts based on a radial basis function (RBF) kernel, and finds that kernel PCA with a linear kernel gives better results than with an RBF kernel.
Abstract: Motivation: Microarrays are capable of determining the expression levels of thousands of genes simultaneously. In combination with classification methods, this technology can be useful to support clinical management decisions for individual patients, e.g. in oncology. The aim of this paper is to systematically benchmark the role of non-linear versus linear techniques and dimensionality reduction methods. Results: A systematic benchmarking study is performed by comparing linear versions of standard classification and dimensionality reduction techniques with their non-linear versions based on non-linear kernel functions with a radial basis function (RBF) kernel. A total of 9 binary cancer classification problems, derived from 7 publicly available microarray datasets, and 20 randomizations of each problem are examined. Conclusions: Three main conclusions can be formulated based on the performances on independent test sets. (1) When performing classification with least squares support vector machines (LS-SVMs) (without dimensionality reduction), RBF kernels can be used without risking too much overfitting. The results obtained with well-tuned RBF kernels are never worse and sometimes even statistically significantly better compared to results obtained with a linear kernel in terms of test set receiver operating characteristic and test set accuracy performances. (2) Even for classification with linear classifiers like LS-SVM with linear kernel, using regularization is very important. (3) When performing kernel principal component analysis (kernel PCA) before classification, using an RBF kernel for kernel PCA tends to result in overfitting, especially when using supervised feature selection. It has been observed that an optimal selection of a large number of features is often an indication for overfitting. Kernel PCA with linear kernel gives better results. Availability: Matlab scripts are available on request. Supplementary information: http://www.esat.kuleuven.ac.be/~npochet/Bioinformatics/
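A sketch of the benchmarking pattern: kernel PCA with a linear versus an RBF kernel, followed by a regularized linear classifier. A ridge classifier stands in here for the LS-SVM with linear kernel (the two are closely related), and the synthetic data merely mimics the many-features/few-samples regime of microarrays; all parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

# synthetic stand-in for a microarray problem: many features, few samples
X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                           random_state=0)

for kernel in ["linear", "rbf"]:
    pipe = Pipeline([
        ("kpca", KernelPCA(n_components=20, kernel=kernel, gamma=1e-3)),
        ("clf", RidgeClassifier(alpha=1.0)),   # regularized linear classifier
    ])
    print(kernel, cross_val_score(pipe, X, y, cv=5).mean())
```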

210 citations


Proceedings ArticleDOI
25 Jul 2004
TL;DR: Experiments show that feature selection using weights from linear SVMs yields better classification performance than other feature weighting methods when combined with the three explored learning algorithms.
Abstract: This paper explores feature scoring and selection based on weights from linear classification models. It investigates how these methods combine with various learning models. Our comparative analysis includes three learning algorithms: Naive Bayes, Perceptron, and Support Vector Machines (SVM) in combination with three feature weighting methods: Odds Ratio, Information Gain, and weights from linear models, the linear SVM and Perceptron. Experiments show that feature selection using weights from linear SVMs yields better classification performance than other feature weighting methods when combined with the three explored learning algorithms. The results support the conjecture that it is the sophistication of the feature weighting method rather than its apparent compatibility with the learning algorithm that improves classification performance.
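The core recipe is easy to state: train a linear SVM, score each feature by the magnitude of its weight, keep the top-ranked features, and retrain. A minimal sketch on synthetic data (the feature counts and C value are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=100, n_informative=8,
                           random_state=0)

svm = LinearSVC(C=1.0).fit(X, y)
scores = np.abs(svm.coef_).ravel()       # weight magnitude as the feature score
top_k = np.argsort(scores)[::-1][:10]    # keep the 10 highest-weighted features
X_reduced = X[:, top_k]

retrained = LinearSVC(C=1.0).fit(X_reduced, y)
print("selected features:", sorted(top_k.tolist()))
```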

Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work casts linear classifiers into a probabilistic framework and develops a co-EM version of the Support Vector Machine; experiments on text classification problems compare the family of semi-supervised support vector algorithms under different conditions, including violations of the assumptions underlying multi-view learning.
Abstract: Multi-view algorithms, such as co-training and co-EM, utilize unlabeled data when the available attributes can be split into independent and compatible subsets. Co-EM outperforms co-training for many problems, but it requires the underlying learner to estimate class probabilities, and to learn from probabilistically labeled data. Therefore, co-EM has so far only been studied with naive Bayesian learners. We cast linear classifiers into a probabilistic framework and develop a co-EM version of the Support Vector Machine. We conduct experiments on text classification problems and compare the family of semi-supervised support vector algorithms under different conditions, including violations of the assumptions underlying multi-view learning. For some problems, such as course web page classification, we observe the most accurate results reported so far.

Book ChapterDOI
30 Nov 2004
TL;DR: This paper addresses a specific image classification task, i.e. to group images according to whether they were taken by photographers or home users, and shows an application in No-Reference holistic quality assessment as a natural extension of such image classification.
Abstract: In this paper, we address a specific image classification task, i.e. to group images according to whether they were taken by photographers or home users. First, a set of low-level features explicitly related to this high-level semantic concept is investigated together with a set of general-purpose low-level features. Next, two different schemes are proposed to identify the most discriminative features and feed them to suitable classifiers: one resorts to boosting to perform feature selection and classifier training simultaneously; the other uses the label information via Principal Component Analysis for feature re-extraction and de-correlation, followed by Maximum Marginal Diversity for feature selection and a Bayesian classifier or Support Vector Machine for classification. In addition, we show an application in No-Reference holistic quality assessment as a natural extension of this image classification. Experimental results demonstrate the effectiveness of our methods.

Journal ArticleDOI
TL;DR: The results indicate the proposed method vastly improves on resubstitution and cross-validation, especially for small samples, in terms of bias and variance, while being tens to hundreds of times faster.

Journal ArticleDOI
TL;DR: It is demonstrated that biologically plausible control mechanisms can accomplish efficient classification of odors, and the range of activity values and network sizes required to achieve efficient classification in insect olfaction within this scheme is calculated.
Abstract: We propose a theoretical framework for odor classification in the olfactory system of insects. The classification task is accomplished in two steps. The first is a transformation from the antennal lobe to the intrinsic Kenyon cells in the mushroom body. This transformation into a higher-dimensional space is an injective function and can be implemented without any type of learning at the synaptic connections. In the second step, the encoded odors in the intrinsic Kenyon cells are linearly classified in the mushroom body lobes. The neurons that perform this linear classification are equivalent to hyperplanes whose connections are tuned by local Hebbian learning and by competition due to mutual inhibition. We calculate the range of values of activity and size of the network required to achieve efficient classification within this scheme in insect olfaction. We are able to demonstrate that biologically plausible control mechanisms can accomplish efficient classification of odors.

Proceedings Article
01 Jan 2004
TL;DR: The classification results show that the topology of the new classifier gives it a significant advantage over other classifiers, by allowing the classifier to model much more complex distributions within the data than Gaussian schemes do.
Abstract: Several factors affecting the automatic classification of musical audio signals are examined. Classification is performed on short audio frames and results are reported as “bag of frames” accuracies, where the audio is segmented into 23ms analysis frames and a majority vote is taken to decide the final classification. The effect of different parameterisations of the audio signal is examined. The effect of the inclusion of information on the temporal variation of these features is examined and finally, the performance of several different classifiers trained on the data is compared. A new classifier is introduced, based on the unsupervised construction of decision trees and either linear discriminant analysis or a pair of single Gaussian classifiers. The classification results show that the topology of the new classifier gives it a significant advantage over other classifiers, by allowing the classifier to model much more complex distributions within the data than Gaussian schemes do.

Journal ArticleDOI
TL;DR: This work presents an overview of kernel methods and provides some guidelines for future development in kernel methods, as well as some perspectives on the actual signal processing problems to which kernel methods are being applied.
Abstract: The notion of kernels, recently introduced, has drawn much interest as it allows one to obtain nonlinear algorithms from linear ones in a simple and elegant manner. This, in conjunction with the introduction of new linear classification methods such as the support vector machines (SVMs), has produced significant progress in machine learning and related research topics. The success of such algorithms is now spreading as they are applied to more and more domains. Signal processing procedures can benefit from a kernel perspective, making them more powerful and applicable to nonlinear processing in a simpler and nicer way. We present an overview of kernel methods and provide some guidelines for future development in kernel methods, as well as some perspectives on the actual signal processing problems to which kernel methods are being applied.

Proceedings ArticleDOI
26 Oct 2004
TL;DR: In the one-class scenario the distance-based methods are superior, while in the two-class scenario the SVM-based method outperforms the other methods.
Abstract: Learning strategies and classification methods for verification of signatures from scanned documents are proposed and evaluated. Learning strategies considered are writer independent (those that learn from a set of signature samples, including forgeries, prior to enrollment of a writer) and writer dependent (those that learn only from a newly enrolled individual). Classification methods considered include two distance-based methods (one based on a threshold, which is the standard method of signature verification and biometrics, and the other based on a distance probability distribution), a Naive Bayes (NB) classifier based on pairs of feature bit values, and a support vector machine (SVM). Two scenarios are considered for the writer dependent case: (i) without forgeries (one-class problem) and (ii) with forgery samples available (two-class problem). The features used to characterize a signature capture local geometry, stroke and topology information in the form of a binary vector. In the one-class scenario the distance-based methods are superior, while in the two-class scenario the SVM-based method outperforms the other methods.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper proposes a novel model named the biased minimax probability machine, which directly controls the worst-case real accuracy of classification of future data to build a biased classifier, and thus provides a rigorous treatment of imbalanced data.
Abstract: We consider the problem of binary classification on imbalanced data, in which nearly all the instances are labelled as one class, while far fewer instances are labelled as the other, usually more important, class. Traditional machine learning methods seeking accurate performance over a full range of instances are not suitable for this problem, since they tend to classify all the data into the majority, usually the less important, class. Moreover, some current methods have tried to utilize intermediate factors, e.g., the distribution of the training set, the decision thresholds or the cost matrices, to influence the bias of the classification. However, it remains uncertain whether these methods can improve performance in a systematic way. In this paper, we propose a novel model named the biased minimax probability machine. Different from previous methods, this model directly controls the worst-case real accuracy of classification of future data to build a biased classifier. Hence, it provides a rigorous treatment of imbalanced data. The experimental results on the novel model, compared with those of three competitive methods, i.e., the naive Bayesian classifier, the k-nearest neighbor method, and the decision tree method C4.5, demonstrate the superiority of our novel model.

Journal ArticleDOI
TL;DR: An algorithm for the segmentation of fingerprints and a criterion for evaluating the block features are presented, and experiments show that the proposed segmentation method performs very well in rejecting false fingerprint features from the noisy background.
Abstract: An algorithm for the segmentation of fingerprints and a criterion for evaluating the block features are presented. The segmentation uses three block features: the block clusters degree, the block mean information, and the block variance. An optimal linear classifier has been trained for the per-block classification, using the criterion of a minimal number of misclassified samples. Morphology has been applied as postprocessing to reduce the number of classification errors. The algorithm is tested on the FVC2002 database; only 2.45% of the blocks are misclassified, while the postprocessing further reduces this ratio. Experiments have shown that the proposed segmentation method performs very well in rejecting false fingerprint features from the noisy background.
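A sketch of the per-block feature computation described; the clusters-degree feature here is a simplified stand-in for the paper's exact definition, and the block size is an illustrative choice.

```python
import numpy as np

def block_features(img, block=12):
    """Per-block mean, variance, and a simple clusters-degree proxy for
    fingerprint segmentation. A trained linear classifier w @ f + b > 0
    would then label each block as foreground (fingerprint) or background."""
    H, W = img.shape
    feats = []
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            B = img[r:r + block, c:c + block].astype(float)
            mean, var = B.mean(), B.var()
            dark = (B < mean).mean()   # fraction of below-mean pixels (clustering proxy)
            feats.append((mean, var, dark))
    return np.array(feats)

# toy usage on a synthetic image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(120, 120))
print(block_features(img).shape)   # (100, 3): one feature vector per block
```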

Proceedings ArticleDOI
Zhu Zhang
13 Nov 2004
TL;DR: It is shown that the supervised SVM classifier using various lexical and syntactic features can achieve promising classification accuracy and the proposed bootstrapping algorithm based on random feature projection can significantly reduce the need for labeled training data with only limited sacrifice of performance.
Abstract: This paper approaches the relation classification problem in information extraction framework with bootstrapping on top of Support Vector Machines. A new bootstrapping algorithm is proposed and empirically evaluated on the ACE corpus. We show that the supervised SVM classifier using various lexical and syntactic features can achieve promising classification accuracy. More importantly, the proposed BootProject algorithm based on random feature projection can significantly reduce the need for labeled training data with only limited sacrifice of performance.

Journal ArticleDOI
TL;DR: The theoretical foundation of a new method for classifying voltage and current waveform events that are related to a variety of PQ problems, composed of two sequential processes: feature extraction and classification, is presented.
Abstract: Better software and hardware for automatic classification of power quality (PQ) disturbances are desired by both utilities and commercial customers. Existing automatic recognition methods need improvement in terms of their capability, reliability, and accuracy. This paper presents the theoretical foundation of a new method for classifying voltage and current waveform events that are related to a variety of PQ problems. The method is composed of two sequential processes: feature extraction and classification. The proposed feature extraction tool, the time-frequency ambiguity plane with kernel techniques, is new to the power engineering field. The essence of the feature extraction is to project a PQ signal onto a low-dimension time-frequency representation (TFR), which is deliberately designed to maximize the separability between classes. The technique of designing an optimized TFR from the time-frequency ambiguity plane is applied to the PQ classification problem for the first time. A distinct TFR is designed for each class. The classifiers include a Heaviside-function linear classifier and neural networks with feedforward structures. The flexibility of this method allows classification of a very broad range of power quality events. The performance validation and hardware implementation of the proposed method are presented in the second part of this two-paper series.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper presents a novel approach to recognizing the six universal facial expressions from visual data and using them to derive the level of interest based on psychological evidence, built on a two-step classification on top of refined optical flow computed from a sequence of images.
Abstract: This paper presents a novel approach to recognize the six universal facial expressions from visual data and use them to derive the level of interest, drawing on psychological evidence. The proposed approach relies on a two-step classification built on top of refined optical flow computed from a sequence of images. First, a bank of linear classifiers was applied at the frame level and the output of this stage was coalesced to produce a temporal signature for each observation. Second, the temporal signatures computed from the training data set were used to train discrete hidden Markov models (HMMs) to learn the underlying model for each universal facial expression. The average recognition rate of the proposed facial expression classifier is 90.9% without classifier fusion and 91.2% with fusion, using a five-fold cross validation scheme on a database of 488 video sequences that includes 97 subjects. Recognized facial expressions were combined with the intensity of activity (motion) around the apex frame to measure the level of interest. To further illustrate the efficacy of the proposed approach, two sets of experiments were conducted: analysis of television (TV) broadcast data (108 sequences of facial expressions with severe lighting conditions and diverse subjects and expressions) and emotion elicitation on 21 subjects.

Journal ArticleDOI
TL;DR: Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied, but little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.
Abstract: Summary: Several papers have been published where nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease. Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used. Results: We show that the problem is much easier than what has previously been reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers. Availability: The datasets used are available at http://www.hh.se/staff/bioinf/
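Since the claim is linear separability, the simple perceptron suffices and its convergence is guaranteed. A sketch with the orthogonal (one-hot) peptide encoding follows; the two "cleaved" sequences are known HIV-1 protease cleavage sites used purely for illustration, and the non-cleaved examples are made up.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def encode(peptide):
    """Orthogonal (one-hot) encoding of an 8-residue peptide: 8 x 20 bits."""
    v = np.zeros(len(peptide) * 20)
    for pos, aa in enumerate(peptide):
        v[pos * 20 + IDX[aa]] = 1.0
    return v

def perceptron(X, y, epochs=50):
    """Simple perceptron; converges if the data are linearly separable."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):          # labels in {-1, +1}
            if yi * (w @ xi + b) <= 0:
                w += yi * xi
                b += yi
                errors += 1
        if errors == 0:
            break
    return w, b

# illustrative examples, not the real training dataset
cleaved = ["SQNYPIVQ", "ARVLAEAM"]
uncleaved = ["AAAAAAAA", "GGGGGGGG"]
X = np.array([encode(p) for p in cleaved + uncleaved])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
# large |w| entries point to residue/position combinations driving cleavage,
# which is how specificity rules can be read off a linear model
```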

Journal ArticleDOI
TL;DR: The research indicates that multisource information can significantly improve the interpretation and classification of land cover types and the expert classification is a powerful tool in the production of a reliable land cover map.
Abstract: The aim of this study is to explore different data fusion techniques and compare the performances of a standard supervised classification and expert classification. For the supervised classification, different feature extraction approaches are used. To increase the reliability of the classification, different threshold values are determined and fuzzy convolutions are applied. For the expert classification, a set of rules is determined and a hierarchical decision tree is created. Overall, the research indicates that multisource information can significantly improve the interpretation and classification of land cover types and the expert classification is a powerful tool in the production of a reliable land cover map.

Journal ArticleDOI
TL;DR: A case study of PQ event classification with the proposed time-frequency representation (TFR) method, which has been successfully tested with 860 real world PQ events that cover five classes, achieving a recognition rate of 98%.
Abstract: Classification of power-quality (PQ)-related voltage and current waveform disturbances is a key task for power system monitoring. A new method based on the optimized time-frequency representation (TFR) has been proposed in the first paper of this two-paper series. This paper (the second paper) presents a case study of PQ event classification with the proposed method. The classification algorithm has been successfully tested with 860 real world PQ events that cover five classes, achieving a recognition rate of 98%. The algorithm is implemented on a digital signal processor (DSP) based hardware system and optimized according to the DSP architecture to meet the hard real-time constraints. The DSP-based system is capable of processing a five-cycle (83.3 ms) PQ waveform within 11.2 ms. The real-time computing capability of the algorithm has been verified with this result. The scalability of this method is also discussed.

01 Jan 2004
TL;DR: The preliminary results of an efficient language classifier using an ad-hoc Cumulative Frequency Addition of N-grams are described, which is simpler than the conventional Naive Bayesian classification method but performs similarly in speed overall and better in accuracy on short input strings.
Abstract: This paper describes the preliminary results of an efficient language classifier using an ad-hoc Cumulative Frequency Addition of N-grams. The new classification technique is simpler than the conventional Naive Bayesian classification method, but it performs similarly in speed overall and better in accuracy on short input strings. The classifier is also 5-10 times faster than N-gram based rank-order statistical classifiers. Language classification using N-gram based rank-order statistics has been shown to be highly accurate and insensitive to typographical errors, and, as a result, this method has been extensively researched and documented in the language processing literature. However, classification using rank-order statistics is slower than other methods due to the inherent requirement of frequency counting and sorting of N-grams in the test document profile. Accuracy and speed of classification are crucial for a classifier to be useful in a high volume categorization environment. Thus, it is important to investigate the performance of the N-gram based classification methods. In particular, if it is possible to eliminate the counting and sorting operations in the rank-order statistics methods, classification speed could be increased substantially. The classifier described here accomplishes that goal by using a new Cumulative Frequency Addition method.
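A hedged sketch of one simplified variant of the Cumulative Frequency Addition idea: build normalized n-gram frequency profiles per language at training time, then score a test string by summing its n-grams' frequencies under each profile, with no rank-ordering or sorting of the test document. The training strings are toy examples; the paper's normalization scheme may differ in detail.

```python
from collections import Counter

def ngrams(text, n=3):
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def profile(corpus, n=3):
    """Normalized internal n-gram frequencies for one language."""
    counts = Counter(g for doc in corpus for g in ngrams(doc, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def classify(text, profiles, n=3):
    """Cumulative Frequency Addition: sum the normalized frequencies of the
    input's n-grams under each language profile; no sorting required."""
    scores = {lang: sum(p.get(g, 0.0) for g in ngrams(text, n))
              for lang, p in profiles.items()}
    return max(scores, key=scores.get)

# toy usage with two tiny "corpora"
profiles = {
    "en": profile(["the quick brown fox jumps over the lazy dog"]),
    "de": profile(["der schnelle braune fuchs springt ueber den faulen hund"]),
}
print(classify("the dog is lazy", profiles))   # -> 'en'
```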

Journal ArticleDOI
TL;DR: Six wavelet transform-based classification methods, using different discriminative training approaches to the design of the feature extractor and classifier, are compared, with the discriminative feature extraction (DFE) training approach using adaptive wavelets shown to outperform the other approaches.

Journal ArticleDOI
TL;DR: Kernel-based nonlinear feature extraction and classification algorithms are examined for their applicability to the classification of phonemes in a phonological awareness drilling software package; in most cases the transformations are found to have a beneficial effect on classification performance.
Abstract: Kernel-based nonlinear feature extraction and classification algorithms are a popular new research direction in machine learning. This paper examines their applicability to the classification of phonemes in a phonological awareness drilling software package. We first give a concise overview of the nonlinear feature extraction methods such as kernel principal component analysis (KPCA), kernel independent component analysis (KICA), kernel linear discriminant analysis (KLDA), and kernel springy discriminant analysis (KSDA). The overview deals with all the methods in a unified framework, regardless of whether they are unsupervised or supervised. The effect of the transformations on a subsequent classification is tested in combination with learning algorithms such as Gaussian mixture modeling (GMM), artificial neural nets (ANN), projection pursuit learning (PPL), decision tree-based classification (C4.5), and support vector machines (SVMs). We found, in most cases, that the transformations have a beneficial effect on the classification performance. Furthermore, the nonlinear supervised algorithms yielded the best results.

Book ChapterDOI
26 Sep 2004
TL;DR: It is shown that the Gaussian kernel function combined with an optimal choice of parameters can produce high classification accuracy in a Support Vector Machines system.
Abstract: The classification of normal and malignant colon tissue cells is crucial to the diagnosis of colon cancer in humans. Given the right set of feature vectors, Support Vector Machines (SVMs) have been shown to perform reasonably well for the classification [4,13]. In this paper, we address the following question: how does the choice of a kernel function and its parameters affect the SVM classification performance in such a system? We show that the Gaussian kernel function combined with an optimal choice of parameters can produce high classification accuracy.
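A minimal sketch of the kind of kernel/parameter study the paper describes, framed as a grid search over the Gaussian (RBF) kernel width and the soft-margin constant; the synthetic data is a stand-in for real colon tissue feature vectors, and the grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic stand-in for colon-tissue feature vectors
X, y = make_classification(n_samples=120, n_features=40, n_informative=12,
                           random_state=0)

# grid search over the RBF width (gamma) and the soft-margin constant (C)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```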
Abstract: The classification of normal and malginant colon tissue cells is crucial to the diagnosis of colon cancer in humans. Given the right set of feature vectors, Support Vector Machines (SVMs) have been shown to perform reasonably well for the classification [4,13]. In this paper, we address the following question: how does the choice of a kernel function and its parameters affect the SVM classification performance in such a system? We show that the Gaussian kernel function combined with an optimal choice of parameters can produce high classification accuracy.