
Showing papers on "Linear classifier published in 2008"


Journal Article
TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
Abstract: LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets.
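For readers who want to try this from Python rather than LIBLINEAR's C command-line tools, scikit-learn's liblinear-backed estimators expose the same two model families; a minimal sketch on synthetic data (dataset and settings are illustrative, not from the paper):

```python
# LIBLINEAR via scikit-learn: LinearSVC and LogisticRegression with the
# "liblinear" solver both call into the LIBLINEAR library under the hood.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (LinearSVC(C=1.0), LogisticRegression(solver="liblinear", C=1.0)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, "test accuracy:", clf.score(X_te, y_te))
```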

7,848 citations


Journal ArticleDOI
TL;DR: Support vector machines are widely used in computational biology due to their high accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data.
Abstract: The increasing wealth of biological data coming from a large variety of platforms and the continued development of new high-throughput methods for probing biological systems require increasingly more sophisticated computational approaches. Putting all these data in simple-to-use databases is a first step; but realizing the full potential of the data requires algorithms that automatically extract regularities from the data, which can then lead to biological insight. Many of the problems in computational biology are in the form of prediction: starting from prediction of a gene's structure, prediction of its function, interactions, and role in disease. Support vector machines (SVMs) and related kernel methods are extremely good at solving such problems [1]–[3]. SVMs are widely used in computational biology due to their high accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data [2], [4]–[6]. The simplest form of a prediction problem is binary classification: trying to discriminate between objects that belong to one of two categories: positive (+1) or negative (−1). SVMs use two key concepts to solve this problem: large margin separation and kernel functions. The idea of large margin separation can be motivated by classification of points in two dimensions (see Figure 1). A simple way to classify the points is to draw a straight line and call points lying on one side positive and on the other side negative. If the two sets are well separated, one would intuitively draw the separating line such that it is as far as possible away from the points in both sets (see Figures 2 and 3). This intuitive choice captures the idea of large margin separation, which is mathematically formulated in the section Classification with Large Margin. Figure 1: A linear classifier separating two classes of points (squares and circles) in two dimensions. The decision boundary divides the space into two sets depending on the sign of f(x) = 〈w,x〉+b. The grayscale level represents the value of the discriminant function f(x): dark for low values and a light shade for high values.
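The discriminant f(x) = 〈w,x〉 + b in the Figure 1 caption is short enough to spell out directly; a minimal sketch with arbitrary illustrative weights:

```python
import numpy as np

# Linear classifier: f(x) = <w, x> + b; the sign of f(x) gives the class.
w = np.array([1.0, -2.0])   # illustrative weight vector
b = 0.5                     # illustrative bias

def predict(X):
    f = X @ w + b
    return np.where(f >= 0, 1, -1), f

X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
labels, scores = predict(X)
print(labels)   # predicted classes (+1 / -1)
print(scores)   # the discriminant values f(x) behind the grayscale levels
```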

660 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: Empirical evaluation on a range of NLP tasks shows that the confidence-weighted linear classifiers introduced here improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lend themselves to better classifier combination after parallel training.
Abstract: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.
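The paper derives a specific closed-form update for the mean and covariance; the sketch below only shows the general shape of such second-order online updates, using a diagonal covariance and the later AROW-style rule rather than the exact confidence-weighted solution (the hinge trigger and the parameter r are illustrative assumptions):

```python
import numpy as np

def second_order_online(X, y, r=1.0):
    """Maintain a Gaussian over weight vectors (diagonal covariance) and
    update both mean and variances per instance. AROW-style simplification,
    not the paper's exact CW update."""
    n, d = X.shape
    mu = np.zeros(d)       # mean of the weight distribution
    sigma = np.ones(d)     # per-weight variances
    for x, label in zip(X, y):
        margin = label * (mu @ x)
        v = np.sum(sigma * x * x)        # variance of the margin
        if margin < 1.0:                 # update on insufficient margin
            beta = 1.0 / (v + r)
            alpha = (1.0 - margin) * beta
            mu += alpha * label * sigma * x   # confident weights move less
            sigma -= beta * (sigma * x) ** 2  # confidence only increases
    return mu, sigma
```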

433 citations


Journal ArticleDOI
TL;DR: This work considers the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation and proposes a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs).
Abstract: We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss—the φ-risk—also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y=1|X) is unlikely to be close to certain critical values.
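One convex surrogate of this flavor is a "double hinge" that is steeper than the ordinary hinge on the negative side, with slope tied to the rejection cost d, combined with a rule that abstains when |f(x)| is small. The parameterization below is an illustrative assumption, not necessarily the exact φ of the paper:

```python
import numpy as np

def double_hinge(z, d=0.25):
    # phi(z) for z = y * f(x): slope (1 - d) / d for z < 0, the usual hinge
    # slope on [0, 1], and zero beyond the margin.
    a = (1.0 - d) / d
    return np.maximum.reduce([np.zeros_like(z), 1.0 - z, 1.0 - a * z])

def decide(f, delta=0.5):
    # Classify by the sign of f, but abstain (0) inside the band |f| < delta.
    return np.where(np.abs(f) < delta, 0, np.sign(f))

z = np.array([-1.0, 0.0, 0.5, 2.0])
print(double_hinge(z))   # surrogate losses
print(decide(z))         # -1 / 0 (reject) / +1 decisions
```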

409 citations


Book
Jürgen Schürmann
02 May 2008
TL;DR: This book covers classification based on statistical models determined by first- and second-order statistical moments, as well as classification based on mean-square functional approximations.
Abstract: Statistical Decision Theory. Need for Approximations: Fundamental Approaches. Classification Based on Statistical Models Determined by First- and Second-Order Statistical Moments. Classification Based on Mean-Square Functional Approximations. Polynomial Regression. Multilayer Perceptron Regression. Radial Basis Functions. Measurements, Features, and Feature Selection. Reject Criteria and Classifier Performance. Combining Classifiers. Conclusion. STATMOD Program: Description of ftp Package. References. Index.

383 citations


Journal ArticleDOI
TL;DR: It is concluded that SLR provides a robust method for fMRI decoding and can also serve as a stand-alone tool for voxel selection, by exploiting correlated noise among voxels to allow for better pattern separation.

372 citations


Journal ArticleDOI
01 Sep 2008
TL;DR: Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and other approaches, and the SA-SVM is thus useful for parameter determination and feature selection in the SVM.
Abstract: The support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with feature selection, significantly affects classification accuracy. The objective of this study is to obtain better parameter values while also finding a subset of features that does not degrade the SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To evaluate the proposed SA-SVM approach, several datasets from the UCI machine learning repository are adopted to calculate the classification accuracy rate. The proposed approach was compared with grid search, a conventional method of parameter setting, and various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and the other approaches. The SA-SVM is thus useful for parameter determination and feature selection in the SVM.
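A compact sketch of the SA loop over the kernel parameters and a feature mask (the neighbourhood moves, cooling schedule, and wine dataset are illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
d = X.shape[1]

def fitness(logC, logg, mask):
    if not mask.any():
        return 0.0
    clf = SVC(C=10.0 ** logC, gamma=10.0 ** logg)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

state = [0.0, -2.0, rng.random(d) < 0.8]   # log10(C), log10(gamma), mask
score, T = fitness(*state), 1.0
for step in range(200):
    logC = state[0] + rng.normal(scale=0.3)    # perturb kernel parameters
    logg = state[1] + rng.normal(scale=0.3)
    mask = state[2].copy()
    flip = rng.integers(d)                     # flip one feature bit
    mask[flip] = ~mask[flip]
    cand = fitness(logC, logg, mask)
    # Always accept improvements; accept worse states with prob. exp(dE / T).
    if cand > score or rng.random() < np.exp((cand - score) / T):
        state, score = [logC, logg, mask], cand
    T *= 0.98                                  # geometric cooling

print("best CV accuracy:", round(score, 3))
```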

334 citations


Journal ArticleDOI
TL;DR: A multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provide classification accuracy comparable to state-of-the-art task-specific image classifiers.

280 citations


Journal ArticleDOI
TL;DR: This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote‐sensing image classification.
Abstract: Land use classification is an important part of many remote sensing applications. A lot of research has gone into the application of statistical and neural network classifiers to remote-sensing images. This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory, called Support Vector Machines (SVMs), and its application to remote-sensing image classification. Standard classifiers such as the Artificial Neural Network (ANN) need a number of training samples that increases exponentially with the dimension of the input feature space. With a limited number of training samples, the classification rate thus decreases as the dimensionality increases. SVMs are independent of the dimensionality of the feature space, as the main idea behind this classification technique is to separate the classes with a surface that maximizes the margin between them, using boundary pixels to create the decision surface. Results from SVMs are compared with the traditional Maximum Likelihood Classification (MLC) and an ANN classifier. The findings suggest that the ANN and SVM classifiers perform better than the traditional MLC. The SVM and the ANN show comparable results. However, accuracy is dependent on factors such as the number of hidden nodes (in the case of ANN) and kernel parameters (in the case of SVM). The training time taken by the SVM is several orders of magnitude less.
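For reference, the MLC baseline mentioned here is simply a per-class Gaussian maximum-likelihood rule; a minimal sketch, assuming rows of X are pixel feature vectors (a small regularization term is added to the covariance for stability):

```python
import numpy as np

def mlc_fit(X, y):
    """Maximum Likelihood Classification: fit one Gaussian per class."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        stats[c] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return stats

def mlc_predict(stats, X):
    scores = []
    for mu, prec, logdet in stats.values():
        diff = X - mu
        # Gaussian log-likelihood up to a constant shared by all classes.
        scores.append(-0.5 * (logdet + np.einsum("ij,jk,ik->i", diff, prec, diff)))
    return np.array(list(stats))[np.argmax(scores, axis=0)]
```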

276 citations


Journal Article
TL;DR: A novel coordinate descent algorithm for training linear SVMs with the L2-loss function is proposed that is more efficient and stable than state-of-the-art methods such as Pegasos and TRON.
Abstract: Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problem while fixing other variables. The sub-problem is solved by Newton steps with a line search technique. The procedure converges globally at a linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance. Experiments show that our method is more efficient and stable than state-of-the-art methods such as Pegasos and TRON.
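A dense, unoptimized sketch of the primal coordinate descent idea (the paper exploits sparsity and uses a more careful line search; the tolerances here are illustrative):

```python
import numpy as np

def l2svm_coordinate_descent(X, y, C=1.0, sweeps=30):
    """min_w 0.5*||w||^2 + C * sum_i max(0, 1 - y_i w.x_i)^2,
    one Newton step per coordinate with simple backtracking."""
    n, p = X.shape
    w = np.zeros(p)
    b = 1.0 - y * (X @ w)              # b_i = 1 - y_i w.x_i, kept up to date

    def obj(wv, bv):
        return 0.5 * wv @ wv + C * np.sum(np.maximum(0.0, bv) ** 2)

    for _ in range(sweeps):
        for j in range(p):
            act = b > 0                # instances with positive loss
            g = w[j] - 2 * C * np.sum(y[act] * X[act, j] * b[act])
            h = 1.0 + 2 * C * np.sum(X[act, j] ** 2)
            d = -g / h                 # one-variable Newton direction
            f0, lam = obj(w, b), 1.0
            while lam > 1e-4:          # halve the step until f decreases
                w_try = w.copy(); w_try[j] += lam * d
                b_try = b - lam * d * y * X[:, j]
                if obj(w_try, b_try) <= f0:
                    w, b = w_try, b_try
                    break
                lam *= 0.5
    return w
```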

257 citations


Journal ArticleDOI
TL;DR: This article provides a review of several recently developed penalized feature selection and classification techniques--which belong to the family of embedded feature selection methods--for bioinformatics studies with high-dimensional input.
Abstract: In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifier and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classification techniques—which belong to the family of embedded feature selection methods—for bioinformatics studies with high-dimensional input. Classification objective functions, penalty functions and computational algorithms are discussed. Our goal is to make interested researchers aware of these feature selection and classification methods that are applicable to high-dimensional bioinformatics data.
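As a concrete instance of an embedded method, an l1 penalty on logistic regression drives coefficients exactly to zero, so selection happens during fitting; a minimal sketch with illustrative synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 500 features, of which only 20 are informative.
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])   # nonzero coefficients = kept features
print(len(selected), "features kept out of", X.shape[1])
```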

Journal ArticleDOI
TL;DR: This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems, and focuses on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
Abstract: Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
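One-versus-all, the simplest such decomposition, is easy to spell out by hand; a minimal sketch (iris and LinearSVC are illustrative stand-ins):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One binary subtask per class: class c versus the rest.
classes = np.unique(y_tr)
models = [LinearSVC().fit(X_tr, (y_tr == c).astype(int)) for c in classes]

# Combine the binary outputs: pick the class whose model is most confident.
scores = np.column_stack([m.decision_function(X_te) for m in models])
pred = classes[np.argmax(scores, axis=1)]
print("accuracy:", (pred == y_te).mean())
```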

Journal ArticleDOI
TL;DR: It is concluded that the classifier, using resting-state brain function as classification feature, has potential ability to improve current diagnosis and treatment evaluation of ADHD.

Journal ArticleDOI
TL;DR: Biological interpretation of the genes selected by this conceptually simple but computer-intensive approach to pre-selection of informative features for supervised classification showed that several of them are involved in precursors to different types of leukemia and lymphoma, rather than being genes common to several forms of cancer, as is the case for the other methods.
Abstract: Motivation: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features. Results: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancers, which is the case for the other methods. Availability: Prototype available upon request. Contact: jan.komorowski@lcb.uu.se
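A conceptual sketch of the Monte-Carlo idea: many small trees, each trained on a random sample and a random fraction of features, with credit accumulated for the features each tree actually uses (subset sizes, depth, and scoring are illustrative; the paper's procedure differs in detail):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in; the paper uses microarray gene-expression data.
X, y = make_classification(n_samples=200, n_features=300, n_informative=15,
                           random_state=0)
rng = np.random.default_rng(0)
votes = np.zeros(X.shape[1])

for _ in range(500):
    feats = rng.choice(X.shape[1], size=30, replace=False)      # feature fraction
    rows = rng.choice(len(y), size=len(y) // 2, replace=False)  # training set
    tree = DecisionTreeClassifier(max_depth=3).fit(X[rows][:, feats], y[rows])
    votes[feats] += tree.feature_importances_   # credit the features used

ranking = np.argsort(votes)[::-1]
print("top 10 candidate features:", ranking[:10])
```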


Journal ArticleDOI
TL;DR: In the most challenging RW settings, HCT uses an unconventionally low threshold, which keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance.
Abstract: In important application fields today, genomics and proteomics among them, selecting a small subset of useful features is crucial for the success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, ..., p, let π_i denote the two-sided P-value associated with the ith feature Z-score and π_(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π_(i)) / sqrt((i/p)(1 − i/p)). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
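The HC objective from the abstract translates directly into code; a minimal sketch that, as is customary for HC, restricts the search to the smaller half of the P-values (the paper couples this threshold with classifier construction):

```python
import numpy as np
from scipy.stats import norm

def hc_threshold(z_scores):
    """Return the |z| cutoff maximizing (i/p - pi_(i)) / sqrt((i/p)(1 - i/p))
    over the sorted two-sided P-values of the feature z-scores."""
    z = np.asarray(z_scores)
    p = len(z)
    pvals = 2 * norm.sf(np.abs(z))       # two-sided P-values
    order = np.argsort(pvals)
    half = p // 2                        # search only i <= p/2
    i = np.arange(1, half + 1)
    frac = i / p
    hc = (frac - pvals[order][:half]) / np.sqrt(frac * (1 - frac))
    k = np.argmax(hc)
    return np.abs(z[order][k])           # keep features with |z| >= threshold
```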

15 Sep 2008
TL;DR: In this article, the authors investigate the relationship between several attribute space reduction techniques and the resulting classification accuracy for two very different application areas: e-mail filtering and drug discovery.
Abstract: Dimensionality reduction and feature subset selection are two techniques for reducing the attribute space of a feature set, which is an important component of both supervised and unsupervised classification or regression problems. While in feature subset selection a subset of the original attributes is extracted, dimensionality reduction in general produces linear combinations of the original attribute set. In this paper we investigate the relationship between several attribute space reduction techniques and the resulting classification accuracy for two very different application areas. On the one hand, we consider e-mail filtering, where the feature space contains various properties of e-mail messages, and on the other hand, we consider drug discovery problems, where quantitative representations of molecular structures are encoded in terms of information-preserving descriptor values. Subsets of the original attributes constructed by filter and wrapper techniques as well as subsets of linear combinations of the original attributes constructed by three different variants of principal component analysis (PCA) are compared in terms of the classification performance achieved with various machine learning algorithms as well as in terms of runtime performance. We successively reduce the size of the attribute sets and investigate the changes in the classification results. Moreover, we explore the relationship between the variance captured in the linear combinations within PCA and the resulting classification accuracy. The results show that the classification accuracy based on PCA is highly sensitive to the type of data and that the variance captured by the principal components is not necessarily a vital indicator of classification performance.
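In miniature, the subset-selection versus linear-combination contrast looks like this (synthetic data, a univariate filter, and logistic regression are illustrative stand-ins for the paper's filter/wrapper variants and PCA flavours):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=100, n_informative=15,
                           random_state=0)
for k in (5, 10, 20, 40):   # successively larger reduced attribute sets
    pca = make_pipeline(PCA(n_components=k), LogisticRegression(max_iter=1000))
    sel = make_pipeline(SelectKBest(f_classif, k=k),
                        LogisticRegression(max_iter=1000))
    print(k, "PCA: %.3f" % cross_val_score(pca, X, y, cv=3).mean(),
          "subset: %.3f" % cross_val_score(sel, X, y, cv=3).mean())
```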

Proceedings ArticleDOI
24 Aug 2008
TL;DR: Two variations of a novel two-step approach to automatic record pair classification are presented: the first is based on a nearest-neighbour classifier, while the second improves an SVM classifier by iteratively adding more examples into the training sets.
Abstract: The task of linking databases is an important step in an increasing number of data mining projects, because linked data can contain information that is not available otherwise, or that would require time-consuming and expensive collection of specific data. The aim of linking is to match and aggregate all records that refer to the same entity. One of the major challenges when linking large databases is the efficient and accurate classification of record pairs into matches and non-matches. While traditionally classification was based on manually-set thresholds or on statistical procedures, many of the more recently developed classification methods are based on supervised learning techniques. They therefore require training data, which is often not available in real world situations or has to be prepared manually, an expensive, cumbersome and time-consuming process. The author has previously presented a novel two-step approach to automatic record pair classification [6, 7]. In the first step of this approach, training examples of high quality are automatically selected from the compared record pairs, and used in the second step to train a support vector machine (SVM) classifier. Initial experiments showed the feasibility of the approach, achieving results that outperformed k-means clustering. In this paper, two variations of this approach are presented. The first is based on a nearest-neighbour classifier, while the second improves an SVM classifier by iteratively adding more examples into the training sets. Experimental results show that this two-step approach can achieve better classification results than other unsupervised approaches.
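A sketch of the two-step idea on record-pair similarity vectors, in the iterative-SVM variant: seed the training set with the clearest matches and non-matches, then repeatedly retrain and absorb the most confident predictions (the seeding heuristic and sizes are illustrative assumptions, not the paper's exact selection rules):

```python
import numpy as np
from sklearn.svm import SVC

def two_step_classify(sim, n_iters=5, seed_frac=0.05):
    """sim: one row per compared record pair; higher values = more similar."""
    total = sim.sum(axis=1)
    n_seed = max(1, int(seed_frac * len(sim)))
    pos = np.argsort(total)[-n_seed:]      # most similar pairs -> matches
    neg = np.argsort(total)[:n_seed]       # least similar -> non-matches
    idx = np.concatenate([pos, neg])
    lab = np.r_[np.ones(n_seed), np.zeros(n_seed)]
    clf = SVC(kernel="linear")
    for _ in range(n_iters):
        clf.fit(sim[idx], lab)
        conf = np.abs(clf.decision_function(sim))
        conf[idx] = -np.inf                # do not re-add selected pairs
        new = np.argsort(conf)[-n_seed:]   # most confident remaining pairs
        idx = np.concatenate([idx, new])
        lab = np.concatenate([lab, clf.predict(sim[new])])
    return clf
```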

Journal ArticleDOI
TL;DR: A comparative analysis of SVC with the Maximum Likelihood Classification (MLC) method, which is the most popular conventional supervised classification technique, illustrated that SVC improved the classification accuracy, was robust and did not suffer from dimensionality issues such as the Hughes Effect.
Abstract: Accurate thematic classification is one of the most commonly desired outputs from remote sensing images. Recent research efforts to improve the reliability and accuracy of image classification have led to the introduction of the Support Vector Classification (SVC) scheme. SVC is a new generation of supervised learning method based on the principle of statistical learning theory, which is designed to decrease uncertainty in the model structure and the fitness of data. We have presented a comparative analysis of SVC with the Maximum Likelihood Classification (MLC) method, which is the most popular conventional supervised classification technique. SVC is an optimization technique in which the classification accuracy heavily relies on identifying the optimal parameters. Using a case study, we verify a method to obtain these optimal parameters such that SVC can be applied efficiently. We use multispectral and hyperspectral images to develop thematic classes of known lithologic units in order to compare the classification accuracy of both the methods. We have varied the training to testing data proportions to assess the relative robustness and the optimal training sample requirement of both the methods to achieve comparable levels of accuracy. The results of our study illustrated that SVC improved the classification accuracy, was robust and did not suffer from dimensionality issues such as the Hughes Effect.
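The robustness experiment, varying the training share and watching accuracy, is easy to reproduce in miniature (digits and a default RBF SVC stand in for the imagery and the tuned classifier):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
for train_size in (0.5, 0.2, 0.1, 0.05):   # shrink the training proportion
    accs = []
    for seed in range(5):                   # average over random splits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, random_state=seed, stratify=y)
        accs.append(SVC().fit(X_tr, y_tr).score(X_te, y_te))
    print(f"train share {train_size:.2f}: accuracy {np.mean(accs):.3f}")
```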

Book ChapterDOI
01 Jan 2008
TL;DR: The objective of this study was to evaluate SVMs for their effectiveness and prospects for object-based image analysis as a modern computational intelligence method; the SVM methodology seems very promising for Object-Based Image Analysis.
Abstract: The Support Vector Machine is a theoretically superior machine learning methodology with great results in pattern recognition. It is especially suited to supervised classification of high-dimensional datasets and has been found competitive with the best machine learning algorithms. In the past, SVMs were tested and evaluated only as pixel-based image classifiers. During recent years, advances in Remote Sensing occurred in the field of Object-Based Image Analysis (OBIA), which combines low-level and high-level computer vision techniques. Moving from pixel-based techniques towards object-based representation, the dimensionality of the remote sensing imagery feature space increases significantly. This results in increased complexity of the classification process and causes problems for traditional classification schemes. The objective of this study was to evaluate SVMs for their effectiveness and prospects for object-based image analysis as a modern computational intelligence method. Here, an SVM approach for multi-class classification was followed, based on primitive image objects provided by a multi-resolution segmentation algorithm. Then, a feature selection step took place in order to provide the features for classification, which involved spectral, texture and shape information. After the feature selection step, a module that integrated an SVM classifier and the segmentation algorithm was developed in C++. For training the SVM, sample image objects derived from the segmentation procedure were used. The proposed classification procedure was then applied, resulting in the final object classification. The classification results were compared to those of the Nearest Neighbor object-based classifier and were found satisfactory. The SVM methodology seems very promising for Object-Based Image Analysis, and future work will focus on integrating SVM classifiers with rule-based classifiers.
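The object-based step that changes the data representation is the aggregation of pixels into per-object feature vectors; a minimal sketch assuming an `image` band array and a `segments` label map from the segmentation algorithm (spectral mean/std plus size stand in for the fuller spectral/texture/shape feature set):

```python
import numpy as np
from sklearn.svm import SVC

def object_features(image, segments):
    """image: (H, W, B) array of B bands; segments: (H, W) integer object ids.
    Returns one feature vector per image object."""
    ids = np.unique(segments)
    feats = []
    for sid in ids:
        mask = segments == sid
        pix = image[mask]                                 # (n_pixels, B)
        feats.append(np.r_[pix.mean(axis=0), pix.std(axis=0), mask.sum()])
    return ids, np.array(feats)

# Usage sketch (train_idx and labels come from the sample image objects):
# ids, F = object_features(image, segments)
# clf = SVC().fit(F[train_idx], labels[train_idx])
# predicted = clf.predict(F)
```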

Journal ArticleDOI
TL;DR: A relatively new intelligent fault detection and classification method, W-SVM, is established and applied to induction motors for fault classification based on transient current signals; the results show high classification accuracy.
Abstract: This paper presents an intelligent system for fault detection and classification of induction motors using wavelet support vector machines (W-SVM). The support vector machine (SVM) is well known as an intelligent classifier with strong generalization ability. Nonlinear SVMs using kernel functions are widely used for multi-class classification procedures. In this paper, a kernel function built from wavelets is introduced and applied to an SVM multi-class classifier. Moreover, the feature vectors for training the classification routine are obtained from transient current signals preprocessed by the discrete wavelet transform. In this work, principal component analysis (PCA) and kernel PCA are performed to reduce the dimension of the features and to extract useful features for the classification process. Hence, a relatively new intelligent fault detection and classification method called W-SVM is established. This method is applied to induction motors for fault classification based on transient current signals. The results show that the classification achieves high accuracy in experimental work.
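A standard construction from the wavelet-kernel literature builds the kernel as a product of a mother wavelet over dimensions; the sketch below uses the common Morlet-style choice h(t) = cos(1.75t)·exp(−t²/2) (the paper's exact kernel and scale a may differ):

```python
import numpy as np
from sklearn.svm import SVC

def wavelet_kernel(A, B, a=1.0):
    """Gram matrix K(x, z) = prod_i h((x_i - z_i) / a),
    with mother wavelet h(t) = cos(1.75 t) * exp(-t^2 / 2)."""
    t = (A[:, None, :] - B[None, :, :]) / a
    return np.prod(np.cos(1.75 * t) * np.exp(-0.5 * t ** 2), axis=2)

# scikit-learn accepts a callable kernel returning the Gram matrix:
# clf = SVC(kernel=wavelet_kernel).fit(X_train, y_train)
```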

Proceedings ArticleDOI
31 Oct 2008
TL;DR: The proposed wrapper feature selection method GA-SVM can optimize feature subsets and SVM kernel parameters at the same time, and can therefore be applied to feature selection for hyperspectral data.
Abstract: The high-dimensional feature vectors of hyperspectral data often impose a high computational cost as well as the risk of "overfitting" when classification is performed. Therefore it is necessary to reduce the dimensionality through ways like feature selection. Currently, there are two kinds of feature selection methods: filter methods and wrapper methods. The former kind requires no feedback from classifiers and estimates the classification performance indirectly. The latter kind evaluates the "goodness" of the selected feature subset directly, based on the classification accuracy. Many experimental results have proved that the wrapper methods can yield better performance, although they have the disadvantage of high computational cost. In this paper, we present a Genetic Algorithm (GA) based wrapper method for classification of hyperspectral data using the Support Vector Machine (SVM), a state-of-the-art classifier that has found success in a variety of areas. The genetic algorithm, which seeks to solve optimization problems using the methods of evolution, specifically survival of the fittest, was used to optimize both the feature subset, i.e., the band subset, of the hyperspectral data and the SVM kernel parameters simultaneously. A special strategy was adopted to reduce the computation cost caused by the high-dimensional feature vectors of hyperspectral data when the feature-subset part of the chromosome was designed. The GA-SVM method was implemented in the ENVI/IDL language and then tested by applying it to a HYPERION hyperspectral image. Comparison of the optimized and un-optimized results showed that the GA-SVM method could significantly reduce the computation cost while improving the classification accuracy. The number of bands used for classification was reduced from 198 to 13, while the classification accuracy increased from 88.81% to 92.51%. The optimized values of the two SVM kernel parameters were 95.0297 and 0.2021, respectively, which differ from the default values used in the ENVI software. In conclusion, the proposed wrapper feature selection method GA-SVM can optimize feature subsets and SVM kernel parameters at the same time, and can therefore be applied to feature selection for hyperspectral data.
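A compact sketch of the GA-SVM wrapper (the population size, rates, and the wine dataset standing in for hyperspectral bands are all illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
n_bands = X.shape[1]

def fitness(chrom):
    mask = chrom[:n_bands].astype(bool)       # band-subset part
    if not mask.any():
        return 0.0
    clf = SVC(C=10.0 ** chrom[-2], gamma=10.0 ** chrom[-1])  # kernel part
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def make_chrom():
    return np.r_[rng.random(n_bands) < 0.5, rng.uniform(-2, 3), rng.uniform(-4, 1)]

pop = [make_chrom() for _ in range(20)]
for gen in range(15):
    fits = np.array([fitness(c) for c in pop])
    children = []
    for _ in range(len(pop)):
        picks = [rng.integers(len(pop), size=2) for _ in range(2)]
        a, b = (pop[i] if fits[i] >= fits[j] else pop[j] for i, j in picks)
        child = np.where(rng.random(a.shape) < 0.5, a, b)   # uniform crossover
        flip = rng.random(n_bands) < 0.05                   # mutate band bits
        child[:n_bands] = np.where(flip, 1 - child[:n_bands], child[:n_bands])
        child[-2:] += rng.normal(scale=0.2, size=2)         # mutate parameters
        children.append(child)
    pop = children

best = max(pop, key=fitness)
print("bands kept:", int(best[:n_bands].sum()), "CV accuracy:", fitness(best))
```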

Journal ArticleDOI
TL;DR: The proposed risk-sensitive loss functions minimize both the approximation and estimation errors; results indicate the superior performance of the neural classifier using the proposed loss functions in terms of both overall and per-class classification accuracy.

Journal ArticleDOI
TL;DR: The results indicate that the class-dependent feature subsets found by the proposed weight method can effectively remove irrelevant or redundant features, while maintaining or improving (sometimes substantially) the classification accuracy, in comparison with other feature selection methods.
Abstract: In this paper, we argue that for a C-class classification problem, C two-class classifiers, each of which discriminates one class from the other classes and has a characteristic input feature subset, should in general outperform, or at least match the performance of, a C-class classifier with one single input feature subset. For each class, we select a desirable feature subset, which leads to the lowest classification error rate for this class using a classifier for a given feature subset search algorithm. To fairly compare all models, we propose a weight method for the class-dependent classifier, i.e., assigning a weight to each model's output before the comparison is carried out. The method's performance is evaluated on two artificial data sets and several real-world benchmark data sets, with the support vector machine (SVM) as the classifier, and with RELIEF, class separability, and minimal-redundancy-maximal-relevancy (mRMR) as attribute importance measures. Our results indicate that the class-dependent feature subsets found by our approach can effectively remove irrelevant or redundant features, while maintaining or improving (sometimes substantially) the classification accuracy, in comparison with other feature selection methods.
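A minimal sketch of the class-dependent idea, with a univariate filter standing in for the paper's RELIEF/class-separability/mRMR measures and uniform weights for the output combination (both are illustrative simplifications):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
classes = np.unique(y_tr)

# One binary classifier per class, each with its own feature subset.
models = []
for c in classes:
    yc = (y_tr == c).astype(int)
    sel = SelectKBest(f_classif, k=5).fit(X_tr, yc)   # class-specific subset
    clf = SVC(kernel="linear").fit(sel.transform(X_tr), yc)
    models.append((sel, clf))

# Weight each model's output before comparing, then take the argmax.
weights = np.ones(len(classes))                        # illustrative weights
scores = np.column_stack([w * clf.decision_function(sel.transform(X_te))
                          for w, (sel, clf) in zip(weights, models)])
pred = classes[np.argmax(scores, axis=1)]
print("accuracy:", (pred == y_te).mean())
```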

Proceedings ArticleDOI
27 May 2008
TL;DR: An approach that performs EEG feature extraction during imagined right and left hand movements by using power spectral entropy (PSE) achieves good classification results with a time-variable linear classifier and provides a promising method for an on-line BCI system.
Abstract: Brain-Computer Interfaces (BCI) use electroencephalography (EEG) signals recorded from the scalp to create a new communication channel between the brain and an output device, bypassing conventional motor output pathways of nerves and muscles. One of the most important components of a BCI is feature extraction from EEG signals. How to rapidly and reliably extract EEG features that express the brain states of different mental tasks is crucial for accurate classification. This paper presents an approach that performs EEG feature extraction during imagined right and left hand movements by using power spectral entropy (PSE). It achieves good classification results with a time-variable linear classifier; the maximum accuracy reaches 90%. The results show that the PSE is a sensitive parameter for EEG of imagined hand movements. The method is simple and fast, and it provides a promising basis for an on-line BCI system.
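Power spectral entropy itself is a two-line computation: normalize the power spectrum into a probability distribution and take its Shannon entropy. A minimal sketch (windowing and band selection omitted for brevity):

```python
import numpy as np

def power_spectral_entropy(signal):
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    p = spectrum / spectrum.sum()    # normalize the power spectrum
    p = p[p > 0]                     # drop zero bins to avoid log(0)
    return -np.sum(p * np.log(p))    # Shannon entropy of the spectrum

# A single rhythm concentrates power (low PSE); noise spreads it (high PSE).
t = np.linspace(0, 1, 256, endpoint=False)
print(power_spectral_entropy(np.sin(2 * np.pi * 10 * t)))                  # low
print(power_spectral_entropy(np.random.default_rng(0).normal(size=256)))  # high
```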

Journal ArticleDOI
TL;DR: A novel strategy to model multiclass classification problems using subclass information in the ECOC framework is presented and it is shown that the proposed splitting procedure yields a better performance when the class overlap or the distribution of the training objects conceal the decision boundaries for the base classifier.
Abstract: A common way to model multiclass classification problems is by means of Error-Correcting Output Codes (ECOCs). Given a multiclass problem, the ECOC technique designs a code word for each class, where each position of the code identifies the membership of the class for a given binary problem. A classification decision is obtained by assigning the label of the class with the closest code. One of the main requirements of the ECOC design is that the base classifier is capable of splitting each subgroup of classes from each binary problem. However, we cannot guarantee that a linear classifier can model convex regions. Furthermore, nonlinear classifiers also fail to manage some types of surfaces. In this paper, we present a novel strategy to model multiclass classification problems using subclass information in the ECOC framework. Complex problems are solved by splitting the original set of classes into subclasses and embedding the binary problems in a problem-dependent ECOC design. Experimental results show that the proposed splitting procedure yields a better performance when the class overlap or the distribution of the training objects conceal the decision boundaries for the base classifier. The results are even more significant when one has a sufficiently large training size.
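The basic ECOC encode/train/decode cycle that the subclass strategy builds on fits in a few lines; a minimal sketch with a one-versus-all code matrix (real designs, and the paper's subclass extension, use richer problem-dependent codes):

```python
import numpy as np
from sklearn.svm import LinearSVC

def ecoc_fit(X, y, code):
    """Train one binary learner per code-word position.
    code: (n_classes, n_bits) matrix with +1/-1 entries."""
    return [LinearSVC().fit(X, np.where(code[y, b] > 0, 1, -1))
            for b in range(code.shape[1])]

def ecoc_predict(models, code, X):
    # Decode: assign the class whose code word is closest in Hamming distance.
    bits = np.sign(np.column_stack([m.decision_function(X) for m in models]))
    dists = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
    return np.argmin(dists, axis=1)

code = np.array([[ 1, -1, -1],     # illustrative one-vs-all code, 3 classes
                 [-1,  1, -1],
                 [-1, -1,  1]])
```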

Journal ArticleDOI
TL;DR: In this article, a robust feature selection method using the zero-norm l0 in the context of support vector machines (SVMs) is proposed; the resulting DC algorithm has finite convergence and requires solving one linear program at each iteration.
Abstract: Feature selection consists of choosing a subset of available features that capture the relevant properties of the data. In supervised pattern classification, a good choice of features is fundamental for building compact and accurate classifiers. In this paper, we develop an efficient feature selection method using the zero-norm l0 in the context of support vector machines (SVMs). The discontinuity of l0 at the origin makes the corresponding optimization problem difficult to solve. To overcome this drawback, we use a robust DC (difference of convex functions) programming approach, a general framework for non-convex continuous optimisation. We consider an appropriate continuous approximation to l0 such that the resulting problem can be formulated as a DC program. Our DC algorithm (DCA) has finite convergence and requires solving one linear program at each iteration. Computational experiments on standard datasets, including challenging feature-selection problems from the NIPS 2003 feature selection challenge and gene selection for cancer classification, show that the proposed method is promising: while it suppresses, in some cases, more than 99% of the features, it can still provide good classification. Moreover, the comparative results illustrate the superiority of the proposed approach over standard methods such as classical SVMs and feature selection via concave minimization.
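The DC idea can be sketched as iterative reweighting: approximate ||w||_0 by sum_j (1 − exp(−α|w_j|)); each DCA step then minimizes the loss plus a weighted l1 term. The paper solves a linear program per iteration; the loose sketch below instead rescales features and reuses an l1-penalized linear SVM, so treat it as an approximation of the scheme, not the authors' algorithm:

```python
import numpy as np
from sklearn.svm import LinearSVC

def dc_zero_norm_svm(X, y, alpha=5.0, n_iters=5, C=1.0):
    p = X.shape[1]
    c = np.ones(p)                     # per-feature l1 weights
    for _ in range(n_iters):
        # Dividing column j by c_j turns the weighted l1 penalty
        # sum_j c_j |w_j| into a standard l1 penalty on u = c * w.
        Xs = X / c
        clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                        C=C).fit(Xs, y)
        w = clf.coef_.ravel() / c      # map back to the original weights
        c = alpha * np.exp(-alpha * np.abs(w)) + 1e-8   # DCA reweighting
    return w                           # zero entries = suppressed features
```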

Journal ArticleDOI
TL;DR: A linear classifier is developed, based on robust features extracted from normalized power spectra and autocorrelation functions, as well as novel features from the collapsed average, which characterize transient and periodic properties of the signal envelope.
Abstract: This work presents a non-invasive high-throughput system for automatically detecting characteristic behaviours in mice over extended periods of time, useful for phenotyping experiments. The system classifies time intervals on the order of 2 to 4 seconds as corresponding to motions consistent with either active wake or inactivity associated with sleep. A single Polyvinylidene Difluoride (PVDF) sensor on the cage floor generates signals from the pressure produced by motion. This paper develops a linear classifier based on robust features extracted from normalized power spectra and autocorrelation functions, as well as novel features from the collapsed average (the autocorrelation of the complex spectrum), which characterize transient and periodic properties of the signal envelope. Performance is analyzed through an experiment comparing results from direct human observation and classification of the different behaviours with an automatic classifier used in conjunction with this system. Experimental results from over 28.5 hours of data from 4 mice indicate a 94% classification rate relative to the human observations. Examples of sequential classifications (2-second increments) over transition regions between sleep and wake behaviour are also presented, demonstrating robustness to signal variation and explaining performance limitations.

Journal ArticleDOI
TL;DR: A rule-based classifier derived from an improved genetic algorithm approach is proposed to determine knowledge rules for land-cover classification automatically from remote sensing image datasets; preliminary results indicate that the proposed GA rule-based approach for land-cover classification is promising.
Abstract: Classification of land-cover information using remotely-sensed imagery is a challenging topic due to the complexity of landscapes and the spatial and spectral resolution of the images being used. Early studies of land-cover classification used statistical methods such as the maximum likelihood classifier. Recently, however, numerous studies have applied artificial intelligence techniques, for example expert systems, artificial neural networks and support vector machines, as alternatives for remotely-sensed image classification. A major drawback of these models is that the user cannot readily interpret the final rules. In this paper, a rule-based classifier derived from an improved genetic algorithm approach is proposed to determine knowledge rules for land-cover classification automatically from remote sensing image datasets. The proposed algorithm is demonstrated on two image dataset classification problems. Results are compared to other approaches in the literature. The preliminary results indicate that the proposed GA rule-based approach for land-cover classification is promising.

Journal ArticleDOI
TL;DR: A linearization algorithm is proposed that solves a succession of fast linear programs, converging in a few iterations to a local solution that is competitive with considerably more complex integer programming and other formulations.
Abstract: The multiple instance classification problem (Dietterich et al., Artif. Intell. 89:31–71, [1998]; Auer, Proceedings of 14th International Conference on Machine Learning, pp. 21–29, Morgan Kaufmann, San Mateo, [1997]; Long et al., Mach. Learn. 30(1):7–22, [1998]) is formulated using a linear or nonlinear kernel as the minimization of a linear function in a finite-dimensional (noninteger) real space subject to linear and bilinear constraints. A linearization algorithm is proposed that solves a succession of fast linear programs that converges in a few iterations to a local solution. Computational results on a number of datasets indicate that the proposed algorithm is competitive with the considerably more complex integer programming and other formulations. A distinguishing aspect of our linear classifier not shared by other multiple instance classifiers is the sparse number of features it utilizes. In some tasks, the reduction amounts to less than one percent of the original features.