
Showing papers on "Feature selection published in 1999"


Book
10 Sep 1999
TL;DR: This book provides an introduction to statistical pattern recognition, covering density estimation, linear and nonlinear discriminant analysis (via both neural networks and statistical methods), classification trees, feature selection and extraction, and clustering.
Abstract: Introduction to statistical pattern recognition * Estimation * Density estimation * Linear discriminant analysis * Nonlinear discriminant analysis - neural networks * Nonlinear discriminant analysis - statistical methods * Classification trees * Feature selection and extraction * Clustering * Additional topics * Measures of dissimilarity * Parameter estimation * Linear algebra * Data * Probability theory

1,813 citations


01 Apr 1999
TL;DR: This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems and performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases.
Abstract: Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.

1,653 citations


Journal ArticleDOI
01 Jun 1999
TL;DR: An algorithmic framework for solving the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves, is developed and tested.
Abstract: The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.
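The abstract describes projected clustering only at a high level. As a rough illustration of the core idea (each cluster keeps its own subset of dimensions), the sketch below alternates between assigning points to medoids and picking, for each cluster, the dimensions with the smallest within-cluster spread. This is a simplified analogue, not the authors' algorithm, and all names and parameter values are illustrative.

```python
import numpy as np

def projected_clusters(X, k, dims_per_cluster, n_iter=10, seed=0):
    """Toy projected clustering: k medoids, each with its own dimension subset.

    A simplified illustration of cluster-specific dimension selection,
    not the algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    medoid_idx = rng.choice(n, size=k, replace=False)
    dims = [np.arange(d)] * k            # start with all dimensions per cluster

    for _ in range(n_iter):
        # Assign each point to the medoid that is closest in that medoid's subspace.
        dists = np.stack([
            np.abs(X[:, dims[j]] - X[medoid_idx[j], dims[j]]).mean(axis=1)
            for j in range(k)
        ], axis=1)
        labels = dists.argmin(axis=1)

        # For each cluster, keep the dimensions with the smallest average spread.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            spread = np.abs(members - X[medoid_idx[j]]).mean(axis=0)
            dims[j] = np.argsort(spread)[:dims_per_cluster]

    return labels, dims

# Usage: two clusters that are tight in different pairs of dimensions.
rng = np.random.default_rng(1)
X = np.vstack([
    np.c_[rng.normal(0, 0.1, (50, 2)), rng.uniform(-5, 5, (50, 2))],
    np.c_[rng.uniform(-5, 5, (50, 2)), rng.normal(3, 0.1, (50, 2))],
])
labels, dims = projected_clusters(X, k=2, dims_per_cluster=2)
print(dims)  # ideally each cluster picks the dimensions in which it is tight
```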

1,111 citations


Book
Shigeo Abe1
26 Oct 1999
TL;DR: This book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors, and discusses kernel methods for improving the generalization ability of neural networks and fuzzy systems.
Abstract: A guide on the use of SVMs in pattern classification, including a rigorous performance comparison of classifiers and regressors. The book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors. Features: Clarifies the characteristics of two-class SVMs; Discusses kernel methods for improving the generalization ability of neural networks and fuzzy systems; Contains ample illustrations and examples; Includes performance evaluation using publicly available data sets; Examines Mahalanobis kernels, empirical feature space, and the effect of model selection by cross-validation; Covers sparse SVMs, learning using privileged information, semi-supervised learning, multiple classifier systems, and multiple kernel learning; Explores incremental training based on batch training and active-set training methods, and decomposition techniques for linear programming SVMs; Discusses variable selection for support vector regressors.

1,002 citations


Proceedings ArticleDOI
01 Aug 1999
TL;DR: An unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase, and a refinement to center adjustment, “vector average damping,” that further improves cluster quality.
Abstract: Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in high-dimensional space, then clustering algorithms automatically group the points into a hierarchy of clusters. We describe an unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase. We introduce a methodology for measuring the quality of a cluster hierarchy in terms of F-measure, and present the results of experiments comparing different algorithms. The evaluation considers some feature selection parameters (tf-idf and feature vector length) but focuses on the clustering algorithms, namely techniques from Scatter/Gather (buckshot, fractionation, and split/join) and k-means. Our experiments suggest that continuous center adjustment contributes more to cluster quality than seed selection does. It follows that using a simpler seed selection algorithm gives a better time/quality tradeoff. We describe a refinement to center adjustment, “vector average damping,” that further improves cluster quality. We also compare the near-linear time algorithms to a group average greedy agglomerative clustering algorithm to demonstrate the time/quality tradeoff quantitatively.
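As a rough illustration of the two phases described above (feature extraction into a high-dimensional vector space, then clustering), the sketch below uses scikit-learn's tf-idf vectorizer and plain k-means. It does not implement the buckshot/fractionation seeding or the vector-average-damping refinement from the paper, and the sample documents are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "stock markets fell sharply on interest rate fears",
    "central bank raises interest rates again",
    "the team won the championship after extra time",
    "injury forces star striker out of the final match",
]

# Phase 1: feature extraction -- map each document to a point in tf-idf space.
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X = vectorizer.fit_transform(docs)

# Phase 2: clustering -- plain k-means with random seeding stands in for
# the buckshot/fractionation variants discussed in the paper.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: finance vs. sport
```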

958 citations


Proceedings Article
01 May 1999
TL;DR: A new filter approach to feature selection that uses a correlation-based heuristic to evaluate the worth of feature subsets when applied as a data preprocessing step for two common machine learning algorithms.
Abstract: Feature selection is often an essential data processing step prior to applying a learning algorithm. The removal of irrelevant and redundant information often improves the performance of machine learning algorithms. There are two common approaches: a wrapper uses the intended learning algorithm itself to evaluate the usefulness of features, while a filter evaluates features according to heuristics based on general characteristics of the data. The wrapper approach is generally considered to produce better feature subsets but runs much more slowly than a filter. This paper describes a new filter approach to feature selection that uses a correlation-based heuristic to evaluate the worth of feature subsets. When applied as a data preprocessing step for two common machine learning algorithms, the new method compares favourably with the wrapper but requires much less computation.
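The correlation-based heuristic is described only verbally above. The merit score commonly associated with correlation-based feature selection rewards subsets whose features correlate strongly with the class but weakly with each other. Below is a minimal sketch under that assumption, using Pearson correlation as the association measure and a greedy forward search; all names are illustrative and this is not the paper's exact procedure.

```python
import numpy as np

def merit(X, y, subset):
    """Correlation-based merit of a feature subset.

    merit = k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the mean
    absolute feature-class correlation and r_ff the mean absolute
    feature-feature correlation within the subset.
    """
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y):
    """Greedy forward search that adds the feature improving the merit most."""
    remaining, chosen, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, f = max((merit(X, y, chosen + [f]), f) for f in remaining)
        if score <= best:
            break
        best, chosen = score, chosen + [f]
        remaining.remove(f)
    return chosen

# Usage: features 0 and 1 carry the signal, 2 is redundant with 0, 3 is noise.
rng = np.random.default_rng(0)
f0, f1 = rng.normal(size=200), rng.normal(size=200)
X = np.c_[f0, f1, f0 + 0.01 * rng.normal(size=200), rng.normal(size=200)]
y = (f0 + f1 > 0).astype(float)
print(greedy_cfs(X, y))  # expected to favour features 0 and 1
```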

547 citations


Proceedings Article
27 Jun 1999
TL;DR: This paper describes an approach to feature subset selection that takes into account problem specifics and learning algorithm characteristics, and shows that considering domain and algorithm characteristics significantly improves the results of classification.
Abstract: This paper describes an approach to feature subset selection that takes into account problem specifics and learning algorithm characteristics. It is developed for the Naive Bayesian classifier applied on text data, since it combines well with the addressed learning problems. We focus on domains with many features that also have a highly unbalanced class distribution and asymmetric misclassification costs given only implicitly in the problem. By asymmetric misclassification costs we mean that one of the class values is the target class value for which we want to get predictions, and we prefer false positives over false negatives. Our example problem is automatic document categorization using machine learning, where we want to identify documents relevant for the selected category. Usually, only about 1%-10% of examples belong to the selected category. Our experimental comparison of eleven feature scoring measures shows that considering domain and algorithm characteristics significantly improves the results of classification.
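The abstract compares eleven scoring measures without spelling any of them out. One measure often used in this unbalanced, target-class setting is the odds ratio, which favours words that are frequent in the positive (target) class and rare in the negative class. The sketch below computes it from word document frequencies with add-one smoothing; the exact measures and smoothing in the paper may differ, and the tiny corpus is made up.

```python
import math
from collections import Counter

def odds_ratio_scores(docs, labels):
    """Score each word by the log odds ratio of appearing in positive vs. negative docs.

    docs: list of token lists; labels: 1 for the target class, 0 otherwise.
    Add-one smoothing keeps the ratio finite for words seen in only one class.
    """
    pos_docs = [set(d) for d, l in zip(docs, labels) if l == 1]
    neg_docs = [set(d) for d, l in zip(docs, labels) if l == 0]
    pos_df, neg_df = Counter(), Counter()
    for d in pos_docs:
        pos_df.update(d)
    for d in neg_docs:
        neg_df.update(d)
    scores = {}
    for w in set(pos_df) | set(neg_df):
        p_pos = (pos_df[w] + 1) / (len(pos_docs) + 2)
        p_neg = (neg_df[w] + 1) / (len(neg_docs) + 2)
        scores[w] = math.log((p_pos * (1 - p_neg)) / ((1 - p_pos) * p_neg))
    return scores

# Usage: tiny corpus where "refund" marks the rare target class.
docs = [["refund", "invoice"], ["refund", "claim"],
        ["meeting", "agenda"], ["lunch", "menu"], ["meeting", "notes"]]
labels = [1, 1, 0, 0, 0]
scores = odds_ratio_scores(docs, labels)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```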

464 citations


Journal ArticleDOI
TL;DR: It is concluded that stepwise selection may result in a substantial bias of estimated regression coefficients of selected covariables, similar to that found in the GUSTO-I trial.

451 citations


Journal ArticleDOI
TL;DR: A segmented, and possibly multistage, principal components transformation (PCT) is proposed for efficient hyperspectral remote-sensing image classification and display and results have been obtained in terms of classification accuracy, speed, and quality of color image display using two airborne visible/infrared imaging spectrometer (AVIRIS) data sets.
Abstract: A segmented, and possibly multistage, principal components transformation (PCT) is proposed for efficient hyperspectral remote-sensing image classification and display. The scheme requires, initially, partitioning the complete set of bands into several highly correlated subgroups. After separate transformation of each subgroup, the single-band separabilities are used as a guide to carry out feature selection. The selected features can then be transformed again to achieve a satisfactory data reduction ratio and generate the three most significant components for color display. The scheme reduces the computational load significantly for feature extraction, compared with the conventional PCT. A reduced number of features will also accelerate the maximum likelihood classification process significantly, and the process will not suffer the limitations encountered by trying to use the full set of hyperspectral data when training samples are limited. Encouraging results have been obtained in terms of classification accuracy, speed, and quality of color image display using two airborne visible/infrared imaging spectrometer (AVIRIS) data sets.
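A compact way to picture the segmented PCT is to partition the bands into groups, run an ordinary PCA within each group, and keep only the leading components of each group before any further transformation or feature selection. The sketch below does just that on synthetic data; in the paper the grouping is driven by inter-band correlation rather than the fixed, equal-size segments assumed here.

```python
import numpy as np

def segmented_pct(X, n_groups, comps_per_group):
    """First stage of a segmented principal components transformation.

    X: (n_pixels, n_bands). Bands are split into contiguous groups, each group
    is transformed separately, and the leading components of each group are kept.
    """
    groups = np.array_split(np.arange(X.shape[1]), n_groups)
    features = []
    for g in groups:
        Xg = X[:, g] - X[:, g].mean(axis=0)
        # PCA of the group via eigendecomposition of its covariance matrix.
        cov = np.cov(Xg, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        order = np.argsort(vals)[::-1][:comps_per_group]
        features.append(Xg @ vecs[:, order])
    return np.hstack(features)

# Usage: 500 "pixels" with 60 highly correlated synthetic bands.
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 6))
X = np.repeat(base, 10, axis=1) + 0.05 * rng.normal(size=(500, 60))
reduced = segmented_pct(X, n_groups=3, comps_per_group=2)
print(reduced.shape)  # (500, 6)
```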

408 citations


Proceedings ArticleDOI
29 Jan 1999
TL;DR: The Support Vector Machine (SVM) as discussed by the authors is a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data.
Abstract: The Support Vector Machine provides a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data. We demonstrate its success on a difficult classification problem from hyperspectral remote sensing, where we obtain performances of 96%, and 87% correct for a 4 class problem, and a 16 class problem respectively. These results are somewhat better than other recent results on the same data. A key feature of this classifier is its ability to use high-dimensional data without the usual recourse to a feature selection step to reduce the dimensionality of the data. For this application, this is important, as hyperspectral data consists of several hundred contiguous spectral channels for each exemplar. We provide an introduction to this new approach, and demonstrate its application to classification of an agriculture scene.
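To make the classifier's key property concrete (it consumes hundreds of spectral channels directly, with no feature selection step), here is a minimal sketch with scikit-learn's SVC; since the AVIRIS data are not reproduced here, synthetic stand-in data are used and the numbers will not match the paper's.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Synthetic stand-in for hyperspectral pixels: 4 classes, 200 "spectral channels".
rng = np.random.default_rng(0)
n_per_class, n_channels = 100, 200
means = rng.normal(scale=2.0, size=(4, n_channels))
X = np.vstack([m + rng.normal(size=(n_per_class, n_channels)) for m in means])
y = np.repeat(np.arange(4), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# The SVM is applied to the full-dimensional data directly -- no prior
# feature selection or dimensionality reduction step.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```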

383 citations


Proceedings Article
18 Jul 1999
TL;DR: This paper presents an ensemble feature selection approach that is based on genetic algorithms and shows improved performance over the popular and powerful ensemble approaches of AdaBoost and Bagging and demonstrates the utility of ensemble features selection.
Abstract: The traditional motivation behind feature selection algorithms is to find the best subset of features for a task using one particular learning algorithm. Given the recent success of ensembles, however, we investigate the notion of ensemble feature selection in this paper. This task is harder than traditional feature selection in that one not only needs to find features germane to the learning task and learning algorithm, but one also needs to find a set of feature subsets that will promote disagreement among the ensemble's classifiers. In this paper, we present an ensemble feature selection approach that is based on genetic algorithms. Our algorithm shows improved performance over the popular and powerful ensemble approaches of AdaBoost and Bagging and demonstrates the utility of ensemble feature selection.
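The abstract describes the approach only at a high level. A stripped-down sketch of the genetic-algorithm machinery is shown below, applied to selecting a single feature subset rather than the paper's one-subset-per-ensemble-member setup, and scoring each candidate by validation accuracy alone (the paper's fitness also rewards diversity among members). All parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Validation accuracy of a tree trained on the features selected by mask."""
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:, mask], y_tr)
    return clf.score(X_val[:, mask], y_val)

# Simple generational GA over feature bit-masks.
pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the best half
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])               # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05            # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```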

Journal ArticleDOI
TL;DR: A new suboptimal search strategy for feature selection that represents a more sophisticated version of “classical” floating search algorithms and facilitates finding a solution even closer to the optimal one.

Proceedings Article
29 Nov 1999
TL;DR: The efficacy of the methods is illustrated on a radar signal analysis problem to find 2-D viewing coordinates for data visualization and to select inputs for a neural network classifier.
Abstract: Data visualization and feature selection methods are proposed based on the joint mutual information and ICA. The visualization methods can find many good 2-D projections for high dimensional data interpretation, which cannot be easily found by the other existing methods. The new variable selection method is found to be better in eliminating redundancy in the inputs than other methods based on simple mutual information. The efficacy of the methods is illustrated on a radar signal analysis problem to find 2-D viewing coordinates for data visualization and to select inputs for a neural network classifier.
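As a rough, simplified counterpart to the variable selection described above, the sketch below ranks inputs by their estimated mutual information with the class using scikit-learn. The paper's method goes further by using joint mutual information (and ICA) to avoid picking redundant inputs, which this one-variable-at-a-time ranking does not capture; the data here are synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Synthetic inputs: x0 and x1 determine the class, x2 duplicates x0, x3 is noise.
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=1000), rng.normal(size=1000)
X = np.c_[x0, x1, x0 + 0.01 * rng.normal(size=1000), rng.normal(size=1000)]
y = (x0 + x1 > 0).astype(int)

mi = mutual_info_classif(X, y, random_state=0)
print(np.round(mi, 3))
# A plain MI ranking scores the redundant x2 almost as high as x0 because it
# ignores dependence between inputs -- the gap joint-mutual-information methods address.
```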

Journal ArticleDOI
01 Dec 1999
TL;DR: Two new clustering algorithms are introduced that can effectively cluster documents, even in the presence of a very high dimensional feature space, and do not require pre-specified ad hoc distance functions and are capable of automatically discovering document similarities or associations.
Abstract: Clustering techniques have been used by many intelligent software agents in order to retrieve, filter, and categorize documents available on the World Wide Web. Clustering is also useful in extracting salient features of related Web documents to automatically formulate queries and search for other similar documents on the Web. Traditional clustering algorithms either use a priori knowledge of document structures to define a distance or similarity among these documents, or use probabilistic techniques such as Bayesian classification. Many of these traditional algorithms, however, falter when the dimensionality of the feature space becomes high relative to the size of the document space. In this paper, we introduce two new clustering algorithms that can effectively cluster documents, even in the presence of a very high dimensional feature space. These clustering techniques, which are based on generalizations of graph partitioning, do not require pre-specified ad hoc distance functions, and are capable of automatically discovering document similarities or associations. We conduct several experiments on real Web data using various feature selection heuristics, and compare our clustering schemes to standard distance-based techniques, such as hierarchical agglomeration clustering, and Bayesian classification methods, such as AutoClass.

Book
02 Mar 1999
TL;DR: Decision functions * Classification by distance functions and clustering * Classification using statistical approach * Feature selection * Fuzzy classification and pattern recognition * Syntactic pattern recognition * Neural nets and pattern classification.
Abstract: Decision functions * Classification by distance functions and clustering * Classification using statistical approach * Feature selection * Fuzzy classification and pattern recognition * Syntactic pattern recognition * Neural nets and pattern classification.

Journal ArticleDOI
TL;DR: The GA was found to be an expedient solution compared to editing followed by feature selection, feature selection followed by editing, and the individual results from feature selection and editing.

Journal ArticleDOI
01 May 1999
TL;DR: MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier, is presented, which significantly outperformed several standard NN variants and was competitive with boosted decision trees.
Abstract: Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging, Boosting, or Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, these combining methods do not improve the nearest neighbor classifier. In this paper, we present MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MFS combines multiple NN classifiers each using only a random subset of features. The experimental results are encouraging: On 25 datasets from the UCI repository, MFS significantly outperformed several standard NN variants and was competitive with boosted decision trees. In additional experiments, we show that MFS is robust to irrelevant features, and is able to reduce both bias and variance components of error.
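The idea behind MFS is simple enough to sketch directly: train several nearest-neighbor classifiers, each restricted to its own random subset of the features, and combine their predictions by voting. The version below is a minimal illustration with scikit-learn, not the authors' exact experimental setup; the subset size and number of members are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(0)

# Train several 1-NN classifiers, each on its own random feature subset.
n_members, subset_size = 15, 2
subsets, members = [], []
for _ in range(n_members):
    feats = rng.choice(X.shape[1], size=subset_size, replace=False)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr[:, feats], y_tr)
    subsets.append(feats)
    members.append(clf)

# Combine the members by simple majority vote.
votes = np.stack([m.predict(X_te[:, f]) for m, f in zip(members, subsets)])
ensemble_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print("ensemble accuracy:", np.mean(ensemble_pred == y_te))
```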

Journal ArticleDOI
TL;DR: In this article, four methods of variable selection along with different criteria levels for deciding on the number of variables to retain were examined along with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component.
Abstract: In many large environmental datasets redundant variables can be discarded without the loss of extra variation. Principal components analysis can be used to select those variables that contain the most information. Using an environmental dataset consisting of 36 meteorological variables spanning 37 years, four methods of variable selection are examined along with different criteria levels for deciding on the number of variables to retain. Procrustes analysis, a measure of similarity and bivariate plots are used to assess the success of the alternative variable selection methods and criteria levels in extracting representative variables. The Broken-stick model is a consistent approach to choosing significant principal components and is chosen here as the more suitable criterion in combination with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component. Copyright © 1999 John Wiley & Sons, Ltd.
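The broken-stick criterion mentioned above has a simple closed form: the expected size of the k-th largest of p random pieces of a unit stick is b_k = (1/p) * sum_{i=k}^{p} 1/i, and a principal component is retained while its proportion of explained variance exceeds b_k. The sketch below covers only this rule, not the full variable-selection and Procrustes comparison; names and data are illustrative.

```python
import numpy as np

def broken_stick(p):
    """Expected proportions b_1..b_p of a unit stick broken into p random pieces."""
    return np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])

def n_significant_components(X):
    """Number of leading PCs whose variance proportion exceeds the broken-stick value."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    proportions = eigvals / eigvals.sum()
    above = proportions > broken_stick(len(eigvals))
    return int(np.argmin(above)) if not above.all() else len(eigvals)

# Usage: 8 variables driven by only ~2 underlying sources of variation.
rng = np.random.default_rng(0)
sources = rng.normal(size=(300, 2))
X = sources @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(300, 8))
print(n_significant_components(X))  # typically 2
```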

Proceedings ArticleDOI
10 Jul 1999
TL;DR: The wavelet packet transform is introduced as an alternative means of extracting time-frequency information from vibration signatures with the aid of statistically based feature selection criteria, which significantly reduces the long training time often associated with neural network classifiers and increases the generalization ability of the neural network classifier.
Abstract: Condition monitoring of dynamic systems based on vibration signatures has generally relied upon Fourier-based analysis as a means of translating vibration signals from the time domain into the frequency domain. However, Fourier analysis provides a poor representation of signals that are well localized in time. The wavelet packet transform is introduced as an alternative means of extracting time-frequency information from vibration signatures. Moreover, with the aid of statistically based feature selection criteria, feature components containing little discriminant information can be discarded, resulting in a feature subset with a reduced number of parameters. This significantly reduces the long training time often associated with neural network classifiers and increases the generalization ability of the neural network classifier.
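A minimal sketch of the extraction step described above might decompose each vibration signal with a wavelet packet transform (here via the PyWavelets package), use the energy of each terminal node as a feature, and rank those features with a simple statistical criterion such as the Fisher score. The paper does not specify the wavelet, depth, or selection criterion, so these choices and the synthetic signals are assumptions.

```python
import numpy as np
import pywt

def wavelet_packet_energies(signal, wavelet="db4", level=3):
    """Energy of each terminal node of the wavelet packet decomposition."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="natural")
    return np.array([np.sum(np.asarray(n.data) ** 2) for n in nodes])

def fisher_scores(F, y):
    """Per-feature Fisher score: between-class over within-class variance."""
    classes = np.unique(y)
    overall = F.mean(axis=0)
    between = sum((F[y == c].mean(axis=0) - overall) ** 2 * np.sum(y == c) for c in classes)
    within = sum(F[y == c].var(axis=0) * np.sum(y == c) for c in classes)
    return between / (within + 1e-12)

# Usage: two synthetic "conditions" differing in a high-frequency component.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 512)
healthy = [np.sin(2 * np.pi * 10 * t) + 0.1 * rng.normal(size=t.size) for _ in range(20)]
faulty = [np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.1 * rng.normal(size=t.size) for _ in range(20)]
F = np.array([wavelet_packet_energies(s) for s in healthy + faulty])
y = np.array([0] * 20 + [1] * 20)
scores = fisher_scores(F, y)
print("most discriminative packet nodes:", np.argsort(scores)[::-1][:3])
```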


Journal ArticleDOI
TL;DR: A novel artificial neural-network decision tree algorithm (ANN-DT), which extracts binary decision trees from a trained neural network, and is shown to have significant benefits in certain cases when compared with the standard criteria of minimum weighted variance over the branches.
Abstract: Although artificial neural networks can represent a variety of complex systems with a high degree of accuracy, these connectionist models are difficult to interpret. This significantly limits the applicability of neural networks in practice, especially where a premium is placed on the comprehensibility or reliability of systems. A novel artificial neural-network decision tree algorithm (ANN-DT) is therefore proposed, which extracts binary decision trees from a trained neural network. The ANN-DT algorithm uses the neural network to generate outputs for samples interpolated from the training data set. In contrast to existing techniques, ANN-DT can extract rules from feedforward neural networks with continuous outputs. These rules are extracted from the neural network without making assumptions about the internal structure of the neural network or the features of the data. A novel attribute selection criterion based on a significance analysis of the variables on the neural-network output is examined. It is shown to have significant benefits in certain cases when compared with the standard criteria of minimum weighted variance over the branches. In three case studies the ANN-DT algorithm compared favorably with CART, a standard decision tree algorithm.
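The core mechanism described above (use the trained network as an oracle to label extra samples interpolated from the training data, then fit a decision tree to those labels) is easy to sketch. The version below uses standard scikit-learn models and plain interpolation between random pairs of training points; it does not reproduce the paper's significance-based attribute selection criterion.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(20, 20), max_iter=2000,
                    random_state=0).fit(X, y)

# Generate extra points by interpolating between random pairs of training samples,
# and label them with the trained network (the network acts as an oracle).
rng = np.random.default_rng(0)
i, j = rng.integers(len(X), size=(2, 3000))
alpha = rng.random((3000, 1))
X_interp = alpha * X[i] + (1 - alpha) * X[j]
X_aug = np.vstack([X, X_interp])
y_aug = net.predict(X_aug)

# Fit a shallow decision tree to the network's behaviour.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_aug, y_aug)
print("fidelity to the network:", np.mean(tree.predict(X_aug) == y_aug))
print(export_text(tree, feature_names=["x1", "x2"]))
```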

Journal ArticleDOI
TL;DR: A new approach to computer-supported diagnosis of skin tumors in dermatology is presented, using neural networks with error back-propagation as the learning paradigm to optimize the classification performance of the neural classifiers.

Proceedings Article
18 Jul 1999
TL;DR: Results suggest a simple strategy for SVM text categorization: use the full set of words found through a rough filtering technique like part-of-speech tagging, since SVMs cannot by themselves discard irrelevant parts of speech.
Abstract: This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for learning from the information-theoretic and the linguistic perspectives, respectively. We tested the two filtering methods on SVMs as well as a decision tree algorithm, C4.5. The SVMs' results common to both filtering methods are that 1) the optimal number of features differed completely across categories, and 2) the average performance for all categories was best when all of the words were used. In addition, a comparison of the two filtering methods clarified that POS filtering on SVMs consistently outperformed MI filtering, which indicates that SVMs cannot find irrelevant parts of speech. These results suggest a simple strategy for SVM text categorization: use the full set of words found through a rough filtering technique like part-of-speech tagging.

Journal ArticleDOI
TL;DR: This paper first briefly introduces baseline statistical methods used in regression and classification, then describes families of methods which have been developed specifically for neural networks, and compared on different test problems.
Abstract: The observed features of a given phenomenon are not all equally informative: some may be noisy, others correlated or irrelevant. The purpose of feature selection is to select a set of features pertinent to a given task. This is a complex process, but it is an important issue in many fields. In neural networks, feature selection has been studied for the last ten years, using conventional and original methods. This paper is a review of neural network approaches to feature selection. We first briefly introduce baseline statistical methods used in regression and classification. We then describe families of methods which have been developed specifically for neural networks. Representative methods are then compared on different test problems.

Journal ArticleDOI
TL;DR: Results on applying the evidence framework to the real-world data sets showed that committees of Bayesian networks achieved classification accuracies similar to the best alternative methods with a minimum of human intervention.

Journal ArticleDOI
TL;DR: A new approach to combine multiple features in handwriting recognition based on two ideas: feature selection-based combination and class dependent features that are effective in separating pattern classes and the new feature vector derived from a combination of two types of such features further improves the recognition rate.
Abstract: In this paper, we propose a new approach to combine multiple features in handwriting recognition based on two ideas: feature selection-based combination and class dependent features. A nonparametric method is used for feature evaluation, and the first part of this paper is devoted to the evaluation of features in terms of their class separation and recognition capabilities. In the second part, multiple feature vectors are combined to produce a new feature vector. Based on the fact that a feature has different discriminating powers for different classes, a new scheme of selecting and combining class-dependent features is proposed. In this scheme, a class is considered to have its own optimal feature vector for discriminating itself from the other classes. Using an architecture of modular neural networks as the classifier, a series of experiments were conducted on unconstrained handwritten numerals. The results indicate that the selected features are effective in separating pattern classes and the new feature vector derived from a combination of two types of such features further improves the recognition rate.

Journal ArticleDOI
TL;DR: This work proposes an informative prior distribution for variable selection and proposes novel methods for computing the marginal distribution of the data for the logistic regression model.
Abstract: Summary. Bayesian selection of variables is often difficult to carry out because of the challenge in specifying prior distributions for the regression parameters for all possible models, specifying a prior distribution on the model space and computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables in that the elicitation is based on a prior prediction y0 for the response vector and a quantity a0 quantifying the uncertainty in y0. Then, y0 and a0 are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.

Journal ArticleDOI
TL;DR: Using a forward selection procedure with the root mean square error from a multilinear regression model as the selection criterion, it was possible to obtain good prediction accuracy from a back-propagation artificial neural network (ANN).
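The selection procedure summarized above can be sketched directly: at each step, add the variable whose inclusion most reduces the root mean square error of a multilinear regression fit, and (in the paper) hand the selected variables to a back-propagation network. The sketch below covers only the forward-selection part, on synthetic data with illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_select(X, y, max_vars):
    """Greedy forward selection using multilinear-regression RMSE as the criterion."""
    remaining, chosen = list(range(X.shape[1])), []
    while remaining and len(chosen) < max_vars:
        rmses = []
        for f in remaining:
            cols = chosen + [f]
            pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
            rmses.append((np.sqrt(np.mean((y - pred) ** 2)), f))
        best_rmse, best_f = min(rmses)
        chosen.append(best_f)
        remaining.remove(best_f)
    return chosen

# Usage: only variables 0 and 3 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 0] - 3 * X[:, 3] + 0.1 * rng.normal(size=200)
print(forward_select(X, y, max_vars=2))  # expected: [3, 0] or [0, 3]
```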

Journal Article
TL;DR: A practical framework for the semi-automatic construction of evaluation-functions for games based on a structured evaluation function representation is presented that is able to discover new features in a computationally feasible way.
Abstract: This paper discusses a practical framework for the semiautomatic construction of evaluation-functions for games. Based on a structured evaluation function representation, a procedure for exploring the feature space is presented that is able to discover new features in a computationally feasible way. Besides the theoretical aspects, related practical issues such as the generation of training positions, feature selection, and weight fitting in large linear systems are discussed. Finally, we present experimental results for Othello, which demonstrate the potential of the described approach.

Proceedings ArticleDOI
06 Jul 1999
TL;DR: A survey of the approaches presented in the literature to select relevant features by using genetic algorithms is given and the different values of the genetic parameters utilized as well as the fitness functions are compared.
Abstract: In this paper, we review the feature selection problem in mining issues. The application of soft computing techniques to data mining and knowledge discovery is now emerging in order to enhance the effectiveness of the traditional classification methods coming from machine learning. A survey of the approaches presented in the literature to select relevant features by using genetic algorithms is given. The different values of the genetic parameters utilized as well as the fitness functions are compared. A more detailed review of the proposals in the mining fields of databases, text and the Web is also given.