
Showing papers on "Feature (computer vision) published in 2003"


Journal ArticleDOI
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.

14,509 citations
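
The baseline against which most of the surveyed methods are compared is simple univariate feature ranking. A minimal filter-style sketch (illustrative, not from the paper; scikit-learn's mutual information estimator is assumed as the scoring function):

    # Rank features by estimated mutual information with the class label,
    # then keep the top k -- the simplest filter-style variable selection.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif

    X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                               random_state=0)
    scores = mutual_info_classif(X, y, random_state=0)
    top_k = np.argsort(scores)[::-1][:5]    # indices of the 5 best features
    X_reduced = X[:, top_k]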


Proceedings Article
21 Aug 2003
TL;DR: A novel concept, predominant correlation, is introduced, and a fast filter method is proposed which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis.
Abstract: Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.

2,251 citations
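
A simplified sketch in the spirit of this fast filter (a reconstruction from the abstract, assuming discrete features and symmetrical uncertainty as the correlation measure): rank features by correlation with the class, then drop any feature that some already-kept feature predominates, i.e. correlates with it more strongly than the class does.

    import numpy as np

    def entropy(x):
        # Shannon entropy of a discrete variable.
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def symmetrical_uncertainty(x, y):
        # SU(x, y) = 2 I(x; y) / (H(x) + H(y)), normalized to [0, 1].
        mi = entropy(x) + entropy(y) - entropy([f"{a}|{b}" for a, b in zip(x, y)])
        return 2.0 * mi / (entropy(x) + entropy(y))

    def fast_filter(X, y, threshold=0.1):
        su_class = [symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])]
        order = sorted((j for j in range(X.shape[1]) if su_class[j] >= threshold),
                       key=lambda j: -su_class[j])
        selected = []
        for j in order:
            # Keep j only if no already-selected feature is more correlated
            # with j than j is with the class.
            if all(symmetrical_uncertainty(X[:, i], X[:, j]) < su_class[j]
                   for i in selected):
                selected.append(j)
        return selected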


Proceedings ArticleDOI
Davison
13 Oct 2003
TL;DR: This work presents a top-down Bayesian framework for single-camera localisation via mapping of a sparse set of natural features using motion modelling and an information-guided active measurement strategy, in particular addressing the difficult issue of real-time feature initialisation via a factored sampling approach.
Abstract: Ego-motion estimation for an agile single camera moving through general, unknown scenes becomes a much more challenging problem when real-time performance is required rather than under the off-line processing conditions under which most successful structure from motion work has been achieved. This task of estimating camera motion from measurements of a continuously expanding set of self-mapped visual features is one of a class of problems known as Simultaneous Localisation and Mapping (SLAM) in the robotics community, and we argue that such real-time mapping research, despite rarely being camera-based, is more relevant here than off-line structure from motion methods due to the more fundamental emphasis placed on propagation of uncertainty. We present a top-down Bayesian framework for single-camera localisation via mapping of a sparse set of natural features using motion modelling and an information-guided active measurement strategy, in particular addressing the difficult issue of real-time feature initialisation via a factored sampling approach. Real-time handling of uncertainty permits robust localisation via the creation and active measurement of a sparse map of landmarks such that regions can be re-visited after periods of neglect and localisation can continue through periods when few features are visible. Results are presented of real-time localisation for a hand-waved camera with very sparse prior scene knowledge and all processing carried out on a desktop PC.

1,967 citations
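
The probabilistic core of such a system is an extended Kalman filter over the joint camera-and-map state. A schematic predict/update skeleton (a generic EKF sketch, not Davison's implementation; the motion model f, measurement model h, and their Jacobians F, H are placeholders):

    import numpy as np

    def ekf_predict(x, P, f, F, Q):
        # Propagate state mean and covariance through the motion model.
        return f(x), F @ P @ F.T + Q

    def ekf_update(x, P, z, h, H, R):
        # Fuse one feature measurement z into the predicted state.
        y = z - h(x)                        # innovation
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        return x + K @ y, (np.eye(len(x)) - K @ H) @ P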


Proceedings ArticleDOI
28 Jul 2003
TL;DR: The approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval by assuming that regions in an image can be described using a small vocabulary of blobs.
Abstract: Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way. Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) as a model based on word-blob co-occurrence and twice as good as a state-of-the-art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.

1,275 citations
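
A toy version of the generative step, estimating P(word | blobs) by smoothed co-occurrence counting over the annotated training set (an illustrative naive-Bayes-style simplification, not the paper's relevance model):

    from collections import defaultdict

    def train_counts(training_set):
        # training_set: iterable of (blob_ids, words) pairs, one per image.
        cooc = defaultdict(float)           # (word, blob) co-occurrence counts
        word_count = defaultdict(float)
        for blobs, words in training_set:
            for w in words:
                word_count[w] += 1
                for b in blobs:
                    cooc[(w, b)] += 1
        return cooc, word_count

    def word_score(word, blobs, cooc, word_count, alpha=1.0, n_blobs=500):
        # Score proportional to P(word) * prod_b P(b | word), Laplace-smoothed.
        score = word_count[word]
        for b in blobs:
            score *= (cooc[(word, b)] + alpha) / (word_count[word] + alpha * n_blobs)
        return score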


Journal ArticleDOI
TL;DR: The algorithm for feature selection is based on an application of a rough set method to the result of principal components analysis (PCA) used for feature projection and reduction.

801 citations


Proceedings Article
09 Dec 2003
TL;DR: An approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries using a formalism that models the generation of annotated images.
Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.

762 citations
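
The move from discrete blobs to continuous region features amounts to replacing co-occurrence counts with kernel density estimates. A minimal sketch of the density term (a reconstruction, with a fixed-bandwidth Gaussian kernel):

    import numpy as np

    def region_density(r, training_regions, bandwidth=1.0):
        # Nonparametric estimate of p(r | J): an average of Gaussian kernels
        # centred on the feature vectors of a training image's regions.
        sq = ((training_regions - r) ** 2).sum(axis=1)
        d = training_regions.shape[1]
        norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
        return np.exp(-sq / (2.0 * bandwidth ** 2)).mean() / norm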


Journal ArticleDOI
TL;DR: It is seen that relatively few features are needed to achieve the same classification accuracies as in the original feature space when classifying panchromatic high-resolution data from urban areas using morphological and neural approaches.
Abstract: Classification of panchromatic high-resolution data from urban areas using morphological and neural approaches is investigated. The proposed approach is based on three steps. First, the composition of geodesic opening and closing operations of different sizes is used in order to build a differential morphological profile that records image structural information. Although the original panchromatic image has only one data channel, the use of the composition operations will give many additional channels, which may contain redundancies. Therefore, feature extraction or feature selection is applied in the second step. Both discriminant analysis feature extraction and decision boundary feature extraction are investigated in the second step along with a simple feature selection based on picking the largest indexes of the differential morphological profiles. Third, a neural network is used to classify the features from the second step. The proposed approach is applied in experiments on high-resolution Indian Remote Sensing 1C (IRS-1C) and IKONOS remote sensing data from urban areas. In experiments, the proposed method performs well in terms of classification accuracies. It is seen that relatively few features are needed to achieve the same classification accuracies as in the original feature space.

756 citations
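
The first step can be sketched with plain grayscale openings and closings of increasing size (the paper uses geodesic, reconstruction-based operators; scipy.ndimage's structural ones stand in here):

    import numpy as np
    from scipy import ndimage

    def differential_morphological_profile(image, sizes=(3, 5, 7, 9)):
        # Each output channel records the structure that disappears between
        # two consecutive scales of the opening (or closing) profile.
        channels, prev_o, prev_c = [], image.astype(float), image.astype(float)
        for s in sizes:
            o = ndimage.grey_opening(image, size=(s, s)).astype(float)
            c = ndimage.grey_closing(image, size=(s, s)).astype(float)
            channels += [np.abs(prev_o - o), np.abs(prev_c - c)]
            prev_o, prev_c = o, c
        return np.stack(channels, axis=-1)   # extra channels for steps 2 and 3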


Journal Article
Kari Torkkola
TL;DR: A quadratic divergence measure is used instead of a commonly used mutual information measure based on Kullback-Leibler divergence, which allows for an efficient non-parametric implementation and requires no prior assumptions about class densities.
Abstract: We present a method for learning discriminative feature transforms using as criterion the mutual information between class labels and transformed features. Instead of a commonly used mutual information measure based on Kullback-Leibler divergence, we use a quadratic divergence measure, which allows us to make an efficient non-parametric implementation and requires no prior assumptions about class densities. In addition to linear transforms, we also discuss nonlinear transforms that are implemented as radial basis function networks. Extensions to reduce the computational complexity are also presented, and a comparison to greedy feature selection is made.

698 citations
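
In LaTeX, the substitution the TL;DR describes can be written as follows (a sketch from standard definitions; the paper's notation may differ), with class label C and transformed feature Y:

    I_{KL}(C;Y) = \sum_{c} \int p(c,y)\,\log \frac{p(c,y)}{P(c)\,p(y)}\,dy
    \quad\longrightarrow\quad
    I_{Q}(C;Y) = \sum_{c} \int \bigl(p(c,y) - P(c)\,p(y)\bigr)^{2}\,dy

The quadratic form involves only squares and products of densities, which is what permits an efficient non-parametric (e.g. Parzen-window) implementation without prior assumptions about class densities.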


Proceedings ArticleDOI
12 May 2003
TL;DR: The method encodes the geometric and topological information in the form of a skeletal graph and uses graph matching techniques to match the skeletons and to compare them and also describes a visualization tool to aid in the selection and specification of the matched objects.
Abstract: We describe a novel method for searching and comparing 3D objects. The method encodes the geometric and topological information in the form of a skeletal graph and uses graph matching techniques to match the skeletons and to compare them. The skeletal graphs can be manually annotated to refine or restructure the search. This helps in choosing between a topological similarity and a geometric (shape) similarity. A feature of skeletal matching is the ability to perform part-matching, and its inherent intuitiveness, which helps in defining the search and in visualizing the results. Also, the matching results, which are presented on a per-node basis, can be used for driving a number of registration algorithms, most of which require a good initial guess to perform registration. We also describe a visualization tool to aid in the selection and specification of the matched objects.

644 citations


Journal ArticleDOI
TL;DR: In this paper, the effect of TV regularization on individual image features is investigated, and it is shown that the intensity change experienced by an individual feature is inversely proportional to its scale.
Abstract: We give and prove two new and fundamental properties of total-variation-minimizing function regularization (TV regularization): edge locations of function features tend to be preserved, and under certain conditions are preserved exactly; intensity change experienced by individual features is inversely proportional to the scale of each feature. We give and prove exact analytic solutions to the TV regularization problem for simple but important cases. These can also be used to better understand the effects of TV regularization for more general cases. Our results explain why and how TV-minimizing image restoration can remove noise while leaving relatively intact larger-scaled image features, and thus why TV image restoration is especially effective in restoring images with larger-scaled features. Although TV regularization is a global problem, our results show that the effects of TV regularization on individual image features are often quite local. Our results give us a better understanding of what types of images and what types of image degradation are most effectively improved by TV-minimizing image restoration schemes, and they potentially lead to more intelligently designed TV-minimizing restoration schemes.

609 citations
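
For reference, a standard form of the TV regularization problem being analyzed (the Rudin-Osher-Fatemi model), for an observed image f, reconstruction u, and regularization weight \lambda:

    \min_{u} \int_{\Omega} |\nabla u|\,dx \;+\; \frac{\lambda}{2} \int_{\Omega} (u - f)^{2}\,dx

The inverse-proportionality result above concerns how the minimizer u of this functional changes the intensity of features of different spatial scales.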


Proceedings ArticleDOI
Collins, Liu
13 Oct 2003
TL;DR: This paper presents an online feature selection mechanism for evaluating multiple features while tracking and adjusting the set of features used to improve tracking performance, and notes the susceptibility of the variance ratio feature selection method to distraction by spatially correlated background clutter.
Abstract: We present a method for evaluating multiple feature spaces while tracking, and for adjusting the set of features used to improve tracking performance. Our hypothesis is that the features that best discriminate between object and background are also best for tracking the object. We develop an online feature selection mechanism based on the two-class variance ratio measure, applied to log likelihood distributions computed with respect to a given feature from samples of object and background pixels. This feature selection mechanism is embedded in a tracking system that adaptively selects the top-ranked discriminative features for tracking. Examples are presented to illustrate how the method adapts to changing appearances of both tracked object and scene background.
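
The scoring step is compact enough to sketch (a reconstruction from the abstract: a candidate feature is scored by the two-class variance ratio of log-likelihood values computed from object and background pixel samples; larger means more discriminative):

    import numpy as np

    def variance_ratio(obj_vals, bg_vals, bins=32, eps=1e-4):
        lo = min(obj_vals.min(), bg_vals.min())
        hi = max(obj_vals.max(), bg_vals.max())
        p, _ = np.histogram(obj_vals, bins=bins, range=(lo, hi), density=True)
        q, _ = np.histogram(bg_vals, bins=bins, range=(lo, hi), density=True)
        L = np.log((p + eps) / (q + eps))      # log likelihood ratio per bin

        def var(w):
            w = w / w.sum()
            return (w * L ** 2).sum() - ((w * L).sum()) ** 2

        # Spread between the classes divided by spread within each class.
        return var(p + q) / (var(p) + var(q) + eps)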

Journal ArticleDOI
TL;DR: A semiautomatic technique is presented that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior and allows incremental exploration of features while preserving the "mental map" the analyst has gained through the analysis.
Abstract: Understanding the implementation of a certain feature of a system requires identification of the computational units of the system that contribute to this feature. In many cases, the mapping of features to the source code is poorly documented. In this paper, we present a semiautomatic technique that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior. The mapping is in general not injective; that is, a computational unit may contribute to several features. Our technique allows for the distinction between general and specific computational units with respect to a given set of features. For a set of features, it also identifies jointly and distinctly required computational units. The presented technique combines dynamic and static analyses to rapidly focus on the system's parts that relate to a specific set of features. Dynamic information is gathered based on a set of scenarios invoking the features. Rather than assuming a one-to-one correspondence between features and scenarios as in earlier work, we can now handle scenarios that invoke many features. Furthermore, we show how our method allows incremental exploration of features while preserving the "mental map" the analyst has gained through the analysis.
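
The classification of computational units reduces to set algebra over per-feature execution traces. A minimal sketch (hypothetical data; units_per_feature maps each feature to the units executed by its scenarios):

    def classify_units(units_per_feature):
        feats = list(units_per_feature)
        # Jointly required: executed for every feature (general infrastructure).
        joint = set.intersection(*(units_per_feature[f] for f in feats))
        # Distinctly required: executed for exactly one feature (specific code).
        distinct = {f: units_per_feature[f].difference(
                        *(units_per_feature[g] for g in feats if g != f))
                    for f in feats}
        return joint, distinct

    units = {"print":   {"init", "spool", "render"},
             "preview": {"init", "render", "zoom"}}
    joint, distinct = classify_units(units)
    # joint == {'init', 'render'}; distinct['print'] == {'spool'}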

Journal ArticleDOI
TL;DR: The results of handwritten digit recognition on well-known image databases using state-of-the-art feature extraction and classification techniques are competitive to the best ones previously reported on the same databases.

Journal ArticleDOI
TL;DR: A robust wavelet domain method for noise filtering in medical images that adapts itself to various types of image noise as well as to the preference of the medical expert; a single parameter can be used to balance the preservation of (expert-dependent) relevant details against the degree of noise reduction.
Abstract: We propose a robust wavelet domain method for noise filtering in medical images. The proposed method adapts itself to various types of image noise as well as to the preference of the medical expert; a single parameter can be used to balance the preservation of (expert-dependent) relevant details against the degree of noise reduction. The algorithm exploits generally valid knowledge about the correlation of significant image features across the resolution scales to perform a preliminary coefficient classification. This preliminary coefficient classification is used to empirically estimate the statistical distributions of the coefficients that represent useful image features on the one hand and mainly noise on the other. The adaptation to the spatial context in the image is achieved by using a wavelet domain indicator of the local spatial activity. The proposed method is of low complexity, both in its implementation and execution time. The results demonstrate its usefulness for noise suppression in medical ultrasound and magnetic resonance imaging. In these applications, the proposed method clearly outperforms single-resolution spatially adaptive algorithms, in terms of quantitative performance measures as well as in terms of visual quality of the images.
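
The preliminary classification exploits the fact that coefficients of real image features stay correlated across resolution scales while noise coefficients do not. A toy 1D illustration of that interscale-product idea using PyWavelets (illustrative only, not the paper's estimator):

    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    clean = np.zeros(256)
    clean[100:120] = 4.0                       # a genuine signal feature
    noisy = clean + rng.normal(0.0, 1.0, clean.size)

    # The stationary wavelet transform keeps every level at full length,
    # so detail coefficients can be compared position-wise across scales.
    coeffs = pywt.swt(noisy, "db2", level=2)
    d1, d2 = (d for _, d in coeffs)
    interscale = d1 * d2                       # large where the scales agree
    is_feature = np.abs(interscale) > 2.0 * np.abs(interscale).mean()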

Journal ArticleDOI
TL;DR: This paper describes texture classification using (i) wavelet statistical features, (ii) wavelet co-occurrence features, and (iii) a combination of wavelet statistical features and co-occurrence features of one-level wavelet-transformed images with different feature databases.
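
A brief sketch of computing both kinds of features on a one-level wavelet-transformed image (PyWavelets and scikit-image assumed; the paper's exact feature definitions may differ):

    import numpy as np
    import pywt
    from skimage.feature import graycomatrix, graycoprops

    def wavelet_texture_features(image):
        feats = []
        cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), "db1")
        for band in (cA, cH, cV, cD):
            # (i) wavelet statistical features: mean magnitude and energy
            feats += [np.abs(band).mean(), (band ** 2).mean()]
            # (ii) wavelet co-occurrence features on the quantized band
            q = np.uint8(255 * (band - band.min()) / (np.ptp(band) + 1e-9))
            glcm = graycomatrix(q, distances=[1], angles=[0], levels=256,
                                symmetric=True, normed=True)
            feats += [graycoprops(glcm, "contrast")[0, 0],
                      graycoprops(glcm, "energy")[0, 0]]
        return np.array(feats)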

Journal ArticleDOI
TL;DR: Central to the method is a multi-scale classification operator that allows feature analysis at multiple scales, using the size of the local neighborhoods as a discrete scale parameter, which significantly improves the reliability of the detection phase and makes the method more robust in the presence of noise.
Abstract: We present a new technique for extracting line-type features on point-sampled geometry. Given an unstructured point cloud as input, our method first applies principal component analysis on local neighborhoods to classify points according to the likelihood that they belong to a feature. Using hysteresis thresholding, we then compute a minimum spanning graph as an initial approximation of the feature lines. To smooth out the features while maintaining a close connection to the underlying surface, we use an adaptation of active contour models. Central to our method is a multi-scale classification operator that allows feature analysis at multiple scales, using the size of the local neighborhoods as a discrete scale parameter. This significantly improves the reliability of the detection phase and makes our method more robust in the presence of noise. To illustrate the usefulness of our method, we have implemented a non-photorealistic point renderer to visualize point-sampled surfaces as line drawings of their extracted feature curves.
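
The classification step can be sketched directly: the eigenvalues of each point's local covariance yield a "surface variation" measure that is near zero on flat patches and large along creases. A single-scale version (scipy assumed; the paper's multi-scale operator repeats this over several neighborhood sizes k):

    import numpy as np
    from scipy.spatial import cKDTree

    def surface_variation(points, k=16):
        # lambda_0 / (lambda_0 + lambda_1 + lambda_2), lambda_0 smallest.
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k)
        scores = np.empty(len(points))
        for i, nbrs in enumerate(idx):
            nb = points[nbrs] - points[nbrs].mean(axis=0)
            evals = np.linalg.eigvalsh(nb.T @ nb)   # ascending eigenvalues
            scores[i] = evals[0] / (evals.sum() + 1e-12)
        return scores   # threshold (with hysteresis) to get feature candidates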

Journal ArticleDOI
TL;DR: This work has shown that the steadily increasing performance of computers again has become a driving force for new advances in flow visualisation, especially in techniques based on texturing, feature extraction, vector field clustering, and topology extraction.
Abstract: Flow visualisation is an attractive topic in data visualisation, offering great challenges for research. Very large data sets must be processed, consisting of multivariate data at large numbers of grid points, often arranged in many time steps. Recently, the steadily increasing performance of computers again has become a driving force for new advances in flow visualisation, especially in techniques based on texturing, feature extraction, vector field clustering, and topology extraction. In this article we present the state of the art in feature-based flow visualisation techniques. We will present numerous feature extraction techniques, categorised according to the type of feature. Next, feature tracking and event detection algorithms are discussed, for studying the evolution of features in time-dependent data sets. Finally, various visualisation techniques are demonstrated.

Proceedings ArticleDOI
02 Nov 2003
TL;DR: A Bayes decision rule for classification of background and foreground from selected feature vectors is formulated and the convergence of the learning process is proved and a formula to select a proper learning rate is also derived.
Abstract: This paper proposes a novel method for detection and segmentation of foreground objects from a video which contains both stationary and moving background objects and undergoes both gradual and sudden "once-off" changes. A Bayes decision rule for classification of background and foreground from selected feature vectors is formulated. Under this rule, different types of background objects will be classified from foreground objects by choosing a proper feature vector. The stationary background object is described by the color feature, and the moving background object is represented by the color co-occurrence feature. Foreground objects are extracted by fusing the classification results from both stationary and moving pixels. Learning strategies for the gradual and sudden "once-off" background changes are proposed to adapt to various changes in background through the video. The convergence of the learning process is proved and a formula to select a proper learning rate is also derived. Experiments have shown promising results in extracting foreground objects from many complex backgrounds including wavering tree branches, flickering screens and water surfaces, moving escalators, opening and closing doors, switching lights and shadows of moving objects.
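
Once the per-feature-vector statistics are learned, the decision rule itself is a direct Bayes comparison. A schematic sketch (hypothetical tables; quantize maps a feature vector to a histogram index):

    def is_background(v, p_v_given_b, p_v_given_f, p_b, quantize):
        # Classify as background when P(b | v) > P(f | v), which by Bayes'
        # rule is P(v | b) P(b) > P(v | f) (1 - P(b)).
        i = quantize(v)
        return p_v_given_b[i] * p_b > p_v_given_f[i] * (1.0 - p_b)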

Proceedings ArticleDOI
27 Jan 2003
TL;DR: This paper applies the technique of deleting one feature at a time to perform experiments on SVMs and neural networks to rank the importance of input features for the DARPA collected intrusion data and shows that SVM-based and neural network based IDSs using a reduced number of features can deliver enhanced or comparable performance.
Abstract: Intrusion detection is a critical component of secure information systems. This paper addresses the issue of identifying important input features in building an intrusion detection system (IDS). Since elimination of the insignificant and/or useless inputs leads to a simplification of the problem, faster and more accurate detection may result. Feature ranking and selection, therefore, is an important issue in intrusion detection. We apply the technique of deleting one feature at a time to perform experiments on SVMs and neural networks to rank the importance of input features for the DARPA collected intrusion data. Important features for each of the 5 classes of intrusion patterns in the DARPA data are identified. It is shown that SVM-based and neural network based IDSs using a reduced number of features can deliver enhanced or comparable performance. An IDS for class-specific detection based on five SVMs is proposed.
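
The ranking procedure is easy to reproduce in outline: train with all features, then retrain with each feature deleted and record the performance change. A sketch using scikit-learn's SVC (illustrative; the paper used the DARPA data and its own SVM and neural-network setups):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def rank_features_by_deletion(X, y):
        base = cross_val_score(SVC(), X, y, cv=3).mean()
        drops = []
        for j in range(X.shape[1]):
            acc = cross_val_score(SVC(), np.delete(X, j, axis=1), y, cv=3).mean()
            drops.append(base - acc)        # large drop => important feature
        return np.argsort(drops)[::-1]      # most important first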

Journal ArticleDOI
TL;DR: The experimental results indicate that the classification accuracy is increased significantly under parallel feature fusion and also demonstrate that the developed parallel fusion is more effective than the classical serial feature fusion.

Journal ArticleDOI
TL;DR: An abstract framework for integrating multiple feature spaces in the k-means clustering algorithm is presented and the effectiveness of feature weighting in clustering on several different application domains is demonstrated.
Abstract: Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
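
The heart of the method, idea (iii), is the convex combination of per-space distortions. A sketch of the resulting assignment step (a reconstruction, using squared Euclidean distortion in every space):

    import numpy as np

    def assign(spaces, centroids, weights):
        # spaces: per-feature-space data matrices, one (n, d_m) array each
        # centroids: matching (k, d_m) centroid matrices
        # weights: convex weights (nonnegative, summing to one)
        n, k = spaces[0].shape[0], centroids[0].shape[0]
        dist = np.zeros((n, k))
        for X, C, w in zip(spaces, centroids, weights):
            dist += w * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)          # cluster index per data object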

Journal ArticleDOI
TL;DR: The state of the art of the field of feature interactions in telecommunications services is reviewed, concentrating on three major research trends: software engineering approaches, formal methods, and online techniques.

Book
09 Sep 2003
TL;DR: A textbook treatment of image and video processing, covering image representation, transforms and feature extraction, morphology, region detection and description, system architecture, and motion tracking.
Abstract: Contents: Preface; Introduction; Image Representation; Image Transforms and Feature Extraction; Morphology; Region Detection; Region Description; Region Labelling; System Architecture; Motion Tracking; Image and Video Tracking; Conclusions; Appendix A: Fourier Transform; Appendix B: Wavelet Transform; Appendix C: Linear Mathematics; Index.

Journal ArticleDOI
TL;DR: A hybrid fingerprint matching scheme that uses both minutiae and ridge flow information to represent and match fingerprints, where the entire image is taken into account while constructing the ridge feature map.

Proceedings Article
26 Oct 2003
TL;DR: Four audio feature sets are evaluated in their ability to classify five general audio classes and seven popular music genres and show that the temporal behavior of features is important for both music and audio classification.
Abstract: Four audio feature sets are evaluated in their ability to classify five general audio classes and seven popular music genres. The feature sets include low-level signal properties, mel-frequency spectral coefficients, and two new sets based on perceptual models of hearing. The temporal behavior of the features is analyzed and parameterized and these parameters are included as additional features. Using a standard Gaussian framework for classification, results show that the temporal behavior of features is important for both music and audio classification. In addition, classification is better, on average, if based on features from models of auditory perception rather than on standard features.
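
A minimal sketch of the classification framework (not the paper's exact feature sets: librosa's MFCCs with crude temporal statistics stand in for the features, and one Gaussian per class for the classifier):

    import numpy as np
    import librosa
    from scipy.stats import multivariate_normal

    def clip_features(path):
        y, sr = librosa.load(path, sr=22050)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13, n_frames)
        # Parameterize temporal behavior crudely: per-coefficient mean and std.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def fit_class(vectors):
        X = np.vstack(vectors)
        cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])         # regularized
        return multivariate_normal(X.mean(axis=0), cov)

    def classify(x, class_models):                 # dict: name -> fitted model
        return max(class_models, key=lambda c: class_models[c].logpdf(x))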

Journal ArticleDOI
TL;DR: This work shows for several publicly available microarray and proteomics datasets how the 'curse of dimensionality' and dataset sparsity influence classification outcomes, and suggests an approach to assess the relative quality of apparently equally good classifiers.
Abstract: Motivation: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the ‘curse of dimensionality’: the number of features characterizing these data is in the thousands or tens of thousands. The other is the ‘curse of dataset sparsity’: the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease. Results: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if the sample per feature ratio is increased to the recommended 5–10 by feature extraction/reduction methods, dataset sparsity can render any classification result statistically suspect. In addition, several ‘optimal’ feature sets are typically identifiable for sparse datasets, all producing perfect classification results, both for the training and independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these ‘optimal’ feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
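
The warning is easy to reproduce: with few samples and thousands of candidate features, pure noise yields "perfect" feature sets. An illustrative sketch on synthetic data (not from the paper):

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5000))       # 20 samples, 5000 pure-noise features
    y = np.array([0] * 10 + [1] * 10)     # arbitrary labels

    # Pick the 5 features that happen to correlate best with the labels ...
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    best = np.argsort(corr)[::-1][:5]

    # ... and they separate the training data perfectly anyway.
    print(LinearSVC().fit(X[:, best], y).score(X[:, best], y))   # typically 1.0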

Proceedings Article
09 Dec 2003
TL;DR: In this paper, a new feature extractor based on the maximum margin criterion (MMC) is proposed that does not depend on the nonsingularity of the within-class scatter matrix.
Abstract: A new feature extraction criterion, maximum margin criterion (MMC), is proposed in this paper. This new criterion is general in the sense that, when combined with a suitable constraint, it can actually give rise to the most popular feature extractor in the literature, linear discriminant analysis (LDA). We derive a new feature extractor based on MMC using a different constraint that does not depend on the nonsingularity of the within-class scatter matrix Sw. Such a dependence is a major drawback of LDA especially when the sample size is small. The kernelized (nonlinear) counterpart of this linear feature extractor is also established in this paper. Our preliminary experimental results on face images demonstrate that the new feature extractors are efficient and stable.
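
In outline the extractor is a small eigenproblem. A sketch under a common reading of the criterion (maximize the trace of W^T (S_b - S_w) W over orthonormal W, so no inversion of S_w is needed):

    import numpy as np

    def mmc_transform(X, y, n_components=2):
        mean = X.mean(axis=0)
        Sb = np.zeros((X.shape[1], X.shape[1]))
        Sw = np.zeros_like(Sb)
        for c in np.unique(y):
            Xc = X[y == c]
            d = (Xc.mean(axis=0) - mean)[:, None]
            Sb += len(Xc) * (d @ d.T)                 # between-class scatter
            Z = Xc - Xc.mean(axis=0)
            Sw += Z.T @ Z                             # within-class scatter
        # Top eigenvectors of Sb - Sw; Sw is never inverted, so LDA's
        # small-sample singularity problem does not arise.
        evals, evecs = np.linalg.eigh(Sb - Sw)
        return evecs[:, np.argsort(evals)[::-1][:n_components]]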

Proceedings ArticleDOI
02 Nov 2003
TL;DR: This paper investigates different cross-modal association methods using the linear correlation model, and introduces a novel method for cross-modal association called Cross-modal Factor Analysis (CFA), which shows several advantages in analysis performance and feature usage.
Abstract: Multimodal information processing has received considerable attention in recent years. The focus of existing research in this area has been predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods using the linear correlation model. We also introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our earlier work on Latent Semantic Indexing (LSI) is extended for applications that use off-line supervised training. As a promising research direction and practical application of cross-modal association, cross-modal information retrieval, where queries from one modality are used to search for content in another modality using low-level features, is then discussed in detail. Different association methods are tested and compared using the proposed cross-modal retrieval system. All these methods achieve significant dimensionality reduction. Among them, CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. As shown by experiments, cross-modal association provides many useful benefits, such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage. Its capability in feature selection and noise resistance also makes CFA a promising tool for many multimedia analysis applications.
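
Under the linear correlation model, CFA can be read as seeking orthogonal transforms A and B that minimize ||XA - YB||_F for paired feature matrices X and Y, which leads to the SVD of the coupling matrix X^T Y. A sketch under that assumption (a reconstruction, not the authors' code):

    import numpy as np

    def cfa(X, Y, k):
        # X: (n, dx), Y: (n, dy) paired, centred multimodal features.
        U, s, Vt = np.linalg.svd(X.T @ Y)
        A, B = U[:, :k], Vt.T[:, :k]       # first k factor pairs
        return X @ A, Y @ B                # k-dim cross-modal embeddings

A query from one modality can then be matched against content from the other modality in this shared k-dimensional space.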

Proceedings Article
21 Aug 2003
TL;DR: An "Iterative Feature Selection (IF)" method is proposed that addresses the unavailability of label problem by utilizing effective supervised feature selection method to iteratively select features and perform clustering.
Abstract: Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, we first give empirical evidence that feature selection methods can improve the efficiency and performance of text clustering algorithms. Then we propose a new feature selection method called "Term Contribution (TC)" and perform a comparative study on a variety of feature selection methods for text clustering, including Document Frequency (DF), Term Strength (TS), Entropy-based (En), Information Gain (IG) and χ2 statistic (CHI). Finally, we propose an "Iterative Feature Selection (IF)" method that addresses the unavailability of labels by utilizing an effective supervised feature selection method to iteratively select features and perform clustering. Detailed experimental results on Web Directory data are provided in the paper.
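
Term Contribution has a compact closed form: the contribution of term t is the sum of f(t, d_i) * f(t, d_j) over all document pairs i != j (with f the tf-idf weight), which can be computed per term without enumerating pairs. A sketch from that definition:

    import numpy as np

    def term_contribution(tfidf):
        # tfidf: (n_docs, n_terms) matrix of f(t, d) weights.
        # sum_{i != j} f(t, d_i) f(t, d_j) = (sum_i f)^2 - sum_i f^2
        s = tfidf.sum(axis=0)
        return s ** 2 - (tfidf ** 2).sum(axis=0)

    # e.g. keep the 2,000 highest-contribution terms before clustering:
    # keep = np.argsort(term_contribution(M))[::-1][:2000]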

DOI
26 May 2003
TL;DR: A framework for flexible and interactive specification of high-dimensional and/or complex features in simulation data, based on a simple and compact feature definition language (FDL); a focus+context style of visualization in 3D is realized.
Abstract: Visualization of high-dimensional, large data sets, resulting from computational simulation, is one of the most challenging fields in scientific visualization. When visualization aims at supporting the analysis of such data sets, feature-based approaches are very useful to reduce the amount of data which is shown at each instance of time and to guide the user to the most interesting areas of the data. When using feature-based visualization, one of the most difficult questions is how to extract or specify the features; up to now this is mostly done (semi-)automatically. Especially when interactive analysis of the data is the main goal of the visualization, tools supporting interactive specification of features are needed. In this paper we present a framework for flexible and interactive specification of high-dimensional and/or complex features in simulation data. The framework makes use of multiple, linked views from information as well as scientific visualization and is based on a simple and compact feature definition language (FDL). It allows the definition of one or several features, which can be complex and/or hierarchically described by brushing multiple dimensions (using non-binary and composite brushes). The result of the specification is linked to all views, thereby realizing a focus+context style of visualization in 3D. To demonstrate the usage of the specification, as well as the linked tools, applications from flow simulation in the automotive industry are presented.
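
The flavor of such a feature definition language can be conveyed with predicate combinators: brushes on single data dimensions, combined into composite, possibly non-binary feature descriptions (a hypothetical miniature, not the paper's actual FDL syntax):

    import numpy as np

    def brush(column, lo, hi):
        # Non-binary brush: fuzzy degree of membership in [lo, hi].
        def dof(data):
            x = data[column]
            return np.clip(np.minimum(x - lo, hi - x) / (0.1 * (hi - lo)), 0.0, 1.0)
        return dof

    def AND(*brushes):
        # Composite brush: fuzzy conjunction of simpler brushes.
        return lambda data: np.minimum.reduce([b(data) for b in brushes])

    # "hot, slow-moving regions" over a dict of per-cell simulation arrays;
    # the resulting degree-of-interest drives the focus+context rendering.
    feature = AND(brush("temperature", 800.0, 1200.0),
                  brush("velocity", 0.0, 0.5))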