Open Access Journal ArticleDOI

Classification of arrayCGH data using fused SVM

TL;DR
This work proposes a new method for supervised classification of arrayCGH data that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge and demonstrates that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome.
Abstract
Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve the prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of bacterial artificial chromosomes along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate and may not produce easily interpretable prediction rules.

Results: We propose a new method for supervised classification of arrayCGH data. The method is a variant of the support vector machine that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome.

Availability: All data and algorithms are publicly available.

Contact: franck.rapaport@curie.fr
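The method summarized above is a linear SVM whose weight vector is pushed to be both sparse and piecewise constant along the genome, so that non-zero weights group into contiguous regions. The sketch below is a minimal illustration of that idea, not the authors' released code: it combines a hinge loss with an $\ell_1$ penalty and a fused (total-variation) penalty on neighbouring probe weights, using the cvxpy library. The function name, penalty weights, and toy data are assumptions, and a faithful implementation would apply the fusion penalty only within chromosomes, not across chromosome boundaries.

```python
# Minimal fused-SVM-style sketch (illustration only, not the authors' code).
# X: (n_samples, n_probes) arrayCGH log-ratio profiles, probes ordered along the genome.
# y: class labels in {-1, +1}.
import numpy as np
import cvxpy as cp

def fit_fused_svm(X, y, lam_sparse=1.0, lam_fused=1.0):
    n, p = X.shape
    w = cp.Variable(p)
    b = cp.Variable()
    hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w + b)))   # soft-margin hinge loss
    sparsity = cp.norm1(w)                                   # l1 penalty: few probes selected
    fusion = cp.norm1(cp.diff(w))                            # |w_{i+1} - w_i|: piecewise-constant weights
    cp.Problem(cp.Minimize(hinge / n
                           + lam_sparse * sparsity
                           + lam_fused * fusion)).solve()
    return w.value, b.value

# Toy usage: the discriminative signal sits in one contiguous block of probes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.where(X[:, 50:60].sum(axis=1) > 0, 1.0, -1.0)
w, b = fit_fused_svm(X, y, lam_sparse=0.5, lam_fused=2.0)
```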


Citations
Posted Content

Optimization with Sparsity-Inducing Penalties

TL;DR: In this article, the authors present, from a general perspective, optimization tools and techniques dedicated to sparsity-inducing penalties, including proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions.
Book

Optimization with Sparsity-Inducing Penalties

TL;DR: This monograph covers proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provides an extensive set of experiments comparing various algorithms from a computational point of view.
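As a concrete instance of the proximal methods covered in this monograph, the iterative soft-thresholding algorithm (ISTA) for an $\ell_1$-penalized least-squares problem fits in a few lines; this is a generic textbook sketch with illustrative names, not code from the monograph.

```python
# Proximal-gradient (ISTA) sketch for min_w 0.5*||y - Xw||^2 + lam*||w||_1.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iters=500):
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the smooth gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)                # gradient of the least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w
```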
Journal ArticleDOI

Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.

TL;DR: The recent progress of SVMs in cancer genomic studies is reviewed, and the strengths of SVM learning and its future prospects in cancer genomic applications are discussed.
Posted Content

Structured Variable Selection with Sparsity-Inducing Norms

TL;DR: In this paper, the authors consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms, defined as sums of Euclidean norms on certain subsets of variables.
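The norms in question are sums of Euclidean norms over predefined groups of variables (the group-lasso family). A minimal sketch of evaluating such a penalty, with illustrative group definitions not taken from the paper, is:

```python
# Structured sparsity-inducing penalty: Omega(w) = sum over groups g of ||w[g]||_2.
import numpy as np

def group_norm(w, groups):
    return sum(np.linalg.norm(w[np.asarray(g)]) for g in groups)

w = np.array([0.0, 0.0, 1.5, -2.0, 0.3])
groups = [[0, 1], [2, 3], [4]]        # illustrative, non-overlapping groups
print(group_norm(w, groups))          # 0 + 2.5 + 0.3 = 2.8
```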
References
Journal ArticleDOI

Regression Shrinkage and Selection via the Lasso

TL;DR: This paper proposes the lasso, a new method for estimation in linear models that minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.
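In standard notation (not copied from this page), the constrained formulation referred to here is

$$\hat{\beta} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t.$$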
Journal ArticleDOI

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of optical character recognition.
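A soft-margin SVM with a polynomial kernel, the kind of classifier evaluated in this paper, can be reproduced with scikit-learn; the snippet below uses synthetic data and illustrative parameters rather than the paper's character-recognition benchmark.

```python
# Soft-margin SVM with a degree-3 polynomial kernel on synthetic, non-linearly separable data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # labels depend non-linearly on the inputs

clf = SVC(kernel="poly", degree=3, C=1.0)
clf.fit(X, y)
print(clf.score(X, y))                                  # training accuracy
```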
Book

Convex Optimization

TL;DR: This book gives a comprehensive introduction to convex optimization, with a focus not on the optimization problems themselves but on recognizing convex optimization problems and then finding the most appropriate technique for solving them.
Journal ArticleDOI

The hallmarks of cancer.

TL;DR: Proposes that most, if not all, cancers acquire the same set of essential capabilities, the hallmarks of cancer: self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of apoptosis, limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis.

Book

Statistical learning theory

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.