A comparision between methods for generating differentially expressed genes from microarray data for prediction of disease

doi:10.1109/C3IT.2015.7060148

Citations

Book Chapter•DOI•

Incremental Wrapper Based Random Forest Gene Subset Selection for Tumor Discernment

Alia Fatima¹, Usman Qamar¹, Saad Rehman¹, Aiman Khan Nazir¹•Institutions (1)

University of the Sciences¹

03 Sep 2018

TL;DR: This paper presents Incremental Wrapper based Random Forest Gene Subset Selection for tumor discernment, which applies incremental wrapper-based feature subset selection with a random forest classifier that also serves as the performance validator.

Abstract: High-dimensional cancer-related datasets allow researchers to diagnose cancer in a timely manner and help in its effective treatment. Biomedical applications process thousands of features, and it is challenging to extract precise statistics from such high-dimensional data. This paper presents Incremental Wrapper based Random Forest Gene Subset Selection for tumor discernment, which works on the principle of incremental wrapper-based feature subset selection with the random forest classification algorithm; the same algorithm also serves as the performance validator. Incremental wrapper-based feature subset selection is a technique for picking the best possible subset of genes from high-dimensional data at low computational cost. Random forest is expected to increase overall performance because it works well on cancer-related high-dimensional datasets. Its efficacy as a performance validator improves significantly when it works on a selective, discriminative subset of prognostic genes rather than on the raw data. We evaluate the proposed methodology on six publicly available cancer-related high-dimensional datasets and find that it outperforms standard random forests.
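
The incremental wrapper idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier scored by leave-one-out accuracy stands in for the random forest validator, the feature ranking is assumed to come from some filter score computed beforehand, and the function names and toy expression values are hypothetical.

```python
from statistics import mean

def centroid_predict(train_X, train_y, x, features):
    """Stand-in classifier (NOT a random forest): nearest class centroid
    measured only on the given feature indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        dist = sum((x[f] - mean(r[f] for r in rows)) ** 2 for f in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the stand-in classifier on `features`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)

def incremental_wrapper(X, y, ranked_features):
    """Walk the ranked list once; keep a feature only if it improves accuracy."""
    selected, best = [], 0.0
    for f in ranked_features:
        score = loo_accuracy(X, y, selected + [f])
        if score > best:
            selected, best = selected + [f], score
    return selected, best

# Toy data (hypothetical): gene 1 separates the classes, genes 0 and 2 do not.
X = [[0.1, 5.0, 1.0], [0.9, 5.2, 1.1], [0.2, 1.0, 0.9], [0.8, 1.1, 1.0]]
y = ["tumor", "tumor", "normal", "normal"]
selected, best = incremental_wrapper(X, y, ranked_features=[1, 0, 2])
print(selected, best)
```

Each candidate gene triggers only one wrapper evaluation, which is where the low computational cost comes from compared with searching over all subsets.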

References

Journal Article•DOI•

Random Forests

Leo Breiman¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation; they are used to show the response to increasing the number of features used in the splitting, and the ideas are also applicable to regression.

Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

Journal Article•DOI•

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Todd R. Golub¹,², Donna K. Slonim¹, Pablo Tamayo¹, Christine Huard¹, Michelle Gaasenbeek¹, Jill P. Mesirov¹, Hilary A. Coller¹, Mignon L. Loh², James R. Downing³, Michael A. Caligiuri⁴, Clara D. Bloomfield⁴, Eric S. Lander¹•Institutions (4)

Massachusetts Institute of Technology¹, Harvard University², St. Jude Children's Research Hospital³, Ohio State University⁴

15 Oct 1999-Science

TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

Abstract: Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

12,530 citations

"A comparision between methods for g..." refers background or methods in this paper

...The majority number votes that an object gets from its neighbors, is used to classify a particular object....
...The first dataset is the famous Leukemia Dataset which had been initially used by Golub et al (1999) [17]....
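
The majority-vote rule quoted in the first excerpt above is the k-nearest-neighbour idea. A minimal sketch follows, assuming Euclidean distance and k = 3; the function name, the toy two-gene expression values, and the use of ALL/AML as labels are illustrative assumptions, not data from the cited papers.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Order training samples by Euclidean distance to the query.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two expression values per sample.
train_X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8]]
train_y = ["ALL", "ALL", "ALL", "AML", "AML"]

label_near_all = knn_predict(train_X, train_y, [1.1, 1.0], k=3)
label_near_aml = knn_predict(train_X, train_y, [4.9, 5.0], k=3)
print(label_near_all, label_near_aml)
```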

Journal Article•DOI•

Gene Selection for Cancer Classification using Support Vector Machines

Isabelle Guyon, Jason Weston, Stephen Barnhill, Vladimir Vapnik¹•Institutions (1)

AT&T Labs¹

11 Mar 2002-Machine Learning

TL;DR: In this article, a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) was proposed to select a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays.

Abstract: DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selection of a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis, as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes our method is 98% accurate, while the baseline method is only 86% accurate.

7,939 citations

Posted Content•DOI•

Making large scale SVM learning practical

Thorsten Joachims

29 Oct 1999-Technical reports

TL;DR: SVM light is an implementation of an SVM learner that addresses large learning tasks with many training examples, making large-scale SVM training more practical.

Abstract: Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, off-the-shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements. SVM light is an implementation of an SVM learner which addresses the problem of large tasks. This chapter presents algorithmic and computational results developed for SVM light V2.0, which make large-scale SVM training more practical. The results give guidelines for the application of SVMs to large domains.

4,145 citations

Journal Article•DOI•

Use of a cDNA microarray to analyse gene expression patterns in human cancer.

Joseph L. DeRisi¹, Lolita Penland¹, Patrick O. Brown¹, M. L. Bittner¹, P. S. Meltzer¹, M. Ray¹, Yi Chen¹, Y. A. Su¹, J. M. Trent¹•Institutions (1)

Stanford University¹

01 Dec 1996-Nature Genetics

TL;DR: Previously unrecognized alterations in the expression of specific genes provide leads for further investigation of the genetic basis of the tumorigenic phenotype of these cells.

Abstract: The development and progression of cancer and the experimental reversal of tumorigenicity are accompanied by complex changes in patterns of gene expression. Microarrays of cDNA provide a powerful tool for studying these complex phenomena. The tumorigenic properties of a human melanoma cell line, UACC-903, can be suppressed by introduction of a normal human chromosome 6, resulting in a reduction of growth rate, restoration of contact inhibition, and suppression of both soft agar clonogenicity and tumorigenicity in nude mice. We used a high density microarray of 1,161 DNA elements to search for differences in gene expression associated with tumour suppression in this system. Fluorescent probes for hybridization were derived from two sources of cellular mRNA [UACC-903 and UACC-903(+6)] which were labelled with different fluors to provide a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene. The fluorescence signals representing hybridization to each arrayed gene were analysed to determine the relative abundance in the two samples of mRNAs corresponding to each gene. Previously unrecognized alterations in the expression of specific genes provide leads for further investigation of the genetic basis of the tumorigenic phenotype of these cells.

2,242 citations

"A comparision between methods for g..." refers background in this paper

...I. INTRODUCTION One of the important problems in extracting and analyzing information from gene-expression data is the association of high-dimensionality....
...Generally, the 2-sample t-statistics can to some extent measure the difference in the distributions between the different groups....
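
To make the two-sample t-statistic in the excerpt above concrete, here is a minimal sketch that scores each gene and ranks genes by the absolute value of the score. It assumes the unequal-variance (Welch) form of the statistic, which the excerpt does not specify; the gene names, group labels, and expression values are hypothetical.

```python
from statistics import mean, variance
import math

def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch form, an assumption)."""
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )

def rank_genes(expr, labels, group_a, group_b):
    """Return gene names sorted by |t| between the two groups, largest first."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

# Toy data (hypothetical): gene_A differs between groups, gene_B does not.
labels = ["disease"] * 3 + ["control"] * 3
expr = {
    "gene_A": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],
    "gene_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
ranked = rank_genes(expr, labels, "disease", "control")
print(ranked)
```

Genes at the top of such a ranking are the usual candidates for differential expression; a pooled-variance t-statistic would be an equally valid reading of the excerpt.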