Showing papers by "Charles DeLisi published in 2012"


Journal ArticleDOI
TL;DR: This work critically reviews the basic components shared by methods that test whether sets of genes are connected by a common theme, suggests best practices for carrying out each step, and proposes a voting method for assessing different methods on a large number of experimental data sets in the absence of a gold standard.
Abstract: A central goal of biology is understanding and describing the molecular basis of plasticity: the sets of genes that are combinatorially selected by exogenous and endogenous environmental changes, and the relations among the genes. The most viable current approach to this problem consists of determining whether sets of genes are connected by some common theme, e.g. genes from the same pathway are overrepresented among those whose differential expression in response to a perturbation is most pronounced. There are many approaches to this problem, and the results they produce show a fair amount of dispersion, but they all fall within a common framework consisting of a few basic components. We critically review these components, suggest best practices for carrying out each step, and propose a voting method for meeting the challenge of assessing different methods on a large number of experimental data sets in the absence of a gold standard.
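The overrepresentation idea mentioned in the abstract above (genes from one pathway being unusually frequent among the most differentially expressed genes) is often checked with a one-sided hypergeometric test. The following is a minimal sketch under that assumption, not the voting method proposed in the paper; the function name, parameter names, and toy numbers are illustrative only.

```python
from math import comb


def overrepresentation_p_value(universe_size, pathway_size, selected_size, overlap):
    """One-sided hypergeometric tail: P(X >= overlap).

    universe_size: number of genes measured
    pathway_size:  number of those genes annotated to the pathway
    selected_size: number of genes called differentially expressed
    overlap:       number of selected genes that are in the pathway
    """
    max_overlap = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 genes measured, 5 in the pathway,
# 5 selected as most perturbed, 3 of which belong to the pathway.
p = overrepresentation_p_value(universe_size=20, pathway_size=5, selected_size=5, overlap=3)
print(round(p, 4))
```

A small p-value suggests the pathway is overrepresented among the selected genes; in practice the test is repeated over many pathways, so a multiple-testing correction is needed.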

218 citations


Journal ArticleDOI
TL;DR: This work introduces a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level and a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes. Identification of such canonical biomarkers is expected to improve the clinical utility of high-throughput datasets for diagnostic and prognostic applications.
Abstract: Background: Molecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity. However, most significant gene markers are unstable (not reproducible) among data sets. We introduce a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level as well as a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes. This extends standard gene expression arrays with new pathway-level activation features obtained directly from off-the-shelf gene set enrichment algorithms such as GSEA. Such so-called pathway-based expression arrays are significantly more reproducible across datasets. Such reproducibility will be important for the clinical usefulness of genomic markers and will augment currently accepted cancer classification protocols.
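The 2-level representation described above can be pictured as a gene-level vector extended with one feature per pathway. The sketch below is illustrative only: the abstract obtains pathway features from enrichment algorithms such as GSEA, whereas this example substitutes a plain mean over member genes as a stand-in, and all names (`two_level_features`, `g1`, `P1`, and so on) are hypothetical.

```python
def two_level_features(gene_expr, pathways):
    """Build a feature vector with gene-level and pathway-level entries.

    gene_expr: dict mapping gene name -> expression value for one sample
    pathways:  dict mapping pathway name -> list of member gene names
    Pathway activation here is the mean of measured member genes, which is
    an assumption standing in for a real enrichment score.
    """
    gene_names = sorted(gene_expr)
    pathway_names = sorted(pathways)

    gene_level = [gene_expr[g] for g in gene_names]

    pathway_level = []
    for name in pathway_names:
        members = [gene_expr[g] for g in pathways[name] if g in gene_expr]
        pathway_level.append(sum(members) / len(members) if members else 0.0)

    return gene_names + pathway_names, gene_level + pathway_level


# Hypothetical sample with three genes and two pathways.
sample = {"g1": 1.0, "g2": 3.0, "g3": -2.0}
pathway_sets = {"P1": ["g1", "g2"], "P2": ["g3", "g_unmeasured"]}
names, values = two_level_features(sample, pathway_sets)
print(names)
print(values)
```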

104 citations


Journal ArticleDOI
TL;DR: This work develops and applies a method for identifying repositioned drug candidates against breast cancer, myelogenous leukemia and prostate cancer by looking for inverse correlations between the most perturbed gene expression levels in human cancer tissue and the most perturbed expression levels induced by bioactive compounds.
Abstract: The cost and time to develop a drug continue to be a major barrier to widespread distribution of medication. Although the genomic revolution appears to have had little impact on this problem, and might even have exacerbated it because of the flood of additional and usually ineffective leads, the emergence of high throughput resources promises the possibility of rapid, reliable and systematic identification of approved drugs for originally unintended uses. In this paper we develop and apply a method for identifying such repositioned drug candidates against breast cancer, myelogenous leukemia and prostate cancer by looking for inverse correlations between the most perturbed gene expression levels in human cancer tissue and the most perturbed expression levels induced by bioactive compounds. The method uses variable gene signatures to identify bioactive compounds that modulate a given disease. This is in contrast to previous methods that use small and fixed signatures. This strategy is based on the observation that diseases stem from failed/modified cellular functions, irrespective of the particular genes that contribute to the function, i.e., this strategy targets the functional signatures for a given cancer. This function-based strategy broadens the search space for effective drugs with an impressive hit rate. Among the 79, 94 and 88 candidate drugs for breast cancer, myelogenous leukemia and prostate cancer, 32%, 13% and 17% respectively are either FDA-approved/in-clinical-trial drugs, or drugs with suggestive literature evidence, with an FDR of 0.01. These findings indicate that the method presented here could lead to a substantial increase in efficiency in drug discovery and development, and has potential application for personalized medicine.
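The inverse-correlation idea in the abstract above can be illustrated with a small scoring function: take the genes most perturbed in the disease, then correlate their disease values with the values a compound induces. This is a simplified sketch, not the paper's variable-signature method; the choice of Pearson correlation, the `top_n` cutoff, and all gene names and values are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def reversal_score(disease, compound, top_n):
    """Correlate disease and compound profiles over the top_n genes
    most perturbed in the disease (largest absolute value).

    A strongly negative score means the compound tends to push those
    genes in the opposite direction to the disease.
    """
    shared = [g for g in disease if g in compound]
    top = sorted(shared, key=lambda g: abs(disease[g]), reverse=True)[:top_n]
    return pearson([disease[g] for g in top], [compound[g] for g in top])


# Hypothetical log fold changes.
disease = {"G1": 2.0, "G2": -1.5, "G3": 1.0, "G4": 0.1, "G5": -0.2}
reverser = {"G1": -1.8, "G2": 1.2, "G3": -0.9, "G4": 0.3, "G5": 0.0}
mimic = {"G1": 1.9, "G2": -1.4, "G3": 1.1, "G4": -0.1, "G5": 0.2}

print(reversal_score(disease, reverser, top_n=3))  # negative: candidate
print(reversal_score(disease, mimic, top_n=3))     # positive: not a candidate
```

Restricting the comparison to the most perturbed genes keeps near-zero, noisy measurements from diluting the score.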

68 citations


Journal ArticleDOI
TL;DR: The National Center for Integrative and Biomedical Informatics (NCIBI) supports information access and data analysis for biomedical researchers, enabling them to build computational and knowledge models of biological systems to address the Driving Biological Problems (DBPs).

17 citations


Journal ArticleDOI
TL;DR: This work reports, for the first time to the authors' knowledge, the activity of TBP with poly-T stretches through a stepwise analysis combining multiple techniques: discovery by a novel quantitative detection of microarrays, confirmation by traditional gel electrophoresis, and a full-genome prediction with computational analyses.

16 citations


Journal ArticleDOI
TL;DR: The findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression.
Abstract: Identification of active causal regulators is a crucial problem in understanding disease mechanisms or finding drug targets. Methods that infer causal regulators directly from primary data have been proposed and successfully validated in some cases. These methods necessarily require very large sample sizes or a mix of different data types. Recent studies have shown that prior biological knowledge can successfully boost a method's ability to find regulators. We present a simple data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis and a specific type of literature-derived causal relationships. Instead of investigating the co-expression level between regulators and their regulatees, we focus on coherence of regulatees of a regulator. Using simulated datasets we show that our method performs very well at recovering even weak regulatory relationships with a low false discovery rate. Using three separate real biological datasets we were able to recover both well-known and as yet undescribed active regulators for each disease population. The results are represented as a rank-ordered list of regulators and reveal both single and higher-order regulatory relationships. CSA is an intuitive data-driven way of selecting directed perturbation experiments that are relevant to a disease population of interest and represent a starting point for further investigation. Our findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression.
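The central idea above, scoring a regulator by how coherently its regulatees are co-expressed rather than by its own correlation with them, can be sketched as a mean pairwise correlation over each regulatee set. This is an illustrative simplification and not the published CSA procedure: the use of absolute Pearson correlation, the absence of a significance test, and every name and value in the example are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def coherence(expression, regulatees):
    """Mean absolute pairwise correlation among a regulator's regulatees.

    expression: dict mapping gene -> list of values across samples
    regulatees: list of at least two gene names found in `expression`
    """
    pairs = list(combinations(regulatees, 2))
    if not pairs:
        raise ValueError("need at least two regulatees")
    return sum(abs(pearson(expression[a], expression[b])) for a, b in pairs) / len(pairs)


# Hypothetical expression profiles over four samples.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 5.9, 8.0],
    "C": [4.0, 3.0, 2.1, 1.0],
    "D": [1.0, -1.0, 1.0, -1.0],
    "E": [1.0, 1.0, -1.0, -1.0],
    "F": [2.0, 1.0, 1.0, 2.0],
}
# Hypothetical literature-derived network: regulator -> regulatees.
network = {"R1": ["A", "B", "C"], "R2": ["D", "E", "F"]}

ranked = sorted(network, key=lambda r: coherence(expression, network[r]), reverse=True)
print(ranked)
```

Ranking regulators by this score gives a rank-ordered list in the spirit of the one the abstract describes; a real analysis would also compare each score against randomly drawn gene sets to control false discoveries.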

12 citations


BookDOI
29 Nov 2012
TL;DR: This volume surveys and demonstrates the science and technology of converting an unprecedented data deluge to new knowledge and biological insight, and seeks to aid researchers in the further development of databases, mining and visualization systems that are central to the paradigm-altering discoveries being made with increasing frequency.
Abstract: The post-genomic revolution is witnessing the generation of petabytes of data annually, with deep implications ranging across evolutionary theory, developmental biology, agriculture, and disease processes. Data Mining for Systems Biology: Methods and Protocols surveys and demonstrates the science and technology of converting an unprecedented data deluge to new knowledge and biological insight. The volume is organized around two overlapping themes, network inference and functional inference. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible protocols, and key tips on troubleshooting and avoiding known pitfalls. Authoritative and practical, Data Mining for Systems Biology: Methods and Protocols also seeks to aid researchers in the further development of databases, mining and visualization systems that are central to the paradigm-altering discoveries being made with increasing frequency.

3 citations