Author

Jianfeng Xu

Bio: Jianfeng Xu is an academic researcher from Virginia Tech. The author has contributed to research on the topic of copy number analysis. The author has an h-index of 1 and has co-authored 1 publication receiving 37 citations.

Papers
Journal Article
Guoqiang Yu, Bai Zhang, G. Steven Bova, Jianfeng Xu, Ie-Ming Shih, Yue Joseph Wang
TL;DR: A statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells is reported.
Abstract: Motivation: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples. Results: We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines. Availability: The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at authors' web site: http://www.cbil.ece.vt.edu/software.htm. 
Contact: yuewang@vt.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.
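
The correction step described in the abstract can be illustrated with a simple linear mixture model. This is a minimal sketch under stated assumptions, not BACOM's implementation: it assumes normal cells carry two copies, that the observed signal is a linear mix of normal and tumor copy numbers, and that the normal fraction `alpha` is already known (BACOM estimates it with a Bayesian procedure that is not reproduced here). The function and parameter names are hypothetical.

```python
def mix_with_normal(tumor_copies, alpha, normal_copies=2.0):
    """Forward model (assumed): signal observed when a fraction `alpha`
    of the cells in the sample are normal."""
    return alpha * normal_copies + (1.0 - alpha) * tumor_copies


def recover_tumor_copy_number(observed, alpha, normal_copies=2.0):
    """Invert the assumed mixture model for each observed value.

    observed = alpha * normal_copies + (1 - alpha) * tumor
    =>  tumor = (observed - alpha * normal_copies) / (1 - alpha)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return [(x - alpha * normal_copies) / (1.0 - alpha) for x in observed]
```

Under this model, a sample with 50% normal cells that shows an observed copy number of 1.5 corresponds to a tumor copy number of 1, and an observed value of 1.0 corresponds to a tumor copy number of 0. This shows how contamination can mask the difference between deletion types.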

40 citations


Cited by
Journal Article
28 Jul 2016-Cell
TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.

728 citations

Journal Article
TL;DR: A predictor for survival in estrogen receptor–negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures is devised.
Abstract: Solid tumors are heterogeneous tissues composed of a mixture of cancer and normal cells, which complicates the interpretation of their molecular profiles. Furthermore, tissue architecture is generally not reflected in molecular assays, rendering this rich information underused. To address these challenges, we developed a computational approach based on standard hematoxylin and eosin-stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively. To deconvolute cellular heterogeneity and detect subtle genomic aberrations, we introduced an algorithm based on tumor cellularity to increase the comparability of copy number profiles between samples. We next devised a predictor for survival in estrogen receptor-negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures. Image processing also allowed us to describe and validate an independent prognostic factor based on quantitative analysis of spatial patterns between stromal cells, which are not detectable by molecular assays. Our quantitative, image-based method could benefit any large-scale cancer study by refining and complementing molecular assays of tumor samples.

368 citations

01 Jan 2012
TL;DR: Yuan et al. developed a computational approach based on standard hematoxylin and eosin-stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively.
Abstract: Image analysis of breast cancer tissue improves and complements genomic data to predict patient survival.
Digitizing Pathology for Genomics
The tumor microenvironment is a complex milieu that includes not only the cancer cells but also the stromal cells, immune cells, and even normal, healthy cells. Molecular analysis of tumor tissue is therefore a challenging task because all this “extra” genomic information can muddle the results. Conversely, biopsy tissue staining can provide a spatial and cellular readout (architecture and content), but it is mostly qualitative information. In response, Yuan and colleagues have developed a quantitative, computational approach to pathology. When combined with molecular analyses, this approach allowed the authors to uncover new knowledge about breast tumor biology and, in turn, predict patient survival.
Yuan et al. first collected histopathology images, gene expression data, and DNA copy number variation data for 564 breast cancer patients. Using a portion of the images (the “discovery set”), they developed an image processing approach that automatically classified cells as cancer, lymphocyte, or stroma on the basis of their size and shape. This approach was validated on the remaining samples, and any errors in this analysis were digitally corrected before obtaining a plot of tumor cellular heterogeneity. With exact knowledge of the tumor’s cellular composition, the authors were able to correct copy number data to more accurately reflect HER2 status compared with uncorrected data.
Yuan and colleagues combined their digital pathology with genomic information to devise an integrated predictor of survival for estrogen receptor (ER)–negative patients. A higher number of infiltrating lymphocytes (immune cells), as quantified by their image analysis platform, was found in a subset of patients with better clinical outcome than the rest of the ER-negative patients, and this outcome difference was significantly enhanced with the addition of gene expression. The quantitative and objective nature of this integrated predictor could benefit diagnosis and prognosis in many areas of cancer by using the rich combination of tumor cellular content and genomic data.

286 citations

01 Jun 2016
TL;DR: This article provides a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones associated with short overall survival.
Abstract: To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

160 citations

Journal Article
TL;DR: DBS (Deviation Binary Segmentation) is implemented in a platform-independent and open-source Java application (ToolSeg), including a graphical user interface and simulation data generation, as well as various segmentation methods in the native Java language.
Abstract: Genome-wide DNA copy number changes are hallmark events in the initiation and progression of cancers. Quantitative analysis of somatic copy number alterations (CNAs) has broad applications in cancer research. With the increasing capacity of high-throughput sequencing technologies, fast and efficient segmentation algorithms are required for characterizing high-density CNA data. A fast and informative segmentation algorithm, DBS (Deviation Binary Segmentation), is developed and discussed. The DBS method is based on the least absolute error principle and is inspired by the segmentation method rooted in the circular binary segmentation procedure. DBS uses point-by-point model calculation to ensure the accuracy of segmentation and combines a binary search algorithm with heuristics derived from the Central Limit Theorem. The DBS algorithm is very efficient, with a computational complexity of O(n log n), and is faster than its predecessors. Moreover, DBS measures the change-point amplitude of the mean values of two adjacent segments at a breakpoint, where the significant degree of the change-point amplitude is determined by the weighted average deviation at breakpoints. Accordingly, using the constructed binary tree of significant degree, DBS indicates whether the segmentation results are over- or under-segmented. DBS is implemented in a platform-independent and open-source Java application (ToolSeg), including a graphical user interface and simulation data generation, as well as various segmentation methods in the native Java language.
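
The general idea of binary segmentation under a least absolute error criterion can be sketched as follows. This is an illustrative toy, not the DBS algorithm or any part of ToolSeg: it scans every candidate split exhaustively rather than using the binary search and Central Limit Theorem heuristics mentioned in the abstract, so it does not reach O(n log n). The function names and the `min_gain` and `min_size` parameters are hypothetical.

```python
from statistics import median


def abs_error(values):
    """Sum of absolute deviations of a segment from its median."""
    if not values:
        return 0.0
    m = median(values)
    return sum(abs(v - m) for v in values)


def best_split(values, min_size=2):
    """Return (index, gain) for the split that most reduces absolute error.

    Returns (None, 0.0) when no split reduces the error.
    """
    total = abs_error(values)
    best_idx, best_gain = None, 0.0
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - abs_error(values[:i]) - abs_error(values[i:])
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


def binary_segmentation(values, min_gain=1.0, min_size=2, offset=0):
    """Recursively split the signal; return sorted breakpoint indices."""
    idx, gain = best_split(values, min_size)
    if idx is None or gain < min_gain:
        return []
    left = binary_segmentation(values[:idx], min_gain, min_size, offset)
    right = binary_segmentation(values[idx:], min_gain, min_size, offset + idx)
    return left + [offset + idx] + right


# Three-level step signal: breakpoints are expected at indices 6 and 12.
signal = [0.0] * 6 + [1.0] * 6 + [3.0] * 6
print(binary_segmentation(signal))
```

Plain binary segmentation of this kind can miss a short aberrant block flanked by identical segments, because no single split reduces the error. Circular binary segmentation, which the abstract cites as inspiration, was designed to address that limitation.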

113 citations