Journal ArticleDOI

BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data.

Guoqiang Yu, Bai Zhang, G. Steven Bova, Jianfeng Xu, Ie-Ming Shih, Yue Joseph Wang
01 Jun 2011 - Bioinformatics (Oxford University Press) - Vol. 27, Iss. 11, pp 1473-1480
TL;DR: A statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells is reported.
Abstract: Motivation: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples.
Results: We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
Availability: The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at the authors' web site: http://www.cbil.ece.vt.edu/software.htm.
Contact: yuewang@vt.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
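The correction step described in the abstract can be illustrated with a minimal sketch. It assumes a simple linear mixture model in which the measured copy number is a weighted average of normal cells (two copies) and cancer cells, and it assumes the normal-cell fraction is already known. BACOM estimates that fraction with a Bayesian analysis that is not reproduced here; the function names and the `normal_copies` parameter are hypothetical and are not taken from the BACOM or bacomR code.

```python
def mix_copy_number(tumor_copies, normal_fraction, normal_copies=2.0):
    """Copy number expected from a sample that mixes normal and cancer cells.

    Assumed model: observed = a * normal_copies + (1 - a) * tumor_copies,
    where a is the fraction of normal cells in the sample.
    """
    return normal_fraction * normal_copies + (1.0 - normal_fraction) * tumor_copies


def correct_copy_number(observed, normal_fraction, normal_copies=2.0):
    """Invert the assumed mixture model to recover the cancer-cell copy number."""
    if not 0.0 <= normal_fraction < 1.0:
        raise ValueError("normal_fraction must be in [0, 1)")
    return (observed - normal_fraction * normal_copies) / (1.0 - normal_fraction)


if __name__ == "__main__":
    a = 0.4  # assumed normal-cell fraction, for illustration only
    for label, copies in [("hemizygous deletion", 1.0), ("homozygous deletion", 0.0)]:
        observed = mix_copy_number(copies, a)
        recovered = correct_copy_number(observed, a)
        print(label, "observed:", round(observed, 3), "recovered:", round(recovered, 3))
```

Under this assumed model, a hemizygous deletion is observed at 1 + a copies and a homozygous deletion at 2a copies, so the two deletion types can produce similar signals at different contamination levels. That is why the deletion type and the contamination have to be estimated together.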


Citations
Journal ArticleDOI
28 Jul 2016 - Cell
TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.

728 citations


Cites methods from "BACOM: in silico detection of genom..."

  • ...survival of >1.5 years, while non-HRD patients were defined as lacking these genomic aberrations, with a follow-up or time to death of <2.5 years; additional selection criteria include available residual tissue volume and a tumor tissue contamination score estimated using CNAs (Yu et al., 2011)....


Journal ArticleDOI
TL;DR: A predictor for survival in estrogen receptor–negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures is devised.
Abstract: Solid tumors are heterogeneous tissues composed of a mixture of cancer and normal cells, which complicates the interpretation of their molecular profiles. Furthermore, tissue architecture is generally not reflected in molecular assays, rendering this rich information underused. To address these challenges, we developed a computational approach based on standard hematoxylin and eosin-stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively. To deconvolute cellular heterogeneity and detect subtle genomic aberrations, we introduced an algorithm based on tumor cellularity to increase the comparability of copy number profiles between samples. We next devised a predictor for survival in estrogen receptor-negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures. Image processing also allowed us to describe and validate an independent prognostic factor based on quantitative analysis of spatial patterns between stromal cells, which are not detectable by molecular assays. Our quantitative, image-based method could benefit any large-scale cancer study by refining and complementing molecular assays of tumor samples.

368 citations


Cites background from "BACOM: in silico detection of genom..."

  • ...Quantitative tumor cellularity estimates correct cancer copy number profiles. Existing statistical methods estimate tumor cellularity indirectly from molecular data by assuming independence of allele-specific signals, identification of discrete copy number events (17), or equality of copy number in different clones, which can often be unrealistic....


01 Jan 2012
TL;DR: Yuan et al. developed a computational approach based on standard hematoxylin and eosin-stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively.
Abstract: Image analysis of breast cancer tissue improves and complements genomic data to predict patient survival.
Digitizing Pathology for Genomics: The tumor microenvironment is a complex milieu that includes not only the cancer cells but also the stromal cells, immune cells, and even normal, healthy cells. Molecular analysis of tumor tissue is therefore a challenging task because all this “extra” genomic information can muddle the results. Conversely, biopsy tissue staining can provide a spatial and cellular readout (architecture and content), but it is mostly qualitative information. In response, Yuan and colleagues have developed a quantitative, computational approach to pathology. When combined with molecular analyses, the authors were able to uncover new knowledge about breast tumor biology and, in turn, predict patient survival. Yuan et al. first collected histopathology images, gene expression data, and DNA copy number variation data for 564 breast cancer patients. Using a portion of the images (the “discovery set”), they developed an image processing approach that automatically classified cells as cancer, lymphocyte, or stroma on the basis of their size and shape. This approach was validated on the remaining samples, and any errors in this analysis were digitally corrected before obtaining a plot of tumor cellular heterogeneity. With exact knowledge of the tumor’s cellular composition, the authors were able to correct copy number data to more accurately reflect HER2 status compared with uncorrected data. Yuan and colleagues combined their digital pathology with genomic information to devise an integrated predictor of survival for estrogen receptor (ER)–negative patients. Higher numbers of infiltrating lymphocytes (immune cells), as quantified by their image analysis platform, were found in a subset of patients with better clinical outcome than the rest of the ER-negative patients, and this outcome difference was significantly enhanced with the addition of gene expression. The quantitative and objective nature of this integrated predictor could benefit diagnosis and prognosis in many areas of cancer by using the rich combination of tumor cellular content and genomic data.
Solid tumors are heterogeneous tissues composed of a mixture of cancer and normal cells, which complicates the interpretation of their molecular profiles. Furthermore, tissue architecture is generally not reflected in molecular assays, rendering this rich information underused. To address these challenges, we developed a computational approach based on standard hematoxylin and eosin–stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively. To deconvolute cellular heterogeneity and detect subtle genomic aberrations, we introduced an algorithm based on tumor cellularity to increase the comparability of copy number profiles between samples. We next devised a predictor for survival in estrogen receptor–negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures. Image processing also allowed us to describe and validate an independent prognostic factor based on quantitative analysis of spatial patterns between stromal cells, which are not detectable by molecular assays. Our quantitative, image-based method could benefit any large-scale cancer study by refining and complementing molecular assays of tumor samples.

286 citations

01 Jun 2016
TL;DR: This article provides a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, including how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival.
Abstract: To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

160 citations

Journal ArticleDOI
TL;DR: DBS (Deviation Binary Segmentation) is implemented in a platform-independent and open-source Java application (ToolSeg), including a graphical user interface and simulation data generation, as well as various segmentation methods in the native Java language.
Abstract: Genome-wide DNA copy number changes are the hallmark events in the initiation and progression of cancers. Quantitative analysis of somatic copy number alterations (CNAs) has broad applications in cancer research. With the increasing capacity of high-throughput sequencing technologies, fast and efficient segmentation algorithms are required when characterizing high density CNAs data. A fast and informative segmentation algorithm, DBS (Deviation Binary Segmentation), is developed and discussed. The DBS method is based on the least absolute error principles and is inspired by the segmentation method rooted in the circular binary segmentation procedure. DBS uses point-by-point model calculation to ensure the accuracy of segmentation and combines a binary search algorithm with heuristics derived from the Central Limit Theorem. The DBS algorithm is very efficient requiring a computational complexity of O(n*log n), and is faster than its predecessors. Moreover, DBS measures the change-point amplitude of mean values of two adjacent segments at a breakpoint, where the significant degree of change-point amplitude is determined by the weighted average deviation at breakpoints. Accordingly, using the constructed binary tree of significant degree, DBS informs whether the results of segmentation are over- or under-segmented. DBS is implemented in a platform-independent and open-source Java application (ToolSeg), including a graphical user interface and simulation data generation, as well as various segmentation methods in the native Java language.

113 citations


Cites methods from "BACOM: in silico detection of genom..."

  • ...Firstly, the criterion of segmentation use Eqn (6) in BACOM, however Eqn (9) is the criterion in DBS....


  • ...Then, in this case, the algorithm framework of DBS is fully equivalent to one in BACOM, which is similar to the idea used in Circular Binary Segmentation procedure....


  • ...With default preprocessing of the data, on average, DBS is about 5 times faster than PCF, is about 15 times than CBS, and is about 23 times than the method in BACOM....


  • ...Using default parameter settings, we compared the computing times of DBS, CBS, PCF and the method in BACOM on the 10 samples in the 26 K simulation data set, on 10 samples in 160 K simulation data set, and on 10 samples from Affymetrix SNP 6.0 Array....


  • ...Fu Y, Yu G, Levine DA, Wang N, Shih Ie M, Zhang Z, Clarke R, Wang Y. BACOM2....


References
Journal ArticleDOI
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses (the false discovery rate), which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Abstract: Summary: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

83,420 citations
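
The sequential Bonferroni-type procedure mentioned in this abstract is widely known as the Benjamini-Hochberg step-up procedure. A minimal sketch follows, assuming independent test statistics as the abstract states. The function name and the default level `q=0.05` are illustrative choices, not values taken from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: sort the p-values, find the largest rank k such that
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that satisfies the bound
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(p, q=0.05))
```

Because the rule keeps the largest qualifying rank, a hypothesis can be rejected even when its own p-value exceeds the bound at its rank, as long as a larger p-value further down the sorted list meets its bound.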

Journal ArticleDOI
TL;DR: A modification of binary segmentation, called circular binary segmentation, is developed to translate noisy intensity measurements into regions of equal DNA copy number.
Abstract: DNA sequence copy number is the number of copies of DNA at a region of a genome. Cancer progression often involves alterations in DNA copy number. Newly developed microarray technologies enable simultaneous measurement of copy number at thousands of sites in a genome. We have developed a modification of binary segmentation, which we call circular binary segmentation, to translate noisy intensity measurements into regions of equal copy number. The method is evaluated by simulation and is demonstrated on cell line data with known copy number alterations and on a breast cancer cell line data set.

2,269 citations
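
To illustrate the idea of translating a noisy signal into regions of equal copy number, here is a minimal sketch of plain recursive binary segmentation. It is not circular binary segmentation: the published method searches for an interior segment by joining the two ends of the data into a circle and assesses significance statistically, whereas this sketch uses a single split point and a fixed gain threshold. The names `best_split`, `binary_segmentation` and `min_gain` are hypothetical.

```python
def best_split(values):
    """Return (index, gain) for the split that most reduces squared error.

    The gain is the drop in the within-segment sum of squares when the
    sequence is split into values[:index] and values[index:].
    """
    n = len(values)
    total = sum(values)
    best_idx, best_gain = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        right_sum = total - left_sum
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


def binary_segmentation(values, min_gain, offset=0):
    """Recursively split while the best split improves the fit by min_gain."""
    idx, gain = best_split(values)
    if idx is None or gain < min_gain:
        return []
    left = binary_segmentation(values[:idx], min_gain, offset)
    right = binary_segmentation(values[idx:], min_gain, offset + idx)
    return left + [offset + idx] + right


if __name__ == "__main__":
    signal = [0.0] * 4 + [2.0] * 4 + [0.0] * 4
    print(binary_segmentation(signal, min_gain=1.0))  # breakpoint positions
```

A fixed `min_gain` is only a stand-in for a proper significance test; with real array data the threshold would need to account for the noise level.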

Journal ArticleDOI
TL;DR: Evidence is provided that widespread DNA copy number alteration can lead directly to global deregulation of gene expression, which may contribute to the development or progression of cancer.
Abstract: Genomic DNA copy number alterations are key genetic events in the development and progression of human cancers. Here we report a genome-wide microarray comparative genomic hybridization (array CGH) analysis of DNA copy number variation in a series of primary human breast tumors. We have profiled DNA copy number alteration across 6,691 mapped human genes, in 44 predominantly advanced, primary breast tumors and 10 breast cancer cell lines. While the overall patterns of DNA amplification and deletion corroborate previous cytogenetic studies, the high-resolution (gene-by-gene) mapping of amplicon boundaries and the quantitative analysis of amplicon shape provide significant improvement in the localization of candidate oncogenes. Parallel microarray measurements of mRNA levels reveal the remarkable degree to which variation in gene copy number contributes to variation in gene expression in tumor cells. Specifically, we find that 62% of highly amplified genes show moderately or highly elevated expression, that DNA copy number influences gene expression across a wide range of DNA copy number alterations (deletion, low-, mid- and high-level amplification), that on average, a 2-fold change in DNA copy number is associated with a corresponding 1.5-fold change in mRNA levels, and that overall, at least 12% of all the variation in gene expression among the breast tumors is directly attributable to underlying variation in gene copy number. These findings provide evidence that widespread DNA copy number alteration can lead directly to global deregulation of gene expression, which may contribute to the development or progression of cancer.

1,258 citations

Journal ArticleDOI
TL;DR: A systematic method, called Genomic Identification of Significant Targets in Cancer (GISTIC), designed for analyzing chromosomal aberrations in cancer, is used to study gliomas and the results support the feasibility and utility of systematic characterization of the cancer genome.
Abstract: Comprehensive knowledge of the genomic alterations that underlie cancer is a critical foundation for diagnostics, prognostics, and targeted therapeutics. Systematic efforts to analyze cancer genomes are underway, but the analysis is hampered by the lack of a statistical framework to distinguish meaningful events from random background aberrations. Here we describe a systematic method, called Genomic Identification of Significant Targets in Cancer (GISTIC), designed for analyzing chromosomal aberrations in cancer. We use it to study chromosomal aberrations in 141 gliomas and compare the results with two prior studies. Traditional methods highlight hundreds of altered regions with little concordance between studies. The new approach reveals a highly concordant picture involving ≈35 significant events, including 16–18 broad events near chromosome-arm size and 16–21 focal events. Approximately half of these events correspond to known cancer-related genes, only some of which have been previously tied to glioma. We also show that superimposed broad and focal events may have different biological consequences. Specifically, gliomas with broad amplification of chromosome 7 have properties different from those with overlapping focal EGFR amplification: the broad events act in part through effects on MET and its ligand HGF and correlate with MET dependence in vitro. Our results support the feasibility and utility of systematic characterization of the cancer genome.

1,043 citations

Journal ArticleDOI
TL;DR: It is shown through a high-resolution genome-wide single nucleotide polymorphism and copy number survey that most, if not all, metastatic prostate cancers have monoclonal origins and maintain a unique signature copy number pattern of the parent cancer cell while also accumulating a variable number of separate subclonally sustained changes.
Abstract: Many studies have shown that primary prostate cancers are multifocal and are composed of multiple genetically distinct cancer cell clones. Whether or not multiclonal primary prostate cancers typically give rise to multiclonal or monoclonal prostate cancer metastases is largely unknown, although studies at single chromosomal loci are consistent with the latter case. Here we show through a high-resolution genome-wide single nucleotide polymorphism and copy number survey that most, if not all, metastatic prostate cancers have monoclonal origins and maintain a unique signature copy number pattern of the parent cancer cell while also accumulating a variable number of separate subclonally sustained changes. We find no relationship between anatomic site of metastasis and genomic copy number change pattern. Taken together with past animal and cytogenetic studies of metastasis and recent single-locus genetic data in prostate and other metastatic cancers, these data indicate that despite common genomic heterogeneity in primary cancers, most metastatic cancers arise from a single precursor cancer cell. This study establishes that genomic archeology of multiple anatomically separate metastatic cancers in individuals can be used to define the salient genomic features of a parent cancer clone of proven lethal metastatic phenotype.

631 citations
