Posted Content

SuperFreq: Integrated mutation detection and clonal tracking in cancer

TL;DR: SuperFreq is a cancer exome sequencing analysis pipeline that integrates the identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) with clonal tracking of both. It can be applied in many different experimental settings for the analysis of exomes and other capture libraries.
Abstract:
Motivation: Analysing multiple tumour samples from an individual cancer patient allows insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression; therefore, the ability to identify and track clones using genomics data is of great interest. Existing approaches for clonal tracking typically require the user to combine multiple tools that are not purpose-made. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application.
Results: We have built superFreq, a cancer exome sequencing analysis tool that calls and annotates somatic SNVs and CNAs and attributes them to clones. SuperFreq makes use of unrelated control samples and does not require matched normal samples. We demonstrate the ability of superFreq to track clones by combining real samples in known proportions to simulate a multi-sample analysis. In addition, we compared superFreq to other somatic SNV callers and CNA callers on exome sequencing data from cancer-normal pairs, including 304 participants from 33 cancer types in The Cancer Genome Atlas (TCGA). SuperFreq offers a reliable platform to identify somatic mutations and to track clones. SuperFreq recalled 91% of the somatic SNVs identified by a consensus of four other methods, with a median of 1 additional somatic SNV per sample that was not found by any other method. CNA calls from superFreq showed good agreement with those generated by Sequenza and with ASCAT calls generated from matched SNP arrays. Using our simulated data set for testing multi-sample clonal tracking, we found that superFreq identified 93% of clones with a cellular fraction of at least 50%, and mutations were assigned to clones with high recall and close to 100% precision. Without a matched normal control, superFreq maintained a similar level of performance for most aspects of the analysis. SuperFreq is a highly adaptable method and has already been used in multiple projects.
Availability: SuperFreq is implemented in R and available on GitHub at https://github.com/ChristofferFlensburg/superFreq.
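The cellular fraction of a clone can be illustrated with a minimal Python sketch under strong simplifying assumptions: a heterozygous SNV at a copy-neutral diploid locus, 100% tumour purity and no sequencing error. In that case the fraction of cells carrying the mutation is roughly twice its variant allele frequency (VAF). This is a standard back-of-the-envelope approximation, not superFreq's own model, which also brings CNA calls into its clonal tracking; the function name, sample names and read counts below are hypothetical.

```python
import math
from typing import Tuple


def cell_fraction(alt_reads: int, depth: int) -> Tuple[float, float]:
    """Approximate fraction of cells carrying a heterozygous SNV.

    Assumes a copy-neutral diploid locus, 100% tumour purity and no
    sequencing error, so cell fraction ~= 2 * VAF (capped at 1).
    Returns (estimate, binomial standard error of the estimate).
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    vaf = alt_reads / depth
    # Binomial standard error of the VAF, scaled by the same factor of 2.
    se = 2 * math.sqrt(vaf * (1 - vaf) / depth)
    return min(1.0, 2 * vaf), se


# Hypothetical (alt reads, total depth) for one SNV tracked across three samples.
samples = {"diagnosis": (30, 100), "remission": (2, 120), "relapse": (48, 96)}
for name, (alt, depth) in samples.items():
    cf, se = cell_fraction(alt, depth)
    print(f"{name}: cell fraction {cf:.2f} +/- {se:.2f}, at least 50%: {cf >= 0.5}")
```

Following one SNV across samples in this way shows the kind of expansion and contraction that clonal tracking aims to capture; a real analysis must also correct for purity and local copy number before comparing samples.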
Citations
Journal Article
TL;DR: A primary gastric cancer organoid biobank that comprises normal, dysplastic, cancer, and lymph node metastases from 34 patients, including detailed whole-exome and transcriptome analysis, provides a useful resource for studying both cancer cell biology and precision cancer therapy.

393 citations

Journal Article
TL;DR: The integrated genomic and immune landscapes show that metastases propagate and evolve as communities of clones, reveal their predicted neo-antigen landscapes, and show that they can accumulate HLA loss of heterozygosity (LOH).

85 citations

Journal Article
01 Aug 2019
TL;DR: The emergence of a drug resistance mutation is demonstrated and the evolution of tumor subclones within a cholangiocarcinoma disease course is characterized.
Abstract: Cholangiocarcinoma is a highly aggressive and lethal malignancy, with limited treatment options available. Recently, FGFR inhibitors have been developed and utilized in FGFR-mutant cholangiocarcinoma; however, resistance often develops and the genomic determinants of resistance are not fully characterized. We completed whole-exome sequencing (WES) of 11 unique tumor samples obtained from a rapid research autopsy on a patient with FGFR-fusion-positive cholangiocarcinoma who initially responded to the pan-FGFR inhibitor, INCB054828. In vitro studies were carried out to characterize the novel FGFR alteration and secondary FGFR2 mutation identified. Multisite WES and analysis of tumor heterogeneity through subclonal inference identified four genetically distinct cancer cell populations, two of which were only observed after treatment. Additionally, WES revealed an FGFR2 N549H mutation hypothesized to confer resistance to the FGFR inhibitor INCB054828 in a single tumor sample. This hypothesis was corroborated with in vitro cell-based studies in which cells expressing FGFR2-CLIP1 fusion were sensitive to INCB054828 (IC50 value of 10.16 nM), whereas cells with the addition of the N549H mutation were resistant to INCB054828 (IC50 value of 1527.57 nM). Furthermore, the FGFR2 N549H secondary mutation displayed cross-resistance to other selective FGFR inhibitors, but remained sensitive to the nonselective inhibitor, ponatinib. Rapid research autopsy has the potential to provide unprecedented insights into the clonal evolution of cancer throughout the course of the disease. In this study, we demonstrate the emergence of a drug resistance mutation and characterize the evolution of tumor subclones within a cholangiocarcinoma disease course.

56 citations


Cites methods from "SuperFreq: Integrated mutation dete..."

  • ...Another possibility is metastatic cross-seeding, as was observed by Savas et al. (2016) using their tool superFREQ (Flensburg et al. 2018) for subclonal analysis of four metastatic breast cancer cases, and by Brady et al. (2019) in pediatric osteosarcoma....

Journal Article
TL;DR: The role of the RAS/MAPK pathway in the pathogenesis of sHDT is confirmed, providing further evidence of a common neoplastic precursor and, in the case of FL, gives additional insight into the stage in lymphomagenesis at which transdifferentiation may occur.

21 citations

Journal Article
TL;DR: In vitro chemosensitivity data suggest that PDOs could be clinically valuable for predicting chemotherapy response (FOLFOX or FOLFIRI) and clinical prognosis in CRLM patients, and could support personalized medicine.
Abstract: There is no effective method to predict chemotherapy response and postoperative prognosis of colorectal cancer liver metastasis (CRLM) patients. Patient‐derived organoids (PDOs) have become an important preclinical model. Herein, a living biobank of 50 CRLM organoids derived from primary tumors and paired liver metastatic lesions was successfully constructed. CRLM PDOs were comprehensively analyzed at multiple omics levels (histopathology, genome, transcriptome and single‐cell sequencing), confirming that this organoid platform for CRLM can capture intra‐ and interpatient heterogeneity. In vitro chemosensitivity data reveal the potential clinical value of PDOs for predicting chemotherapy response (FOLFOX or FOLFIRI) and clinical prognosis in CRLM patients. Taken together, CRLM PDOs have potential applications in personalized medicine.

18 citations

References
Journal Article
TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

14,103 citations


"SuperFreq: Integrated mutation dete..." refers methods in this paper

  • ...FeatureCounts[31] is used to determine the read count over each capture region (exon) for each sample....
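Counting reads per capture region with the chromosome hashing and feature blocking ideas described for featureCounts can be sketched in pure Python: features are hashed by (chromosome, block), so each read is compared only against features in the blocks it touches. This is a simplified illustration, not featureCounts' implementation (which is written in C and reads SAM/BAM files). Here reads are plain single-end (chromosome, start, end) tuples in 1-based inclusive coordinates, and `BLOCK_SIZE`, `build_index` and `count_reads` are hypothetical names.

```python
from collections import defaultdict

BLOCK_SIZE = 1000  # hypothetical block width in bp; the real tool's choice may differ


def build_index(features):
    """Hash features by (chromosome, block) so reads only check nearby features.

    features: list of (name, chrom, start, end), 1-based inclusive coordinates.
    """
    index = defaultdict(list)
    for name, chrom, start, end in features:
        for block in range(start // BLOCK_SIZE, end // BLOCK_SIZE + 1):
            index[(chrom, block)].append((name, start, end))
    return index


def count_reads(reads, features):
    """Count reads per feature. Reads overlapping more than one feature are
    left unassigned, mirroring featureCounts' default ambiguity rule."""
    index = build_index(features)
    counts = {name: 0 for name, _, _, _ in features}
    for chrom, r_start, r_end in reads:
        hits = set()
        for block in range(r_start // BLOCK_SIZE, r_end // BLOCK_SIZE + 1):
            for name, f_start, f_end in index.get((chrom, block), []):
                if f_start <= r_end and r_start <= f_end:  # intervals overlap
                    hits.add(name)
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


# Hypothetical capture regions (exons) and aligned reads.
exons = [("exon1", "chr1", 100, 250), ("exon2", "chr1", 1200, 1350), ("exon3", "chr2", 500, 700)]
reads = [("chr1", 150, 249), ("chr1", 1300, 1399), ("chr2", 10, 109), ("chr1", 240, 339)]
print(count_reads(reads, exons))
```

Indexing by block keeps the per-read lookup proportional to the number of features near the read rather than to all features on the chromosome; in Rsubread, the `allowMultiOverlap` option changes how reads overlapping several features are handled.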

Journal Article
TL;DR: The hierarchical model of Lonnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
Abstract: The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single channel and two color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.
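The moderated t-statistic described in this abstract fits in a few lines. In this sketch the gene-wise variance s2 is shrunk towards a prior variance s0_2, weighted by their degrees of freedom, and the coefficient is divided by the resulting posterior standard error; the degrees of freedom are augmented by d0. The hyperparameters d0 and s0_2 are supplied by hand here (hypothetical values) instead of being estimated from all genes as limma's `eBayes` does.

```python
import math


def moderated_t(beta, s2, d, v, d0, s0_2):
    """Moderated t-statistic for one gene or region.

    beta: estimated coefficient (e.g. a log fold change)
    s2:   residual variance for this gene, on d residual degrees of freedom
    v:    unscaled variance of the coefficient (stdev.unscaled**2 in limma terms)
    d0, s0_2: prior degrees of freedom and prior variance (the hyperparameters)
    Returns (moderated t, augmented degrees of freedom d0 + d).
    """
    # Posterior variance: a df-weighted average of the prior and gene-wise variances.
    s2_post = (d0 * s0_2 + d * s2) / (d0 + d)
    return beta / math.sqrt(s2_post * v), d0 + d


# Hypothetical numbers: 2 residual df and a sample variance that is small by chance.
ordinary, _ = moderated_t(1.0, 0.01, d=2, v=1.0, d0=0, s0_2=0.25)  # d0 = 0: no shrinkage
moderated, df = moderated_t(1.0, 0.01, d=2, v=1.0, d0=4, s0_2=0.25)
print(f"ordinary t = {ordinary:.2f}, moderated t = {moderated:.2f} on {df} df")
```

With only two residual degrees of freedom the ordinary t-statistic is inflated by a variance estimate that happens to be small; shrinkage towards the pooled value pulls it back, which is the stabilising effect the abstract describes for small numbers of arrays.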

11,864 citations


"SuperFreq: Integrated mutation dete..." refers methods in this paper

  • ...SuperFreq runs limma-voom [32, 33] with sample weights [34] on the bias-corrected counts to test for an increase or decrease in coverage indicating a CNA....
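The first step of limma-voom, transforming counts to log2 counts per million (log-CPM), and the size of coverage shift a CNA should produce can be sketched as follows. Only the log-CPM formula (offsets of 0.5 on the count and 1 on the library size) comes from voom; superFreq's bias correction, the sample weights and the linear-model test are not reproduced. `region_lfc` and `expected_lfc` are hypothetical helpers, and `expected_lfc` assumes a diploid genome diluted by normal cells.

```python
import math
from statistics import mean


def log_cpm(count, lib_size):
    """voom-style log2 counts per million, with offsets of 0.5 and 1."""
    return math.log2((count + 0.5) / (lib_size + 1) * 1e6)


def region_lfc(tumour, controls):
    """Tumour log-CPM minus mean control log-CPM for one capture region.

    tumour: (count, library size); controls: list of (count, library size).
    """
    return log_cpm(*tumour) - mean(log_cpm(c, n) for c, n in controls)


def expected_lfc(copies, purity=1.0):
    """Expected log2 coverage ratio for a clonal copy number change in a diploid
    genome when a fraction `purity` of cells is tumour and the rest are normal."""
    return math.log2(((1 - purity) * 2 + purity * copies) / 2)


# Hypothetical counts for one region: two unrelated controls and one tumour sample.
controls = [(1000, 20_000_000), (1100, 22_000_000)]
tumour = (1500, 20_000_000)
print(f"observed LFC: {region_lfc(tumour, controls):.3f}")
print(f"expected single-copy gain (purity 1.0): {expected_lfc(3):.3f}")
print(f"expected single-copy loss (purity 0.5): {expected_lfc(1, 0.5):.3f}")
```

A single-copy gain in a pure sample gives log2(3/2), about 0.585, and a single-copy loss gives -1; lower purity shrinks both towards zero, so low-purity or subclonal CNAs produce only small coverage shifts.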

Journal Article
Monkol Lek, Konrad J. Karczewski, Eric Vallabh Minikel, Kaitlin E. Samocha, Eric Banks, Timothy Fennell, Anne H. O’Donnell-Luria, James S. Ware, Andrew J. Hill, Beryl B. Cummings, Taru Tukiainen, Daniel P. Birnbaum, Jack A. Kosmicki, Laramie E. Duncan, Karol Estrada, Fengmei Zhao, James Zou, Emma Pierce-Hoffman, Joanne Berghout, David Neil Cooper, Nicole A. Deflaux, Mark A. DePristo, Ron Do, Jason Flannick, Menachem Fromer, Laura D. Gauthier, Jackie Goldstein, Namrata Gupta, Daniel P. Howrigan, Adam Kiezun, Mitja I. Kurki, Ami Levy Moonshine, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso, Ryan Poplin, Manuel A. Rivas, Valentin Ruano-Rubio, Samuel A. Rose, Douglas M. Ruderfer, Khalid Shakir, Peter D. Stenson, Christine Stevens, Brett Thomas, Grace Tiao, María Teresa Tusié-Luna, Ben Weisburd, Hong-Hee Won, Dongmei Yu, David Altshuler, Diego Ardissino, Michael Boehnke, John Danesh, Stacey Donnelly, Roberto Elosua, Jose C. Florez, Stacey Gabriel, Gad Getz, Stephen J. Glatt, Christina M. Hultman, Sekar Kathiresan, Markku Laakso, Steven A. McCarroll, Mark I. McCarthy, Dermot P.B. McGovern, Ruth McPherson, Benjamin M. Neale, Aarno Palotie, Shaun Purcell, Danish Saleheen, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan, Jaakko Tuomilehto, Ming T. Tsuang, Hugh Watkins, James G. Wilson, Mark J. Daly, Daniel G. MacArthur
18 Aug 2016, Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal Article
TL;DR: Intratumor heterogeneity can lead to underestimation of the tumor genomics landscape portrayed from single tumor-biopsy samples and may present major challenges to personalized-medicine and biomarker development.
Abstract: Background Intratumor heterogeneity may foster tumor evolution and adaptation and hinder personalized-medicine strategies that depend on results from single tumor-biopsy samples. Methods To examine intratumor heterogeneity, we performed exome sequencing, chromosome aberration analysis, and ploidy profiling on multiple spatially separated samples obtained from primary renal carcinomas and associated metastatic sites. We characterized the consequences of intratumor heterogeneity using immunohistochemical analysis, mutation functional analysis, and profiling of messenger RNA expression. Results Phylogenetic reconstruction revealed branched evolutionary tumor growth, with 63 to 69% of all somatic mutations not detectable across every tumor region. Intratumor heterogeneity was observed for a mutation within an autoinhibitory domain of the mammalian target of rapamycin (mTOR) kinase, correlating with S6 and 4EBP phosphorylation in vivo and constitutive activation of mTOR kinase activity in vitro. Mutational intratumor heterogeneity was seen for multiple tumor-suppressor genes converging on loss of function; SETD2, PTEN, and KDM5C underwent multiple distinct and spatially separated inactivating mutations within a single tumor, suggesting convergent phenotypic evolution. Gene-expression signatures of good and poor prognosis were detected in different regions of the same tumor. Allelic composition and ploidy profiling analysis revealed extensive intratumor heterogeneity, with 26 of 30 tumor samples from four tumors harboring divergent allelic-imbalance profiles and with ploidy heterogeneity in two of four tumors. Conclusions Intratumor heterogeneity can lead to underestimation of the tumor genomics landscape portrayed from single tumor-biopsy samples and may present major challenges to personalized-medicine and biomarker development. 
Intratumor heterogeneity, associated with heterogeneous protein function, may foster tumor adaptation and therapeutic failure through Darwinian selection. (Funded by the Medical Research Council and others.)

6,672 citations


"SuperFreq: Integrated mutation dete..." refers background in this paper

  • ...In a clinical setting it can help detect the cause of relapse or drug resistance, identify early driver mutations, or track the course of metastasis[1-5]....

Journal Article
TL;DR: The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
Abstract: In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T.Sherry, M.Ward and K.Sirotkin (1999) Genome Res., 9, 677–679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. The complete contents of dbSNP can also be downloaded in multiple formats via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.

6,449 citations