scispace - formally typeset
Search or ask a question

Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity

TL;DR: A mutational signature defined by a high prevalence of A>C transversions at AA dinucleotides is identified and the potential activation of the RAC1 pathway is suggested as a contributor to EAC tumorigenesis.
Abstract: National Human Genome Research Institute (U.S.) (Large Scale Sequencing Program Grant U54 HG003067)
Citations
More filters
Journal ArticleDOI
Ludmil B. Alexandrov1, Serena Nik-Zainal2, Serena Nik-Zainal3, David C. Wedge1, Samuel Aparicio4, Sam Behjati5, Sam Behjati1, Andrew V. Biankin, Graham R. Bignell1, Niccolo Bolli5, Niccolo Bolli1, Åke Borg3, Anne Lise Børresen-Dale6, Anne Lise Børresen-Dale7, Sandrine Boyault8, Birgit Burkhardt8, Adam Butler1, Carlos Caldas9, Helen Davies1, Christine Desmedt, Roland Eils5, Jorunn E. Eyfjord10, John A. Foekens11, Mel Greaves12, Fumie Hosoda13, Barbara Hutter5, Tomislav Ilicic1, Sandrine Imbeaud14, Sandrine Imbeaud15, Marcin Imielinsk15, Natalie Jäger5, David T. W. Jones16, David T. Jones1, Stian Knappskog11, Stian Knappskog17, Marcel Kool11, Sunil R. Lakhani18, Carlos López-Otín18, Sancha Martin1, Nikhil C. Munshi19, Nikhil C. Munshi20, Hiromi Nakamura13, Paul A. Northcott16, Marina Pajic21, Elli Papaemmanuil1, Angelo Paradiso22, John V. Pearson23, Xose S. Puente18, Keiran Raine1, Manasa Ramakrishna1, Andrea L. Richardson22, Andrea L. Richardson19, Julia Richter22, Philip Rosenstiel22, Matthias Schlesner5, Ton N. Schumacher24, Paul N. Span25, Jon W. Teague1, Yasushi Totoki13, Andrew Tutt24, Rafael Valdés-Mas18, Marit M. van Buuren25, Laura van ’t Veer26, Anne Vincent-Salomon27, Nicola Waddell23, Lucy R. Yates1, Icgc PedBrain24, Jessica Zucman-Rossi14, Jessica Zucman-Rossi15, P. Andrew Futreal1, Ultan McDermott1, Peter Lichter24, Matthew Meyerson15, Matthew Meyerson19, Sean M. Grimmond23, Reiner Siebert22, Elias Campo28, Tatsuhiro Shibata13, Stefan M. Pfister16, Stefan M. Pfister11, Peter J. Campbell29, Peter J. Campbell30, Peter J. Campbell2, Michael R. Stratton2, Michael R. Stratton31 
22 Aug 2013-Nature
TL;DR: It is shown that hypermutation localized to small genomic regions, ‘kataegis’, is found in many cancer types, and this results reveal the diversity of mutational processes underlying the development of cancer.
Abstract: All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.

7,904 citations

Journal ArticleDOI
Adam J. Bass1, Vesteinn Thorsson2, Ilya Shmulevich2, Sheila Reynolds2  +254 moreInstitutions (32)
11 Sep 2014-Nature
TL;DR: A comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project is described and a molecular classification dividing gastric cancer into four subtypes is proposed.
Abstract: Gastric cancer was the world’s third leading cause of cancer mortality in 2012, responsible for 723,000 deaths1. The vast majority of gastric cancers are adenocarcinomas, which can be further subdivided into intestinal and diffuse types according to the Lauren classification2. An alternative system, proposed by the World Health Organization, divides gastric cancer into papillary, tubular, mucinous (colloid) and poorly cohesive carcinomas3. These classification systems have little clinical utility, making the development of robust classifiers that can guide patient therapy an urgent priority. The majority of gastric cancers are associated with infectious agents, including the bacterium Helicobacter pylori4 and Epstein–Barr virus (EBV). The distribution of histological subtypes of gastric cancer and the frequencies of H. pylori and EBV associated gastric cancer vary across the globe5. A small minority of gastric cancer cases are associated with germline mutation in E-cadherin (CDH1)6 or mismatch repair genes7 (Lynch syndrome), whereas sporadic mismatch repair-deficient gastric cancers have epigenetic silencing of MLH1 in the context of a CpG island methylator phenotype (CIMP)8. Molecular profiling of gastric cancer has been performed using gene expression or DNA sequencing9–12, but has not led to a clear biologic classification scheme. The goals of this study by The Cancer Genome Atlas (TCGA) were to develop a robust molecular classification of gastric cancer and to identify dysregulated pathways and candidate drivers of distinct classes of gastric cancer.

4,583 citations

01 Jun 2013
TL;DR: The MutSigCV method as mentioned in this paper applies mutational heterogeneity to exome sequences from 3,083 tumour-normal pairs and discovers extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology.
Abstract: Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour–normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour–normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.

2,145 citations

Journal ArticleDOI
Nishant Agrawal1, Rehan Akbani1, B. Arman Aksoy1, Adrian Ally1  +239 moreInstitutions (1)
23 Oct 2014-Cell
TL;DR: The genomic landscape of 496 PTCs is described and a reclassification of thyroid cancers into molecular subtypes that better reflect their underlying signaling and differentiation properties is proposed, which has the potential to improve their pathological classification and better inform the management of the disease.

2,096 citations

Journal ArticleDOI
TL;DR: An R Bioconductor package, Maftools, is described, which offers a multitude of analysis and visualization modules that are commonly used in cancer genomic studies, including driver gene identification, pathway, signature, enrichment, and association analyses, and is independent of larger alignment files.
Abstract: Numerous large-scale genomic studies of matched tumor-normal samples have established the somatic landscapes of most cancer types. However, the downstream analysis of data from somatic mutations entails a number of computational and statistical approaches, requiring usage of independent software and numerous tools. Here, we describe an R Bioconductor package, Maftools, which offers a multitude of analysis and visualization modules that are commonly used in cancer genomic studies, including driver gene identification, pathway, signature, enrichment, and association analyses. Maftools only requires somatic variants in Mutation Annotation Format (MAF) and is independent of larger alignment files. With the implementation of well-established statistical and computational methods, Maftools facilitates data-driven research and comparative analysis to discover novel results from publicly available data sets. In the present study, using three of the well-annotated cohorts from The Cancer Genome Atlas (TCGA), we describe the application of Maftools to reproduce known results. More importantly, we show that Maftools can also be used to uncover novel findings through integrative analysis.

1,990 citations

References
More filters
Journal ArticleDOI
TL;DR: A new method and the corresponding software tool, PolyPhen-2, which is different from the early tool polyPhen1 in the set of predictive features, alignment pipeline, and the method of classification is presented and performance, as presented by its receiver operating characteristic curves, was consistently superior.
Abstract: To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which is different from the early tool PolyPhen1 in the set of predictive features, alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1) which were selected automatically by an iterative greedy algorithm (Supplementary Methods). Majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. Most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site2. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by Naive Bayes classifier (Supplementary Methods). Figure 1 PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar3 (light green). UniRef100 (solid ... We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar3, consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior compared to PolyPhen (Fig. 1b) and it also compared favorably with the three other popular prediction tools4–6 (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves the rate of true positive predictions of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for a lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations. For a mutation, PolyPhen-2 calculates Naive Bayes posterior probability that this mutation is damaging and reports estimates of false positive (the chance that the mutation is classified as damaging when it is in fact non-damaging) and true positive (the chance that the mutation is classified as damaging when it is indeed damaging) rates. A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.

11,571 citations

Journal ArticleDOI
TL;DR: Three methods of performing normalization at the probe intensity level are presented: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure and the simplest and quickest complete data method is found to perform favorably.
Abstract: Motivation: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. Results: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. Availabilty: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. Contact: bolstad@stat.berkeley.edu Supplementary information: Additional figures may be found at http://www.stat.berkeley.edu/∼bolstad/normalize/ index.html

8,324 citations

Journal ArticleDOI
09 Oct 2009-Science
TL;DR: Hi-C is described, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing and demonstrates the power of Hi-C to map the dynamic conformations of entire genomes.
Abstract: We describe Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. We constructed spatial proximity maps of the human genome with Hi-C at a resolution of 1 megabase. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.

7,180 citations

Journal ArticleDOI
Donna M. Muzny1, Matthew N. Bainbridge1, Kyle Chang1, Huyen Dinh1  +317 moreInstitutions (24)
19 Jul 2012-Nature
TL;DR: Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
Abstract: To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase e (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.

6,883 citations

Journal ArticleDOI
TL;DR: The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
Abstract: In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T.Sherry, M.Ward and K.Sirotkin (1999) Genome Res., 9, 677–679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. The complete contents of dbSNP can also be downloaded in multiple formats via anonymous FTP at ftp:// ncbi.nlm.nih.gov/snp/.

6,449 citations