Showing papers by "Andy G. Lynch" published in 2014


Journal Article
01 Oct 2014-eLife
TL;DR: This study analyzed somatic alterations in mtDNA from 1675 tumors and identified 1907 somatic substitutions, which exhibited dramatic replicative strand bias, predominantly C > T and A > G on the mitochondrial heavy strand.
Abstract: Recent sequencing studies have extensively explored the somatic alterations present in the nuclear genomes of cancers. Although mitochondria control energy metabolism and apoptosis, the origins and impact of cancer-associated mutations in mtDNA are unclear. In this study, we analyzed somatic alterations in mtDNA from 1675 tumors. We identified 1907 somatic substitutions, which exhibited dramatic replicative strand bias, predominantly C > T and A > G on the mitochondrial heavy strand. This strand-asymmetric signature differs from those found in nuclear cancer genomes but matches the inferred germline process shaping primate mtDNA sequence content. A number of mtDNA mutations showed considerable heterogeneity across tumor types. Missense mutations were selectively neutral and often gradually drifted towards homoplasmy over time. In contrast, mutations resulting in protein truncation undergo negative selection and were almost exclusively heteroplasmic. Our findings indicate that the endogenous mutational mechanism has far greater impact than any other external mutagens in mitochondria and is fundamentally linked to mtDNA replication.

391 citations


Journal Article
Jose M. C. Tubio, Yang Li, Young Seok Ju, Inigo Martincorena, Susanna L. Cooke, Marta Tojo, Gunes Gundem, Christodoulos P. Pipinikas, Jorge Zamora, Keiran Raine, Andrew Menzies, Pablo Román-García, Anthony Fullam, Moritz Gerstung, Adam Shlien, Patrick S. Tarpey, Elli Papaemmanuil, Stian Knappskog, Peter Van Loo, Manasa Ramakrishna, Helen Davies, John Marshall, David C. Wedge, Jon W. Teague, Adam Butler, Serena Nik-Zainal, Ludmil B. Alexandrov, Sam Behjati, Lucy R. Yates, Niccolo Bolli, Laura Mudie, Claire Hardy, Sancha Martin, Stuart McLaren, Sarah O’Meara, Elizabeth Anderson, Mark Maddison, Stephen J. Gamble, Christopher S. Foster, Anne Y. Warren, Hayley C. Whitaker, Daniel Brewer, Rosalind A. Eeles, Colin Cooper, David E. Neal, Andy G. Lynch, Tapio Visakorpi, William B. Isaacs, Laura Van't Veer, Carlos Caldas, Christine Desmedt, Christos Sotiriou, Samuel Aparicio, John A. Foekens, Jorunn E. Eyfjord, Sunil R. Lakhani, Gilles Thomas, Ola Myklebost, Paul N. Span, Anne Lise Børresen-Dale, Andrea L. Richardson, Marc J. van de Vijver, Anne Vincent-Salomon, Gert Van den Eynden, Adrienne M. Flanagan, P. Andrew Futreal, Sam M. Janes, G. Steven Bova, Michael R. Stratton, Ultan McDermott, Peter J. Campbell
01 Aug 2014-Science
TL;DR: It is found that 3′ transduction activity in a patient’s tumor was always associated with hypomethylation of that element, and in some cases transduction events can scatter exons, genes, and regulatory elements widely across the genome.
Abstract: Long interspersed nuclear element-1 (L1) retrotransposons are mobile repetitive elements that are abundant in the human genome. L1 elements propagate through RNA intermediates. In the germ line, neighboring, nonrepetitive sequences are occasionally mobilized by the L1 machinery, a process called 3' transduction. Because 3' transductions are potentially mutagenic, we explored the extent to which they occur somatically during tumorigenesis. Studying cancer genomes from 244 patients, we found that tumors from 53% of the patients had somatic retrotranspositions, of which 24% were 3' transductions. Fingerprinting of donor L1s revealed that a handful of source L1 elements in a tumor can spawn from tens to hundreds of 3' transductions, which can themselves seed further retrotranspositions. The activity of individual L1 elements fluctuated during tumor evolution and correlated with L1 promoter hypomethylation. The 3' transductions disseminated genes, exons, and regulatory elements to new locations, most often to heterochromatic regions of the genome.

338 citations


Journal Article
TL;DR: Mutations in EAC driver genes generally occur exceptionally early in disease development, with profound implications for diagnostic and therapeutic strategies.
Abstract: Cancer genome sequencing studies have identified numerous driver genes, but the relative timing of mutations in carcinogenesis remains unclear. The gradual progression from premalignant Barrett's esophagus to esophageal adenocarcinoma (EAC) provides an ideal model to study the ordering of somatic mutations. We identified recurrently mutated genes and assessed clonal structure using whole-genome sequencing and amplicon resequencing of 112 EACs. We next screened a cohort of 109 biopsies from 2 key transition points in the development of malignancy: benign metaplastic never-dysplastic Barrett's esophagus (NDBE; n=66) and high-grade dysplasia (HGD; n=43). Unexpectedly, the majority of recurrently mutated genes in EAC were also mutated in NDBE. Only TP53 and SMAD4 mutations occurred in a stage-specific manner, confined to HGD and EAC, respectively. Finally, we applied this knowledge to identify high-risk Barrett's esophagus in a new non-endoscopic test. In conclusion, mutations in EAC driver genes generally occur exceptionally early in disease development with profound implications for diagnostic and therapeutic strategies.

305 citations


01 Jan 2014
TL;DR: A protocol step for duplexing gRNA oligos: select the 20 nt sequence flanked by an NGG (Cas9 PAM) in the target, add GTTTT to the 3’ end of the forward oligo, and add GATCA to the 3’ end of the reverse-complement oligo, so that the duplex carries overhangs complementary to the BplI sites.
Abstract: II) duplexing the oligos: a) Order the oligos the following way: Example target sequence: 5’CCCAGGTATTGTTAGCGGTTTGAACGCTGCAGG 3’ Select the 20 nt sequence flanked by an NGG sequence (Cas9 PAM). This 20 nt sequence will be the gRNA sequence that will be inserted in the BplI site in bRA plasmid. Add GTTTT to the 3’ end of the forward oligo. Take the reverse complement of the 20nt sequence and add GATCA to the 3’ end of the reverse oligo (see the colored sequences below). When these complementary sequences are duplexed, these additional sequences will serve as overhangs that are complementary to the BplI sites. Oligo1 (Forward) for the example target sequence: 5’GTTAGCGGTTTGAACGCTGCGTTTT3’ Oligo2 (Reverse) for the example target sequence : 5’GCAGCGTTCAAACCGCTAACGATCA3’
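
The oligo design rule described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function name `design_grna_oligos` and its parameters `fwd_tail` and `rev_tail` are hypothetical, and the sketch assumes that the first 20 nt stretch followed by an NGG in the given strand is the intended guide.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def design_grna_oligos(target, fwd_tail="GTTTT", rev_tail="GATCA"):
    """Return (forward_oligo, reverse_oligo) for the first 20 nt + NGG site.

    Assumption: `target` is given 5'->3' and the first 20-mer that is
    immediately followed by an NGG PAM is the guide sequence.
    """
    seq = target.upper().replace(" ", "")
    match = re.search(r"([ACGT]{20})[ACGT]GG", seq)
    if match is None:
        raise ValueError("no 20 nt sequence followed by an NGG PAM found")
    guide = match.group(1)
    # The reverse oligo is the reverse complement of the guide.
    reverse_complement = guide.translate(COMPLEMENT)[::-1]
    return guide + fwd_tail, reverse_complement + rev_tail


forward, reverse = design_grna_oligos("CCCAGGTATTGTTAGCGGTTTGAACGCTGCAGG")
print(forward)  # GTTAGCGGTTTGAACGCTGCGTTTT
print(reverse)  # GCAGCGTTCAAACCGCTAACGATCA
```

Run on the example target, the sketch reproduces Oligo1 and Oligo2 as listed in the abstract. Sites on the opposite strand are not searched.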

8 citations


Posted Content
24 Dec 2014-bioRxiv
TL;DR: It is concluded that somatic mutation calling remains an unsolved problem and critical issues that need to be addressed before this valuable technology can be routinely used to inform clinical decision-making are highlighted.
Abstract: The emergence of next generation DNA sequencing technology is enabling high-resolution cancer genome analysis. Large-scale projects like the International Cancer Genome Consortium (ICGC) are systematically scanning cancer genomes to identify recurrent somatic mutations. Second generation DNA sequencing, however, is still an evolving technology and procedures, both experimental and analytical, are constantly changing. Thus the research community is still defining a set of best practices for cancer genome data analysis, with no single protocol emerging to fulfil this role. Here we describe an extensive benchmark exercise to identify and resolve issues of somatic mutation calling. Whole genome sequence datasets comprising tumor-normal pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, were shared within the ICGC and submissions of somatic mutation calls were compared to verified mutations and to each other. Varying strategies to call mutations, incomplete awareness of sources of artefacts, and even lack of agreement on what constitutes an artefact or real mutation manifested in widely varying mutation call rates and somewhat low concordance among submissions. We conclude that somatic mutation calling remains an unsolved problem. However, we have identified many issues that are easy to remedy that are presented here. Our study highlights critical issues that need to be addressed before this valuable technology can be routinely used to inform clinical decision-making.

7 citations


01 Jan 2014
TL;DR: Chromosome graphs for Candida metapsilosis showing coding genes, GC content, depth of coverage and LOH regions; the authors suspect that all analysed strains originate from a single hybridisation event, as most LOH events are shared by all strains.
Abstract: Candida metapsilosis chromosome graphs. For each chromosome we have plotted: i) coding genes for the +/- strand (grey bars) and GC content in 1 kb windows (blue plot) in the bottom track, and ii) the log2 of observed vs expected depth of coverage in 1 kb windows (blue) in the top fourteen tracks. In the bottom panel, the X axis reflects the genomic position given in bp (from 0 up to 3.25 Mb for the longest chromosome), while the Y axis shows the GC content (0-100%). In the other panels the Y axis reflects the log2 of observed vs expected depth of coverage (from -4 to +4). In addition, loss of heterozygosity (LOH) regions have been marked in grey, if the same genotype as reference was kept (hapA), and orange, if alternative genotype was kept (hapB). Four replicates (pe300, pe600, mp500 and pe400ov) were analysed for PL429. The C. metapsilosis genome is a mixture of heterozygous (light grey), haplotype B (dark grey) and haplotype A (orange) regions. We suspect all analysed strains originate from a single hybridisation event, as most LOH events are shared by all strains. Examples of large LOH, duplications and deletions have been annotated, i.e. the rDNA cluster (scaffold5), scaffold5 triploidy in PL448, partial scaffold2 triploidy in SZMC21154 and PL448, and complete LOH in scaffold6 in PL448. For the sake of simplicity, only some selected duplications and deletions among those longer than 5 kb are represented (annotated as such in Supplementary table S4). The rDNA cluster is found on the edge of the largest LOH (over 350 kb, scaffold5). Interestingly, we have also found an rDNA cluster in a long (200 kb) LOH track in C. orthopsilosis MCO448 (PMID: 24747362).
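
The coverage tracks described above plot the log2 of observed vs expected depth in 1 kb windows on a -4 to +4 axis. A minimal sketch of that calculation follows. It is an assumption-laden illustration rather than the authors' pipeline: the function name `log2_coverage_ratio` is hypothetical, the expected depth is taken to be the mean depth over the whole input sequence, and zero-coverage windows are pinned to the lower axis bound.

```python
import math


def log2_coverage_ratio(depth, window=1000, clip=4.0):
    """log2(observed / expected) mean depth per window, clipped to [-clip, clip].

    Assumption: `depth` is a list of per-base read depths and the expected
    depth is the mean over the whole list.
    """
    if not depth:
        raise ValueError("depth must not be empty")
    expected = sum(depth) / len(depth)
    if expected == 0:
        raise ValueError("no coverage at all; the ratio is undefined")
    ratios = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        observed = sum(chunk) / len(chunk)
        if observed == 0:
            # log2(0) is undefined, so pin the window to the lower bound.
            ratios.append(-clip)
        else:
            ratio = math.log2(observed / expected)
            ratios.append(max(-clip, min(clip, ratio)))
    return ratios


# Toy input: two normal windows, one at 1.5x depth, one with no coverage.
toy_depth = [30] * 2000 + [45] * 1000 + [0] * 1000
tracks = log2_coverage_ratio(toy_depth)
```

In this toy input the third window sits log2(1.5), about 0.58, above the first two, which is the offset a three-copy region would show relative to a two-copy baseline.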

7 citations


Peer Review
07 Aug 2014-eLife

6 citations


01 Jan 2014
TL;DR: A table comparing nucleotide composition by base and codon position between anaerobic bacteria, aerobic bacteria and all bacteria for the Nat and NCB frequency sets, with aerobic-to-anaerobic frequency ratios.
Abstract: Nucleic composition comparison

    base  codon  NatAnaerobicFreq  NCBAnaerobicFreq  NatAerobicFreq  NCBAerobicFreq  NatBacteriaFreq  NCBBacteriaFreq  NatAeroAnaeroFreqRatio  NCBAeroAnaeroFreqRatio
1   A     1      28.46             28.76             22.29           23.65           24.88            25.84            0.78                    0.82
2   T     1      17.34             17.49             14.44           16.03           15.92            16.63            0.83                    0.92
3   G     1      34.53             34.53             38.07           38.07           36.28            36.28            1.10                    1.10
4   C     1      19.67             19.22             25.21           22.25           22.92            21.24            1.28                    1.16
5   A     2      31.09             31.09             26.64           26.70           28.74            28.77            0.86                    0.86
6   T     2      30.32             30.32             28.79           28.79           29.58            29.58            0.95                    0.95
7   G     2      16.98             16.90             19.29           18.99           17.99            17.74            1.14                    1.12
8   C     2      21.61             21.69             25.28           25.52           23.69            23.91            1.17                    1.18
9   A     3      22.75             24.42             12.87           24.42           17.89            24.48            0.57                    1.00
10  T     3      26.74             24.95             16.15           24.71           21.91            24.74            0.60                    0.99
11  G     3      24.00             25.66             32.72           26.15           28.55            26.03            1.36                    1.02
12  C     3      26.52             24.98             38.26           24.72           31.65            24.74            1.44                    0.99

1 citation


Journal Article
TL;DR: It is shown that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond and suggested that in some cases the need for zero inflation is driven by the model's inability to cope with both artifactual large read counts and the frequently observed very low read counts.
Abstract: Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is a valuable tool for epigenetic studies. Analysis of the data arising from ChIP-seq experiments often requires implicit or explicit statistical modelling of the read counts. The simple Poisson model is attractive, but does not provide a good fit to observed ChIP-seq data. Researchers therefore often either extend to a more general model (e.g. the Negative Binomial), and/or exclude regions of the genome that do not conform to the model. Since many modelling strategies employed for ChIP-seq data reduce to fitting a mixture of Poisson distributions, we explore the problem of inferring the optimal mixing distribution. We apply the Constrained Newton Method (CNM), which suggests the Negative Binomial - Negative Binomial (NB-NB) mixture model as a candidate for modelling ChIP-seq data. We illustrate fitting the NB-NB model with an accelerated EM algorithm on four data sets from three species. Zero-inflated models have been suggested as an approach to improve model fit for ChIP-seq data. We show that the NB-NB mixture model requires no zero-inflation and suggest that in some cases the need for zero inflation is driven by the model's inability to cope with both artefactual large read counts and the frequently observed very low read counts. We see that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond. Use of the suggested NB-NB mixture model will be of value not only when calling peaks or otherwise modelling ChIP-seq data, but also when simulating data or constructing blacklists de novo.
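
To make the NB-NB mixture named in this abstract concrete, the sketch below evaluates the probability mass function of a two-component Negative Binomial mixture. It is an illustration only: it does not implement the Constrained Newton Method or the accelerated EM fit, and the function names, the mean/size parameterisation and all parameter values are hypothetical choices rather than anything taken from the paper.

```python
import math


def nb_logpmf(k, size, mean):
    """Log-pmf of a Negative Binomial with dispersion `size` and mean `mean`."""
    p = size / (size + mean)  # success probability implied by size and mean
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p))


def nbnb_pmf(k, weight, size_bg, mean_bg, size_sig, mean_sig):
    """pmf of an NB-NB mixture: weight * background + (1 - weight) * signal."""
    background = math.exp(nb_logpmf(k, size_bg, mean_bg))
    signal = math.exp(nb_logpmf(k, size_sig, mean_sig))
    return weight * background + (1.0 - weight) * signal


# Hypothetical parameters: 90% low-count background, 10% enriched signal.
params = dict(weight=0.9, size_bg=2.0, mean_bg=1.5, size_sig=3.0, mean_sig=40.0)
prob_zero = nbnb_pmf(0, **params)  # probability of a window with no reads
mix_mean = sum(k * nbnb_pmf(k, **params) for k in range(3000))
```

Because both components are Negative Binomial, a single mixture of this form places mass on zero and very low counts as well as on large counts, which is the behaviour the abstract contrasts with zero-inflated models.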

1 citation