Posted Content

Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven by 5mC

01 Nov 2017-bioRxiv (Cold Spring Harbor Laboratory)-pp 180588

TL;DR: Describes a novel cancer predisposition syndrome in which germline biallelic inactivation of MBD4 leads to acute myeloid leukaemia (AML), and highlights a critical interaction with somatic mutations in DNMT3A that accelerates leukaemogenesis and accounts for the conserved path to AML.

Abstract: Cytosine methylation is essential for normal mammalian development, yet also provides a major mutagenic stimulus. Methylcytosine (5mC) is prone to spontaneous deamination, which introduces cytosine to thymine transition mutations (C>T) upon replication. Cells endure hundreds of 5mC deamination events each day and an intricate repair network is engaged to restrict this damage. Central to this network are the DNA glycosylases MBD4 and TDG, which recognise T:G mispairing and initiate base excision repair (BER). Here we describe a novel cancer predisposition syndrome resulting from germline biallelic inactivation of MBD4 that leads to the development of acute myeloid leukaemia (AML). These leukaemias have an extremely high burden of C>T mutations, specifically in the context of methylated CG dinucleotides (CG>TG). This dependence on 5mC as a source of mutations may explain the remarkable observation that MBD4-deficient AMLs share a common set of driver mutations, including biallelic mutations in DNMT3A and hotspot mutations in IDH1/IDH2. By assessing serial samples taken over the course of treatment, we highlight a critical interaction with somatic mutations in DNMT3A that accelerates leukaemogenesis and accounts for the conserved path to AML. MBD4-deficiency was also detected, rarely, in sporadic cancers, which display the same mutational signature. Collectively these cancers provide a model of 5mC-dependent hypermutation and reveal factors that shape its mutagenic influence.

Topics: Somatic hypermutation (56%), Base excision repair (55%), MBD4 (53%), DNA glycosylase (52%), Germline (51%)

Summary

Affiliation

  • MBD4-deficiency was also detected, rarely, in sporadic cancers, which display the same mutational signature.
  • Both cases exhibited an elevated mutation rate and strong enrichment for CG>TG mutations (Fig. 1d, Extended Data Fig. 1a).
  • This shift in functional activity (the expansion of DNMT3A-mutant clones) increases the likelihood that cells with biallelic DNMT3A mutations will emerge, which appears to be key for initiating AML in MBD4-deficient patients.
  • The authors confirmed that recombinant DNMT3A enhances TDG glycosylase activity in vitro (Fig. 4a), but had no impact on MBD4 glycosylase activity (Extended Data Fig. 7).

Contributions

  • All authors discussed the results and agree with the conclusions presented.
  • C, Relative mutation rate in different genomic features per Mb of CG dinucleotides (CG corrected), or corrected for methylation status in CD34+ cells (5mC corrected).
  • Each coloured area is proportional to the representation of the clone and vertical lines indicate sampling points [31].
  • B, Schematic representation of the repair pathways governing T:G mismatch repair and the combined influence of germline mutations in MBD4 and somatic mutations in DNMT3A (at top) in AML.

Extended Data References – pg. 20-21

  • Supplementary Information Somatic mutations detected in MBD4-deficient AML at diagnosis (hg19).
  • A quality score is provided; variants with a score >0.5 were used for mutation signature analysis.

AML cases

  • Sanger sequencing traces were generated from cloned PCR products after amplification from DNA (top).
  • B, A schematic of the MBD4 gene is shown at top together with the position of two candidate loss-of-function variants that impact splice sites.
  • Sites with mutations were typically fully methylated in the control sample.
  • Individual values are plotted (n=2) and the bar shows the mean.
  • The relative mutation rate was calculated per bin based on CG or 5mCG abundance (as in a).

Clinical synopsis

  • The AML was negative for NPM1, FLT3 and CEBPA mutations.
  • She had induction chemotherapy (high dose cytarabine, idarubicin and etoposide) and achieved complete morphologic and cytogenetic remission.
  • Bone marrow examination 5 weeks post allogeneic HSCT showed complete morphologic and cytogenetic remission; and full donor chimerism.
  • Relapsed AML (of WEHI-AML-1 origin) occurred 11 weeks post allogeneic HSCT.

Methods

  • Patient characteristics and sample collection EMC-AML-1, WEHI-AML-1 and WEHI-AML-2 were diagnosed with AML and treated with combination chemotherapy as per the protocols at their respective institutions [see Clinical Synopsis].
  • They gave informed consent according to the Declaration of Helsinki for participation in research and for collection of samples over the course of their treatment.
  • DNA libraries were quantified and used for both whole genome sequencing and whole exome sequencing.
  • Reduced representation bisulfite sequencing (RRBS) For WEHI-AML-1 and WEHI-AML-2, between 75 to 100 ng of DNA was used to construct RRBS libraries using the Ovation RRBS Methyl-Seq System (NuGEN, San Carlos, CA, USA).
  • DNA was digested with the restriction enzyme MspI, followed by ligation with indexed adaptors.

RNA sequencing

  • For WEHI-AML-1 and WEHI-AML-2, total RNA was extracted using TRIzol (Thermo Fisher Scientific, Waltham, MA, USA) as per manufacturer’s instructions.
  • As the mutations occurred almost exclusively in a CG context, the rate of CG>TG mutations per CG was calculated for each genomic feature.
  • Transcriptional strand and expression level: Transcriptional strand bias analysis was performed by determining the template and non-template strands per gene as reported in Ensembl v75 [13].
  • Libraries were generated as per manufacturer’s instructions and the sequencing was performed on a MiSeq.
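
The per-feature rate calculation mentioned above (CG>TG mutations per CG, optionally corrected for methylation) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the feature labels and all counts are hypothetical, and expressing each rate relative to the pooled rate across features is an assumption made here for readability.

```python
def relative_rate_per_feature(mutations, site_counts):
    """Rate of CG>TG mutations per site for each feature, relative to the pooled rate.

    mutations:   feature -> number of CG>TG mutations observed (hypothetical input)
    site_counts: feature -> number of CG sites, or methylated CG sites, in that feature
    """
    total_mutations = sum(mutations.values())
    total_sites = sum(site_counts[f] for f in mutations)
    pooled_rate = total_mutations / total_sites
    return {f: (mutations[f] / site_counts[f]) / pooled_rate for f in mutations}


# Made-up counts: promoters carry few mutations despite having many CG sites.
mutations = {"promoter": 10, "gene_body": 90}
cg_sites = {"promoter": 500, "gene_body": 500}    # all CG dinucleotides
mcg_sites = {"promoter": 50, "gene_body": 450}    # methylated CG only

cg_corrected = relative_rate_per_feature(mutations, cg_sites)
mc_corrected = relative_rate_per_feature(mutations, mcg_sites)
print(cg_corrected)
print(mc_corrected)
```

With these invented counts the CG-corrected rates differ between features, while the 5mC-corrected rates are equal, which is the qualitative pattern the text reports for sparsely methylated regions.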

Site-directed mutagenesis and cloning

  • And anti-sense 5’- TTGTATTTCCAGGGCGGCACGACTGGGCTGGAGAGTCT-3’. QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, CA, USA) was used to generate the DNMT3A and MBD4 mutants.
  • Proteins were verified by SDS-PAGE using a NuPage Novex 4-12% Bis-Tris Protein Gel run in a Bis-Tris XCell SureLock™ Mini-Cell system (Thermo Fisher Scientific, Waltham, MA, USA) with 1x MOPS at 200V for 90 minutes.
  • MBD4 and TDG glycosylase activity assays MBD4 and TDG glycosylase activity assays were performed as described (Hashimoto et al., NAR, 2012).

Data availability

  • Sequencing data from WEHI-AML-1 and WEHI-AML-2 have been deposited at the European Genome Phenome Archive (EGA) [EGAS00001002581].
  • The data are available for ethically approved research into haematological malignancy upon completion of a data transfer agreement.
  • Sequencing data from EMC-AML-1 were sourced from the dbGaP under accession phs001027.
  • TCGA data were downloaded from the GDC Data Commons.
  • Code to reproduce the figures and data are made available through GitHub (https://github.com/MathijsSanders/AML-RoaMeR).

Extended Data References

  • Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia.
  • The UCSC Genome Browser database: 2017 update.

Title
Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven by 5mC
Authors
Mathijs A. Sanders (1,8), Edward Chew (2,3,4,5,8), Christoffer Flensburg (2,4), Annelieke Zeilemaker (1), Sarah E. Miller (2), Adil S. al Hinai (1,6), Ashish Bajel (3,5), Bram Luiken (1), Melissa Rijken (1), Tamara Mclennan (7), Remco M. Hoogenboezem (1), François G. Kavelaars (1), Marnie E. Blewitt (4,7), Eric M. Bindels (1), Warren S. Alexander (2,4), Bob Löwenberg (1), Andrew W. Roberts (2,3,4,5), Peter J.M. Valk (1,9)*, Ian J. Majewski (2,4,9)*
Affiliation
1. Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
2. Division of Cancer and Haematology, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
3. Department of Clinical Haematology and Bone Marrow Transplantation, Royal Melbourne Hospital, Parkville, Australia
4. Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Australia
5. Victorian Comprehensive Cancer Centre, Parkville, Australia
6. National Genetic Center, Royal Hospital, Ministry of Health, Sultanate of Oman
7. Division of Molecular Medicine, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
8. These authors contributed equally to this work
9. These authors jointly directed this work
* Correspondence
Peter J.M. Valk
Department of Hematology
Erasmus University Medical Center
Em: p.valk@erasmusmc.nl
Ian J. Majewski
Cancer and Haematology Division
The Walter and Eliza Hall Institute of Medical Research
Em: majewski@wehi.edu.au
bioRxiv preprint doi: https://doi.org/10.1101/180588; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Cytosine methylation is essential for normal mammalian development, yet also
provides a major mutagenic stimulus. Methylcytosine (5mC) is prone to spontaneous
deamination, which introduces cytosine to thymine transition mutations (C>T) upon
replication [1]. Cells endure hundreds of 5mC deamination events each day and an
intricate repair network is engaged to restrict this damage. Central to this network
are the DNA glycosylases MBD4 [2] and TDG [3,4], which recognise T:G mispairing and
initiate base excision repair (BER). Here we describe a novel cancer predisposition
syndrome resulting from germline biallelic inactivation of MBD4 that leads to the
development of acute myeloid leukaemia (AML). These leukaemias have an
extremely high burden of C>T mutations, specifically in the context of methylated CG
dinucleotides (CG>TG). This dependence on 5mC as a source of mutations may
explain the remarkable observation that MBD4-deficient AMLs share a common set
of driver mutations, including biallelic mutations in DNMT3A and hotspot mutations in
IDH1/IDH2. By assessing serial samples taken over the course of treatment, we
highlight a critical interaction with somatic mutations in DNMT3A that accelerates
leukaemogenesis and accounts for the conserved path to AML. MBD4-deficiency
was also detected, rarely, in sporadic cancers, which display the same mutational
signature. Collectively these cancers provide a model of 5mC-dependent
hypermutation and reveal factors that shape its mutagenic influence.
We identified three patients with AML, including two siblings, that were distinctive
because of their early age of onset (all <35 years old) and an extremely high
mutational burden (~33-fold above what is typical for AML) (Fig. 1a, Clinical
Synopsis). Virtually all of the somatic mutations identified were C>T in the context of
a CG dinucleotide (>95% of SNVs) (Fig. 1b, Extended Data Fig. 1). This differs
markedly from the distribution of C>T mutations in AML generally and is more
refined than the mutational signature ascribed to ageing, which includes a strong
contribution from 5mC deamination [5]. All three cases carried rare germline
loss-of-function variants in the gene encoding the DNA glycosylase MBD4 [2] (Fig. 1c,
Extended Data Table 1). Case EMC-AML-1 carried a homozygous MBD4 in-frame
deletion of Histidine 567 (His567) in the glycosylase domain. An in vitro glycosylase
assay confirmed that loss of His567 results in a catalytically inactive MBD4 protein
(Fig. 1c). The siblings (WEHI-AML-1, WEHI-AML-2) were compound heterozygotes
with a frameshift in exon 3 and a variant that disrupts the splice acceptor of exon 7
(Fig. 1c, Extended Data Table 1). Analysis of the MBD4 mRNA allowed for phasing
of the variants to distinct alleles and confirmed aberrant splicing that excludes exon 7
and disrupts the glycosylase domain (Extended Data Fig. 2). MBD4 has not
previously been associated with haematological malignancy, but somatic mutations
have been detected in sporadic colon cancers with mismatch repair (MMR)
deficiency [6,7]. Two patients (EMC-AML-1, WEHI-AML-2) also had colorectal polyps, a
common manifestation of DNA repair defects, including those associated with loss of
BER components MUTYH [8-10] and NTHL1 [11].
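
The observation that more than 95% of SNVs were C>T in a CG context can be checked with a simple classifier. The sketch below is illustrative only: the function names and the four example variants are hypothetical, and the input format (a forward-strand reference trinucleotide centred on the mutated base, plus the alternate base) is an assumption rather than the format used by the authors. Purine reference bases are flipped to the opposite strand so that a G>A change at a CG site is counted as CG>TG.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def to_pyrimidine_strand(trinucleotide, alt):
    """Express an SNV on the strand where the reference base is C or T."""
    if trinucleotide[1] in "AG":
        return reverse_complement(trinucleotide), COMPLEMENT[alt]
    return trinucleotide, alt


def is_cg_to_tg(trinucleotide, alt):
    """True if the SNV is a C>T change whose 3' neighbour is G (CG>TG)."""
    context, alt = to_pyrimidine_strand(trinucleotide, alt)
    return context[1] == "C" and alt == "T" and context[2] == "G"


def cg_to_tg_fraction(snvs):
    """Fraction of SNVs, given as (trinucleotide, alt) pairs, that are CG>TG."""
    snvs = list(snvs)
    return sum(is_cg_to_tg(tri, alt) for tri, alt in snvs) / len(snvs)


# Hypothetical variants: two CG>TG (one per strand), one non-CG C>T, one C>A.
example = [("ACG", "T"), ("CGT", "A"), ("ACA", "T"), ("TCG", "A")]
print(cg_to_tg_fraction(example))
```

Applied to a real call set, a fraction above 0.95 would correspond to the signature described in the text.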
We accessed large cancer databases to explore the link between MBD4-deficiency
and the distinctive CG>TG signature. Analysis of the Cancer Genome Atlas (TCGA)
identified nine cases, from 10,683 total, that carried germline loss-of-function
variants in MBD4 (Fig. 1c, Extended Data Table 1). In two cases, a uveal
melanoma (TCGA-UVM-1) and a glioblastoma multiforme (TCGA-GBM-1), splice
site mutations were accompanied by loss of the wildtype MBD4 allele due to somatic
copy number alterations (Extended Data Fig. 3a). Analysis of RNA sequencing from
both tumours confirmed aberrant splicing of MBD4, predicted to result in protein
truncation and loss of function (Extended Data Fig. 3b). Both cases exhibited an
elevated mutation rate and strong enrichment for CG>TG mutations (Fig. 1d,
Extended Data Fig. 1a). This signature was also observed in a glioma cell line,
SW1783, that carries a homozygous truncating variant in MBD4 at Leu563
(Extended Data Fig. 1a). The cancers that retained a wildtype allele did not display
a prominent CG>TG signature (Fig. 1d). These results suggest both alleles of MBD4
must be inactivated to block its repair activity, which is consistent with other
BER-associated cancer syndromes [8,11]. Analysis of a larger cohort will be required to
determine whether heterozygous loss of MBD4 predisposes to cancer.
Whole genome sequencing and methylation profiling were performed to refine the
mutational signature associated with MBD4-deficiency in AML. While MBD4 is
known to interact with the MMR pathway [12], MBD4-deficient leukaemias were
largely devoid of small insertions and deletions, suggesting MMR remains intact.
Overall, >15,000 substitution mutations were identified in each AML genome, of
which >90% were CG>TG (Fig. 2a, Extended Data Fig. 1b). The proportion of
mutations was higher in the context of the ACG triplet and lower in the context of
TCG, with CCG and GCG being intermediate. This difference remained after
correction for trimer abundance and methylation status (Fig. 2b), and was found to
be significant in the exome data from the five MBD4-deficient cancers (p= 0.007937,
Mann-Whitney U test) (Extended Data Fig. 1). The ACA trimer was the most
commonly mutated site outside of a CG context, and this matches the most common
site of non-CG methylation [13]. The mutation rate for a given region was linked to 5mC
abundance. Sparsely methylated regions, such as promoters and CG islands, were
rarely mutated (Fig. 2c). Correcting for 5mC abundance revealed a consistent
mutation rate across different genomic features (Fig. 2c). Reduced representation
bisulfite sequencing (RRBS) confirmed that >95% of CG sites mutated in the AML
were fully methylated in matched normal bone marrow available for two cases (Fig.
2d). Assessment of the mutated sites in each AML directly revealed ~50%
methylation, indicating the non-mutated CG site on the alternate allele was
methylated (Fig. 2d). Similar results were obtained when we assessed sites mutated
in the MBD4-deficient glioblastoma (Extended Data Fig. 4). We next assessed the
influence of genetic and epigenetic features known to influence mutation rate [14].
Extending the analysis of sequence context to include one base either side of the
CG identified higher mutation rates in the context of a 3’ cytosine (NCGC), with the
highest rate at ACGC (Fig. 2e). The relative mutation rate was not influenced by the
transcriptional strand (Extended Data Fig. 5a), but was higher in late replicating
regions (Fig. 2f) and at lowly expressed genes (Extended Data Fig. 5b).
Collectively these results suggest that while 5mC is the dominant factor contributing
to the mutation rate, the local sequence context, replication timing and expression
status also contribute. The differences between tetramers and enrichment in late
replicating regions were also evident in rare germline CG>TG SNPs from the
gnomAD database [15], indicating this phenomenon is not restricted to cancer
(Extended Data Fig. 5c).
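
The reported p = 0.007937 for the trimer comparison matches 2/252, the smallest two-sided exact Mann-Whitney p-value attainable with two groups of five untied values. The sketch below reproduces that figure by enumeration. It assumes, without confirmation from the text, that the test compared two sets of five values (one per MBD4-deficient cancer) that were completely separated; the numbers used are placeholders, not data from the study.

```python
from itertools import combinations
from math import comb


def exact_mann_whitney_two_sided(x, y):
    """Exact two-sided Mann-Whitney p-value by full enumeration (assumes no ties)."""
    pooled = sorted(x + y)
    n, m = len(x), len(y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}

    def u_statistic(group):
        # U = rank sum of the group minus its minimum possible rank sum
        return sum(rank[v] for v in group) - n * (n + 1) / 2

    expected_u = n * m / 2
    observed_dev = abs(u_statistic(x) - expected_u)
    # Count group assignments at least as extreme as the observed one
    extreme = sum(
        abs(u_statistic(group) - expected_u) >= observed_dev
        for group in combinations(pooled, n)
    )
    return extreme / comb(n + m, n)


# Placeholder values: every value in one group exceeds every value in the other.
higher_group = [6.0, 7.0, 8.0, 9.0, 10.0]
lower_group = [1.0, 2.0, 3.0, 4.0, 5.0]
p_value = exact_mann_whitney_two_sided(higher_group, lower_group)
print(round(p_value, 6))
```

Under this assumption the p-value cannot go lower with five samples per group, so it indicates complete separation rather than the size of the difference.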
The three cases with germline MBD4-deficiency shared a common path to AML.
They acquired biallelic DNMT3A mutations and IDH1/IDH2 hotspot mutations, all of
which were CG>TG (Fig. 3). Biallelic DNMT3A mutations are uncommon in AML,
affecting ~3% of patients in TCGA-AML, and when considering they also have
coincident IDH1/IDH2 mutations, it is highly unlikely that these three individuals
share this pattern of driver mutations by chance. These mutations impact 5mC at
multiple levels, namely deposition (DNMT3A), removal (IDH1/IDH2) and repair (MBD4),
and this convergence suggests that modulating DNA methylation is central to AML
pathogenesis in MBD4-deficient cases. Analysis of sequential bone marrow biopsies
taken during treatment and single cell genotyping allowed us to refine the order of
somatic mutation acquisition in two cases (EMC-AML-1, WEHI-AML-1) (Fig. 3a-b,
Extended Data Fig. 6). DNMT3A mutations present in the AML at diagnosis were
also detected in non-malignant bone marrow populations in both cases, indicating
that these mutations are among the first acquired. Mutations in DNMT3A enhance
the self-renewal capacity of haematopoietic stem cells (HSCs) and are associated
with age-related clonal haematopoiesis [16-19]. In both patients, marked expansion of
clones carrying DNMT3A mutations occurred with treatment (Fig. 3a-b), suggesting
a strong advantage over normal HSCs. EMC-AML-1 experienced multiple clonal
outgrowths, with nine distinct DNMT3A mutations, and repeated selection of clones
with biallelic mutations. This shift in functional activity (the expansion of
DNMT3A-mutant clones) increases the likelihood that cells with biallelic DNMT3A mutations
will emerge, which appears to be key for initiating AML in MBD4-deficient patients.
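
The argument that three cases are unlikely to share biallelic DNMT3A mutations by chance can be made concrete with a rough calculation. This is a back-of-envelope sketch, not an analysis from the paper: it takes the ~3% TCGA-AML frequency quoted above and assumes the three cases are independent draws from that background, which is a simplification (two of the patients are siblings). Requiring coincident IDH1/IDH2 mutations would lower the figure further.

```python
def chance_all_cases_share(frequency, n_cases):
    """Probability that n independent cases all carry a feature with the given frequency."""
    return frequency ** n_cases


biallelic_dnmt3a_frequency = 0.03  # ~3% of TCGA-AML patients, as quoted in the text
p_three_cases = chance_all_cases_share(biallelic_dnmt3a_frequency, 3)
print(f"{p_three_cases:.1e}")  # prints 2.7e-05
```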
There is a marked discrepancy between the substantial mutation burden in MBD4-
deficient AMLs and the modest 2-3 fold increase in mutation rate in MBD4-deficient
mice [20,21]. It is unclear whether this difference is a reflection of longer disease latency
in humans, as compared to mice, or whether somatic mutations in the AML further
compromise DNA repair. Mutations in DNMT3A and IDH1/IDH2 have been
associated with altered DNA repair in model systems [22,23]. It also remains unclear
why TDG, a glycosylase with overlapping substrate specificity, does not compensate
for MBD4 loss. One possible explanation stems from the observation that
DNMT3A/B can directly stimulate TDG glycosylase activity [24,25]. We confirmed that
recombinant DNMT3A enhances TDG glycosylase activity in vitro (Fig. 4a), but had
no impact on MBD4 glycosylase activity (Extended Data Fig. 7). Mutant forms of
DNMT3A showed weaker stimulation, and even inhibit TDG at higher concentrations
(Fig. 4a). We propose a model for AML pathogenesis whereby inhibition of DNMT3A
contributes in two ways: loss of one allele enables expansion of a premalignant
clone, then acquisition of a second DNMT3A mutation increases the CG>TG
mutation rate due to impaired TDG activity (Fig. 4b). Supporting this model, the
premalignant clone identified in WEHI-AML-1, which had a monoallelic DNMT3A
mutation, did not carry additional mutations that would suggest an elevated mutation
rate. The sporadic cancers that became MBD4-deficient (TCGA-UVM-1 and TCGA-
GBM-1) did not acquire mutations in DNMT3A or IDH1/IDH2, which may indicate
that this interaction is specific to the haematopoietic compartment.
The last five years have seen a concerted effort to define mutational processes that
shape the cancer genome [5]. Deamination of 5mC is the most common source of
somatic mutations and this damage continues to accumulate with age [26]. Our results
highlight the important role for MBD4 in safeguarding against damage wrought by
5mC deamination. One manifestation of this damage is clonal haematopoiesis, a
phenomenon typically observed in people >70 years of age. Individuals with biallelic
loss of MBD4 in the germline sustain high levels of damage from 5mC deamination
and experience clonal expansions decades earlier, which eventually progress to
AML. There are more than 40 million 5mC residues in the genome, yet these
individuals develop the same type of cancer, AML, with a common set of driver
mutations. Our results indicate this convergence results from the combination of a
highly restricted mutational signature, which accesses a select set of driver genes,
and the dual role of DNMT3A, which regulates HSC function and directly contributes
to DNA repair. This interaction between mutational process, driver landscape and
stem cell biology has broad implications, and may explain the tissue restricted
pattern of disease in this and other cancer predisposition syndromes.
Acknowledgements
The authors would like to thank Simon He, Anita Rijneveld, Kirsten van Lom and
Kirsten Gussinklo for providing clinical information and reviewing samples; Meaghan
Wall for assistance with cytogenetics; Naomi Sprigg for assistance with sample
collection; Elwin Rombouts for assistance with single cell sorting; Hideharu
Hashimoto and Xiaodong Cheng for the TDG expression vector; Sari van Rossum
and Joyce Lebbink for assistance with recombinant protein isolation; the
Australasian Leukaemia and Lymphoma Group for access to clinical samples; and
Stephen Wilcox for technical assistance with sequencing. Additional sequencing was
performed at The Australian Genome Research Facility (Melbourne, Australia) and
the Kinghorn Centre for Clinical Genomics (Sydney, Australia). Sean Grimmond,
Jason Wong, Oliver Sieber, Alicia Oshlack and Stephen Nutt provided valuable
feedback on the manuscript.
This work was made possible through support from the Australian National Health
and Medical Research Council (NHMRC) (Program Grant 1113577, to W.S.A and
A.W.R), an Independent Research Institutes Infrastructure Support Scheme Grant
(9000220), a Victorian State Government Operational Infrastructure Support Grant,
the Netherlands Organisation for Scientific Research (NWO) and the Center for
Translational Molecular Medicine (CTMM). M.A.S is supported by a grant from
CTMM (GR03O-102) and a Rubicon fellowship from NWO (019.153LW.038), E.C. is
a recipient of a PhD scholarship from the Leukaemia Foundation of Australia, A.H. is
a recipient of a PhD scholarship from the Ministry of Health - Sultanate of Oman,
M.E.B is supported by the Bellberry-Viertel fellowship, W.S.A and A.W.R are
supported by fellowships from NHMRC (1058344 and 1079560, respectively), and
I.J.M. is supported by the Victorian Cancer Agency. We wish to acknowledge the

Citations

Journal Article
TL;DR: This review considers both coding and non-coding driver mutations, and discusses how such mutations might be identified from cancer sequencing datasets, and some of the tools and database that are available for the annotation of somatic variants and the identification of cancer driver genes.
Abstract: In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which are the driver mutations in an individual cancer, as these contribute only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and database that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.

15 citations


Cites background from "Germline loss of MBD4 predisposes t..."

  • ...numbers of C > T mutations (associated with signature 1, following the deamination of methylated cytosines), researchers uncovered a germline mutation in the DNA glycosylase MBD4 that may predispose cells to subsequently developing certain driver mutations that accelerate oncogenesis (Sanders et al. 2017)....



Journal Article
TL;DR: SuperFreq is a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both and can be applied in many different experimental settings for the analysis of exomes and other capture libraries.
Abstract: Analysing multiple cancer samples from an individual patient can provide insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression. Existing approaches for clonal tracking from sequencing data typically require the user to combine multiple tools that are not purpose-built for this task. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. We developed SuperFreq, a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both. SuperFreq does not require a matched normal and instead relies on unrelated controls. When analysing multiple samples from a single patient, SuperFreq cross checks variant calls to improve clonal tracking, which helps to separate somatic from germline variants, and to resolve overlapping CNA calls. To demonstrate our software we analysed 304 cancer-normal exome samples across 33 cancer types in The Cancer Genome Atlas (TCGA) and evaluated the quality of the SNV and CNA calls. We simulated clonal evolution through in silico mixing of cancer and normal samples in known proportion. We found that SuperFreq identified 93% of clones with a cellular fraction of at least 50% and mutations were assigned to the correct clone with high recall and precision. In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis when run without a matched normal. SuperFreq is highly versatile and can be applied in many different experimental settings for the analysis of exomes and other capture libraries. We demonstrate an application of SuperFreq to leukaemia patients with diagnosis and relapse samples.

13 citations


Posted Content
30 Jul 2018-bioRxiv
TL;DR: SuperFreq is a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both and can be applied in many different experimental settings for the analysis of exomes and other capture libraries.
Abstract: Motivation Analysing multiple tumour samples from an individual cancer patient allows insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression; therefore, the ability to identify and track clones using genomics data is of great interest. Existing approaches for clonal tracking typically require the user to combine multiple tools that are not purpose-made. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. Results We have built superFreq, a cancer exome sequencing analysis tool that calls and annotates somatic SNVs and CNAs and attributes them to clones. SuperFreq makes use of unrelated control samples and does not require matched normal samples. We demonstrate the ability of superFreq to track clones by combining real samples in known proportions to simulating a multi-sample analysis. In addition, we compared superFreq to other somatic SNV callers and CNA callers on exome sequencing data from cancer-normal pairs, including 304 participants gathered from 33 cancer types in The Cancer Genome Atlas (TCGA). SuperFreq offers a reliable platform to identify somatic mutations and to track clones. SuperFreq recalled 91% of somatic SNVs identified by a consensus of four other methods, with a median of 1 additional somatic SNV per sample that was not found by any other method. CNA calls from superFreq showed good agreement with those generated by Sequenza, or those from ASCAT generated using matched SNP arrays. Using our simulated data set for testing multi-sample clonal tracking, we found that superFreq identified 93% of clones with a cellular fraction of at least 50%, and mutations were assigned to clones with high recall and close to 100% precision. 
In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis without a matched normal control. SuperFreq is a highly adaptable method and has already been used in multiple different projects. Availability SuperFreq is implemented in R and available on github at https://github.com/ChristofferFlensburg/superFreq.

7 citations


Cites background from "Germline loss of MBD4 predisposes t..."

  • ...SuperFreq was designed to detect and track somatic mutations in exomes, and it has been applied to study breast cancer metastasis [2, 21], lung cancer xenografts [22], gastric cancer organoids [23], and myeloid leukaemia [24]....




Posted ContentDOI
16 Jan 2018-bioRxiv
TL;DR: Similar molecular processes shaping population-scale human genome variation also underlie the rapid evolution of an infant ultra-mutated leukemia, which is one of the earliest manifestations of cancer hypermutation recorded.
Abstract: Background: Mixed lineage leukemia/Histone-lysine N-methyltransferase 2A gene rearrangements occur in 80% of infant acute lymphoblastic leukemia, but the role of cooperating events is unknown. While infant leukemias typically carry few somatic lesions, we identified a case with over 100 somatic point mutations per megabase and here report unique genomic features of this case. Results: The patient presented at 82 days of age, one of the earliest manifestations of cancer hypermutation recorded. The transcriptional profile showed global similarities to canonical cases. Coding lesions were predominantly clonal and almost entirely targeting alleles reported in human genetic variation databases, with a notable exception in the mismatch repair gene, MSH2. There were no rare germline alleles or somatic mutations affecting proof-reading polymerase genes POLE or POLD1; however, there was a predicted damaging mutation in the error-prone replicative polymerase, POLK. The patient's diagnostic leukemia transcriptome was depleted of rare and low-frequency germline alleles due to loss-of-heterozygosity, while somatic point mutations targeted low-frequency and common human alleles in proportions that offset this discrepancy. Somatic signatures of ultra-mutations were highly correlated with germline single nucleotide polymorphic sites, indicating a common role for 5-methylcytosine deamination, DNA mismatch repair and DNA adducts. Conclusions: These data suggest similar molecular processes shaping population-scale human genome variation also underlie the rapid evolution of an infant ultra-mutated leukemia.

References

Journal ArticleDOI
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

35,234 citations


Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

11,598 citations


Journal ArticleDOI
TL;DR: The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.
Abstract: Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.

10,631 citations


Journal ArticleDOI
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

8,495 citations


Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski, Eric Vallabh Minikel, Kaitlin E. Samocha, Eric Banks, Timothy Fennell, Anne H. O’Donnell-Luria, James S. Ware, Andrew J. Hill, Beryl B. Cummings, Taru Tukiainen, Daniel P. Birnbaum, Jack A. Kosmicki, Laramie E. Duncan, Karol Estrada, Fengmei Zhao, James Zou, Emma Pierce-Hoffman, Joanne Berghout, David Neil Cooper, Nicole A. Deflaux, Mark A. DePristo, Ron Do, Jason Flannick, Menachem Fromer, Laura D. Gauthier, Jackie Goldstein, Namrata Gupta, Daniel P. Howrigan, Adam Kiezun, Mitja I. Kurki, Ami Levy Moonshine, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso, Ryan Poplin, Manuel A. Rivas, Valentin Ruano-Rubio, Samuel A. Rose, Douglas M. Ruderfer, Khalid Shakir, Peter D. Stenson, Christine Stevens, Brett Thomas, Grace Tiao, María Teresa Tusié-Luna, Ben Weisburd, Hong-Hee Won, Dongmei Yu, David Altshuler, Diego Ardissino, Michael Boehnke, John Danesh, Stacey Donnelly, Roberto Elosua, Jose C. Florez, Stacey Gabriel, Gad Getz, Stephen J. Glatt, Christina M. Hultman, Sekar Kathiresan, Markku Laakso, Steven A. McCarroll, Mark I. McCarthy, Dermot P.B. McGovern, Ruth McPherson, Benjamin M. Neale, Aarno Palotie, Shaun Purcell, Danish Saleheen, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan, Jaakko Tuomilehto, Ming T. Tsuang, Hugh Watkins, James G. Wilson, Mark J. Daly, Daniel G. MacArthur
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

7,679 citations


Frequently Asked Questions (1)
Q1. What have the authors contributed in "Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven by 5mC"?

These authors contributed equally to this work. These authors jointly directed this work.