
Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven by 5mC

TL;DR: A novel cancer predisposition syndrome resulting from germline biallelic inactivation of MBD4 that leads to the development of acute myeloid leukaemia (AML), and a critical interaction with somatic mutations in DNMT3A that accelerates leukaemogenesis and accounts for the conserved path to AML is highlighted.
Abstract: Cytosine methylation is essential for normal mammalian development, yet also provides a major mutagenic stimulus. Methylcytosine (5mC) is prone to spontaneous deamination, which introduces cytosine to thymine transition mutations (C>T) upon replication. Cells endure hundreds of 5mC deamination events each day and an intricate repair network is engaged to restrict this damage. Central to this network are the DNA glycosylases MBD4 and TDG, which recognise T:G mispairing and initiate base excision repair (BER). Here we describe a novel cancer predisposition syndrome resulting from germline biallelic inactivation of MBD4 that leads to the development of acute myeloid leukaemia (AML). These leukaemias have an extremely high burden of C>T mutations, specifically in the context of methylated CG dinucleotides (CG>TG). This dependence on 5mC as a source of mutations may explain the remarkable observation that MBD4-deficient AMLs share a common set of driver mutations, including biallelic mutations in DNMT3A and hotspot mutations in IDH1/IDH2. By assessing serial samples taken over the course of treatment, we highlight a critical interaction with somatic mutations in DNMT3A that accelerates leukaemogenesis and accounts for the conserved path to AML. MBD4-deficiency was also detected, rarely, in sporadic cancers, which display the same mutational signature. Collectively these cancers provide a model of 5mC-dependent hypermutation and reveal factors that shape its mutagenic influence.

Summary

Contributions

  • All authors discussed the results and agree with the conclusions presented.
  • C, Relative mutation rate in different genomic features per Mb of CG dinucleotides (CG corrected), or corrected for methylation status in CD34+ cells (5mC corrected).
  • Each coloured area is proportional to the representation of the clone and vertical lines indicate sampling points [31].
  • B, Schematic representation of the repair pathways governing T:G mismatch repair and the combined influence of germline mutations in MBD4 and somatic mutations in DNMT3A (at top) in AML.

Extended Data References

  • Supplementary Information Somatic mutations detected in MBD4-deficient AML at diagnosis (hg19).
  • A quality score is provided; variants with a score >0.5 were used for mutation signature analysis.

AML cases

  • Sanger sequencing traces were generated from cloned PCR products after amplification from DNA (top).
  • B, A schematic of the MBD4 gene is shown at top together with the position of two candidate loss-of-function variants that impact splice sites.
  • Sites with mutations were typically fully methylated in the control sample.
  • Individual values are plotted (n=2) and the bar shows the mean.
  • The relative mutation rate was calculated per bin based on CG or 5mCG abundance (as in a).

Clinical synopsis

  • The AML was negative for NPM1, FLT3 and CEBPA mutations.
  • She had induction chemotherapy (high dose cytarabine, idarubicin and etoposide) and achieved complete morphologic and cytogenetic remission.
  • Bone marrow examination 5 weeks post allogeneic HSCT showed complete morphologic and cytogenetic remission and full donor chimerism.
  • Relapsed AML (of WEHI-AML-1 origin) occurred 11 weeks post allogeneic HSCT.

Methods

  • Patient characteristics and sample collection: EMC-AML-1, WEHI-AML-1 and WEHI-AML-2 were diagnosed with AML and treated with combination chemotherapy as per the protocols at their respective institutions [see Clinical Synopsis].
  • They gave informed consent according to the Declaration of Helsinki for participation in research and for collection of samples over the course of their treatment.
  • DNA libraries were quantified and used for both whole genome sequencing and whole exome sequencing.
  • Reduced representation bisulfite sequencing (RRBS): For WEHI-AML-1 and WEHI-AML-2, between 75 and 100 ng of DNA was used to construct RRBS libraries using the Ovation RRBS Methyl-Seq System (NuGEN, San Carlos, CA, USA).
  • DNA was digested with the restriction enzyme MspI, followed by ligation with indexed adaptors.

RNA sequencing

  • For WEHI-AML-1 and WEHI-AML-2, total RNA was extracted using TRIzol (Thermo Fisher Scientific, Waltham, MA, USA) as per manufacturer’s instructions.
  • As the mutations occurred almost exclusively in a CG context, the rate of CG>TG mutations per CG was calculated for each genomic feature.
  • Transcriptional strand and expression level: Transcriptional strand bias analysis was performed by determining the template and non-template strands per gene as reported in Ensembl v75 [13].
  • Libraries were generated as per manufacturer’s instructions and the sequencing was performed on a MiSeq.
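
The per-feature calculation mentioned above (CG>TG mutations per CG for each genomic feature, with an optional correction for methylation status) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `relative_rates`, the feature names and all counts are hypothetical toy values chosen only to show the arithmetic.

```python
def relative_rates(mutations, sites):
    """Per-feature mutation rate relative to the overall rate.

    mutations: dict feature -> number of CG>TG mutations observed
    sites:     dict feature -> number of eligible sites (all CG sites, or
               only methylated CG sites for the 5mC-corrected version)
    """
    for feature in mutations:
        if sites[feature] <= 0:
            raise ValueError(f"no eligible sites for {feature}")
    overall = sum(mutations.values()) / sum(sites[f] for f in mutations)
    return {f: (mutations[f] / sites[f]) / overall for f in mutations}


# Hypothetical toy data (not taken from the paper).
toy_counts = {"promoter": 10, "gene_body": 900, "intergenic": 1090}
toy_cg_sites = {"promoter": 1000, "gene_body": 9000, "intergenic": 10000}
# Promoters are sparsely methylated, so few of their CG sites are eligible.
toy_5mcg_sites = {"promoter": 100, "gene_body": 8100, "intergenic": 9800}

cg_corrected = relative_rates(toy_counts, toy_cg_sites)
mc_corrected = relative_rates(toy_counts, toy_5mcg_sites)

for feature in toy_counts:
    print(f"{feature:12s} CG corrected: {cg_corrected[feature]:.2f}  "
          f"5mC corrected: {mc_corrected[feature]:.2f}")
```

In this toy example the promoter looks strongly depleted of mutations when the denominator is all CG sites, but the depletion largely disappears once the denominator is restricted to methylated CG sites, which is the qualitative pattern the text describes.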

Site-directed mutagenesis and cloning

  • And anti-sense 5’- TTGTATTTCCAGGGCGGCACGACTGGGCTGGAGAGTCT-3’. QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, CA, USA) was used to generate the DNMT3A and MBD4 mutants.
  • Proteins were verified by SDS-PAGE using a NuPage Novex 4-12% Bis-Tris Protein Gel run in a Bis-Tris XCell SureLock™ Mini-Cell system (Thermo Fisher Scientific, Waltham, MA, USA) with 1x MOPS at 200V for 90 minutes.
  • MBD4 and TDG glycosylase activity assays: Assays were performed as described (Hashimoto et al., NAR, 2012).

Data availability

  • Sequencing data from WEHI-AML-1 and WEHI-AML-2 have been deposited at the European Genome Phenome Archive (EGA) [EGAS00001002581].
  • The data are available for ethically approved research into haematological malignancy upon completion of a data transfer agreement.
  • Sequencing data from EMC-AML-1 were sourced from dbGaP under accession phs001027.
  • TCGA data were downloaded from the GDC Data Commons.
  • Code to reproduce the figures and data is available through GitHub (https://github.com/MathijsSanders/AML-RoaMeR).

Title
Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven by 5mC
Authors
Mathijs A. Sanders (1,8), Edward Chew (2,3,4,5,8), Christoffer Flensburg (2,4), Annelieke Zeilemaker (1), Sarah E. Miller (2), Adil S. al Hinai (1,6), Ashish Bajel (3,5), Bram Luiken (1), Melissa Rijken (1), Tamara Mclennan (7), Remco M. Hoogenboezem (1), François G. Kavelaars (1), Marnie E. Blewitt (4,7), Eric M. Bindels (1), Warren S. Alexander (2,4), Bob Löwenberg (1), Andrew W. Roberts (2,3,4,5), Peter J.M. Valk (1,9)*, Ian J. Majewski (2,4,9)*
Affiliation
1. Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
2. Division of Cancer and Haematology, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
3. Department of Clinical Haematology and Bone Marrow Transplantation, Royal Melbourne Hospital, Parkville, Australia
4. Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Australia
5. Victorian Comprehensive Cancer Centre, Parkville, Australia
6. National Genetic Center, Royal Hospital, Ministry of Health, Sultanate of Oman
7. Division of Molecular Medicine, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
8. These authors contributed equally to this work
9. These authors jointly directed this work
* Correspondence
Peter J.M. Valk
Department of Hematology
Erasmus University Medical Center
Email: p.valk@erasmusmc.nl
Ian J. Majewski
Cancer and Haematology Division
The Walter and Eliza Hall Institute of Medical Research
Email: majewski@wehi.edu.au
bioRxiv preprint doi: https://doi.org/10.1101/180588; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Cytosine methylation is essential for normal mammalian development, yet also provides a major mutagenic stimulus. Methylcytosine (5mC) is prone to spontaneous deamination, which introduces cytosine to thymine transition mutations (C>T) upon replication [1]. Cells endure hundreds of 5mC deamination events each day and an intricate repair network is engaged to restrict this damage. Central to this network are the DNA glycosylases MBD4 [2] and TDG [3,4], which recognise T:G mispairing and initiate base excision repair (BER). Here we describe a novel cancer predisposition syndrome resulting from germline biallelic inactivation of MBD4 that leads to the development of acute myeloid leukaemia (AML). These leukaemias have an extremely high burden of C>T mutations, specifically in the context of methylated CG dinucleotides (CG>TG). This dependence on 5mC as a source of mutations may explain the remarkable observation that MBD4-deficient AMLs share a common set of driver mutations, including biallelic mutations in DNMT3A and hotspot mutations in IDH1/IDH2. By assessing serial samples taken over the course of treatment, we highlight a critical interaction with somatic mutations in DNMT3A that accelerates leukaemogenesis and accounts for the conserved path to AML. MBD4-deficiency was also detected, rarely, in sporadic cancers, which display the same mutational signature. Collectively these cancers provide a model of 5mC-dependent hypermutation and reveal factors that shape its mutagenic influence.
We identified three patients with AML, including two siblings, who were distinctive because of their early age of onset (all <35 years old) and an extremely high mutational burden (~33-fold above what is typical for AML) (Fig. 1a, Clinical Synopsis). Virtually all of the somatic mutations identified were C>T in the context of a CG dinucleotide (>95% of SNVs) (Fig. 1b, Extended Data Fig. 1). This differs markedly from the distribution of C>T mutations in AML generally and is more refined than the mutational signature ascribed to ageing, which includes a strong contribution from 5mC deamination [5]. All three cases carried rare germline loss-of-function variants in the gene encoding the DNA glycosylase MBD4 [2] (Fig. 1c, Extended Data Table 1). Case EMC-AML-1 carried a homozygous MBD4 in-frame deletion of Histidine 567 (His567) in the glycosylase domain. An in vitro glycosylase assay confirmed that loss of His567 results in a catalytically inactive MBD4 protein (Fig. 1c). The siblings (WEHI-AML-1, WEHI-AML-2) were compound heterozygotes with a frameshift in exon 3 and a variant that disrupts the splice acceptor of exon 7 (Fig. 1c, Extended Data Table 1). Analysis of the MBD4 mRNA allowed for phasing of the variants to distinct alleles and confirmed aberrant splicing that excludes exon 7 and disrupts the glycosylase domain (Extended Data Fig. 2). MBD4 has not previously been associated with haematological malignancy, but somatic mutations have been detected in sporadic colon cancers with mismatch repair (MMR) deficiency [6,7]. Two patients (EMC-AML-1, WEHI-AML-2) also had colorectal polyps, a common manifestation of DNA repair defects, including those associated with loss of BER components MUTYH [8-10] and NTHL1 [11].
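
To make the signature concrete, the following sketch shows one way to decide whether a single-nucleotide variant is a C>T change at a CG dinucleotide, counting the reverse-strand equivalent (G>A preceded by C) as the same event. It is an illustration only: the function names and the toy variant list are hypothetical and are not taken from the authors' code.

```python
def is_cg_to_tg(ref, alt, prev_base, next_base):
    """Return True if the SNV is a C>T change at a CG dinucleotide.

    All bases are given on the reference (plus) strand. A G>A change
    that is preceded by C is the same event seen from the other strand.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == "C" and alt == "T":
        return next_base.upper() == "G"
    if ref == "G" and alt == "A":
        return prev_base.upper() == "C"
    return False


def cg_to_tg_fraction(snvs):
    """Fraction of SNVs that are CG>TG; snvs holds (ref, alt, prev, next)."""
    snvs = list(snvs)
    if not snvs:
        return 0.0
    return sum(is_cg_to_tg(*snv) for snv in snvs) / len(snvs)


# Hypothetical toy variants: (ref, alt, 5' neighbour, 3' neighbour).
toy_snvs = [
    ("C", "T", "A", "G"),  # ACG > ATG: CG>TG on the plus strand
    ("G", "A", "C", "T"),  # CGT > CAT: CG>TG seen from the minus strand
    ("C", "T", "A", "A"),  # C>T outside a CG context
    ("A", "G", "C", "G"),  # not a C>T change
]
print(cg_to_tg_fraction(toy_snvs))  # 0.5
```

Applied to a real call set, a fraction above 0.95 would correspond to the ">95% of SNVs" figure quoted in the text.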
We accessed large cancer databases to explore the link between MBD4-deficiency and the distinctive CG>TG signature. Analysis of the Cancer Genome Atlas (TCGA) identified nine cases, from 10,683 total, that carried germline loss-of-function variants in MBD4 (Fig. 1c, Extended Data Table 1). In two cases, a uveal melanoma (TCGA-UVM-1) and a glioblastoma multiforme (TCGA-GBM-1), splice site mutations were accompanied by loss of the wildtype MBD4 allele due to somatic copy number alterations (Extended Data Fig. 3a). Analysis of RNA sequencing from both tumours confirmed aberrant splicing of MBD4, predicted to result in protein truncation and loss of function (Extended Data Fig. 3b). Both cases exhibited an elevated mutation rate and strong enrichment for CG>TG mutations (Fig. 1d, Extended Data Fig. 1a). This signature was also observed in a glioma cell line, SW1783, that carries a homozygous truncating variant in MBD4 at Leu563 (Extended Data Fig. 1a). The cancers that retained a wildtype allele did not display a prominent CG>TG signature (Fig. 1d). These results suggest both alleles of MBD4 must be inactivated to block its repair activity, which is consistent with other BER-associated cancer syndromes [8,11]. Analysis of a larger cohort will be required to determine whether heterozygous loss of MBD4 predisposes to cancer.
Whole genome sequencing and methylation profiling were performed to refine the mutational signature associated with MBD4-deficiency in AML. While MBD4 is known to interact with the MMR pathway [12], MBD4-deficient leukaemias were largely devoid of small insertions and deletions, suggesting MMR remains intact. Overall, >15,000 substitution mutations were identified in each AML genome, of which >90% were CG>TG (Fig. 2a, Extended Data Fig. 1b). The proportion of mutations was higher in the context of the ACG triplet and lower in the context of TCG, with CCG and GCG being intermediate. This difference remained after correction for trimer abundance and methylation status (Fig. 2b), and was found to be significant in the exome data from the five MBD4-deficient cancers (p = 0.007937, Mann-Whitney U test) (Extended Data Fig. 1). The ACA trimer was the most commonly mutated site outside of a CG context, and this matches the most common site of non-CG methylation [13]. The mutation rate for a given region was linked to 5mC abundance. Sparsely methylated regions, such as promoters and CG islands, were rarely mutated (Fig. 2c). Correcting for 5mC abundance revealed a consistent mutation rate across different genomic features (Fig. 2c). Reduced representation bisulfite sequencing (RRBS) confirmed that >95% of CG sites mutated in the AML were fully methylated in matched normal bone marrow available for two cases (Fig. 2d). Assessment of the mutated sites in each AML directly revealed ~50% methylation, indicating the non-mutated CG site on the alternate allele was methylated (Fig. 2d). Similar results were obtained when we assessed sites mutated in the MBD4-deficient glioblastoma (Extended Data Fig. 4). We next assessed the influence of genetic and epigenetic features known to influence mutation rate [14]. Extending the analysis of sequence context to include one base either side of the CG identified higher mutation rates in the context of a 3’ cytosine (NCGC), with the highest rate at ACGC (Fig. 2e). The relative mutation rate was not influenced by the transcriptional strand (Extended Data Fig. 5a), but was higher in late replicating regions (Fig. 2f) and at lowly expressed genes (Extended Data Fig. 5b). Collectively these results suggest that while 5mC is the dominant factor contributing to the mutation rate, the local sequence context, replication timing and expression status also contribute. The differences between tetramers and enrichment in late replicating regions were also evident in rare germline CG>TG SNPs from the gnomAD database [15], indicating this phenomenon is not restricted to cancer (Extended Data Fig. 5c).
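
A note on the Mann-Whitney p-value quoted above: 0.007937 equals 2/252, which is the smallest two-sided exact p-value attainable when two groups of five untied observations are compared. The sketch below enumerates every possible rank assignment to show this. It assumes, purely for illustration, that the test compared two groups of five values (for example ACG versus TCG rates across the five cancers); the text does not spell out the comparison, and the numbers used are hypothetical.

```python
from itertools import combinations
from math import comb


def exact_mann_whitney_p(x, y):
    """Two-sided exact Mann-Whitney U p-value by full enumeration.

    Assumes no tied values. Suitable only for small samples, because
    every possible assignment of ranks to the first group is visited.
    """
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values are not handled in this sketch")
    n, m = len(x), len(y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}

    def u_stat(ranks):
        # U for the first group: rank sum minus its minimum possible value.
        return sum(ranks) - n * (n + 1) / 2

    mean_u = n * m / 2
    observed_dev = abs(u_stat([rank[v] for v in x]) - mean_u)
    extreme = sum(
        1
        for subset in combinations(range(1, n + m + 1), n)
        if abs(u_stat(subset) - mean_u) >= observed_dev
    )
    return extreme / comb(n + m, n)


# Hypothetical relative rates for five cancers, completely separated.
toy_acg = [1.30, 1.42, 1.35, 1.51, 1.28]
toy_tcg = [0.71, 0.80, 0.66, 0.75, 0.69]

p_value = exact_mann_whitney_p(toy_acg, toy_tcg)
print(round(p_value, 6))  # 0.007937
```

Under that assumption, the reported value would be consistent with every value in one group exceeding every value in the other, although the p-value alone does not establish this.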
The three cases with germline MBD4-deficiency shared a common path to AML. They acquired biallelic DNMT3A mutations and IDH1/IDH2 hotspot mutations, all of which were CG>TG (Fig. 3). Biallelic DNMT3A mutations are uncommon in AML, affecting ~3% of patients in TCGA-AML, and when considering they also have coincident IDH1/IDH2 mutations, it is highly unlikely that these three individuals share this pattern of driver mutations by chance. These mutations impact 5mC at multiple levels: deposition (DNMT3A), removal (IDH1/IDH2) and repair (MBD4), and this convergence suggests that modulating DNA methylation is central to AML pathogenesis in MBD4-deficient cases. Analysis of sequential bone marrow biopsies taken during treatment and single cell genotyping allowed us to refine the order of somatic mutation acquisition in two cases (EMC-AML-1, WEHI-AML-1) (Fig. 3a-b, Extended Data Fig. 6). DNMT3A mutations present in the AML at diagnosis were also detected in non-malignant bone marrow populations in both cases, indicating that these mutations are among the first acquired. Mutations in DNMT3A enhance the self-renewal capacity of haematopoietic stem cells (HSCs) and are associated with age-related clonal haematopoiesis [16-19]. In both patients, marked expansion of clones carrying DNMT3A mutations occurred with treatment (Fig. 3a-b), suggesting a strong advantage over normal HSCs. EMC-AML-1 experienced multiple clonal outgrowths, with nine distinct DNMT3A mutations, and repeated selection of clones with biallelic mutations. This shift in functional activity, the expansion of DNMT3A-mutant clones, increases the likelihood that cells with biallelic DNMT3A mutations will emerge, which appears to be key for initiating AML in MBD4-deficient patients.
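
The "by chance" argument above can be given a rough order of magnitude. The sketch below assumes, as a simplification that the text does not state, that each patient independently has the ~3% TCGA-AML probability of carrying biallelic DNMT3A mutations; the function name is hypothetical. Requiring a coincident IDH1/IDH2 hotspot mutation in every patient could only lower the figure further.

```python
def prob_all_share(rate, n_patients):
    """Probability that n independent patients all carry a feature
    that occurs at the given per-patient rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability")
    return rate ** n_patients


# ~3% of TCGA-AML patients carry biallelic DNMT3A mutations (from the text).
p_three = prob_all_share(0.03, 3)
print(f"{p_three:.1e}")  # 2.7e-05
```

This is an upper-bound style estimate under the independence assumption, not a formal test.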
There is a marked discrepancy between the substantial mutation burden in MBD4-deficient AMLs and the modest 2-3 fold increase in mutation rate in MBD4-deficient mice [20,21]. It is unclear whether this difference is a reflection of longer disease latency in humans, as compared to mice, or whether somatic mutations in the AML further compromise DNA repair. Mutations in DNMT3A and IDH1/IDH2 have been associated with altered DNA repair in model systems [22,23]. It also remains unclear why TDG, a glycosylase with overlapping substrate specificity, does not compensate for MBD4 loss. One possible explanation stems from the observation that DNMT3A/B can directly stimulate TDG glycosylase activity [24,25]. We confirmed that recombinant DNMT3A enhances TDG glycosylase activity in vitro (Fig. 4a), but had no impact on MBD4 glycosylase activity (Extended Data Fig. 7). Mutant forms of DNMT3A showed weaker stimulation, and even inhibited TDG at higher concentrations (Fig. 4a). We propose a model for AML pathogenesis whereby inhibition of DNMT3A contributes in two ways: loss of one allele enables expansion of a premalignant clone, then acquisition of a second DNMT3A mutation increases the CG>TG mutation rate due to impaired TDG activity (Fig. 4b). Supporting this model, the premalignant clone identified in WEHI-AML-1, which had a monoallelic DNMT3A mutation, did not carry additional mutations that would suggest an elevated mutation rate. The sporadic cancers that became MBD4-deficient (TCGA-UVM-1 and TCGA-GBM-1) did not acquire mutations in DNMT3A or IDH1/IDH2, which may indicate that this interaction is specific to the haematopoietic compartment.
The last five years have seen a concerted effort to define mutational processes that shape the cancer genome [5]. Deamination of 5mC is the most common source of somatic mutations and this damage continues to accumulate with age [26]. Our results highlight the important role for MBD4 in safeguarding against damage wrought by 5mC deamination. One manifestation of this damage is clonal haematopoiesis, a phenomenon typically observed in people >70 years of age. Individuals with biallelic loss of MBD4 in the germline sustain high levels of damage from 5mC deamination and experience clonal expansions decades earlier, which eventually progress to AML. There are more than 40 million 5mC residues in the genome, yet these individuals develop the same type of cancer, AML, with a common set of driver mutations. Our results indicate this convergence results from the combination of a highly restricted mutational signature, which accesses a select set of driver genes, and the dual role of DNMT3A, which regulates HSC function and directly contributes to DNA repair. This interaction between mutational process, driver landscape and stem cell biology has broad implications, and may explain the tissue restricted pattern of disease in this and other cancer predisposition syndromes.
Acknowledgements
The authors would like to thank Simon He, Anita Rijneveld, Kirsten van Lom and Kirsten Gussinklo for providing clinical information and reviewing samples; Meaghan Wall for assistance with cytogenetics; Naomi Sprigg for assistance with sample collection; Elwin Rombouts for assistance with single cell sorting; Hideharu Hashimoto and Xiaodong Cheng for the TDG expression vector; Sari van Rossum and Joyce Lebbink for assistance with recombinant protein isolation; the Australasian Leukaemia and Lymphoma Group for access to clinical samples; and Stephen Wilcox for technical assistance with sequencing. Additional sequencing was performed at The Australian Genome Research Facility (Melbourne, Australia) and the Kinghorn Centre for Clinical Genomics (Sydney, Australia). Sean Grimmond, Jason Wong, Oliver Sieber, Alicia Oshlack and Stephen Nutt provided valuable feedback on the manuscript.
This work was made possible through support from the Australian National Health and Medical Research Council (NHMRC) (Program Grant 1113577, to W.S.A and A.W.R), an Independent Research Institutes Infrastructure Support Scheme Grant (9000220), a Victorian State Government Operational Infrastructure Support Grant, the Netherlands Organisation for Scientific Research (NWO) and the Center for Translational Molecular Medicine (CTMM). M.A.S is supported by a grant from CTMM (GR03O-102) and a Rubicon fellowship from NWO (019.153LW.038), E.C. is a recipient of a PhD scholarship from the Leukaemia Foundation of Australia, A.H. is a recipient of a PhD scholarship from the Ministry of Health - Sultanate of Oman, M.E.B is supported by the Bellberry-Viertel fellowship, W.S.A and A.W.R are supported by fellowships from NHMRC (1058344 and 1079560, respectively), and I.J.M. is supported by the Victorian Cancer Agency. We wish to acknowledge the
Citations
More filters
Journal ArticleDOI
TL;DR: SuperFreq is a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both and can be applied in many different experimental settings for the analysis of exomes and other capture libraries.
Abstract: Analysing multiple cancer samples from an individual patient can provide insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression. Existing approaches for clonal tracking from sequencing data typically require the user to combine multiple tools that are not purpose-built for this task. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. We developed SuperFreq, a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both. SuperFreq does not require a matched normal and instead relies on unrelated controls. When analysing multiple samples from a single patient, SuperFreq cross checks variant calls to improve clonal tracking, which helps to separate somatic from germline variants, and to resolve overlapping CNA calls. To demonstrate our software we analysed 304 cancer-normal exome samples across 33 cancer types in The Cancer Genome Atlas (TCGA) and evaluated the quality of the SNV and CNA calls. We simulated clonal evolution through in silico mixing of cancer and normal samples in known proportion. We found that SuperFreq identified 93% of clones with a cellular fraction of at least 50% and mutations were assigned to the correct clone with high recall and precision. In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis when run without a matched normal. SuperFreq is highly versatile and can be applied in many different experimental settings for the analysis of exomes and other capture libraries. We demonstrate an application of SuperFreq to leukaemia patients with diagnosis and relapse samples.

33 citations

Journal ArticleDOI
TL;DR: This review considers both coding and non-coding driver mutations, and discusses how such mutations might be identified from cancer sequencing datasets, and some of the tools and database that are available for the annotation of somatic variants and the identification of cancer driver genes.
Abstract: In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which are the driver mutations in an individual cancer, as these contribute only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and database that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.

23 citations


Cites background from "Germline loss of MBD4 predisposes t..."

  • ...numbers of C > T mutations (associated with signature 1, following the deamination of methylated cytosines), researchers uncovered a germline mutation in the DNA glycosylase MBD4 that may predispose cells to subsequently developing certain driver mutations that accelerate oncogenesis (Sanders et al. 2017)....

    [...]

  • ...…of C > T mutations (associated with signature 1, following the deamination of methylated cytosines), researchers uncovered a germline mutation in the DNA glycosylase MBD4 that may predispose cells to subsequently developing certain driver mutations that accelerate oncogenesis (Sanders et al. 2017)....

    [...]

Posted ContentDOI
30 Jul 2018-bioRxiv
TL;DR: SuperFreq is a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both and can be applied in many different experimental settings for the analysis of exomes and other capture libraries.
Abstract: Motivation Analysing multiple tumour samples from an individual cancer patient allows insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression; therefore, the ability to identify and track clones using genomics data is of great interest. Existing approaches for clonal tracking typically require the user to combine multiple tools that are not purpose-made. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. Results We have built superFreq, a cancer exome sequencing analysis tool that calls and annotates somatic SNVs and CNAs and attributes them to clones. SuperFreq makes use of unrelated control samples and does not require matched normal samples. We demonstrate the ability of superFreq to track clones by combining real samples in known proportions to simulating a multi-sample analysis. In addition, we compared superFreq to other somatic SNV callers and CNA callers on exome sequencing data from cancer-normal pairs, including 304 participants gathered from 33 cancer types in The Cancer Genome Atlas (TCGA). SuperFreq offers a reliable platform to identify somatic mutations and to track clones. SuperFreq recalled 91% of somatic SNVs identified by a consensus of four other methods, with a median of 1 additional somatic SNV per sample that was not found by any other method. CNA calls from superFreq showed good agreement with those generated by Sequenza, or those from ASCAT generated using matched SNP arrays. Using our simulated data set for testing multi-sample clonal tracking, we found that superFreq identified 93% of clones with a cellular fraction of at least 50%, and mutations were assigned to clones with high recall and close to 100% precision. 
In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis without a matched normal control. SuperFreq is a highly adaptable method and has already been used in multiple different projects. Availability SuperFreq is implemented in R and available on github at https://github.com/ChristofferFlensburg/superFreq.

22 citations


Cites background from "Germline loss of MBD4 predisposes t..."

  • ...SuperFreq was designed to detect and track somatic mutations in exomes, and it has been applied to study breast cancer metastasis [2, 21], lung cancer xenografts [22], gastric cancer organoids [23], and myeloid leukaemia [24]....

    [...]

Posted ContentDOI
16 Jan 2018-bioRxiv
TL;DR: Molecular processes similar to those shaping population-scale human genome variation also underlie the rapid evolution of an infant ultra-mutated leukemia, which is one of the earliest manifestations of cancer hypermutation recorded.
Abstract: Background: Mixed lineage leukemia/Histone-lysine N-methyltransferase 2A gene rearrangements occur in 80% of infant acute lymphoblastic leukemia, but the role of cooperating events is unknown. While infant leukemias typically carry few somatic lesions, we identified a case with over 100 somatic point mutations per megabase and here report the unique genomic features of this case. Results: The patient presented at 82 days of age, one of the earliest manifestations of cancer hypermutation recorded. The transcriptional profile showed global similarities to canonical cases. Coding lesions were predominantly clonal and almost entirely targeting alleles reported in human genetic variation databases, with a notable exception in the mismatch repair gene MSH2. There were no rare germline alleles or somatic mutations affecting the proof-reading polymerase genes POLE or POLD1; however, there was a predicted damaging mutation in the error-prone replicative polymerase POLK. The patient's diagnostic leukemia transcriptome was depleted of rare and low-frequency germline alleles due to loss-of-heterozygosity, while somatic point mutations targeted low-frequency and common human alleles in proportions that offset this discrepancy. Somatic signatures of ultra-mutations were highly correlated with germline single nucleotide polymorphic sites, indicating a common role for 5-methylcytosine deamination, DNA mismatch repair and DNA adducts. Conclusions: These data suggest that molecular processes similar to those shaping population-scale human genome variation also underlie the rapid evolution of an infant ultra-mutated leukemia.
References
Journal Article
01 Jan 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

8,106 citations

Journal ArticleDOI
Ludmil B. Alexandrov, Serena Nik-Zainal, David C. Wedge, Samuel Aparicio, Sam Behjati, Andrew V. Biankin, Graham R. Bignell, Niccolo Bolli, Åke Borg, Anne Lise Børresen-Dale, Sandrine Boyault, Birgit Burkhardt, Adam Butler, Carlos Caldas, Helen Davies, Christine Desmedt, Roland Eils, Jorunn E. Eyfjord, John A. Foekens, Mel Greaves, Fumie Hosoda, Barbara Hutter, Tomislav Ilicic, Sandrine Imbeaud, Marcin Imielinski, Natalie Jäger, David T. W. Jones, David T. Jones, Stian Knappskog, Marcel Kool, Sunil R. Lakhani, Carlos López-Otín, Sancha Martin, Nikhil C. Munshi, Hiromi Nakamura, Paul A. Northcott, Marina Pajic, Elli Papaemmanuil, Angelo Paradiso, John V. Pearson, Xose S. Puente, Keiran Raine, Manasa Ramakrishna, Andrea L. Richardson, Julia Richter, Philip Rosenstiel, Matthias Schlesner, Ton N. Schumacher, Paul N. Span, Jon W. Teague, Yasushi Totoki, Andrew Tutt, Rafael Valdés-Mas, Marit M. van Buuren, Laura van ’t Veer, Anne Vincent-Salomon, Nicola Waddell, Lucy R. Yates, ICGC PedBrain, Jessica Zucman-Rossi, P. Andrew Futreal, Ultan McDermott, Peter Lichter, Matthew Meyerson, Sean M. Grimmond, Reiner Siebert, Elias Campo, Tatsuhiro Shibata, Stefan M. Pfister, Peter J. Campbell, Michael R. Stratton
22 Aug 2013-Nature
TL;DR: It is shown that hypermutation localized to small genomic regions, ‘kataegis’, is found in many cancer types, and these results reveal the diversity of mutational processes underlying the development of cancer.
Abstract: All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.

7,904 citations

Journal ArticleDOI
19 Nov 2009-Nature
TL;DR: The first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts, along with comparative analysis of messenger RNA and small RNA components of the transcriptome, several histone modifications, and sites of DNA-protein interaction for several key regulatory factors were presented in this article.
Abstract: DNA cytosine methylation is a central epigenetic modification that has essential roles in cellular processes including genome regulation, development and disease. Here we present the first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts, along with comparative analysis of messenger RNA and small RNA components of the transcriptome, several histone modifications, and sites of DNA-protein interaction for several key regulatory factors. Widespread differences were identified in the composition and patterning of cytosine methylation between the two genomes. Nearly one-quarter of all methylation identified in embryonic stem cells was in a non-CG context, suggesting that embryonic stem cells may use different methylation mechanisms to affect gene regulation. Methylation in non-CG contexts showed enrichment in gene bodies and depletion in protein binding sites and enhancers. Non-CG methylation disappeared upon induced differentiation of the embryonic stem cells, and was restored in induced pluripotent stem cells. We identified hundreds of differentially methylated regions proximal to genes involved in pluripotency and differentiation, and widespread reduced methylation levels in fibroblasts associated with lower transcriptional activity. These reference epigenomes provide a foundation for future studies exploring this key epigenetic modification in human disease and development.

4,266 citations

Journal ArticleDOI
TL;DR: An analysis tool for the detection of somatic mutations and copy number alterations in exome data from tumor-normal pairs is presented and new light is shed on the landscape of genetic alterations in ovarian cancer.
Abstract: Exome sequencing of tumor samples and matched normal controls has the potential to rapidly identify protein-altering mutations across hundreds of patients, potentially enabling the discovery of recurrent events driving tumor development and growth (International Cancer Genome Consortium 2010; Stratton 2011). Yet the analysis of such data presents significant challenges. Sequencing coverage is nonuniform across targeted regions and from one sample to the next (Ng et al. 2009; Bainbridge et al. 2010; Teer et al. 2010). Many regions achieve high read depth (more than 100×), which can confound variant callers and depth-based filters if not properly addressed (Ku et al. 2011). Repetitive and paralogous sequences can give rise to numerous false positives. The detection of somatic mutations in tumor genomes is even more challenging. The genomes of primary tumors are genetically heterogeneous (Ding et al. 2010), with frequent rearrangements (Campbell et al. 2008) and copy number alterations (CNAs) (Beroukhim et al. 2010). Further, somatic mutations are relatively rare compared with germline variation, often representing <0.1% of variants in a tumor genome (Ley et al. 2008; Mardis et al. 2009). Simply subtracting variants in the matched normal from variants in the tumor (Wei et al. 2011) is poorly suited for the analysis of exome sequence data, because it fails to account for regions that were undersampled in the normal. Accurate mutation detection requires a direct, simultaneous comparison of tumor–normal pairs at every position in the exome, but few algorithms to do so have been described. Numerous algorithms have been developed to assess genome-wide copy number using whole-genome sequencing (WGS) data. Most of these approaches (Campbell et al. 2008; Alkan et al. 2009; Chiang et al. 2009; Yoon et al. 2009; Abyzov et al. 2011) would be confounded by exome data sets, because of the biases introduced by hybridization and the sparse and uneven coverages throughout the genome. However, when both DNA samples in a tumor–normal pair were captured and sequenced under identical hybridization conditions, we reasoned that it might be possible to detect somatic CNAs (SCNAs) as deviations from the log-ratio of sequence coverage depth within a tumor–normal pair, and then quantify the deviations statistically. Such an approach would provide a gene-centric view of copy number in a tumor sample, though it would be limited to the ∼1% of the genome captured by current exome platforms. Previously, we published VarScan (Koboldt et al. 2009), an algorithm for variant detection in next-generation sequencing data. We have since released a new tool, VarScan 2 (http://varscan.sourceforge.net), with several improvements, including the ability to identify somatic mutation, loss of heterozygosity (LOH), and CNA events in tumor–normal pairs. VarScan 2 analyzes sequence data from a tumor sample and its corresponding normal sample simultaneously, applying heuristic methods and a statistical test to detect variants—single nucleotide variants (SNVs) and insertions/deletions (indels)—and classify them by somatic status. By direct comparison of normalized sequence depth, our method also detects SCNAs in the tumor genome. Here, we utilize VarScan 2 for the analysis of exome sequence data from 151 patients with high-grade serous ovarian adenocarcinoma (HGS-OVCa) that were initially characterized within the Cancer Genome Atlas (TCGA) project (Cancer Genome Atlas Research Network 2011). We present a robust pipeline for the detection of both germline (inherited) and somatic (acquired) mutations by exome sequencing and describe filtering approaches for detecting variants with high sensitivity and specificity. To evaluate the performance of our SCNA detection algorithm, we compare our results to copy number data from high-density SNP array and WGS approaches. Our results demonstrate the accuracy of VarScan 2 for somatic mutation and CNA detection and enable a new survey of the genetic landscape in ovarian carcinoma.
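The log-ratio idea mentioned in this abstract (comparing normalized tumor and normal coverage depth for a region) can be illustrated with a minimal sketch. This is not VarScan 2's implementation: the function names, the pseudocount, and the 0.25 threshold are hypothetical choices made only for illustration.

```python
import math


def log2_depth_ratio(tumor_depth, normal_depth, tumor_total, normal_total,
                     pseudocount=0.5):
    """Log2 ratio of library-size-normalized tumor vs. normal depth.

    tumor_total and normal_total are the total sequenced depth of each
    sample (assumed inputs), used to normalize for library size. The
    pseudocount (a hypothetical choice) avoids log(0) and division by zero.
    """
    tumor_norm = (tumor_depth + pseudocount) / tumor_total
    normal_norm = (normal_depth + pseudocount) / normal_total
    return math.log2(tumor_norm / normal_norm)


def classify_region(log_ratio, threshold=0.25):
    """Label a region from its log2 ratio (the threshold is illustrative)."""
    if log_ratio >= threshold:
        return "gain"
    if log_ratio <= -threshold:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # Three regions with equal, doubled and halved tumor depth.
    for tumor, normal in [(100, 100), (200, 100), (50, 100)]:
        ratio = log2_depth_ratio(tumor, normal, 1_000_000, 1_000_000)
        print(tumor, normal, round(ratio, 3), classify_region(ratio))
```

A real caller would also correct for capture bias and tumor purity and would segment adjacent regions before assessing deviations statistically; none of that is attempted here.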

4,096 citations

Journal ArticleDOI
Timothy J. Ley, Christopher A. Miller, Li Ding, Benjamin J. Raphael, Andrew J. Mungall, Gordon Robertson, Katherine A. Hoadley, Timothy J. Triche, Peter W. Laird, Jack Baty, Lucinda Fulton, Robert S. Fulton, Sharon Heath, Joelle Kalicki-Veizer, Cyriac Kandoth, Jeffery M. Klco, Daniel C. Koboldt, Krishna L. Kanchi, Shashikant Kulkarni, Tamara Lamprecht, David E. Larson, G. Lin, Charles Lu, Michael D. McLellan, Joshua F. McMichael, Jacqueline E. Payton, Heather Schmidt, David H. Spencer, Michael H. Tomasson, John W. Wallis, Lukas D. Wartman, Mark A. Watson, John S. Welch, Michael C. Wendl, Adrian Ally, Miruna Balasundaram, Inanc Birol, Yaron S.N. Butterfield, Readman Chiu, Andy Chu, Eric Chuah, Hye Jung E. Chun, Richard Corbett, Noreen Dhalla, Ranabir Guin, An He, Carrie Hirst, Martin Hirst, Robert A. Holt, Steven J.M. Jones, Aly Karsan, Darlene Lee, Haiyan I. Li, Marco A. Marra, Michael Mayo, Richard A. Moore, Karen Mungall, Jeremy Parker, Erin Pleasance, Patrick Plettner, Jacquie Schein, Dominik Stoll, Lucas Swanson, Angela Tam, Nina Thiessen, Richard Varhol, Natasja Wye, Yongjun Zhao, Stacey Gabriel, Gad Getz, Carrie Sougnez, Lihua Zou, Mark D.M. Leiserson, Fabio Vandin, Hsin-Ta Wu, Frederick Applebaum, Stephen B. Baylin, Rehan Akbani, Bradley M. Broom, Ken Chen, Thomas C. Motter, Khanh Thi-Thuy Nguyen, John N. Weinstein, Nianziang Zhang, Martin L. Ferguson, Christopher Adams, Aaron D. Black, Jay Bowen, Julie M. Gastier-Foster, Thomas Grossman, Tara M. Lichtenberg, Lisa Wise, Tanja Davidsen, John A. Demchok, Kenna R. Mills Shaw, Margi Sheth, Heidi J. Sofia, Liming Yang, James R. Downing, Greg Eley, Shelley Alonso, Brenda Ayala, Julien Baboud, Mark Backus, Sean P. Barletta, Dominique L. Berton, Anna L. Chu, Stanley Girshik, Mark A. Jensen, Ari B. Kahn, Prachi Kothiyal, Matthew C. Nicholls, Todd Pihl, David Pot, Rohini Raman, Rashmi N. Sanbhadti, Eric E. Snyder, Deepak Srinivasan, Jessica Walton, Yunhu Wan, Zhining Wang, Jean Pierre J. Issa, Michelle M. Le Beau, Martin Carroll, Hagop M. Kantarjian, Steven M. Kornblau, Moiz S. Bootwalla, Phillip H. Lai, Hui Shen, David Van Den Berg, Daniel J. Weisenberger, Daniel C. Link, Matthew J. Walter, Bradley A. Ozenberger, Elaine R. Mardis, Peter Westervelt, Timothy A. Graubert, John F. DiPersio, Richard K. Wilson
TL;DR: It is found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients, and the databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification.
Abstract: BACKGROUND—Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS—We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS—AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation–related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS—We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.) The molecular pathogenesis of acute myeloid leukemia (AML) has been studied with the use of cytogenetic analysis for more than three decades. Recurrent chromosomal structural variations are well established as diagnostic and prognostic markers, suggesting that acquired genetic abnormalities (i.e., somatic mutations) have an essential role in pathogenesis [1,2]. However, nearly 50% of AML samples have a normal karyotype, and many of these genomes lack structural abnormalities, even when assessed with high-density comparative genomic hybridization or single-nucleotide polymorphism (SNP) arrays [3-5]. Targeted sequencing has identified recurrent mutations in FLT3, NPM1, KIT, CEBPA, and TET2 [6-8]. Massively parallel sequencing enabled the discovery of recurrent mutations in DNMT3A [9,10] and IDH1 [11]. Recent studies have shown that many patients with

3,980 citations

Frequently Asked Questions (1)
Q1. What have the authors contributed in "Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven by 5mC"?

These authors contributed equally to this work. These authors jointly directed this work.