A maternal-effect genetic incompatibility in Caenorhabditis elegans

28 Feb 2017-bioRxiv (bioRxiv)-pp 112524

TL;DR: A selfish element causing a genetic incompatibility between strains of the nematode Caenorhabditis elegans is reported, and the results suggest that other essential genes identified by genetic screens may turn out to be components of selfish elements.

Abstract: Selfish genetic elements spread in natural populations and have an important role in genome evolution. We discovered a selfish element causing a genetic incompatibility between strains of the nematode Caenorhabditis elegans. The element is made up of sup-35, a maternal-effect toxin that kills developing embryos, and pha-1, its zygotically expressed antidote. pha-1 has long been considered essential for pharynx development based on its mutant phenotype, but this phenotype in fact arises from a loss of suppression of sup-35 toxicity. Inactive copies of the sup-35/pha-1 element show high sequence divergence from active copies, and phylogenetic reconstruction suggests that they represent ancestral stages in the evolution of the element. Our results suggest that other essential genes identified by genetic screens may turn out to be components of selfish elements.

Summary

Introduction

  • In what is perhaps the most extreme scenario, selfish elements can kill individuals that do not inherit them, leading to a genetic incompatibility between carriers and non-carriers (Beeman et al.
  • Their underlying genetic mechanisms have been resolved in only a few cases (Werren 2011).

Results

  • The authors introgressed a genetic marker located on the right arm of Chr. V from the standard laboratory strain N2 into the strain DL238 by performing eight rounds of backcrossing and selection.
  • The authors examined the sequences of sup-35 and pha-1 in 152 C. elegans wild isolates that represented unique isotypes (Cook et al. 2016) in the Caenorhabditis elegans Natural Diversity Resource (Cook et al. 2017).
  • This suggested that the DL238 and QX1211 haplotypes were highly divergent from the N2 reference, and that major genomic rearrangements may have occurred.

Discussion

  • The antidote, pha-1, was originally thought to be a developmental gene, in large part due to the specific pharyngeal defects observed in mutants (Schnabel and Schnabel 1990; Granato et al.
  • An attractive possibility is that this regulation evolved as an additional mechanism to cope with sup-35 toxicity, as part of an arms race between the selfish element and its host.
  • These results suggest that the origin of the sup-35/pha-1 element involved the duplication of a pre-existing gene (rmd-2) and the recruitment of a novel gene of unknown origin in the lineage leading to C. elegans.
  • Lastly, their work highlights the importance of studying natural genetic variation for understanding gene function.

Acknowledgments:

  • The authors thank members of the Kruglyak lab for their comments.
  • Funding was provided by the Howard Hughes Medical Institute and NIH grant R01 HG004321 (L.K.).
  • E.B. is supported by a Gruss-Lipper postdoctoral fellowship from the EGL foundation.
  • All authors discussed and agreed on the final version of the manuscript.
  • The authors declare no competing financial interests.

Methods

  • C. elegans strains and mutant alleles: Strains were grown using standard culturing techniques, with the exception that a modified nematode growth medium (NGM) containing 1% agar and 0.7% agarose was used to prevent burrowing of wild isolates (Brenner 1974; Andersen et al. 2015).
  • All experiments were carried out at 20 °C.
  • All the strains used and generated in this study are listed in Table S2.
  • Some of the strains were provided by the CGC, which is funded by the NIH Office of Research Infrastructure Programs (P40 OD010440).
  • The construction of strains carrying the peel-1/zeel-1 allele from CB4856 (niDf9) was performed by backcross following a PCR product specific to the N2 allele amplified using the following primers: FW: GCAGAGGAGGCAAAGGTGACTA; RV: AGCACGTGTAGGCAGAAGTCAT.

  • Introgression of a Chr. V genetic marker into DL238: The authors introgressed the fog-2(q71) allele from the N2 background into DL238 by performing eight consecutive rounds of backcross and selection for the feminization of the germline (fog) phenotype (Schedl and Kimble 1988).
  • Since the peel-1/zeel-1 element is active in N2 but not in DL238, the authors performed the backcross using DL238 males and feminized hermaphrodites.
  • The use of DL238 males avoided the fixation of the peel-1/zeel-1 element on Chr. I because DL238 worms do not carry the PEEL-1 toxin in their sperm.
  • The fixation of the N2 haplotype on Chr. III in the introgression strain could not be caused by a paternal-effect toxin.
  • The introgressed marker, fog-2(q71), is required for spermatogenesis in hermaphrodites but not in males, and the resulting strain must reproduce by outcrossing.

Short read sequencing

  • The authors extracted genomic DNA (gDNA) using the DNeasy Blood & Tissue kit.
  • The authors prepared sequencing libraries using the Nextera protocol.
  • The authors followed the standard protocol with the following exception: they performed agarose size-selection of the Nextera libraries, extracting a ~500 bp band.
  • Libraries were quantified using the Qubit HS kit and sequenced using 300 bp paired-end V3 kits on a MiSeq desktop sequencer at 12 pM.

Variant calling in DL238

  • Automated preprocessing, alignment and variant calling were done using bcbio-nextgen (ver. 0.9.9) (https://bcbio-nextgen.readthedocs.io/).
  • Short reads from DL238 were aligned to the WBcel235 build of the reference N2 genome.
  • Variant calling was performed with four different software packages: GATK HaplotypeCaller (McKenna et al. 2010), Platypus (Rimmer et al. 2014), Freebayes (Garrison and Marth 2012) and Varscan (Koboldt et al. 2012).
  • Variants identified by fewer than three of the callers were filtered out, resulting in 239,131 SNVs between DL238 and N2.
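The consensus rule described above (keep a variant only if at least three of the four callers report it) can be sketched as follows. This is a minimal illustration, not the bcbio-nextgen implementation: the call sets, the tuple representation of a variant, and the function name `consensus_variants` are all hypothetical.

```python
from collections import Counter

# Hypothetical call sets: each caller reports variants as (chrom, pos, ref, alt).
calls = {
    "gatk": {("III", 100, "A", "G"), ("III", 250, "C", "T"), ("V", 40, "G", "A")},
    "platypus": {("III", 100, "A", "G"), ("III", 250, "C", "T")},
    "freebayes": {("III", 100, "A", "G"), ("V", 40, "G", "A"), ("X", 7, "T", "C")},
    "varscan": {("III", 100, "A", "G"), ("III", 250, "C", "T")},
}


def consensus_variants(call_sets, min_callers=3):
    """Keep variants reported by at least `min_callers` of the callers."""
    support = Counter()
    for variants in call_sets.values():
        # Each call set is a set, so a caller contributes at most one vote per variant.
        support.update(variants)
    return {variant for variant, n in support.items() if n >= min_callers}


kept = consensus_variants(calls)
print(sorted(kept))
```

In this toy input, the variant seen by two callers and the variant seen by one caller are dropped, while those seen by three or four callers are retained.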

  • Genotyping of the DL238 isolate and the DL238 Chr. V introgression strain: Analyses were performed using the R Project for Statistical Computing (https://www.r-project.org/).
  • To reduce the number of spurious SNVs, the authors restricted their analysis to SNVs for which all DL238 reads supported the DL238 allele when aligning to DL238, and no reads supported the N2 allele when aligning to N2.
  • Next morning, gravid hermaphrodites were allowed to lay eggs for 4-8 hours.
  • Embryonic lethality in the progeny of mating hermaphrodites was scored similarly, but L4 hermaphrodites were transferred together with males to a new plate, and their eggs were laid and collected in the presence of these males to guarantee continuous mating.
  • Visual inspection of DL238 short read alignments in the homozygous region revealed that pha-1 and sup-35 were very likely missing in DL238.

Microscopy

  • Larvae were transferred to a 3% agarose pad and visualized under bright field using a Nikon Eclipse 90i microscope equipped with a Photometrics CoolSNAP HQ2 CCD camera.
  • Multiple sequence alignments were visualized using Geneious ver. 10 (restricted free version) (https://www.geneious.com/).
  • In contrast, even null variants in sup-36 or sup-37, which are unlinked to sup-35/pha-1, are predicted to only reduce the embryonic lethality from 25% to 18.75% if they are recessive.
  • Furthermore, in a backcross of F1 males to QX2327 hermaphrodites, all of the F2 progeny inherit at least one functioning copy of sup-36 (or sup-37) from the QX2327 parent, and the lethality isn’t expected to be reduced at all.
  • For these reasons, and given the sample sizes in their screening, the authors cannot rule out the presence of hypomorphic variants weakly affecting sup-35, or even strongly affecting sup-36 and sup-37, in some of the wild isolates tested.
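The 25% to 18.75% figure quoted above follows from simple Mendelian arithmetic, sketched here. The sketch assumes that the suppressor is fully recessive, segregates independently of sup-35/pha-1, and rescues an embryo only when the embryo itself is homozygous for the null variant; the function name is hypothetical.

```python
from fractions import Fraction


def f2_lethality(unlinked_recessive_suppressor: bool) -> Fraction:
    """Expected F2 embryonic lethality after selfing a doubly heterozygous F1."""
    # One quarter of F2 embryos inherit no copy of the sup-35/pha-1 element.
    lacks_element = Fraction(1, 4)
    if not unlinked_recessive_suppressor:
        return lacks_element
    # Assumption: only embryos homozygous for the suppressor null (1/4) are rescued.
    homozygous_suppressor = Fraction(1, 4)
    return lacks_element * (1 - homozygous_suppressor)


print(float(f2_lethality(False)))  # 0.25
print(float(f2_lethality(True)))   # 0.1875
```

A drop from 25% to 18.75% is small relative to sampling noise at modest sample sizes, which is why such variants are hard to exclude.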

Validation of chimeric fusion

  • To validate the large deletion leading to the fusion of sup-35 and the Y48A6C.1 pseudogene, the authors designed primers that flanked the deletion: FW-del: GATCACGTGAGACAGGAAAAG and RV-del: CCCTTCAAAAGCACACCAAC.
  • This primer pair amplified the expected 1000 bp band in the wild isolate ED3012, which carries the deletion, but not in the reference N2 (fig. S8C).
  • As a positive control, the authors amplified the pha-1 locus using primers FW-pha-1: CCGTTTTCATCACGTTGCTCGA and RV-pha-1: TGTCGCGCACTACTGAATCAGA.
  • To confirm whether the chimeric fusion sup-35/Y48A6C.1 is expressed, the authors performed reverse transcription (RT) PCR (fig. S8D).
  • Total RNA was isolated from mixed stage N2 and ED3012 populations using the RNeasy kit, and cDNA was prepared using the SuperScript III Reverse Transcriptase kit (Thermo Fisher Scientific).

TTTTTCGCTTTCCAAACTGG, RV1: GCGAGCAACTCTTTCTCGAT, RV2:

  • The FW1-RV1 primer pair amplifies exclusively the spliced cDNA of wild type sup-35, and the FW1-RV2 primer pair amplifies exclusively the spliced cDNA of the sup-35/Y48A6C.1 chimeric fusion.
  • The authors then used blastn (Altschul et al. 1990) to search for scaffolds that had homology to sup-35, pha-1, and genes in their vicinity.
  • Moreover, the region of interest contained several repetitive elements.
  • Finally, DL238 and QX1211 Illumina short reads were re-aligned to the assembled haplotypes and visually inspected to discard errors introduced by the assemblers.

Multiple sequence alignment

  • To study the genomic architecture and evolution of the sup-35/pha-1 element, the authors recovered the sequence at the sup-35/pha-1 locus in additional Caenorhabditis species.
  • Hmt-1 is a predicted transmembrane ABC transporter, and Y47D3A.29 encodes the catalytic subunit of DNA polymerase alpha.
  • Y48A6C.4, a predicted ortholog of S. cerevisiae IPI1, is located between hmt-1 and Y47D3A.29 in all sequenced Caenorhabditis.

Phylogenetic tree

  • The coding sequence of Y48A6C.4 was recovered in the different C. elegans isolates and other Caenorhabditis species.
  • The authors then assembled the cDNA sequences of Y48A6C.4 in each isolate.
  • To that end, the authors first aligned the sequence around the start and end positions of each exon to the alternative reference using blast (Altschul et al. 1990).
  • To create a gene tree, the cDNA sequences were in-silico translated, and the protein sequences were aligned using MAFFT (Katoh and Standley 2013).
  • The general time reversible (GTR) substitution model with gamma distributed rate variation was used as it was found to be the best fit (by minimizing the BIC criterion) using jModelTest2 (Darriba et al. 2012).
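Model selection by BIC, as mentioned above, amounts to picking the model with the smallest value of k·ln(n) − 2·ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood. The sketch below shows only that comparison. The log-likelihoods, parameter counts, and site count are made-up illustrative values, not output from jModelTest2 or from the paper's data.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical fits: model name -> (maximized log-likelihood, free parameters).
# Parameter counts are illustrative and ignore branch lengths.
fits = {
    "JC": (-5230.0, 0),
    "HKY+G": (-5100.0, 5),
    "GTR+G": (-5050.0, 9),
}
n_sites = 1200  # hypothetical alignment length

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

The penalty term k·ln(n) is what keeps a parameter-rich model from winning on likelihood alone.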

A maternal-effect genetic incompatibility in Caenorhabditis elegans
Eyal Ben-David*,†, Alejandro Burga*,†, Leonid Kruglyak
Department of Human Genetics, Department of Biological Chemistry, and Howard Hughes Medical
Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA.
† To whom correspondence should be addressed. Email: ebd@ucla.edu (E.B.); aburga@mednet.ucla.edu
(A.B.); lkruglyak@mednet.ucla.edu (L.K.).
* These authors contributed equally to this work.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/112524; this version posted March 1, 2017.

Introduction
Selfish genetic elements subvert the laws of Mendelian segregation to promote their own transmission
(Dawkins 1976; Doolittle and Sapienza 1980; Orgel and Crick 1980; Werren 2011; Sinkins 2011). In what
is perhaps the most extreme scenario, selfish elements can kill individuals that do not inherit them, leading
to a genetic incompatibility between carriers and non-carriers (Beeman et al. 1992; Werren 1997, 2011;
Hurst and Werren 2001; Lorenzen et al. 2008). Selfish elements are predicted to spread in natural
populations (Hurst and Werren 2001; Werren 2011), and consequently, there is significant interest in using
synthetic forms of such elements to drive population replacement of pathogen vectors in the wild (Chen et
al. 2007; Hammond et al. 2015). However, despite the prominent role of genetic incompatibilities in
genome evolution and their promise in pathogen control, their underlying genetic mechanisms have been
resolved in only a few cases (Werren 2011). Our laboratory previously identified the only known genetic
incompatibility in the nematode Caenorhabditis elegans (Seidel et al. 2008, 2011). The incompatibility is
caused by a selfish element composed of two tightly linked genes: peel-1, a sperm-delivered toxin, and
zeel-1, a zygotically expressed antidote. In crosses between isolates that carry the element and ones that do
not, the peel-1 toxin is delivered by the sperm to all progeny, so that only embryos that inherit the element
and the zeel-1 antidote survive. An analogous element, Maternal-effect dominant embryonic arrest (Medea)
has been previously described in the beetle Tribolium; however, the underlying genes remain unknown
(Beeman et al. 1992; Lorenzen et al. 2008).
Results
A maternal-effect genetic incompatibility in C. elegans
As part of ongoing efforts to study natural genetic variation in C. elegans, we introgressed a genetic marker
located on the right arm of Chr. V from the standard laboratory strain N2 into the strain DL238 by
performing eight rounds of backcrossing and selection. DL238 is a wild strain isolated in the Manuka
Natural Reserve, Hawaii, USA, and is one of the most highly divergent C. elegans isolates identified to
date (Andersen et al. 2012). To confirm the success of the introgression, we genotyped the resulting strain
at single-nucleotide variants (SNVs) between DL238 and N2 by whole-genome sequencing. As expected,
with the exception of a small region on the right arm of Chr. V where the marker is located, most of the
genome was homozygous for the DL238 alleles (Fig. 1A). However, to our surprise, we observed sequence
reads supporting the N2 allele at many SNVs on Chr. III, including two large regions that were homozygous
for the N2 allele despite the eight rounds of backcrossing (Fig. 1A, Fig. S1). This observation suggested
that N2 variants located on this chromosome were strongly selected during the backcrossing.
To investigate the nature of the selection, we performed a series of crosses between the N2 and DL238
strains and examined their progeny. To avoid effects of the peel-1/zeel-1 element, which is present in N2
and absent in DL238, we performed a cross between DL238 males and a near isogenic line (NIL) that lacks
the peel-1/zeel-1 element in an otherwise N2 background (hereafter, N2 peel-1(-/-)) (Seidel et al. 2011). We
observed low baseline embryonic lethality in the F1 generation and in the parental strains (0.26% (N = 381)
for F1; 0.99% (N = 304) for DL238; 0.4% (N = 242) for N2 peel-1(-/-)), and we did not observe any obvious
abnormal phenotypes in the F1 that could explain the strong selection. However, when we allowed
heterozygous F1 hermaphrodites from this cross to self-fertilize, we observed 25.15% (N = 855) embryonic
lethality among the F2 progeny (Fig. 1B). Similar results were obtained for F1 hermaphrodites from the
reciprocal parental cross (26.1%, N = 398). These results suggested the presence of a novel genetic
incompatibility between N2 and DL238 that causes embryonic lethality in their F2 progeny.
The observed pattern of embryonic lethality (no lethality in the parents nor in the F1; 25% lethality in the
F2) is consistent with an interaction between the genotype of the zygote and a maternal or paternal effect
(Fig. 1C) (Seidel et al. 2008). We hypothesized that the incompatibility could stem from a
cytoplasmically-inherited toxin that kills embryos if they lack a zygotically expressed antidote, analogous to
the mechanism of the peel-1/zeel-1 element (Seidel et al. 2008, 2011). To test this model and to discriminate
between maternal and paternal effects, we crossed heterozygous F1 DL238 x N2 peel-1(-/-) males and hermaphrodites
with DL238 hermaphrodites or males, respectively (Fig. 1B, Fig. S2). We observed 48.59% (N = 389)
lethality when F1 hermaphrodites were crossed to DL238 males, but only baseline lethality (1.17%; N =
171) in the reciprocal cross of F1 males to DL238 hermaphrodites. 50% lethality when the F1 parent is the
mother and no lethality when the F1 parent is the father indicates that the incompatibility is caused by
maternal-effect toxicity that is rescued by a linked zygotic antidote (Fig. S2). We tested whether the new
incompatibility was independent from the paternal-effect peel-1/zeel-1 element by crossing DL238 and N2
worms and selfing the F1 progeny. We observed 41.37% (N = 389) embryonic lethality among the F2
progeny, consistent with expectation for Mendelian segregation of two independent incompatibilities
(43.75%) (Fig. 1D).
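The expected lethality values in these crosses (25%, 50%, 0%, and 43.75%) can be reproduced by enumerating gametes under a simple toxin-antidote model. The sketch below is illustrative only. It assumes that a parent carrying at least one copy of an element loads the toxin into all of its gametes, that an embryo survives only if it inherits at least one copy, and that the two elements segregate independently. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gametes(copies: int) -> dict:
    """Gamete classes (1 = carries element, 0 = lacks it) for a parent with 0-2 copies."""
    p = Fraction(copies, 2)
    return {1: p, 0: 1 - p}


def lethality(mother: int, father: int, effect: str) -> Fraction:
    """Expected fraction of dead embryos for one toxin-antidote element.

    `effect` names the parent that delivers the toxin: 'maternal' or 'paternal'.
    """
    toxin_parent = mother if effect == "maternal" else father
    if toxin_parent == 0:
        return Fraction(0)  # no toxin delivered, so no embryo is killed
    dead = Fraction(0)
    for (egg, p_egg), (sperm, p_sperm) in product(
        gametes(mother).items(), gametes(father).items()
    ):
        if egg + sperm == 0:  # embryo inherits no copy, hence no antidote
            dead += p_egg * p_sperm
    return dead


f1_selfing = lethality(1, 1, "maternal")        # F1 hermaphrodite selfing
f1_mother = lethality(1, 0, "maternal")         # F1 hermaphrodite x DL238 male
f1_father = lethality(0, 1, "maternal")         # DL238 hermaphrodite x F1 male
peel_selfing = lethality(1, 1, "paternal")      # peel-1/zeel-1 in selfed F1
both = 1 - (1 - f1_selfing) * (1 - peel_selfing)  # two independent elements

print(f1_selfing, f1_mother, f1_father, both, float(both))
```

Under these assumptions the two-element expectation is 1 − (3/4)² = 7/16, which matches the 43.75% quoted in the text.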
pha-1 and sup-35 constitute a selfish element that underlies the incompatibility between DL238 and N2
To identify the genes underlying the maternal-effect incompatibility between N2 and DL238, we sequenced
the genome of DL238 using Illumina short reads and aligned those reads to the N2 reference genome. We
focused our attention on the two regions on Chr. III that were completely homozygous for the N2 allele in
the introgressed strain (Fig. 1A, Fig. S1). Inspection of short read coverage revealed a large ~50 kb region
on the right arm of the chromosome with very poor and sparse alignment to the N2 reference (Chr. III:
11,086,500-11,145,000) (Fig. 2A). This region contains ten genes and two pseudogenes in N2. We noticed
that pha-1, annotated as an essential gene in the reference genome, appeared to be completely missing in
DL238 (Fig. 2A) (Schnabel and Schnabel 1990). pha-1 was originally identified as an essential gene
required for differentiation and morphogenesis of the pharynx, the C. elegans feeding organ (Schnabel and
Schnabel 1990). But if pha-1 is essential for embryonic development and missing in DL238, then how are
DL238 worms alive? pha-1 lethality can be fully suppressed by mutations in three other genes: sup-35,
sup-36, and sup-37 (Schnabel et al. 1991). We found no coding variants in sup-36 and sup-37, which reside on
chromosomes IV and V, respectively (Schnabel et al. 1991) (Fig. S3). However, sup-35, which is located
12.5 kb upstream of pha-1, also appeared to be missing or highly divergent in DL238 (Fig. 2A, Fig. S3).
We hypothesized that sup-35 and pha-1 could constitute a selfish element responsible for the observed
incompatibility between the N2 and DL238 isolates. In our model, sup-35 encodes a maternally-deposited
toxin that kills embryos unless they express the zygotic antidote, pha-1 (Fig. 2B). N2 worms carry the
sup-35/pha-1 element, which is missing or inactive in DL238, and F1 hermaphrodites deposit the sup-35 toxin
in all their oocytes. 25% of their F2 self-progeny do not inherit the element and are killed because they lack
the antidote pha-1. Consistent with our model, an RNA-sequencing time-course of C. elegans
embryogenesis showed that sup-35 transcripts are maternally provided, whereas pha-1 transcripts are first
detected in the embryo at the 100-cell stage (Hashimshony et al. 2014). To test our model, we first asked
whether sup-35 was necessary for the F2 embryonic lethality in the N2 x DL238 cross. We crossed DL238
males to N2 peel-1(-/-) hermaphrodites carrying a null sup-35(e2223) allele (Fig. 2B). This sup-35 allele was
reported to fully rescue pha-1 associated embryonic lethality (Schnabel et al. 1991). Embryonic lethality in
the F2 dropped from 25% to baseline in this cross (1.40%, N = 576), demonstrating that sup-35 activity
underlies the incompatibility between N2 and DL238 (Fig. 2B). We next tested whether expression of
pha-1, the zygotic antidote, was sufficient to rescue the embryonic lethality. We introgressed a pha-1 multicopy
transgene into the DL238 and N2 peel-1(-/-) strains and repeated the cross. As predicted, expression of pha-1
was sufficient to reduce embryonic lethality in the F2 to baseline (1.49%, N = 268) (Fig. 2B). Moreover,
we reasoned that if the sup-35/pha-1 element underlies the maternal incompatibility, arrested embryos from
an N2 x DL238 cross should phenocopy pha-1 mutant embryos. We collected rare L1-arrested F2 larvae
from an N2 peel-1(-/-) x DL238 cross and observed major morphological defects in the pharynx of these
individuals, as previously reported for pha-1 mutants (Schnabel and Schnabel 1990; Polley et al. 2014)
(Fig. 2C).

References

Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Abstract: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straight-forward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.

81,150 citations


Journal ArticleDOI
TL;DR: MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models to analyze heterogeneous data sets and explore a wide variety of structured models mixing partition-unique and shared parameters.
Abstract: Summary: MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models. This allows the user to analyze heterogeneous data sets consisting of different data types—e.g. morphological, nucleotide, and protein— and to explore a wide variety of structured models mixing partition-unique and shared parameters. The program employs MPI to parallelize Metropolis coupling on Macintosh or UNIX clusters.

24,102 citations


Journal ArticleDOI
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

19,901 citations


Journal ArticleDOI
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

18,079 citations


Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

16,404 citations


"A maternal-effect genetic incompati..." refers methods in this paper

  • ...Variant calling was performed with four different software packages: GATK HaplotypeCaller (McKenna et al. 2010), Platypus (Rimmer et al....

    [...]