Showing papers by "Adam Auton" published in 2016

Open Access

Journal Article•DOI•

Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences

G. David Poznik¹, Yali Xue², Fernando L. Mendez¹, Thomas Willems³, Andrea Massaia², Melissa A. Wilson Sayres⁴, Qasim Ayub², Shane A. McCarthy², Apurva Narechania⁵, Seva Kashin⁶, Yuan Chen², Ruby Banerjee², Juan L. Rodriguez-Flores⁷, Maria Cerezo², Haojing Shao⁸, Melissa Gymrek³, Ankit Malhotra, Sandra Louzada², Rob DeSalle⁵, Graham R. S. Ritchie⁹, Graham R. S. Ritchie², Eliza Cerveira, Tomas W Fitzgerald², Erik Garrison², Anthony Marcketta¹⁰, David Mittelman¹¹, Mallory Romanovitch, Chengsheng Zhang, Xiangqun Zheng-Bradley¹², Gonçalo R. Abecasis¹³, Steven A. McCarroll¹⁴, Paul Flicek¹², Peter A. Underhill¹, Lachlan J. M. Coin⁸, Daniel R. Zerbino¹², Fengtang Yang², Charles Lee¹⁵, Laura Clarke¹², Adam Auton¹⁰, Yaniv Erlich¹⁶, Robert E. Handsaker⁶, Robert E. Handsaker¹⁴, Carlos Bustamante¹, Chris Tyler-Smith²

Stanford University¹, Wellcome Trust Sanger Institute², Massachusetts Institute of Technology³, Arizona State University⁴, American Museum of Natural History⁵, Broad Institute⁶, Cornell University⁷, University of Queensland⁸, European Bioinformatics Institute⁹, Yeshiva University¹⁰, Virginia Tech¹¹, Wellcome Trust¹², University of Michigan¹³, Harvard University¹⁴, Ewha Womans University¹⁵, Columbia University¹⁶

01 Jun 2016-Nature Genetics

TL;DR: A calibrated phylogenetic tree is constructed on the basis of binary single-nucleotide variants, and the more complex variants are projected onto it to estimate the number of mutations for each class. The phylogeny shows bursts of extreme expansion in male numbers that occurred independently in each of the five continental superpopulations examined.

Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

280 citations

Journal Article•DOI•

A Pedigree-Based Map of Recombination in the Domestic Dog Genome

Christopher L. Campbell¹, Claude Bhérer¹, Bernice E. Morrow¹, Adam R. Boyko², Adam Auton¹

Albert Einstein College of Medicine¹, Cornell University²

01 Nov 2016-G3: Genes, Genomes, Genetics

TL;DR: The data suggests that dogs have similar broad scale properties of recombination to humans, while fine scale recombination is similar to other species lacking PRDM9.

Abstract: Meiotic recombination in mammals has been shown to largely cluster into hotspots, which are targeted by the chromatin modifier PRDM9. The canid family, including wolves and dogs, has undergone a series of disrupting mutations in this gene, rendering PRDM9 inactive. Given the importance of PRDM9, it is of great interest to learn how its absence in the dog genome affects patterns of recombination placement. We have used genotypes from domestic dog pedigrees to generate sex-specific genetic maps of recombination in this species. On a broad scale, we find that placement of recombination events in dogs is consistent with that in mice and apes, in that the majority of recombination occurs toward the telomeres in males, while female crossing over is more frequent and evenly spread along chromosomes. It has been previously suggested that dog recombination is more uniform in distribution than that of humans; however, we found that recombination in dogs is less uniform than in humans. We examined the distribution of recombination within the genome, and found that recombination is elevated immediately upstream of the transcription start site and around CpG islands, in agreement with previous studies, but that this effect is stronger in male dogs. We also found evidence for positive crossover interference influencing the spacing between recombination events in dogs, as has been observed in other species including humans and mice. Overall our data suggests that dogs have similar broad scale properties of recombination to humans, while fine scale recombination is similar to other species lacking PRDM9.

46 citations

Posted Content•DOI•

A direct multi-generational estimate of the human mutation rate from autozygous segments seen in thousands of parentally related individuals

Vagheesh M. Narasimhan¹, Raheleh Rahbari¹, Aylwyn Scally², Arthur Wuster¹, Dan Mason³, Yali Xue¹, John Wright³, Richard C. Trembath⁴, Eamonn R. Maher², David A. van Heel⁵, Adam Auton⁶, Matthew E. Hurles¹, Chris Tyler-Smith¹, Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of Cambridge², National Health Service³, King's College London⁴, Queen Mary University of London⁵, Albert Einstein College of Medicine⁶

17 Jun 2016-bioRxiv

TL;DR: Exome sequences from 3,222 British-Pakistani individuals with high parental relatedness are used to estimate exome mutation rates, finding frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations.

Abstract: Heterozygous mutations within homozygous sequences descended from a recent common ancestor offer a way to ascertain de novo mutations (DNMs) across multiple generations. Using exome sequences from 3,222 British-Pakistani individuals with high parental relatedness, we estimate a mutation rate of 1.45 ± 0.05 × 10⁻⁸ per base pair per generation in autosomal coding sequence, with a corresponding non-crossover gene conversion rate of 8.75 ± 0.05 × 10⁻⁶ per base pair per generation. This is at the lower end of exome mutation rates previously estimated in parent-offspring trios, suggesting that post-zygotic mutations contribute little to the human germline mutation rate. We found frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in a 5′-CCG-3′ → 5′-CTG-3′ context in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations.

15 citations

Posted Content•DOI•

False Negatives Are a Significant Feature of Next Generation Sequencing Callsets

Dean Bobo¹, Mikhail Lipatov¹, Juan L. Rodriguez-Flores², Adam Auton³, Brenna M. Henn¹

Stony Brook University¹, Cornell University², Albert Einstein College of Medicine³

26 Jul 2016-bioRxiv

TL;DR: It is shown that missing mutations are a significant feature of genomic datasets and imply additional fine-tuning of bioinformatics pipelines is needed, and a phylogeny-aware tool is designed which can be used to quantify the FN rate for haploid genomic experiments, without additional generation of validation data.

Abstract: Short-read, next-generation sequencing (NGS) is now broadly used to identify rare or de novo mutations in population samples and disease cohorts. However, NGS data is known to be error-prone and post-processing pipelines have primarily focused on the removal of spurious mutations or “false positives” for downstream genome datasets. Less attention has been paid to characterizing the fraction of missing mutations or “false negatives” (FN). Here we interrogate several publicly available human NGS autosomal variant datasets using corresponding Sanger sequencing as a truth-set. We examine both low-coverage Illumina and high-coverage Complete Genomics genomes. We show that the FN rate varies between 3%-18% and that false-positive rates are considerably lower (<3%) for publicly available human genome callsets like 1000 Genomes. The FN rate is strongly dependent on calling pipeline parameters, as well as read coverage. Our results demonstrate that missing mutations are a significant feature of genomic datasets and imply additional fine-tuning of bioinformatics pipelines is needed. To address this, we design a phylogeny-aware tool [PhyloFaN] which can be used to quantify the FN rate for haploid genomic experiments, without additional generation of validation data. Using PhyloFaN on ultra-high coverage NGS data from both Illumina HiSeq and Complete Genomics platforms derived from the 1000 Genomes Project, we characterize the false negative rate in human mtDNA genomes. The false negative rate for the publicly available mtDNA callsets is 17-20%, even for extremely high coverage haploid data.
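
Illustrative note: the truth-set comparison described in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline and not PhyloFaN. It assumes the FN rate is the fraction of truth-set variants absent from the callset and the FP rate is the fraction of called variants absent from the truth set, both restricted to the region the truth set covers. The function name `callset_error_rates` and the toy variants are hypothetical.

```python
def callset_error_rates(truth, called):
    """Return (fn_rate, fp_rate) for two collections of variant keys.

    Assumed definitions (not taken from the paper):
      fn_rate = truth variants missing from the callset / all truth variants
      fp_rate = called variants missing from the truth set / all called variants
    """
    truth, called = set(truth), set(called)
    if not truth or not called:
        raise ValueError("both variant sets must be non-empty")
    missed = truth - called      # false negatives
    spurious = called - truth    # false positives
    return len(missed) / len(truth), len(spurious) / len(called)


# Hypothetical toy data: (chromosome, position, alternate allele)
truth = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr1", 400, "G"), ("chr1", 900, "C")}
called = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr1", 400, "G"), ("chr1", 777, "C")}

fn_rate, fp_rate = callset_error_rates(truth, called)
print(f"FN rate: {fn_rate:.2f}, FP rate: {fp_rate:.2f}")
```

Keying each variant by position and allele keeps the comparison a plain set difference; a real comparison would also need to normalize variant representation before matching.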

14 citations

An integrated map of structural variation in 2,504 human genomes_supplement

Peter H. Sudmant, Tobias Rausch, Eugene J. Gardner, Robert E. Handsaker +491 more

01 Jan 2016

TL;DR: An integrated set of eight structural variant classes, comprising both balanced and unbalanced variants, is described; the set was constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations.

1 citation