Showing papers by "Adam Auton" published in 2016


Journal ArticleDOI
TL;DR: A calibrated phylogenetic tree is constructed on the basis of binary single-nucleotide variants, and the more complex variants are projected onto it to estimate the number of mutations for each class; the phylogeny shows bursts of extreme expansion in male numbers that occurred independently among each of the five continental superpopulations examined.
Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

280 citations


Journal ArticleDOI
TL;DR: The data suggest that dogs have broad-scale properties of recombination similar to those of humans, while fine-scale recombination is similar to that of other species lacking PRDM9.
Abstract: Meiotic recombination in mammals has been shown to largely cluster into hotspots, which are targeted by the chromatin modifier PRDM9. The canid family, including wolves and dogs, has undergone a series of disrupting mutations in this gene, rendering PRDM9 inactive. Given the importance of PRDM9, it is of great interest to learn how its absence in the dog genome affects patterns of recombination placement. We have used genotypes from domestic dog pedigrees to generate sex-specific genetic maps of recombination in this species. On a broad scale, we find that placement of recombination events in dogs is consistent with that in mice and apes, in that the majority of recombination occurs toward the telomeres in males, while female crossing over is more frequent and evenly spread along chromosomes. It has been previously suggested that dog recombination is more uniform in distribution than that of humans; however, we found that recombination in dogs is less uniform than in humans. We examined the distribution of recombination within the genome, and found that recombination is elevated immediately upstream of the transcription start site and around CpG islands, in agreement with previous studies, but that this effect is stronger in male dogs. We also found evidence for positive crossover interference influencing the spacing between recombination events in dogs, as has been observed in other species including humans and mice. Overall our data suggests that dogs have similar broad scale properties of recombination to humans, while fine scale recombination is similar to other species lacking PRDM9.

46 citations


Posted ContentDOI
17 Jun 2016-bioRxiv
TL;DR: Exome sequences from 3,222 British-Pakistani individuals with high parental relatedness are used to estimate exome mutation rates, finding frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations.
Abstract: Heterozygous mutations within homozygous sequences descended from a recent common ancestor offer a way to ascertain de novo mutations (DNMs) across multiple generations. Using exome sequences from 3,222 British-Pakistani individuals with high parental relatedness, we estimate a mutation rate of 1.45 ± 0.05 × 10⁻⁸ per base pair per generation in autosomal coding sequence, with a corresponding non-crossover gene conversion rate of 8.75 ± 0.05 × 10⁻⁶ per base pair per generation. This is at the lower end of exome mutation rates previously estimated in parent-offspring trios, suggesting that post-zygotic mutations contribute little to the human germline mutation rate. We found frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in a 5′ CCG 3′ → 5′ CTG 3′ context in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations.

15 citations
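
As a rough illustration of the units in the estimate above (events per base pair per generation), the following minimal sketch divides an event count by the total exposure in base-pair-generations and attaches a simple Poisson standard error. It is not the authors' estimator: the function name `per_generation_rate`, its parameters, and the input numbers are hypothetical, and the Poisson error model is an assumption made only for this example.

```python
import math


def per_generation_rate(n_events, callable_bp, n_meioses):
    """Return (rate, standard_error) per base pair per generation.

    Assumption (not from the source): events are Poisson-distributed, so the
    standard error of the count is sqrt(n_events).
    """
    exposure = callable_bp * n_meioses  # total base-pair-generations surveyed
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    rate = n_events / exposure
    std_err = math.sqrt(n_events) / exposure
    return rate, std_err


# Hypothetical inputs chosen only to show the arithmetic, not real data.
rate, std_err = per_generation_rate(n_events=120, callable_bp=3_000_000, n_meioses=4_000)
print(f"rate = {rate:.3e} per bp per generation (SE {std_err:.1e})")
```

The same division applies to any per-site, per-generation rate, such as the gene conversion rate quoted in the abstract; only the event count and the definition of callable sequence change.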


Posted ContentDOI
26 Jul 2016-bioRxiv
TL;DR: It is shown that missing mutations are a significant feature of genomic datasets and imply additional fine-tuning of bioinformatics pipelines is needed, and a phylogeny-aware tool is designed which can be used to quantify the FN rate for haploid genomic experiments, without additional generation of validation data.
Author(s): Bobo, Dean; Lipatov, Mikhail; Rodriguez-Flores, Juan; Auton, Adam; Henn, Brenna
Abstract: Short-read, next-generation sequencing (NGS) is now broadly used to identify rare or de novo mutations in population samples and disease cohorts. However, NGS data is known to be error-prone and post-processing pipelines have primarily focused on the removal of spurious mutations or “false positives” for downstream genome datasets. Less attention has been paid to characterizing the fraction of missing mutations or “false negatives” (FN). Here we interrogate several publicly available human NGS autosomal variant datasets using corresponding Sanger sequencing as a truth-set. We examine both low-coverage Illumina and high-coverage Complete Genomics genomes. We show that the FN rate varies between 3% and 18% and that false-positive rates are considerably lower (<3%) for publicly available human genome callsets like 1000 Genomes. The FN rate is strongly dependent on calling pipeline parameters, as well as read coverage. Our results demonstrate that missing mutations are a significant feature of genomic datasets and imply additional fine-tuning of bioinformatics pipelines is needed. To address this, we design a phylogeny-aware tool [PhyloFaN] which can be used to quantify the FN rate for haploid genomic experiments, without additional generation of validation data. Using PhyloFaN on ultra-high coverage NGS data from both Illumina HiSeq and Complete Genomics platforms derived from the 1000 Genomes Project, we characterize the false negative rate in human mtDNA genomes. The false negative rate for the publicly available mtDNA callsets is 17-20%, even for extremely high coverage haploid data.

14 citations
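
The comparison of a callset against a truth set described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PhyloFaN tool or the authors' pipeline: variants are reduced to hypothetical site positions, the FN rate is taken as the fraction of truth-set sites missing from the callset, and the FP rate is taken as the fraction of called sites absent from the truth set. The function name `error_rates` and the example positions are invented for this sketch.

```python
def error_rates(truth_sites, called_sites):
    """Return (fn_rate, fp_rate) for a callset compared with a truth set.

    Assumed definitions (not taken from the source):
      fn_rate = truth sites missing from the callset / all truth sites
      fp_rate = called sites absent from the truth set / all called sites
    """
    truth = set(truth_sites)
    called = set(called_sites)
    if not truth or not called:
        raise ValueError("both site collections must be non-empty")
    fn_rate = len(truth - called) / len(truth)
    fp_rate = len(called - truth) / len(called)
    return fn_rate, fp_rate


# Hypothetical variant positions, for illustration only.
truth = {100, 200, 300, 400, 500}
called = {100, 200, 300, 600}
fn_rate, fp_rate = error_rates(truth, called)
print(f"FN rate = {fn_rate:.0%}, FP rate = {fp_rate:.0%}")
```

In practice both sets would need to be restricted to the regions covered by the validation data before comparison, since sites outside those regions cannot be classified either way.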


01 Jan 2016
TL;DR: An integrated set of eight structural variant classes, comprising both balanced and unbalanced variants, is described; the set was constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations.

1 citation