
Showing papers by "Carlos Bustamante" published in 2016


Journal ArticleDOI
TL;DR: This work developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons.
Abstract: The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies

1,295 citations


Journal ArticleDOI
TL;DR: A calibrated phylogenetic tree is constructed on the basis of binary single-nucleotide variants, and the more complex variants are projected onto it to estimate the number of mutations for each class; the phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among the five continental superpopulations examined.
Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

280 citations


Journal ArticleDOI
TL;DR: It is concluded that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa, but that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.
Abstract: The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.

238 citations


Journal ArticleDOI
TL;DR: It is found that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans, which could track north- and west-bound migration routes followed during the Great Migration of the twentieth century.
Abstract: We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus track north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ∼15–16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.

161 citations


Journal ArticleDOI
TL;DR: Genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations is presented, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas.
Abstract: The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry.

136 citations


Journal ArticleDOI
TL;DR: This study combines structural models of myosin from multiple stages of its chemomechanical cycle, exome sequencing data from two population cohorts of 60,706 and 42,930 individuals, and genetic and phenotypic data from 2,913 patients with HCM to identify regions of disease enrichment within β-cardiac myosin.
Abstract: Myosin motors are the fundamental force-generating elements of muscle contraction. Variation in the human β-cardiac myosin heavy chain gene (MYH7) can lead to hypertrophic cardiomyopathy (HCM), a heritable disease characterized by cardiac hypertrophy, heart failure, and sudden cardiac death. How specific myosin variants alter motor function or clinical expression of disease remains incompletely understood. Here, we combine structural models of myosin from multiple stages of its chemomechanical cycle, exome sequencing data from two population cohorts of 60,706 and 42,930 individuals, and genetic and phenotypic data from 2,913 patients with HCM to identify regions of disease enrichment within β-cardiac myosin. We first developed computational models of the human β-cardiac myosin protein before and after the myosin power stroke. Then, using a spatial scan statistic modified to analyze genetic variation in protein 3D space, we found significant enrichment of disease-associated variants in the converter, a kinetic domain that transduces force from the catalytic domain to the lever arm to accomplish the power stroke. Focusing our analysis on surface-exposed residues, we identified a larger region significantly enriched for disease-associated variants that contains both the converter domain and residues on a single flat surface on the myosin head described as the myosin mesa. Notably, patients with HCM with variants in the enriched regions have earlier disease onset than patients who have HCM with variants elsewhere. Our study provides a model for integrating protein structure, large-scale genetic sequencing, and detailed phenotypic data to reveal insight into time-shifted protein structures and genetic disease.

99 citations


Journal ArticleDOI
TL;DR: To investigate positive selection on the dog lineage early in the domestication, patterns of polymorphism in six canid genomes were examined and 349 outlier regions consistent with positive selection at a low FDR were identified.
Abstract: Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in the domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.

97 citations
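
The demography-based FDR idea in the abstract above can be illustrated with a small sketch. It assumes a per-window selection statistic where larger values are more extreme, and a set of the same statistic computed on data simulated under the neutral demographic model. The function name, the statistic, and the numbers are hypothetical; the abstract does not specify the exact procedure the authors used.

```python
def empirical_fdr(observed, simulated_null, threshold):
    """Estimate the false discovery rate for windows scoring >= threshold.

    observed:       statistic values from the real genome windows
    simulated_null: the same statistic from neutral simulations under
                    the inferred demographic model (assumed input)
    Conservatively treats every observed window as potentially null.
    """
    n_hits = sum(1 for x in observed if x >= threshold)
    if n_hits == 0:
        return 0.0  # nothing called, so nothing can be a false discovery
    # Fraction of neutral simulations that reach the threshold by chance.
    null_rate = sum(1 for x in simulated_null if x >= threshold) / len(simulated_null)
    # Expected number of neutral windows among the observed set.
    expected_false = null_rate * len(observed)
    return min(1.0, expected_false / n_hits)


# Toy example with made-up values: 5 observed windows, 100 neutral simulations.
observed = [0.1, 0.2, 0.5, 3.0, 4.0]
simulated_null = [0.0] * 98 + [2.6, 3.1]
fdr = empirical_fdr(observed, simulated_null, threshold=2.5)
```

Because the null distribution comes from the demographic model rather than from the genome-wide empirical distribution, a bottleneck that inflates the statistic everywhere raises the null rate as well, which is the reason such an approach can report fewer false discoveries than a plain outlier cutoff.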


Journal ArticleDOI
TL;DR: This study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives, and finds that various correlates of recombination rate persist throughout the African great apes including repeats, diversity, and divergence.
Abstract: We present three linkage-disequilibrium (LD)-based recombination maps generated using whole-genome sequence data from 10 Nigerian chimpanzees, 13 bonobos, and 15 western gorillas, collected as part of the Great Ape Genome Project (Prado-Martinez J, et al. 2013. Great ape genetic diversity and population history. Nature 499:471-475). We also identified species-specific recombination hotspots in each group using a modified LDhot framework, which greatly improves statistical power to detect hotspots at varying strengths. We show that fewer hotspots are shared among chimpanzee subspecies than within human populations, further narrowing the time scale of complete hotspot turnover. Further, using species-specific PRDM9 sequences to predict potential binding sites (PBS), we show higher predicted PRDM9 binding in recombination hotspots as compared to matched cold spot regions in multiple great ape species, including at least one chimpanzee subspecies. We found that correlations between broad-scale recombination rates decline more rapidly than nucleotide divergence between species. We also compared the skew of recombination rates at centromeres and telomeres between species and show a skew from chromosome means extending as far as 10-15 Mb from chromosome ends. Further, we examined broad-scale recombination rate changes near a translocation in gorillas and found minimal differences as compared to other great ape species perhaps because the coordinates relative to the chromosome ends were unaffected. Finally, on the basis of multiple linear regression analysis, we found that various correlates of recombination rate persist throughout the African great apes including repeats, diversity, and divergence. Our study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives.

89 citations


Journal ArticleDOI
TL;DR: This work presents an alternative correction method called eigenMT, which runs over 500 times faster than permutation-based methods and has adjusted p values that closely approximate empirical ones.
Abstract: Methods for multiple-testing correction in local expression quantitative trait locus (cis-eQTL) studies are a trade-off between statistical power and computational efficiency. Bonferroni correction, though computationally trivial, is overly conservative and fails to account for linkage disequilibrium between variants. Permutation-based methods are more powerful, though computationally far more intensive. We present an alternative correction method called eigenMT, which runs over 500 times faster than permutations and has adjusted p values that closely approximate empirical ones. To achieve this speed while also maintaining the accuracy of permutation-based methods, we estimate the effective number of independent variants tested for association with a particular gene, termed Meff, by using the eigenvalue decomposition of the genotype correlation matrix. We employ a regularized estimator of the correlation matrix to ensure Meff is robust and yields adjusted p values that closely approximate p values from permutations. Finally, using a common genotype matrix, we show that eigenMT can be applied with even greater efficiency to studies across tissues or conditions. Our method provides a simpler, more efficient approach to multiple-testing correction than existing methods and fits within existing pipelines for eQTL discovery.

88 citations
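
The Meff estimate described in the eigenMT abstract above can be sketched roughly as follows. This is a minimal illustration, not the eigenMT implementation: shrinking the correlation matrix toward the identity stands in for the regularized estimator, the 99% variance cutoff is an assumed parameter, and both function names are hypothetical. It also assumes no variant is monomorphic, since a constant column has no defined correlation.

```python
import numpy as np


def effective_tests(genotypes, shrinkage=0.1, var_explained=0.99):
    """Estimate the effective number of independent variants (Meff).

    genotypes: array of shape (n_samples, n_variants) with allele dosages.
    shrinkage: weight of the identity matrix in the regularized
               correlation estimate (assumed stand-in regularizer).
    """
    g = np.asarray(genotypes, dtype=float)
    corr = np.atleast_2d(np.corrcoef(g, rowvar=False))
    m = corr.shape[0]
    # Regularize: shrink the sample correlation matrix toward the identity.
    reg = (1.0 - shrinkage) * corr + shrinkage * np.eye(m)
    # Eigenvalues of a symmetric matrix, sorted from largest to smallest.
    eig = np.clip(np.linalg.eigvalsh(reg)[::-1], 0.0, None)
    cumulative = np.cumsum(eig) / eig.sum()
    # Smallest number of eigenvalues whose sum reaches the variance cutoff.
    return min(m, int(np.searchsorted(cumulative, var_explained)) + 1)


def adjust_pvalue(p, meff):
    """Bonferroni-style correction using Meff instead of the raw test count."""
    return min(1.0, p * meff)


# Three perfectly correlated variants behave like a single test.
linked = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [1, 1, 1]])
meff_linked = effective_tests(linked, shrinkage=0.0)

# Two uncorrelated variants count as two tests.
unlinked = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
meff_unlinked = effective_tests(unlinked)
```

With linkage disequilibrium between variants, Meff falls below the raw number of tests, so the adjusted p value is less conservative than a plain Bonferroni correction while requiring only one eigendecomposition per gene instead of many permutations.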


Journal ArticleDOI
TL;DR: The estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes.
Abstract: Sequencing the genomes of extinct hominids has reshaped our understanding of modern human origins. Here, we analyze ∼120 kb of exome-captured Y-chromosome DNA from a Neandertal individual from El Sidron, Spain. We investigate its divergence from orthologous chimpanzee and modern human sequences and find strong support for a model that places the Neandertal lineage as an outgroup to modern human Y chromosomes—including A00, the highly divergent basal haplogroup. We estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) (95% confidence interval [CI]: 447–806 kya). This is ∼2.1 (95% CI: 1.7–2.9) times longer than the TMRCA of A00 and other extant modern human Y-chromosome lineages. This estimate suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes. The fact that the Neandertal Y we describe has never been observed in modern humans suggests that the lineage is most likely extinct. We identify protein-coding differences between Neandertal and modern human Y chromosomes, including potentially damaging changes to PCDH11Y, TMSB4Y, USP9Y, and KDM5D. Three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens. Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation. It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups.

73 citations


Journal ArticleDOI
14 Dec 2016-PLOS ONE
TL;DR: The methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to deeper biological understanding of the genetic etiology of complex disease are described.
Abstract: Investigating genetic architecture of complex traits in ancestrally diverse populations is imperative to understand the etiology of disease. However, the current paucity of genetic research in people of African and Latin American ancestry, Hispanic and indigenous peoples in the United States is likely to exacerbate existing health disparities for many common diseases. The Population Architecture using Genomics and Epidemiology, Phase II (PAGE II), Study was initiated in 2013 by the National Human Genome Research Institute to expand our understanding of complex trait loci in ethnically diverse and well characterized study populations. To meet this goal, the Multi-Ethnic Genotyping Array (MEGA) was designed to substantially improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities at known loci for metabolic, cardiovascular, renal, inflammatory, anthropometric, and a variety of lifestyle traits. Studying the frequency distribution of clinically relevant mutations, putative risk alleles, and known functional variants across multiple populations will provide important insight into the genetic architecture of complex diseases and facilitate the discovery of novel, sometimes population-specific, disease associations. DNA samples from 51,650 self-identified African ancestry (17,328), Hispanic/Latino (22,379), Asian/Pacific Islander (8,640), and American Indian (653) and an additional 2,650 participants of either South Asian or European ancestry, and other reference panels have been genotyped on MEGA by PAGE II. MEGA was designed as a new resource for studying ancestrally diverse populations. Here, we describe the methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to deeper biological understanding of the genetic etiology of complex disease.

Journal ArticleDOI
TL;DR: Comparison of genome-wide gene expression patterns between evolutionarily independent lineages of livebearing fishes that have colonized and adapted to springs rich in H2S and closely related lineages from adjacent, nonsulfidic streams suggests that modifications of processes associated with H2S detoxification and toxicity likely complement each other to mediate elevated H2S tolerance in sulfide spring fishes.
Abstract: Hydrogen sulfide (H2S) is a potent toxicant interfering with oxidative phosphorylation in mitochondria and creating extreme environmental conditions in aquatic ecosystems. The mechanistic basis of adaptation to perpetual exposure to H2S remains poorly understood. We investigated evolutionarily independent lineages of livebearing fishes that have colonized and adapted to springs rich in H2S and compared their genome-wide gene expression patterns with closely related lineages from adjacent, nonsulfidic streams. Significant differences in gene expression were uncovered between all sulfidic and nonsulfidic population pairs. Variation in the number of differentially expressed genes among population pairs corresponded to differences in divergence times and rates of gene flow, which is consistent with neutral drift driving a substantial portion of gene expression variation among populations. Accordingly, there was little evidence for convergent evolution shaping large-scale gene expression patterns among independent sulfide spring populations. Nonetheless, we identified a small number of genes that was consistently differentially expressed in the same direction in all sulfidic and nonsulfidic population pairs. Functional annotation of shared differentially expressed genes indicated upregulation of genes associated with enzymatic H2S detoxification and transport of oxidized sulfur species, oxidative phosphorylation, energy metabolism, and pathways involved in responses to oxidative stress. Overall, our results suggest that modification of processes associated with H2S detoxification and toxicity likely complement each other to mediate elevated H2S tolerance in sulfide spring fishes. Our analyses allow for the development of novel hypotheses about biochemical and physiological mechanisms of adaptation to extreme environments.

Journal ArticleDOI
TL;DR: The complete mechanochemical cycle of ClpXP is characterized, showing that ADP release and ATP binding occur nonsequentially during the dwell, whereas ATP hydrolysis and phosphate release occur during the burst.
Abstract: Single-molecule spectroscopy reveals the complete mechanochemical cycle of the AAA+ protease ClpXP: ADP release and ATP binding occur during the dwell phase, whereas ATP hydrolysis and Pi release occur during the burst phase. ATP-dependent proteases of the AAA+ family, including Escherichia coli ClpXP and the eukaryotic proteasome, contribute to maintenance of cellular proteostasis. ClpXP unfolds and translocates substrates into an internal degradation chamber, using cycles of alternating dwell and burst phases. The ClpX motor performs chemical transformations during the dwell and translocates the substrate in increments of 1–4 nm during the burst, but the processes occurring during these phases remain unknown. Here we characterized the complete mechanochemical cycle of ClpXP, showing that ADP release and ATP binding occur nonsequentially during the dwell, whereas ATP hydrolysis and phosphate release occur during the burst. The highly conserved translocating loops within the ClpX pore are optimized to maximize motor power generation, the coupling between chemical and mechanical tasks, and the efficiency of protein processing. Conformational resetting of these loops between consecutive bursts appears to determine ADP release from individual ATPase subunits and the overall duration of the motor's cycle.

Journal ArticleDOI
TL;DR: Progress on a key set of nine research challenge areas will help generate the knowledge required to advance effectively next-generation sequencing diagnostics to the clinic.
Abstract: Next-generation sequencing technologies are fueling a wave of new diagnostic tests. Progress on a key set of nine research challenge areas will help generate the knowledge required to advance effectively these diagnostics to the clinic.

Journal ArticleDOI
TL;DR: The genome provides an important addition to the linkage map and transcriptomic tools recently developed for this species that together provide critical resources for epigenetic, transcriptomic, and proteomic analyses and will serve as the foundation for addressing key questions in behavior, physiology, toxicology, and evolutionary biology.
Abstract: The mangrove rivulus (Kryptolebias marmoratus) is one of two preferentially self-fertilizing hermaphroditic vertebrates. This mode of reproduction makes mangrove rivulus an important model for evolutionary and biomedical studies because long periods of self-fertilization result in naturally homozygous genotypes that can produce isogenic lineages without significant limitations associated with inbreeding depression. Over 400 isogenic lineages currently held in laboratories across the globe show considerable among-lineage variation in physiology, behavior, and life history traits that is maintained under common garden conditions. Temperature mediates the development of primary males and also sex change between hermaphrodites and secondary males, which makes the system ideal for the study of sex determination and sexual plasticity. Mangrove rivulus also exhibit remarkable adaptations to living in extreme environments, and the system has great promise to shed light on the evolution of terrestrial locomotion, aerial respiration, and broad tolerances to hypoxia, salinity, temperature, and environmental pollutants. Genome assembly of the mangrove rivulus allows the study of genes and gene families associated with the traits described above. Here we present a de novo assembled reference genome for the mangrove rivulus, with an approximately 900 Mb genome, including 27,328 annotated, predicted, protein-coding genes. Moreover, we are able to place more than 50% of the assembled genome onto a recently published linkage map. The genome provides an important addition to the linkage map and transcriptomic tools recently developed for this species that together provide critical resources for epigenetic, transcriptomic, and proteomic analyses. Moreover, the genome will serve as the foundation for addressing key questions in behavior, physiology, toxicology, and evolutionary biology.

Journal ArticleDOI
TL;DR: Improvements to the ADMIXTURE software are described, allowing users to extract more information from large genomic datasets, and increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes project is demonstrated.
Abstract: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5× speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes project using this extension. These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
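
The projection idea in the abstract above, inferring ancestry for a new individual while holding the reference cluster allele frequencies fixed, can be sketched with a plain EM update. This is an illustrative stand-in and should not be read as ADMIXTURE's own optimizer or interface. The function name and toy data are hypothetical, and the sketch assumes diploid genotypes and allele frequencies strictly between 0 and 1.

```python
def project_ancestry(genotype, freqs, n_iter=200):
    """Estimate ancestry fractions for one individual with fixed frequencies.

    genotype: list of counts (0, 1 or 2) of the counted allele at each SNP.
    freqs:    freqs[k][j] is the frequency of that allele in reference
              cluster k at SNP j; these values are never updated.
    Returns a list q of K ancestry fractions that sums to 1.
    """
    n_clusters = len(freqs)
    n_snps = len(genotype)
    q = [1.0 / n_clusters] * n_clusters  # start from equal ancestry
    for _ in range(n_iter):
        new_q = [0.0] * n_clusters
        for j, g in enumerate(genotype):
            # Allele frequency expected for this individual's ancestry mix.
            p_mix = sum(q[k] * freqs[k][j] for k in range(n_clusters))
            for k in range(n_clusters):
                if g > 0:  # copies of the counted allele
                    new_q[k] += g * q[k] * freqs[k][j] / p_mix
                if g < 2:  # copies of the other allele
                    new_q[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / (1.0 - p_mix)
        # Each SNP contributes two allele copies in total.
        q = [x / (2.0 * n_snps) for x in new_q]
    return q


# Toy reference panel: two clusters with strongly differentiated SNPs.
freqs = [[0.9] * 20, [0.1] * 20]
q_single = project_ancestry([2] * 20, freqs)   # carries only the cluster-0 allele
q_admixed = project_ancestry([1] * 20, freqs)  # heterozygous at every SNP
```

Because the allele frequencies are fixed, each new individual is fitted independently of the others, which is consistent with the abstract's point that this mode scales to large sample sets and tolerates related individuals in the projected set.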

Journal ArticleDOI
TL;DR: This work reviews and evaluates a fisheries-independent method of indexing population size: inferring adult abundance from estimates of the genetic effective size of a population (Ne), and shows that declines in Ne track declines in the abundance of model fisheries species.
Abstract: Sustainable exploitation of fisheries populations is challenging to achieve when the size of the population prior to exploitation and the actual numbers removed over time and across fishing zones are not clearly known. Quantitative fisheries modeling is able to address this problem, but accurate and reliable model outcomes depend on high quality input data. Much of this information is obtained through the operation of the fishery under consideration, but while this seems appropriate, biases may occur. For example, poorly quantified changes in fishing methods that increase catch rates can erroneously suggest that the overall population size is increasing. Hence, the incorporation of estimates of abundance derived from independent data sources is preferable. We review and evaluate a fisheries-independent method of indexing population size: inferring adult abundance from estimates of the genetic effective size of a population (Ne). Recent studies of elasmobranch species have shown correspondence between Ne and ecologically determined estimates of the population size (N). Simulation studies have flagged the possibility that the range of Ne/N ratios across species may be more restricted than previously thought, and also show that declines in Ne track declines in the abundance of model fisheries species. These key developments bring this new technology closer to implementation in fisheries science, particularly for data-poor fisheries or species of conservation interest.

Journal ArticleDOI
TL;DR: A statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools, which is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and is most pronounced in species with high genomic diversity.
Abstract: Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.

Journal ArticleDOI
TL;DR: Analysis of diet from stomach contents revealed that O. bruniensis preys exclusively on the egg capsules of holocephalans, potentially making it the only known elasmobranch with a diet reliant solely upon other chondrichthyans.
Abstract: The reproductive biology and diet of prickly dogfish Oxynotus bruniensis, a deep-sea elasmobranch, endemic to the outer continental and insular shelves of southern Australia and New Zealand, and caught as by-catch in demersal fisheries, are described from specimens caught in New Zealand waters. A total of 53 specimens were obtained from research surveys and commercial fisheries, including juveniles and adults ranging in size from 33·5 to 75·6 cm total length (LT). Estimated size-at-maturity was 54·7 cm LT in males and 64·0 cm LT in females. Three gravid females (65·0, 67·5 and 71·2 cm LT) were observed, all with eight embryos. Size-at-birth was estimated to be 25–27 cm LT. Vitellogenesis was not concurrent with embryo development. Analysis of diet from stomach contents, including DNA identification of prey using the mitochondrial genes cox1 and nadh2, revealed that O. bruniensis preys exclusively on the egg capsules of holocephalans, potentially making it the only known elasmobranch with a diet reliant solely upon other chondrichthyans. Based on spatial overlap with deep-sea fisheries, a highly specialized diet, and reproductive characteristics representative of a low productivity fish, the commercial fisheries by-catch of O. bruniensis may put this species at relatively high risk of overfishing.

Journal ArticleDOI
TL;DR: The giant devil ray is the only mobulid assessed as Endangered due to its restricted distribution, high bycatch mortality and suspected population decline, and comparison with the partial mitogenome of M. japanica suggests a sister-cryptic species complex and two different taxonomic units.
Abstract: The giant devil ray, Mobula mobular, is a member of one of the most distinct groups of cartilaginous fishes, the Mobulidae (manta and devil rays), and is the only mobulid assessed as Endangered due to its restricted distribution, high bycatch mortality and suspected population decline. The complete mitochondrial genome is 18 913 base pairs in length and comprises 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Comparison with the partial mitogenome of M. japanica suggests a sister-cryptic species complex and two different taxonomic units. However, the limited divergence within the species (>99.9% genetic identity) may be the result of a geographically and numerically restricted population of M. mobular within the Mediterranean Sea.

Posted ContentDOI
24 Nov 2016-bioRxiv
TL;DR: This work disentangle recent population history in the widely-used 1000 Genomes Project reference panel with an emphasis on populations underrepresented in medical studies, and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
Abstract: The vast majority of genome-wide association studies are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g. linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely-used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWAS, we used published summary statistics to calculate polygenic risk scores for six well-studied traits and diseases. We identified directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk were typically highest in the population from which summary statistics were derived. We demonstrated that scores inferred from European GWAS were biased by genetic drift in other populations even when choosing the same causal variants, and that biases in any direction were possible and unpredictable. This work cautions that summarizing findings from large-scale GWAS may have limited portability to other populations using standard approaches, and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
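A polygenic risk score of the kind described here is, in its simplest form, a weighted sum of allele dosages. The sketch below is a minimal illustration under that assumption; the function names, effect sizes and allele frequencies are invented for the example and do not come from the study. The second helper shows why mean scores can shift between populations even when the causal variants and effect sizes are identical: the expected score depends on allele frequencies, which drift apart between populations.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], effects: Sequence[float]) -> float:
    """Sum of (effect size x effect-allele dosage) over variants for one individual."""
    if len(dosages) != len(effects):
        raise ValueError("dosages and effects must have the same length")
    return sum(b * d for b, d in zip(effects, dosages))


def expected_score(freqs: Sequence[float], effects: Sequence[float]) -> float:
    """Expected score in a population, assuming diploid dosage with mean 2 * frequency."""
    if len(freqs) != len(effects):
        raise ValueError("freqs and effects must have the same length")
    return sum(2.0 * f * b for f, b in zip(freqs, effects))


# Hypothetical effect sizes taken from one discovery GWAS.
effects = [0.10, -0.20, 0.30]

# One individual's dosages (0, 1 or 2 copies of each effect allele).
print(polygenic_risk_score([0, 1, 2], effects))

# Same variants and effects, different (hypothetical) allele frequencies.
print(expected_score([0.50, 0.50, 0.50], effects))
print(expected_score([0.30, 0.60, 0.40], effects))
```

The difference between the two expected scores arises purely from allele frequencies, so it says nothing by itself about a difference in the underlying trait.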

Journal ArticleDOI
TL;DR: The proposed mitogenome may indicate the presence of two separate stocks or two different species of Z. chilensis in South America and highlights the need for caution when using genetic resources without a taxonomic reference or a voucher specimen.
Abstract: The yellownose skate Zearaja chilensis is endemic to South America. The species is the target of a valuable commercial fishery in Chile, but is highly susceptible to over-exploitation. The complete mitochondrial genome was described from 694,593 sequences obtained using Ion Torrent Next Generation Sequencing. The total length of the mitogenome was 16,909 bp, comprising 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Comparison between the proposed mitogenome and one previously described from “raw fish fillets from a skate speciality restaurant in Seoul, Korea” resulted in 97.4% similarity, rather than approaching 100% similarity as might be expected. The 2.6% dissimilarity may indicate the presence of two separate stocks or two different species of, ostensibly, Z. chilensis in South America and highlights the need for caution when using genetic resources without a taxonomic reference or a voucher specimen.

Journal ArticleDOI
TL;DR: The phylogenomic reconstruction inferred from the mitogenome of 15 species of Lamniform and Carcharhiniform sharks supports the inclusion of C. taurus in a clade with the Lamnidae and Cetorhinidae and contributes to ongoing investigation into the monophyly of the Family Odontaspididae.
Abstract: The complete mitochondrial genome of the grey nurse shark Carcharias taurus is described from 25 963 828 sequences obtained using Illumina NGS technology. Total length of the mitogenome is 16 715 bp, consisting of 2 rRNAs, 13 protein-coding regions, 22 tRNAs and 2 non-coding regions, thus updating the previously published mitogenome for this species. The phylogenomic reconstruction inferred from the mitogenome of 15 species of Lamniform and Carcharhiniform sharks supports the inclusion of C. taurus in a clade with the Lamnidae and Cetorhinidae. This complete mitogenome contributes to ongoing investigation into the monophyly of the Family Odontaspididae.

Posted ContentDOI
25 Apr 2016-bioRxiv
TL;DR: The evolutionary history of a gene associated with resistance to the most common malaria-causing parasite, Plasmodium vivax, is revisited, and it is shown to be one of the regions of the human genome that has been under the strongest selective pressure in human evolutionary history.
Abstract: The human DARC (Duffy antigen receptor for chemokines) gene encodes a membrane-bound chemokine receptor crucial for the infection of red blood cells by Plasmodium vivax, a major causative agent of malaria. Of the three major allelic classes segregating in human populations, the FY*O allele has been shown to protect against P. vivax infection and is near fixation in sub-Saharan Africa, while FY*B and FY*A are common in Europe and Asia, respectively. Due to the combination of its strong geographic differentiation and association with malaria resistance, DARC is considered a canonical example of a locus under positive selection in humans. Here, we use sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human and great ape genomes, to analyze the fine scale population structure of DARC. We estimate the time to most recent common ancestor (TMRCA) of the FY*O mutation to be 42 kya (95% CI: 34-49 kya). We infer the FY*O null mutation swept to fixation in Africa from standing variation with very low initial frequency (0.1%) and a selection coefficient of 0.043 (95% CI: 0.011-0.18), which is among the strongest estimated in the genome. We estimate the TMRCA of the FY*A mutation to be 57 kya (95% CI: 48-65 kya) and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and Khomani San hunter-gatherers share the same FY*A haplotypes. We test multiple models of admixture that may account for this observation and reject recent Asian or European admixture as the cause.
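To give a feel for what a selection coefficient of 0.043 and an initial frequency of 0.1% imply, the sketch below iterates a deterministic allele-frequency recursion. It assumes simple genic (haploid-style) selection with no drift, which is a textbook simplification chosen here for illustration and not the inference model used in the study; the function name and the 99% target are likewise assumptions.

```python
def generations_to_frequency(p0: float, s: float, target: float) -> int:
    """Generations for an allele to rise from p0 to at least `target`.

    Deterministic genic selection: p' = p * (1 + s) / (1 + s * p).
    Drift, dominance and demography are ignored (illustrative assumption).
    """
    if not (0.0 < p0 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if s <= 0.0 and target > p0:
        raise ValueError("allele cannot increase without positive selection")
    p, generations = p0, 0
    while p < target:
        p = p * (1.0 + s) / (1.0 + s * p)
        generations += 1
    return generations


# Point estimates quoted in the abstract; the 0.99 target is hypothetical.
print(generations_to_frequency(p0=0.001, s=0.043, target=0.99))
```

Under these simplifying assumptions the rise takes a few hundred generations. Converting that to years requires a generation time, which the abstract does not give.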

Journal ArticleDOI
TL;DR: Phylogenetic analysis based on mtDNA revealed low genetic divergence among longnose skates, in particular, those dwelling on the continental shelf and slope off the coasts of Chile and Argentina.
Abstract: The complete mitochondrial genome of the roughskin skate Dipturus trachyderma is described from 1 455 724 sequences obtained using Illumina NGS technology. Total length of the mitogenome was 16 909 base pairs, comprising 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Phylogenetic analysis based on mtDNA revealed low genetic divergence among longnose skates, in particular, those dwelling on the continental shelf and slope off the coasts of Chile and Argentina.

Posted Content
TL;DR: Golestanian theoretically examined a number of possible explanations for the enhanced diffusion of active enzymes and concluded that "collective heating" is the best candidate to account for the observed diffusion coefficient increase; this work presents evidence that collective heating cannot apply to the authors' experiments.
Abstract: We (Riedel et al. Nature 2015), as well as others, have shown that some enzymes exhibit enhanced diffusion when active. In a recent PRL (Golestanian, PRL 2015, arXiv:1508.03219), R. Golestanian theoretically examines a number of possible explanations for this phenomenon and concludes that "collective heating" is the best candidate to account for the observed diffusion coefficient increase. Here we present evidence showing that collective heating cannot possibly apply to our experiments.

Posted ContentDOI
23 Aug 2016-bioRxiv
TL;DR: It is shown that the transferability of results from GWAS is dependent on the ancestral diversity of the study cohort as well as the phenotype polygenicity, causal allele frequency divergence, and heritability, highlighting the need for inclusion of more diverse samples in medical genomics studies to enable broadly applicable disease risk information.
Abstract: Background: Genome-wide association studies (GWAS) have largely focused on European descent populations, and the transferability of these findings to diverse populations is dependent on many factors, including selection, genetic divergence, heritability, and phenotype complexity. As medical genomics studies become increasingly large and ethnically diverse, gaining clear insight into population history and genetic diversity from available reference panels is critically important. Results: We disentangle the population history of the widely-used 1000 Genomes Project reference panel, with an emphasis on underrepresented Hispanic/Latino and African descent populations. By leveraging haplotype sharing, linkage disequilibrium decay, and ancestry deconvolution along chromosomes in admixed populations, we gain insights into ancestral allele frequencies, the origins, rates, and timings of admixture, and sex-biased demography. We make empirical observations to evaluate the impact of population structure in association studies, with conclusions that inform rare variant association in diverse populations, how we use standard GWAS tools, and transferability of findings across populations. Finally, we show through coalescent simulations that inferred polygenic risk scores derived from European GWAS are biased when applied to diverse populations. Conclusions: Our study provides fine-scale insight into the sampling, genetic origins, divergence, and sex-biased history of admixture in the 1000 Genomes Project populations. We show that the transferability of results from GWAS is dependent on the ancestral diversity of the study cohort as well as the phenotype polygenicity, causal allele frequency divergence, and heritability. This work highlights the need for inclusion of more diverse samples in medical genomics studies to enable broadly applicable disease risk information.

Journal ArticleDOI
TL;DR: African Americans have a higher incidence of venous thromboembolism than European descent individuals, however, the typical genetic risk factors in populations of European descent are nearly absent in African Americans, and population‐specific genetic factors influencing the higher VTE rate are not well characterized.
Abstract: Introduction: African Americans have a higher incidence of venous thromboembolism (VTE) than European descent individuals. However, the typical genetic risk factors in populations of European descent are nearly absent in African Americans, and population-specific genetic factors influencing the higher VTE rate are not well characterized. Methods: We performed a candidate gene analysis on an exome-sequenced African American family with recurrent VTE and identified a variant in Protein S (PROS1) V510M (rs138925964). We assessed the population impact of PROS1 V510M using a multicenter African American cohort of 306 cases with VTE compared to 370 controls. Additionally, we compared our case cohort to a background population cohort of 2203 African Americans in the NHLBI GO Exome Sequencing Project (ESP). Results: In the African American family with recurrent VTE, we found prior laboratory results for our cases indicating low free Protein S levels, providing functional support for PROS1 V510M as the causative mutation. Additionally, this variant was significantly enriched in the VTE cases of our multicenter case–control study (Fisher's Exact Test, P = 0.0041, OR = 4.62, 95% CI: 1.51–15.20; allele frequencies – cases: 2.45%, controls: 0.54%). Similarly, PROS1 V510M was also enriched in our VTE case cohort compared to African Americans in the ESP cohort (Fisher's Exact Test, P = 0.010, OR = 2.28, 95% CI: 1.26–4.10). Conclusions: We found a variant, PROS1 V510M, in an African American family with VTE and clinical laboratory abnormalities in Protein S. Additionally, we found that this variant conferred increased risk of VTE in a case–control study of African Americans. In the ESP cohort, the variant is nearly absent in ESP European descent subjects (n = 3, allele frequency: 0.03%). Additionally, in 1000 Genomes Phase 3 data, the variant only appears in African descent populations. Thus, PROS1 V510M is a population-specific genetic risk factor for VTE in African Americans.
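The case-control comparison reported above can be reproduced in outline with a 2x2 table of allele counts. The abstract gives only frequencies and sample sizes, so the counts used below (15 of 612 case alleles, 4 of 740 control alleles) are back-calculated assumptions that happen to match the reported 2.45% and 0.54%; they are not figures taken from the paper. The functions are a plain-Python sketch of a sample odds ratio and a two-sided Fisher's exact test.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum of table probabilities
    no larger than that of the observed table, with margins fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(col1, row1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Assumed allele counts: variant / non-variant alleles in cases and controls.
case_alt, case_ref = 15, 612 - 15
ctrl_alt, ctrl_ref = 4, 740 - 4

print(round(odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref), 2))
print(fisher_exact_two_sided(case_alt, case_ref, ctrl_alt, ctrl_ref))
```

Statistical packages often report a conditional maximum-likelihood odds ratio alongside Fisher's test, so small differences from the simple cross-product ratio used here are to be expected.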

Journal ArticleDOI
TL;DR: A method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests to better distinguish true from false positive calls is presented.
Abstract: Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a False Positive Rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, Real Time Genomics' multisample variant caller, and the GATK UnifiedGenotyper, respectively. Since NGS sequencing data may be accompanied by genotype data for the same samples, either collected concurrently with sequencing or from a previous study, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g. a different set of criteria to determine quality for rare versus common variants) and thereby provides insight into sequencing characteristics that indicate call quality for variants of different frequencies. Availability and implementation: Code is available on GitHub at: https://github.com/suyashss/variant_validation. Contacts: suyashs@stanford.edu or mtaub@jhsph.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Posted ContentDOI
10 Feb 2016-bioRxiv
TL;DR: Two improvements to the ADMIXTURE software are described, allowing users to extract more information from large genomic datasets, and increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes project is demonstrated.
Abstract: Background: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. Results: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5x speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project using this extension. Conclusions: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
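The idea behind the first improvement, estimating one individual's ancestry fractions while holding the reference cluster allele frequencies fixed, can be sketched with a simple expectation-maximization update. This is an illustrative stand-in, not ADMIXTURE's actual optimizer or interface; the function name, the `ploidy` argument (a rough nod to haploid X-chromosome data in males) and the toy frequencies are all assumptions made for this example.

```python
from typing import List, Sequence


def project_ancestry(
    genotypes: Sequence[int],
    freqs: Sequence[Sequence[float]],
    ploidy: int = 2,
    n_iter: int = 200,
) -> List[float]:
    """Estimate ancestry fractions for one individual by EM.

    genotypes[j]: count of the reference-coded allele at site j (0..ploidy).
    freqs[k][j]:  frequency of that allele in cluster k, assumed fixed and
                  strictly between 0 and 1.
    """
    k, m = len(freqs), len(genotypes)
    q = [1.0 / k] * k  # start from equal ancestry fractions
    for _ in range(n_iter):
        new_q = [0.0] * k
        for j, g in enumerate(genotypes):
            # Mixture probability of drawing each allele at site j.
            p_alt = sum(q[a] * freqs[a][j] for a in range(k))
            p_ref = sum(q[a] * (1.0 - freqs[a][j]) for a in range(k))
            for a in range(k):
                # Expected number of this individual's allele copies from cluster a.
                if g > 0:
                    new_q[a] += g * q[a] * freqs[a][j] / p_alt
                if g < ploidy:
                    new_q[a] += (ploidy - g) * q[a] * (1.0 - freqs[a][j]) / p_ref
        q = [x / (ploidy * m) for x in new_q]
    return q


# Two hypothetical reference clusters with well-separated allele frequencies.
reference_freqs = [[0.9, 0.9, 0.9, 0.9], [0.1, 0.1, 0.1, 0.1]]
print(project_ancestry([2, 2, 2, 2], reference_freqs))
print(project_ancestry([1, 1, 1, 1], reference_freqs))
```

Because the cluster frequencies are never updated, each individual can be processed independently, which fits the speedup and the handling of related individuals described in the abstract.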