
Showing papers by "Carlos Bustamante" published in 2017


Journal ArticleDOI
TL;DR: It is demonstrated that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable.
Abstract: The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
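The abstract above describes calculating polygenic risk scores from published summary statistics but does not spell out the scoring procedure. The sketch below assumes the standard additive form, in which a score is the sum over variants of effect size times effect-allele dosage. The variant IDs, effect sizes, and genotypes are hypothetical, and treating a missing genotype as dosage 0 is an illustrative assumption.

```python
from typing import Mapping


def polygenic_score(effect_sizes: Mapping[str, float],
                    dosages: Mapping[str, float]) -> float:
    """Additive score: sum of effect size times effect-allele dosage (0, 1, or 2).

    Variants absent from `dosages` contribute 0 (an assumption made for this sketch).
    """
    return sum(beta * dosages.get(variant, 0.0)
               for variant, beta in effect_sizes.items())


# Hypothetical per-variant effect sizes, standing in for GWAS summary statistics.
effect_sizes = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}

# Hypothetical effect-allele dosages for one individual.
individual = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_score(effect_sizes, individual)
```

Because the effect sizes come from one population, the same function applied to genotypes from another population inherits any differences in allele frequency and linkage disequilibrium, which is the transferability concern the abstract raises.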

1,073 citations


Journal ArticleDOI
05 May 2017-Science
TL;DR: Analysis of the genetic diversity of Bantu speakers revealed adaptive introgression of genes that likely originated in other African populations, including specific immune-related genes; applying this information to African Americans suggests that gene flow from Africa into the Americas was more complex than previously thought.
Abstract: Bantu languages are spoken by about 310 million Africans, yet the genetic history of Bantu-speaking populations remains largely unexplored. We generated genomic data for 1318 individuals from 35 populations in western central Africa, where Bantu languages originated. We found that early Bantu speakers first moved southward, through the equatorial rainforest, before spreading toward eastern and southern Africa. We also found that genetic adaptation of Bantu speakers was facilitated by admixture with local populations, particularly for the HLA and LCT loci. Finally, we identified a major contribution of western central African Bantu speakers to the ancestry of African Americans, whose genomes present no strong signals of natural selection. Together, these results highlight the contribution of Bantu-speaking peoples to the complex genetic history of Africans and African Americans.

183 citations


Journal ArticleDOI
TL;DR: The original version of this article contained an error in the spelling of the author Christian A.M. Wilson, which was incorrectly given as Christian M.A. Wilson.
Abstract: The original version of this article contained an error in the spelling of the author Christian A.M. Wilson, which was incorrectly given as Christian M.A. Wilson. This has now been corrected in both the PDF and HTML versions of the article.

145 citations


Journal ArticleDOI
30 Nov 2017-Cell
TL;DR: It is demonstrated that skin pigmentation is highly heritable but that known pigmentation loci explain only a small fraction of the variance, and it is shown how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.

127 citations


Journal ArticleDOI
05 Oct 2017-Cell
TL;DR: This work uses genome-wide association mapping and gene-expression analysis to map the Mendelian blue locus, which abolishes yellow pigmentation in the budgerigar, and finds that the blue trait maps to a single amino acid substitution (R644W) in an uncharacterized polyketide synthase (MuPKS).

90 citations


Journal ArticleDOI
TL;DR: A survey of data analysis methods starting from an overview of basic statistical techniques underlying the analysis of super-resolution and, more broadly, imaging data is provided.
Abstract: Super-resolution microscopy provides direct insight into fundamental biological processes occurring at length scales smaller than light’s diffraction limit. The analysis of data at such scales has brought statistical and machine learning methods into the mainstream. Here we provide a survey of data analysis methods starting from an overview of basic statistical techniques underlying the analysis of super-resolution and, more broadly, imaging data. We subsequently break down the analysis of super-resolution data into four problems: the localization problem, the counting problem, the linking problem, and what we’ve termed the interpretation problem.

87 citations


Journal ArticleDOI
TL;DR: Sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human genomes, are used to perform a fine-scale investigation of the evolutionary history of DARC and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and ǂKhomani San hunter-gatherers share the same FY*A haplotypes.
Abstract: The human DARC (Duffy antigen receptor for chemokines) gene encodes a membrane-bound chemokine receptor crucial for the infection of red blood cells by Plasmodium vivax, a major causative agent of malaria. Of the three major allelic classes segregating in human populations, the FY*O allele has been shown to protect against P. vivax infection and is at near fixation in sub-Saharan Africa, while FY*B and FY*A are common in Europe and Asia, respectively. Due to the combination of strong geographic differentiation and association with malaria resistance, DARC is considered a canonical example of positive selection in humans. Despite this, details of the timing and mode of selection at DARC remain poorly understood. Here, we use sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human genomes, to perform a fine-scale investigation of the evolutionary history of DARC. We estimate the time to most recent common ancestor (TMRCA) of the most common FY*O haplotype to be 42 kya (95% CI: 34-49 kya). We infer the FY*O null mutation swept to fixation in Africa from standing variation with very low initial frequency (0.1%) and a selection coefficient of 0.043 (95% CI: 0.011-0.18), which is among the strongest estimated in the human genome. We estimate the TMRCA of the FY*A mutation in non-Africans to be 57 kya (95% CI: 48-65 kya) and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and ǂKhomani San hunter-gatherers share the same FY*A haplotypes. We test multiple models of admixture that may account for this observation and reject recent Asian or European admixture as the cause.

81 citations


Journal ArticleDOI
TL;DR: This paper proposes three practical strategies for reducing re-identification risks in beacons, including manipulating the beacon so that the presence of rare alleles is obscured and budgeting the number of accesses per user for each individual genome.
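The budgeting idea mentioned above can be illustrated with a small access counter. This is a simplified sketch and not the paper's method: the class name, the flat budget of one unit per positive hit, and the allele strings are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class BudgetedBeacon:
    """Toy beacon that limits how often each genome may answer a given user."""

    def __init__(self, genomes, budget_per_genome):
        self._genomes = genomes            # genome_id -> set of allele strings
        self._budget = budget_per_genome   # allowed positive hits per (user, genome)
        self._used = defaultdict(int)      # (user, genome_id) -> hits consumed

    def query(self, user, allele):
        """Return True if a genome with remaining budget for `user` carries `allele`."""
        found = False
        for genome_id, alleles in self._genomes.items():
            if allele not in alleles:
                continue
            key = (user, genome_id)
            if self._used[key] >= self._budget:
                continue  # budget exhausted: this genome stays silent for this user
            self._used[key] += 1
            found = True
        return found


# Hypothetical data: two genomes sharing one allele.
beacon = BudgetedBeacon(
    {"g1": {"chr1:100:A>T"}, "g2": {"chr1:100:A>T", "chr2:5:C>G"}},
    budget_per_genome=1,
)
first = beacon.query("alice", "chr1:100:A>T")   # both genomes answer and spend budget
second = beacon.query("alice", "chr1:100:A>T")  # no genome has budget left for alice
```

Tracking the budget per user and per genome means one user's repeated queries cannot keep probing the same individual, while other users are unaffected.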

70 citations


Journal ArticleDOI
TL;DR: In this paper, the authors characterized the effects of the trefoil-knotted protein MJ0366 from Methanocaldococcus jannaschii on the operation of the ClpXP protease from Escherichia coli.
Abstract: ATP-dependent proteases translocate proteins through a narrow pore for their controlled destruction. However, how a protein substrate containing a knotted topology affects this process remains unknown. Here, we characterized the effects of the trefoil-knotted protein MJ0366 from Methanocaldococcus jannaschii on the operation of the ClpXP protease from Escherichia coli. ClpXP completely degrades MJ0366 when pulling from the C-terminal ssrA-tag. However, when a GFP moiety is appended to the N terminus of MJ0366, ClpXP releases intact GFP with a 47-residue tail. The extended length of this tail suggests that ClpXP tightens the trefoil knot against GFP, which prevents GFP unfolding. Interestingly, if the linker between the knot core of MJ0366 and GFP is longer than 36 residues, ClpXP tightens and translocates the knot before it reaches GFP, enabling the complete unfolding and degradation of the substrate. These observations suggest that a knot-induced stall during degradation of multidomain proteins by AAA proteases may constitute a novel mechanism to produce partially degraded products with potentially new functions.

60 citations


Journal ArticleDOI
TL;DR: It is found that phenological events throughout the growing season are correlated, and the marked difference in size between table and wine grapes is quantified, suggesting that religious rules concerning alcohol consumption have had a marked impact on patterns of phenomic and genomic diversity in grapes.
Abstract: Grapes are one of the most economically and culturally important crops worldwide, and they have been bred for both winemaking and fresh consumption. Here we evaluate patterns of diversity across 33 phenotypes collected over a 17-year period from 580 table and wine grape accessions that belong to one of the world's largest grape gene banks, the grape germplasm collection of the United States Department of Agriculture. We find that phenological events throughout the growing season are correlated, and quantify the marked difference in size between table and wine grapes. By pairing publicly available historical phenotype data with genome-wide polymorphism data, we identify large effect loci controlling traits that have been targeted during domestication and breeding, including hermaphroditism, lighter skin pigmentation and muscat aroma. Breeding for larger berries in table grapes was traditionally concentrated in geographic regions where Islam predominates and alcohol was prohibited, whereas wine grapes retained the ancestral smaller size that is more desirable for winemaking in predominantly Christian regions. We uncover a novel locus with a suggestive association with berry size that harbors a signature of positive selection for larger berries. Our results suggest that religious rules concerning alcohol consumption have had a marked impact on patterns of phenomic and genomic diversity in grapes.

58 citations


Journal ArticleDOI
TL;DR: The National Heart, Lung, and Blood Institute- and National Human Genome Research Institute-funded DCM Precision Medicine Study aims to enroll 1300 individuals who meet rigorous clinical criteria for idiopathic DCM along with 2600 of their relatives and conduct a randomized controlled trial to test the effectiveness of Family Heart Talk.
Abstract: Background: The cause of idiopathic dilated cardiomyopathy (DCM) is unknown by definition, but its familial subtype is considered to have a genetic component. We hypothesize that most idiopathic DCM, whether familial or nonfamilial, has a genetic basis, in which case a genetics-driven approach to identifying at-risk family members for clinical screening and early intervention could reduce morbidity and mortality. Methods: On the basis of this hypothesis, we have launched the National Heart, Lung, and Blood Institute- and National Human Genome Research Institute-funded DCM Precision Medicine Study, which aims to enroll 1300 individuals (600 non-Hispanic African ancestry, 600 non-Hispanic European ancestry, and 100 Hispanic) who meet rigorous clinical criteria for idiopathic DCM along with 2600 of their relatives. Enrolled relatives will undergo clinical cardiovascular screening to identify asymptomatic disease, and all individuals with idiopathic DCM will undergo exome sequencing to identify relevant variants in genes previously implicated in DCM. Results will be returned by genetic counselors 12 to 14 months after enrollment. The data obtained will be used to describe the prevalence of familial DCM among idiopathic DCM cases and the genetic architecture of idiopathic DCM in multiple ethnicity–ancestry groups. We will also conduct a randomized controlled trial to test the effectiveness of Family Heart Talk, an intervention to aid family communication, for improving uptake of preventive screening and surveillance in at-risk first-degree relatives. Conclusions: We anticipate that this study will demonstrate that idiopathic DCM has a genetic basis and guide best practices for a genetics-driven approach to early intervention in at-risk relatives. Clinical Trial Registration: URL: http://www.clinicaltrials.gov. Unique identifier: NCT03037632.

Posted ContentDOI
Genevieve L. Wojcik, Misa Graff, Katherine K. Nishimura, Ran Tao, Jeffrey Haessler, Christopher R. Gignoux, Heather M. Highland, Yesha Patel, Elena P. Sorokin, Christy L. Avery, Gillian M. Belbin, Stephanie A. Bien, Iona Cheng, Sinead Cullina, Chani J. Hodonsky, Yao Hu, Huckins Lm, Janina M. Jeff, Anne E. Justice, Jonathan M. Kocarnik, Unhee Lim, Bridget M Lin, Yingchang Lu, Sarah C. Nelson, Sung-Hyuk Park, Hannah Poisner, Michael Preuss, Melissa A. Richard, Claudia Schurmann, Veronica Wendy Setiawan, Alexandra Sockell, Karan Vahi, Abhishek Vishnu, Marie Verbanck, Ruth H. Walker, Kris Young, Niha Zubair, Acuna-Alonso, José Luis Ambite, Kathleen C. Barnes, Eric Boerwinkle, Erwin P. Bottinger, Carlos Bustamante, Christian Caberto, Canizales-Quinteroes S, Matthew P. Conomos, Ewa Deelman, Ron Do, Kimberly F. Doheny, Lindsay Fernández-Rhodes, Myriam Fornage, Gerardo Heiss, Brenna M. Henn, Lucia A. Hindorff, Rebecca D. Jackson, Benyam Hailu, Cecilia A. Laurie, Cathy C. Laurie, Yuqing Li, Danyu Lin, Andrés Moreno-Estrada, Girish N. Nadkarni, Paul Norman, Loreall Pooler, Alexander P. Reiner, Jane Romm, Sabati C, Karla Sandoval, Xin Sheng, E Stahl, Daniel O. Stram, Timothy A. Thornton, Christina L. Wassel, L R Wilkens, Cheryl A. Winkler, Sachiko Yoneyama, Steve Buyske, Christopher A. Haiman, Charles Kooperberg, Loic Le Marchand, Loos R, Tara C. Matise, Kari E. North, Ulrike Peters, Eimear E. Kenny, Christopher S. Carlson
15 Sep 2017-bioRxiv
TL;DR: The data shows strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts, and insights into clinical implications.
Abstract: Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of the role of human variation in disease, and hinders the equitable translation of genetic associations into clinical and public health applications. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioral phenotypes in 49,839 non-European individuals. Using strategies designed for analysis of multi-ethnic and admixed populations, we confirm 574 GWAS catalog variants across these traits, and find 38 secondary signals in known loci and 27 novel loci. Our data shows strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts, and insights into clinical implications. We strongly advocate for continued, large genome-wide efforts in diverse populations to reduce health disparities.

Posted ContentDOI
Sebastian M. Waszak, Grace Tiao, Bin Zhu, Tobias Rausch, Francesc Muyas, Bernardo Rodriguez-Martin, Raquel Rabionet, Sergei Yakneen, Geòrgia Escaramís, Yang Li, Natalie Saini, Steven A. Roberts, German Demidov, Esa Pitkänen, Olivier Delaneau, Jose Maria Heredia-Genestar, Joachim Weischenfeldt, Suyash Shringarpure, Jieming Chen, Hidewaki Nakagawa, Ludmil B. Alexandrov, Oliver Drechsel, L. J. Dursi, Ayellet V. Segrè, Erik Garrison, Serap Erkek, Nina Habermann, Lara Urban, Ekta Khurana, Andy Cafferkey, Shuto Hayashi, Seiya Imoto, Lauri A. Aaltonen, Eva G. Alvarez, Adrian Baez-Ortega, Matthew A. Bailey, Mattia Bosio, Alicia L. Bruzos, Ivo Buchhalter, Carlos Bustamante, Claudia Calabrese, Anthony DiBiase, Mark Gerstein, Aliaksei Holik, Xing Hua, Kuan-lin Huang, Ivica Letunic, Leszek J. Klimczak, Roelof Koster, Sushant Kumar, Michael D. McLellan, R. Jay Mashl, Lisa Mirabello, Steven Newhouse, Aparna Prasad, Gunnar Rätsch, Matthias Schlesner, Roland F. Schwarz, Pramod Sharma, Tal Shmaya, Nikos Sidiropoulos, Lei Song, Hana Susak, Tomas Tanskanen, Marta Tojo, David C. Wedge, Mark H. Wright, Ying Wu, Kai Ye, Venkata Yellapantula, Jorge Zamora, Atul J. Butte, Gad Getz, Jared T. Simpson, Li Ding, Tomas Marques-Bonet, Arcadi Navarro, Alvis Brazma, Peter J. Campbell, Stephen J. Chanock, Nilanjan Chatterjee, Oliver Stegle, Reiner Siebert, Stephan Ossowski, Olivier Harismendy, Dmitry A. Gordenin, Jose M. C. Tubio, Francisco M. De La Vega, Douglas F. Easton, Xavier Estivill, Jan O. Korbel, ICGC
01 Nov 2017-bioRxiv
TL;DR: This study highlights the major impact of rare and common germline variants on mutational landscapes in cancer and inferred over a hundred polymorphic L1/LINE elements with somatic retrotransposition activity in cancer.
Abstract: Cancers develop through somatic mutagenesis; however, germline genetic variation can markedly contribute to tumorigenesis via diverse mechanisms. We discovered and phased 88 million germline single nucleotide variants, short insertions/deletions, and large structural variants in whole genomes from 2,642 cancer patients, and employed this genomic resource to study genetic determinants of somatic mutagenesis across 39 cancer types. Our analyses implicate damaging germline variants in a variety of cancer predisposition and DNA damage response genes with specific somatic mutation patterns. Mutations in the MBD4 DNA glycosylase gene showed association with elevated C>T mutagenesis at CpG dinucleotides, a ubiquitous mutational process acting across tissues. Analysis of somatic structural variation exposed complex rearrangement patterns, involving cycles of templated insertions and tandem duplications, in BRCA1-deficient tumours. Genome-wide association analysis implicated common genetic variation at the APOBEC3 gene cluster with reduced basal levels of somatic mutagenesis attributable to APOBEC cytidine deaminases across cancer types. We further inferred over a hundred polymorphic L1/LINE elements with somatic retrotransposition activity in cancer. Our study highlights the major impact of rare and common germline variants on mutational landscapes in cancer.

Posted ContentDOI
15 Sep 2017-bioRxiv
TL;DR: The data show strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, which substantially restricts genetically-guided precision medicine, and the authors advocate for new, large genome-wide efforts in diverse populations to reduce health disparities.
Abstract: Genome-wide association studies (GWAS) have laid the foundation for many downstream investigations, including the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of human variation and hinders the translation of genetic associations into clinical and public health applications. To demonstrate the benefit of studying underrepresented populations, the Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioral phenotypes in 49,839 non-European individuals. Using novel strategies for multi-ethnic analysis of admixed populations, we confirm 574 GWAS catalog variants across these traits, and find 28 novel loci and 42 residual signals in known loci. Our data show strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, which substantially restricts genetically-guided precision medicine. We advocate for new, large genome-wide efforts in diverse populations to reduce health disparities.

Journal ArticleDOI
TL;DR: This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations.
Abstract: A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an 'African Diaspora Power Chip' (ADPC), a genotyping array consisting of tagging SNPs, useful in comprehensively identifying African-specific genetic variation. This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we show the results of designing and producing such a microchip array. This novel array covers African-specific variation far better than other commercially available arrays, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations. A recent study cataloging variation in continental African populations suggests this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry.

Journal ArticleDOI
TL;DR: The authors combine optical tweezer experiments and calculations to experimentally determine the energy cost for knot formation, which indicates that knotted proteins evolved specific folding pathways because knot formation in unfolded chains is unfavorable.
Abstract: Knots are natural topologies of chains. Yet, little is known about spontaneous knot formation in a polypeptide chain, an event that can potentially impair its folding, and about the effect of a knot on the stability and folding kinetics of a protein. Here we used optical tweezers to show that the free energy cost to form a trefoil knot in the denatured state of a polypeptide chain of 120 residues is 5.8 ± 1 kcal mol⁻¹. Monte Carlo dynamics of random chains predict this value, indicating that the free energy cost of knot formation is of entropic origin. This cost is predicted to remain above 3 kcal mol⁻¹ for denatured proteins as large as 900 residues. Therefore, we conclude that naturally knotted proteins cannot attain their knot randomly in the unfolded state but must pay the cost of knotting through contacts along their folding landscape. The effect of knots on protein stability and folding kinetics is not well understood. Here the authors combine optical tweezer experiments and calculations to experimentally determine the energy cost for knot formation, which indicates that knotted proteins evolved specific folding pathways because knot formation in unfolded chains is unfavorable.

Posted ContentDOI
13 Oct 2017-bioRxiv
TL;DR: By considering diverse, under-studied African populations and using a genome-wide association approach, it is demonstrated that skin pigmentation is highly heritable but that known pigmentation loci explain only a small fraction of the variance.
Abstract: Fewer than 15 genes have been directly associated with skin pigmentation variation in humans, leading to its characterization as a relatively simple trait. However, by assembling a global survey of quantitative skin pigmentation phenotypes, we demonstrate that pigmentation is more complex than previously assumed with genetic architecture varying by latitude. We investigate polygenicity in the Khoe and the San, populations indigenous to southern Africa, who have considerably lighter skin than equatorial Africans. We demonstrate that skin pigmentation is highly heritable, but that known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. Despite this, we identify canonical and non-canonical skin pigmentation loci, including near SLC24A5, TYRP1, SMARCA2/VLDLR, and SNX13 using a genome-wide association approach complemented by targeted resequencing. By considering diverse, under-studied African populations, we show how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.

Journal ArticleDOI
TL;DR: FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes, is developed.
Abstract: Motivation: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. Results: We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. Availability and implementation: FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. Contact: nilah@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: The developed marker set is present on current-generation SNP chips and can be highly multiplexed in standalone panels, making it a promising resource for SNP-based DNA typing.
Abstract: Genetic markers are important resources for individual identification and parentage assessment. Although short tandem repeats (STRs) have been the traditional DNA marker, technological advances have led to single nucleotide polymorphisms (SNPs) becoming an attractive alternative. SNPs can be highly multiplexed and automatically scored, which allows for easier standardization and sharing among laboratories. Equine parentage is currently assessed using STRs. We obtained a publicly available SNP dataset of 729 horses representing 32 diverse breeds. A proposed set of 101 SNPs was analyzed for DNA typing suitability. The overall minor allele frequency of the panel was 0.376 (range 0.304-0.419), with per-breed probabilities of identity ranging from 5.6 × 10⁻³⁵ to 1.86 × 10⁻⁴². When one parent was available, exclusion probabilities ranged from 0.9998 to 0.999996, although when both parents were available, all breeds had exclusion probabilities greater than 0.9999999. A set of 388 horses from 35 breeds was genotyped to evaluate marker performance on known families. The set included 107 parent-offspring pairs and 101 full trios. No horses shared identical genotypes across all markers, indicating that the selected set was sufficient for individual identification. All pairwise comparisons were classified using ISAG rules, with one or two excluding markers considered an accepted parent-offspring pair, two or three excluding markers considered doubtful and four or more excluding markers rejecting parentage. The panel had an overall accuracy of 99.9% for identifying true parent-offspring pairs. Our developed marker set is both present on current generation SNP chips and can be highly multiplexed in standalone panels and thus is a promising resource for SNP-based DNA typing.
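To give a feel for how a 101-SNP panel reaches such small probabilities of identity, the sketch below uses the standard per-locus formula for a biallelic marker under Hardy-Weinberg equilibrium, PI = p⁴ + (2pq)² + q⁴, and multiplies across loci. Independence between SNPs and a single shared minor allele frequency of 0.376 for every marker are simplifying assumptions; the abstract does not state how the authors computed their per-breed values.

```python
def locus_identity_probability(maf: float) -> float:
    """Chance that two unrelated individuals share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: the sum of squared genotype frequencies.
    """
    p, q = 1.0 - maf, maf
    return p ** 4 + (2.0 * p * q) ** 2 + q ** 4


def panel_identity_probability(mafs) -> float:
    """Product of per-locus values, assuming the SNPs are independent."""
    probability = 1.0
    for maf in mafs:
        probability *= locus_identity_probability(maf)
    return probability


# Simplification: every one of the 101 SNPs is given the panel's overall MAF.
panel_mafs = [0.376] * 101
panel_pi = panel_identity_probability(panel_mafs)
```

Under these assumptions the product falls inside the per-breed range quoted in the abstract, which shows that each SNP needs only a modest amount of discriminating power once about a hundred are combined.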

Journal ArticleDOI
TL;DR: The majority of variation in gene expression was correlated with organ type, and the presence of specific environmental stressors elicited unique expression differences among organs, potentially indicating that physicochemical stressors with clear biochemical consequences can constrain the diversity of adaptive solutions that mitigate their adverse effects.
Abstract: Variation in gene expression can provide insights into organismal responses to environmental stress and physiological mechanisms mediating adaptation to habitats with contrasting environmental conditions. We performed an RNA-sequencing experiment to quantify gene expression patterns in fish adapted to habitats with different combinations of environmental stressors, including the presence of toxic hydrogen sulphide (H2S) and the absence of light in caves. We specifically asked how gene expression varies among populations living in different habitats, whether population differences were consistent among organs, and whether there is evidence for shared expression responses in populations exposed to the same stressors. We analysed organ-specific transcriptome-wide data from four ecotypes of Poecilia mexicana (nonsulphidic surface, sulphidic surface, nonsulphidic cave and sulphidic cave). The majority of variation in gene expression was correlated with organ type, and the presence of specific environmental stressors elicited unique expression differences among organs. Shared patterns of gene expression between populations exposed to the same environmental stressors increased with levels of organismal organization (from transcript to gene to physiological pathway). In addition, shared patterns of gene expression were more common between populations from sulphidic than populations from cave habitats, potentially indicating that physicochemical stressors with clear biochemical consequences can constrain the diversity of adaptive solutions that mitigate their adverse effects. Overall, our analyses provided insights into transcriptional variation in a unique system, in which adaptation to H2S and darkness coincide. Functional annotations of differentially expressed genes provide a springboard for investigating physiological mechanisms putatively underlying adaptation to extreme environments.

Journal ArticleDOI
16 Feb 2017-PLOS ONE
TL;DR: The results provide evidence for three management units for Z. chilensis, and it is recommended that separate management arrangements are required for each of these units, but there is no evidence to discriminate the extant population of Dipturus trachyderma as separate management units.
Abstract: The longnose skates (Zearaja chilensis and Dipturus trachyderma) are the main component of the elasmobranch fisheries in the south-east Pacific Ocean. Both species are considered to be a single stock by the fishery management in Chile; however, little is known about the level of demographic connectivity within the fishery. In this study, we used genetic variation (560 bp of the control region of the mitochondrial genome and ten microsatellite loci) to explore population connectivity at five locations along the Chilean coast. Analysis of Z. chilensis populations revealed significant genetic structure among offshore locations (San Antonio, Valdivia), two locations in the Chiloe Interior Sea (Puerto Montt and Aysen) and Punta Arenas in southern Chile. For example, while mtDNA haplotype diversity was similar across offshore locations and Punta Arenas (h = 0.46-0.50), it was significantly different from that in the Chiloe Interior Sea (h = 0.08). These results raise concerns about the long-term survival of the species within the interior sea, as population resilience will rely almost exclusively on self-recruitment. In contrast, little evidence of genetic structure was found for D. trachyderma. Our results provide evidence for three management units for Z. chilensis, and we recommend that separate management arrangements are required for each of these units. However, there is no evidence to discriminate the extant population of Dipturus trachyderma as separate management units. The lack of genetic population subdivision for D. trachyderma appears to correspond with their higher dispersal ability and more offshore habitat preference.


Journal ArticleDOI
TL;DR: The results indicate that subunit assemblies other than α2ββ'ω·σA can be separated by ion-exchange chromatography on a Mono Q column and that assemblies with the wrong RNAP subunit stoichiometry lack transcriptional activity.

Posted ContentDOI
28 Aug 2017-bioRxiv
TL;DR: The effect of homozygous carriers, commonly referred to as “human knockouts,” is measured across medical phenotypes for genes implicated to be protective against disease or associated with at least one phenotype in this study and several genes with strong pleiotropic or non-additive effects are found.
Abstract: Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. We characterized the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and found 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We performed phenome-wide analyses and directly measured the effect of homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated to be protective against disease or associated with at least one phenotype in our study and found several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.

Posted ContentDOI
02 Dec 2017-bioRxiv
TL;DR: The results highlight the power of applying a genetic mapping strategy to hibernation and present new insight into the genetics driving its seasonal onset.
Abstract: Hibernation is a highly dynamic phenotype whose timing, for many mammals, is controlled by a circannual clock and accompanied by rhythms in body mass and food intake. When housed in an animal facility, 13-lined ground squirrels exhibit individual variation in the seasonal onset of hibernation, which is not explained by environmental or biological factors, such as body mass and sex. We hypothesized that underlying genetic architecture instead drives variation in this timing. After first increasing the contiguity of the genome assembly, we therefore employed a genotype-by-sequencing approach to characterize genetic variation in 153 13-lined ground squirrels. Combining this with datalogger records, we estimated high heritability (61-100%) for the seasonal onset of hibernation. After applying a genome-wide scan with 46,996 variants, we also identified 21 loci significantly associated with hibernation immergence, which alone accounted for 54% of the variance in the phenotype. The most significant marker (SNP 15, p = 3.81 × 10⁻⁶) was located near prolactin-releasing hormone receptor (PRLHR), a gene that regulates food intake and energy homeostasis. Other significant loci were located near genes functionally related to hibernation physiology, including muscarinic acetylcholine receptor M2 (CHRM2), involved in the control of heart rate, exocyst complex component 4 (EXOC4) and prohormone convertase 2 (PCSK2), both of which are involved in insulin signaling and processing. Finally, we applied an expression quantitative trait loci (eQTL) analysis using existing transcriptome datasets, and we identified significant (q

Posted ContentDOI
03 Feb 2017-bioRxiv
TL;DR: A novel pipeline to select tag SNPs using the 26-population reference panel from Phase 3 of the 1000 Genomes Project is developed, and the unified framework presented will enable investigators to make informed decisions for the design of new arrays and help empower the next phase of rare variant association for global health.
Abstract: The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. Consequently, a new generation of genotyping arrays is being developed, designed with tag single nucleotide polymorphisms (SNPs) to improve rare variant imputation. Selection of these tag SNPs poses several challenges, as rare variants tend to be continentally- or even population-specific and reflect fine-scale linkage disequilibrium (LD) structure impacted by recent demographic events. To explore the landscape of tag-able variation and guide design considerations for large-cohort and biobank arrays, we developed a novel pipeline to select tag SNPs using the 26-population reference panel from Phase 3 of the 1000 Genomes Project. We evaluate our approach using leave-one-out internal validation via standard imputation methods, which allows the direct comparison of tag SNP performance by estimating the correlation of the imputed and real genotypes for each iteration of potential array sites. We show how this approach allows for an assessment of array design and performance that can take advantage of the development of deeper and more diverse sequenced reference panels. We quantify the impact of demography on tag SNP performance across populations and provide population-specific guidelines for tag SNP selection. We also examine array design strategies that target single populations versus multi-ethnic cohorts, and demonstrate that a boost in performance for the latter can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Finally, we demonstrate the utility of improved array design to provide meaningful improvements in power, particularly in trans-ethnic studies. The unified framework presented will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.

Book ChapterDOI
TL;DR: How to use high-resolution optical tweezers to investigate the mechanism of the bacteriophage φ29 DNA packaging motor, a ring-shaped ATPase responsible for genome packing during viral assembly is described.
Abstract: The past decade has seen an explosion in the use of single-molecule approaches to study complex biological processes. One such approach, optical trapping, is particularly well suited for investigating molecular motors, a diverse group of macromolecular complexes that convert chemical energy into mechanical work, thus playing key roles in virtually every aspect of cellular life. Here we describe how to use high-resolution optical tweezers to investigate the mechanism of the bacteriophage φ29 DNA packaging motor, a ring-shaped ATPase responsible for genome packing during viral assembly. This system illustrates how to use single-molecule techniques to uncover novel, often unexpected, principles of motor operation.

Posted ContentDOI
22 Nov 2017-bioRxiv
TL;DR: It is shown for the first time that a single population, the Criollo population, underwent strong domestication approximately 3,600 years ago, and that during the process of domestication, there was strong selection for genes involved in the metabolism of the colored protectants anthocyanins and the stimulant theobromine, as well as disease resistance genes.
Abstract: Domestication has had a strong impact on the development of modern societies. We sequenced 200 genomes of the chocolate plant Theobroma cacao L. to show for the first time that a single population, the Criollo population, underwent strong domestication approximately 3,600 years ago (95% CI: 2,481-10,903 years ago). We also show that during the process of domestication, there was strong selection for genes involved in the metabolism of the colored protectants anthocyanins and the stimulant theobromine, as well as disease resistance genes. Our analyses show that domesticated populations of T. cacao (Criollo) maintain a higher proportion of high frequency deleterious mutations. We also show for the first time the negative consequences of the increased accumulation of deleterious mutations during domestication on the fitness of individuals (significant negative correlation between Criollo ancestry and Kg of beans per hectare per year, P = 0.000425).

Posted ContentDOI
21 Sep 2017-bioRxiv
TL;DR: In this paper, the authors presented the first analysis of individuals' genome sequences from early and late Neolithic sites in Morocco, as well as Andalusian Early Neolithic individuals.
Abstract: One of the greatest transitions in the human story was the change from hunter-gatherer to farmer. How farming traditions expanded from their birthplace in the Fertile Crescent has always been a matter of contention. Two models were proposed, one involving the movement of people and the other based on the transmission of ideas. Over the last decade, paleogenomics has been instrumental in settling long-disputed archaeological questions [1], including those surrounding the Neolithic revolution [2]. Compared to the extensive genetic work done on Europe and the Near East, the Neolithic transition in North Africa, including the Maghreb, remains largely uncharacterized. Archaeological evidence suggests this process may have happened through an in situ development from Epipaleolithic communities [3,4], or by demic diffusion from the Eastern Mediterranean shores [5] or Iberia [6]. In fact, Neolithic pottery in North Africa strongly resembles that of European cultures like Cardial and Andalusian Early Neolithic, the southern-most early farmer culture from Iberia. Here, we present the first analysis of individuals’ genome sequences from early and late Neolithic sites in Morocco, as well as Andalusian Early Neolithic individuals. We show that Early Neolithic Moroccans are distinct from any other reported ancient individuals and possess an endemic element retained in present-day Maghrebi populations, indicating long-term genetic continuity in the region. Among ancient populations, early Neolithic Moroccans share affinities with Levantine Natufian hunter-gatherers (∼9,000 BCE) and Pre-Pottery Neolithic farmers (∼6,500 BCE). Late Neolithic (∼3,000 BCE) Moroccan remains, in comparison, share an Iberian component of a prominent European-wide demic expansion, supporting theories of trans-Gibraltar gene flow. Finally, the Andalusian Early Neolithic samples share the same genetic composition as the Cardial Mediterranean Neolithic culture that reached Iberia ∼5,500 BCE. The cultural and genetic similarities of the Iberian Neolithic cultures with those of North African Neolithic sites further reinforce the model of an Iberian intrusion into the Maghreb.

Journal ArticleDOI
TL;DR: A study of a clinical tumor specimen containing a novel somatic single nucleotide variant that caused allele drop-out in EGFR L858R genotyping, resulting in a false-negative interpretation and impacting patient clinical management is described.
Abstract: While PCR-based genotyping methods abound in molecular testing for lung cancer therapy, these approaches may not provide the robust sensitivity to detect accurate genotypes in a variable cancer genomic background. Here, we describe a study of a clinical tumor specimen containing a novel somatic single nucleotide variant that caused allele drop-out in EGFR L858R genotyping, resulting in a false-negative interpretation and impacting patient clinical management. We demonstrate that a subsequent unbiased next-generation sequencing approach correctly identified the driver mutation, and therefore may be more reliable for somatic variant detection. These findings magnify the potential pitfalls of PCR amplification-based approaches and stress the importance of unbiased and sensitive molecular testing strategies for therapeutic marker detection as molecular testing becomes the standard for determining clinical management of cancer patients.