Showing papers by "Carlos Bustamante" published in 2011


Journal ArticleDOI
TL;DR: This work establishes an open-source translational research platform for genome-wide association studies in rice that directly links molecular variation in genes and metabolic pathways with the germplasm resources needed to accelerate varietal development and crop improvement.
Abstract: Asian rice, Oryza sativa, is a cultivated, inbreeding species that feeds over half of the world's population. Understanding the genetic basis of diverse physiological, developmental, and morphological traits provides the basis for improving yield, quality and sustainability of rice. Here we show the results of a genome-wide association study based on genotyping 44,100 SNP variants across 413 diverse accessions of O. sativa collected from 82 countries that were systematically phenotyped for 34 traits. Using cross-population-based mapping strategies, we identified dozens of common variants influencing numerous complex traits. Significant heterogeneity was observed in the genetic architecture associated with subpopulation structure and response to environment. This work establishes an open-source translational research platform for genome-wide association studies in rice that directly links molecular variation in genes and metabolic pathways with the germplasm resources needed to accelerate varietal development and crop improvement.

1,170 citations


Journal ArticleDOI
Ryan E. Mills1, Klaudia Walter2, Chip Stewart3, Robert E. Handsaker4  +371 moreInstitutions (21)
03 Feb 2011-Nature
TL;DR: A map of unbalanced SVs is constructed based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations, and serves as a resource for sequencing-based association studies.
Abstract: Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

1,085 citations


Journal ArticleDOI
TL;DR: It is found that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations, emphasizing that replication of disease association for specific rare genetic variants across diverging populations must overcome both reduced statistical power because of rarity and higher population divergence.
Abstract: High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

670 citations


Journal ArticleDOI
07 Oct 2011-Science
TL;DR: It is shown that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago, which is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago.
Abstract: We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa.

656 citations


Journal ArticleDOI
TL;DR: It is proposed that the adoption of vegetative propagation was a double-edged sword: Although it provided a benefit by ensuring true breeding cultivars, it also discouraged the generation of unique cultivars through crosses.
Abstract: The grape is one of the earliest domesticated fruit crops and, since antiquity, it has been widely cultivated and prized for its fruit and wine. Here, we characterize genome-wide patterns of genetic variation in over 1,000 samples of the domesticated grape, Vitis vinifera subsp. vinifera, and its wild relative, V. vinifera subsp. sylvestris from the US Department of Agriculture grape germplasm collection. We find support for a Near East origin of vinifera and present evidence of introgression from local sylvestris as the grape moved into Europe. High levels of genetic diversity and rapid linkage disequilibrium (LD) decay have been maintained in vinifera, which is consistent with a weak domestication bottleneck followed by thousands of years of widespread vegetative propagation. The considerable genetic diversity within vinifera, however, is contained within a complex network of close pedigree relationships that has been generated by crosses among elite cultivars. We show that first-degree relationships are rare between wine and table grapes and among grapes from geographically distant regions. Our results suggest that although substantial genetic diversity has been maintained in the grape subsequent to domestication, there has been a limited exploration of this diversity. We propose that the adoption of vegetative propagation was a double-edged sword: Although it provided a benefit by ensuring true breeding cultivars, it also discouraged the generation of unique cultivars through crosses. The grape currently faces severe pathogen pressures, and the long-term sustainability of the grape and wine industries will rely on the exploitation of the grape's tremendous natural genetic diversity.

611 citations


Journal ArticleDOI
Devin P. Locke1, LaDeana W. Hillier1, Wesley C. Warren1, Kim C. Worley2, Lynne V. Nazareth2, Donna M. Muzny2, Shiaw-Pyng Yang1, Zhengyuan Wang1, Asif T. Chinwalla1, Patrick Minx1, Makedonka Mitreva1, Lisa Cook1, Kim D. Delehaunty1, Catrina Fronick1, Heather Schmidt1, Lucinda Fulton1, Robert S. Fulton1, Joanne O. Nelson1, Vincent Magrini1, Craig Pohl1, Tina Graves1, Chris Markovic1, Andy Cree2, Huyen Dinh2, Jennifer Hume2, Christie Kovar2, Gerald R. Fowler2, Gerton Lunter3, Gerton Lunter4, Stephen Meader3, Andreas Heger3, Chris P. Ponting3, Tomas Marques-Bonet5, Tomas Marques-Bonet6, Can Alkan5, Lin Chen5, Ze Cheng5, Jeffrey M. Kidd5, Evan E. Eichler7, Evan E. Eichler5, Simon D. M. White8, Stephen M. J. Searle8, Albert J. Vilella9, Yuan Chen9, Paul Flicek9, Jian Ma10, Jian Ma11, Brian J. Raney10, Bernard B. Suh10, Richard Burhans12, Javier Herrero9, David Haussler10, Rui Faria6, Rui Faria13, Olga Fernando14, Olga Fernando6, Fleur Darré6, Domènec Farré6, Elodie Gazave6, Meritxell Oliva6, Arcadi Navarro6, Roberta Roberto15, Oronzo Capozzi15, Nicoletta Archidiacono15, Giuliano Della Valle16, Stefania Purgato16, Mariano Rocchi15, Miriam K. Konkel17, Jerilyn A. Walker17, Brygg Ullmer17, Mark A. Batzer17, Arian F.A. Smit18, Robert Hubley18, Claudio Casola19, Daniel R. Schrider19, Matthew W. Hahn19, Víctor Quesada20, Xose S. Puente20, Gonzalo R. Ordóñez20, Carlos López-Otín20, Tomas Vinar21, Brona Brejova21, Aakrosh Ratan12, Robert S. Harris12, Webb Miller12, Carolin Kosiol, Heather A. Lawson1, Vikas Taliwal22, André L. Martins22, Adam Siepel22, Arindam RoyChoudhury23, Xin Ma22, Jeremiah D. Degenhardt22, Carlos Bustamante24, Ryan N. Gutenkunst25, Thomas Mailund26, Julien Y. Dutheil26, Asger Hobolth26, Mikkel H. Schierup26, Oliver A. Ryder, Yuko Yoshinaga27, Pieter J. de Jong27, George M. Weinstock1, Jeffrey Rogers2, Elaine R. Mardis1, Richard A. Gibbs2, Richard K. Wilson1 
27 Jan 2011-Nature
TL;DR: The orang-utan species, Pongo abelii and Pongo pygmaeus, are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution; a primate polymorphic neocentromere, found in both Pongo species, is also described.
Abstract: 'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.

555 citations


Journal ArticleDOI
13 Jul 2011-Nature
TL;DR: Medical genomics has focused almost entirely on those of European descent, but other ethnic groups must be studied to ensure that more people benefit, say researchers.
Abstract: Medical genomics has focused almost entirely on those of European descent. Other ethnic groups must be studied to ensure that more people benefit, say Carlos D. Bustamante, Esteban Gonzalez Burchard and Francisco M. De La Vega.

490 citations


Journal ArticleDOI
TL;DR: Demographic modeling based on SNP data and a diffusion-based approach provide the strongest support for a single domestication origin of rice, and Bayesian phylogenetic analyses implementing the multispecies coalescent and using previously published phylogenetic sequence datasets also point to a single origin of Asian domesticated rice.
Abstract: Asian rice, Oryza sativa, is one of world's oldest and most important crop species. Rice is believed to have been domesticated ∼9,000 y ago, although debate on its origin remains contentious. A single-origin model suggests that two main subspecies of Asian rice, indica and japonica, were domesticated from the wild rice O. rufipogon. In contrast, the multiple independent domestication model proposes that these two major rice types were domesticated separately and in different parts of the species range of wild rice. This latter view has gained much support from the observation of strong genetic differentiation between indica and japonica as well as several phylogenetic studies of rice domestication. We reexamine the evolutionary history of domesticated rice by resequencing 630 gene fragments on chromosomes 8, 10, and 12 from a diverse set of wild and domesticated rice accessions. Using patterns of SNPs, we identify 20 putative selective sweeps on these chromosomes in cultivated rice. Demographic modeling based on these SNP data and a diffusion-based approach provide the strongest support for a single domestication origin of rice. Bayesian phylogenetic analyses implementing the multispecies coalescent and using previously published phylogenetic sequence datasets also point to a single origin of Asian domesticated rice. Finally, we date the origin of domestication at ∼8,200–13,500 y ago, depending on the molecular clock estimate that is used, which is consistent with known archaeological data that suggests rice was first cultivated at around this time in the Yangtze Valley of China.

400 citations


Journal ArticleDOI
TL;DR: It is found that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations, and tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations.
Abstract: Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ≠Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.

390 citations


Journal ArticleDOI
TL;DR: The hypothesis that selectively introgressing alleles across subpopulations is an efficient approach for trait enhancement in plant breeding programs is supported and demonstrates the fundamental importance of subpopulation in interpreting and manipulating the genetics of complex traits in rice.
Abstract: Aluminum (Al) toxicity is a primary limitation to crop productivity on acid soils, and rice has been demonstrated to be significantly more Al tolerant than other cereal crops. However, the mechanisms of rice Al tolerance are largely unknown, and no genes underlying natural variation have been reported. We screened 383 diverse rice accessions, conducted a genome-wide association (GWA) study, and conducted QTL mapping in two bi-parental populations using three estimates of Al tolerance based on root growth. Subpopulation structure explained 57% of the phenotypic variation, and the mean Al tolerance in Japonica was twice that of Indica. Forty-eight regions associated with Al tolerance were identified by GWA analysis, most of which were subpopulation-specific. Four of these regions co-localized with a priori candidate genes, and two highly significant regions co-localized with previously identified QTLs. Three regions corresponding to induced Al-sensitive rice mutants (ART1, STAR2, Nrat1) were identified through bi-parental QTL mapping or GWA to be involved in natural variation for Al tolerance. Haplotype analysis around the Nrat1 gene identified susceptible and tolerant haplotypes explaining 40% of the Al tolerance variation within the aus subpopulation, and sequence analysis of Nrat1 identified a trio of non-synonymous mutations predictive of Al sensitivity in our diversity panel. GWA analysis discovered more phenotype–genotype associations and provided higher resolution, but QTL mapping identified critical rare and/or subpopulation-specific alleles not detected by GWA analysis. Mapping using Indica/Japonica populations identified QTLs associated with transgressive variation where alleles from a susceptible aus or indica parent enhanced Al tolerance in a tolerant Japonica background. 
This work supports the hypothesis that selectively introgressing alleles across subpopulations is an efficient approach for trait enhancement in plant breeding programs and demonstrates the fundamental importance of subpopulation in interpreting and manipulating the genetics of complex traits in rice.

357 citations


Journal ArticleDOI
29 Apr 2011-Cell
TL;DR: Direct observations of mechanical, force-induced protein unfolding by the ClpX unfoldase from E. coli, alone and in complex with the ClpP peptidase, are reported.

Journal ArticleDOI
TL;DR: It is found that these enigmatic canids are highly admixed varieties derived from gray wolves and coyotes, respectively, and divergent genomic history suggests that they do not have a shared recent ancestry as proposed by previous researchers.
Abstract: High-throughput genotyping technologies developed for model species can potentially increase the resolution of demographic history and ancestry in wild relatives. We use a SNP genotyping microarray developed for the domestic dog to assay variation in over 48K loci in wolf-like species worldwide. Despite the high mobility of these large carnivores, we find distinct hierarchical population units within gray wolves and coyotes that correspond with geographic and ecologic differences among populations. Further, we test controversial theories about the ancestry of the Great Lakes wolf and red wolf using an analysis of haplotype blocks across all 38 canid autosomes. We find that these enigmatic canids are highly admixed varieties derived from gray wolves and coyotes, respectively. This divergent genomic history suggests that they do not have a shared recent ancestry as proposed by previous researchers. Interspecific hybridization, as well as the process of evolutionary divergence, may be responsible for the observed phenotypic distinction of both forms. Such admixture complicates decisions regarding endangered species restoration and protection.

Journal ArticleDOI
07 Jul 2011-Nature
TL;DR: It is found that the translation rate of identical codons at the decoding centre is greatly influenced by the GC content of folded structures at the mRNA entry site, and force applied to the ends of the hairpin to favour its unfolding significantly speeds translation.
Abstract: The ribosome translates the genetic information encoded in messenger RNA into protein. Folded structures in the coding region of an mRNA represent a kinetic barrier that lowers the peptide elongation rate, as the ribosome must disrupt structures it encounters in the mRNA at its entry site to allow translocation to the next codon. Such structures are exploited by the cell to create diverse strategies for translation regulation, such as programmed frameshifting, the modulation of protein expression levels, ribosome localization and co-translational protein folding. Although strand separation activity is inherent to the ribosome, requiring no exogenous helicases, its mechanism is still unknown. Here, using a single-molecule optical tweezers assay on mRNA hairpins, we find that the translation rate of identical codons at the decoding centre is greatly influenced by the GC content of folded structures at the mRNA entry site. Furthermore, force applied to the ends of the hairpin to favour its unfolding significantly speeds translation. Quantitative analysis of the force dependence of its helicase activity reveals that the ribosome, unlike previously studied helicases, uses two distinct active mechanisms to unwind mRNA structure: it destabilizes the helical junction at the mRNA entry site by biasing its thermal fluctuations towards the open state, increasing the probability of the ribosome translocating unhindered; and it mechanically pulls apart the mRNA single strands of the closed junction during the conformational changes that accompany ribosome translocation. The second of these mechanisms ensures a minimal basal rate of translation in the cell; specialized, mechanically stable structures are required to stall the ribosome temporarily. Our results establish a quantitative mechanical basis for understanding the mechanism of regulation of the elongation rate of translation by structured mRNAs.

Journal ArticleDOI
23 Dec 2011-Science
TL;DR: The results suggest that the ribosome not only decodes the genetic information and synthesizes polypeptides, but also promotes efficient de novo attainment of the native state.
Abstract: Proteins are synthesized by the ribosome and generally must fold to become functionally active. Although it is commonly assumed that the ribosome affects the folding process, this idea has been extremely difficult to demonstrate. We have developed an experimental system to investigate the folding of single ribosome-bound stalled nascent polypeptides with optical tweezers. In T4 lysozyme, synthesized in a reconstituted in vitro translation system, the ribosome slows the formation of stable tertiary interactions and the attainment of the native state relative to the free protein. Incomplete T4 lysozyme polypeptides misfold and aggregate when free in solution, but they remain folding-competent near the ribosomal surface. Altogether, our results suggest that the ribosome not only decodes the genetic information and synthesizes polypeptides, but also promotes efficient de novo attainment of the native state.

Journal ArticleDOI
TL;DR: This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
Abstract: Background: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.

Journal ArticleDOI
18 Feb 2011-Cell
TL;DR: An overview of the main results arrived at by the application of single-molecule methods to the study of the main machines of the central dogma is presented.

Journal ArticleDOI
TL;DR: A novel synthetic human reference sequence is developed that is ethnically concordant and used for the analysis of genomes from a nuclear family with history of familial thrombophilia, demonstrating that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci.
Abstract: Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

Journal ArticleDOI
TL;DR: Atomic force microscopy images of yeast RNA polymerase II–nucleosome complexes confirm the presence of looped transcriptional intermediates and provide mechanistic insight into the histone-transfer process through the distribution of transcribed nucleosome positions.
Abstract: What happens to histones during transcription is not well understood. Atomic force microscopy snapshots of RNA polymerase II (Pol II)-nucleosome complexes before, during and after transcription show the presence of looped transcriptional intermediates. In addition, a fraction of transcribed histones are remodeled to hexasomes, and the size of this fraction depends on the elongation rate of Pol II.

Journal ArticleDOI
23 Sep 2011-Science
TL;DR: Asynchronous release of nascent nucleotides rationalizes various observations of its dsNA unwinding and may be used to coordinate the translocation speed of NS3 along the RNA during viral replication.
Abstract: Nonhexameric helicases use adenosine triphosphate (ATP) to unzip base pairs in double-stranded nucleic acids (dsNAs). Studies have suggested that these helicases unzip dsNAs in single–base pair increments, consuming one ATP molecule per base pair, but direct evidence for this mechanism is lacking. We used optical tweezers to follow the unwinding of double-stranded RNA by the hepatitis C virus NS3 helicase. Single–base pair steps by NS3 were observed, along with nascent nucleotide release that was asynchronous with base pair opening. Asynchronous release of nascent nucleotides rationalizes various observations of its dsNA unwinding and may be used to coordinate the translocation speed of NS3 along the RNA during viral replication.

Journal ArticleDOI
TL;DR: This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies and supports the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
Abstract: Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.

Journal ArticleDOI
28 Jun 2011-PLOS ONE
TL;DR: It is found that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
Abstract: Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations, K, from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating K. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
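
Note: For readers unfamiliar with the criterion named in this abstract, the standard textbook definition of DIC is sketched below. The abstract does not say which variant of the effective-parameter term the authors used, so this is the common form and not necessarily the paper's exact formulation. The symbols y (genotype data), θ (model parameters) and the overbar notation are assumptions introduced here for illustration.

```latex
% Deviance of a model with parameters \theta given data y
D(\theta) = -2 \log p(y \mid \theta)

% Posterior mean deviance and effective number of parameters
\bar{D} = \mathbb{E}_{\theta \mid y}\left[ D(\theta) \right], \qquad
p_D = \bar{D} - D(\bar{\theta})

% Deviance Information Criterion (smaller is preferred)
\mathrm{DIC} = \bar{D} + p_D = D(\bar{\theta}) + 2\,p_D
```

In a clustering setting, one would typically compute DIC for each candidate number of subpopulations and prefer the value with the smallest DIC.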

Book ChapterDOI
TL;DR: This method makes it possible to study protein folding in the physiologically relevant low-force regime of optical tweezers and enables us to monitor processes - such as refolding events and fluctuations between different molecular conformations - that could not be detected in previous force spectroscopy experiments.
Abstract: In this chapter, we describe a method that extends the use of optical tweezers to the study of the folding mechanism of single protein molecules. This method entails the use of DNA molecules as molecular handles to manipulate individual proteins between two polystyrene beads. The DNA molecules function as spacers between the protein and the beads, and keep the interactions between the tethering surfaces to a minimum. The handles can have different lengths, be attached to any pair of exposed cysteine residues, and be used to manipulate both monomeric and polymeric proteins. By changing the position of the cysteine residues on the protein surface, it is possible to apply the force to different portions of the protein and along different molecular axes. Circular dichroism and enzymatic activity studies have revealed that for many proteins, the handles do not significantly affect the folding behavior and the structure of the tethered protein. This method makes it possible to study protein folding in the physiologically relevant low-force regime of optical tweezers and enables us to monitor processes - such as refolding events and fluctuations between different molecular conformations - that could not be detected in previous force spectroscopy experiments.

Journal ArticleDOI
TL;DR: This simple approach demonstrates that the experimentally observed structural states at nonzero tension are a consequence of the tension and that these tension-induced states cease to exist at zero tension.
Abstract: We analyze the response of a single nucleosome to tension, which serves as a prototypical biophysical measurement where tension-dependent deformation alters transition kinetics. We develop a statistical-mechanics model of a nucleosome as a wormlike chain bound to a spool, incorporating fluctuations in the number of bases bound, the spool orientation, and the conformations of the unbound polymer segments. With the resulting free-energy surface, we perform dynamic simulations that permit a direct comparison with experiments. This simple approach demonstrates that the experimentally observed structural states at nonzero tension are a consequence of the tension and that these tension-induced states cease to exist at zero tension. The transitions between states exhibit substantial deformation of the unbound polymer segments. The associated deformation energy increases with tension; thus, the application of tension alters the kinetics due to tension-induced deformation of the transition states. This mechanism would arise in any system where the tether molecule is deformed in the transition state under the influence of tension.

Journal ArticleDOI
01 Mar 2011-Genetics
TL;DR: The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population.
Abstract: We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations.
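
For readers unfamiliar with frequency-spectrum tests, the following is a minimal sketch of Tajima's D computed from a small set of aligned haplotypes, using the standard constants from Tajima (1989). The input format (equal-length strings, one per sampled chromosome) and the function name are assumptions for illustration, not the pipeline used in the study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n = 2 or 3, so D is undefined.
        raise ValueError("need at least 4 haplotypes")

    # Keep only segregating sites (columns with more than one allele).
    columns = [col for col in zip(*haplotypes) if len(set(col)) > 1]
    s = len(columns)
    if s == 0:
        return float("nan")  # no variation, D is undefined

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in combinations(col, 2)) for col in columns
    ) / n_pairs

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two estimators of theta, scaled by its std. dev.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example after population growth or a sweep) pushes D negative, while an excess of intermediate-frequency variants pushes it positive.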

Posted Content
TL;DR: The utility of a Bayesian extension that allows the experimental uncertainties to be directly quantified and builds in detailed balance to reduce uncertainty through physical constraints is illustrated in characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap.
Abstract: Departments of Statistics and Computer Science, University of Chicago, IL 60637, USA (Dated: August 9, 2011). Single-molecule force spectroscopy has proven to be a powerful tool for studying the kinetic behavior of biomolecules. Through application of an external force, conformational states with small or transient populations can be stabilized, allowing them to be characterized and the statistics of individual trajectories studied to provide insight into biomolecular folding and function. Because the observed quantity (force or extension) is not necessarily an ideal reaction coordinate, individual observations cannot be uniquely associated with kinetically distinct conformations. While maximum-likelihood schemes such as hidden Markov models have solved this problem for other classes of single-molecule experiments by using temporal information to aid in the inference of a sequence of distinct conformational states, these methods do not give a clear picture of how precisely the model parameters are determined by the data due to instrument noise and finite-sample statistics, both significant problems in force spectroscopy. We solve this problem through a Bayesian extension that allows the experimental uncertainties to be directly quantified, and build in detailed balance to further reduce uncertainty through physical constraints. We illustrate the utility of this approach in characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap.

Journal ArticleDOI
TL;DR: Ge and Sang suggest that the analyses concluding that domesticated Asian rice had a single origin were flawed and that the origin of rice remains an open question.
Abstract: In a recent study using two different methods of analysis and two different datasets, we concluded that domesticated Asian rice had a single origin (1). Ge and Sang (2) suggest that our analyses were flawed and that the origin of rice remains an open question.

Book ChapterDOI
01 Jan 2011
TL;DR: A number of statistical methods for testing associations between rare variants in two genes and obesity are described, and algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data are formulated.
Abstract: Genome-wide association studies (GWAS) have been very successful in identifying common genetic variation associated with numerous complex diseases [1]. However, most of the identified common genetic variants appear to confer modest risk and few causal alleles have been identified [2]. Furthermore, these associations account for a small portion of the total heritability of inherited disease variation [1]. This has led to the reexamination of the contribution of environment, gene-gene and gene-environment interactions, and rare genetic variants in complex diseases [1, 3, 4]. There is strong evidence that rare variants play an important role in complex disease etiology and may have larger genetic effects than common variants [2]. Currently, much of what we know regarding the contribution of rare genetic variants to disease risk is based on a limited number of phenotypes and candidate genes. However, rapid advancement of second-generation sequencing technologies will invariably lead to widespread association studies comparing whole exome and eventually whole genome sequencing of cases and controls. A tremendous challenge for enabling these "next generation" medical genomic studies is developing statistical approaches for correlating rare genetic variants with disease outcome. The analysis of rare variants is challenging since methods used for common variants are woefully underpowered. Therefore, methods that can deal with genetic heterogeneity at the trait-associated locus have been developed to analyze rare variants. Instead of analyzing individual variants, these methods analyze variants within a region or gene as a group and usually rely on collapsing. They can be applied to both case-control and quantitative trait studies. The paper of Bansal et al. in this volume describes the application of a number of statistical methods for testing associations between rare variants in two genes and obesity.
The authors considered the relative merits of the different methods as well as important implementation details, such as the leveraging of genomic annotations and determining p-values. Knowledge of haplotypes can increase the power of GWAS and also highlight associations that are impossible to detect without haplotype phase (e.g. loss of heterozygosity). Even more complicated phase-dependent interactions of variants in linkage equilibrium have also been suggested as possible causes of missing heritability. In their work, Hallsorsson et al. formulate algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data. These methods would allow testing haplotypes harboring rare variants for association and potentially increase their explanatory power. Since single SNP tests are often underpowered in rare variant association analysis, Zeggini and Asimit propose a locus-based method that has high power in the presence of rare variants and that incorporates base quality scores available for sequencing data. Their results suggest that this multi-marker approach may be best suited for smaller regions, or after some filtering to reduce the number of SNPs that are jointly tested to reduce loss of power due to multiple-testing adjustments. Finally, the paper of Zhou et al. presents a penalized regression framework for association testing on sequence data, in the presence of both common and rare variants. This method also introduces the use of weights to incorporate available biological information on the variants. Although these tactics improve both false positive and false negative rates, they represent an incremental development and there is still significant room for improvement. With the development of sequencing technologies and methods to detect complex trait rare variant associations, many new and exciting discoveries are imminent.
The analysis of rare variants is still in its infancy and the next few years promise to produce many new methods to meet the special demands of analyzing this type of data. Note from Publisher: This article contains the abstract and references.

Proceedings Article
01 Jan 2011
TL;DR: A number of statistical methods for testing associations between rare variants in two genes and obesity are described, including a locus-based method that has high power in the presence of rare variants and that incorporates base quality scores available for sequencing data.
Abstract: Genome-wide association studies (GWAS) have been very successful in identifying common genetic variation associated with numerous complex diseases [1]. However, most of the identified common genetic variants appear to confer modest risk and few causal alleles have been identified [2]. Furthermore, these associations account for a small portion of the total heritability of inherited disease variation [1]. This has led to the reexamination of the contribution of environment, gene-gene and gene-environment interactions, and rare genetic variants in complex diseases [1, 3, 4]. There is strong evidence that rare variants play an important role in complex disease etiology and may have larger genetic effects than common variants [2]. Currently, much of what we know regarding the contribution of rare genetic variants to disease risk is based on a limited number of phenotypes and candidate genes. However, rapid advancement of second-generation sequencing technologies will invariably lead to widespread association studies comparing whole exome and eventually whole genome sequencing of cases and controls. A tremendous challenge for enabling these “next generation” medical genomic studies is developing statistical approaches for correlating rare genetic variants with disease outcome. The analysis of rare variants is challenging since methods used for common variants are woefully underpowered. Therefore, methods that can deal with genetic heterogeneity at the trait-associated locus have been developed to analyze rare variants. Instead of analyzing individual variants, these methods analyze variants within a region or gene as a group and usually rely on collapsing. They can be applied to both case-control and quantitative trait studies. The paper of Bansal et al. in this volume describes the application of a number of statistical methods for testing associations between rare variants in two genes and obesity.
The authors considered the relative merits of the different methods as well as important implementation details, such as the leveraging of genomic annotations and determining p-values. Knowledge of haplotypes can increase the power of GWAS and also highlight associations that are impossible to detect without haplotype phase (e.g. loss of heterozygosity). Even more complicated phase-dependent interactions of variants in linkage equilibrium have also been suggested as possible causes of missing heritability. In their work, Hallsorsson et al. formulate algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data. These methods would allow testing haplotypes harboring rare variants for association and potentially increase their explanatory power. Since single SNP tests are often underpowered in rare variant association analysis, Zeggini and Asimit propose a locus-based method that has high power in the presence of rare variants and that incorporates base quality scores available for sequencing data. Their results suggest that this multi-marker approach may be best suited for smaller regions, or after some filtering to reduce the number of SNPs that are jointly tested to reduce loss of power due to multiple-testing adjustments. Finally, the paper of Zhou et al. presents a penalized regression framework for association testing on sequence data, in the presence of both common and rare variants. This method also introduces the use of weights to incorporate available biological information on the variants. Although these tactics improve both false positive and false negative rates, they represent an incremental development and there is still significant room for improvement. With the development of sequencing technologies and methods to detect complex trait rare variant associations, many new and exciting discoveries are imminent.
The analysis of rare variants is still in its infancy and the next few years promise to produce many new methods to meet the special demands of analyzing this type of data.
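
The collapsing idea mentioned in the abstract can be sketched as follows: rare variants in a gene are reduced to a single per-individual carrier flag, and carrier counts in cases and controls are compared with a one-sided Fisher exact test. This is a generic illustration under assumed inputs (a genotype matrix of minor-allele counts and a hypothetical frequency threshold), not the specific method of any paper in the session.

```python
from math import comb


def carrier_flags(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one carrier flag per individual.

    genotypes: list of per-individual lists of minor-allele counts (0, 1, 2),
    one entry per variant in the gene. The threshold is an assumed cutoff.
    """
    n_ind = len(genotypes)
    n_var = len(genotypes[0])
    rare = []
    for j in range(n_var):
        freq = sum(g[j] for g in genotypes) / (2 * n_ind)
        if 0 < freq < maf_threshold:
            rare.append(j)
    return [any(g[j] > 0 for j in rare) for g in genotypes]


def fisher_one_sided(carrier_cases, n_cases, carrier_controls, n_controls):
    """P-value for an excess of carriers among cases (hypergeometric tail)."""
    total = n_cases + n_controls
    carriers = carrier_cases + carrier_controls
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(carrier_cases, upper + 1)
    )
    return tail / comb(total, n_cases)
```

Collapsing trades per-variant resolution for power: a single test per gene avoids the multiple-testing burden of testing each rare variant alone, at the cost of diluting the signal when many collapsed variants are neutral.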

Posted Content
TL;DR: A modified version of reactive flux theory that does not require numerical derivatives is presented, allowing rate constants to be robustly estimated directly from the time-correlation function.
Abstract: While seemingly straightforward in principle, the reliable estimation of rate constants is seldom easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of an appropriate reactive flux correlation function. However, when applied to real data from single-molecule experiments or molecular dynamics simulations, the reactive flux correlation function requires the numerical differentiation of a noisy empirical correlation function, which can result in an unacceptably poor estimate of the rate and pathological dependence on the sampling interval. We present a modified version of this theory which does not require numerical derivatives, allowing rate constants to be robustly estimated from the time-correlation function directly. We illustrate the approach using single-molecule passive force spectroscopy measurements of an RNA hairpin.

Journal ArticleDOI
06 Jun 2011-PLOS ONE
TL;DR: There are differences in the genetic and selective basis for domestication between two Asian rice varietal groups, tropical japonica and indica.
Abstract: Oryza sativa or Asian cultivated rice is one of the major cereal grass species domesticated for human food use during the Neolithic. Domestication of this species from the wild grass Oryza rufipogon was accompanied by changes in several traits, including seed shattering, percent seed set, tillering, grain weight, and flowering time. Quantitative trait locus (QTL) mapping has identified three genomic regions in chromosome 3 that appear to be associated with these traits. We would like to study whether these regions show signatures of selection and whether the same genetic basis underlies the domestication of different rice varieties. Fragments of 88 genes spanning these three genomic regions were sequenced from multiple accessions of two major varietal groups in O. sativa--indica and tropical japonica--as well as the ancestral wild rice species O. rufipogon. In tropical japonica, the levels of nucleotide variation in these three QTL regions are significantly lower compared to genome-wide levels, and coalescent simulations based on a complex demographic model of rice domestication indicate that these patterns are consistent with selection. In contrast, there is no significant reduction in nucleotide diversity in the homologous regions in indica rice. These results suggest that there are differences in the genetic and selective basis for domestication between these two Asian rice varietal groups.