Showing papers by "Carlos Bustamante" published in 2011

Open Access

Journal Article•DOI•

Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa

Keyan Zhao¹, Chih-Wei Tung¹, Georgia C. Eizenga², Mark Wright¹, M. Liakat Ali³, Adam H. Price⁴, Gareth J. Norton⁴, S. M. Rafiqul Islam⁵, Andrew R. Reynolds¹, Jason G. Mezey¹, Anna M. McClung², Carlos Bustamante¹, Carlos Bustamante⁶, Susan R. McCouch¹•Institutions (6)

Cornell University¹, Agricultural Research Service², University of Arkansas³, University of Aberdeen⁴, Bangladesh Agricultural University⁵, Stanford University⁶

13 Sep 2011-Nature Communications

TL;DR: This work establishes an open-source translational research platform for genome-wide association studies in rice that directly links molecular variation in genes and metabolic pathways with the germplasm resources needed to accelerate varietal development and crop improvement.

Abstract: Asian rice, Oryza sativa, is a cultivated, inbreeding species that feeds over half of the world's population. Understanding the genetic basis of diverse physiological, developmental, and morphological traits provides the basis for improving yield, quality and sustainability of rice. Here we show the results of a genome-wide association study based on genotyping 44,100 SNP variants across 413 diverse accessions of O. sativa collected from 82 countries that were systematically phenotyped for 34 traits. Using cross-population-based mapping strategies, we identified dozens of common variants influencing numerous complex traits. Significant heterogeneity was observed in the genetic architecture associated with subpopulation structure and response to environment. This work establishes an open-source translational research platform for genome-wide association studies in rice that directly links molecular variation in genes and metabolic pathways with the germplasm resources needed to accelerate varietal development and crop improvement.

1,170 citations

Journal Article•DOI•

Mapping copy number variation by population-scale genome sequencing

Ryan E. Mills¹, Klaudia Walter², Chip Stewart³, Robert E. Handsaker⁴ +371 more•Institutions (21)

03 Feb 2011-Nature

TL;DR: A map of unbalanced SVs is constructed based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations, and serves as a resource for sequencing-based association studies.

Abstract: Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

1,085 citations

Journal Article•DOI•

Demographic history and rare allele sharing among human populations

Simon Gravel¹, Brenna M. Henn¹, Ryan N. Gutenkunst, Amit Indap², Gabor T. Marth², Andrew G. Clark³, Fuli Yu⁴, Richard A. Gibbs⁴, Carlos Bustamante¹•Institutions (4)

Stanford University¹, Boston College², Cornell University³, Baylor College of Medicine⁴

19 Jul 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is found that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations, emphasizing that replication of disease association for specific rare genetic variants across diverging populations must overcome both reduced statistical power because of rarity and higher population divergence.

Abstract: High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
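The site frequency spectrum mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name and the toy derived-allele counts are hypothetical, and the jackknife extrapolation and demographic inference described in the abstract are not implemented here. The sketch only tallies how many variable sites carry each derived-allele count in a sample.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return a list sfs where sfs[k - 1] is the number of variable sites
    at which exactly k of the n sampled chromosomes carry the derived
    allele, for k = 1 .. n - 1 (the unfolded spectrum)."""
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(1, n_chromosomes)]


# Hypothetical toy data: derived-allele counts at 8 variable sites
# observed in a sample of 6 chromosomes.
counts = [1, 1, 1, 2, 1, 5, 3, 1]
sfs = site_frequency_spectrum(counts, 6)

# Fraction of variable sites that are singletons (seen on one chromosome).
singleton_fraction = sfs[0] / sum(sfs)
```

In this toy sample most variable sites are singletons, which mirrors, in miniature, the abstract's observation that the majority of variable sites are rare.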

670 citations

Journal Article•DOI•

An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia

Morten Rasmussen¹, Xiaosen Guo¹, Yong Wang², Kirk E. Lohmueller², Simon Rasmussen³, Anders Albrechtsen¹, Line Skotte¹, Stinus Lindgreen¹, Mait Metspalu⁴, Thibaut Jombart⁵, Toomas Kivisild⁶, Weiwei Zhai⁷, Anders Eriksson⁶, Andrea Manica⁶, Ludovic Orlando¹, Francisco M. De La Vega⁸, Silvana R. Tridico⁹, Ene Metspalu⁴, Kasper Nielsen³, María C. Ávila-Arcos¹, J. Víctor Moreno-Mayar¹, J. Víctor Moreno-Mayar¹⁰, Craig Muller, Joe Dortch¹¹, M. Thomas P. Gilbert¹, Ole Lund³, Agata Wesolowska³, Monika Karmin⁴, Lucy A. Weinert⁵, Bo Wang, Jun Li, Shuaishuai Tai, Fei Xiao, Tsunehiko Hanihara¹², George van Driem¹³, Aashish R. Jha¹⁴, François-Xavier Ricaut¹⁵, Peter de Knijff¹⁶, Andrea Bamberg Migliano¹⁷, Andrea Bamberg Migliano⁶, Irene Gallego Romero¹⁴, Karsten Kristiansen¹, David M. Lambert¹⁸, Søren Brunak¹, Søren Brunak³, Peter Forster⁶, Bernd Brinkmann, Olaf Nehlich¹⁹, Michael Bunce⁹, Michael P. Richards²⁰, Michael P. Richards¹⁹, Ramneek Gupta³, Carlos Bustamante⁸, Anders Krogh¹, Robert Foley⁶, Marta Mirazón Lahr⁶, Francois Balloux⁵, Thomas Sicheritz-Pontén³, Richard Villems⁴, Richard Villems²¹, Rasmus Nielsen², Rasmus Nielsen¹, Jun Wang, Eske Willerslev¹•Institutions (21)

07 Oct 2011-Science

TL;DR: It is shown that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago, which is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago.

Abstract: We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa.

656 citations

Journal Article•DOI•

Genetic structure and domestication history of the grape.

Sean Myles¹, Adam R. Boyko², Christopher L. Owens¹, Patrick J. Brown, Fabrizio Grassi³, Mallikarjuna K. Aradhya⁴, Bernard Prins⁴, Andy Reynolds², Jer Ming Chia⁵, Doreen Ware¹, Doreen Ware⁵, Carlos Bustamante², Edward S. Buckler¹•Institutions (5)

Cornell University¹, Stanford University², University of Milan³, University of California, Davis⁴, Cold Spring Harbor Laboratory⁵

01 Mar 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is proposed that the adoption of vegetative propagation was a double-edged sword: Although it provided a benefit by ensuring true breeding cultivars, it also discouraged the generation of unique cultivars through crosses.

Abstract: The grape is one of the earliest domesticated fruit crops and, since antiquity, it has been widely cultivated and prized for its fruit and wine. Here, we characterize genome-wide patterns of genetic variation in over 1,000 samples of the domesticated grape, Vitis vinifera subsp. vinifera, and its wild relative, V. vinifera subsp. sylvestris from the US Department of Agriculture grape germplasm collection. We find support for a Near East origin of vinifera and present evidence of introgression from local sylvestris as the grape moved into Europe. High levels of genetic diversity and rapid linkage disequilibrium (LD) decay have been maintained in vinifera, which is consistent with a weak domestication bottleneck followed by thousands of years of widespread vegetative propagation. The considerable genetic diversity within vinifera, however, is contained within a complex network of close pedigree relationships that has been generated by crosses among elite cultivars. We show that first-degree relationships are rare between wine and table grapes and among grapes from geographically distant regions. Our results suggest that although substantial genetic diversity has been maintained in the grape subsequent to domestication, there has been a limited exploration of this diversity. We propose that the adoption of vegetative propagation was a double-edged sword: Although it provided a benefit by ensuring true breeding cultivars, it also discouraged the generation of unique cultivars through crosses. The grape currently faces severe pathogen pressures, and the long-term sustainability of the grape and wine industries will rely on the exploitation of the grape's tremendous natural genetic diversity.

611 citations

Journal Article•DOI•

Comparative and demographic analysis of orang-utan genomes.

Devin P. Locke¹, LaDeana W. Hillier¹, Wesley C. Warren¹, Kim C. Worley², Lynne V. Nazareth², Donna M. Muzny², Shiaw-Pyng Yang¹, Zhengyuan Wang¹, Asif T. Chinwalla¹, Patrick Minx¹, Makedonka Mitreva¹, Lisa Cook¹, Kim D. Delehaunty¹, Catrina Fronick¹, Heather Schmidt¹, Lucinda Fulton¹, Robert S. Fulton¹, Joanne O. Nelson¹, Vincent Magrini¹, Craig Pohl¹, Tina Graves¹, Chris Markovic¹, Andy Cree², Huyen Dinh², Jennifer Hume², Christie Kovar², Gerald R. Fowler², Gerton Lunter³, Gerton Lunter⁴, Stephen Meader³, Andreas Heger³, Chris P. Ponting³, Tomas Marques-Bonet⁵, Tomas Marques-Bonet⁶, Can Alkan⁵, Lin Chen⁵, Ze Cheng⁵, Jeffrey M. Kidd⁵, Evan E. Eichler⁷, Evan E. Eichler⁵, Simon D. M. White⁸, Stephen M. J. Searle⁸, Albert J. Vilella⁹, Yuan Chen⁹, Paul Flicek⁹, Jian Ma¹⁰, Jian Ma¹¹, Brian J. Raney¹⁰, Bernard B. Suh¹⁰, Richard Burhans¹², Javier Herrero⁹, David Haussler¹⁰, Rui Faria⁶, Rui Faria¹³, Olga Fernando¹⁴, Olga Fernando⁶, Fleur Darré⁶, Domènec Farré⁶, Elodie Gazave⁶, Meritxell Oliva⁶, Arcadi Navarro⁶, Roberta Roberto¹⁵, Oronzo Capozzi¹⁵, Nicoletta Archidiacono¹⁵, Giuliano Della Valle¹⁶, Stefania Purgato¹⁶, Mariano Rocchi¹⁵, Miriam K. Konkel¹⁷, Jerilyn A. Walker¹⁷, Brygg Ullmer¹⁷, Mark A. Batzer¹⁷, Arian F.A. Smit¹⁸, Robert Hubley¹⁸, Claudio Casola¹⁹, Daniel R. Schrider¹⁹, Matthew W. Hahn¹⁹, Víctor Quesada²⁰, Xose S. Puente²⁰, Gonzalo R. Ordóñez²⁰, Carlos López-Otín²⁰, Tomas Vinar²¹, Brona Brejova²¹, Aakrosh Ratan¹², Robert S. Harris¹², Webb Miller¹², Carolin Kosiol, Heather A. Lawson¹, Vikas Taliwal²², André L. Martins²², Adam Siepel²², Arindam RoyChoudhury²³, Xin Ma²², Jeremiah D. Degenhardt²², Carlos Bustamante²⁴, Ryan N. Gutenkunst²⁵, Thomas Mailund²⁶, Julien Y. Dutheil²⁶, Asger Hobolth²⁶, Mikkel H. Schierup²⁶, Oliver A. Ryder, Yuko Yoshinaga²⁷, Pieter J. de Jong²⁷, George M. Weinstock¹, Jeffrey Rogers², Elaine R. Mardis¹, Richard A. Gibbs², Richard K. Wilson¹•Institutions (27)

27 Jan 2011-Nature

TL;DR: The orang-utan species, Pongo abelii and Pongo pygmaeus, are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution; a primate polymorphic neocentromere, found in both Pongo species, is also described.

Abstract: 'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.

555 citations

Journal Article•DOI•

Genomics for the world

Carlos Bustamante¹, Francisco M. De La Vega¹, Esteban G. Burchard²•Institutions (2)

Stanford University¹, University of California, San Francisco²

13 Jul 2011-Nature

TL;DR: Medical genomics has focused almost entirely on those of European descent, but other ethnic groups must be studied to ensure that more people benefit, say researchers.

Abstract: Medical genomics has focused almost entirely on those of European descent. Other ethnic groups must be studied to ensure that more people benefit, say Carlos D. Bustamante, Esteban Gonzalez Burchard and Francisco M. De La Vega.

490 citations

Journal Article•DOI•

Molecular evidence for a single evolutionary origin of domesticated rice

Jeanmaire Molina¹, Martin Sikora², Nandita R. Garud², Jonathan M. Flowers¹, Samara Rubinstein¹, Andy Reynolds², Pu Huang³, Scott A. Jackson⁴, Barbara A. Schaal³, Carlos Bustamante², Adam R. Boyko², Michael D. Purugganan•Institutions (4)

New York University¹, Stanford University², Washington University in St. Louis³, Purdue University⁴

17 May 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Demographic modeling based on SNP data and a diffusion-based approach provide the strongest support for a single domestication origin of rice, and Bayesian phylogenetic analyses implementing the multispecies coalescent and using previously published phylogenetic sequence datasets also point to a single origin of Asian domesticated rice.

Abstract: Asian rice, Oryza sativa, is one of the world's oldest and most important crop species. Rice is believed to have been domesticated ∼9,000 y ago, although debate on its origin remains contentious. A single-origin model suggests that two main subspecies of Asian rice, indica and japonica, were domesticated from the wild rice O. rufipogon. In contrast, the multiple independent domestication model proposes that these two major rice types were domesticated separately and in different parts of the species range of wild rice. This latter view has gained much support from the observation of strong genetic differentiation between indica and japonica as well as several phylogenetic studies of rice domestication. We reexamine the evolutionary history of domesticated rice by resequencing 630 gene fragments on chromosomes 8, 10, and 12 from a diverse set of wild and domesticated rice accessions. Using patterns of SNPs, we identify 20 putative selective sweeps on these chromosomes in cultivated rice. Demographic modeling based on these SNP data and a diffusion-based approach provide the strongest support for a single domestication origin of rice. Bayesian phylogenetic analyses implementing the multispecies coalescent and using previously published phylogenetic sequence datasets also point to a single origin of Asian domesticated rice. Finally, we date the origin of domestication at ∼8,200–13,500 y ago, depending on the molecular clock estimate that is used, which is consistent with known archaeological data that suggests rice was first cultivated at around this time in the Yangtze Valley of China.

400 citations

Journal Article•DOI•

Hunter-gatherer genomic diversity suggests a southern African origin for modern humans

Brenna M. Henn¹, Christopher R. Gignoux², Matthew J. Jobin¹, Julie M. Granka¹, John Michael Macpherson, Jeffrey M. Kidd¹, Laura Rodríguez-Botigué³, Sohini Ramachandran⁴, Lawrence Hon, Abra Brisbin⁵, Alice A. Lin¹, Peter A. Underhill¹, David Comas⁴, Kenneth K. Kidd⁶, Paul Norman¹, Peter Parham¹, Carlos Bustamante¹, Joanna L. Mountain, Marcus W. Feldman¹•Institutions (6)

Stanford University¹, University of California, San Francisco², Pompeu Fabra University³, Butler Hospital⁴, Cornell University⁵, Yale University⁶

29 Mar 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is found that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations, and tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations.

Abstract: Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ≠Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.
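Population differentiation "as measured by FST" can be illustrated with a minimal Python sketch. It uses the textbook heterozygosity-based definition, FST = (H_T - H_S) / H_T, for one biallelic SNP in two equally weighted populations. That is an assumption made for illustration, and it is not necessarily the estimator used in the study. The function name and the allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic locus, given the frequency of the
    same allele in two equally weighted populations.

    H_T is the expected heterozygosity of the pooled population and
    H_S is the mean expected heterozygosity within populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The locus is fixed for the same allele in both populations.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not taken from the paper.
strongly_differentiated = fst_two_populations(0.2, 0.8)  # roughly 0.36
undifferentiated = fst_two_populations(0.5, 0.5)         # exactly 0.0
```

Values near 0 indicate that the two populations share similar allele frequencies. Values approaching 1 indicate that they are close to fixed for different alleles.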

390 citations

Journal Article•DOI•

Genetic Architecture of Aluminum Tolerance in Rice (Oryza sativa) Determined through Genome-Wide Association Analysis and QTL Mapping

Adam N. Famoso¹, Keyan Zhao¹, Randy T. Clark², Chih-Wei Tung¹, Mark Wright¹, Carlos Bustamante¹, Leon V. Kochian², Susan R. McCouch¹•Institutions (2)

Cornell University¹, United States Department of Agriculture²

04 Aug 2011-PLOS Genetics

TL;DR: This work supports the hypothesis that selectively introgressing alleles across subpopulations is an efficient approach for trait enhancement in plant breeding programs and demonstrates the fundamental importance of subpopulation in interpreting and manipulating the genetics of complex traits in rice.

Abstract: Aluminum (Al) toxicity is a primary limitation to crop productivity on acid soils, and rice has been demonstrated to be significantly more Al tolerant than other cereal crops. However, the mechanisms of rice Al tolerance are largely unknown, and no genes underlying natural variation have been reported. We screened 383 diverse rice accessions, conducted a genome-wide association (GWA) study, and conducted QTL mapping in two bi-parental populations using three estimates of Al tolerance based on root growth. Subpopulation structure explained 57% of the phenotypic variation, and the mean Al tolerance in Japonica was twice that of Indica. Forty-eight regions associated with Al tolerance were identified by GWA analysis, most of which were subpopulation-specific. Four of these regions co-localized with a priori candidate genes, and two highly significant regions co-localized with previously identified QTLs. Three regions corresponding to induced Al-sensitive rice mutants (ART1, STAR2, Nrat1) were identified through bi-parental QTL mapping or GWA to be involved in natural variation for Al tolerance. Haplotype analysis around the Nrat1 gene identified susceptible and tolerant haplotypes explaining 40% of the Al tolerance variation within the aus subpopulation, and sequence analysis of Nrat1 identified a trio of non-synonymous mutations predictive of Al sensitivity in our diversity panel. GWA analysis discovered more phenotype–genotype associations and provided higher resolution, but QTL mapping identified critical rare and/or subpopulation-specific alleles not detected by GWA analysis. Mapping using Indica/Japonica populations identified QTLs associated with transgressive variation where alleles from a susceptible aus or indica parent enhanced Al tolerance in a tolerant Japonica background. 
This work supports the hypothesis that selectively introgressing alleles across subpopulations is an efficient approach for trait enhancement in plant breeding programs and demonstrates the fundamental importance of subpopulation in interpreting and manipulating the genetics of complex traits in rice.

357 citations

Journal Article•DOI•

ClpX(P) Generates Mechanical Force to Unfold and Translocate Its Protein Substrates

Rodrigo A. Maillard¹, Gheorghe Chistol¹, Maya Sen¹, Maurizio Righini¹, Jiongyi Tan¹, Christian M. Kaiser¹, Courtney Hodges¹, Andreas Martin¹, Carlos Bustamante•Institutions (1)

University of California, Berkeley¹

29 Apr 2011-Cell

TL;DR: Direct observations are reported of mechanical, force-induced protein unfolding by the ClpX unfoldase from E. coli, both alone and in complex with the ClpP peptidase.

Journal Article•DOI•

A genome-wide perspective on the evolutionary history of enigmatic wolf-like canids

01 Aug 2011-Genome Research

TL;DR: It is found that the Great Lakes wolf and red wolf are highly admixed varieties derived from gray wolves and coyotes, respectively, and their divergent genomic history suggests that they do not have a shared recent ancestry as proposed by previous researchers.

Abstract: High-throughput genotyping technologies developed for model species can potentially increase the resolution of demographic history and ancestry in wild relatives. We use a SNP genotyping microarray developed for the domestic dog to assay variation in over 48K loci in wolf-like species worldwide. Despite the high mobility of these large carnivores, we find distinct hierarchical population units within gray wolves and coyotes that correspond with geographic and ecologic differences among populations. Further, we test controversial theories about the ancestry of the Great Lakes wolf and red wolf using an analysis of haplotype blocks across all 38 canid autosomes. We find that these enigmatic canids are highly admixed varieties derived from gray wolves and coyotes, respectively. This divergent genomic history suggests that they do not have a shared recent ancestry as proposed by previous researchers. Interspecific hybridization, as well as the process of evolutionary divergence, may be responsible for the observed phenotypic distinction of both forms. Such admixture complicates decisions regarding endangered species restoration and protection.

Journal Article•DOI•

The ribosome uses two active mechanisms to unwind messenger RNA during translation

Xiaohui Qu¹, Jin-Der Wen², Jin-Der Wen¹, Laura Lancaster³, Harry F. Noller³, Carlos Bustamante¹, Ignacio Tinoco¹•Institutions (3)

University of California, Berkeley¹, National Taiwan University², University of California, Santa Cruz³

07 Jul 2011-Nature

TL;DR: It is found that the translation rate of identical codons at the decoding centre is greatly influenced by the GC content of folded structures at the mRNA entry site, and force applied to the ends of the hairpin to favour its unfolding significantly speeds translation.

Abstract: The ribosome translates the genetic information encoded in messenger RNA into protein. Folded structures in the coding region of an mRNA represent a kinetic barrier that lowers the peptide elongation rate, as the ribosome must disrupt structures it encounters in the mRNA at its entry site to allow translocation to the next codon. Such structures are exploited by the cell to create diverse strategies for translation regulation, such as programmed frameshifting, the modulation of protein expression levels, ribosome localization and co-translational protein folding. Although strand separation activity is inherent to the ribosome, requiring no exogenous helicases, its mechanism is still unknown. Here, using a single-molecule optical tweezers assay on mRNA hairpins, we find that the translation rate of identical codons at the decoding centre is greatly influenced by the GC content of folded structures at the mRNA entry site. Furthermore, force applied to the ends of the hairpin to favour its unfolding significantly speeds translation. Quantitative analysis of the force dependence of its helicase activity reveals that the ribosome, unlike previously studied helicases, uses two distinct active mechanisms to unwind mRNA structure: it destabilizes the helical junction at the mRNA entry site by biasing its thermal fluctuations towards the open state, increasing the probability of the ribosome translocating unhindered; and it mechanically pulls apart the mRNA single strands of the closed junction during the conformational changes that accompany ribosome translocation. The second of these mechanisms ensures a minimal basal rate of translation in the cell; specialized, mechanically stable structures are required to stall the ribosome temporarily. Our results establish a quantitative mechanical basis for understanding the mechanism of regulation of the elongation rate of translation by structured mRNAs.

Journal Article•DOI•

The Ribosome Modulates Nascent Protein Folding

Christian M. Kaiser¹, Daniel Goldman¹, John D. Chodera¹, Ignacio Tinoco¹, Carlos Bustamante•Institutions (1)

University of California, Berkeley¹

23 Dec 2011-Science

TL;DR: The results suggest that the ribosome not only decodes the genetic information and synthesizes polypeptides, but also promotes efficient de novo attainment of the native state.

Abstract: Proteins are synthesized by the ribosome and generally must fold to become functionally active. Although it is commonly assumed that the ribosome affects the folding process, this idea has been extremely difficult to demonstrate. We have developed an experimental system to investigate the folding of single ribosome-bound stalled nascent polypeptides with optical tweezers. In T4 lysozyme, synthesized in a reconstituted in vitro translation system, the ribosome slows the formation of stable tertiary interactions and the attainment of the native state relative to the free protein. Incomplete T4 lysozyme polypeptides misfold and aggregate when free in solution, but they remain folding-competent near the ribosomal surface. Altogether, our results suggest that the ribosome not only decodes the genetic information and synthesizes polypeptides, but also promotes efficient de novo attainment of the native state.

Journal Article•DOI•

The functional spectrum of low-frequency coding variation

Gabor T. Marth¹, Fuli Yu², Amit Indap¹, Kiran V. Garimella³, Simon Gravel⁴, Wen Fung Leong¹, Chris Tyler-Smith⁵, Matthew N. Bainbridge², Thomas W. Blackwell⁶, Xiangqun Zheng-Bradley⁷, Yuan Chen⁵, Danny Challis², Laura Clarke⁷, Edward V. Ball, Kristian Cibulskis³, David Neil Cooper, Bob Fulton⁸, Christopher Hartl³, Daniel C. Koboldt⁸, Donna Muzny⁴, Richard J.H. Smith⁷, Carrie Sougnez³, Chip Stewart¹, Alistair Ward¹, Jin Yu², Yali Xue⁵, David Altshuler³, Carlos Bustamante⁴, Andrew G. Clark⁹, Mark J. Daly³, Mark A. DePristo³, Paul Flicek⁷, Stacey Gabriel³, Elaine R. Mardis⁸, Aarno Palotie⁵, Richard A. Gibbs²•Institutions (9)

Boston College¹, Baylor College of Medicine², Broad Institute³, Stanford University⁴, Wellcome Trust Sanger Institute⁵, University of Michigan⁶, European Bioinformatics Institute⁷, Washington University in St. Louis⁸, Cornell University⁹

14 Sep 2011-Genome Biology

TL;DR: This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.

Abstract: Background: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.

Journal Article•DOI•

Revisiting the Central Dogma One Molecule at a Time

[...]

Carlos Bustamante, Wei Cheng¹, Yara X. Mejia²•Institutions (2)

University of Michigan¹, Max Planck Society²

18 Feb 2011-Cell

TL;DR: An overview is presented of the main results obtained by applying single-molecule methods to the study of the main machines of the central dogma.

Journal Article•DOI•

Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.

[...]

Frederick E. Dewey¹, Rong Chen¹, Sergio Cordero¹, Kelly E. Ormond¹, Colleen Caleshu¹, Konrad J. Karczewski¹, Michelle Whirl-Carrillo¹, Matthew T. Wheeler¹, Joel T. Dudley¹, Jake K. Byrnes¹, Omar E. Cornejo¹, Joshua W. Knowles¹, Mark Woon¹, Katrin Sangkuhl¹, Li Gong¹, Caroline F. Thorn¹, Joan M. Hebert¹, Emidio Capriotti¹, Sean P. David¹, Aleksandra Pavlovic¹, Anne West², Joseph V. Thakuria³, Madeleine Ball³, Alexander Wait Zaranek³, Heidi L. Rehm³, George M. Church³, John West, Carlos Bustamante¹, Michael Snyder¹, Russ B. Altman¹, Teri E. Klein¹, Atul J. Butte¹, Euan A. Ashley¹•Institutions (3)

Stanford University¹, Wellesley College², Harvard University³

15 Sep 2011-PLOS Genetics

TL;DR: A novel synthetic human reference sequence is developed that is ethnically concordant and used for the analysis of genomes from a nuclear family with history of familial thrombophilia, demonstrating that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci.

Abstract: Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

Journal Article•DOI•

The elongation rate of RNA polymerase determines the fate of transcribed nucleosomes

[...]

Lacramioara Bintu¹, Marta Kopaczynska², Marta Kopaczynska³, Courtney Hodges¹, Courtney Hodges², Lucyna Lubkowska⁴, Mikhail Kashlev⁴, Carlos Bustamante•Institutions (4)

University of California, Berkeley¹, California Institute of Technology², California Institute for Quantitative Biosciences³, National Institutes of Health⁴

01 Dec 2011-Nature Structural & Molecular Biology

TL;DR: Atomic force microscopy images of yeast RNA polymerase II–nucleosome complexes confirm the presence of looped transcriptional intermediates and provide mechanistic insight into the histone-transfer process through the distribution of transcribed nucleosome positions.

Abstract: What happens to histones during transcription is not well understood. Atomic force microscopy snapshots of RNA polymerase II (Pol II)-nucleosome complexes before, during and after transcription show the presence of looped transcriptional intermediates. In addition, a fraction of transcribed histones are remodeled to hexasomes, and the size of this fraction depends on the elongation rate of Pol II.

Journal Article•DOI•

Single-base pair unwinding and asynchronous RNA release by the hepatitis C virus NS3 helicase.

[...]

Wei Cheng¹, Srikesh Arunajadai², Jeffrey R. Moffitt³, Ignacio Tinoco⁴, Carlos Bustamante⁴•Institutions (4)

University of Michigan¹, Columbia University², Harvard University³, University of California, Berkeley⁴

23 Sep 2011-Science

TL;DR: Asynchronous release of nascent nucleotides rationalizes various observations of its dsNA unwinding and may be used to coordinate the translocation speed of NS3 along the RNA during viral replication.

Abstract: Nonhexameric helicases use adenosine triphosphate (ATP) to unzip base pairs in double-stranded nucleic acids (dsNAs). Studies have suggested that these helicases unzip dsNAs in single–base pair increments, consuming one ATP molecule per base pair, but direct evidence for this mechanism is lacking. We used optical tweezers to follow the unwinding of double-stranded RNA by the hepatitis C virus NS3 helicase. Single–base pair steps by NS3 were observed, along with nascent nucleotide release that was asynchronous with base pair opening. Asynchronous release of nascent nucleotides rationalizes various observations of its dsNA unwinding and may be used to coordinate the translocation speed of NS3 along the RNA during viral replication.

Journal Article•DOI•

A population genetic approach to mapping neurological disorder genes using deep resequencing

[...]

Rachel A. Myers¹, Ferran Casals¹, Julie Gauthier¹, Fadi F. Hamdan¹, Jon Keebler², Jon Keebler¹, Adam R. Boyko³, Carlos Bustamante³, Amélie Piton¹, Dan Spiegelman¹, Edouard Henrion¹, Martine Zilversmit¹, Julie Hussin¹, Jacklyn Quinlan¹, Yan Yang¹, Ronald G. Lafrenière¹, Alexander R. Griffing², Eric A. Stone², Guy A. Rouleau¹, Philip Awadalla•Institutions (3)

Université de Montréal¹, North Carolina State University², Stanford University³

24 Feb 2011-PLOS Genetics

TL;DR: This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies and supports the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.

Abstract: Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.

Journal Article•DOI•

On identifying the optimal number of population clusters via the deviance information criterion.

[...]

Hong Gao¹, Katarzyna Bryc², Carlos Bustamante¹•Institutions (2)

Stanford University¹, Harvard University²

28 Jun 2011-PLOS ONE

TL;DR: It is found that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.

Abstract: Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations, K, from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating K. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
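
As an illustration of the criterion named in this abstract, the sketch below computes DIC from posterior deviance draws and picks the number of clusters with the smallest value. It assumes the common textbook definition (DIC = mean deviance + p_D, with p_D = mean deviance minus the deviance at the posterior mean); the paper may use a different variant. The function names `dic` and `choose_k` and the input layout are hypothetical, chosen only for this example.

```python
from statistics import mean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) under the textbook definition (an assumption here).

    deviance_samples: D(theta) = -2 * log-likelihood at each posterior draw.
    deviance_at_posterior_mean: D evaluated at the posterior mean of theta.
    """
    mean_deviance = mean(deviance_samples)             # D-bar
    p_d = mean_deviance - deviance_at_posterior_mean   # effective number of parameters
    return mean_deviance + p_d, p_d


def choose_k(fits):
    """Pick the candidate number of clusters with the smallest DIC.

    fits: dict mapping K -> (deviance_samples, deviance_at_posterior_mean).
    Returns (best_k, {K: DIC}).
    """
    scores = {k: dic(samples, d_hat)[0] for k, (samples, d_hat) in fits.items()}
    return min(scores, key=scores.get), scores
```

In practice the deviance draws would come from a clustering program's MCMC output for each candidate value of the number of subpopulations; the sketch only covers the final comparison step.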

Book Chapter•DOI•

DNA molecular handles for single-molecule protein-folding studies by optical tweezers.

[...]

Ciro Cecconi, Elizabeth A. Shank, Susan Marqusee, Carlos Bustamante

01 Jan 2011-Methods of Molecular Biology

TL;DR: This method makes it possible to study protein folding in the physiologically relevant low-force regime of optical tweezers and enables us to monitor processes - such as refolding events and fluctuations between different molecular conformations - that could not be detected in previous force spectroscopy experiments.

Abstract: In this chapter, we describe a method that extends the use of optical tweezers to the study of the folding mechanism of single protein molecules. This method entails the use of DNA molecules as molecular handles to manipulate individual proteins between two polystyrene beads. The DNA molecules function as spacers between the protein and the beads, and keep the interactions between the tethering surfaces to a minimum. The handles can have different lengths, be attached to any pair of exposed cysteine residues, and be used to manipulate both monomeric and polymeric proteins. By changing the position of the cysteine residues on the protein surface, it is possible to apply the force to different portions of the protein and along different molecular axes. Circular dichroism and enzymatic activity studies have revealed that for many proteins, the handles do not significantly affect the folding behavior and the structure of the tethered protein. This method makes it possible to study protein folding in the physiologically relevant low-force regime of optical tweezers and enables us to monitor processes - such as refolding events and fluctuations between different molecular conformations - that could not be detected in previous force spectroscopy experiments.

Journal Article•DOI•

Tension-dependent structural deformation alters single-molecule transition kinetics.

[...]

Bariz Sudhanshu¹, Shirley S. Mihardja, Elena F. Koslover, Shafigh Mehraeen, Carlos Bustamante, Andrew J. Spakowitz•Institutions (1)

Stanford University¹

01 Feb 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: This simple approach demonstrates that the experimentally observed structural states at nonzero tension are a consequence of the tension and that these tension-induced states cease to exist at zero tension.

Abstract: We analyze the response of a single nucleosome to tension, which serves as a prototypical biophysical measurement where tension-dependent deformation alters transition kinetics. We develop a statistical-mechanics model of a nucleosome as a wormlike chain bound to a spool, incorporating fluctuations in the number of bases bound, the spool orientation, and the conformations of the unbound polymer segments. With the resulting free-energy surface, we perform dynamic simulations that permit a direct comparison with experiments. This simple approach demonstrates that the experimentally observed structural states at nonzero tension are a consequence of the tension and that these tension-induced states cease to exist at zero tension. The transitions between states exhibit substantial deformation of the unbound polymer segments. The associated deformation energy increases with tension; thus, the application of tension alters the kinetics due to tension-induced deformation of the transition states. This mechanism would arise in any system where the tether molecule is deformed in the transition state under the influence of tension.

Journal Article•DOI•

Detecting Directional Selection in the Presence of Recent Admixture in African Americans

[...]

Kirk E. Lohmueller¹, Carlos Bustamante¹, Andrew G. Clark•Institutions (1)

Cornell University¹

01 Mar 2011-Genetics

TL;DR: The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population.

Abstract: We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations.
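
For readers unfamiliar with the frequency-spectrum tests named in this abstract, here is a minimal sketch of Tajima's D using the standard constants from Tajima (1989). The function name and the choice to return 0.0 when there are no segregating sites are assumptions of this example, not anything taken from the paper.

```python
import math


def tajimas_d(n, segregating_sites, pairwise_diversity):
    """Tajima's D for n sequences, S segregating sites and mean pairwise differences pi."""
    if n < 4:
        # For n = 2 or 3 the variance term is identically zero, so D is undefined.
        raise ValueError("need at least four sequences")
    s = segregating_sites
    if s == 0:
        return 0.0  # convention chosen for this sketch; D is undefined without variation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pairwise_diversity - s / a1) / math.sqrt(variance)
```

Negative values indicate an excess of low-frequency variants relative to the neutral, constant-size expectation, which is why demographic effects such as growth or admixture have to be accounted for before interpreting the statistic as evidence of selection.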

Posted Content•

Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty

[...]

John D. Chodera, Phillip Elms, Frank Noé, Bettina G. Keller, Christian M. Kaiser, Aaron Ewall-Wice, Susan Marqusee, Carlos Bustamante, Nina Singhal Hinrichs

06 Aug 2011-arXiv: Statistical Mechanics

TL;DR: A Bayesian extension of hidden Markov model analysis allows experimental uncertainties to be quantified directly and builds in detailed balance to further reduce uncertainty through physical constraints; its utility is illustrated by characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap.

Abstract: Departments of Statistics and Computer Science, University of Chicago, IL 60637, USA (Dated: August 9, 2011). Single-molecule force spectroscopy has proven to be a powerful tool for studying the kinetic behavior of biomolecules. Through application of an external force, conformational states with small or transient populations can be stabilized, allowing them to be characterized and the statistics of individual trajectories studied to provide insight into biomolecular folding and function. Because the observed quantity (force or extension) is not necessarily an ideal reaction coordinate, individual observations cannot be uniquely associated with kinetically distinct conformations. While maximum-likelihood schemes such as hidden Markov models have solved this problem for other classes of single-molecule experiments by using temporal information to aid in the inference of a sequence of distinct conformational states, these methods do not give a clear picture of how precisely the model parameters are determined by the data due to instrument noise and finite-sample statistics, both significant problems in force spectroscopy. We solve this problem through a Bayesian extension that allows the experimental uncertainties to be directly quantified, and build in detailed balance to further reduce uncertainty through physical constraints. We illustrate the utility of this approach in characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap.

Journal Article•DOI•

Reply to Ge and Sang: A single origin of domesticated rice

[...]

New York University¹, Stanford University², Washington University in St. Louis³, Purdue University⁴

27 Sep 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Ge and Sang suggest that the analyses concluding that domesticated Asian rice had a single origin were flawed and that the origin of rice remains an open question.

Abstract: In a recent study using two different methods of analysis and two different datasets, we concluded that domesticated Asian rice had a single origin (1). Ge and Sang (2) suggest that our analyses were flawed and that the origin of rice remains an open question.

Book Chapter•DOI•

GENOME-WIDE ASSOCIATION MAPPING AND RARE ALLELES: FROM POPULATION GENOMICS TO PERSONALIZED MEDICINE - Session Introduction.

[...]

Francisco M. De La Vega¹, Carlos Bustamante², Suzanne M. Leal³•Institutions (3)

Life Technologies¹, Stanford University², Baylor College of Medicine³

01 Jan 2011

TL;DR: A number of statistical methods for testing associations between rare variants in two genes to obesity are described, and algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data are formulated.

Abstract: Genome-wide association studies (GWAS) have been very successful in identifying common genetic variation associated with numerous complex diseases [1]. However, most of the identified common genetic variants appear to confer modest risk and few causal alleles have been identified [2]. Furthermore, these associations account for a small portion of the total heritability of inherited disease variation [1]. This has led to the reexamination of the contribution of environment, gene-gene and gene-environment interactions, and rare genetic variants in complex diseases [1, 3, 4]. There is strong evidence that rare variants play an important role in complex disease etiology and may have larger genetic effects than common variants [2]. Currently, much of what we know regarding the contribution of rare genetic variants to disease risk is based on a limited number of phenotypes and candidate genes. However, rapid advancement of second generation sequencing technologies will invariably lead to widespread association studies comparing whole exome and eventually whole genome sequencing of cases and controls. A tremendous challenge for enabling these "next generation" medical genomic studies is developing statistical approaches for correlating rare genetic variants with disease outcome. The analysis of rare variants is challenging since methods used for common variants are woefully underpowered. Therefore, methods that can deal with genetic heterogeneity at the trait-associated locus have been developed to analyze rare variants. Instead of analyzing individual variants, these methods analyze variants within a region/gene as a group and usually rely on collapsing. Methods that can be applied to both case-control and quantitative trait studies are needed. The paper of Bansal et al. in this volume describes the application of a number of statistical methods for testing associations between rare variants in two genes and obesity.
The authors considered the relative merits of the different methods as well as important implementation details, such as the leveraging of genomic annotations and determining p-values. Knowledge of haplotypes can increase the power of GWAS and also highlight associations that are impossible to detect without haplotype phase (e.g. loss of heterozygosity). Even more complicated phase-dependent interactions of variants in linkage equilibrium have also been suggested as possible causes of missing heritability. In their work, Halldórsson et al. formulate algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data. These methods would allow testing haplotypes harboring rare variants for association and potentially increase their explanatory power. Since single SNP tests are often underpowered in rare variant association analysis, Zeggini and Asimit propose a locus-based method that has high power in the presence of rare variants and that incorporates base quality scores available for sequencing data. Their results suggest that this multi-marker approach may be best suited for smaller regions, or after some filtering to reduce the number of SNPs that are jointly tested to reduce loss of power due to multiple-testing adjustments. Finally, the paper of Zhou et al. presents a penalized regression framework for association testing on sequence data, in the presence of both common and rare variants. This method also introduces the use of weights to incorporate available biological information on the variants. Although these tactics improve both false positive and false negative rates, they represent an incremental development and there is still significant room for improvement. With the development of sequencing technologies and methods to detect complex trait rare variant associations, many new and exciting discoveries are imminent.
The analysis of rare variants is still in its infancy and the next few years promise to produce many new methods to meet the special demands of analyzing this type of data. Note from Publisher: This article contains the abstract and references.
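
The collapsing idea mentioned in this abstract (grouping rare variants within a region or gene instead of testing each one) can be sketched as follows. This is a generic carrier-indicator test with a one-sided Fisher exact p-value, written only for illustration; it is not the specific method of any paper in the session, and the function names and genotype layout (one list of rare-allele counts per individual) are assumptions.

```python
from math import comb


def collapse(genotypes):
    """Map each individual's rare-allele counts across a gene to a carrier flag (0 or 1)."""
    return [1 if any(count > 0 for count in person) else 0 for person in genotypes]


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a/b are case carriers/non-carriers, c/d are control carriers/non-carriers.
    Tests for an excess of carriers among cases.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1))
    return tail / denom


def collapsing_test(case_genotypes, control_genotypes):
    """Collapse variants per individual, then compare carrier counts between groups."""
    case_flags = collapse(case_genotypes)
    control_flags = collapse(control_genotypes)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return (a, b, c, d), fisher_exact_greater(a, b, c, d)
```

Collapsing trades resolution for power: a single gene-level test replaces many underpowered single-variant tests, at the cost of diluting the signal when many of the grouped variants are neutral.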

Proceedings Article•

Genome-wide association mapping and rare alleles: from population genomics to personalized medicine

[...]

Francisco M. De La Vega¹, Carlos Bustamante², Suzanne M. Leal³•Institutions (3)

Life Technologies¹, Stanford University², Baylor College of Medicine³

01 Jan 2011

TL;DR: A number of statistical methods for testing associations between rare variants in two genes and obesity are described, including a locus-based method that has high power in the presence of rare variants and that incorporates base quality scores available for sequencing data.

Abstract: Genome-wide association studies (GWAS) have been very successful in identifying common genetic variation associated with numerous complex diseases [1]. However, most of the identified common genetic variants appear to confer modest risk and few causal alleles have been identified [2]. Furthermore, these associations account for a small portion of the total heritability of inherited disease variation [1]. This has led to the reexamination of the contribution of environment, gene-gene and gene-environment interactions, and rare genetic variants in complex diseases [1, 3, 4]. There is strong evidence that rare variants play an important role in complex disease etiology and may have larger genetic effects than common variants [2]. Currently, much of what we know regarding the contribution of rare genetic variants to disease risk is based on a limited number of phenotypes and candidate genes. However, rapid advancement of second generation sequencing technologies will invariably lead to widespread association studies comparing whole exome and eventually whole genome sequencing of cases and controls. A tremendous challenge for enabling these “next generation” medical genomic studies is developing statistical approaches for correlating rare genetic variants with disease outcome. The analysis of rare variants is challenging since methods used for common variants are woefully underpowered. Therefore, methods that can deal with genetic heterogeneity at the trait-associated locus have been developed to analyze rare variants. Instead of analyzing individual variants, these methods analyze variants within a region/gene as a group and usually rely on collapsing. Methods that can be applied to both case-control and quantitative trait studies are needed. The paper of Bansal et al. in this volume describes the application of a number of statistical methods for testing associations between rare variants in two genes and obesity.
The authors considered the relative merits of the different methods as well as important implementation details, such as the leveraging of genomic annotations and determining p-values. Knowledge of haplotypes can increase the power of GWAS and also highlight associations that are impossible to detect without haplotype phase (e.g. loss of heterozygosity). Even more complicated phase-dependent interactions of variants in linkage equilibrium have also been suggested as possible causes of missing heritability. In their work, Halldórsson et al. formulate algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data. These methods would allow testing haplotypes harboring rare variants for association and potentially increase their explanatory power. Since single SNP tests are often underpowered in rare variant association analysis, Zeggini and Asimit propose a locus-based method that has high power in the presence of rare variants and that incorporates base quality scores available for sequencing data. Their results suggest that this multi-marker approach may be best suited for smaller regions, or after some filtering to reduce the number of SNPs that are jointly tested to reduce loss of power due to multiple-testing adjustments. Finally, the paper of Zhou et al. presents a penalized regression framework for association testing on sequence data, in the presence of both common and rare variants. This method also introduces the use of weights to incorporate available biological information on the variants. Although these tactics improve both false positive and false negative rates, they represent an incremental development and there is still significant room for improvement. With the development of sequencing technologies and methods to detect complex trait rare variant associations, many new and exciting discoveries are imminent.
The analysis of rare variants is still in its infancy and the next few years promise to produce many new methods to meet the special demands of analyzing this type of data.

Posted Content•

A robust approach to estimating rates from time-correlation functions

[...]

John D. Chodera, Phillip Elms, William C. Swope, Jan-Hendrik Prinz, Susan Marqusee, Carlos Bustamante, Frank Noé, Vijay S. Pande

10 Aug 2011-arXiv: Statistical Mechanics

TL;DR: A modified version of reactive flux theory is presented that does not require numerical derivatives, allowing rate constants to be robustly estimated directly from the time-correlation function.

Abstract: While seemingly straightforward in principle, the reliable estimation of rate constants is seldom easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of an appropriate reactive flux correlation function. However, when applied to real data from single-molecule experiments or molecular dynamics simulations, the reactive flux correlation function requires the numerical differentiation of a noisy empirical correlation function, which can result in an unacceptably poor estimate of the rate and pathological dependence on the sampling interval. We present a modified version of this theory which does not require numerical derivatives, allowing rate constants to be robustly estimated from the time-correlation function directly. We illustrate the approach using single-molecule passive force spectroscopy measurements of an RNA hairpin.

Journal Article•DOI•

Levels and Patterns of Nucleotide Variation in Domestication QTL Regions on Rice Chromosome 3 Suggest Lineage-Specific Selection

[...]

Xianfa Xie¹, Jeanmaire Molina¹, Ryan D. Hernandez², Andrew R. Reynolds³, Adam R. Boyko⁴, Carlos Bustamante⁴, Michael D. Purugganan¹•Institutions (4)

New York University¹, University of Chicago², Cornell University³, Stanford University⁴

06 Jun 2011-PLOS ONE

TL;DR: The results suggest that the genetic and selective basis for domestication differs between two Asian rice varietal groups, tropical japonica and indica.

Abstract: Oryza sativa or Asian cultivated rice is one of the major cereal grass species domesticated for human food use during the Neolithic. Domestication of this species from the wild grass Oryza rufipogon was accompanied by changes in several traits, including seed shattering, percent seed set, tillering, grain weight, and flowering time. Quantitative trait locus (QTL) mapping has identified three genomic regions in chromosome 3 that appear to be associated with these traits. We would like to study whether these regions show signatures of selection and whether the same genetic basis underlies the domestication of different rice varieties. Fragments of 88 genes spanning these three genomic regions were sequenced from multiple accessions of two major varietal groups in O. sativa--indica and tropical japonica--as well as the ancestral wild rice species O. rufipogon. In tropical japonica, the levels of nucleotide variation in these three QTL regions are significantly lower compared to genome-wide levels, and coalescent simulations based on a complex demographic model of rice domestication indicate that these patterns are consistent with selection. In contrast, there is no significant reduction in nucleotide diversity in the homologous regions in indica rice. These results suggest that there are differences in the genetic and selective basis for domestication between these two Asian rice varietal groups.
