Showing papers on "Genomics" published in 2012

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; it is shown to improve on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article•DOI•

An integrated map of genetic variation from 1,092 human genomes

Gonçalo R. Abecasis¹, Adam Auton², Lisa D. Brooks³, Mark A. DePristo⁴, Richard Durbin⁵, Robert E. Handsaker⁴,⁶, Hyun Min Kang¹, Gabor T. Marth⁷, Gil McVean⁸•Institutions (8)

University of Michigan¹, Yeshiva University², National Institutes of Health³, Broad Institute⁴, Wellcome Trust Sanger Institute⁵, Harvard University⁶, Boston College⁷, University of Oxford⁸

01 Nov 2012-Nature

TL;DR: It is shown that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites.

Abstract: By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.

7,710 citations

Journal Article•DOI•

Landscape of transcription in human cells

Sarah Djebali, Carrie A. Davis¹, Angelika Merkel, Alexander Dobin¹, Timo Lassmann, Ali Mortazavi²,³, Andrea Tanzer, Julien Lagarde, Wei Lin¹, Felix Schlesinger¹, Chenghai Xue¹, Georgi K. Marinov², Jainab Khatun⁴, Brian A. Williams², Chris Zaleski¹, Joel Rozowsky⁵, Marion S. Röder, Felix Kokocinski⁶, Rehab F. Abdelhamid, Tyler Alioto, Igor Antoshechkin², Michael T. Baer¹, Nadav Bar⁷, Philippe Batut¹, Kimberly Bell¹, Ian Bell⁸, Sudipto K. Chakrabortty¹, Xian Chen⁹, Jacqueline Chrast¹⁰, Joao Curado, Thomas Derrien, Jorg Drenkow¹, Erica Dumais⁸, Jacqueline Dumais⁸, Radha Duttagupta⁸, Emilie Falconnet¹¹, Meagan Fastuca¹, Kata Fejes-Toth¹, Pedro G. Ferreira, Sylvain Foissac⁸, Melissa J. Fullwood¹², Hui Gao⁸, David Gonzalez, Assaf Gordon¹, Harsha P. Gunawardena⁹, Cédric Howald¹⁰, Sonali Jha¹, Rory Johnson, Philipp Kapranov⁸, Brandon King², Colin Kingswood, Oscar Junhong Luo¹², Eddie Park³, Kimberly Persaud¹, Jonathan B. Preall¹, Paolo Ribeca, Brian A. Risk⁴, Daniel Robyr¹¹, Michael Sammeth, Lorian Schaffer², Lei-Hoon See¹, Atif Shahab¹², Jørgen Skancke⁷, Ana Maria Suzuki, Hazuki Takahashi, Hagen Tilgner¹³, Diane Trout², Nathalie Walters¹⁰, Huaien Wang¹, John A. Wrobel⁴, Yanbao Yu⁹, Xiaoan Ruan¹², Yoshihide Hayashizaki, Jennifer Harrow⁶, Mark Gerstein⁵, Tim Hubbard⁶, Alexandre Reymond¹⁰, Stylianos E. Antonarakis¹¹, Gregory J. Hannon¹, Morgan C. Giddings⁴,⁹, Yijun Ruan¹², Barbara J. Wold², Piero Carninci, Roderic Guigó¹⁴, Thomas R. Gingeras¹,⁸•Institutions (14)

Cold Spring Harbor Laboratory¹, California Institute of Technology², University of California, Irvine³, Florida State University College of Arts and Sciences⁴, Yale University⁵, Wellcome Trust Sanger Institute⁶, Norwegian University of Science and Technology⁷, Affymetrix⁸, University of North Carolina at Chapel Hill⁹, University of Lausanne¹⁰, University of Geneva¹¹, Genome Institute of Singapore¹², Stanford University¹³, Pompeu Fabra University¹⁴

06 Sep 2012-Nature

TL;DR: Evidence that three-quarters of the human genome is capable of being transcribed is reported, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs that prompt a redefinition of the concept of a gene.

Abstract: Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

4,450 citations

Journal Article•DOI•

Phytozome: a comparative platform for green plant genomics

David Goodstein¹, Shengqiang Shu¹, Russell Howson¹, Rochak Neupane¹, Richard D. Hayes¹, Joni Fazo¹, Therese Mitros¹, William Dirks¹, Uffe Hellsten¹, Nicholas H. Putnam¹, Daniel S. Rokhsar¹•Institutions (1)

United States Department of Energy¹

01 Jan 2012-Nucleic Acids Research

TL;DR: Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number of complete plant genomes.

Abstract: The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

3,728 citations

Journal Article•DOI•

HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants

Lucas D. Ward¹, Manolis Kellis¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2012-Nucleic Acids Research

TL;DR: HaploReg is presented, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants, which will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation.

Abstract: The resolution of genome-wide association studies (GWAS) is limited by the linkage disequilibrium (LD) structure of the population being studied. Selecting the most likely causal variants within an LD block is relatively straightforward within coding sequence, but is more difficult when all variants are intergenic. Predicting functional non-coding sequence has been recently facilitated by the availability of conservation and epigenomic information. We present HaploReg, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state in nine cell types, conservation across mammals and their effect on regulatory motifs. Sets of SNPs, such as those resulting from GWAS, are analyzed for an enrichment of cell type-specific enhancers. HaploReg will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation. The HaploReg database is available at http://compbio.mit.edu/HaploReg.

2,075 citations

Journal Article•DOI•

The oyster genome reveals stress adaptation and complexity of shell formation

Guofan Zhang¹, Xiaodong Fang, Ximing Guo², Li Li, Ruibang Luo, Fei Xu, Pengcheng Yang, Linlin Zhang, Xiaotong Wang, Haigang Qi, Zhiqiang Xiong, Huayong Que, Yinlong Xie, Peter W. H. Holland³, Jordi Paps³, Yabing Zhu, Fucun Wu, Yuanxin Chen, Jiafeng Wang, Chunfang Peng, Jie Meng, Lan Yang, Jun Liu, Bo Wen, Na Zhang, Zhiyong Huang, Qihui Zhu, Yue Feng, Andrew S. Mount⁴, Dennis Hedgecock⁵, Zhe Xu⁶, Yunjie Liu, Tomislav Domazet-Lošo, Yishuai Du, Xiaoqing Sun, Shoudu Zhang, Binghang Liu, Peizhou Cheng, Xuanting Jiang, Juan Li, Dingding Fan, Wei Wang, Wenjing Fu, Tong Wang, Bo Wang, Jibiao Zhang, Zhiyu Peng, Yingxiang Li, Na Li, Jinpeng Wang, Maoshan Chen, Yan He², Fengji Tan, Xiaorui Song, Qiumei Zheng, Ronglian Huang, Hailong Yang, Du Xuedi, Li Chen, Mei Yang, Patrick M. Gaffney⁷, Shan Wang², Longhai Luo, Zhicai She, Yao Ming, Huang Wen, Shu Zhang, Baoyu Huang, Yong Zhang, Tao Qu, Peixiang Ni, Guoying Miao, Junyi Wang, Qiang Wang, Christian E. W. Steinberg⁸, Haiyan Wang, Ning Li, Lumin Qian², Guojie Zhang, Yingrui Li, Huanming Yang, Xiao Liu, Jian Wang, Ye Yin, Jun Wang⁹•Institutions (9)

Chinese Academy of Sciences¹, Rutgers University², University of Oxford³, Clemson University⁴, University of Southern California⁵, Atlantic Cape Community College⁶, University of Delaware⁷, Humboldt University of Berlin⁸, University of Copenhagen⁹

04 Oct 2012-Nature

TL;DR: The sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy are reported, along with transcriptomes of development and stress response and the proteome of the shell; the analyses show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes.

Abstract: The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.

1,806 citations

Journal Article•DOI•

The Drosophila melanogaster Genetic Reference Panel

Trudy F. C. Mackay¹, Stephen Richards², Eric A. Stone¹, Antonio Barbadilla, Julien F. Ayroles¹,³, Dianhui Zhu², Sònia Casillas, Yi Han², Michael M. Magwire¹, Julie M. Cridland⁴, Mark F. Richardson⁵, Robert R. H. Anholt¹, Maite G. Barrón, Crystal Bess², Kerstin P. Blankenburg², Mary Anna Carbone¹, David Castellano, Lesley S. Chaboub², Laura H Duncan¹, Zeke Harris¹, Mehwish Javaid², Joy Jayaseelan², Shalini N. Jhangiani², Katherine W. Jordan¹, Fremiet Lara², Faye Lawrence¹, Sandra L. Lee², Pablo Librado⁶, Raquel S. Linheiro⁵, Richard F. Lyman¹, Aaron J. Mackey⁷, Mala Munidasa², Donna M. Muzny², Lynne V. Nazareth², Irene Newsham, Lora Perales², Ling-Ling Pu², Carson Qu², Miquel Ràmia, Jeffrey G. Reid², Stephanie M. Rollmann¹,⁸, Julio Rozas⁶, Nehad Saada², Lavanya Turlapati¹, Kim C. Worley², Yuanqing Wu², Akihiko Yamamoto¹, Yiming Zhu², Casey M. Bergman⁵, Kevin R. Thornton⁴, David Mittelman⁹, Richard A. Gibbs²•Institutions (9)

North Carolina State University¹, Baylor College of Medicine², Harvard University³, University of California, Irvine⁴, University of Manchester⁵, University of Barcelona⁶, University of Virginia⁷, University of Cincinnati⁸, Virginia Bioinformatics Institute⁹

09 Feb 2012-Nature

TL;DR: The Drosophila melanogaster Genetic Reference Panel is described, a community resource for analysis of population genomics and quantitative traits, which reveals reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome.

Abstract: A major challenge of biology is understanding the relationship between molecular genetic variation and variation in quantitative traits, including fitness. This relationship determines our ability to predict phenotypes from genotypes and to understand how evolutionary forces shape variation within and between species. Previous efforts to dissect the genotype-phenotype map were based on incomplete genotypic information. Here, we describe the Drosophila melanogaster Genetic Reference Panel (DGRP), a community resource for analysis of population genomics and quantitative traits. The DGRP consists of fully sequenced inbred lines derived from a natural population. Population genomic analyses reveal reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome. Many variants in novel genes, most at low frequency, are associated with quantitative traits and explain a large fraction of the phenotypic variance. The DGRP facilitates genotype-phenotype mapping using the power of Drosophila genetics.

1,568 citations

Journal Article•DOI•

Development of High-Density Genetic Maps for Barley and Wheat Using a Novel Two-Enzyme Genotyping-by-Sequencing Approach

Jesse Poland¹,², Patrick J. Brown³, Mark E. Sorrells⁴, Jean-Luc Jannink¹,⁴•Institutions (4)

United States Department of Agriculture¹, Kansas State University², University of Illinois at Urbana–Champaign³, Cornell University⁴

28 Feb 2012-PLOS ONE

TL;DR: The GBS approach presented here provides a powerful method of developing high-density markers in species without a sequenced genome while providing valuable tools for anchoring and ordering physical maps and whole-genome shotgun sequence.

Abstract: Advancements in next-generation sequencing technology have enabled whole genome re-sequencing in many species providing unprecedented discovery and characterization of molecular polymorphisms. There are limitations, however, to next-generation sequencing approaches for species with large complex genomes such as barley and wheat. Genotyping-by-sequencing (GBS) has been developed as a tool for association studies and genomics-assisted breeding in a range of species including those with complex genomes. GBS uses restriction enzymes for targeted complexity reduction followed by multiplex sequencing to produce high-quality polymorphism data at a relatively low per sample cost. Here we present a GBS approach for species that currently lack a reference genome sequence. We developed a novel two-enzyme GBS protocol and genotyped bi-parental barley and wheat populations to develop a genetically anchored reference map of identified SNPs and tags. We were able to map over 34,000 SNPs and 240,000 tags onto the Oregon Wolfe Barley reference map, and 20,000 SNPs and 367,000 tags on the Synthetic W9784 × Opata85 (SynOpDH) wheat reference map. To further evaluate GBS in wheat, we also constructed a de novo genetic map using only SNP markers from the GBS data. The GBS approach presented here provides a powerful method of developing high-density markers in species without a sequenced genome while providing valuable tools for anchoring and ordering physical maps and whole-genome shotgun sequence. Development of the sequenced reference genome(s) will in turn increase the utility of GBS data enabling physical mapping of genes and haplotype imputation of missing data. Finally, as a result of low per-sample costs, GBS will have broad application in genomics-assisted plant breeding programs.
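
As a rough illustration of the complexity-reduction step described in this abstract, the following Python sketch performs an in-silico double digest and keeps only the fragments bounded by one site of each enzyme. It is not the authors' pipeline. The recognition sequences (CTGCAG for PstI, CCGG for MspI), the function name `double_digest`, the rule of keeping mixed-end fragments, and the toy sequence are all assumptions chosen for illustration; the abstract does not name the enzymes, and the sketch splits at the start of each recognition site rather than at the true cut position.

```python
import re
from typing import List, Tuple


def double_digest(
    seq: str,
    rare: str = "CTGCAG",   # assumed rare-cutter recognition site (PstI)
    common: str = "CCGG",   # assumed common-cutter recognition site (MspI)
) -> List[Tuple[int, int, str]]:
    """Return (start, end, fragment) for fragments flanked by one site of each enzyme.

    Simplification: each fragment runs from the start of one recognition
    site to the start of the next, ignoring the exact cut offset.
    """
    # Collect every recognition-site position, labelled by enzyme.
    sites = sorted(
        [(m.start(), "rare") for m in re.finditer(rare, seq)]
        + [(m.start(), "common") for m in re.finditer(common, seq)]
    )
    fragments = []
    # Walk adjacent site pairs; keep fragments whose two ends differ.
    for (start, left), (end, right) in zip(sites, sites[1:]):
        if left != right:
            fragments.append((start, end, seq[start:end]))
    return fragments


if __name__ == "__main__":
    # Hypothetical toy sequence, not real barley or wheat data.
    toy = "AAAACTGCAGTTTTCCGGAAAACCGGTTTTCTGCAGAA"
    for start, end, frag in double_digest(toy):
        print(start, end, frag)
```

Restricting the library to mixed-end fragments is one way to picture how a two-enzyme design samples a reduced, repeatable subset of a large genome; real protocols add adapter ligation, size selection and barcoding, none of which is modelled here.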

1,492 citations

Journal Article•DOI•

methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles

Altuna Akalin¹, Matthias Kormaksson¹, Sheng Li¹, Francine E. Garrett-Bakelman¹, Maria E. Figueroa², Ari Melnick¹, Christopher E. Mason¹•Institutions (2)

Cornell University¹, University of Michigan²

03 Oct 2012-Genome Biology

TL;DR: An R package that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments is described, which includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation.

Abstract: DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit.

1,395 citations

Journal Article•DOI•

A map of the cis-regulatory sequences in the mouse genome

Yin Shen¹, Feng Yue¹, David F. McCleary¹, Zhen Ye¹, Lee Edsall¹, Samantha Kuan¹, Ulrich Wagner¹, Jesse R. Dixon¹,², Leonard Lee¹, Victor V. Lobanenkov³, Bing Ren¹,²•Institutions (3)

Ludwig Institute for Cancer Research¹, University of California, San Diego², National Institutes of Health³

02 Aug 2012-Nature

TL;DR: It is shown that much of the mouse genome is organized into domains of coordinately regulated enhancers and promoters, which provides a resource for the annotation of functional elements in the mammalian genome and for the study of mechanisms regulating tissue-specific gene expression.

Abstract: The identification of cis-regulatory sequences in the mouse genome has lagged behind that of other model organisms. Here, a genomic map of nearly 300,000 potential cis-regulatory sequences has been experimentally determined from diverse mouse tissues and cell types. The map reveals active promoters, enhancers and CTCF (CCCTC-binding factor) sites in nearly 11% of the mouse genome and significantly expands the annotation of mammalian regulatory sequences. The laboratory mouse is the most widely used mammalian model organism in biomedical research. The 2.6 × 10⁹ bases of the mouse genome possess a high degree of conservation with the human genome [1], so a thorough annotation of the mouse genome will be of significant value to understanding the function of the human genome. So far, most of the functional sequences in the mouse genome have yet to be found, and the cis-regulatory sequences in particular are still poorly annotated. Comparative genomics has been a powerful tool for the discovery of these sequences [2], but on its own it cannot resolve their temporal and spatial functions. Recently, ChIP-Seq has been developed to identify cis-regulatory elements in the genomes of several organisms including humans, Drosophila melanogaster and Caenorhabditis elegans [3–5]. Here we apply the same experimental approach to a diverse set of 19 tissues and cell types in the mouse to produce a map of nearly 300,000 murine cis-regulatory sequences. The annotated sequences add up to 11% of the mouse genome, and include more than 70% of conserved non-coding sequences. We define tissue-specific enhancers and identify potential transcription factors regulating gene expression in each tissue or cell type. Finally, we show that much of the mouse genome is organized into domains of coordinately regulated enhancers and promoters. Our results provide a resource for the annotation of functional elements in the mammalian genome and for the study of mechanisms regulating tissue-specific gene expression.

Journal Article•DOI•

A physical, genetic and functional sequence assembly of the barley genome

Klaus F. X. Mayer, Robbie Waugh¹, Peter Langridge², Timothy J. Close³, Roger P. Wise⁴, Andreas Graner⁵, Takashi Matsumoto⁶, Kazuhiro Sato⁷, Alan H. Schulman⁸, Ruvini Ariyadasa⁵, Daniela Schulte⁵, Naser Poursarebani⁵, Ruonan Zhou⁵, Burkhard Steuernagel⁵, Martin Mascher⁵, Uwe Scholz⁵, Bu-Jun Shi², Kavitha Madishetty³, Jan T. Svensson³, Prasanna R. Bhat³, Matthew J. Moscou³, Josh Resnik³, Gary J. Muehlbauer, Pete E. Hedley¹, Hui Liu¹, Jenny Morris¹, Zeev Frenkel⁹, Avraham Korol⁹, Hélène Bergès¹⁰, Stefan Taudien¹¹, Marius Felder¹¹, Marco Groth¹¹, Matthias Platzer¹¹, Axel Himmelbach⁵, Stefano Lonardi³, Denisa Duma³, Matthew Alpert³, Francesa Cordero³,¹², Marco Beccuti³, Gianfranco Ciardo³, Yaqin Ma³, Steve Wanamaker³, Federica Cattonaro, Vera Vendramin¹³, Simone Scalabrin, Slobodanka Radovic¹³, Rod A. Wing¹⁴, Michele Morgante¹³, Thomas Nussbaumer, Heidrun Gundlach, Mihaela Martis, Jesse Poland¹⁵, Matthias Pfeifer, Cédric Moisy⁸, Jaakko Tanskanen⁸, Andrea Zuccolo, Manuel Spannagl, Joanne Russell¹, Arnis Druka¹, David Marshall¹, Micha Bayer¹, David Swarbreck, Dharanya Sampath, Sarah Ayling, Melanie Febrer, Mario Caccamo, Tsuyoshi Tanaka⁶, Steve Wannamaker³, Thomas Schmutzer⁵, John W. S. Brown¹,¹⁶, Geoffrey B. Fincher², Nils Stein⁵•Institutions (16)

James Hutton Institute¹, University of Adelaide², University of California, Riverside³, Iowa State University⁴, Leibniz Association⁵, University of Tsukuba⁶, Okayama University⁷, University of Helsinki⁸, University of Haifa⁹, Institut national de la recherche agronomique¹⁰, National Institutes of Health¹¹, University of Turin¹², University of Udine¹³, University of Arizona¹⁴, Kansas State University¹⁵, University of Dundee¹⁶

29 Nov 2012-Nature

TL;DR: An integrated and ordered physical, genetic and functional sequence resource is presented that describes the barley gene-space in a structured whole-genome context; the data suggest that post-transcriptional processing forms an important regulatory layer.

Abstract: Barley (Hordeum vulgare L.) is among the world's earliest domesticated and most important crop plants. It is diploid with a large haploid genome of 5.1 gigabases (Gb). Here we present an integrated and ordered physical, genetic and functional sequence resource that describes the barley gene-space in a structured whole-genome context. We developed a physical map of 4.98 Gb, with more than 3.90 Gb anchored to a high-resolution genetic map. Projecting a deep whole-genome shotgun assembly, complementary DNA and deep RNA sequence data onto this framework supports 79,379 transcript clusters, including 26,159 'high-confidence' genes with homology support from other plant genomes. Abundant alternative splicing, premature termination codons and novel transcriptionally active regions suggest that post-transcriptional processing forms an important regulatory layer. Survey sequences from diverse accessions reveal a landscape of extensive single-nucleotide variation. Our data provide a platform for both genome-assisted research and enabling contemporary crop improvement.

Journal Article•DOI•

LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets

Andreas Wilm¹, Pauline Poh Kim Aw¹, Denis Bertrand¹, Grace Hui Ting Yeo¹, Swee Hoe Ong¹, Chang Hua Wong¹, Chiea Chuen Khor¹, Rosemary Petric¹, Martin L. Hibberd¹, Niranjan Nagarajan¹•Institutions (1)

Genome Institute of Singapore¹

01 Dec 2012-Nucleic Acids Research

TL;DR: It is shown that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics.

Abstract: The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
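
To make the idea of quality-aware variant calling concrete, here is a minimal Python sketch of a per-position significance test. It assumes each read's base-call error probability is derived from its Phred quality and treats reads as independent, so the number of erroneous bases follows a Poisson-binomial distribution. This is an illustration under those assumptions, not LoFreq's implementation; the function names, the significance threshold and the example qualities are hypothetical, and the sketch ignores mapping errors, strand bias and multiple-testing correction.

```python
from typing import Sequence


def phred_to_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_at_least_k_errors(error_probs: Sequence[float], k: int) -> float:
    """P(at least k reads carry an error) under a Poisson-binomial model."""
    n = len(error_probs)
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    # dist[j] = probability of exactly j errors among the reads seen so far.
    dist = [1.0] + [0.0] * n
    for i, p in enumerate(error_probs, start=1):
        # Update from high j to low j so dist[j - 1] is still the old value.
        for j in range(i, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return sum(dist[k:])


def looks_like_variant(
    quals: Sequence[float], alt_count: int, alpha: float = 1e-5
) -> bool:
    """Flag a position when alt_count non-reference bases are unlikely to be errors."""
    p_value = prob_at_least_k_errors([phred_to_prob(q) for q in quals], alt_count)
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical column: 1000 reads at Q30, 10 of them non-reference.
    quals = [30.0] * 1000
    print(looks_like_variant(quals, alt_count=10))
    print(looks_like_variant(quals, alt_count=1))
```

The dynamic-programming update costs O(n·k) in the worst case, which hints at why deep-coverage datasets make exact tests of this kind computationally demanding.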

Journal Article•DOI•

Analysis of the bread wheat genome using whole-genome shotgun sequencing

Rachel Brenchley¹, Manuel Spannagl, Matthias Pfeifer, Gary L A Barker², Rosalinda D’Amore¹, Alexandra M. Allen², Neil McKenzie³, Melissa Kramer⁴, Arnaud Kerhornou, Dan Bolser, Suzanne Kay¹, Darren Waite³, Martin Trick³, Ian Bancroft³, Y. Q. Gu⁵, Naxin Huo⁵, Ming-Cheng Luo⁶, Sunish K. Sehgal⁷, Bikram S. Gill⁷, S. F. Kianian, Olin D. Anderson⁵, Paul J. Kersey, Jan Dvorak⁶, W. Richard McCombie⁴, Anthony Hall¹, Klaus F. X. Mayer, Keith J. Edwards², Michael W. Bevan³, Neil Hall¹•Institutions (7)

University of Liverpool¹, University of Bristol², John Innes Centre³, Cold Spring Harbor Laboratory⁴, United States Department of Agriculture⁵, University of California, Davis⁶, Kansas State University⁷

29 Nov 2012-Nature

TL;DR: It is shown that the hexaploid genome is highly dynamic, with significant loss of gene family members on polyploidization and domestication, and an abundance of gene fragments.

Abstract: Bread wheat (Triticum aestivum) is a globally important crop, accounting for 20 per cent of the calories consumed by humans. Major efforts are underway worldwide to increase wheat production by extending genetic diversity and analysing key traits, and genomic resources can accelerate progress. But so far the very large size and polyploid complexity of the bread wheat genome have been substantial barriers to genome analysis. Here we report the sequencing of its large, 17-gigabase-pair, hexaploid genome using 454 pyrosequencing, and comparison of this with the sequences of diploid ancestral and progenitor genomes. We identified between 94,000 and 96,000 genes, and assigned two-thirds to the three component genomes (A, B and D) of hexaploid wheat. High-resolution synteny maps identified many small disruptions to conserved gene order. We show that the hexaploid genome is highly dynamic, with significant loss of gene family members on polyploidization and domestication, and an abundance of gene fragments. Several classes of genes involved in energy harvesting, metabolism and growth are among expanded gene families that could be associated with crop productivity. Our analyses, coupled with the identification of extensive genetic variation, provide a resource for accelerating gene discovery and improving this major crop.

Journal Article•DOI•

In vivo genome editing using a high-efficiency TALEN system

Victoria M. Bedell¹, Ying Wang², Jarryd M. Campbell¹, Tanya L. Poshusta¹, Colby G. Starker³, Randall G. Krug¹, Wenfang Tan³, Sumedha G. Penheiter¹, Alvin C.H. Ma⁴, Alvin C.H. Ma¹, Anskar Y.H. Leung⁴, Scott C. Fahrenkrug³, Daniel F. Carlson³, Daniel F. Voytas³, Karl J. Clark¹, Jeffrey J. Essner², Stephen C. Ekker¹ - Show less +13 more•Institutions (4)

Mayo Clinic¹, Iowa State University², University of Minnesota³, University of Hong Kong⁴

01 Nov 2012-Nature

TL;DR: Improvements in artificial transcription activator-like effector nucleases (TALENs) provide a powerful new approach for targeted zebrafish genome editing and functional genomic applications and offer the potential to model genetic variation as well as to generate targeted conditional alleles.

Abstract: The zebrafish (Danio rerio) is increasingly being used to study basic vertebrate biology and human disease with a rich array of in vivo genetic and molecular tools. However, the inability to readily modify the genome in a targeted fashion has been a bottleneck in the field. Here we show that improvements in artificial transcription activator-like effector nucleases (TALENs) provide a powerful new approach for targeted zebrafish genome editing and functional genomic applications. Using the GoldyTALEN modified scaffold and zebrafish delivery system, we show that this enhanced TALEN toolkit has a high efficiency in inducing locus-specific DNA breaks in somatic and germline tissues. At some loci, this efficacy approaches 100%, including biallelic conversion in somatic tissues that mimics phenotypes seen using morpholino-based targeted gene knockdowns. With this updated TALEN system, we successfully used single-stranded DNA oligonucleotides to precisely modify sequences at predefined locations in the zebrafish genome through homology-directed repair, including the introduction of a custom-designed EcoRV site and a modified loxP (mloxP) sequence into somatic tissue in vivo. We further show successful germline transmission of both EcoRV and mloxP engineered chromosomes. This combined approach offers the potential to model genetic variation as well as to generate targeted conditional alleles.

Journal Article•DOI•

GAGE: A critical evaluation of genome assemblies and assembly algorithms

Steven L. Salzberg¹, Adam M. Phillippy², Aleksey V. Zimin³, Daniela Puiu⁴, Tanja Magoc⁴, Sergey Koren³, Sergey Koren², Todd J. Treangen⁴, Michael C. Schatz⁵, Arthur L. Delcher, Michael Roberts³, Guillaume Marçais³, Mihai Pop³, James A. Yorke³ - Show less +10 more•Institutions (5)

Johns Hopkins University School of Medicine¹, Battelle Memorial Institute², University of Maryland, College Park³, Johns Hopkins University⁴, Cold Spring Harbor Laboratory⁵

01 Mar 2012-Genome Research

TL;DR: An evaluation of several of the leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers concludes that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.

Abstract: New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.
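As a concrete example of a contiguity statistic, the sketch below computes N50, a widely used summary of assembly contiguity. The abstract does not name the statistics used in the study, so treating N50 as representative is an assumption, and the function name is hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly: the largest length L such that
    contigs of length >= L together cover at least half of the
    total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

A statistic like this says nothing about whether contigs were joined correctly, which is consistent with the abstract's third conclusion that correctness is not well correlated with contiguity.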

Journal Article•DOI•

The genomics of speciation-with-gene-flow

Jeffrey L. Feder¹, Scott P. Egan¹, Patrik Nosil², Patrik Nosil³•Institutions (3)

University of Notre Dame¹, University of Colorado Boulder², University of Sheffield³

01 Jul 2012-Trends in Genetics

TL;DR: A theory predicting four phases of speciation, defined by changes in the relative effectiveness of divergence and genome hitchhiking, is described and future directions are outlined, emphasizing the need to couple next-generation sequencing with selection, transplant, functional genomics, and mapping studies.

Journal Article•DOI•

Insights into hominid evolution from the gorilla genome sequence

Aylwyn Scally¹, Julien Y. Dutheil², LaDeana W. Hillier³, Gregory E. Jordan⁴, Ian Goodhead¹, Javier Herrero⁴, Asger Hobolth², Tuuli Lappalainen⁵, Thomas Mailund², Tomas Marques-Bonet⁶, Tomas Marques-Bonet³, Tomas Marques-Bonet⁷, Shane A. McCarthy¹, Stephen H. Montgomery⁸, Petra C. Schwalie⁴, Y. Amy Tang¹, Michelle C Ward⁸, Yali Xue¹, Bryndis Yngvadottir¹, Can Alkan³, Lars Nørvang Andersen², Qasim Ayub¹, Edward V. Ball⁹, Kathryn Beal⁴, Brenda J. Bradley⁸, Brenda J. Bradley¹⁰, Yuan Chen¹, Chris Clee¹, Stephen Fitzgerald⁴, Tina Graves¹¹, Yong Gu¹, Paul Heath¹, Andreas Heger¹², Emre Karakoc³, Anja Kolb-Kokocinski¹, Gavin K. Laird¹, Gerton Lunter¹³, Stephen Meader¹², Matthew Mort⁹, James C. Mullikin¹⁴, Kasper Munch², Timothy D. O’Connor⁸, Andrew David Phillips⁹, Javier Prado-Martinez⁷, Anthony Rogers¹, Saba Sajjadian³, Dominic Schmidt⁸, Katy Shaw⁹, Jared T. Simpson¹, Peter D. Stenson⁹, Daniel J. Turner¹, Linda Vigilant¹⁵, Albert J. Vilella⁴, Weldon Whitener¹, Baoli Zhu¹⁶, David Neil Cooper⁹, Pieter J. de Jong¹⁶, Emmanouil T. Dermitzakis⁵, Evan E. Eichler³, Paul Flicek⁴, Nick Goldman⁴, Nicholas I. Mundy⁸, Zemin Ning¹, Duncan T. Odom⁸, Duncan T. Odom¹, Chris P. Ponting¹², Michael A. Quail¹, Oliver A. Ryder, Stephen M. J. Searle¹, Wesley C. Warren¹¹, Richard K. Wilson¹¹, Mikkel H. Schierup², Jane Rogers¹, Chris Tyler-Smith¹, Richard Durbin¹ - Show less +71 more•Institutions (16)

Wellcome Trust Sanger Institute¹, Aarhus University², University of Washington³, European Bioinformatics Institute⁴, University of Geneva⁵, Catalan Institution for Research and Advanced Studies⁶, Spanish National Research Council⁷, University of Cambridge⁸, Cardiff University⁹, Yale University¹⁰, Washington University in St. Louis¹¹, University of Oxford¹², Wellcome Trust Centre for Human Genetics¹³, National Institutes of Health¹⁴, Max Planck Society¹⁵, Children's Hospital Oakland Research Institute¹⁶

08 Mar 2012-Nature

TL;DR: A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing.

Abstract: Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.

Journal Article•DOI•

Transforming clinical microbiology with bacterial genome sequencing

Xavier Didelot¹, Rory Bowden², Rory Bowden³, Rory Bowden¹, Daniel J. Wilson³, Daniel J. Wilson², Tim E. A. Peto², Derrick W. Crook² - Show less +4 more•Institutions (3)

University of Oxford¹, John Radcliffe Hospital², Wellcome Trust Centre for Human Genetics³

01 Sep 2012-Nature Reviews Genetics

TL;DR: It is predicted that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

Abstract: Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

Journal Article•DOI•

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions

Kasper D. Hansen¹, Benjamin Langmead¹, Benjamin Langmead², Rafael A. Irizarry¹, Rafael A. Irizarry² - Show less +1 more•Institutions (2)

Johns Hopkins University¹, Johns Hopkins University School of Medicine²

03 Oct 2012-Genome Biology

TL;DR: BSmooth, an alignment, quality control and analysis pipeline, is presented; it provides accurate and precise results even with low coverage data and appropriately handles biological replicates.

Abstract: DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth.

Journal Article•DOI•

Chromatin organization is a major influence on regional mutation rates in human cancer cells

Benjamin Schuster-Böckler¹, Ben Lehner², Ben Lehner¹•Institutions (2)

European Bioinformatics Institute¹, Catalan Institution for Research and Advanced Studies²

23 Aug 2012-Nature

TL;DR: Testing diverse genetic and epigenetic features shows that mutation rates in cancer genomes are strikingly related to chromatin organization, suggesting that the arrangement of the genome into heterochromatin- and euchromatin-like domains is a dominant influence on regional mutation-rate variation in human somatic cells.

Abstract: Cancer genome sequencing provides the first direct information on how mutation rates vary across the human genome in somatic cells. Testing diverse genetic and epigenetic features, here we show that mutation rates in cancer genomes are strikingly related to chromatin organization. Indeed, at the megabase scale, a single feature—levels of the heterochromatin-associated histone modification H3K9me3—can account for more than 40% of mutation-rate variation, and a combination of features can account for more than 55%. The strong association between mutation rates and chromatin organization is upheld in samples from different tissues and for different mutation types. This suggests that the arrangement of the genome into heterochromatin- and euchromatin-like domains is a dominant influence on regional mutation-rate variation in human somatic cells.
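The statement that a single feature "can account for more than 40% of mutation-rate variation" is a statement about variance explained. The sketch below shows how that quantity (R²) is computed for one predictor; it assumes a simple least-squares linear fit, which may differ from the model used in the paper, and the function and argument names are hypothetical.

```python
from typing import Sequence


def r_squared(feature: Sequence[float], mutation_rate: Sequence[float]) -> float:
    """Fraction of variance in mutation_rate explained by a
    least-squares line fitted on a single feature.

    For one predictor with an intercept, this equals the squared
    Pearson correlation between the two variables."""
    n = len(feature)
    if n != len(mutation_rate) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(feature) / n
    mean_y = sum(mutation_rate) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, mutation_rate))
    sxx = sum((x - mean_x) ** 2 for x in feature)
    syy = sum((y - mean_y) ** 2 for y in mutation_rate)
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    return (sxy * sxy) / (sxx * syy)
```

In this framing, each data point would be one megabase-scale window, with the feature being, for example, the H3K9me3 level in that window and the response being its observed mutation density.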

Journal Article•DOI•

A beginner's guide to eukaryotic genome annotation

Mark Yandell¹, Daniel D. Ence¹•Institutions (1)

University of Utah¹

18 Apr 2012-Nature Reviews Genetics

TL;DR: An overview of the genome annotation process and the available tools is provided and some best-practice approaches are described.

Abstract: The falling cost of genome sequencing is having a marked impact on the research community with respect to which genomes are sequenced and how and where they are annotated. Genome annotation projects have generally become small-scale affairs that are often carried out by an individual laboratory. Although annotating a eukaryotic genome assembly is now within the reach of non-experts, it remains a challenging task. Here we provide an overview of the genome annotation process and the available tools and describe some best-practice approaches.

Journal Article•DOI•

The genome of melon (Cucumis melo L.)

Jordi Garcia-Mas¹, Andrej Benjak, Walter Sanseverino, Michael Bourgeois, Gisela Mir, Víctor M. González, Elizabeth Henaff, Francisco Câmara², Luca Cozzuto², Ernesto Lowy², Tyler Alioto, Salvador Capella-Gutierrez², José Blanca³, Joaquín Cañizares³, Pello Ziarsolo³, Daniel Gonzalez-Ibeas⁴, Luis Rodriguez-Moreno⁴, Marcus Droege⁵, Lei Du⁵, Miguel Álvarez-Tejado⁶, Belen Lorente-Galdos⁴, Marta Melé⁴, Marta Melé², Luming Yang⁷, Yiqun Weng⁷, Arcadi Navarro⁴, Tomas Marques-Bonet⁴, Miguel A. Aranda⁴, Fernando Nuez³, Belén Picó³, Toni Gabaldón², Guglielmo Roma², Roderic Guigó², Josep M. Casacuberta, Pere Arús, Pere Puigdomènech - Show less +32 more•Institutions (7)

Autonomous University of Barcelona¹, Pompeu Fabra University², Polytechnic University of Valencia³, Spanish National Research Council⁴, Hoffmann-La Roche⁵, Roche Applied Science⁶, University of Wisconsin-Madison⁷

17 Jul 2012-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber, and a low number of nucleotide-binding site–leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species.

Abstract: We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationships of sequenced plant genomes. We observed the absence of recent whole-genome duplications in the melon lineage since the ancient eudicot triplication, and our data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber. A low number of nucleotide-binding site–leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species. The DHL92 genome was compared with that of its parental lines allowing the quantification of sequence variability in the species. The use of the genome sequence in future investigations will facilitate the understanding of evolution of cucurbits and the improvement of breeding strategies.

Journal Article•DOI•

Comparative Genomics of Plant-Associated Pseudomonas spp.: Insights into Diversity and Inheritance of Traits Involved in Multitrophic Interactions

Joyce E. Loper¹, Joyce E. Loper², Karl A. Hassan³, Dmitri V. Mavrodi⁴, Edward W. Davis¹, Chee Kent Lim³, Brenda T. Shaffer¹, Liam D. H. Elbourne³, Virginia O. Stockwell², Sierra L. Hartney², Katy Breakwell³, Marcella D. Henkels¹, Sasha G. Tetu³, Lorena I. Rangel², Teresa A. Kidarsa¹, Neil L. Wilson³, Judith E. van de Mortel⁵, Chunxu Song⁵, Rachel Z Blumhagen¹, Diana Radune⁶, Jessica B. Hostetler⁶, Lauren M. Brinkac⁶, A. Scott Durkin⁶, Daniel A. Kluepfel¹, W. Patrick Wechter¹, Anne J. Anderson⁷, Young Cheol Kim⁸, Leland S. Pierson⁹, Elizabeth A. Pierson⁹, Steven E. Lindow¹⁰, Donald Y. Kobayashi¹¹, Jos M. Raaijmakers⁵, David M. Weller¹, Linda S. Thomashow¹, Andrew E. Allen⁶, Ian T. Paulsen³ - Show less +32 more•Institutions (11)

United States Department of Agriculture¹, Oregon State University², Macquarie University³, Washington State University⁴, Wageningen University and Research Centre⁵, J. Craig Venter Institute⁶, Utah State University⁷, Chonnam National University⁸, Texas A&M University⁹, University of California, Berkeley¹⁰, Rutgers University¹¹

05 Jul 2012-PLOS Genetics

TL;DR: A comparative genome analysis of ten strains within the Pseudomonas fluorescens group including seven new genomic sequences found genes for traits that were not known previously in the strains, highlighting the enormous heterogeneity of the P. fluorescens group and the importance of the variable genome in tailoring individual strains to their specific lifestyles and functional repertoire.

Abstract: We provide here a comparative genome analysis of ten strains within the Pseudomonas fluorescens group including seven new genomic sequences. These strains exhibit a diverse spectrum of traits involved in biological control and other multitrophic interactions with plants, microbes, and insects. Multilocus sequence analysis placed the strains in three sub-clades, which was reinforced by high levels of synteny, size of core genomes, and relatedness of orthologous genes between strains within a sub-clade. The heterogeneity of the P. fluorescens group was reflected in the large size of its pan-genome, which makes up approximately 54% of the pan-genome of the genus as a whole, and a core genome representing only 45–52% of the genome of any individual strain. We discovered genes for traits that were not known previously in the strains, including genes for the biosynthesis of the siderophores achromobactin and pseudomonine and the antibiotic 2-hexyl-5-propyl-alkylresorcinol; novel bacteriocins; type II, III, and VI secretion systems; and insect toxins. Certain gene clusters, such as those for two type III secretion systems, are present only in specific sub-clades, suggesting vertical inheritance. Almost all of the genes associated with multitrophic interactions map to genomic regions present in only a subset of the strains or unique to a specific strain. To explore the evolutionary origin of these genes, we mapped their distributions relative to the locations of mobile genetic elements and repetitive extragenic palindromic (REP) elements in each genome. The mobile genetic elements and many strain-specific genes fall into regions devoid of REP elements (i.e., REP deserts) and regions displaying atypical tri-nucleotide composition, possibly indicating relatively recent acquisition of these loci. Collectively, the results of this study highlight the enormous heterogeneity of the P. fluorescens group and the importance of the variable genome in tailoring individual strains to their specific lifestyles and functional repertoire.
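The pan-genome and core-genome figures quoted above reduce to set operations once each strain is represented as a set of gene families. The sketch below shows that arithmetic; it assumes orthologous gene families have already been clustered and given identifiers, and the function names and toy identifiers are hypothetical rather than taken from the study.

```python
from typing import Dict, Set, Tuple


def pan_and_core(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core genome) for a collection of strains.

    The pan-genome is the union of all gene families; the core genome
    is the set of families present in every strain."""
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


def core_fraction(strain_genes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of each strain's gene families that belong to the core."""
    _, core = pan_and_core(strain_genes)
    return {name: len(core) / len(genes) for name, genes in strain_genes.items()}


# Toy input with made-up gene family identifiers.
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g5", "g6"},
}
pan, core = pan_and_core(toy)
```

Gene families outside `core` make up the variable genome, which is where the abstract locates almost all of the genes associated with multitrophic interactions.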

Journal Article•DOI•

Spatial Organization of the Mouse Genome and Its Role in Recurrent Chromosomal Translocations

Yu Zhang¹, Rachel Patton McCord², Yu-Jui Ho¹, Bryan R. Lajoie², Dominic G. Hildebrand¹, Alince C. Simon¹, Michael B. Becker¹, Frederick W. Alt¹, Job Dekker² - Show less +5 more•Institutions (2)

Howard Hughes Medical Institute¹, University of Massachusetts Medical School²

02 Mar 2012-Cell

TL;DR: A high-resolution Hi-C spatial organization map of the G1-arrested mouse pro-B cell genome is generated and high-throughput genome-wide translocation sequencing is used to map translocations from target DNA double-strand breaks (DSBs) within it.

Journal Article•DOI•

2b-RAD: a simple and flexible method for genome-wide genotyping

Shi Wang¹, Eli Meyer², Eli Meyer¹, John K. McKay³, Mikhail V. Matz¹ - Show less +1 more•Institutions (3)

University of Texas at Austin¹, Oregon State University², Colorado State University³

01 Aug 2012-Nature Methods

TL;DR: 2b-RAD, a streamlined restriction site–associated DNA (RAD) genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases, is described.

Abstract: Genotyping based on restriction site–associated (RAD) sequencing around type IIB enzyme recognition sites is reported. The streamlined reduced-representation approach features even and tunable genome coverage and enables large-scale genotyping studies by maximizing the amount of genotypic information that can be obtained from individuals for a given amount of sequencing. We describe 2b-RAD, a streamlined restriction site–associated DNA (RAD) genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases. Well-studied accessions of Arabidopsis thaliana were genotyped to validate the method's accuracy and to demonstrate fine-tuning of marker density as needed. The simplicity of the 2b-RAD protocol makes it particularly suitable for high-throughput genotyping as required for linkage mapping and profiling genetic variation in natural populations.

Journal Article•DOI•

Proto-genes and de novo gene birth

Anne-Ruxandra Carvunis¹, Thomas Rolland¹, Ilan Wapinski¹, Michael A. Calderwood¹, Muhammed A. Yildirim¹, Nicolas Simonis¹, Nicolas Simonis², Benoit Charloteaux¹, Benoit Charloteaux³, César A. Hidalgo⁴, Justin Barbette¹, Balaji Santhanam¹, Gloria A. Brar⁵, Jonathan S. Weissman⁵, Aviv Regev⁴, Aviv Regev⁶, Nicolas Thierry-Mieg⁷, Michael E. Cusick¹, Marc Vidal¹ - Show less +15 more•Institutions (7)

Harvard University¹, Free University of Brussels², University of Liège³, Massachusetts Institute of Technology⁴, California Institute for Quantitative Biosciences⁵, Broad Institute⁶, Centre national de la recherche scientifique⁷

19 Jul 2012-Nature

TL;DR: In this article, the authors formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences.

Abstract: Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or 'non-genic' sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential, as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify ~1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.

Journal Article•DOI•

The Genome Portal of the Department of Energy Joint Genome Institute

Igor V. Grigoriev¹, Henrik P. Nordberg¹, Igor Shabalov¹, Andrea Aerts¹, Michael N. Cantor¹, David Goodstein¹, Alan Kuo¹, Simon Minovitsky¹, Roman Nikitin¹, Robin A. Ohm¹, Robert Otillar¹, Alexander Poliakov¹, Igor Ratnere¹, Robert Riley¹, Tatyana Smirnova¹, Daniel S. Rokhsar¹, Inna Dubchak¹ - Show less +13 more•Institutions (1)

Lawrence Berkeley National Laboratory¹

01 Jan 2012-Nucleic Acids Research

TL;DR: The general organization of the JGI Genome Portal is described, along with its most recent addition, MycoCosm, a new integrated fungal genomics resource.

Abstract: The Department of Energy (DOE) Joint Genome Institute (JGI) is a national user facility with massive-scale DNA sequencing and analysis capabilities dedicated to advancing genomics for bioenergy and environmental applications. Beyond generating tens of trillions of DNA bases annually, the Institute develops and maintains data management systems and specialized analytical capabilities to manage and interpret complex genomic data sets, and to enable an expanding community of users around the world to analyze these data in different contexts over the web. The JGI Genome Portal (http://genome.jgi.doe.gov) provides a unified access point to all JGI genomic databases and analytical tools. A user can find all DOE JGI sequencing projects and their status, search for and download assemblies and annotations of sequenced genomes, and interactively explore those genomes and compare them with other sequenced microbes, fungi, plants or metagenomes using specialized systems tailored to each particular class of organisms. We describe here the general organization of the Genome Portal and the most recent addition, MycoCosm (http://jgi.doe.gov/fungi), a new integrated fungal genomics resource.

Journal Article•DOI•

Crop genomics: advances and applications

Peter L. Morrell¹, Edward S. Buckler², Jeffrey Ross-Ibarra³•Institutions (3)

University of Minnesota¹, Cornell University², University of California, Davis³

01 Feb 2012-Nature Reviews Genetics

TL;DR: The future of crop improvement will be centred on comparisons of individual plant genomes, and some of the best opportunities may lie in using combinations of new genetic mapping strategies and evolutionary analyses to direct and optimize the discovery and use of genetic variation.

Abstract: The completion of reference genome sequences for many important crops and the ability to perform high-throughput resequencing are providing opportunities for improving our understanding of the history of plant domestication and to accelerate crop improvement. Crop plant comparative genomics is being transformed by these data and a new generation of experimental and computational approaches. The future of crop improvement will be centred on comparisons of individual plant genomes, and some of the best opportunities may lie in using combinations of new genetic mapping strategies and evolutionary analyses to direct and optimize the discovery and use of genetic variation. Here we review such strategies and insights that are emerging.

Journal Article•DOI•

Simple Methods for Generating and Detecting Locus-Specific Mutations Induced with TALENs in the Zebrafish Genome

Timothy J. Dahlem¹, Kazuyuki Hoshijima¹, Michael J. Jurynec¹, Derrick Gunther¹, Colby G. Starker², Alexandra S. Locke¹, Allison M. Weis¹, Daniel F. Voytas², David Grunwald¹ - Show less +5 more•Institutions (2)

University of Utah¹, University of Minnesota²

16 Aug 2012-PLOS Genetics

TL;DR: Results presented here indicate the TALENs are highly sequence-specific and produce minimal off-target effects.

Abstract: The zebrafish is a powerful experimental system for uncovering gene function in vertebrate organisms. Nevertheless, studies in the zebrafish have been limited by the approaches available for eliminating gene function. Here we present simple and efficient methods for inducing, detecting, and recovering mutations at virtually any locus in the zebrafish. Briefly, double-strand DNA breaks are induced at a locus of interest by synthetic nucleases, called TALENs. Subsequent host repair of the DNA lesions leads to the generation of insertion and deletion mutations at the targeted locus. To detect the induced DNA sequence alterations at targeted loci, genomes are examined using High Resolution Melt Analysis, an efficient and sensitive method for detecting the presence of newly arising sequence polymorphisms. As the DNA binding specificity of a TALEN is determined by a custom designed array of DNA recognition modules, each of which interacts with a single target nucleotide, TALENs with very high target sequence specificities can be easily generated. Using freely accessible reagents and Web-based software, and a very simple cloning strategy, a TALEN that uniquely recognizes a specific pre-determined locus in the zebrafish genome can be generated within days. Here we develop and test the activity of four TALENs directed at different target genes. Using the experimental approach described here, every embryo injected with RNA encoding a TALEN will acquire targeted mutations. Multiple independently arising mutations are produced in each growing embryo, and up to 50% of the host genomes may acquire a targeted mutation. Upon reaching adulthood, approximately 90% of these animals transmit targeted mutations to their progeny. Results presented here indicate the TALENs are highly sequence-specific and produce minimal off-target effects. In all, it takes about two weeks to create a target-specific TALEN and generate growing embryos that harbor an array of germ line mutations at a pre-specified locus.
