Showing papers on "Genome" published in 2011

Journal Article•DOI•

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹, Moran Yassour², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh⁴, Kerstin Lindblad-Toh¹, Nir Friedman², Aviv Regev¹•Institutions (4)

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

TL;DR: The authors present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.

Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

15,665 citations
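
The Trinity abstract above refers to constructing and analyzing de Bruijn graphs. As a rough illustration of that data structure only (not Trinity's actual implementation), the following Python sketch builds a graph whose nodes are (k-1)-mers and walks it while the path is unambiguous. The function names, the toy reads, and the choice of k = 4 are all assumptions made for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    # graph.get avoids inserting empty entries into the defaultdict.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads drawn from one short transcript.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

A real assembler also has to handle sequencing errors, branching caused by alternative splicing or paralogs, and reverse-complement strands, none of which this sketch attempts.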

Journal Article•DOI•

FLASH: Fast Length Adjustment of Short Reads to Improve Genome Assemblies

Tanja Magoc¹, Steven L. Salzberg¹•Institutions (1)

Johns Hopkins University School of Medicine¹

01 Nov 2011-Bioinformatics

TL;DR: FLASH is a fast computational tool that extends short reads by overlapping paired-end reads from sufficiently short fragment libraries; when it was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.

Abstract: Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. Availability and Implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. Contact: t.magoc@gmail.com

9,827 citations
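
The FLASH abstract above describes merging read pairs whose fragment is shorter than twice the read length. The Python sketch below illustrates that general idea under simplifying assumptions: it scores every candidate overlap by its mismatch ratio, keeps the best one, and resolves disagreements by keeping the forward read's base. It is not FLASH's algorithm or interface; `merge_pair`, its parameters and their defaults, and the toy fragment are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a forward read with its mate; return None if no overlap qualifies.

    The mate is reverse-complemented so both reads share one orientation,
    then the suffix of read1 is compared with the prefix of the mate.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    longest = min(len(read1), len(mate))
    # Try long overlaps first; on ties the longer overlap is kept.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        ratio = mismatches / overlap
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, overlap)
    if best is None:
        return None
    # Simplification: bases in the overlap are taken from read1.
    return read1 + mate[best[1]:]


# Hypothetical 24-base fragment sequenced as two 16-base reads.
fragment = "ACGTTGCATGCCGATAGGCTAAGT"
read1 = fragment[:16]
read2 = reverse_complement(fragment[8:])
merged = merge_pair(read1, read2, min_overlap=5)
print(merged == fragment)
```

A production tool would use base quality scores to resolve mismatches and would need to cope with repeats that produce several equally good overlaps.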

Journal Article•DOI•

A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species

Robert J. Elshire¹, Jeffrey C. Glaubitz¹, Qi-ying Sun¹, Jesse Poland², Ken Kawamoto¹, Edward S. Buckler², Edward S. Buckler¹, Sharon E. Mitchell¹•Institutions (2)

Cornell University¹, United States Department of Agriculture²

04 May 2011-PLOS ONE

TL;DR: A procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs) is reported, which is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches.

Abstract: Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

5,163 citations

Journal Article•DOI•

BLAST Ring Image Generator (BRIG) : simple prokaryote genome comparisons

Nabil-Fareed Alikhan¹, Nicola K. Petty¹, Nouri L. Ben Zakour¹, Scott A. Beatson¹•Institutions (1)

University of Queensland¹

08 Aug 2011-BMC Genomics

TL;DR: BRIG is a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface and will perform all required file parsing and BLAST comparisons automatically.

Abstract: Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image.

2,254 citations

Journal Article•DOI•

Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

John W. Davey¹, Paul A. Hohenlohe², Paul D. Etter², Jason Q. Boone, Julian M. Catchen², Mark Blaxter¹•Institutions (2)

University of Edinburgh¹, University of Oregon²

17 Jun 2011-Nature Reviews Genetics

TL;DR: The authors describe best practices for several NGS methods for genome-wide genetic marker development and genotyping that use restriction enzyme digestion to reduce the complexity of the target genome.

Abstract: The authors describe the best practices for a growing number of methods that use next-generation sequencing to rapidly discover and assess genetic markers across any genome, with applications from population genomics and quantitative trait locus mapping to marker-assisted selection.

2,231 citations

Journal Article•DOI•

Genome sequence and analysis of the tuber crop potato.

Xun Xu¹, Shengkai Pan¹, Shifeng Cheng¹, Bo Zhang¹, Mu D¹, Peixiang Ni¹, Gengyun Zhang¹, Shuang Yang¹, Ruiqiang Li¹, Jun Wang¹, Gisella Orjeda², Frank Guzman², Torres M², Roberto Lozano², Olga Ponce², Diana Martinez², De la Cruz G³, Chakrabarti Sk³, Patil Vu³, Konstantin G. Skryabin⁴, Boris B. Kuznetsov⁴, Nikolai V. Ravin⁴, Tatjana V. Kolganova⁴, Alexey V. Beletsky⁴, Andrey V. Mardanov⁴, Di Genova A⁵, Dan Bolser⁵, David M. A. Martin⁵, Li G, Yang Y, Hanhui Kuang⁶, Hu Q⁶, Xiong X⁷, Gerard J. Bishop⁸, Boris Sagredo, Nilo Mejía, Zagorski W⁹, Robert Gromadka⁹, Jan Gawor⁹, Pawel Szczesny⁹, Sanwen Huang, Zhang Z, Liang C, He J, Li Y, He Y, Xu J, Youjun Zhang, Xie B, Du Y, Qu D, Merideth Bonierbale¹⁰, Marc Ghislain¹⁰, Herrera Mdel R, Giovanni Giuliano, Marco Pietrella, Gaetano Perrotta, Paolo Facella, O'Brien K¹¹, Sergio Enrique Feingold, Barreiro Le, Massa Ga, Luis Aníbal Diambra¹², Brett R Whitty¹³, Brieanne Vaillancourt¹³, Lin H¹³, Alicia N. Massa¹³, Geoffroy M¹³, Lundback S¹³, Dean DellaPenna¹³, Buell Cr¹⁴, Sanjeev Kumar Sharma¹⁴, David Marshall¹⁴, Robbie Waugh¹⁴, Glenn J. Bryan¹⁴, Destefanis M¹⁵, Istvan Nagy¹⁵, Dan Milbourne¹⁵, Susan Thomson¹⁶, Mark Fiers¹⁶, Jeanne M. E. Jacobs¹⁶, Kåre Lehmann Nielsen¹⁷, Mads Sønderkær¹⁷, Marina Iovene¹⁸, Giovana Augusta Torres¹⁸, Jiming Jiang¹⁸, Richard E. Veilleux¹⁹, Christian W. B. Bachem²⁰, de Boer J²⁰, Theo Borm²⁰, Bjorn Kloosterman²⁰, van Eck H²⁰, Erwin Datema²⁰, Hekkert Bt²⁰, Aska Goverse²⁰, van Ham Rc²⁰, Richard G. F. Visser²⁰•Institutions (20)

10 Jul 2011-Nature

TL;DR: The authors predict 39,031 protein-coding genes in the potato genome and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin; the sequence provides a platform for genetic improvement of this vital crop.

Abstract: Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.

1,813 citations

Journal Article•DOI•

The genome of the mesopolyploid crop species Brassica rapa

Xiaowu Wang¹, Hanzhong Wang, Jun Wang², Jun Wang³, Jun Wang⁴, Rifei Sun, Jian Wu, Shengyi Liu, Yinqi Bai², Jeong-Hwan Mun⁵, Ian Bancroft⁶, Feng Cheng, Sanwen Huang, Xixiang Li, Wei Hua, Junyi Wang², Xiyin Wang⁷, Xiyin Wang⁸, Michael Freeling⁹, J. Chris Pires¹⁰, Andrew H. Paterson⁷, Boulos Chalhoub, Bo Wang², Alice Hayward¹¹, Alice Hayward¹², Andrew G. Sharpe¹³, Beom-Seok Park⁵, Bernd Weisshaar¹⁴, Binghang Liu², Bo Li², Bo Liu, Chaobo Tong, Chi Song², Chris Duran¹⁵, Chris Duran¹², Chunfang Peng², Geng Chunyu², Chushin Koh¹³, Chuyu Lin², David Edwards¹², David Edwards¹⁵, Desheng Mu², Di Shen, Eleni Soumpourou⁶, Fei Li, Fiona Fraser⁶, Gavin C. Conant¹⁰, Gilles Lassalle¹⁶, Graham J.W. King³, Guusje Bonnema¹⁷, Haibao Tang⁹, Haiping Wang, Harry Belcram, Heling Zhou², Hideki Hirakawa, Hiroshi Abe, Hui Guo⁷, Hui Wang, Huizhe Jin⁷, Isobel A. P. Parkin¹⁸, Jacqueline Batley¹¹, Jacqueline Batley¹², Jeong-Sun Kim⁵, Jérémy Just, Jianwen Li², Jiaohui Xu², Jie Deng, Jin A Kim⁵, Jingping Li⁷, Jingyin Yu, Jinling Meng¹⁹, Jinpeng Wang⁸, Jiumeng Min², Julie Poulain²⁰, Katsunori Hatakeyama, Kui Wu², Li Wang⁸, Lu Fang, Martin Trick⁶, Matthew G. Links¹⁸, Meixia Zhao, Mina Jin⁵, Nirala Ramchiary²¹, Nizar Drou²², Paul J. Berkman¹⁵, Paul J. Berkman¹², Qingle Cai², Quanfei Huang², Ruiqiang Li², Satoshi Tabata, Shifeng Cheng², Shu Zhang², Shujiang Zhang, Shunmou Huang, Shusei Sato, Silong Sun, Soo-Jin Kwon⁵, Su-Ryun Choi²¹, Tae-Ho Lee⁷, Wei Fan², Xiang Zhao², Xu Tan⁷, Xun Xu², Yan Wang, Yang Qiu, Ye Yin², Yingrui Li², Yongchen Du, Yongcui Liao, Yong Pyo Lim²¹, Yoshihiro Narusaka, Yupeng Wang⁸, Zhenyi Wang⁸, Zhenyu Li², Zhiwen Wang², Zhiyong Xiong¹⁰, Zhonghua Zhang•Institutions (22)

01 Oct 2011-Nature Genetics

TL;DR: The authors report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, and use Arabidopsis thaliana as an outgroup to investigate the consequences of genome triplication, such as structural and functional evolution.

Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

1,811 citations

Journal Article•DOI•

High-quality draft assemblies of mammalian genomes from massively parallel sequence data

Sante Gnerre¹, Iain MacCallum, Dariusz Przybylski, Filipe J. Ribeiro, Joshua N. Burton, Bruce J. Walker, Ted Sharpe, Giles Hall, Terrance Shea, Sean M. Sykes, Aaron M. Berlin, Daniel Aird, Maura Costello, Riza M. Daza, Louise Williams, Robert Nicol, Andreas Gnirke, Chad Nusbaum, Eric S. Lander, David B. Jaffe•Institutions (1)

Broad Institute¹

25 Jan 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The authors report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform; the resulting draft assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome.

Abstract: Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.

1,616 citations
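
The ALLPATHS-LG abstract above (like the FLASH abstract earlier in this listing) reports assembly quality as an N50 size. For readers unfamiliar with the statistic, the Python sketch below computes it under the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The function name and the toy lengths are assumptions for this example and are unrelated to the paper's data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence downwards.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids dividing an odd total by two.
        if running * 2 >= total:
            return length


# Hypothetical contig lengths: total 54, and 10 + 9 + 8 = 27 reaches half.
contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
result = n50(contig_lengths)
print(result)  # 8
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misassembly that joins unrelated contigs raises N50 while lowering quality.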

Journal Article•DOI•

miRWalk - Database: Prediction of possible miRNA binding sites by walking the genes of three genomes

Harsh Dweep¹, Carsten Sticht¹, Priyanka Pandey¹, Norbert Gretz¹•Institutions (1)

Heidelberg University¹

01 Oct 2011-Journal of Biomedical Informatics

TL;DR: miRWalk is a comprehensive database on miRNAs that hosts predicted as well as validated miRNA binding-site information on all known genes of human, mouse and rat.

1,603 citations

Journal Article•DOI•

A conditional knockout resource for the genome-wide study of mouse gene function.

William C. Skarnes¹, Barry Rosen¹, Anthony P. West¹, Manousos Koutsourakis¹, Wendy Bushell¹, Vivek Iyer¹, Alejandro O. Mujica¹, Alejandro O. Mujica², Mark G. Thomas¹, Jennifer Harrow¹, Tony Cox¹, David A. Jackson¹, Jessica Severin¹, Jessica Severin², Patrick J. Biggs¹, Patrick J. Biggs², Jun Fu³, Michael Nefedov⁴, Pieter J. de Jong⁴, A. Francis Stewart³, Allan Bradley¹•Institutions (4)

Wellcome Trust Sanger Institute¹, Massey University², Dresden University of Technology³, Children's Hospital Oakland Research Institute⁴

16 Jun 2011-Nature

TL;DR: High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.

Abstract: Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.

1,538 citations

Journal Article•DOI•

MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

Carson Holt¹, Carson Holt², Mark Yandell¹•Institutions (2)

University of Utah¹, Ontario Institute for Cancer Research²

22 Dec 2011-BMC Bioinformatics

TL;DR: MAKER2 is the first annotation engine specifically designed for second-generation genome projects, which scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality.

Abstract: Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

Journal Article•DOI•

Mouse genomic variation and its effect on phenotypes and gene regulation

Thomas M. Keane¹, Leo Goodstadt², Petr Danecek¹, Michael A. White³, Kim Wong¹, Binnaz Yalcin², Andreas Heger⁴, Avigail Agam², Avigail Agam⁴, Guy Slater¹, Martin Goodson², Nick Furlotte⁵, Eleazar Eskin⁵, Christoffer Nellåker⁴, Helen Whitley², James Cleak², Deborah Janowitz², Deborah Janowitz⁶, Polinka Hernandez-Pliego², Andrew Edwards², T G Belgard⁴, Peter L. Oliver⁴, Rebecca E. McIntyre¹, Amarjit Bhomra², Jérôme Nicod², Xiangchao Gan², Wei Yuan², L van der Weyden¹, Charles A. Steward¹, Sendu Bala¹, Jim Stalker¹, Richard Mott², Richard Durbin¹, Ian J. Jackson⁷, Anne Czechanski, José Afonso Guerra-Assunção⁸, Leah Rae Donahue, Laura G. Reinholdt, Bret A. Payseur³, Chris P. Ponting⁴, Ewan Birney⁸, Jonathan Flint², David J. Adams¹•Institutions (8)

Wellcome Trust Sanger Institute¹, Wellcome Trust Centre for Human Genetics², University of Wisconsin-Madison³, University of Oxford⁴, University of California, Los Angeles⁵, University of Greifswald⁶, Medical Research Council⁷, European Bioinformatics Institute⁸

15 Sep 2011-Nature

TL;DR: These sequences provide a starting point for a new era in the functional analysis of a key model organism and show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus.

Abstract: We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

Journal Article•DOI•

A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Richard M. Myers, John A. Stamatoyannopoulos¹, Michael Snyder², Ian Dunham +325 more•Institutions (31)

01 Apr 2011-PLOS Biology

TL;DR: The authors provide an overview of the ENCODE project and the resources it is generating, and illustrate the application of ENCODE data to interpret the human genome.

Abstract: The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

Journal Article•DOI•

The developmental transcriptome of Drosophila melanogaster

Brenton R. Graveley¹, Angela N. Brooks², Joseph W. Carlson³, Michael O. Duff¹, Jane M. Landolin³, Li Yang¹, Carlo G. Artieri⁴, Marijke J. van Baren⁵, Nathan Boley², Benjamin W. Booth³, James B. Brown², Lucy Cherbas⁶, Carrie A. Davis⁷, Alexander Dobin⁷, Renhua Li⁴, Wei Lin⁷, John H. Malone⁴, Nicolas R. Mattiuzzo⁴, David Scott Miller⁶, David Sturgill⁴, Brian B. Tuch⁸, Brian B. Tuch⁹, Chris Zaleski⁷, Dayu Zhang⁶, Marco Blanchette¹⁰, Marco Blanchette¹¹, Sandrine Dudoit², Brian D. Eads⁶, Richard E. Green¹², Ann S. Hammonds³, Lichun Jiang⁴, Phil Kapranov⁷, Laura Langton⁵, Norbert Perrimon¹³, Jeremy E. Sandler³, Kenneth H. Wan³, Aarron T. Willingham², Yu Zhang⁴, Yi Zou⁶, Justen Andrews⁶, Peter J. Bickel², Steven E. Brenner², Michael R. Brent⁵, Peter Cherbas⁶, Thomas R. Gingeras¹⁴, Thomas R. Gingeras⁷, Roger A. Hoskins³, Thomas C. Kaufman⁶, Brian Oliver⁴, Susan E. Celniker³•Institutions (14)

University of Connecticut Health Center¹, University of California, Berkeley², Lawrence Berkeley National Laboratory³, National Institutes of Health⁴, Washington University in St. Louis⁵, Indiana University⁶, Cold Spring Harbor Laboratory⁷, Life Technologies⁸, Amgen⁹, Stowers Institute for Medical Research¹⁰, University of Kansas¹¹, University of California, Santa Cruz¹², Howard Hughes Medical Institute¹³, Affymetrix¹⁴

24 Mar 2011-Nature

TL;DR: 111,195 new elements are identified, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches.

Abstract: Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.

Journal Article•DOI•

CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing

Alexej Abyzov¹, Alexander E. Urban², Michael Snyder², Mark Gerstein¹•Institutions (2)

Yale University¹, Stanford University²

01 Jun 2011-Genome Research

TL;DR: By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, it is estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier.

Abstract: Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.
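
The CNVnator abstract above mentions GC correction as one refinement applied to read-depth (RD) signal. As a hedged illustration of what such a correction can look like, the Python sketch below rescales each window's depth by the ratio of the genome-wide mean depth to the mean depth of windows with similar GC content. Whether this matches CNVnator's exact procedure is an assumption; the function name, the bin width, and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(read_depths, gc_fractions, bin_width=0.05):
    """Rescale per-window read depth to remove a GC-dependent bias.

    corrected = raw * (global mean depth / mean depth of the window's GC bin)
    """
    global_mean = mean(read_depths)
    # Group window depths by rounded GC bin index.
    by_bin = defaultdict(list)
    for depth, gc in zip(read_depths, gc_fractions):
        by_bin[round(gc / bin_width)].append(depth)
    bin_mean = {b: mean(depths) for b, depths in by_bin.items()}

    corrected = []
    for depth, gc in zip(read_depths, gc_fractions):
        local = bin_mean[round(gc / bin_width)]
        # Guard against bins in which every window has zero depth.
        corrected.append(depth * global_mean / local if local > 0 else 0.0)
    return corrected


# Hypothetical windows: GC-poor windows are over-covered, GC-rich under-covered.
depths = [30, 30, 20, 20]
gc = [0.40, 0.40, 0.60, 0.60]
corrected = gc_correct(depths, gc)
print(corrected)
```

After correction, remaining deviations from the global mean are more plausibly due to copy number than to GC-dependent coverage bias, which is the signal a segmentation step would then partition.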

Journal Article•DOI•

Initial genome sequencing and analysis of multiple myeloma

Michael A Chapman¹, Michael S. Lawrence¹, Jonathan J Keats², Kristian Cibulskis¹, Carrie Sougnez¹, Anna C. Schinzel³, Christina L. Harview¹, Jean Philippe Brunet¹, Gregory J. Ahmann², Mazhar Adli³, Mazhar Adli¹, Kenneth C. Anderson³, Kristin G. Ardlie¹, Daniel Auclair⁴, Angela Baker⁵, P. Leif Bergsagel², Bradley E. Bernstein¹, Bradley E. Bernstein³, Bradley E. Bernstein⁶, Yotam Drier⁷, Yotam Drier¹, Rafael Fonseca², Stacey Gabriel¹, Craig C. Hofmeister⁸, Sundar Jagannath⁹, Andrzej Jakubowiak¹⁰, Amrita Krishnan¹¹, Joan Levy⁴, Ted Liefeld¹, Sagar Lonial¹², Scott Mahan¹, Bunmi Mfuko⁴, Stefano Monti¹, Louise M. Perkins⁴, Robb Onofrio¹, Trevor J. Pugh¹, S. Vincent Rajkumar², Alex H. Ramos¹, David S. Siegel¹³, Andrey Sivachenko¹, A. Keith Stewart², Suzanne Trudel, Ravi Vij¹⁴, Douglas Voet¹, Wendy Winckler¹, Todd Zimmerman¹⁵, John D. Carpten⁵, Jeff Trent⁵, William C. Hahn³, William C. Hahn¹, Levi A. Garraway¹, Levi A. Garraway³, Matthew Meyerson¹, Matthew Meyerson³, Eric S. Lander¹⁶, Eric S. Lander¹, Eric S. Lander³, Gad Getz¹, Todd R. Golub•Institutions (16)

Broad Institute¹, Mayo Clinic², Harvard University³, Multiple Myeloma Research Foundation⁴, Translational Genomics Research Institute⁵, Howard Hughes Medical Institute⁶, Weizmann Institute of Science⁷, Ohio State University⁸, Catholic Medical Center⁹, University of Michigan¹⁰, City of Hope National Medical Center¹¹, Emory University¹², Rutgers University¹³, Washington University in St. Louis¹⁴, University of Chicago¹⁵, Massachusetts Institute of Technology¹⁶

24 Mar 2011-Nature

TL;DR: The massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs indicates that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.

Abstract: Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.

Journal Article•DOI•

Synonymous but not the same: the causes and consequences of codon bias.

Joshua B. Plotkin¹, Grzegorz Kudla²•Institutions (2)

University of Pennsylvania¹, University of Edinburgh²

01 Jan 2011-Nature Reviews Genetics

TL;DR: Ongoing work to quantify the dynamics of initiation and elongation is as important for understanding natural synonymous variation as it is for designing transgenes in applied contexts.

Abstract: Despite their name, synonymous mutations have significant consequences for cellular processes in all taxa. As a result, an understanding of codon bias is central to fields as diverse as molecular evolution and biotechnology. Although recent advances in sequencing and synthetic biology have helped to resolve longstanding questions about codon bias, they have also uncovered striking patterns that suggest new hypotheses about protein synthesis. Ongoing work to quantify the dynamics of initiation and elongation is as important for understanding natural synonymous variation as it is for designing transgenes in applied contexts.

Journal Article•DOI•

The ecoresponsive genome of Daphnia pulex

John K. Colbourne¹, Michael E. Pfrender²,³, Donald L. Gilbert¹, W. Kelley Thomas⁴, Abraham E. Tucker¹,⁴, Todd H. Oakley⁵, Shin-ichi Tokishita⁶, Andrea Aerts⁷, Georg J. Arnold⁸, Malay Kumar Basu⁹,¹⁰, Darren J Bauer⁴, Carla E. Cáceres¹¹, Liran Carmel¹⁰,¹², Claudio Casola¹, Jeong Hyeon Choi¹, John C. Detter⁷, Qunfeng Dong¹,¹³, Serge Dusheyko⁷, Brian D. Eads¹, Thomas Fröhlich⁸, Kerry Geiler-Samerotte⁵,¹⁴, Daniel Gerlach¹⁵,¹⁶, Phil Hatcher⁴, Sanjuro Jogdeo⁴,¹⁷, Jeroen Krijgsveld¹⁸, Evgenia V. Kriventseva¹⁵, Dietmar Kültz¹⁹, Christian Laforsch⁸, Erika Lindquist⁷, Jacqueline Lopez¹, J. Robert Manak²⁰,²¹, Jean Muller²², Jasmyn Pangilinan⁷, Rupali P Patwardhan¹,²³, Samuel Pitluck⁷, Ellen J. Pritham²⁴, Andreas Rechtsteiner¹,²⁵, Mina Rho¹, Igor B. Rogozin¹⁰, Onur Sakarya⁵,²⁶, Asaf Salamov⁷, Sarah Schaack¹,²⁴, Harris Shapiro⁷, Yasuhiro Shiga⁶, Courtney Skalitzky²⁰, Zachary Smith¹, Alexander Souvorov¹⁰, Way Sung⁴, Zuojian Tang¹,²⁷, Dai Tsuchiya¹, Hank Tu⁷,²⁶, Harmjan R. Vos¹⁸, Mei Wang⁷, Yuri I. Wolf¹⁰, Hideo Yamagata⁶, Takuji Yamada, Yuzhen Ye¹, Joseph R. Shaw¹, Justen Andrews¹, Teresa J. Crease²⁸, Haixu Tang¹, Susan Lucas⁷, Hugh M. Robertson¹¹, Peer Bork, Eugene V. Koonin¹⁰, Evgeny M. Zdobnov¹⁵,²⁹, Igor V. Grigoriev⁷, Michael Lynch¹, Jeffrey L. Boore⁷,³⁰•Institutions (30)

Indiana University¹, University of Notre Dame², Utah State University³, University of New Hampshire⁴, University of California, Santa Barbara⁵, University of Tokyo⁶, United States Department of Energy⁷, Ludwig Maximilian University of Munich⁸, J. Craig Venter Institute⁹, National Institutes of Health¹⁰, University of Illinois at Urbana–Champaign¹¹, Hebrew University of Jerusalem¹², University of North Texas¹³, Harvard University¹⁴, University of Geneva¹⁵, Research Institute of Molecular Pathology¹⁶, Oregon State University¹⁷, Utrecht University¹⁸, University of California, Davis¹⁹, Hoffmann-La Roche²⁰, University of Iowa²¹, University of Strasbourg²², University of Washington²³, University of Texas at Arlington²⁴, University of California, Santa Cruz²⁵, Life Technologies²⁶, New York University²⁷, University of Guelph²⁸, Imperial College London²⁹, University of California, Berkeley³⁰

04 Feb 2011-Science

TL;DR: The Daphnia genome reveals a multitude of genes and shows adaptation through gene family expansions, and the coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random.

Abstract: We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.

Journal Article•DOI•

The Medicago genome provides insight into the evolution of rhizobial symbioses

Nevin D. Young¹, Frédéric Debellé²,³, Giles E. D. Oldroyd⁴, René Geurts⁵, Steven B. Cannon⁶,⁷, Michael K. Udvardi, Vagner A. Benedito⁸, Klaus F. X. Mayer, Jérôme Gouzy²,³, Heiko Schoof⁹, Yves Van de Peer¹⁰, Sebastian Proost¹⁰, Douglas R. Cook¹¹, Blake C. Meyers¹², Manuel Spannagl, Foo Cheung¹³, Stéphane De Mita⁵, Vivek Krishnakumar¹³, Heidrun Gundlach, Shiguo Zhou¹⁴, Joann Mudge¹⁵, Arvind K. Bharti¹⁵, Jeremy D. Murray⁴, Marina Naoumkina, Benjamin D. Rosen¹¹, Kevin A. T. Silverstein¹, Haibao Tang¹³, Stephane Rombauts¹⁰, Patrick X. Zhao, Peng Zhou¹, Valérie Barbe, Philippe Bardou²,³, Michael Bechner¹⁴, Arnaud Bellec², Anne Berger, Hélène Bergès², Shelby L. Bidwell¹³, Ton Bisseling⁵,¹⁶, Nathalie Choisne, Arnaud Couloux, Roxanne Denny¹, Shweta Deshpande¹⁷, Xinbin Dai, Jeff J. Doyle¹⁸, Anne Marie Dudez²,³, Andrew Farmer¹⁵, Stéphanie Fouteau, Carolien Franken⁵, Chrystel Gibelin²,³, John Gish¹¹, Steven A. Goldstein¹⁴, Alvaro J. González¹², Pamela J. Green¹², Asis Hallab¹⁹, Marijke Hartog⁵, Axin Hua¹⁷, Sean Humphray²⁰, Dong-Hoon Jeong¹², Yi Jing¹⁷, Anika Jöcker¹⁹, Steve Kenton¹⁷, Dong-Jin Kim¹¹,²¹, Kathrin Klee¹⁹, Hongshing Lai¹⁷, Chunting Lang⁵, Shaoping Lin¹⁷, Simone L. Macmil¹⁷, Ghislaine Magdelenat, Lucy Matthews²⁰, Jamison McCorrison¹³, Erin L. Monaghan¹³, Jeong Hwan Mun¹¹,²², Fares Z. Najar¹⁷, Christine Nicholson²⁰, Céline Noirot², Majesta O'Bleness¹⁷, Charles Paule¹, Julie Poulain, Florent Prion²,³, Baifang Qin¹⁷, Chunmei Qu¹⁷, Ernest F. Retzel¹⁵, Claire Riddle²⁰, Erika Sallet²,³, Sylvie Samain, Nicolas Samson²,³, Iryna Sanders¹⁷, Olivier Saurat²,³, Claude Scarpelli, Thomas Schiex², Béatrice Segurens, Andrew J. Severin⁷, D. Janine Sherrier¹², Ruihua Shi¹⁷, Sarah Sims²⁰, Susan R. Singer²³, Senjuti Sinharoy, Lieven Sterck¹⁰, Agnès Viollet, Bing Bing Wang¹, Keqin Wang¹⁷, Mingyi Wang, Xiaohong Wang¹, Jens Warfsmann¹⁹, Jean Weissenbach, Doug White¹⁷, James D. White¹⁷, Graham B. Wiley¹⁷, Patrick Wincker, Yanbo Xing¹⁷, Limei Yang¹⁷, Ziyun Yao¹⁷, Fu Ying¹⁷, Jixian Zhai¹², Liping Zhou¹⁷, Antoine Zuber²,³, Jean Dénarié²,³, Richard A. Dixon, Gregory D. May¹⁵, David C. Schwartz¹⁴, Jane Rogers²⁴, Francis Quetier, Christopher D. Town¹³, Bruce A. Roe¹⁷•Institutions (24)

22 Dec 2011-Nature

TL;DR: The draft sequence of the M. truncatula genome is described. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics, so the genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.

Abstract: Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species. Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing ∼94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.

Journal Article•DOI•

Metagenomic discovery of biomass-degrading genes and genomes from cow rumen.

Matthias Hess¹,², Alexander Sczyrba¹,², Rob Egan¹,², Tae-Wan Kim³, Harshal A. Chokhawala³, Gary P. Schroth⁴, Shujun Luo⁴, Douglas S. Clark³, Feng Chen¹,², Tao Zhang¹,², Roderick I. Mackie⁵, Len A. Pennacchio¹,², Susannah G. Tringe¹,², Axel Visel¹,², Tanja Woyke¹,², Zhong Wang¹,², Edward M. Rubin¹,²•Institutions (5)

Lawrence Berkeley National Laboratory¹, Joint Genome Institute², University of California, Berkeley³, Illumina⁴, University of Illinois at Urbana–Champaign⁵

28 Jan 2011-Science

TL;DR: To characterize biomass-degrading genes and genomes, this work sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen and identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates.

Abstract: The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in degradation of cellulosic plant material, but most members of this complex community resist cultivation. To characterize biomass-degrading genes and genomes, we sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen. From these data, we identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates. We also assembled 15 uncultured microbial genomes, which were validated by complementary methods including single-cell genome sequencing. These data sets provide a substantially expanded catalog of genes and genomes participating in the deconstruction of cellulosic biomass.

Initial genome sequencing and analysis of multiple myeloma

01 Mar 2011

TL;DR: This paper reports the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs; the pattern of somatic mutation across the data set suggested several new and unexpected oncogenic mechanisms.

Journal Article•DOI•

Mapping copy number variation by population-scale genome sequencing

Ryan E. Mills¹, Klaudia Walter², Chip Stewart³, Robert E. Handsaker⁴ +371 more•Institutions (21)

03 Feb 2011-Nature

TL;DR: A map of unbalanced SVs is constructed based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations, and serves as a resource for sequencing-based association studies.

Abstract: Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

Journal Article•DOI•

The genome of woodland strawberry (Fragaria vesca)

Vladimir Shulaev¹, Daniel J. Sargent², Ross N. Crowhurst³, Todd C. Mockler⁴, Otto Folkerts, Arthur L. Delcher⁵, Pankaj Jaiswal⁴, Keithanne Mockaitis⁶, Aaron Liston⁴, Shrinivasrao P. Mane⁷, Paul Burns⁸, Thomas M. Davis⁹, Janet P. Slovin¹⁰, Nahla V. Bassil¹⁰, Roger P. Hellens³, Clive Evans⁷, Tim Harkins¹¹, Chinnappa D. Kodira¹¹, Brian Desany¹¹, Oswald Crasta, Roderick V. Jensen⁷, Andrew C. Allan³,¹², Todd P. Michael¹³, João C. Setubal⁷, Jean-Marc Celton¹⁴, D. Jasper G. Rees¹⁴, Kelly P. Williams⁷, Sarah H. Holt⁷, Juan Jairo Ruiz Rojas⁷, Mithu Chatterjee¹⁵, Bo Liu⁹, Herman Silva¹⁶, Lee A. Meisel¹⁷, Avital Adato¹⁸, Sergei A. Filichkin⁴, Michela Troggio, Roberto Viola, Tia-Lynn Ashman¹⁹, Hao Wang²⁰, Palitha Dharmawardhana⁴, Justin Elser⁴, Rajani Raja⁴, Henry D. Priest⁴, Douglas W. Bryant⁴, Samuel E. Fox⁴, Scott A. Givan⁴, Larry J. Wilhelm⁴, Sushma Naithani⁴, Alan Christoffels¹⁴, David Y. Salama¹⁵, Jade Carter⁶, Elena Lopez Girona², Anna Zdepski¹³, Wenqin Wang¹³, Randall A. Kerstetter¹³, Wilfried Schwab²¹, Schuyler S. Korban²², Jahn Davik, Amparo Monfort, Beatrice Denoyes-Rothan²³, Pere Arús, Ron Mittler¹, Barry S. Flinn, Asaph Aharoni¹⁷, Jeffrey L. Bennetzen²⁰, Steven L. Salzberg⁵, Allan W. Dickerman⁷, Riccardo Velasco, Mark Borodovsky⁸, Richard E. Veilleux⁷, Kevin M. Folta¹⁵•Institutions (23)

01 Feb 2011-Nature Genetics

TL;DR: New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted, and macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes.

Abstract: The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

Journal Article•DOI•

A high-resolution map of human evolutionary constraint using 29 mammals.

Kerstin Lindblad-Toh¹, Manuel Garber¹, Or Zuk¹, Michael F. Lin¹,², Brian J. Parker³, Stefan Washietl², Pouya Kheradpour¹,², Jason Ernst¹,², Gregory E. Jordan⁴, Evan Mauceli¹, Lucas D. Ward¹,², Craig B. Lowe⁵,⁶,⁷, Alisha K. Holloway⁸, Michele Clamp¹, Sante Gnerre¹, Jessica Alföldi¹, Kathryn Beal⁴, Jean Chang¹, Hiram Clawson⁵, James Cuff⁹, Federica Di Palma¹, Stephen Fitzgerald⁴, Paul Flicek⁴, Mitchell Guttman¹, Melissa J. Hubisz¹⁰, David B. Jaffe¹, Irwin Jungreis², W. James Kent⁸, Dennis Kostka⁸, Marcia Lara¹, André L. Martins¹⁰, Tim Massingham⁴, Ida Moltke³, Brian J. Raney⁵, Matthew D. Rasmussen², James Robinson¹, Alexander Stark¹¹, Albert J. Vilella⁴, Jiayu Wen³, Xiaohui Xie¹, Michael C. Zody¹, Kim C. Worley¹², Christie Kovar¹², Donna M. Muzny¹², Richard A. Gibbs¹², Wesley C. Warren¹³, Elaine R. Mardis¹³, George M. Weinstock¹²,¹³, Richard K. Wilson¹³, Ewan Birney⁴, Elliott H. Margulies¹⁴, Javier Herrero⁴, Eric D. Green¹⁴, David Haussler⁵,⁷, Adam Siepel¹⁰, Nick Goldman⁴, Katherine S. Pollard⁸, Jakob Skou Pedersen³,¹⁵, Eric S. Lander¹, Manolis Kellis¹,²•Institutions (15)

Massachusetts Institute of Technology¹, Vassar College², University of Copenhagen³, Wellcome Trust⁴, University of California, Santa Cruz⁵, Stanford University⁶, Howard Hughes Medical Institute⁷, University of California, San Francisco⁸, Harvard University⁹, Cornell University¹⁰, Research Institute of Molecular Pathology¹¹, Human Genome Sequencing Center¹², Washington University in St. Louis¹³, National Institutes of Health¹⁴, Aarhus University Hospital¹⁵

27 Oct 2011-Nature

TL;DR: Sequencing and comparative analysis of 29 eutherian genomes confirm that at least 5.5% of the human genome has undergone purifying selection and locate constrained elements covering ∼4.2% of the genome.

Abstract: The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

Journal Article•DOI•

A comprehensive genome‐scale reconstruction of Escherichia coli metabolism—2011

Jeffrey D. Orth¹, Tom M Conrad¹, Jessica Na¹, Joshua A. Lerman¹, Hojung Nam¹, Adam M. Feist¹, Bernhard O. Palsson¹•Institutions (1)

University of California, San Diego¹

01 Jan 2011-Molecular Systems Biology

TL;DR: The initial genome‐scale reconstruction of the metabolic network of Escherichia coli K‐12 MG1655 was assembled in 2000 and an update has now been built, named iJO1366, which accounts for 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites.

Abstract: The initial genome-scale reconstruction of the metabolic network of Escherichia coli K-12 MG1655 was assembled in 2000. It has been updated and periodically released since then based on new and curated genomic and biochemical knowledge. An update has now been built, named iJO1366, which accounts for 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites. iJO1366 was (1) updated in part using a new experimental screen of 1075 gene knockout strains, illuminating cases where alternative pathways and isozymes are yet to be discovered, (2) compared with its predecessor and to experimental data sets to confirm that it continues to make accurate phenotypic predictions of growth on different substrates and for gene knockout strains, and (3) mapped to the genomes of all available sequenced E. coli strains, including pathogens, leading to the identification of hundreds of unannotated genes in these organisms. Like its predecessors, the iJO1366 reconstruction is expected to be widely deployed for studying the systems biology of E. coli and for metabolic engineering applications.

Journal Article•DOI•

Initial impact of the sequencing of the human genome

Eric S. Lander¹•Institutions (1)

Broad Institute¹

10 Feb 2011-Nature

TL;DR: The sequence of the human genome has dramatically accelerated biomedical research in the decade since its publication; this article explores its impact on understanding of the biological functions encoded in the genome, on the biological basis of inherited diseases and cancer, and on the evolution and history of the human species.

Abstract: The sequence of the human genome has dramatically accelerated biomedical research. Here I explore its impact, in the decade since its publication, on our understanding of the biological functions encoded in the genome, on the biological basis of inherited diseases and cancer, and on the evolution and history of the human species. I also discuss the road ahead in fulfilling the promise of genomics for medicine.

Journal Article•DOI•

Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

A. P. Jason de Koning¹, Wanjun Gu¹, Todd A. Castoe¹, Mark A. Batzer², David D. Pollock¹•Institutions (2)

University of Colorado Boulder¹, Louisiana State University²

01 Dec 2011-PLOS Genetics

TL;DR: P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, suggesting that 66%–69% of the human genome is repetitive or repeat-derived and that it consists of substantially more repetitive sequence than previously believed.

Abstract: Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using it we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.

Journal Article•DOI•

COSMIC: the catalogue of somatic mutations in cancer

Nidhi Bindal¹, Simon A. Forbes¹, David Beare¹, Prasad Gunasekaran¹, Kenric Leung¹, Chai Yin Kok¹, Mingming Jia¹, Sally Bamford¹, Charlotte G. Cole¹, Sari Ward¹, Jon W. Teague¹, Michael R. Stratton¹, Peter J. Campbell¹, Andrew Futreal¹•Institutions (1)

Wellcome Trust Sanger Institute¹

19 Sep 2011-Genome Biology

TL;DR: The Catalogue Of Somatic Mutations In Cancer (COSMIC), one of the largest repositories of information on somatic mutations in human cancer, curates and standardizes this information in a single database, providing user-friendly browsing tools and analytical functions, thus ensuring its role as a key resource in human cancer genetics.

Abstract: The Catalogue Of Somatic Mutations In Cancer (COSMIC) [1] is one of the largest repositories of information on somatic mutations in human cancer. The project has been running for more than ten years as part of the Cancer Genome Project (CGP) at the Wellcome Trust Sanger Institute in the UK. The data in COSMIC are curated from a variety of sources, primarily the scientific literature and large international consortia. The project includes information from the CGP, along with data from other consortia such as the International Cancer Genome Consortium and The Cancer Genome Atlas. In addition, COSMIC is regularly updated with the genes highlighted in the Cancer Gene Census, which curates the scientific literature for known cancer genes [2]. With the advent of whole exome and genome sequencing technology, the amount of data in COSMIC is increasing rapidly. The recent COSMIC release (version 53; 18 May 2011) contains 608,042 tumor and cell line samples, annotating 176,856 mutations across 19,439 genes, with 352 full exomes, 43 whole genome rearrangement screens and 4 full genomes now available. The data are updated regularly, with new releases scheduled every two months. COSMIC provides a large number of graphical and tabular views for interpreting and mining the large quantity of information, as well as the facility to export the relevant data in various formats. The website can be navigated in many ways to examine mutation patterns on the basis of genes, samples and phenotypes, which are the main entry points to COSMIC. COSMIC also provides various options to browse the data in a genomic context. Integration with the Ensembl genome browser allows the visualization of full genome annotations, together with COSMIC data, on the GRCh37 genome coordinates. 
COSMIC also contains its own genome browser, which facilitates data analysis by combining genome-wide gene structures and sequences with rearrangement breakpoints, copy number variations and all somatic substitutions, deletions, insertions and complex gene mutations. The main COSMIC website [1] encompasses all of the available data. However, within COSMIC, the Cancer Cell Line Project [3] is a specialized component, which provides details of the genotyping of almost 800 commonly used cancer cell lines, through the set of known cancer genes. Its focus is to identify driver mutations, or those likely to be implicated in the oncogenesis of each tumor. This information forms the basis for integrating COSMIC with the Genomics of Drug Sensitivity in Cancer project [4], which is a joint effort with the Massachusetts General Hospital [5] to screen this panel of cancer cell lines against potential anticancer therapeutic compounds to investigate correlations between somatic mutations and drug sensitivity. Data on somatic mutations in cancer are being produced at a rapidly increasing rate, and the combined analysis of large distributed datasets is becoming ever more difficult. However, COSMIC curates and standardizes this information in a single database, providing user-friendly browsing tools and analytical functions, thus ensuring its role as a key resource in human cancer genetics.

Journal Article•DOI•

Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine

Chun-Xiao Song¹, Keith E. Szulwach², Ye Fu¹, Qing Dai¹, Chengqi Yi¹, Xuekun Li², Yujing Li², Chih Hsin Chen³, Wen Zhang¹, Xing Jian¹, Jing Wang¹, Li Zhang³, Timothy J. Looney³, Baichen Zhang⁴, Lucy A. Godley¹, Leslie M. Hicks⁴, Bruce T. Lahn³, Peng Jin², Chuan He¹•Institutions (4)

University of Chicago¹, Emory University², Howard Hughes Medical Institute³, Donald Danforth Plant Science Center⁴

01 Jan 2011-Nature Biotechnology

TL;DR: A method for determining the genome-wide distribution of 5-hmC, a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types, uses the T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC.

Abstract: In contrast to 5-methylcytosine (5-mC), which has been studied extensively, little is known about 5-hydroxymethylcytosine (5-hmC), a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types. Here we present a method for determining the genome-wide distribution of 5-hmC. We use the T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC. The azide group can be chemically modified with biotin for detection, affinity enrichment and sequencing of 5-hmC-containing DNA fragments in mammalian genomes. Using this method, we demonstrate that 5-hmC is present in human cell lines beyond those previously recognized. We also find a gene expression level-dependent enrichment of intragenic 5-hmC in mouse cerebellum and an age-dependent acquisition of this modification in specific gene bodies linked to neurodegenerative disorders.

Journal Article•DOI•

Analysis of the coding genome of diffuse large B-cell lymphoma

Laura Pasqualucci¹, Vladimir Trifonov¹, Giulia Fabbri¹, Jing Ma², Davide Rossi³, Annalisa Chiarenza¹, Victoria A. Wells¹, Adina Grunn¹, Monica Messina¹, Oliver Elliot¹, Joseph M. Chan¹, Govind Bhagat¹, Amy Chadburn⁴, Gianluca Gaidano³, Charles G. Mullighan², Raul Rabadan¹, Riccardo Dalla-Favera•Institutions (4)

Columbia University¹, St. Jude Children's Research Hospital², University of Eastern Piedmont³, Northwestern University⁴

01 Sep 2011-Nature Genetics

TL;DR: By combining next-generation sequencing and copy number analysis, it is shown that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case and novel dysregulated pathways underlying its pathogenesis are identified.

Abstract: Diffuse large B-cell lymphoma (DLBCL) is the most common form of human lymphoma. Although a number of structural alterations have been associated with the pathogenesis of this malignancy, the full spectrum of genetic lesions that are present in the DLBCL genome, and therefore the identity of dysregulated cellular pathways, remains unknown. By combining next-generation sequencing and copy number analysis, we show that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case. This analysis also revealed mutations in genes not previously implicated in DLBCL pathogenesis, including those regulating chromatin methylation (MLL2; 24% of samples) and immune recognition by T cells. These results provide initial data on the complexity of the DLBCL coding genome and identify novel dysregulated pathways underlying its pathogenesis.
