Showing papers on "Genomics" published in 2014

Open Access

Journal Article•DOI•

Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

Bruce J. Walker¹, Thomas Abeel², Terrance Shea¹, Margaret Priest¹, Amr Abouelliel¹, Sharadha Sakthikumar¹, Christina A. Cuomo¹, Qiandong Zeng¹, Jennifer R. Wortman¹, Sarah Young¹, Ashlee M. Earl¹•Institutions (2)

Broad Institute¹, Ghent University²

19 Nov 2014-PLOS ONE

TL;DR: Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.

Abstract: Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.

5,659 citations

Journal Article•DOI•

A general framework for estimating the relative pathogenicity of human genetic variants

Martin Kircher¹, Daniela Witten¹, Preti Jain, Brian J. O'Roak², Brian J. O'Roak¹, Gregory M. Cooper, Jay Shendure¹•Institutions (2)

University of Washington¹, Oregon Health & Science University²

01 Mar 2014-Nature Genetics

TL;DR: The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.

Abstract: Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation. Current genomic annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Here, we describe Combined Annotation Dependent Depletion (CADD), a framework that objectively integrates many diverse annotations into a single, quantitative score. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human derived alleles from 14.7 million simulated variants. We pre-compute “C-scores” for all 8.6 billion possible human single nucleotide variants and enable scoring of short insertions/deletions. C-scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects, and complex trait associations, and highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious, and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current annotation.

4,956 citations

Journal Article•DOI•

CRISPR-Cas systems for editing, regulating and targeting genomes

Jeffry D. Sander¹, J. Keith Joung¹•Institutions (1)

Harvard University¹

01 Apr 2014-Nature Biotechnology

TL;DR: A modified version of the CRISPR-Cas9 system has been developed to recruit heterologous domains that can regulate endogenous gene expression or label specific genomic loci in living cells, which will undoubtedly transform biological research and spur the development of novel molecular therapeutics for human disease.

Abstract: Targeted genome editing using engineered nucleases has rapidly gone from being a niche technology to a mainstream method used by many biological researchers. This widespread adoption has been largely fueled by the emergence of the clustered, regularly interspaced, short palindromic repeat (CRISPR) technology, an important new approach for generating RNA-guided nucleases, such as Cas9, with customizable specificities. Genome editing mediated by these nucleases has been used to rapidly, easily and efficiently modify endogenous genes in a wide variety of biomedically important cell types and in organisms that have traditionally been challenging to manipulate genetically. Furthermore, a modified version of the CRISPR-Cas9 system has been developed to recruit heterologous domains that can regulate endogenous gene expression or label specific genomic loci in living cells. Although the genome-wide specificities of CRISPR-Cas9 systems remain to be fully defined, the power of these systems to perform targeted, highly efficient alterations of genome sequence and gene expression will undoubtedly transform biological research and spur the development of novel molecular therapeutics for human disease.

2,930 citations

Journal Article•DOI•

Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics

Linn Fagerberg¹, Björn M. Hallström¹, Per Oksvold¹, Caroline Kampf², Dijana Djureinovic², Jacob Odeberg¹, Masato Habuka¹, Simin Tahmasebpoor², Angelika Danielsson², Karolina Edlund², Anna Asplund², Evelina Sjöstedt², Emma Lundberg¹, Cristina Al-Khalili Szigyarto¹, Marie Skogs¹, Jenny Ottosson Takanen¹, Holger Berling¹, Hanna Tegel¹, Jan Mulder³, Peter Nilsson¹, Jochen M. Schwenk¹, Cecilia Lindskog², Frida Danielsson¹, Adil Mardinoglu⁴, Åsa Sivertsson¹, Kalle von Feilitzen¹, Mattias Forsberg¹, Martin Zwahlen¹, IngMarie Olsson², Sanjay Navani, Mikael Huss¹, Jens Nielsen¹, Jens Nielsen⁴, Fredrik Pontén², Mathias Uhlén¹•Institutions (4)

Royal Institute of Technology¹, Uppsala University², Science for Life Laboratory³, Chalmers University of Technology⁴

01 Feb 2014-Molecular & Cellular Proteomics

TL;DR: Quantitative transcriptomics analysis (RNA-Seq) is used to classify the tissue-specific expression of genes across a representative set of all major human organs and tissues, and this analysis is combined with antibody-based profiling of the same tissues.

2,512 citations

Journal Article•DOI•

BEDTools: The Swiss‐Army Tool for Genome Feature Analysis

Aaron R. Quinlan¹•Institutions (1)

University of Virginia¹

08 Sep 2014-Current protocols in human genetics

TL;DR: BEDTools is a toolkit for exploring high-throughput genomics datasets; its simple operations can be combined to create bespoke pipelines addressing complex questions.

Abstract: Technological advances have enabled the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure the activity of diverse cellular phenomena such as gene isoform expression and transcription factor binding. Extracting biological insight from the experiments enabled by these advances demands the analysis of large, multi-dimensional datasets. This unit describes the use of the BEDTools toolkit for the exploration of high-throughput genomics datasets. Several protocols are presented for common genomic analyses, demonstrating how simple BEDTools operations may be combined to create bespoke pipelines addressing complex questions.

1,716 citations
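
The "simple operations" that the BEDTools abstract refers to are largely operations on genomic intervals. As a rough illustration only, the sketch below shows an interval intersection in plain Python. It does not use or reproduce BEDTools itself; the function name `intersect`, the tuple layout, and the sample coordinates are all assumptions made for this example. Coordinates follow the BED convention of 0-based, half-open intervals.

```python
from typing import List, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b.

    Hypothetical helper for illustration; a naive O(len(a) * len(b)) scan,
    not the algorithm any real toolkit is claimed to use.
    """
    out: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # non-empty overlap under half-open coordinates
                out.append((chrom_a, start, end))
    return out


# Made-up example data: gene bodies and ChIP-seq peaks.
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
peaks = [("chr1", 150, 250), ("chr1", 580, 590), ("chr2", 200, 300)]

overlaps = intersect(genes, peaks)
print(overlaps)  # [('chr1', 150, 200), ('chr1', 580, 590)]
```

Chaining small functions of this kind (intersect, then filter, then count) mirrors the pipeline style the abstract describes, although a production tool would sort or index the intervals rather than compare all pairs.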

Book Chapter•DOI•

Identification of Mutations in Laboratory-Evolved Microbes from Next-Generation Sequencing Data Using breseq

Daniel E. Deatherage¹, Jeffrey E. Barrick¹•Institutions (1)

University of Texas at Austin¹

01 Jan 2014-Methods in Molecular Biology

TL;DR: How to run the open-source breseq computational pipeline to identify and annotate genetic differences found in whole-genome and whole-population NGS data from haploid microbes where a high-quality reference genome is available is described.

Abstract: Next-generation DNA sequencing (NGS) can be used to reconstruct eco-evolutionary population dynamics and to identify the genetic basis of adaptation in laboratory evolution experiments. Here, we describe how to run the open-source breseq computational pipeline to identify and annotate genetic differences found in whole-genome and whole-population NGS data from haploid microbes where a high-quality reference genome is available. These methods can also be used to analyze mutants isolated in genetic screens and to detect unintended mutations that may occur during strain construction and genome editing.

1,077 citations

Journal Article•DOI•

MycoCosm portal: gearing up for 1000 fungal genomes

Igor V. Grigoriev¹, Roman Nikitin¹, Sajeet Haridas¹, Alan Kuo¹, Robin A. Ohm¹, Robert Otillar¹, Robert Riley¹, Asaf Salamov¹, Xueling Zhao¹, Frank Korzeniewski¹, Tatyana Smirnova¹, Henrik P. Nordberg¹, Inna Dubchak¹, Igor Shabalov¹•Institutions (1)

United States Department of Energy¹

01 Jan 2014-Nucleic Acids Research

TL;DR: MycoCosm is a fungal genomics portal developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools.

Abstract: MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.

1,037 citations

Journal Article•DOI•

Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery.

Steven Lin¹, Brett T. Staahl¹, Ravi K Alla¹, Jennifer A. Doudna¹•Institutions (1)

University of California, Berkeley¹

15 Dec 2014-eLife

TL;DR: It is shown here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes.

Abstract: The CRISPR/Cas9 system is a robust genome editing technology that works in human cells, animals and plants based on the RNA-programmed DNA cleaving activity of the Cas9 enzyme. Building on previous work (Jinek et al., 2013), we show here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes. Cas9 RNP-mediated HDR in HEK293T, human primary neonatal fibroblast and human embryonic stem cells was increased dramatically relative to experiments in unsynchronized cells, with rates of HDR up to 38% observed in HEK293T cells. Sequencing of on- and potential off-target sites showed that editing occurred with high fidelity, while cell mortality was minimized. This approach provides a simple and highly effective strategy for enhancing site-specific genome engineering in both transformed and primary human cells.

988 citations

Journal Article•DOI•

Insect Mitochondrial Genomics: Implications for Evolution and Phylogeny

Stephen L. Cameron¹•Institutions (1)

Queensland University of Technology¹

07 Jan 2014-Annual Review of Entomology

TL;DR: Insects are model systems for studying aberrant mt genomes, including truncated tRNAs and multichromosomal genomes, and greater integration of nuclear and mt genomic studies is necessary to further the understanding of insect genomic evolution.

Abstract: The mitochondrial (mt) genome is, to date, the most extensively studied genomic system in insects, outnumbering nuclear genomes tenfold and representing all orders versus very few. Phylogenomic analysis methods have been tested extensively, identifying compositional bias and rate variation, both within and between lineages, as the principal issues confronting accurate analyses. Major studies at both inter- and intraordinal levels have contributed to our understanding of phylogenetic relationships within many groups. Genome rearrangements are an additional data type for defining relationships, with rearrangement synapomorphies identified across multiple orders and at many different taxonomic levels. Hymenoptera and Psocodea have greatly elevated rates of rearrangement offering both opportunities and pitfalls for identifying rearrangement synapomorphies in each group. Finally, insects are model systems for studying aberrant mt genomes, including truncated tRNAs and multichromosomal genomes. Greater integration of nuclear and mt genomic studies is necessary to further our understanding of insect genomic evolution.

910 citations

Journal Article•DOI•

Comparative genomics reveals insights into avian genome evolution and adaptation.

Guojie Zhang¹, Guojie Zhang², Cai Li², Qiye Li², Bo Li², Denis M. Larkin³, Chul Hee Lee⁴, Jay F. Storz⁵, Agostinho Antunes⁶, Matthew J. Greenwold⁷, Robert W. Meredith⁸, Anders Ödeen⁹, Jie Cui¹⁰, Qi Zhou¹¹, Luohao Xu², Hailin Pan², Zongji Wang¹², Lijun Jin², Pei Zhang², Haofu Hu², Wei Yang², Jiang Hu², Jin Xiao², Zhikai Yang², Yang Liu², Qiaolin Xie², Hao Yu², Jinmin Lian², Ping Wen², Fang Zhang², Hui Li², Yongli Zeng², Zijun Xiong², Shiping Liu¹², Long Zhou², Zhiyong Huang², Na An², Jie Wang¹³, Qiumei Zheng², Yingqi Xiong², Guangbiao Wang², Bo Wang², Jingjing Wang², Yu Fan¹⁴, Rute R. da Fonseca¹, Alonzo Alfaro-Núñez¹, Mikkel Schubert¹, Ludovic Orlando¹, Tobias Mourier¹, Jason T. Howard¹⁵, Ganeshkumar Ganapathy¹⁵, Andreas R. Pfenning¹⁵, Osceola Whitney¹⁵, Miriam V. Rivas¹⁵, Erina Hara¹⁵, Julia Smith¹⁵, Marta Farré³, Jitendra Narayan¹⁶, Gancho T. Slavov¹⁶, Michael N Romanov¹⁷, Rui Borges⁶, João Paulo Machado⁶, Imran Khan⁶, Mark S. Springer¹⁸, John Gatesy¹⁸, Federico G. Hoffmann¹⁹, Juan C. Opazo²⁰, Olle Håstad²¹, Roger H. Sawyer⁷, Heebal Kim⁴, Kyu-Won Kim⁴, Hyeon Jeong Kim⁴, Seoae Cho⁴, Ning Li²², Yinhua Huang²², Michael William Bruford²³, Xiangjiang Zhan¹³, Andrew Dixon, Mads F. Bertelsen²⁴, Elizabeth P. Derryberry²⁵, Wesley C. Warren²⁶, Richard K. Wilson²⁶, Shengbin Li²⁷, David A. Ray¹⁹, Richard E. Green²⁸, Stephen J. O'Brien²⁹, Darren K. Griffin¹⁷, Warren E. Johnson³⁰, David Haussler²⁸, Oliver A. Ryder, Eske Willerslev¹, Gary R. Graves³¹, Per Alström²¹, Jon Fjeldså³², David P. Mindell³³, Scott V. Edwards³⁴, Edward L. Braun³⁵, Carsten Rahbek³², David W. Burt³⁶, Peter Houde³⁷, Yong Zhang², Huanming Yang³⁸, Jian Wang², Erich D. Jarvis¹⁵, M. Thomas P. Gilbert¹, M. Thomas P. Gilbert³⁹, Jun Wang•Institutions (39)

University of Copenhagen¹, Beijing Genomics Institute², Royal Veterinary College³, Seoul National University⁴, University of Nebraska–Lincoln⁵, University of Porto⁶, University of South Carolina⁷, Montclair State University⁸, Uppsala University⁹, National University of Singapore¹⁰, University of California, Berkeley¹¹, South China University of Technology¹², Chinese Academy of Sciences¹³, Kunming Institute of Zoology¹⁴, Howard Hughes Medical Institute¹⁵, Aberystwyth University¹⁶, University of Kent¹⁷, University of California, Riverside¹⁸, Mississippi State University¹⁹, Austral University of Chile²⁰, Swedish University of Agricultural Sciences²¹, China Agricultural University²², Cardiff University²³, Copenhagen Zoo²⁴, Louisiana State University²⁵, Washington University in St. Louis²⁶, Xi'an Jiaotong University²⁷, University of California, Santa Cruz²⁸, Nova Southeastern University Oceanographic Center²⁹, Smithsonian Conservation Biology Institute³⁰, National Museum of Natural History³¹, Natural History Museum³², University of California, San Francisco³³, Harvard University³⁴, University of Florida³⁵, University of Edinburgh³⁶, New Mexico State University³⁷, Macau University of Science and Technology³⁸, Curtin University³⁹

12 Dec 2014-Science

TL;DR: This work explored bird macroevolution using full genomes from 48 avian species representing all major extant clades to reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.

Abstract: Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.

872 citations

Journal Article•DOI•

The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the Functional Diversity of Eukaryotic Life in the Oceans through Transcriptome Sequencing

Patrick J. Keeling¹, Patrick J. Keeling², Fabien Burki², Heather M. Wilcox³, Bassem Allam⁴, Eric E. Allen⁵, Linda A. Amaral-Zettler⁶, Linda A. Amaral-Zettler⁷, E. Virginia Armbrust⁸, John M. Archibald¹, John M. Archibald⁹, Arvind K. Bharti¹⁰, Callum J. Bell¹⁰, Bank Beszteri¹¹, Kay D. Bidle¹², Connor Cameron¹⁰, Lisa Campbell¹³, David A. Caron¹⁴, Rose Ann Cattolico⁸, Jackie L. Collier⁴, Kathryn J. Coyne¹⁵, Simon K. Davy¹⁶, Phillipe Deschamps¹⁷, Sonya T. Dyhrman¹⁸, Bente Edvardsen¹⁹, Ruth D. Gates²⁰, Christopher J. Gobler⁴, Spencer J. Greenwood²¹, Stephanie Guida¹⁰, Jennifer L. Jacobi¹⁰, Kjetill S. Jakobsen¹⁹, Erick R. James², Bethany D. Jenkins²², Uwe John¹¹, Matthew D. Johnson²³, Andrew R. Juhl¹⁸, Anja Kamp²⁴, Anja Kamp²⁵, Laura A. Katz²⁶, Ronald P. Kiene²⁷, Alexander Kudryavtsev²⁸, Alexander Kudryavtsev²⁹, Brian S. Leander², Senjie Lin³⁰, Connie Lovejoy³¹, Denis H. Lynn², Denis H. Lynn³², Adrian Marchetti³³, George B. McManus³⁰, Aurora M. Nedelcu³⁴, Susanne Menden-Deuer²², Cristina Miceli³⁵, Thomas Mock³⁶, Marina Montresor³⁷, Mary Ann Moran³⁸, Shauna A. Murray³⁹, Govind Nadathur⁴⁰, Satoshi Nagai, Peter B. Ngam¹⁰, Brian Palenik⁵, Jan Pawlowski²⁸, Giulio Petroni⁴¹, Gwenael Piganeau⁴², Matthew C. Posewitz⁴³, Karin Rengefors⁴⁴, Giovanna Romano³⁷, Mary E. Rumpho³⁰, Tatiana A. Rynearson²², Kelly B. Schilling¹⁰, Declan C. Schroeder, Alastair G. B. Simpson¹, Alastair G. B. Simpson⁹, Claudio H. Slamovits⁹, Claudio H. Slamovits¹, David Roy Smith⁴⁵, G. Jason Smith⁴⁶, Sarah R. Smith⁵, Heidi M. Sosik²³, Peter Stief²⁵, Edward C. Theriot⁴⁷, Scott N. Twary⁴⁸, Pooja E. Umale¹⁰, Daniel Vaulot⁴⁹, Boris Wawrik⁵⁰, Glen L. Wheeler⁵¹, William H. Wilson⁵², Yan Xu⁵³, Adriana Zingone³⁷, Alexandra Z. Worden³, Alexandra Z. Worden¹•Institutions (53)

Canadian Institute for Advanced Research¹, University of British Columbia², Monterey Bay Aquarium Research Institute³, Stony Brook University⁴, University of California, San Diego⁵, Brown University⁶, Marine Biological Laboratory⁷, University of Washington⁸, Dalhousie University⁹, National Center for Genome Resources¹⁰, Alfred Wegener Institute for Polar and Marine Research¹¹, Rutgers University¹², Texas A&M University¹³, University of Southern California¹⁴, University of Delaware¹⁵, Victoria University of Wellington¹⁶, University of Paris-Sud¹⁷, Columbia University¹⁸, University of Oslo¹⁹, University of Hawaii at Manoa²⁰, University of Prince Edward Island²¹, University of Rhode Island²², Woods Hole Oceanographic Institution²³, Jacobs University Bremen²⁴, Max Planck Society²⁵, Smith College²⁶, University of South Alabama²⁷, University of Geneva²⁸, Saint Petersburg State University²⁹, University of Connecticut³⁰, Laval University³¹, University of Guelph³², University of North Carolina at Chapel Hill³³, University of New Brunswick³⁴, University of Camerino³⁵, University of East Anglia³⁶, Stazione Zoologica Anton Dohrn³⁷, University of Georgia³⁸, University of Technology, Sydney³⁹, University of Puerto Rico⁴⁰, University of Pisa⁴¹, Centre national de la recherche scientifique⁴², Colorado School of Mines⁴³, Lund University⁴⁴, University of Western Ontario⁴⁵, California State University⁴⁶, University of Texas at Austin⁴⁷, Los Alamos National Laboratory⁴⁸, Pierre-and-Marie-Curie University⁴⁹, University of Oklahoma⁵⁰, Plymouth Marine Laboratory⁵¹, Bigelow Laboratory For Ocean Sciences⁵², Princeton University⁵³

24 Jun 2014-PLOS Biology

TL;DR: In this paper, the authors describe a resource of 700 transcriptomes from marine microbial eukaryotes to help understand their role in the world's oceans and their biology, evolution, and ecology.

Abstract: Current sampling of genomic sequence data from eukaryotes is relatively poor, biased, and inadequate to address important questions about their biology, evolution, and ecology; this Community Page describes a resource of 700 transcriptomes from marine microbial eukaryotes to help understand their role in the world's oceans.

Journal Article•DOI•

ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

Li Shen¹, Ning-Yi Shao¹, Xiaochuan Liu¹, Eric J. Nestler¹•Institutions (1)

Icahn School of Medicine at Mount Sinai¹

15 Apr 2014-BMC Genomics

TL;DR: Ngs.plot is a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data and is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

Abstract: Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

Journal Article•DOI•

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Justin M. Zook¹, Brad Chapman², Jason Wang, David Mittelman³, Oliver Hofmann², Winston Hide², Marc L. Salit¹•Institutions (3)

National Institute of Standards and Technology¹, Harvard University², Virginia Bioinformatics Institute³

01 Mar 2014-Nature Biotechnology

TL;DR: Methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium are presented.

Abstract: Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
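
To make the idea of benchmarking against a high-confidence call set concrete, here is a minimal sketch of comparing a query set of genotype calls with a benchmark set and summarizing the result as precision and recall. It is not the consortium's method or tooling: the function name `benchmark`, the dictionary layout, and the sample calls are hypothetical, and the sketch assumes both call sets have already been restricted to the regions where a confident benchmark call exists.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]    # (chromosome, position)
Calls = Dict[Site, str]   # site -> genotype string, e.g. "0/1"


def benchmark(truth: Calls, query: Calls) -> Dict[str, float]:
    """Compare query genotype calls against a benchmark (truth) call set.

    Illustrative assumption: a site called with the wrong genotype counts
    both as a false positive (the query call is wrong) and as a false
    negative (the true genotype was missed).
    """
    tp = sum(1 for site, gt in query.items() if truth.get(site) == gt)
    fp = len(query) - tp
    fn = sum(1 for site, gt in truth.items() if query.get(site) != gt)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up calls for illustration only.
truth = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr1", 300): "0/1"}
query = {("chr1", 100): "0/1", ("chr1", 200): "0/1", ("chr1", 400): "0/1"}

result = benchmark(truth, query)
print(result["tp"], result["fp"], result["fn"])  # 1 2 2
```

Real comparisons also have to normalize variant representation (for example, left-aligning indels) before matching sites, which this sketch deliberately leaves out.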

Journal Article•DOI•

Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease

Cem Kuscu¹, Sevki Arslan¹, Ritambhara Singh¹, Jeremy Thorpe¹, Mazhar Adli¹•Institutions (1)

University of Virginia¹

01 Jul 2014-Nature Biotechnology

TL;DR: Mapping genome-wide binding sites of catalytically inactive Cas9 (dCas9) in HEK293T cells and analyzing off-target binding sites showed the importance of the PAM-proximal region of the sgRNA guiding sequence and that dCas9 binding sites are enriched in open chromatin regions; ChIP-seq is shown to allow unbiased, genome-wide detection of Cas9 binding sites.

Abstract: ChIP-seq for Cas9 shows varying amounts of off-target binding with different guide RNAs and low levels of indels at some off-target sites.

Journal Article•DOI•

Defining functional DNA elements in the human genome

Manolis Kellis¹, Barbara J. Wold², Michael Snyder³, Bradley E. Bernstein⁴, Anshul Kundaje⁵, Georgi K. Marinov², Lucas D. Ward⁵, Ewan Birney, Gregory E. Crawford⁶, Job Dekker⁷, Ian Dunham, Laura Elnitski⁸, Peggy J. Farnham⁹, Elise A. Feingold⁸, Mark Gerstein¹⁰, Morgan C. Giddings, David M. Gilbert¹¹, Thomas R. Gingeras¹², Eric D. Green⁸, Roderic Guigó, Tim Hubbard¹³, Jim Kent¹⁴, Jason D. Lieb¹⁵, Richard M. Myers, Michael J. Pazin⁸, Bing Ren¹⁶, John A. Stamatoyannopoulos¹⁷, Zhiping Weng⁷, Kevin P. White¹⁸, Ross C. Hardison¹⁹•Institutions (19)

Massachusetts Institute of Technology¹, California Institute of Technology², Stanford University³, Harvard University⁴, Broad Institute⁵, Duke University⁶, University of Massachusetts Medical School⁷, National Institutes of Health⁸, University of Southern California⁹, Yale University¹⁰, Florida State University¹¹, Cold Spring Harbor Laboratory¹², Wellcome Trust Sanger Institute¹³, University of California, Santa Cruz¹⁴, Princeton University¹⁵, University of California, San Diego¹⁶, University of Washington¹⁷, University of Chicago¹⁸, Pennsylvania State University¹⁹

29 Apr 2014-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies are reviewed.

Abstract: With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

Journal Article•DOI•

Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle

Hans D. Daetwyler¹, Aurélien Capitan², Hubert Pausch³, Paul Stothard⁴, Rianne van Binsbergen⁵, R.F. Brøndum⁶, Xiaoping Liao⁴, Anis Djari², Sabrina Rodriguez², Cécile Grohs², Diane Esquerre², Olivier Bouchez², Marie-Noelle Rossignol, Christophe Klopp², Dominique Rocha², Sébastien Fritz, André Eggen², Phil J. Bowman¹, David Coote¹, Amanda J. Chamberlain¹, Charlotte Anderson⁷, Curt P VanTassell⁸, Ina Hulsegge⁵, Michael E. Goddard¹, Bernt Guldbrandtsen⁶, M.S. Lund⁶, Roel F. Veerkamp⁵, Didier Boichard², Ruedi Fries³, Ben J. Hayes¹•Institutions (8)

Cooperative Research Centre¹, Institut national de la recherche agronomique², Technische Universität München³, University of Alberta⁴, Wageningen University and Research Centre⁵, Aarhus University⁶, Department of Environment and Primary Industries⁷, United States Department of Agriculture⁸

01 Aug 2014-Nature Genetics

TL;DR: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls.

Abstract: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.

Journal Article•DOI•

Genome sequencing and population genomics in non-model organisms.

Hans Ellegren¹•Institutions (1)

Uppsala University¹

01 Jan 2014-Trends in Ecology and Evolution

TL;DR: High-throughput sequencing technologies are revolutionizing the life sciences, and the past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research.

Abstract: High-throughput sequencing technologies are revolutionizing the life sciences. The past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research. This has bearing on several aspects of evolutionary biology, and we are now beginning to see patterns emerging from these studies. These include significant heterogeneity in the rate of recombination that affects adaptive evolution and base composition, the role of population size in adaptive evolution, and the importance of expansion of gene families in lineage-specific adaptation. Moreover, resequencing of population samples (population genomics) has enabled the identification of the genetic basis of critical phenotypes and cast light on the landscape of genomic divergence during speciation.

Journal Article•DOI•

Comprehensive analysis of DNA methylation data with RnBeads

Yassen Assenov¹, Fabian Müller¹, Pavlo Lutsik², Jörn Walter², Thomas Lengauer¹, Christoph Bock¹•Institutions (2)

Max Planck Society¹, Saarland University²

01 Nov 2014-Nature Methods

TL;DR: RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/).

Abstract: RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/). Supported assays include whole-genome bisulfite sequencing, reduced representation bisulfite sequencing, Infinium microarrays and any other protocol that produces high-resolution DNA methylation data. Notable applications of RnBeads include the analysis of epigenome-wide association studies and epigenetic biomarker discovery in cancer cohorts.

Journal Article•DOI•

Functional annotation of noncoding sequence variants

Graham R. S. Ritchie¹, Ian Dunham², Eleftheria Zeggini¹, Paul Flicek¹•Institutions (2)

Wellcome Trust Sanger Institute¹, European Bioinformatics Institute²

01 Mar 2014-Nature Methods

TL;DR: This work presents genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.

Abstract: Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.

Journal Article•DOI•

Genome-wide analysis of noncoding regulatory mutations in cancer.

Nils Weinhold¹, Anders Jacobsen¹, Nikolaus Schultz¹, Chris Sander¹, William Lee¹•Institutions (1)

Memorial Sloan Kettering Cancer Center¹

01 Nov 2014-Nature Genetics

TL;DR: New frequency- and sequence-based approaches are used to comprehensively scan the genome for noncoding mutations with potential regulatory impact and identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter.

Abstract: Cancer primarily develops because of somatic alterations in the genome. Advances in sequencing have enabled large-scale sequencing studies across many tumor types, emphasizing the discovery of alterations in protein-coding genes. However, the protein-coding exome comprises less than 2% of the human genome. Here we analyze the complete genome sequences of 863 human tumors from The Cancer Genome Atlas and other sources to systematically identify noncoding regions that are recurrently mutated in cancer. We use new frequency- and sequence-based approaches to comprehensively scan the genome for noncoding mutations with potential regulatory impact. These methods identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter. SDHD promoter mutations are frequent in melanoma and are associated with reduced gene expression and poor prognosis. The non-protein-coding cancer genome remains widely unexplored, and our findings represent a step toward targeting the entire genome for clinical purposes.

Journal Article•DOI•

Generation of mouse models of myeloid malignancy with combinatorial genetic lesions using CRISPR-Cas9 genome editing

Dirk Heckl¹, Monika S. Kowalczyk², David Yudovich¹, Roger Belizaire¹, Rishi V. Puram¹, Marie McConkey¹, Anne Thielke², Jon C. Aster¹, Aviv Regev³, Benjamin L. Ebert¹•Institutions (3)

Brigham and Women's Hospital¹, Broad Institute², Massachusetts Institute of Technology³

01 Sep 2014-Nature Biotechnology

TL;DR: In this article, lentivirus-delivered sgRNA:Cas9 genome editing was used to generate mouse models of acute myeloid leukemia (AML) with cooperating mutations in genes encoding epigenetic modifiers, transcription factors and mediators of cytokine signaling.

Abstract: Genome sequencing studies have shown that human malignancies often bear mutations in four or more driver genes, but it is difficult to recapitulate this degree of genetic complexity in mouse models using conventional breeding. Here we use the CRISPR-Cas9 system of genome editing to overcome this limitation. By delivering combinations of small guide RNAs (sgRNAs) and Cas9 with a lentiviral vector, we modified up to five genes in a single mouse hematopoietic stem cell (HSC), leading to clonal outgrowth and myeloid malignancy. We thereby generated models of acute myeloid leukemia (AML) with cooperating mutations in genes encoding epigenetic modifiers, transcription factors and mediators of cytokine signaling, recapitulating the combinations of mutations observed in patients. Our results suggest that lentivirus-delivered sgRNA:Cas9 genome editing should be useful to engineer a broad array of in vivo cancer models that better reflect the complexity of human disease.

Journal Article•DOI•

RefSeq microbial genomes database: new representation and annotation strategy

Tatiana Tatusova¹, Stacy Ciufo¹, Boris Fedorov¹, Kathleen O'Neill¹, Igor Tolstoy¹•Institutions (1)

National Institutes of Health¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives.

Abstract: The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.

Journal Article•DOI•

DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements

Hao Luo¹, Yan Lin¹, Feng Gao¹, Chun Ting Zhang¹, Ren Zhang¹•Institutions (1)

Wayne State University¹

01 Jan 2014-Nucleic Acids Research

TL;DR: DEG 10 includes essential genomic elements under different conditions in three domains of life, with customizable BLAST tools.

Abstract: The combination of high-density transposon-mediated mutagenesis and high-throughput sequencing has led to significant advancements in research on essential genes, resulting in a dramatic increase in the number of identified prokaryotic essential genes under diverse conditions and a revised essential-gene concept that includes all essential genomic elements, rather than focusing on protein-coding genes only. DEG 10, a new release of the Database of Essential Genes (available at http://www.essentialgene.org), has been developed to accommodate these quantitative and qualitative advancements. In addition to increasing the number of bacterial and archaeal essential genes determined by genome-wide gene essentiality screens, DEG 10 also harbors essential noncoding RNAs, promoters, regulatory sequences and replication origins. These essential genomic elements are determined not only in vitro, but also in vivo, under diverse conditions including those for survival, pathogenesis and antibiotic resistance. We have developed customizable BLAST tools that allow users to perform species- and experiment-specific BLAST searches for a single gene, a list of genes, annotated or unannotated genomes. Therefore, DEG 10 includes essential genomic elements under different conditions in three domains of life, with customizable BLAST tools.

Journal Article•DOI•

Origins and functional evolution of Y chromosomes across mammals

Diego Cortez¹, Ray M. Marín¹, Deborah Toledo-Flores², Laure Froidevaux³, Angélica Liechti³, Paul D. Waters⁴, Frank Grützner², Henrik Kaessmann¹•Institutions (4)

Swiss Institute of Bioinformatics¹, University of Adelaide², University of Lausanne³, University of New South Wales⁴

24 Apr 2014-Nature

TL;DR: Although some genes evolved novel functions through spatial/temporal expression shifts, most Y genes probably endured, at least initially, because of dosage constraints, and show notable conservation of proto-sex chromosome expression patterns.

Abstract: Y chromosomes underlie sex determination in mammals, but their repeat-rich nature has hampered sequencing and associated evolutionary studies. Here we trace Y evolution across 15 representative mammals on the basis of high-throughput genome and transcriptome sequencing. We uncover three independent sex chromosome originations in mammals and birds (the outgroup). The original placental and marsupial (therian) Y, containing the sex-determining gene SRY, emerged in the therian ancestor approximately 180 million years ago, in parallel with the first of five monotreme Y chromosomes, carrying the probable sex-determining gene AMH. The avian W chromosome arose approximately 140 million years ago in the bird ancestor. The small Y/W gene repertoires, enriched in regulatory functions, were rapidly defined following stratification (recombination arrest) and erosion events and have remained considerably stable. Despite expression decreases in therians, Y/W genes show notable conservation of proto-sex chromosome expression patterns, although various Y genes evolved testis-specificities through differential regulatory decay. Thus, although some genes evolved novel functions through spatial/temporal expression shifts, most Y genes probably endured, at least initially, because of dosage constraints. Using high-throughput genome and transcriptome sequencing, Y chromosome evolution across 15 representative mammals is explored, with results providing evidence for three independent sex chromosome originations in mammals and birds. Mammalian Y chromosomes, known for their roles in sex determination and male fertility, often contain repetitive sequences that make them harder to assemble than the rest of the genome. To counter this problem Henrik Kaessmann and colleagues have developed a new transcript assembly approach based on male-specific RNA/genomic sequencing data to explore Y evolution across 15 species representing all major mammalian lineages. 
They find evidence for two independent sex chromosome originations in mammals and one in birds. Their analysis of the Y/W gene repertoires suggests that although some genes evolved novel functions in sex determination/spermatogenesis as a result of temporal/spatial expression changes, most Y genes probably persisted, at least initially, as a result of dosage constraints. In a parallel study, Daniel Bellott and colleagues reconstructed the evolution of the Y chromosome, using a comprehensive comparative analysis of the genomic sequence of X–Y gene pairs from seven placental mammals and one marsupial. They conclude that evolution streamlined the gene content of the human Y chromosome through selection to maintain the ancestral dosage of homologous X–Y gene pairs that regulate gene expression throughout the body. They propose that these genes make the Y chromosome essential for male viability and contribute to differences between the sexes in health and disease.

Journal Article•DOI•

The promise of whole-exome sequencing in medical genetics.

Bahareh Rabbani¹, Mustafa Tekin², Nejat Mahdieh•Institutions (2)

Qazvin University of Medical Sciences¹, John P. Hussman Institute for Human Genomics²

01 Jan 2014-Journal of Human Genetics

TL;DR: In this review, the impacts of WES in medical genetics, as well as its consequences leading to improved health care, are summarized.

Abstract: Massively parallel DNA-sequencing systems provide the sequences of huge numbers of different DNA strands at once. These technologies are revolutionizing our understanding of medical genetics, accelerating health-improvement projects, and ushering in fully understood personalized medicine in the near future. Whole-exome sequencing (WES) is the application of next-generation technology to determine the variations of all coding regions, or exons, of known genes. WES provides coverage of more than 95% of the exons, which contain 85% of disease-causing mutations in Mendelian disorders and many disease-predisposing SNPs throughout the genome. The role of more than 150 genes has been distinguished by means of WES, and this number is quickly growing. In this review, the impacts of WES in medical genetics, as well as its consequences leading to improved health care, are summarized.

Journal Article•DOI•

Mutational signatures: the patterns of somatic mutations hidden in cancer genomes.

Ludmil B. Alexandrov¹, Michael R. Stratton¹•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Feb 2014-Current Opinion in Genetics & Development

TL;DR: The current understanding of mutational patterns and mutational signatures in light of both the somatic cell paradigm of cancer research and the recent developments in the field of cancer genomics is summarized.

Journal Article•DOI•

An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.

Catherine A. Brownstein¹, Alan H. Beggs¹, Nils Homer, Barry Merriman² +207 more•Institutions (53)

25 Mar 2014-Genome Biology

TL;DR: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases and reveals a general convergence of practices on most elements of the analysis and interpretation process.

Abstract: Background: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.

Journal Article•DOI•

Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea.

Jongsik Chun¹, Fred A. Rainey²•Institutions (2)

Seoul National University¹, University of Alaska Anchorage²

01 Feb 2014-International Journal of Systematic and Evolutionary Microbiology

TL;DR: This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics, and outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.

Abstract: The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA-DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12,000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11,000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.

Journal Article•DOI•

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

David B. Neale¹, Jill L. Wegrzyn¹, Kristian Stevens¹, Aleksey V. Zimin², Daniela Puiu³, Marc W. Crepeau¹, Charis Cardeno¹, Maxim Koriabine⁴, Ann E. Holtz-Morris⁴, John D. Liechty¹, Pedro J. Martínez-García¹, Hans A. Vasquez-Gross¹, Brian Y. Lin¹, Jacob J. Zieve¹, William M. Dougherty¹, Sara Fuentes-Soriano⁵, Le-Shin Wu⁵, Don Gilbert⁵, Guillaume Marçais², Michael Roberts², Carson Holt⁶, Mark Yandell⁶, John M. Davis⁷, Katherine E. Smith⁸, Jeffrey F. D. Dean⁹, W. Walter Lorenz⁹, Ross W. Whetten¹⁰, Ronald R. Sederoff¹⁰, Nicholas C. Wheeler¹, Patrick E. McGuire¹, Doreen Main¹¹, Carol A. Loopstra¹², Keithanne Mockaitis⁵, Pieter J. deJong⁴, James A. Yorke², Steven L. Salzberg³, Charles H. Langley¹•Institutions (12)

University of California, Davis¹, University of Maryland, College Park², Johns Hopkins University³, Children's Hospital Oakland Research Institute⁴, Indiana University⁵, University of Utah⁶, University of Florida⁷, United States Forest Service⁸, University of Georgia⁹, North Carolina State University¹⁰, Washington State University¹¹, Texas A&M University¹²

04 Mar 2014-Genome Biology

TL;DR: In this paper, the authors used a whole genome shotgun approach relying on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding.

Abstract: The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
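
Note on the N50 figure quoted above: N50 is commonly defined as the length of the scaffold at which, going from longest to shortest, the running total first reaches half of the total assembly length. The following is a minimal sketch of that general definition, not code from the paper; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the cumulative sum first reaches at
    least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, in kbp): total is 30, half is 15,
# and the running sum 10 + 8 = 18 first reaches 15 at the scaffold of length 8.
print(n50([10, 8, 6, 4, 2]))
```

Under this definition, an N50 scaffold size of 66.9 kbp would mean that scaffolds of at least that length together account for at least half of the assembled sequence.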

Journal Article•DOI•

In silico prediction of splice-altering single nucleotide variants in the human genome

Xueqiu Jian¹, Eric Boerwinkle, Xiaoming Liu¹•Institutions (1)

University of Texas Health Science Center at Houston¹

16 Dec 2014-Nucleic Acids Research

TL;DR: This work compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis and pre-computed ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.

Abstract: In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
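
Note on the receiver operating characteristic analysis mentioned above: one common summary of an ROC curve is the area under it (AUC), which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below illustrates that general idea only; it is not the authors' evaluation pipeline, and the function name `roc_auc`, the scores and the labels are all hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher meaning "more likely splice-altering"
            (an assumption made for this illustration).
    labels: 1 for a true positive class member, 0 for a negative.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise form is quadratic in the number of variants; it is used here for clarity, and a rank-based computation would be the usual choice for larger sets.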
