
Showing papers on "Genomics" published in 2014


Journal Article
19 Nov 2014-PLOS ONE
TL;DR: Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.
Abstract: Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.

5,659 citations


Journal Article
TL;DR: The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.
Abstract: Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation. Current genomic annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Here, we describe Combined Annotation Dependent Depletion (CADD), a framework that objectively integrates many diverse annotations into a single, quantitative score. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human derived alleles from 14.7 million simulated variants. We pre-compute “C-scores” for all 8.6 billion possible human single nucleotide variants and enable scoring of short insertions/deletions. C-scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects, and complex trait associations, and highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious, and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current annotation.

4,956 citations


Journal Article
TL;DR: A modified version of the CRISPR-Cas9 system has been developed to recruit heterologous domains that can regulate endogenous gene expression or label specific genomic loci in living cells, which will undoubtedly transform biological research and spur the development of novel molecular therapeutics for human disease.
Abstract: Targeted genome editing using engineered nucleases has rapidly gone from being a niche technology to a mainstream method used by many biological researchers. This widespread adoption has been largely fueled by the emergence of the clustered, regularly interspaced, short palindromic repeat (CRISPR) technology, an important new approach for generating RNA-guided nucleases, such as Cas9, with customizable specificities. Genome editing mediated by these nucleases has been used to rapidly, easily and efficiently modify endogenous genes in a wide variety of biomedically important cell types and in organisms that have traditionally been challenging to manipulate genetically. Furthermore, a modified version of the CRISPR-Cas9 system has been developed to recruit heterologous domains that can regulate endogenous gene expression or label specific genomic loci in living cells. Although the genome-wide specificities of CRISPR-Cas9 systems remain to be fully defined, the power of these systems to perform targeted, highly efficient alterations of genome sequence and gene expression will undoubtedly transform biological research and spur the development of novel molecular therapeutics for human disease.

2,930 citations



Journal Article
TL;DR: This unit describes the use of the BEDTools toolkit for exploring high-throughput genomics datasets, with protocols showing how simple BEDTools operations can be combined into bespoke pipelines that address complex questions.
Abstract: Technological advances have enabled the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure the activity of diverse cellular phenomena such as gene isoform expression and transcription factor binding. Extracting biological insight from the experiments enabled by these advances demands the analysis of large, multi-dimensional datasets. This unit describes the use of the BEDTools toolkit for the exploration of high-throughput genomics datasets. Several protocols are presented for common genomic analyses, demonstrating how simple BEDTools operations may be combined to create bespoke pipelines addressing complex questions.

1,716 citations


Book Chapter
TL;DR: How to run the open-source breseq computational pipeline to identify and annotate genetic differences found in whole-genome and whole-population NGS data from haploid microbes where a high-quality reference genome is available is described.
Abstract: Next-generation DNA sequencing (NGS) can be used to reconstruct eco-evolutionary population dynamics and to identify the genetic basis of adaptation in laboratory evolution experiments. Here, we describe how to run the open-source breseq computational pipeline to identify and annotate genetic differences found in whole-genome and whole-population NGS data from haploid microbes where a high-quality reference genome is available. These methods can also be used to analyze mutants isolated in genetic screens and to detect unintended mutations that may occur during strain construction and genome editing.

1,077 citations


Journal Article
TL;DR: MycoCosm is a fungal genomics portal developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools.
Abstract: MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.

1,037 citations


Journal Article
15 Dec 2014-eLife
TL;DR: It is shown here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes.
Abstract: The CRISPR/Cas9 system is a robust genome editing technology that works in human cells, animals and plants based on the RNA-programmed DNA cleaving activity of the Cas9 enzyme. Building on previous work (Jinek et al., 2013), we show here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes. Cas9 RNP-mediated HDR in HEK293T, human primary neonatal fibroblast and human embryonic stem cells was increased dramatically relative to experiments in unsynchronized cells, with rates of HDR up to 38% observed in HEK293T cells. Sequencing of on- and potential off-target sites showed that editing occurred with high fidelity, while cell mortality was minimized. This approach provides a simple and highly effective strategy for enhancing site-specific genome engineering in both transformed and primary human cells.

988 citations


Journal Article
TL;DR: Insects are model systems for studying aberrant mt genomes, including truncated tRNAs and multichromosomal genomes, and greater integration of nuclear and mt genomic studies is necessary to further the understanding of insect genomic evolution.
Abstract: The mitochondrial (mt) genome is, to date, the most extensively studied genomic system in insects, outnumbering nuclear genomes tenfold and representing all orders versus very few. Phylogenomic analysis methods have been tested extensively, identifying compositional bias and rate variation, both within and between lineages, as the principal issues confronting accurate analyses. Major studies at both inter- and intraordinal levels have contributed to our understanding of phylogenetic relationships within many groups. Genome rearrangements are an additional data type for defining relationships, with rearrangement synapomorphies identified across multiple orders and at many different taxonomic levels. Hymenoptera and Psocodea have greatly elevated rates of rearrangement offering both opportunities and pitfalls for identifying rearrangement synapomorphies in each group. Finally, insects are model systems for studying aberrant mt genomes, including truncated tRNAs and multichromosomal genomes. Greater integration of nuclear and mt genomic studies is necessary to further our understanding of insect genomic evolution.

910 citations


Journal Article
Guojie Zhang1, Guojie Zhang2, Cai Li2, Qiye Li2, Bo Li2, Denis M. Larkin3, Chul Hee Lee4, Jay F. Storz5, Agostinho Antunes6, Matthew J. Greenwold7, Robert W. Meredith8, Anders Ödeen9, Jie Cui10, Qi Zhou11, Luohao Xu2, Hailin Pan2, Zongji Wang12, Lijun Jin2, Pei Zhang2, Haofu Hu2, Wei Yang2, Jiang Hu2, Jin Xiao2, Zhikai Yang2, Yang Liu2, Qiaolin Xie2, Hao Yu2, Jinmin Lian2, Ping Wen2, Fang Zhang2, Hui Li2, Yongli Zeng2, Zijun Xiong2, Shiping Liu12, Long Zhou2, Zhiyong Huang2, Na An2, Jie Wang13, Qiumei Zheng2, Yingqi Xiong2, Guangbiao Wang2, Bo Wang2, Jingjing Wang2, Yu Fan14, Rute R. da Fonseca1, Alonzo Alfaro-Núñez1, Mikkel Schubert1, Ludovic Orlando1, Tobias Mourier1, Jason T. Howard15, Ganeshkumar Ganapathy15, Andreas R. Pfenning15, Osceola Whitney15, Miriam V. Rivas15, Erina Hara15, Julia Smith15, Marta Farré3, Jitendra Narayan16, Gancho T. Slavov16, Michael N Romanov17, Rui Borges6, João Paulo Machado6, Imran Khan6, Mark S. Springer18, John Gatesy18, Federico G. Hoffmann19, Juan C. Opazo20, Olle Håstad21, Roger H. Sawyer7, Heebal Kim4, Kyu-Won Kim4, Hyeon Jeong Kim4, Seoae Cho4, Ning Li22, Yinhua Huang22, Michael William Bruford23, Xiangjiang Zhan13, Andrew Dixon, Mads F. Bertelsen24, Elizabeth P. Derryberry25, Wesley C. Warren26, Richard K. Wilson26, Shengbin Li27, David A. Ray19, Richard E. Green28, Stephen J. O'Brien29, Darren K. Griffin17, Warren E. Johnson30, David Haussler28, Oliver A. Ryder, Eske Willerslev1, Gary R. Graves31, Per Alström21, Jon Fjeldså32, David P. Mindell33, Scott V. Edwards34, Edward L. Braun35, Carsten Rahbek32, David W. Burt36, Peter Houde37, Yong Zhang2, Huanming Yang38, Jian Wang2, Erich D. Jarvis15, M. Thomas P. Gilbert1, M. Thomas P. Gilbert39, Jun Wang 
12 Dec 2014-Science
TL;DR: This work explored bird macroevolution using full genomes from 48 avian species representing all major extant clades to reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Abstract: Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.

872 citations


Journal Article
Patrick J. Keeling1, Patrick J. Keeling2, Fabien Burki2, Heather M. Wilcox3, Bassem Allam4, Eric E. Allen5, Linda A. Amaral-Zettler6, Linda A. Amaral-Zettler7, E. Virginia Armbrust8, John M. Archibald1, John M. Archibald9, Arvind K. Bharti10, Callum J. Bell10, Bank Beszteri11, Kay D. Bidle12, Connor Cameron10, Lisa Campbell13, David A. Caron14, Rose Ann Cattolico8, Jackie L. Collier4, Kathryn J. Coyne15, Simon K. Davy16, Phillipe Deschamps17, Sonya T. Dyhrman18, Bente Edvardsen19, Ruth D. Gates20, Christopher J. Gobler4, Spencer J. Greenwood21, Stephanie Guida10, Jennifer L. Jacobi10, Kjetill S. Jakobsen19, Erick R. James2, Bethany D. Jenkins22, Uwe John11, Matthew D. Johnson23, Andrew R. Juhl18, Anja Kamp24, Anja Kamp25, Laura A. Katz26, Ronald P. Kiene27, Alexander Kudryavtsev28, Alexander Kudryavtsev29, Brian S. Leander2, Senjie Lin30, Connie Lovejoy31, Denis H. Lynn2, Denis H. Lynn32, Adrian Marchetti33, George B. McManus30, Aurora M. Nedelcu34, Susanne Menden-Deuer22, Cristina Miceli35, Thomas Mock36, Marina Montresor37, Mary Ann Moran38, Shauna A. Murray39, Govind Nadathur40, Satoshi Nagai, Peter B. Ngam10, Brian Palenik5, Jan Pawlowski28, Giulio Petroni41, Gwenael Piganeau42, Matthew C. Posewitz43, Karin Rengefors44, Giovanna Romano37, Mary E. Rumpho30, Tatiana A. Rynearson22, Kelly B. Schilling10, Declan C. Schroeder, Alastair G. B. Simpson1, Alastair G. B. Simpson9, Claudio H. Slamovits9, Claudio H. Slamovits1, David Roy Smith45, G. Jason Smith46, Sarah R. Smith5, Heidi M. Sosik23, Peter Stief25, Edward C. Theriot47, Scott N. Twary48, Pooja E. Umale10, Daniel Vaulot49, Boris Wawrik50, Glen L. Wheeler51, William H. Wilson52, Yan Xu53, Adriana Zingone37, Alexandra Z. Worden3, Alexandra Z. Worden1 
Canadian Institute for Advanced Research1, University of British Columbia2, Monterey Bay Aquarium Research Institute3, Stony Brook University4, University of California, San Diego5, Brown University6, Marine Biological Laboratory7, University of Washington8, Dalhousie University9, National Center for Genome Resources10, Alfred Wegener Institute for Polar and Marine Research11, Rutgers University12, Texas A&M University13, University of Southern California14, University of Delaware15, Victoria University of Wellington16, University of Paris-Sud17, Columbia University18, University of Oslo19, University of Hawaii at Manoa20, University of Prince Edward Island21, University of Rhode Island22, Woods Hole Oceanographic Institution23, Jacobs University Bremen24, Max Planck Society25, Smith College26, University of South Alabama27, University of Geneva28, Saint Petersburg State University29, University of Connecticut30, Laval University31, University of Guelph32, University of North Carolina at Chapel Hill33, University of New Brunswick34, University of Camerino35, University of East Anglia36, Stazione Zoologica Anton Dohrn37, University of Georgia38, University of Technology, Sydney39, University of Puerto Rico40, University of Pisa41, Centre national de la recherche scientifique42, Colorado School of Mines43, Lund University44, University of Western Ontario45, California State University46, University of Texas at Austin47, Los Alamos National Laboratory48, Pierre-and-Marie-Curie University49, University of Oklahoma50, Plymouth Marine Laboratory51, Bigelow Laboratory For Ocean Sciences52, Princeton University53
TL;DR: In this paper, the authors describe a resource of 700 transcriptomes from marine microbial eukaryotes to help understand their role in the world's oceans and their biology, evolution, and ecology.
Abstract: Current sampling of genomic sequence data from eukaryotes is relatively poor, biased, and inadequate to address important questions about their biology, evolution, and ecology; this Community Page describes a resource of 700 transcriptomes from marine microbial eukaryotes to help understand their role in the world's oceans.

Journal Article
TL;DR: Ngs.plot is a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data and is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
Abstract: Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

Journal Article
TL;DR: Methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium are presented.
Abstract: Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.

Journal Article
TL;DR: Mapping genome-wide binding sites of catalytically inactive Cas9 (dCas9) in HEK293T cells and analyzing off-target binding sites showed the importance of the PAM-proximal region of the sgRNA guiding sequence and that dCas9 binding sites are enriched in open chromatin regions; ChIP-seq allows unbiased detection of Cas9 binding sites genome-wide.
Abstract: ChIP-seq for Cas9 shows varying amounts of off-target binding with different guide RNAs and low levels of indels at some off-target sites.

Journal Article
TL;DR: The strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies are reviewed.
Abstract: With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

Journal Article
TL;DR: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls.
Abstract: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.

Journal Article
Hans Ellegren1
TL;DR: High-throughput sequencing technologies are revolutionizing the life sciences, and the past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research.
Abstract: High-throughput sequencing technologies are revolutionizing the life sciences. The past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research. This has bearing on several aspects of evolutionary biology, and we are now beginning to see patterns emerging from these studies. These include significant heterogeneity in the rate of recombination that affects adaptive evolution and base composition, the role of population size in adaptive evolution, and the importance of expansion of gene families in lineage-specific adaptation. Moreover, resequencing of population samples (population genomics) has enabled the identification of the genetic basis of critical phenotypes and cast light on the landscape of genomic divergence during speciation.

Journal Article
TL;DR: RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/).
Abstract: RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/). Supported assays include whole-genome bisulfite sequencing, reduced representation bisulfite sequencing, Infinium microarrays and any other protocol that produces high-resolution DNA methylation data. Notable applications of RnBeads include the analysis of epigenome-wide association studies and epigenetic biomarker discovery in cancer cohorts.

Journal Article
TL;DR: This work presents genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.
Abstract: Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.

Journal Article
TL;DR: New frequency- and sequence-based approaches are used to comprehensively scan the genome for noncoding mutations with potential regulatory impact and identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter.
Abstract: Cancer primarily develops because of somatic alterations in the genome. Advances in sequencing have enabled large-scale sequencing studies across many tumor types, emphasizing the discovery of alterations in protein-coding genes. However, the protein-coding exome comprises less than 2% of the human genome. Here we analyze the complete genome sequences of 863 human tumors from The Cancer Genome Atlas and other sources to systematically identify noncoding regions that are recurrently mutated in cancer. We use new frequency- and sequence-based approaches to comprehensively scan the genome for noncoding mutations with potential regulatory impact. These methods identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter. SDHD promoter mutations are frequent in melanoma and are associated with reduced gene expression and poor prognosis. The non-protein-coding cancer genome remains widely unexplored, and our findings represent a step toward targeting the entire genome for clinical purposes.

Journal Article
TL;DR: In this article, lentivirus-delivered sgRNA:Cas9 genome editing was used to generate mouse models of acute myeloid leukemia (AML) with cooperating mutations in genes encoding epigenetic modifiers, transcription factors and mediators of cytokine signaling.
Abstract: Genome sequencing studies have shown that human malignancies often bear mutations in four or more driver genes, but it is difficult to recapitulate this degree of genetic complexity in mouse models using conventional breeding. Here we use the CRISPR-Cas9 system of genome editing to overcome this limitation. By delivering combinations of small guide RNAs (sgRNAs) and Cas9 with a lentiviral vector, we modified up to five genes in a single mouse hematopoietic stem cell (HSC), leading to clonal outgrowth and myeloid malignancy. We thereby generated models of acute myeloid leukemia (AML) with cooperating mutations in genes encoding epigenetic modifiers, transcription factors and mediators of cytokine signaling, recapitulating the combinations of mutations observed in patients. Our results suggest that lentivirus-delivered sgRNA:Cas9 genome editing should be useful to engineer a broad array of in vivo cancer models that better reflect the complexity of human disease.

Journal Article
TL;DR: The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives.
Abstract: The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.

Journal Article
Hao Luo1, Yan Lin1, Feng Gao1, Chun Ting Zhang1, Ren Zhang1 
TL;DR: DEG 10 includes essential genomic elements under different conditions in three domains of life, with customizable BLAST tools.
Abstract: The combination of high-density transposon-mediated mutagenesis and high-throughput sequencing has led to significant advancements in research on essential genes, resulting in a dramatic increase in the number of identified prokaryotic essential genes under diverse conditions and a revised essential-gene concept that includes all essential genomic elements, rather than focusing on protein-coding genes only. DEG 10, a new release of the Database of Essential Genes (available at http://www.essentialgene.org), has been developed to accommodate these quantitative and qualitative advancements. In addition to increasing the number of bacterial and archaeal essential genes determined by genome-wide gene essentiality screens, DEG 10 also harbors essential noncoding RNAs, promoters, regulatory sequences and replication origins. These essential genomic elements are determined not only in vitro, but also in vivo, under diverse conditions including those for survival, pathogenesis and antibiotic resistance. We have developed customizable BLAST tools that allow users to perform species- and experiment-specific BLAST searches for a single gene, a list of genes, annotated or unannotated genomes. Therefore, DEG 10 includes essential genomic elements under different conditions in three domains of life, with customizable BLAST tools.

Journal Article
24 Apr 2014-Nature
TL;DR: Although some genes evolved novel functions through spatial/temporal expression shifts, most Y genes probably endured, at least initially, because of dosage constraints, and show notable conservation of proto-sex chromosome expression patterns.
Abstract: Y chromosomes underlie sex determination in mammals, but their repeat-rich nature has hampered sequencing and associated evolutionary studies. Here we trace Y evolution across 15 representative mammals on the basis of high-throughput genome and transcriptome sequencing. We uncover three independent sex chromosome originations in mammals and birds (the outgroup). The original placental and marsupial (therian) Y, containing the sex-determining gene SRY, emerged in the therian ancestor approximately 180 million years ago, in parallel with the first of five monotreme Y chromosomes, carrying the probable sex-determining gene AMH. The avian W chromosome arose approximately 140 million years ago in the bird ancestor. The small Y/W gene repertoires, enriched in regulatory functions, were rapidly defined following stratification (recombination arrest) and erosion events and have remained considerably stable. Despite expression decreases in therians, Y/W genes show notable conservation of proto-sex chromosome expression patterns, although various Y genes evolved testis-specificities through differential regulatory decay. Thus, although some genes evolved novel functions through spatial/temporal expression shifts, most Y genes probably endured, at least initially, because of dosage constraints.
Using high-throughput genome and transcriptome sequencing, Y chromosome evolution across 15 representative mammals is explored, with results providing evidence for three independent sex chromosome originations in mammals and birds.
Mammalian Y chromosomes, known for their roles in sex determination and male fertility, often contain repetitive sequences that make them harder to assemble than the rest of the genome. To counter this problem Henrik Kaessmann and colleagues have developed a new transcript assembly approach based on male-specific RNA/genomic sequencing data to explore Y evolution across 15 species representing all major mammalian lineages. They find evidence for two independent sex chromosome originations in mammals and one in birds. Their analysis of the Y/W gene repertoires suggests that although some genes evolved novel functions in sex determination/spermatogenesis as a result of temporal/spatial expression changes, most Y genes probably persisted, at least initially, as a result of dosage constraints. In a parallel study, Daniel Bellott and colleagues reconstructed the evolution of the Y chromosome, using a comprehensive comparative analysis of the genomic sequence of X–Y gene pairs from seven placental mammals and one marsupial. They conclude that evolution streamlined the gene content of the human Y chromosome through selection to maintain the ancestral dosage of homologous X–Y gene pairs that regulate gene expression throughout the body. They propose that these genes make the Y chromosome essential for male viability and contribute to differences between the sexes in health and disease.

Journal ArticleDOI
TL;DR: In this review, the impact of WES on medical genetics and its consequences for improving health care are summarized.
Abstract: Massively parallel DNA-sequencing systems sequence huge numbers of different DNA strands at once. These technologies are revolutionizing our understanding of medical genetics, accelerating health-improvement projects, and ushering in fully understood personalized medicine in the near future. Whole-exome sequencing (WES) is the application of next-generation sequencing technology to determine the variations in all coding regions, or exons, of known genes. WES provides coverage of more than 95% of the exons, which contain 85% of disease-causing mutations in Mendelian disorders and many disease-predisposing SNPs throughout the genome. The role of more than 150 genes has been identified by means of WES, and this number is growing quickly. In this review, the impact of WES on medical genetics and its consequences for improving health care are summarized.

Journal ArticleDOI
TL;DR: The current understanding of mutational patterns and mutational signatures in light of both the somatic cell paradigm of cancer research and the recent developments in the field of cancer genomics is summarized.

Journal ArticleDOI
Catherine A. Brownstein, Alan H. Beggs, Nils Homer, Barry Merriman, and 207 more authors (53 institutions)
TL;DR: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases and reveals a general convergence of practices on most elements of the analysis and interpretation process.
Abstract: Background: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.

Journal ArticleDOI
TL;DR: This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics, and outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.
Abstract: The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA-DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12,000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11,000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.

Journal ArticleDOI
TL;DR: In this paper, the authors used a whole genome shotgun approach relying on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding.
Abstract: The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
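
The abstract above reports an N50 scaffold size as its contiguity measure. As a rough illustration of what that statistic means, the sketch below computes N50 under its usual definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the total assembly length. The function name `n50` and the toy lengths are assumptions made for illustration only; they are not taken from the loblolly pine assembly or from any particular assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # total is 300; 80 + 70 = 150 reaches half, so N50 is 70
```

A larger N50 indicates that more of the assembly sits in long scaffolds, which is why it is commonly quoted alongside the total assembly span.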

Journal ArticleDOI
TL;DR: This work compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis and pre-computed ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
Abstract: In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
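
The comparison described above rests on receiver operating characteristic (ROC) analysis. As a minimal sketch of how such a comparison can be scored, the code below computes the area under the ROC curve (AUC) using its rank interpretation: the probability that a randomly chosen positive (here, splice-altering) variant receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the scores and the labels are all hypothetical; this is not the authors' evaluation code, and the numbers are not from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher meaning more likely positive.
    labels: 1 for positive (e.g. splice-altering), 0 for negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from one prediction tool on five variants.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(round(roc_auc(toy_scores, toy_labels), 3))  # 5 of 6 pairs ordered correctly
```

Computing this value for each tool on the same labelled variant set gives a single number per tool that can be ranked. The pairwise loop is quadratic in the number of variants, so a sorting-based implementation would be preferable for large sets.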