Author
Jacquie Schein
Other affiliations: Washington University in St. Louis, University of British Columbia, Simon Fraser University
Bio: Jacquie Schein is an academic researcher from BC Cancer Agency. The author has contributed to research on the topics Gene and Genome. The author has an h-index of 17 and has co-authored 27 publications receiving 13,073 citations. Previous affiliations of Jacquie Schein include Washington University in St. Louis and University of British Columbia.
Papers
Oak Ridge National Laboratory, University of Tennessee, West Virginia University, Umeå University, University of British Columbia, United States Department of Energy, Ghent University, Swedish University of Agricultural Sciences, Institut national de la recherche agronomique, Virginia Tech, Michigan Technological University, University of Toronto, Pennsylvania State University, University of Provence, University of Georgia, University of Florida, University of California, Berkeley, Lawrence Berkeley National Laboratory, University of Arizona, Purdue University, Stanford University, United States Department of Agriculture, University of Turku, University of Helsinki, Massachusetts Institute of Technology, University of Tennessee Health Science Center, University of Tübingen
TL;DR: The draft genome of the black cottonwood tree, Populus trichocarpa, has been reported in this paper, with more than 45,000 putative protein-coding genes identified.
Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
4,025 citations
Washington University in St. Louis, Brown University, University of British Columbia, University of North Carolina at Chapel Hill, University of Southern California, Massachusetts Institute of Technology, Seattle Cancer Care Alliance, Johns Hopkins University, University of Texas MD Anderson Cancer Center, Nationwide Children's Hospital, National Institutes of Health, SRA International, Temple University, University of Chicago, University of Pennsylvania
TL;DR: A complex interplay of genetic events is found to contribute to AML pathogenesis in individual patients, and the databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification.
Abstract: BACKGROUND: Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS: We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS: AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation–related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS: We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.)
The molecular pathogenesis of acute myeloid leukemia (AML) has been studied with the use of cytogenetic analysis for more than three decades. Recurrent chromosomal structural variations are well established as diagnostic and prognostic markers, suggesting that acquired genetic abnormalities (i.e., somatic mutations) have an essential role in pathogenesis [1,2]. However, nearly 50% of AML samples have a normal karyotype, and many of these genomes lack structural abnormalities, even when assessed with high-density comparative genomic hybridization or single-nucleotide polymorphism (SNP) arrays [3-5] (see Glossary). Targeted sequencing has identified recurrent mutations in FLT3, NPM1, KIT, CEBPA, and TET2 [6-8]. Massively parallel sequencing enabled the discovery of recurrent mutations in DNMT3A [9,10] and IDH1 [11]. Recent studies have shown that many patients with
3,980 citations
TL;DR: The ENCyclopedia Of DNA Elements (ENCODE) Project is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function.
Abstract: The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (∼1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
2,248 citations
TL;DR: Recurrent somatic mutations affecting the polycomb-group oncogene EZH2, which encodes a histone methyltransferase responsible for trimethylating Lys27 of histone H3 (H3K27), are reported, consistent with the notion that EZH2 proteins with mutant Tyr641 have reduced enzymatic activity in vitro.
Abstract: Marco Marra and colleagues identify somatic mutations in EZH2 in diffuse large B-cell lymphomas and follicular lymphomas. EZH2 is a histone methyltransferase that participates in trimethylation of H3 Lys27 (H3K27) as part of the PRC2 complex. The mutations alter a single tyrosine residue in the SET domain of EZH2 and reduce the ability of PRC2 to trimethylate H3K27 in vitro.
1,468 citations
Wellcome Trust Sanger Institute, Seattle Biomed, Katholieke Universiteit Leuven, GATC Biotech, Max Planck Society, Washington University in St. Louis, University of Trieste, International Centre for Genetic Engineering and Biotechnology, European Bioinformatics Institute, University of São Paulo, National Scientific and Technical Research Council, Université catholique de Louvain, University of London, University of Edinburgh, University of Glasgow, University of Wisconsin-Madison, University of York, University of Cambridge, University of Washington
TL;DR: The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II–directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling.
Abstract: Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.
1,357 citations
Cited by
TL;DR: In this article, the authors present an approach for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.
10,798 citations
TL;DR: Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements.
Abstract: We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.
8,315 citations
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
8,106 citations
TL;DR: The 2016 edition of the World Health Organization classification of tumors of the hematopoietic and lymphoid tissues represents a revision of the prior classification rather than an entirely new classification and attempts to incorporate new clinical, prognostic, morphologic, immunophenotypic, and genetic data that have emerged since the last edition.
7,147 citations
TL;DR: The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution.
Abstract: Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today’s sequencing and array-based profiling methods present major challenges to visualization tools. The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues. To that end, IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems. IGV is freely available for download from http://www.broadinstitute.org/igv, under a GNU LGPL open-source license.
6,930 citations