Author
Xin Zhou
Other affiliations: Rice University, Washington University in St. Louis, University of Minnesota, ...
Bio: Xin Zhou is an academic researcher from St. Jude Children's Research Hospital. The author has contributed to research in the topics of computer science and medicine. The author has an h-index of 36 and has co-authored 71 publications receiving 13,444 citations. Previous affiliations of Xin Zhou include Rice University and Washington University in St. Louis.
Papers
Massachusetts Institute of Technology; Broad Institute; University of California, Los Angeles; University of British Columbia; Baylor College of Medicine; Howard Hughes Medical Institute; University of Washington; Ludwig Institute for Cancer Research; University of California, San Francisco; University of Connecticut; University of Zagreb; University of Texas at Austin; Washington University in St. Louis; University of Queensland; Harvard University; Cold Spring Harbor Laboratory; University of Southern California; University of California, Santa Cruz; Simon Fraser University; Morgridge Institute for Research; University of Texas at Dallas; National Institutes of Health
TL;DR: It is shown that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
5,037 citations
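To make the enrichment claim concrete, here is a minimal Python sketch of one simple way to ask whether trait-associated variants fall inside tissue-specific marked regions more often than chance. It assumes variants are (chromosome, position) pairs, marked regions are non-overlapping half-open (chromosome, start, end) intervals, and the background rate is the fraction of the genome covered by the mark. The function names and the one-sided binomial test are illustrative choices, not the consortium's published pipeline.

```python
# Illustrative sketch only; not the Roadmap Epigenomics analysis pipeline.
from bisect import bisect_right
from math import comb


def build_index(regions):
    """Group (chrom, start, end) half-open intervals by chromosome.

    Assumes intervals on the same chromosome do not overlap, so a sorted
    list of starts plus one binary search finds the only candidate interval.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def overlaps(index, chrom, pos):
    """True if pos lies inside a marked region on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    return i >= 0 and pos < ends[i]


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def variant_enrichment(variants, regions, genome_fraction):
    """Return (hits, fold enrichment, one-sided p-value).

    genome_fraction (assumed input): share of the genome covered by the mark,
    used as the chance that a randomly placed variant lands in a marked region.
    """
    if not variants:
        raise ValueError("need at least one variant")
    index = build_index(regions)
    hits = sum(overlaps(index, chrom, pos) for chrom, pos in variants)
    n = len(variants)
    fold = (hits / n) / genome_fraction
    return hits, fold, binomial_sf(hits, n, genome_fraction)


# Toy usage with hypothetical coordinates.
regions = [("chr1", 100, 200), ("chr1", 500, 600)]
variants = [("chr1", 150), ("chr1", 550), ("chr1", 199), ("chr1", 300), ("chr2", 150)]
print(variant_enrichment(variants, regions, genome_fraction=0.1))
```

A real analysis would also have to handle linkage disequilibrium between variants, which this independent-draws model ignores.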
TL;DR: agriGO is an integrated web-based GO analysis toolkit for the agricultural community that builds on the authors' earlier GO enrichment tool, EasyGO, to meet analysis demands from new technologies and research objectives.
Abstract: Gene Ontology (GO), the de facto standard in gene functionality description, is used widely in functional annotation and enrichment analysis. Here, we introduce agriGO, an integrated web-based GO analysis toolkit for the agricultural community, using the advantages of our previous GO enrichment tool (EasyGO), to meet analysis demands from new technologies and research objectives. EasyGO is valuable for its proficiency, and has proved useful in uncovering biological knowledge in massive data sets from high-throughput experiments. For agriGO, the system architecture and website interface were redesigned to improve performance and accessibility. The supported organisms and gene identifiers were substantially expanded (including 38 agricultural species composed of 274 data types). The requirement on user input is more flexible, in that user-defined reference and annotation are accepted. Moreover, a new analysis approach using Gene Set Enrichment Analysis strategy and customizable features is provided. Four tools, SEA (Singular enrichment analysis), PAGE (Parametric Analysis of Gene set Enrichment), BLAST4ID (Transfer IDs by BLAST) and SEACOMPARE (Cross comparison of SEA), are integrated as a toolkit to meet different demands. We also provide a cross-comparison service so that different data sets can be compared and explored in a visualized way. Lastly, agriGO functions as a GO data repository with search and download functions; agriGO is publicly accessible at http://bioinfo.cau.edu.cn/agriGO/.
2,274 citations
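Singular enrichment analysis (SEA), the first of the four agriGO tools, compares how often a GO term annotates a query gene list with how often it annotates a reference background. The Python sketch below uses a one-sided hypergeometric test, one common formulation of that comparison, and applies Benjamini-Hochberg adjustment as a stand-in for whichever multiple-testing correction a user selects. The data structures, function names and toy terms are hypothetical; this is not agriGO's code.

```python
# Hypothetical sketch of SEA-style enrichment; not agriGO's implementation.
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance that a random n-gene list drawn from N background
    genes, K of which carry the term, contains at least k annotated genes."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Step-up Benjamini-Hochberg adjustment, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def singular_enrichment(query, reference, annotations):
    """annotations maps a GO term to the set of gene IDs it annotates.

    Returns (term, k, n, K, N, p, adjusted_p) tuples sorted by p-value.
    """
    reference = set(reference)
    query = set(query) & reference  # genes outside the background are ignored
    N, n = len(reference), len(query)
    rows = []
    for term, genes in annotations.items():
        term_genes = set(genes) & reference
        K, k = len(term_genes), len(term_genes & query)
        if k > 0:  # a term absent from the query cannot be over-represented
            rows.append((term, k, n, K, N, hypergeom_sf(k, N, K, n)))
    adjusted = benjamini_hochberg([row[-1] for row in rows])
    rows = [row + (q,) for row, q in zip(rows, adjusted)]
    return sorted(rows, key=lambda row: row[5])


# Toy usage: ten background genes, three query genes, two made-up terms.
reference = [f"g{i}" for i in range(1, 11)]
annotations = {"term_A": {"g1", "g2", "g4"}, "term_B": {"g5", "g6"}}
for row in singular_enrichment(["g1", "g2", "g3"], reference, annotations):
    print(row)
```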
Susanne Gröbner, Barbara C. Worst, Joachim Weischenfeldt, +182 more (23 institutions)
TL;DR: The data suggest that 7–8% of the children in this cohort carry an unambiguous predisposing germline variant and that nearly 50% of paediatric neoplasms harbour a potentially druggable event, which is highly relevant for the design of future clinical trials.
Abstract: Pan-cancer analyses that examine commonalities and differences among various cancer types have emerged as a powerful way to obtain novel insights into cancer biology. Here we present a comprehensive analysis of genetic alterations in a pan-cancer cohort including 961 tumours from children, adolescents, and young adults, comprising 24 distinct molecular types of cancer. Using a standardized workflow, we identified marked differences in terms of mutation frequency and significantly mutated genes in comparison to previously analysed adult cancers. Genetic alterations in 149 putative cancer driver genes separate the tumours into two classes: small mutation and structural/copy-number variant (correlating with germline variants). Structural variants, hyperdiploidy, and chromothripsis are linked to TP53 mutation status and mutational signatures. Our data suggest that 7-8% of the children in this cohort carry an unambiguous predisposing germline variant and that nearly 50% of paediatric neoplasms harbour a potentially druggable event, which is highly relevant for the design of future clinical trials.
958 citations
TL;DR: Germline mutations in cancer-predisposing genes were identified in 8.5% of the children and adolescents with cancer, and family history did not predict the presence of an underlying predisposition syndrome in most patients.
Abstract: Background: The prevalence and spectrum of predisposing mutations among children and adolescents with cancer are largely unknown. Knowledge of such mutations may improve the understanding of tumorigenesis, direct patient care, and enable genetic counseling of patients and families. Methods: In 1120 patients younger than 20 years of age, we sequenced the whole genomes (in 595 patients), whole exomes (in 456), or both (in 69). We analyzed the DNA sequences of 565 genes, including 60 that have been associated with autosomal dominant cancer-predisposition syndromes, for the presence of germline mutations. The pathogenicity of the mutations was determined by a panel of medical experts with the use of cancer-specific and locus-specific genetic databases, the medical literature, computational predictions, and second hits identified in the tumor genome. The same approach was used to analyze data from 966 persons who did not have known cancer in the 1000 Genomes Project, and a similar approach was used to analyze data...
886 citations
St. Jude Children's Research Hospital; German Cancer Research Center; Heidelberg University; University of Copenhagen; Massachusetts Institute of Technology; European Bioinformatics Institute; Max Planck Society; Broad Institute; University Hospital Heidelberg; Oregon Health & Science University; Boston Children's Hospital; University of Tübingen; University of California, Los Angeles; Hospital Sant Joan de Déu Barcelona; Duke University; McGill University; Kitasato University; BC Cancer Agency; University of Toronto
TL;DR: The application of integrative genomics to an extensive cohort of clinical samples derived from a single childhood cancer entity revealed a series of cancer genes and biologically relevant subtype diversity that represent attractive therapeutic targets for the treatment of patients with medulloblastoma.
Abstract: Current therapies for medulloblastoma, a highly malignant childhood brain tumour, impose debilitating effects on the developing child, and highlight the need for molecularly targeted treatments with reduced toxicity. Previous studies have been unable to identify the full spectrum of driver genes and molecular processes that operate in medulloblastoma subgroups. Here we analyse the somatic landscape across 491 sequenced medulloblastoma samples and the molecular heterogeneity among 1,256 epigenetically analysed cases, and identify subgroup-specific driver alterations that include previously undiscovered actionable targets. Driver mutations were confidently assigned to most patients belonging to Group 3 and Group 4 medulloblastoma subgroups, greatly enhancing previous knowledge. New molecular subtypes were differentially enriched for specific driver events, including hotspot in-frame insertions that target KBTBD4 and 'enhancer hijacking' events that activate PRDM6. Thus, the application of integrative genomics to an extensive cohort of clinical samples derived from a single childhood cancer entity revealed a series of cancer genes and biologically relevant subtype diversity that represent attractive therapeutic targets for the treatment of patients with medulloblastoma.
706 citations
Cited by
TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
14,103 citations
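The chromosome hashing and feature blocking ideas can be illustrated with a simplified Python sketch: exons are grouped per chromosome in a dictionary (the hash) and further filed into fixed-width genomic blocks, so each read is compared only with exons in the blocks it spans. The block width, the tuple layouts and the rule that reads overlapping more than one gene stay unassigned are assumptions for illustration. This is not the Subread implementation, and it treats every read as single-end.

```python
# Simplified illustration of hashing + blocking; not Subread's C code.
from collections import defaultdict

BLOCK_SIZE = 1_000  # hypothetical block width in bases


def build_feature_index(features, block_size=BLOCK_SIZE):
    """features: (gene_id, chrom, start, end) exon records, inclusive coordinates.

    Returns {chrom: {block number: [(start, end, gene_id), ...]}}; an exon is
    filed under every block it touches.
    """
    index = defaultdict(lambda: defaultdict(list))
    for gene_id, chrom, start, end in features:
        for block in range(start // block_size, end // block_size + 1):
            index[chrom][block].append((start, end, gene_id))
    return index


def count_reads(reads, features, block_size=BLOCK_SIZE):
    """reads: (chrom, start, end) alignment spans, inclusive coordinates.

    A read counts toward a gene if it overlaps any of that gene's exons;
    reads overlapping no gene, or more than one gene, stay unassigned.
    """
    index = build_feature_index(features, block_size)
    counts = defaultdict(int)
    unassigned = 0
    for chrom, start, end in reads:
        blocks = index.get(chrom)  # one hash lookup per chromosome name
        hits = set()
        if blocks is not None:
            # only exons filed in the blocks this read spans are examined
            for block in range(start // block_size, end // block_size + 1):
                for f_start, f_end, gene_id in blocks.get(block, ()):
                    if f_start <= end and start <= f_end:
                        hits.add(gene_id)
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            unassigned += 1
    return dict(counts), unassigned


# Toy usage with hypothetical genes and reads.
exons = [
    ("geneA", "chr1", 100, 200),
    ("geneA", "chr1", 300, 400),
    ("geneB", "chr1", 1500, 1600),
    ("geneC", "chr1", 1550, 1700),
]
reads = [("chr1", 150, 180), ("chr1", 190, 310), ("chr1", 1560, 1580),
         ("chr1", 900, 950), ("chr2", 100, 200), ("chr1", 1650, 1690)]
print(count_reads(reads, exons))
```

In this toy run geneA receives two reads (one spanning both of its exons), geneC one, and three reads stay unassigned: one ambiguous between geneB and geneC and two that overlap nothing. Changing the block width changes only how many exons are inspected, not the counts.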
TL;DR: The survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.
Abstract: Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.
13,102 citations
TL;DR: Application of GOseq to a prostate cancer data set shows that GOseq dramatically changes the results, highlighting categories more consistent with the known biology.
Abstract: We present GOseq, an application for performing Gene Ontology (GO) analysis on RNA-seq data. GO analysis is widely used to reduce complexity and highlight biological processes in genome-wide expression studies, but standard methods give biased results on RNA-seq data due to over-detection of differential expression for long and highly expressed transcripts. Application of GOseq to a prostate cancer data set shows that GOseq dramatically changes the results, highlighting categories more consistent with the known biology.
5,034 citations
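The length-bias correction behind GOseq can be sketched as follows: estimate how the probability of a gene being called differentially expressed (DE) varies with transcript length, then build the null distribution for a category by sampling DE-sized gene sets with those length-dependent weights rather than uniformly. In this Python sketch the probability weighting function is approximated by per-bin DE rates over equal-count length bins (GOseq itself fits a monotonic spline), and weighted sampling without replacement uses the Efraimidis-Spirakis key trick. The bin count, pseudocount, sample count and function names are hypothetical.

```python
# Hypothetical sketch of length-aware category testing; not the goseq package.
import heapq
import random
from collections import defaultdict


def probability_weights(lengths, de_genes, n_bins=10, pseudocount=0.5):
    """Binned stand-in for GOseq's probability weighting function (PWF).

    Genes are split into equal-count bins by length; each gene's weight is the
    DE rate of its bin, shrunk by a pseudocount so that no weight is zero.
    """
    order = sorted(lengths, key=lengths.get)
    bin_size = -(-len(order) // n_bins)  # ceiling division
    bins = {gene: i // bin_size for i, gene in enumerate(order)}
    de_in_bin, genes_in_bin = defaultdict(int), defaultdict(int)
    for gene, b in bins.items():
        genes_in_bin[b] += 1
        de_in_bin[b] += gene in de_genes
    return {
        gene: (de_in_bin[b] + pseudocount) / (genes_in_bin[b] + 2 * pseudocount)
        for gene, b in bins.items()
    }


def sampled_category_pvalue(category, lengths, de_genes, n_samples=2000, seed=0):
    """One-sided p-value for over-representation of DE genes in `category`
    (a set), under a null where each gene's chance of being drawn follows the
    DE rate of its length bin."""
    rng = random.Random(seed)
    genes = list(lengths)
    de = set(de_genes) & set(genes)
    weights = probability_weights(lengths, de)
    observed = len(de & category)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        # Efraimidis-Spirakis: keep the len(de) largest keys u ** (1 / w)
        keys = ((rng.random() ** (1.0 / weights[g]), g) for g in genes)
        sample = {g for _, g in heapq.nlargest(len(de), keys)}
        if len(sample & category) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_samples + 1)


# Toy usage: DE calls concentrated among the longest genes.
lengths = {f"g{i}": 100 * (i + 1) for i in range(100)}
de_genes = {f"g{i}" for i in range(90, 100)}
long_category = {f"g{i}" for i in range(95, 100)}
print(sampled_category_pvalue(long_category, lengths, de_genes))
```

Because the null draws long genes more often, a category made of long genes is no longer flagged merely for being long; with uniform sampling the same toy category would look far more significant.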
TL;DR: REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures.
Abstract: Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret.
REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
4,919 citations
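The redundancy reduction REVIGO performs can be approximated by a greedy pass, sketched here in Python. Terms are visited from most to least significant, and each term either joins the most similar existing representative (if their semantic similarity reaches a threshold) or becomes a new representative. The similarity function, the 0.7 default threshold and the choice of representative by p-value alone are assumptions for illustration; REVIGO's own representative-selection rules are more involved.

```python
# Hypothetical greedy sketch of GO-term redundancy reduction; not REVIGO's code.


def summarize_terms(pvalues, similarity, threshold=0.7):
    """Greedy redundancy reduction over enriched GO terms.

    pvalues: {term: p-value}; similarity(a, b) returns a score in [0, 1].
    Returns {representative: [terms it absorbed]}.
    """
    clusters = {}
    for term in sorted(pvalues, key=pvalues.get):  # most significant first
        best = max(clusters, key=lambda rep: similarity(term, rep), default=None)
        if best is not None and similarity(term, best) >= threshold:
            clusters[best].append(term)
        else:
            clusters[term] = []  # not redundant with any kept term
    return clusters


def table_similarity(table):
    """Wrap a {frozenset({a, b}): score} table as a symmetric similarity function;
    pairs missing from the table score 0."""
    def similarity(a, b):
        return 1.0 if a == b else table.get(frozenset((a, b)), 0.0)
    return similarity


# Toy usage with made-up terms and precomputed similarity scores.
pvalues = {"term_A": 0.001, "term_B": 0.01, "term_C": 0.02, "term_D": 0.05}
sim = table_similarity({
    frozenset(("term_A", "term_B")): 0.9,
    frozenset(("term_A", "term_C")): 0.2,
    frozenset(("term_B", "term_C")): 0.3,
    frozenset(("term_C", "term_D")): 0.8,
})
print(summarize_terms(pvalues, sim))
```

Visiting terms in p-value order means the most significant member of each group becomes its representative; raising the threshold keeps more terms separate.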