Journal ArticleDOI

PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium

01 Jan 2010-Nucleic Acids Research (Oxford University Press)-Vol. 38, pp 204-210
TL;DR: Protein Analysis THrough Evolutionary Relationships (PANTHER) is a comprehensive software system for inferring the functions of genes based on their evolutionary relationships, resulting in an increasing number of curated functional annotations.
Abstract: Protein Analysis THrough Evolutionary Relationships (PANTHER) is a comprehensive software system for inferring the functions of genes based on their evolutionary relationships. Phylogenetic trees of gene families form the basis for PANTHER and these trees are annotated with ontology terms describing the evolution of gene function from ancestral to modern day genes. One of the main applications of PANTHER is in accurate prediction of the functions of uncharacterized genes, based on their evolutionary relationships to genes with functions known from experiment. The PANTHER website, freely available at http://www.pantherdb.org, also includes software tools for analyzing genomic data relative to known and inferred gene functions. Since 2007, there have been several new developments to PANTHER: (i) improved phylogenetic trees, explicitly representing speciation and gene duplication events, (ii) identification of gene orthologs, including least diverged orthologs (best one-to-one pairs), (iii) coverage of more genomes (48 genomes, up to 87% of genes in each genome; see http://www.pantherdb.org/panther/summaryStats.jsp), (iv) improved support for alternative database identifiers for genes, proteins and microarray probes and (v) adoption of the SBGN standard for display of biological pathways. In addition, PANTHER trees are being annotated with gene function as part of the Gene Ontology Reference Genome project, resulting in an increasing number of curated functional annotations.

Citations
Journal ArticleDOI
TL;DR: This work has focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.
Abstract: HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

4,159 citations

Journal ArticleDOI
TL;DR: This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system, and redesigned the website interface to improve both user experience and the system's analytical capability.
Abstract: The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

2,221 citations

Journal ArticleDOI
TL;DR: The current PANTHER process as a whole, as well as the website tools for analysis of user-uploaded data are described, which include stable database identifiers for inferred ancestral genes, which are used to associate inferred gene attributes with particular genes in the common ancestral genomes of extant species.
Abstract: The data and tools in PANTHER—a comprehensive, curated database of protein families, trees, subfamilies and functions available at http://pantherdb.org—have undergone continual, extensive improvement for over a decade. Here, we describe the current PANTHER process as a whole, as well as the website tools for analysis of user-uploaded data. The main goals of PANTHER remain essentially unchanged: the accurate inference (and practical application) of gene and protein function over large sequence databases, using phylogenetic trees to extrapolate from the relatively sparse experimental information from a few model organisms. Yet the focus of PANTHER has continually shifted toward more accurate and detailed representations of evolutionary events in gene family histories. The trees are now designed to represent gene family evolution, including inference of evolutionary events, such as speciation and gene duplication. Subfamilies are still curated and used to define HMMs, but gene ontology functional annotations can now be made at any node in the tree, and are designed to represent gain and loss of function by ancestral genes during evolution. Finally, PANTHER now includes stable database identifiers for inferred ancestral genes, which are used to associate inferred gene attributes with particular genes in the common ancestral genomes of extant species.

1,627 citations

Journal ArticleDOI
TL;DR: A new web site with improved tools for pathway browsing and data analysis is developed, and orthology-based inferences of pathways in non-human species are made, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species.
Abstract: Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.

1,460 citations

Journal ArticleDOI
Carl A. Anderson1, Gabrielle Boucher2, Charlie W. Lees3, Andre Franke4, Mauro D'Amato5, Kent D. Taylor6, James Lee7, Philippe Goyette2, Marcin Imielinski8, Anna Latiano9, Caroline Lagacé2, Regan Scott10, Leila Amininejad11, Suzannah Bumpstead1, Leonard Baidoo10, Robert N. Baldassano8, Murray L. Barclay12, Theodore M. Bayless13, Stephan Brand14, Carsten Büning15, Jean-Frederic Colombel16, Lee A. Denson17, Martine De Vos18, Marla Dubinsky6, Cathryn Edwards19, David Ellinghaus4, Rudolf S N Fehrmann20, James A B Floyd1, Timothy H. Florin21, Denis Franchimont11, Lude Franke20, Michel Georges22, Jürgen Glas14, Nicole L. Glazer23, Stephen L. Guthery24, Talin Haritunians6, Nicholas K. Hayward25, Jean-Pierre Hugot26, Gilles Jobin2, Debby Laukens18, Ian C. Lawrance27, Marc Lémann26, Arie Levine28, Cécile Libioulle22, Edouard Louis22, Dermot P.B. McGovern6, Monica Milla, Grant W. Montgomery25, Katherine I. Morley1, Craig Mowat29, Aylwin Ng30, William G. Newman31, Roel A. Ophoff32, Laura Papi33, Orazio Palmieri9, Laurent Peyrin-Biroulet, Julián Panés, Anne M. Phillips29, Natalie J. Prescott34, Deborah D. Proctor35, Rebecca L. Roberts12, Richard K Russell36, Paul Rutgeerts37, Jeremy D. Sanderson38, Miquel Sans39, Philip Schumm40, Frank Seibold41, Yashoda Sharma35, Lisa A. Simms25, Mark Seielstad42, Mark Seielstad43, A. Hillary Steinhart44, Stephan R. Targan6, Leonard H. van den Berg32, Morten H. Vatn45, Hein W. Verspaget46, Thomas D. Walters44, Cisca Wijmenga20, David C. Wilson3, Harm-Jan Westra20, Ramnik J. Xavier30, Zhen Zhen Zhao25, Cyriel Y. Ponsioen47, Vibeke Andersen48, Leif Törkvist5, Maria Gazouli49, Nicholas P. Anagnou49, Tom H. Karlsen45, Limas Kupčinskas50, Jurgita Sventoraityte50, John C. Mansfield51, Subra Kugathasan52, Mark S. Silverberg44, Jonas Halfvarson53, Jerome I. Rotter6, Christopher G. Mathew34, Anne M. Griffiths44, Richard B. Gearry12, Tariq Ahmad, Steven R. Brant13, Mathias Chamaillard54, Jack Satsangi3, Judy H. Cho35, Stefan Schreiber4, Mark J. 
Daly30, Jeffrey C. Barrett1, Miles Parkes7, Vito Annese9, Hakon Hakonarson55, Graham L. Radford-Smith25, Richard H. Duerr10, Severine Vermeire37, Rinse K. Weersma20, John D. Rioux2 
Wellcome Trust Sanger Institute1, Université de Montréal2, University of Edinburgh3, University of Kiel4, Karolinska Institutet5, Cedars-Sinai Medical Center6, University of Cambridge7, University of Pennsylvania8, Casa Sollievo della Sofferenza9, University of Pittsburgh10, Université libre de Bruxelles11, University of Otago12, Johns Hopkins University13, Ludwig Maximilian University of Munich14, Charité15, Lille University of Science and Technology16, Cincinnati Children's Hospital Medical Center17, Ghent University18, Torbay Hospital19, University of Groningen20, Mater Health Services21, University of Liège22, University of Washington23, University of Utah24, QIMR Berghofer Medical Research Institute25, University of Paris26, University of Western Australia27, Tel Aviv University28, University of Dundee29, Harvard University30, University of Manchester31, Utrecht University32, University of Florence33, King's College London34, Yale University35, Royal Hospital for Sick Children36, Katholieke Universiteit Leuven37, Guy's and St Thomas' NHS Foundation Trust38, University of Barcelona39, University of Chicago40, University of Bern41, Agency for Science, Technology and Research42, University of California, San Francisco43, University of Toronto44, University of Oslo45, Leiden University46, University of Amsterdam47, Aarhus University48, National and Kapodistrian University of Athens49, Lithuanian University of Health Sciences50, Newcastle University51, Emory University52, Örebro University53, French Institute of Health and Medical Research54, Center for Applied Genomics55
TL;DR: A meta-analysis of six ulcerative colitis genome-wide association study datasets found many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1.
Abstract: Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10⁻⁸), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.

1,291 citations

References
Journal ArticleDOI
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations


"PANTHER version 7: improved phyloge..." refers background in this paper

  • ...Gene function—or, more commonly, the function of gene products such as proteins—is described using terms from the Gene ontology (GO) (3,4), or from representations of molecular pathways....

Journal ArticleDOI
TL;DR: During 2004, tens of thousands of Knowledgebase records were manually annotated or updated, the UniProt keyword list was augmented with additional keywords, the keyword documentation was improved, and the annotation of post-translational modifications is being continuously overhauled and standardized.
Abstract: The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.

4,074 citations


"PANTHER version 7: improved phyloge..." refers background or methods in this paper

  • ...Arabidopsis thaliana (ARATH): TAIR (11), Dicot plant; Caenorhabditis elegans (CAEEL): WormBase (12), Nematode worm; Danio rerio (DANRE): Ensembl, ZFIN (13), Zebrafish; Dictyostelium discoideum (DICDI): DictyBase (14), Cellular slime mold; Drosophila melanogaster (DROME): FlyBase (15), Fruit fly; Escherichia coli (ECOLI): EcoCyc (16), Bacterium; Gallus gallus (CHICK): Entrez Gene (17), Chicken; Homo sapiens (HUMAN): SwissProt (18), Human; Mus musculus (MOUSE): MGI (19), Mouse; Rattus norvegicus (RAT): RGD (20), Rat; Saccharomyces cerevisiae (YEAST): SGD (21), Budding yeast; Schizosaccharomyces pombe (SCHPO): GeneDB (22), Fission yeast; Other chordate genomes: Ensembl (23); Other non-chordate genomes: Entrez Gene (17)...

  • ...In PANTHER 7, we now also support identifiers from Ensembl (23), model organism databases, the International Protein Index (IPI) (25) and UniProt (18)....

  • ...All of these identifiers are obtained through the mapping files provided by UniProt (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/)....
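
As a rough illustration of how such mapping files can be consumed, the sketch below builds a lookup from one external identifier type to UniProtKB accessions. It assumes the three-column, tab-separated layout (accession, identifier type, external identifier) used by UniProt's `idmapping.dat`; the function name `build_id_map`, the sample rows and the `GeneID` type label are illustrative assumptions, not part of PANTHER's own pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_id_map(lines: Iterable[str], id_type: str) -> Dict[str, Set[str]]:
    """Map external identifiers of one type to UniProtKB accessions.

    Assumes each row is: accession <TAB> identifier type <TAB> external ID.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        accession, row_type, external_id = parts
        if row_type == id_type:
            # One external ID may map to several accessions, so keep a set.
            mapping[external_id].add(accession)
    return dict(mapping)


# Illustrative rows only; a real run would iterate over the downloaded file.
sample = [
    "P04637\tGeneID\t7157\n",
    "P04637\tEnsembl\tENSG00000141510\n",
    "P38398\tGeneID\t672\n",
]
gene_to_uniprot = build_id_map(sample, "GeneID")
```

Keeping a set per identifier avoids silently discarding one-to-many mappings, which are common between gene-level and protein-level databases.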

Journal ArticleDOI
TL;DR: The initial version of the MAFFT program was developed in 2002 and was updated in 2007 with two new techniques: the PartTree algorithm and the Four-way consistency objective function, which improved the scalability of progressive alignment and the accuracy of ncRNA alignment.
Abstract: The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing amounts of sequence data are being generated by large-scale sequencing projects, scalability is now critical in many situations. The requirement of accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs); the secondary structure should be considered for constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems, in 2007, we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment and the latter improved the accuracy of ncRNA alignment. We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.

3,278 citations


"PANTHER version 7: improved phyloge..." refers methods in this paper

  • ...A multiple sequence alignment was constructed for each family using the MAFFT program (6) and a phylogenetic tree was estimated from the protein multiple alignment....

Journal ArticleDOI
TL;DR: The PANTHER/X ontology is used to give a high-level representation of gene function across the human and mouse genomes, and the family HMMs are used to rank missense single nucleotide polymorphisms (SNPs) according to their likelihood of affecting protein function.
Abstract: In the genomic era, one of the fundamental goals is to characterize the function of proteins on a large scale. We describe a method, PANTHER, for relating protein sequence relationships to function relationships in a robust and accurate way. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of "books," each representing a protein family as a multiple sequence alignment, a Hidden Markov Model (HMM), and a family tree. Functional divergence within the family is represented by dividing the tree into subtrees based on shared function, and by subtree HMMs. PANTHER/X is an abbreviated ontology for summarizing and navigating molecular functions and biological processes associated with the families and subfamilies. We apply PANTHER to three areas of active research. First, we report the size and sequence diversity of the families and subfamilies, characterizing the relationship between sequence divergence and functional divergence across a wide range of protein families. Second, we use the PANTHER/X ontology to give a high-level representation of gene function across the human and mouse genomes. Third, we use the family HMMs to rank missense single nucleotide polymorphisms (SNPs), on a database-wide scale, according to their likelihood of affecting protein function.

2,857 citations


"PANTHER version 7: improved phyloge..." refers background or methods in this paper

  • ...relevant sequences in the MAFFT alignment, trimmed it to include as match states only those columns aligned by 30% of the sequences in the subalignment [sequences were weighted using the same technique as in (1)], and used it to construct an initial model using the modelfromalign program in SAM3....

  • ...As a result, in PANTHER 7, all molecular function, biological process and cellular component terms are exclusively GO terms [previous versions of PANTHER used the PANTHER/X ontology (1), though a mapping file to GO was provided]....

  • ...PANTHER (Protein ANalysis THrough Evolutionary Relationships) is a database of phylogenetic trees of protein-coding gene families from all kingdoms of life (1)....
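
The column-trimming step quoted in the first excerpt above (keeping as match states only those columns aligned by 30% of the sequences) can be sketched as follows. This is a minimal illustration, not the SAM3 or PANTHER implementation: it reads "aligned by 30%" as "at least 30% of the (optionally weighted) sequences have a residue in the column", treats `-` and `.` as gaps, and uses hypothetical function names `match_columns` and `trim`.

```python
from typing import List, Optional, Sequence

GAP_CHARS = {"-", "."}  # assumed gap symbols


def match_columns(
    alignment: Sequence[str],
    min_fraction: float = 0.30,
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return indices of columns occupied by at least min_fraction of the sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if weights is None:
        weights = [1.0] * len(alignment)  # unweighted unless weights are supplied
    total = sum(weights)
    kept = []
    for col in range(length):
        # Sum the weights of sequences that have a residue (not a gap) here.
        occupied = sum(
            w for row, w in zip(alignment, weights) if row[col] not in GAP_CHARS
        )
        if occupied / total >= min_fraction:
            kept.append(col)
    return kept


def trim(alignment: Sequence[str], columns: Sequence[int]) -> List[str]:
    """Keep only the selected columns in every aligned sequence."""
    return ["".join(row[c] for c in columns) for row in alignment]


rows = ["MK-LV", "MKA-V", "M--LV", "-K--V"]
cols = match_columns(rows)
trimmed = trim(rows, cols)
```

In this toy alignment the third column holds a residue in only one of four sequences (25%), so it falls below the threshold and is dropped. Passing per-sequence `weights` mirrors the weighting mentioned in the excerpt, though the actual weighting scheme is not reproduced here.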

Journal ArticleDOI
TL;DR: Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.
Abstract: Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. Entrez Gene includes records from genomes that have been completely sequenced, that have an active research community to contribute gene-specific information or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is provided via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programing utilities (E-Utilities), and for bulk transfer by ftp.

2,158 citations


"PANTHER version 7: improved phyloge..." refers background in this paper

  • ...Arabidopsis thaliana (ARATH): TAIR (11), Dicot plant; Caenorhabditis elegans (CAEEL): WormBase (12), Nematode worm; Danio rerio (DANRE): Ensembl, ZFIN (13), Zebrafish; Dictyostelium discoideum (DICDI): DictyBase (14), Cellular slime mold; Drosophila melanogaster (DROME): FlyBase (15), Fruit fly; Escherichia coli (ECOLI): EcoCyc (16), Bacterium; Gallus gallus (CHICK): Entrez Gene (17), Chicken; Homo sapiens (HUMAN): SwissProt (18), Human; Mus musculus (MOUSE): MGI (19), Mouse; Rattus norvegicus (RAT): RGD (20), Rat; Saccharomyces cerevisiae (YEAST): SGD (21), Budding yeast; Schizosaccharomyces pombe (SCHPO): GeneDB (22), Fission yeast; Other chordate genomes: Ensembl (23); Other non-chordate genomes: Entrez Gene (17)...

  • ...Previously, for genes only identifiers from NCBI Entrez Gene (17) or FlyBase (15) were supported; for proteins only RefSeq (24) or FlyBase identifiers....
