Showing papers by "Christopher J. Mungall" published in 2013


Journal Article
Judith A. Blake, Mary E. Dolan, H. Drabkin, David P. Hill, Li N, D. Sitnikov, Susan M. Bridges, Shane C. Burgess, Teresia Buza, Fiona M. McCarthy, Divyaswetha Peddinti, Lakshmi Pillai, Seth Carbon, Heiko Dietze, Amelia Ireland, Suzanna E. Lewis, Christopher J. Mungall, Pascale Gaudet, Chrisholm Rl, Petra Fey, Warren A. Kibbe, S. Basu, Deborah A. Siegele, B. K. McIntosh, Daniel P. Renfro, Adrienne E. Zweifel, James C. Hu, Nicholas H. Brown, Susan Tweedie, Yasmin Alam-Faruque, Rolf Apweiler, A. Auchinchloss, Kristian B. Axelsen, Benoit Bely, M. C. Blatter, Bonilla C, Bouguerleret L, Emmanuel Boutet, Lionel Breuza, Alan Bridge, W. M. Chan, Gayatri Chavali, Elisabeth Coudert, E. Dimmer, Anne Estreicher, L Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Hieta R, Hinz C, Chantal Hulo, Rachael P. Huntley, J. James, Florence Jungo, Guillaume Keller, Kati Laiho, Duncan Legge, P. Lemercier, Damien Lieberherr, Michele Magrane, Maria Jesus Martin, Patrick Masson, Mutowo-Muellenet P, Claire O'Donovan, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Porras Millán P, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Tony Sawford, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Ioannis Xenarios, Foulgar R, Jane Lomax, Paola Roncaglia, Varsha K. Khodiyar, Ruth C. Lovering, Philippa J. Talmud, Marcus C. Chibucos, Giglio Mg, Hsin-Yu Chang, Sarah Hunter, Craig McAnulla, Alex L. Mitchell, Sangrador A, Stephan R, Midori A. Harris, Stephen G. Oliver, Kim Rutherford, Wood, Jürg Bähler, Antonia Lock, Paul J. Kersey, McDowall Dm, Daniel M. Staines, Melinda R. Dwinell, Mary Shimoyama, Stan Laulederkind, Tom Hayman, Shur-Jen Wang, Timothy F. Lowry, P D'Eustachio, Lisa Matthews, Rama Balakrishnan, Gail Binkley, J. M. Cherry, Maria C. Costanzo, Selina S. Dwight, Engel, Dianna G. Fisk, Benjamin C. Hitz, Eurie L. Hong, Kalpana Karra, Miyasato, Robert S. Nash, Julie Park, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, Tanya Z. Berardini, Eva Huala, Huaiyu Mi, Paul Thomas, Juancarlos Chan, Ranjana Kishore, Paul W. Sternberg, Van Auken K, Doug Howe, Monte Westerfield
TL;DR: The Gene Ontology (GO) Consortium is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies and has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology.
Abstract: The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

492 citations


Journal Article
TL;DR: This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach.
Abstract: As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; http://www.environmentontology.org) is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from http://purl.obolibrary.org/obo/envo.owl - an OBO format version is also available by switching the file suffix to “obo”.
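
The suffix switch mentioned in the abstract can be illustrated with a minimal sketch. The helper name `obo_variant` is hypothetical and is not part of any ENVO or OBO tooling; the code only rewrites the URL string and does not check that the resulting file is actually served.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def obo_variant(owl_url: str) -> str:
    """Return the same URL with its file suffix switched from .owl to .obo."""
    parts = urlsplit(owl_url)
    path = PurePosixPath(parts.path)
    if path.suffix != ".owl":
        raise ValueError(f"expected a URL ending in .owl, got {owl_url!r}")
    # Swap only the final suffix; scheme, host and directories stay unchanged.
    new_path = str(path.with_suffix(".obo"))
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    print(obo_variant("http://purl.obolibrary.org/obo/envo.owl"))
```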

274 citations


Journal Article
TL;DR: This paper focuses on the plant anatomical entity branch of the Plant Ontology, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals.
Abstract: The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.

161 citations


Journal Article
01 Jan 2013-Database
TL;DR: Phenotype comparisons for DIsease Genes and Models (PhenoDigm) is proposed as an automated method to provide evidence about gene–disease associations by analysing phenotype information; an automated evaluation and selected manually assessed examples support the validity of PhenoDigm.
Abstract: The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data become available, only systematic analysis will facilitate valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene-disease associations. Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene-disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm's web interface and illustrate its usefulness in supporting research. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm

129 citations


Journal Article
TL;DR: A cross-species phenotype ontology for human, mouse and zebrafish is generated that contains classes from the Human Phenotype Ontology and the Mammalian Phenotype Ontology, together with generated classes for zebrafish phenotypes.
Abstract: Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/ .

78 citations


Journal Article
TL;DR: A collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI is described.
Abstract: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl .

61 citations


Journal Article
TL;DR: An ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes is presented, and a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region is observed, a phenomenon that is termed pheno-clustering.
Abstract: Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. 
Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.

51 citations


Journal Article
TL;DR: An overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information are provided.
Abstract: Background: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.

48 citations


Journal Article
TL;DR: An ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types and shows the utility of incorporating structured ontological knowledge into biological data analysis.
Abstract: New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity, from broad classifications to very granular ones. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and have identified genes uniquely expressed in these B cells compared to other B cell types. This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis - providing a new method for defining novel biomarkers and providing an opportunity for new biological insights.

13 citations


Journal Article
TL;DR: A proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
Abstract: Neurodegenerative diseases present a wide and complex range of biological and clinical features. Animal models are key to translational research, yet typically only exhibit a subset of disease features rather than being precise replicas of the disease. Consequently, connecting animal to human conditions using direct data-mining strategies has proven challenging, particularly for diseases of the nervous system, with its complicated anatomy and physiology. To address this challenge we have explored the use of ontologies to create formal descriptions of structural phenotypes across scales that are machine processable and amenable to logical inference. As proof of concept, we built a Neurodegenerative Disease Phenotype Ontology (NDPO) and an associated Phenotype Knowledge Base (PKB) using an entity-quality model that incorporates descriptions for both human disease phenotypes and those of animal models. Entities are drawn from community ontologies made available through the Neuroscience Information Framework (NIF) and qualities are drawn from the Phenotype and Trait Ontology (PATO). We generated ~1200 structured phenotype statements describing structural alterations at the subcellular, cellular and gross anatomical levels observed in 11 human neurodegenerative conditions and associated animal models. PhenoSim, an open source tool for comparing phenotypes, was used to issue a series of competency questions to compare individual phenotypes among organisms and to determine which animal models recapitulate phenotypic aspects of the human disease in aggregate. Overall, the system was able to use relationships within the ontology to bridge phenotypes across scales, returning non-trivial matches based on common subsumers that were meaningful to a neuroscientist with an advanced knowledge of neuroanatomy. The system can be used both to compare individual phenotypes and also phenotypes in aggregate. 
This proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
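
The idea of matching phenotypes through common subsumers can be illustrated with a toy sketch. This is not PhenoSim's algorithm and does not use NDPO: the hierarchy and every term name below are invented for illustration, and the code only shows how two different phenotype terms can be matched through the most specific ancestor they share.

```python
# Toy hierarchy mapping each term to its direct parents.
# All term names are hypothetical and do not come from NDPO, NIF or PATO.
PARENTS = {
    "purkinje_cell_loss": ["cerebellar_neuron_loss"],
    "granule_cell_loss": ["cerebellar_neuron_loss"],
    "cerebellar_neuron_loss": ["neuron_loss"],
    "cortical_neuron_loss": ["neuron_loss"],
    "neuron_loss": ["structural_phenotype"],
    "structural_phenotype": [],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def common_subsumers(a, b):
    """Terms that subsume both a and b (reflexive closure included)."""
    return ancestors(a) & ancestors(b)


def most_specific_common_subsumers(a, b):
    """Common subsumers with no other common subsumer beneath them."""
    shared = common_subsumers(a, b)
    return {
        t for t in shared
        if not any(t in ancestors(other) - {other} for other in shared)
    }


if __name__ == "__main__":
    print(most_specific_common_subsumers("purkinje_cell_loss", "granule_cell_loss"))
    print(most_specific_common_subsumers("purkinje_cell_loss", "cortical_neuron_loss"))
```

In this toy hierarchy the two cerebellar phenotypes meet at a more specific term than a cerebellar and a cortical phenotype do, which is the kind of graded match a subsumer-based comparison relies on.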

9 citations


01 Jan 2013
TL;DR: The genotype model developed is based on decomposing the different types of information represented in a genotype, is interoperable with existing OBO Foundry ontologies, and utilizes modeling from orthogonal ontologies to describe a broad range of attributes of these sequences.
Abstract: Exploration of the mechanistic basis of biology and disease has long leveraged the concept of a genotype, which represents the genetic composition associated with a physical trait. Translational research efforts rely increasingly on the ability to integrate genotype-phenotype data across systems and organism communities, but are hindered by the lack of a shared, computable model of the information coded into genotype representations. Here, we present the efforts of the Monarch Initiative to build GENO, an ontological model of genotype information. The Monarch Initiative is a collaborative effort to integrate data from diverse resources to leverage model systems for disease research based on their phenotypes. The genotype model we have developed is based on decomposing the different types of information represented in a genotype, is interoperable with existing OBO Foundry ontologies, and utilizes modeling from orthogonal ontologies to describe a broad range of attributes of these sequences. We describe the features and utility of such an approach toward the integration of diverse genotype data with a broad spectrum of related biomedical data.

01 Jan 2013
TL;DR: The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology; OPPL-Galaxy, a wrapper for using OPPL within Galaxy, can be used for enriching, modifying and querying biomedical ontologies.
Abstract: Background: Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results: We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions: Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses.

Journal Article
TL;DR: Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts, paving the way towards advanced biological data analyses.
Abstract: Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses.