
Showing papers by "Simon Jupp" published in 2011


Journal Article
TL;DR: Populous's contribution is in the knowledge gathering stage of ontology development; it separates knowledge gathering from the conceptualisation and axiomatisation, as well as separating the user from the standard ontology authoring environments.
Abstract: Background: Ontologies are being developed for the life sciences to standardise the way we describe and interpret the wealth of data currently being generated. As more ontology based applications begin to emerge, tools are required that enable domain experts to contribute their knowledge to the growing pool of ontologies. There are many barriers that prevent domain experts engaging in the ontology development process, and novel tools are needed to break down these barriers to engage a wider community of scientists.

58 citations


Journal Article
TL;DR: A Semantic Web approach is used to develop a knowledge base that integrates data from high-throughput experiments on kidney and urine; using SPARQL as the query mechanism gives KUP biologists the means to query across many resources and aggregate the knowledge needed to answer biological questions.
Abstract: Background: Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration.

48 citations


01 Jan 2011
TL;DR: A RapidMiner extension to Taverna combines the ability to gather and process data from many molecular biological sources with RapidMiner's data mining capabilities to provide a powerful tool for scientific analysis.
Abstract: Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands of databases and similar numbers of tools for processing those data. Any data analysis in molecular biology involves gathering and processing data from many sources, even before the analysis for the central biological question takes place. Taverna is a workflow workbench that allows bioinformaticians to create data pipelines involving distributed Web services and other forms of tool; these workflows gather and manage data in order to perform analyses that answer biological questions. RapidMiner brings a large suite of data processing, visualisation and data mining tools to bear upon tables of data, but there is a disconnect between these operators and the services available to users of Taverna. Through a RapidMiner extension to Taverna we have combined the ability to gather and process data from many molecular biological sources with RapidMiner's data mining capabilities to provide a powerful tool for scientific analysis. In this article we describe this RapidMiner extension to Taverna and some preliminary analyses we have performed using RapidMiner on biological data.

5 citations


01 Jan 2011
TL;DR: Populous presents authors with a table-based form where columns are tied to take values from particular ontologies; the user can select a concept from an ontology via its meaningful label to give a value for a given entity.
Abstract: We present Populous, an open source application for gathering content for an ontology and populating that ontology en masse. Populous presents authors with a table-based form where columns are tied to take values from particular ontologies; the user can select a concept from an ontology via its meaningful label to give a value for a given entity. Populated tables are fed into templates that can then be used to generate the ontology’s axioms. Populous separates knowledge gathering from the conceptualisation; it also removes users from the usual ontology authoring tools. Availability: Download, source and video via http://www.e-lico.eu/populous.

Ontology building environments such as Protege and OBOEdit offer facilities for the manual authoring of axioms. Such tools are vital for capturing an ontology’s form. Many ontologies are, however, large with considerable portions formed of repetitions of the same pattern of axioms, varying only in the fillers within that pattern. To avoid the tedium and potential errors of doing this manually, templates can be filled and the axioms for the pattern generated, avoiding the manual authoring of many axioms. Populous [1] does this by presenting a familiar form-filling table-based user interface for any ontology authors to populate ontology patterns or templates. Rows are tied to the entities being described; columns are tied to properties and the cells constrained to take values from particular ontologies or fragments of an ontology. As an author fills out the template, he or she is guided to place appropriate values within the template. The content of this table can then be transformed into the axioms of the target ontology with an OWL scripting language.

Populous is an extension of RightField (http://www.rightfield.org.uk), which is used for creating Excel documents that contain ontology based restrictions on a spreadsheet’s content. RightField is primarily designed for generating spreadsheet templates for data annotation; Populous extends RightField to support knowledge gathering and ontology generation. Populous and RightField are both open source, cross platform Java applications released under the BSD licence. They use Apache POI (http://poi.apache.org) for interacting with Microsoft documents and manipulating Excel spreadsheets.

Both OWL and OBO ontologies can be uploaded into Populous. Users can also browse and load ontologies directly from BioPortal. Once the ontologies are loaded they are classified by a reasoner and the basic class hierarchy can be inspected. Terms can be selected from the ontology to create validation sets for values that are permitted for a particular selection of cells in the table. Labels from an ontology’s entities can be used within a cell, not just URI or URI fragments. Populous allows the addition of free text, even if the cell has an associated validation range; these values are highlighted in red and can act as placeholders for new or suggested terms when no suitable candidate can be found in the validation set.

Populous supports the use of Ontology Pre-Processor Language (OPPL) patterns in order to generate new OWL axioms from the populated template. OPPL is an extension of Manchester OWL Syntax to select, add and remove axioms and it has an interpreter for scripts that manipulate the ontology. Variables from the OPPL pattern are mapped to columns from the table using the column name through the Populous pattern Wizard. We have used Populous with biologists to populate large portions of a kidney and urinary pathway ontology [2].

Populous is another piece in the ‘jigsaw’ of tools that support the ontology authoring process. It starts to fill the gap between the term request system and the manual axiom authoring systems by providing a mechanism for ‘filling out’ templates in such a way that they can be validated against the ontologies with which the ontology is being composed. We see Populous as a means for engaging domain experts who are not ontology experts in the authoring process and any ontology author to more effectively populate their ontology’s content.

Acknowledgements: We acknowledge Mikel Egana Aranguren for his advice, requirements and testing of Populous. This work was funded by the e-LICO project (EU/FP7/ICT-2007.4.4) and by SysMO-DB BBSRC grant BBG0102181.
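
The table-to-axiom step described in this abstract can be illustrated with a minimal Python sketch. It is not Populous code and does not use OPPL: the validation set, the column names (cell, part_of) and the axiom pattern are hypothetical, and a plain string template stands in for an OPPL pattern. It only shows the two ideas in the text: cell values are checked against a validation set (out-of-set values are flagged rather than rejected), and each row fills the pattern by matching column names to pattern variables.

```python
from string import Template

# Hypothetical validation set: labels permitted in the "part_of" column.
ALLOWED_ANATOMY = {"kidney", "nephron", "glomerulus"}

# Hypothetical axiom pattern; a stand-in for an OPPL pattern, not OPPL syntax.
PATTERN = Template("Class: $cell\n    SubClassOf: part_of some $part_of")


def validate(rows, column, allowed):
    """Return (row_index, value) pairs whose value is outside the validation set."""
    return [(i, row[column]) for i, row in enumerate(rows) if row[column] not in allowed]


def generate_axioms(rows, pattern):
    """Fill the pattern once per row, mapping column names to pattern variables."""
    return [pattern.substitute(row) for row in rows]


# Hypothetical populated table: one dict per row, keyed by column name.
rows = [
    {"cell": "podocyte", "part_of": "glomerulus"},
    {"cell": "mesangial_cell", "part_of": "renal_corpuscle"},  # outside the set
]

flagged = validate(rows, "part_of", ALLOWED_ANATOMY)  # candidate new terms
axioms = generate_axioms(rows, PATTERN)
for axiom in axioms:
    print(axiom)
```

Flagged values are kept in the output on purpose, mirroring the description of free-text entries acting as placeholders for suggested terms.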

4 citations


Proceedings Article
01 Dec 2011
TL;DR: This article describes challenges with respect to a project that is using data mining techniques to analyse data from the kidney and urinary pathway (KUP) domain and using Semantic Web technologies to manage the complexity and change in the data.
Abstract: Data in biomedicine are characterised by their complexity, volatility and heterogeneity. It is these characteristics, rather than size of the data, that make managing these data an issue for their analysis. Any significant data analysis task requires gathering data from many places, organising the relationships between the data's entities and overcoming the issues of recognising the nature of each entity such that this organisation can take place. It is the inter-relationship of these data and the semantic confusion inherent in the data that make the data complex. On top of this we have volatility in the domain's data, knowledge and experimental techniques that make the processing of data from the domain a distinct challenge, even before those data are organised. In this article we describe these challenges with respect to a project that is using data mining techniques to analyse data from the kidney and urinary pathway (KUP) domain. We are using Semantic Web technologies to manage the complexity and change in our data and we report on our experiences in this project.

3 citations


01 Jan 2011
TL;DR: The iKUP browser supports renal biologists in finding data integrated from many disparate data sources, providing a simple interface to survey a large set of the KUP domain's 'omics experiments simultaneously.
Abstract: The iKUP browser, a web-based interface to the kidney and urinary pathway knowledge base (KUPKB) [4], shows ontologies coming of age by enabling biologists to ask questions of integrated data that form hypotheses that are being tested in the laboratory. That is, semantically annotated data have been delivered to the target users in a form that they can use to change how they undertake their job. iKUP uses a browsing approach to query the KUP data as its users are usually not bioinformaticians who will design and use sophisticated scripts or workflows, and they are almost certainly not users familiar with semantic web technologies. The KUPKB contains data from high-throughput experiments on the kidney and the urinary system. The experimental data is richly interconnected to other biological data to form a single integrated repository for querying and exploration. The KUPKB uses multiple biomedical ontologies that act as a controlled vocabulary for standardised annotation of the datasets. These ontologies' semantics are used to ask queries that return intelligent answers that form part of the biologists' hypothesis generation process. By reducing the data to common representation languages like the Web Ontology Language (OWL) and the Resource Description Framework (RDF), we have shown semantic web technologies offering novel opportunities for data analysis. The iKUP browser supports renal biologists in finding data integrated from many disparate data sources, providing a simple interface to survey a large set of the KUP domain's 'omics experiments simultaneously. At present, biologists must gather and integrate many data sets by hand and this integration is vital as genes, proteins, and small molecules have to be coordinated across many 'omic levels through investigations reported by many people. By hand, this kind of integration and querying is long, tedious and error prone.
Users can use the iKUP browser to search for a molecule (mRNA, miRNA, Protein) or list of molecules. The query box exploits the label and synonym tags for molecules to guide the user with their search via a dynamic suggestion box that pops up as users type. Once users confirm the molecules they are looking for are present in the KUPKB, the application performs a SPARQL query based on these search terms to generate the results. The results show known information about each molecule from the range of datasets in the KUPKB. For each result, the user can see where anatomically the molecule is active and under what conditions. …
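
As a rough illustration of turning confirmed search terms into a SPARQL query, the following Python sketch builds a query string from a list of molecule labels. It is an assumption-laden example, not the iKUP implementation: the ex: namespace, the predicates ex:aboutMolecule, ex:observedIn and ex:underCondition, and the sample labels are all hypothetical, and only exact matches on plain rdfs:label literals are covered.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(labels):
    """Build a SELECT query that matches molecules by exact label via VALUES."""
    values = " ".join(f'"{escape_literal(label)}"' for label in labels)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX ex: <http://example.org/kup#>\n"  # hypothetical namespace
        "SELECT ?molecule ?label ?anatomy ?condition WHERE {\n"
        f"  VALUES ?label {{ {values} }}\n"
        "  ?molecule rdfs:label ?label .\n"
        # Hypothetical predicates linking an observation to its context.
        "  ?observation ex:aboutMolecule ?molecule ;\n"
        "               ex:observedIn ?anatomy ;\n"
        "               ex:underCondition ?condition .\n"
        "}"
    )


# Hypothetical search terms confirmed by the user.
query = build_query(["Umod", "Nphs1"])
print(query)
```

Passing the whole list through a single VALUES block keeps a multi-molecule search to one query; matching synonyms as well would need an extra pattern that the sketch leaves out.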

2 citations


01 Jul 2011
TL;DR: A set of OWL classes is created for mouse GOA genes: each gene is represented as a class with the appropriate relationships to the GO aspects with which it has been annotated, giving a fine partitioning of the proteins in the ontology.
Abstract: Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other activities. Tools, such as AmiGO, allow exploration of genes based on their GO annotations. This human driven exploration and querying of GO is obviously useful, but by taking advantage of the ontological representation we can use these annotations to create a rich polyhierarchy of proteins for enhanced querying. This also opens up possibilities for exploring GOA for redundancies and defects in annotations. To do this we have created a set of OWL classes for mouse GOA genes. Each gene is represented as a class, with the appropriate relationships to the GO aspects with which it has been annotated. We then use defined classes to query these protein classes and to build a complex hierarchy. This standard use of OWL affords a rich interaction with GO annotations to give a fine partitioning of the proteins in the ontology.
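
The idea of querying gene classes with defined classes can be approximated in a few lines of Python. This sketch is a simplification and not the authors' OWL model: an OWL reasoner would also use GO's own term hierarchy, which is ignored here, and the gene names, annotations and defined classes are hypothetical. A defined class is modelled as a set of required annotations; a gene falls under it if it carries all of them, and one defined class is placed under another when its requirements strictly include the other's.

```python
# Hypothetical gene classes: each gene mapped to the GO terms annotating it.
GENE_ANNOTATIONS = {
    "GeneA": {"kinase activity", "nucleus"},
    "GeneB": {"kinase activity", "membrane"},
    "GeneC": {"nucleus"},
}

# Hypothetical defined classes: membership requires every listed annotation.
DEFINED_CLASSES = {
    "KinaseGene": {"kinase activity"},
    "NuclearGene": {"nucleus"},
    "NuclearKinaseGene": {"kinase activity", "nucleus"},
}


def classify(annotations, definitions):
    """Map each defined class to the sorted genes that satisfy its requirements."""
    return {
        name: sorted(g for g, terms in annotations.items() if required <= terms)
        for name, required in definitions.items()
    }


def subclass_pairs(definitions):
    """Return (subclass, superclass) pairs: stricter requirements sit lower."""
    return sorted(
        (sub, sup)
        for sub, sub_req in definitions.items()
        for sup, sup_req in definitions.items()
        if sub != sup and sup_req < sub_req
    )


members = classify(GENE_ANNOTATIONS, DEFINED_CLASSES)
hierarchy = subclass_pairs(DEFINED_CLASSES)
print(members)
print(hierarchy)
```

Here GeneA falls under three defined classes at once, which is the kind of multiple placement a polyhierarchy allows.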

2 citations