
Showing papers presented at "Semantic Web Applications and Tools for Life Sciences in 2011"


Proceedings ArticleDOI
07 Dec 2011
TL;DR: A much improved version of LogMap, a highly scalable ontology matching system with 'built-in' reasoning and diagnosis capabilities, which also provides the necessary infrastructure for domain experts to interactively contribute to the matching process.
Abstract: In this paper we present a much improved version of LogMap, a highly scalable ontology matching system with 'built-in' reasoning and diagnosis capabilities. LogMap 2.0 is not only more scalable and robust than its predecessor, but it also provides the necessary infrastructure for domain experts to interactively contribute to the matching process.

28 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: A demo of the capabilities of DOG4DAG, the Dresden Ontology Generator for Directed Acyclic Graphs, which is available as a plugin for both OBO-Edit and Protégé, together with a summary of the strengths and limits of the different steps of the generation process.
Abstract: In the biomedical domain, Protégé and OBO-Edit are the main ontology editors supporting the manual construction of ontologies. Since manual creation is a laborious and hence costly process, there have been efforts to automate parts of this process. Here, we give a demo of the capabilities of DOG4DAG, the Dresden Ontology Generator for Directed Acyclic Graphs, which is available as a plugin for both OBO-Edit and Protégé. In the demo, we describe how to generate terms and in particular siblings, definitions, and is-a relationships using an example in the domain of nervous system diseases. We summarise the strengths and limits of the different steps of the generation process.

17 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: It is argued that traditional journal publication is no longer sufficient, and a methodology based on the workflow paradigm, Semantic Web models, and Digital Library infrastructure is proposed, to enable the preservation of the necessary and sufficient information for researchers to understand the steps of a computational experiment that led to new biological insight, at any point in the future.
Abstract: One of the main challenges for biomedical research lies in the integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms, for instance to explain the onset and progression of human diseases. Computer-assisted methodology is needed to perform these studies, posing new challenges for upholding scientific quality standards for the reproducibility of science. This pertains to the preservation of the 'materials and methods' of computational experiments as a record of the evidence for the biological interpretation of their results. We argue that traditional journal publication is no longer sufficient, and propose a methodology based on the workflow paradigm, Semantic Web models, and Digital Library infrastructure. Our primary goal is to enable the preservation of the necessary and sufficient information for researchers to understand the steps of a computational experiment that led to new biological insight, at any point in the future. Central to our approach is the development of a 'Research Object' (RO) model that captures this information for preservation, publication and acknowledgement. We adopted a combination of a Semantic Web and Digital Library approach for the representation and publication of such a model. The RO model can be viewed as an artifact that aggregates and annotates a number of resources that are used and/or produced in a given scientific investigation. The figure below (Figure 1) illustrates a high level description of the elements that are needed to specify a research object. A resource can be a workflow, web service, document, data item, data set, workflow run, software or a research object. Instead of building a new model, we use the Object Reuse and Exchange (OAI-ORE) for specifying aggregation of resources, and Annotation Ontology (AO) for their annotations. ORE defines standards for the description and exchange of aggregations of Web resources. 
For example, a Research Object can be defined as an ore:Aggregation, and an ore:ResourceMap can be used to describe the research object and its constituent resources (ro:ResearchObject a owl:Class; rdfs:subClassOf ore:Aggregation). Annotations in a RO are specified using the Annotation Ontology, which provides a common model for document metadata, typically for annotating electronic documents or parts of electronic documents. Together with domain-specific vocabularies that extend the generic RO model we can specifically annotate the roles of the individual resources. We aim to develop tooling that facilitates annotation at each step of the research cycle, harvesting metadata from users in small steps. We present an example of an instantiated prototype RO in the context of a study of Metabolic Syndrome, for which we perform computational experiments that help interpret Genome Wide Association Data by using a special text mining method [1]. While we conceived the experiment, designed and performed it, we populated the prototype RO model and annotated the entities to describe their role and their interrelationships. For instance, we defined that 'a particular ranked list of candidate biological processes was produced by a particular workflow run', for which we assert that 'this particular workflow run is a run instance of a specific GWAS Interpretation Workflow', and 'a specific Text Mining Web Service is used in this particular GWAS Interpretation Workflow', while 'a particular GWAS data set is input to the workflow run'. We also defined that the 'RO is created by Kristina Hettne', 'created at a particular time and date', 'motivated by a particular hypothesis', and 'the result is interpreted through a particular change in the hypothesis'. ROs can also refer to previous work of which the output was used in the experiment. 
ROs may be related to each other and to other resources, which can create a graph of scientific progress. The results presented here are outcomes of the EU FP7 project 'Wf4Ever', which aims to provide tools and recommendations for digitally preserving computational experiments.
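The triple pattern quoted in the abstract (a research object as a specialised ORE aggregation) can be expanded into a small Turtle sketch. The ro: and ex: namespaces and resource names below are hypothetical, chosen only for illustration; only the ore: and owl:/rdfs: terms are standard vocabulary:

```turtle
@prefix ore:  <http://www.openarchives.org/ore/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ro:   <http://example.org/ro#> .     # hypothetical RO namespace
@prefix ex:   <http://example.org/data#> .   # hypothetical resources

# The RO class specialises an ORE aggregation, as in the abstract.
ro:ResearchObject a owl:Class ;
    rdfs:subClassOf ore:Aggregation .

# A concrete research object aggregating a workflow, a run of it, and a data set.
ex:myRO a ro:ResearchObject ;
    ore:aggregates ex:gwasWorkflow , ex:workflowRun1 , ex:gwasDataSet .

# An ORE resource map describes the aggregation and its constituents.
ex:myRO-map a ore:ResourceMap ;
    ore:describes ex:myRO .
```

Annotations on the aggregated resources (creator, hypothesis, run provenance) would then be layered on top of this structure using the Annotation Ontology and domain-specific vocabularies, as the abstract describes.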

11 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: This paper is exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data and report preliminary progress on prototyping a semantic queried infrastructure for the surveillance of, and research on hospital-acquired infections.
Abstract: Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance and rational health care management. Ad hoc querying of clinical data is one desirable type of functionality. Since most of the data is currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. Existing approaches to semantic querying of relational data, based on declarative semantic mappings from data schemas to ontologies, such as RDFizing and query rewriting, cannot cope with situations when some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning. In this paper, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data and report preliminary progress on prototyping a semantic querying infrastructure for the surveillance of, and research on hospital-acquired infections.

11 citations


Proceedings Article
07 Dec 2011
TL;DR: SWAT4LS 2011 brings together practitioners, technical experts, developers, users, and researchers in an exciting venue to exchange new ideas, practical developments and experiences on issues pertinent to the application of semantic technologies.
Abstract: The 2011 International Workshop on Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2011, http://www.swat4ls.org/workshops/london2011/) is a workshop to meet and exchange ideas from all fields of semantic technologies applied to the life sciences. The aim of SWAT4LS is both to present new and interesting research results and to show successful, deployed semantic applications in biomedical informatics and computational biology. After a series of three successful international SWAT4LS workshops in Edinburgh, Amsterdam, and Berlin, with about 100 participants each time, SWAT4LS 2011 is held in London, December 7th-9th, 2011. This makes clear that there is a successful path from high-quality research results to deployed applications. The workshop brings together practitioners, technical experts, developers, users, and researchers to exchange new ideas, practical developments and experiences on issues pertinent to the application of semantic technologies. The technical program shows a carefully selected presentation of current research and developments in 6 full papers, 5 short papers, 4 highlight posters and demo papers, and 17 posters. These were complemented by 3 invited keynote talks.

10 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: This paper proposes a semi-automatic ontology debugging approach, which supports domain experts in debugging the is-a structure in taxonomies, and develops algorithms to detect and repair wrong and missing is-a relations.
Abstract: With the proliferation of ontologies and their use in semantically-enabled applications, the issue of finding and repairing defects in ontologies has become increasingly important. In this paper we address the problem of defects in the is-a structure of taxonomies, the currently most frequently used kind of ontologies. We deal with both missing is-a relations as well as existing wrong is-a relations. The context of our study is a taxonomy network consisting of taxonomies networked by correct mappings. We propose a semi-automatic ontology debugging approach, which supports domain experts in debugging the is-a structure in taxonomies. We develop algorithms to detect and repair wrong and missing is-a relations. Further, we discuss an implemented system, RepOSE, and an experiment on real-world ontologies.

9 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: This paper presents a scalable method for the extraction of biomedical relations from text that enables seamless integration of the extracted relations with the available biomedical resources through the process of semantic annotation.
Abstract: The increasing amount of biomedical scientific literature published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on recognizing well-defined entities such as genes or proteins, which constitutes the basis for extracting the relations between the recognized entities. Most of the work has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of biomedical relations from text. The method is not geared to any specific sub-domain (e.g. protein-protein interactions, drug-drug interactions, etc.) and does not require any manual input or deep processing. Even better, the method uses the extracted relations to infer a set of abstract semantic relations and their signature types, which constitutes a valuable source of knowledge when constructing formal knowledge bases. We enable seamless integration of the extracted relations with the available biomedical resources through the process of semantic annotation. The proposed approach has successfully been applied to the CALBC corpus (i.e. almost a million text documents) and UMLS has been used as knowledge resource for semantic annotation.

7 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: This tool is designed not only to aid bioinformaticians when designing SPARQL queries to access biological databases exposed as Linked Data, but also to help biologists gain a deeper insight into the potential use of this technology.
Abstract: Life Sciences have emerged as a key domain in the Linked Data community because of the diversity of data semantics and formats available by means of a great variety of databases and web technologies. Thus, it has been used as the perfect domain for applications in the Web of Data. Unfortunately, on the one hand, bioinformaticians are not exploiting the full potential of this already available technology and, on the other hand, experts in the Life Sciences have real difficulty discovering, understanding and devising how to take advantage of these interlinked (integrated) data. In this paper, we present Bioqueries, a wiki-based portal that is aimed at community building around Biological Linked Data. This public space offers several services and a collaborative infrastructure with the objective of stimulating activity in the consumption of Biological Linked Data and therefore contributing to the deployment of the benefits of the Web of Data in this domain. This tool is designed not only to aid bioinformaticians when designing SPARQL queries to access biological databases exposed as Linked Data, but also to help biologists gain a deeper insight into the potential use of this technology. The queries published in the portal are also described and commented on in natural language, to enable their use by experts in the domain with less expertise in semantic technologies. The Bioqueries portal is accessible at http://bioqueries.uma.es
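A query of the kind such a portal might publish can be sketched as follows. This is an illustration, not a query taken from the paper; it assumes the UniProt core vocabulary, which is one commonly cited example of biological Linked Data:

```sparql
PREFIX up:    <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

# Illustrative query: list ten human proteins and their mnemonics.
# In a wiki-based portal, this would be accompanied by a natural-language
# description for domain experts unfamiliar with SPARQL.
SELECT ?protein ?mnemonic
WHERE {
  ?protein a up:Protein ;
           up:organism taxon:9606 ;   # Homo sapiens
           up:mnemonic ?mnemonic .
}
LIMIT 10
```

Pairing each published query with a prose explanation of what it returns is what lets biologists reuse it without writing SPARQL themselves.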

7 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: An ontology-based smart SPARQL query builder named Bio-SPARQL is an implementation of the needed logic that generates structurally-optimised queries over an ontologically classified RDF/OWL based bio-medical LOD by logically analysing their semantic graph structure.
Abstract: Building an efficient SPARQL query over the great variety of copious biomedical Linked Open Data (LOD) requires users to understand the data schema, which makes it difficult for biologists to handle such data. To address this problem, we set out to realise a SPARQL query builder that generates a structurally-optimised query by logically analysing the target RDF/OWL data; however, the corresponding unifying logic over RDF/OWL data still needs to be implemented. An ontology-based smart SPARQL query builder named Bio-SPARQL is an implementation of the needed logic. Bio-SPARQL generates structurally-optimised queries over ontologically classified RDF/OWL-based biomedical LOD by logically analysing its semantic graph structure. Bio-SPARQL employs our database named BioLOD, which holds LOD data sets categorised in 744 classes with 7.88 million data items (instances), integrating various types of public omics databases through human curation, and provides a set of LOD data files for each class. To aid in writing a query, it provides a graphical user interface that suggests possible data path schemas and filters by analysing the corresponding ontological BioLOD data structure. The generated SPARQL query is designed to be executed in a user's local environment with the corresponding downloaded BioLOD data files, in order to control the influence on query results due to data updates.

6 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: A number of heterogeneous resources related to the definitions of diseases, including the linked open data from DBpedia, the textual definitions from the UMLS and the formal definitions of SNOMED CT are investigated and integrated in a Semantic Web framework.
Abstract: The beta phase of the 11th revision of the International Classification of Diseases (ICD-11) intends to accept public input through a distributed model of crowdsourcing. One of the core use cases is to create textual definitions for the ICD categories. The objective of the present study is to design, develop and evaluate approaches to support ICD-11 textual definition authoring using Semantic Web technology. We investigated a number of heterogeneous resources related to the definitions of diseases, including the linked open data (LOD) from DBpedia, the textual definitions from the UMLS and the formal definitions of SNOMED CT. We integrated them in a Semantic Web framework (i.e. linked data in an RDF triple store), which is being proposed as a backend in a prototype platform for collaborative authoring of ICD-11 beta. We performed a preliminary evaluation on the usefulness of our approaches and discussed the potential challenges from both technical and clinical perspectives.
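As an illustration of the kind of linked-open-data lookup the abstract describes (not the paper's actual pipeline), a candidate textual definition could be pulled from DBpedia with a query such as:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# Illustrative only: fetch the English abstract for a disease resource
# as one candidate source for an ICD-11 textual definition.
SELECT ?abstract
WHERE {
  <http://dbpedia.org/resource/Asthma> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
```

Results from queries like this would then sit alongside UMLS definitions and SNOMED CT formal definitions in the triple store, for authors to compare and curate.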

5 citations


Proceedings ArticleDOI
07 Dec 2011
TL;DR: COEUS follows a "Semantic Web in a box" approach, with a package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API, targeted at life sciences developers.
Abstract: As the "omics" revolution unfolds, the growth in data quantity and diversity is pushing forward the need for pioneering bioinformatics software capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging Semantic Web technologies that better suit the life sciences domain. The domain's complex innate relationships are easily mapped into Semantic Web graphs, enabling a superior understanding of collected knowledge. Despite the increased awareness of Semantic Web technologies in bioinformatics, their usage is still limited. With COEUS, we introduce a new Semantic Web framework aiming at a streamlined application development cycle. COEUS follows a "Semantic Web in a box" approach, with a package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. The platform, targeted at life sciences developers, provides a complete application skeleton ready for rapid application deployment, and is available free as open source at http://bioinformatics.ua.pt/coeus/.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: The Linked Clinical Data project at Mayo Clinic aims to develop a semantics-driven framework for high-throughput phenotype extraction, representation, integration, and querying from electronic medical records using emerging Semantic Web technologies, such as Linked Open Data.
Abstract: Systematic study of clinical phenotypes is important to better understand the genetic basis of human diseases and more effective gene-based disease management. The Linked Clinical Data (LCD) project at Mayo Clinic aims to develop a semantics-driven framework for high-throughput phenotype extraction, representation, integration, and querying from electronic medical records using emerging Semantic Web technologies, such as Linked Open Data. This poster abstract provides a brief background and overview of the recently initiated LCD project.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This research introduces CLI-mate, a framework to facilitate development of user-friendly interfaces for command line programs, and discusses the ontology model of a command line program.
Abstract: We introduce CLI-mate, a framework to facilitate development of user-friendly interfaces for command line programs. In the agile development environment of bioinformatics, many command line programs are created quickly to fill the gaps between complex information processes. A command line interface (CLI) is sometimes sufficient for the task, but it limits adoption by a broader audience. As such, it is often necessary for the developer to create a wrapper that provides a more user-friendly interface. Furthermore, the CLI itself might not meet minimal requirements, or is subject to change. Dealing with these changes, as well as wrapping the program itself, is the focus of this research. In this demo, we will demonstrate the functionality of CLI-mate and the generated interfaces in Galaxy and MOTEUR. We will also discuss the ontology model of a command line program.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: The vast amount of various life sciences data at RIKEN and other institutes including genome, transcriptome, proteome, metabolome, and phenome data are ontologically integrated into a common system to facilitate data retrieval, integration and collaboration.
Abstract: The vast amount of various life sciences data at RIKEN and other institutes, including genome, transcriptome, proteome, metabolome, and phenome data, is ontologically integrated into a common system. The challenge is to facilitate data retrieval, integration and collaboration. BioLOD.org - the Broadly Integrated Ontological Linked Open Data database (http://biolod.org) - provides over 6,800 downloadable OWL/RDF graph files of mutually linked public biological data organized as a semantic web using standardized formats of the World Wide Web Consortium Linking Open Data (W3C LOD) project. BioLOD.org mines numerous semantic links from original databases and re-classifies them into graph files based on ontology classifications. Relationships between the files are mutually and clearly referenced, so it is easy to find other files associated by semantic links included in detailed data instances. BioLOD.org intensively surveyed both forward and reverse semantic-link relationships from 36 databases for humans and mice, 33 databases for plants and 16 databases related to proteins. BioLOD summarizes this information as archive files available for download in various useful formats. The BioLOD.org database uniquely provides Linked Open Data annotated contextually with biological vocabulary, and supports visualization services to browse LOD data through SciNetS.org, repository services to deposit users' LOD through LinkData.org, and a SPARQL endpoint service for BioLOD data through BioSPARQL.org.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: A semantic-based approach and PythonRules are proposed to increase programming productivity in heterogeneous spaces through adaptable and scalable ontology-based development of Smart Space applications.
Abstract: In biomedical sciences new and increasingly complex technologies are constantly being introduced, and to facilitate new scientific discoveries they need to be able to adapt to current demands and provide seamless functionality. However, the interoperability of devices, tools and data is not always simple, and is often a bottleneck in bioimaging. Tools that ease the integration of diverse technologies could amplify functionality and performance considerably. BioImageXD software is presented as a use case. We propose a semantic-based approach and PythonRules to increase programming productivity in heterogeneous spaces through adaptable and scalable ontology-based development of Smart Space applications.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This work presents a methodology to integrate the results and experimental context of three different representations of microarray-based transcriptomic experiments: the Gene Expression Atlas, the W3C BioRDF task force approach to reporting provenance of micro array experiments, and the HSCI blood genomics project.
Abstract: Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-throughput experiment results in formats that can be interpreted by both researchers and analytical tools. Increasing adoption of Semantic Web and Linked Data technologies for the integration of heterogeneous and distributed Health Care and Life Sciences (HCLS) datasets has made the reuse of standards even more pressing; dynamic semantic query federation can be used for integrative bioinformatics when ontologies and identifiers are reused across data instances. We present here a methodology to integrate the results and experimental context of three different representations of microarray-based transcriptomic experiments: the Gene Expression Atlas, the W3C BioRDF task force approach to reporting provenance of microarray experiments, and the HSCI blood genomics project. Our approach does not attempt to improve the expressivity of existing standards for genomics but, instead, to enable integration of existing datasets published from microarray-based transcriptomic experiments. SPARQL Construct is used to create a posteriori mappings of concepts and properties and linking rules that match entities based on query constraints. We discuss how our integrative approach can encourage reuse of the Experimental Factor Ontology (EFO) and the Ontology for Biomedical Investigations (OBI) for the reporting of experimental context and results of gene expression studies. A demo is made available at http://ui.genexpressfusion.googlecode.com/hg/index.html
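The a posteriori mapping step described above can be sketched as a SPARQL CONSTRUCT of the following shape. The src: and tgt: vocabularies are hypothetical placeholders, not the actual EFO/OBI terms used in the paper:

```sparql
PREFIX src: <http://example.org/atlas#>    # hypothetical source vocabulary
PREFIX tgt: <http://example.org/shared#>   # hypothetical shared vocabulary

# Rewrite a dataset-specific property into a shared one, so that entities
# from different microarray datasets line up under a single vocabulary
# without modifying the published source data.
CONSTRUCT {
  ?experiment tgt:hasFactorValue ?factor .
}
WHERE {
  ?experiment src:experimentalFactor ?factor .
}
```

Because CONSTRUCT materialises new triples rather than altering the sources, such mappings can be maintained separately from the datasets they integrate, which is what makes the approach a posteriori.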

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This paper presents a framework for developing UIMA-compliant, REST-like, text mining services which can either supplement the existing components or be used as bespoke, stand-alone workflows.
Abstract: In this paper, we address the issue of automatically adding U-Compare workflows to BioCatalogue by exploring the compatibility of UIMA to a REST-like web service. We aim to make workflows consisting of state-of-the-art text mining components available to the bioinformatics community without the need for any expertise in programming or software library dependencies. We present a framework for developing UIMA-compliant, REST-like, text mining services which can either supplement the existing components or be used as bespoke, stand-alone workflows. The framework embodies U-Compare's component library and refactors Apache UIMA SimpleServer to provide a post-processing component of the analysis results, a human-readable access mechanism, and documentation templates. As an application, we implemented a number of new services, which are registered with the BioCatalogue.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: It is argued that RIO can provide a means by which the possible deviations of design styles can be found and reported to the domain experts.
Abstract: Detecting planned and unplanned deviations from guidelines that give rise to patterns in ontologies can be difficult, even simply in terms of detecting regularities in a complex artefact. In this paper we demonstrate the usage of RIO, a framework for detecting such syntactic regularities using cluster analysis of the entities in ontologies. We demonstrate its usage with SNOMED-CT, a large medical terminology. We focus on the inspection of three modules from SNOMED-CT and we analyse them in terms of their types and number of regularities and irregularities. The results show that modules of the ontology that did not follow a general pattern contained defects such as missing existential restrictions. In the worst case, the expected patterns described in the technical guide of the ontology were followed by only 10% of the corresponding entities in the module. We argue that RIO can provide a means by which possible deviations of design styles can be found and reported to the domain experts.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This work describes an approach of building cooperative decentralized repositories of ontologies on top of ServO (Server of Ontologies), a dynamic ontology repository building tool aimed at indexing and searching KOS and computing similarities between their entities.
Abstract: Ontologies, structured vocabularies and terminologies are used as knowledge organization systems (KOS) for building knowledge-based applications. KOS are used to facilitate data sharing, and the Linked Open Data initiative has opened new perspectives for integrating heterogeneous data on the web. However, the increasing availability of KOS on the web, as well as ad hoc and in-house classifications not yet present in the infrastructure of the Semantic Web, raises the question of their identification and reuse. We describe an approach for building cooperative decentralized repositories of ontologies. The approach is being implemented on top of ServO (Server of Ontologies), a dynamic ontology repository building tool aimed at indexing and searching KOS and computing similarities between their entities.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: OPPL-Galaxy is presented, an OPPL wrapper for the Galaxy platform, together with a series of examples demonstrating its functionality for enriching ontologies, which can be combined with the tools and workflows devised in Galaxy.
Abstract: Biomedical ontologies are key to the success of Semantic Web technologies in Life Sciences; therefore, it is important to provide appropriate tools for their development and further exploitation. The Ontology Pre-Processor Language (OPPL) can be used for automating the complex manipulation needed to devise biomedical ontologies with richer axiomatic content, which in turn paves the way towards advanced biological data analyses. We present OPPL-Galaxy, an OPPL wrapper for the Galaxy platform, and a series of examples demonstrating its functionality for enriching ontologies. As Galaxy provides an integrated framework to make use of various bioinformatics tools, the functionality delivered by OPPL to manipulate ontologies can be combined with the tools and workflows devised in Galaxy. As a result, those workflows can be used to perform more thorough analyses of biological information by exploiting extant biological knowledge codified in (enriched) biomedical ontologies.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: The ChEBI ontology has been developed as a public dictionary of molecular entities to ensure interoperability of applications supporting tasks such as drug discovery, but the tree-model property of OWL 2 prevents one from describing non-tree-like relationships, such as molecular rings, using OWL 2 schema axioms.
Abstract: OWL 2 is commonly used to represent objects with complex structure, such as complex assemblies in engineering applications, human anatomy or the structure of chemical molecules [2]. Towards that direction, the European Bioinformatics Institute (EBI) has developed the ChEBI ontology as a public dictionary of molecular entities, which is used to ensure interoperability of applications supporting tasks such as drug discovery. In order to automate the classification of molecules, ChEBI descriptions have been translated into OWL and then classified using state of the art Semantic Web reasoners. While this has uncovered numerous implicit subsumptions between ChEBI classes, the usefulness of the approach was somewhat limited by a fundamental inability of OWL 2 to correctly represent the structure of complex molecular entities. OWL 2 exhibits a so-called tree-model property, which prevents one from describing non-tree-like relationships using OWL 2 schema axioms. For example, OWL 2 axioms can state that butane molecules have four carbon atoms, but they cannot state that the four atoms in a cyclobutane molecule are arranged in a ring.
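The butane example can be written out as an OWL 2 axiom in Turtle. The chem: names below are illustrative placeholders, not actual ChEBI identifiers:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix chem: <http://example.org/chem#> .   # illustrative, not ChEBI IDs

# Expressible in OWL 2: every butane molecule has exactly four carbon atoms.
chem:Butane rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty chem:hasAtom ;
    owl:onClass chem:Carbon ;
    owl:qualifiedCardinality "4"^^xsd:nonNegativeInteger
] .
```

What cannot be expressed as a schema axiom is that cyclobutane's four carbon atoms are bonded to each other in a ring: the tree-model property guarantees that every satisfiable OWL 2 class expression has a tree-shaped model, so a cycle among the aggregated atoms can never be enforced by such axioms.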

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This work proposes a method for resource discovery which is able to exploit such textual descriptions to find relevant resources in open registries, and conducts several experiments on resources extracted from the BioCatalogue registry.
Abstract: Open metadata registries are a fundamental tool for researchers in the Life Sciences trying to locate resources such as web services or databases. While sophisticated standards have been produced for annotating these resources with rich, well-structured metadata, evidence shows that in open registries a majority of annotations simply consists of informal free-text descriptions. This reality must be taken into account in order to develop effective techniques for resource discovery in the Life Sciences. In this work we propose a method for resource discovery which is able to exploit such textual descriptions to find relevant resources. It is a requirement-driven approach, in which the user specifies informational needs as a target task and a set of facets of interest, expressed using free text. We have conducted several experiments on resources extracted from the BioCatalogue registry. For a sample set of queries that reflect common Bioinformatics-related research questions, the results show that our method is effective and provides useful answers.
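The core idea of matching a free-text query against free-text resource descriptions can be sketched with a simple bag-of-words cosine-similarity ranker. This is a minimal illustration using only the standard library, not the paper's actual retrieval model; the service names and descriptions are invented examples.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens from a free-text description."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_resources(query, descriptions):
    """Rank resources by similarity of their description to the query."""
    q = Counter(tokens(query))
    scored = [(name, cosine(q, Counter(tokens(desc))))
              for name, desc in descriptions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy registry entries (invented free-text annotations).
services = {
    "BLAST": "sequence similarity search against protein databases",
    "ClustalW": "multiple sequence alignment of protein or DNA sequences",
    "KEGG": "pathway database for metabolic networks",
}
ranking = rank_resources("align protein sequences", services)
```

A real system would add stemming, term weighting (e.g. TF-IDF over the registry) and the paper's facet structure on top of this skeleton.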

Proceedings ArticleDOI
07 Dec 2011
TL;DR: A plug-in to KE is described which allows selected data to be passed to appropriate SADI Semantic Web Services for analysis, allowing discovery of compatible services in environments that lack reasoning support.
Abstract: Integrating, visualizing and exploring Semantic Web data is the primary focus of the Sentient Knowledge Explorer (KE). To date, export of displayed data to arbitrary analytical tools has not been possible through the KE interface. Here we describe a plug-in to KE which allows selected data to be passed to appropriate SADI Semantic Web Services for analysis. Bootstrapping the semantics of a selected node is achieved through resolution of its URI to metadata; thereafter, utilizing the semantics in SADI service interface descriptions, only appropriate services are presented to the user. These services are then invoked by simple mouse clicks. Key to success is the recording of a SPARQL query in the SADI registry, allowing discovery of compatible services in environments that lack reasoning support.
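Storing a discovery query in the registry means a client can find compatible services with plain SPARQL pattern matching rather than DL reasoning. A hedged sketch of what such a query might look like follows; the prefix and predicate name (sadi:inputClass) are illustrative placeholders, not the actual SADI registry schema.

```python
def discovery_query(input_class_uri):
    """Build a SPARQL query finding services whose declared input class
    matches the rdf:type of the selected node. Predicate names here are
    illustrative; the real SADI registry vocabulary differs."""
    return f"""
PREFIX sadi: <http://sadiframework.org/ontologies/sadi.owl#>
SELECT ?service WHERE {{
  ?service sadi:inputClass <{input_class_uri}> .
}}
""".strip()

q = discovery_query("http://purl.oclc.org/SADI/LSRN/UniProt_Record")
```

The query string would then be posted to the registry's SPARQL endpoint, with each returned ?service presented to the user as an applicable analysis.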

Proceedings ArticleDOI
07 Dec 2011
TL;DR: The application of ontologies within GWAS Central for the description and standardisation of phenotypic observations and their use in inferring disease phenotypes is presented.
Abstract: The genome-wide association study (GWAS) database - GWAS Central (http://www.gwascentral.org) - allows the sophisticated interrogation and comparison of summary-level GWAS data. Here we present the application of ontologies within GWAS Central for the description and standardisation of phenotypic observations and their use in inferring disease phenotypes. For orthologous genes, our cross-species phenotype comparison pipeline allows for comparison of phenotypes defined using alternative mammalian phenotype ontologies. Building on the existing rich semantic phenotype annotation layer, we are currently involved in an effort to publish a core subset of the data as RDF nanopublications.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: A method for automating the detection and correction of spelling errors in the Foundational Model of Anatomy is described; it identified 43 errors occurring in 97 terms and 6 words of questionable or inconsistent spelling occurring in 26 terms.
Abstract: We describe a method for automating the detection and correction of spelling errors in the Foundational Model of Anatomy (FMA). The FMA was tokenized into 4893 distinct words; misspellings were identified and corrected using the National Library of Medicine's SPECIALIST GSpell Spelling Suggestion API. We identified 43 errors occurring in 97 terms, and 6 words of questionable or inconsistent spelling occurring in 26 terms. These errors are replicated in other reference terminologies that use the FMA. Our approach may be useful as part of a quality assurance process for other large-scale biomedical knowledge resources.
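The tokenize-then-suggest pipeline can be sketched in a few lines of standard-library Python. Here difflib.get_close_matches stands in for the SPECIALIST GSpell Spelling Suggestion API used in the paper, and the dictionary and terms are tiny invented examples.

```python
import re
from difflib import get_close_matches

def tokenize(terms):
    """Distinct lowercase words across all terminology terms."""
    words = set()
    for term in terms:
        words.update(re.findall(r"[a-z]+", term.lower()))
    return words

def find_misspellings(terms, dictionary):
    """Flag words absent from the dictionary and suggest the closest
    dictionary word as a correction (stand-in for GSpell)."""
    suggestions = {}
    for word in tokenize(terms) - dictionary:
        close = get_close_matches(word, dictionary, n=1)
        if close:
            suggestions[word] = close[0]
    return suggestions

# Toy dictionary and FMA-style terms, for illustration only.
dictionary = {"anterior", "posterior", "surface", "of", "femur"}
terms = ["Anterior surfce of femur", "Posterior surface of femur"]
errors = find_misspellings(terms, dictionary)
```

In a quality-assurance setting the flagged words would be reviewed by a curator before any correction is applied to the source terminology.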

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This paper proposes a flexible scientific workflow system using a rule-based semantic multi-agent system to handle failures, exceptions, and dynamic changes.
Abstract: Flexibility and adaptability are regarded as important challenges for scientific workflows. In this paper, we propose a flexible scientific workflow system that uses a rule-based semantic multi-agent system to handle failures, exceptions, and dynamic changes. The approach provides advantages such as runtime decision-making, collaboration between organizations, provenance tracking, and result explanation.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: Software tools are reviewed, guidelines for genome design software are suggested, and GenoCODE, a program for designing genomes using the semantic web platform SciNetS, is presented.
Abstract: Advances in DNA synthesis technology have enabled the design of novel genomes. The semantic web will improve the capability of genome design tools to integrate experimental data to guide the DNA design process. Here we review software tools, suggest guidelines for genome design software, and present GenoCODE: a program for designing genomes using the semantic web platform SciNetS.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This work proposes a model, namely Tagsty, that enables a community to semantically annotate protein sequences while benefiting from the emergent social network, and argues that such an environment lowers the barriers to extracting semantic content from social platforms, enables users to share new content, and facilitates knowledge discovery.
Abstract: We have reviewed different approaches to community-based annotation of protein and gene sequences, mainly relying on wiki paradigms. Currently such approaches do not fully exploit the social component that naturally emerges within communities. We propose a model, namely Tagsty, that enables a community to semantically annotate protein sequences while benefiting from the emergent social network. We argue that such an environment lowers the barriers to extracting semantic content from social platforms, enables users to share new content, and facilitates knowledge discovery. We also present an architecture supporting our proposal.

Proceedings ArticleDOI
07 Dec 2011
TL;DR: A service developed specifically for nanopublication provenance that provides conversion between two types of identifiers: the PubMed Identifier (PMID), a unique number assigned to PubMed citations of life science journal articles, and the Digital Object Identifier (DOI), which is used for identifying digital content.
Abstract: A major challenge of linked data is resolving the many different identifiers representing the same object. Interconnecting the data requires mappings between the vocabularies and identifiers used in different data sets. To help with this issue, we have developed a service, used specifically for nanopublication provenance, that provides conversion between two types of identifiers: the PubMed Identifier (PMID), a unique number assigned to PubMed citations of life science journal articles, and the Digital Object Identifier™ (DOI), which is used for identifying digital content. DOIs are used to provide current information, including where the content (or information about the content) can be found on the Internet. DOIs are very useful identifiers as they often give a direct link back to the full-text scientific article. We provide SOAP and REST web services for the conversion data. In addition, there is a SPARQL endpoint for querying the mappings. http://www.pmid2doi.org/
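At its core such a service is a bidirectional lookup over curated PMID-DOI pairs. A minimal in-memory sketch is shown below (the real pmid2doi service exposes this over SOAP, REST and SPARQL; the example pair uses the reserved 10.1000 DOI prefix and is purely hypothetical).

```python
class PmidDoiMapper:
    """Minimal in-memory sketch of a PMID <-> DOI mapping service."""

    def __init__(self, pairs):
        # Store both directions so either identifier can be resolved.
        self.pmid_to_doi = dict(pairs)
        self.doi_to_pmid = {doi: pmid for pmid, doi in pairs}

    def doi(self, pmid):
        """Return the DOI for a PMID, or None if unmapped."""
        return self.pmid_to_doi.get(pmid)

    def pmid(self, doi):
        """Return the PMID for a DOI, or None if unmapped."""
        return self.doi_to_pmid.get(doi)

# Hypothetical example pair, for illustration only.
mapper = PmidDoiMapper([("12345678", "10.1000/example.2011.001")])
```

A production service would back this with a persistent store and serialize each mapping as RDF so the SPARQL endpoint can query it.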

Proceedings ArticleDOI
07 Dec 2011
TL;DR: A linked open data set built on the Allie database, which stores abbreviation-long form pairs in the life sciences; long forms are linked to DBpedia entries using key collision methods, with UMLS used to absorb term variation.
Abstract: We built a linked open data set on the Allie database that stores abbreviation-long form pairs in the life sciences. We tried to link long forms to DBpedia entries using key collision methods (i.e., fingerprint and n-gram fingerprint). In addition, we used UMLS to absorb term variation in the life sciences. As a result of combining the key collision methods with the domain-specific tools/dictionaries, more than five-sevenths of the long forms in Allie have links to DBpedia entries when they appear 100 times or more in MEDLINE, and around 90 percent have links when their appearance frequencies are 500 or more. The string matching achieved an F-measure of 0.98, and the number of links between Allie and DBpedia is 77,608. This outcome helps Allie users to find knowledge related to the long forms of interest.
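Key collision clusters strings that normalize to the same key. The two methods named in the abstract can be sketched in the style popularized by Google Refine/OpenRefine; this is an illustrative reimplementation, not the authors' exact code, and the sample strings are invented.

```python
import re
import unicodedata

def fingerprint(s):
    """Fingerprint key: strip accents, lowercase, replace punctuation
    with spaces, then join the sorted unique word tokens."""
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    s = re.sub(r"[^\w\s]", " ", s.lower()).strip()
    return " ".join(sorted(set(re.split(r"\s+", s))))

def ngram_fingerprint(s, n=2):
    """N-gram fingerprint key: sorted unique character n-grams of the
    lowercased string with spaces and punctuation removed."""
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    s = re.sub(r"[^\w]", "", s.lower())
    grams = {s[i:i + n] for i in range(len(s) - n + 1)}
    return "".join(sorted(grams))

# Two surface forms of the same long form collide on the same key,
# so they can be linked to a single DBpedia entry.
a = fingerprint("Magnetic Resonance Imaging")
b = fingerprint("imaging, magnetic-resonance")
```

Collisions produced this way still need a disambiguation step (here, UMLS) before a link to DBpedia is asserted, since distinct concepts can occasionally share a key.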