
Showing papers presented at "Semantic Web Applications and Tools for Life Sciences" in 2015


Proceedings Article
01 Jan 2015
TL;DR: This paper presents a solution that uses an ontology based on the HL7 v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, with morph-RDB used to expose a virtual SPARQL endpoint over the data.
Abstract: Semantic interoperability is essential when carrying out post-genomic clinical trials in which several institutions collaborate, since researchers and developers need an integrated view of, and access to, heterogeneous data sources. In this paper we present a solution that uses an ontology based on the HL7 v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, with morph-RDB used to expose a virtual SPARQL endpoint over the data. In previous efforts with other existing RDB2RDF systems we had not been able to work with live databases. Now we can issue SPARQL queries against the underlying relational data with, in general, acceptable performance.

7 citations
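
As an illustration of how such a virtual endpoint is consumed, the sketch below queries a SPARQL endpoint with the Python SPARQLWrapper library. The endpoint URL, prefix, and RIM-style terms are hypothetical placeholders, not taken from the paper; morph-RDB would rewrite a query like this into SQL over the live database.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; morph-RDB would translate this SPARQL
# query into SQL over the underlying relational database.
endpoint = SPARQLWrapper("http://example.org/morph-rdb/sparql")
endpoint.setQuery("""
    PREFIX rim: <http://example.org/hl7-rim#>   # placeholder prefix
    SELECT ?patient ?observation
    WHERE {
        ?patient a rim:Patient .
        ?patient rim:hasObservation ?observation .
    }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["patient"]["value"], binding["observation"]["value"])
```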


Proceedings Article
07 Dec 2015
TL;DR: The objective of this work is to define explicitly, by means of an ontology, the vocabulary pertaining to imaging biomarkers’ domain.
Abstract: The importance of imaging biomarkers in biomedical research and drug design is well-acknowledged in the literature, calling for appropriate standards and guidelines for imaging biomarker development, validation and qualification. The objective of this work is to define explicitly, by means of an ontology, the vocabulary pertaining to imaging biomarkers.

7 citations


Proceedings Article
01 Dec 2015
TL;DR: Validata is an online web application for validating an RDF document against a set of constraints that extends the ShEx functionality to support multiple requirement levels.
Abstract: Validata is an online web application for validating an RDF document against a set of constraints. This is useful for data exchange applications or ensuring conformance of an RDF dataset against a community agreed standard. Constraints are expressed as a Shape Expression (ShEx) schema. Validata extends the ShEx functionality to support multiple requirement levels. Validata can be repurposed for different deployments by providing it with a new ShEx schema. The Validata code is available from https://github.com/HW-SWeL/Validata.

6 citations
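
Validata's actual ShEx engine is not reproduced here; the sketch below is a minimal stand-in that illustrates the idea of requirement levels (MUST/SHOULD/MAY) when checking an RDF graph, using rdflib and a hand-rolled shape table. All terms are hypothetical.

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")

# Hand-rolled stand-in for a ShEx schema: (predicate, requirement level).
# Validata itself parses real ShEx; this only illustrates the levels.
SHAPE = [
    (EX.title,   "MUST"),    # violation -> error
    (EX.creator, "SHOULD"),  # violation -> warning
    (EX.note,    "MAY"),     # optional
]

def validate(graph, node):
    errors, warnings = [], []
    for predicate, level in SHAPE:
        if (node, predicate, None) not in graph:
            if level == "MUST":
                errors.append(predicate)
            elif level == "SHOULD":
                warnings.append(predicate)
    return errors, warnings

g = Graph()
g.add((EX.doc1, RDF.type, EX.Document))
g.add((EX.doc1, EX.title, EX.someTitle))

errors, warnings = validate(g, EX.doc1)
print("errors:", errors)      # []
print("warnings:", warnings)  # [EX.creator]
```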


Proceedings Article
07 Dec 2015
TL;DR: In this article, the authors present an overview and the initial results of the Linked Open Data project for the plant bioinformatics node of the Institut Francais de Bioinformatique (IFB).
Abstract: Advancements in empirical technologies have generated vast amounts of heterogeneous data. This situation has created a need to integrate the data to understand the system of interest in its entirety. Information systems therefore play a crucial role in managing these data, enabling biologists to extract new knowledge. The plant bioinformatics node of the Institut Francais de Bioinformatique (IFB) maintains public information systems that house domain-specific data. Currently, efforts are being taken to expose the IFB plant bioinformatics resources as Linked Open Data, utilising domain-specific ontologies and metadata. Here, we present an overview and the initial results of the project.

3 citations


Proceedings Article
14 Dec 2015
TL;DR: The Karyotype Ontology allows rich descriptions of the chromosomal complement in humans; this paper introduces an environment, borrowed from literate programming, for co-developing its text and ontological description.
Abstract: Ontologies present an attractive technology for describing bio-medicine, because they can be shared and have rich computational properties. However, they lack the rich expressivity of English and fit poorly with the current scientific “publish or perish” model. While there have been attempts to combine free text and ontologies, most of these perform post-hoc annotation of text. In this paper, we introduce our new environment, which borrows from literate programming to allow an author to co-develop both text and ontological description. We are currently using this environment to document the Karyotype Ontology, which allows rich descriptions of the chromosomal complement in humans. We explore some of the advantages and difficulties of this form of ontology development.

2 citations
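
The authors' literate environment itself is not shown in the abstract. As a loose analogue only, the sketch below uses the owlready2 Python library to keep descriptive prose (as comments) next to ontology terms in a single source file; the IRI and class names are invented and are not the actual Karyotype Ontology.

```python
from owlready2 import Thing, get_ontology

onto = get_ontology("http://example.org/karyotype-demo.owl")  # placeholder IRI

with onto:
    # Prose and axioms co-developed in one file, in the spirit of
    # literate programming: the comments are the "text", the classes
    # the "ontological description".

    # A karyotype describes the chromosomal complement of a cell.
    class Karyotype(Thing):
        pass

    # 46,XX: a typical human female karyotype.
    class HumanFemaleKaryotype(Karyotype):
        pass

onto.save(file="karyotype-demo.owl")
```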


Proceedings Article
08 Dec 2015
TL;DR: A novel method that consumes an eclectic set of linked data sources to help validate uncertain drug–gene relationships, producing a model that classifies drug–gene pairs as related or not and thus confirms the validity of candidate pharmacogenes.
Abstract: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, those have used only molecular networks or the biomedical literature. Here we propose a novel method that consumes an eclectic set of linked data sources to help validate uncertain drug–gene relationships. One advantage is that linked data are implemented in a standard framework that facilitates the joint use of various sources, making it easy to consider features of various origins. Accordingly, we propose an initial selection of linked data sources relevant to pharmacogenomics. We formatted these data to train a random forest algorithm, producing a model that classifies drug–gene pairs as related or not, thus confirming the validity of candidate pharmacogenes. Our model achieves an F-measure of 0.92 under 100-fold cross-validation. A list of top candidates is provided, and how they were obtained is discussed.

2 citations
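
The classification step maps naturally onto a standard toolkit. Below is a minimal sketch with scikit-learn, assuming each drug–gene pair has already been converted into a numeric feature vector derived from the linked data sources; the features and labels here are synthetic placeholders, so the score it prints is meaningless.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins: one row per drug-gene pair, columns are features
# extracted from linked data sources (e.g. shared pathways, co-mentions).
rng = np.random.default_rng(0)
X = rng.random((500, 12))
y = rng.integers(0, 2, 500)  # 1 = related, 0 = not related

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# The paper reports F-measure = 0.92 under 100-fold cross-validation;
# with these random placeholders the score is of course uninformative.
scores = cross_val_score(clf, X, y, cv=100, scoring="f1")
print("mean F1:", scores.mean())
```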


Proceedings Article
01 Jan 2015
TL;DR: This work proposes a method to perform evidence-based hypothesis testing in the biomedical domain, such that specialists can evaluate the confidence of their hypotheses and communicate their findings.
Abstract: Evidence-based hypothesis testing assumes the existence of a causal chain between the facts. By studying the propagation of evidenced facts in the causal chain (hypothesis), we gain new insights into the progression of a disease. In practice, a hypothesis cannot always be substantiated with complete asserted knowledge (it may be impossible to collect the required evidence), yet it is possible to test a hypothesis with missing knowledge at a lower confidence. In this work we propose a method to perform evidence-based hypothesis testing in the biomedical domain, such that specialists can evaluate the confidence of their hypotheses and communicate their findings. We assume that a hypothesis is formalized in an OWL 2 EL ontology and that the KB contains incomplete asserted knowledge (ABox). We extract a causal chain from an ontology and represent it as a DAG (nodes represent facts, arcs represent causal relationships). Users assign importance weights to the facts they consider most important in supporting the hypothesis. The hypothesis confidence is then evaluated by computing a weighted sum of fact confidences over the directed path in the DAG (corresponding to the causal chain).

1 citation
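
The confidence computation (a weighted combination of fact confidences along the causal chain) is easy to sketch in plain Python. Everything below — the facts, weights, and confidences — is invented for illustration; the paper's extraction of the chain from an OWL 2 EL ontology is not reproduced, and a normalised weighted sum (a weighted average) is used.

```python
# Causal chain as a directed path of facts; each fact carries an
# evidence confidence in [0, 1] and a user-assigned importance weight.
causal_chain = ["exposure", "inflammation", "tissue_damage", "symptom"]

confidence = {  # evidence confidence per fact (hypothetical values)
    "exposure": 0.9,
    "inflammation": 0.7,
    "tissue_damage": 0.4,   # missing or weak evidence lowers this
    "symptom": 0.95,
}
weight = {  # user-assigned importance weights (hypothetical values)
    "exposure": 1.0,
    "inflammation": 2.0,
    "tissue_damage": 3.0,
    "symptom": 1.0,
}

def hypothesis_confidence(path):
    """Normalised weighted sum of fact confidences along the chain."""
    total = sum(weight[f] * confidence[f] for f in path)
    return total / sum(weight[f] for f in path)

print(f"hypothesis confidence: {hypothesis_confidence(causal_chain):.2f}")
```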


Proceedings Article
01 Jan 2015
TL;DR: The transformation process is described, along with how the resulting artifacts can be used in FHIR profile modeling, validation, and information mapping.
Abstract: This presentation describes our progress to date in developing tools to mechanically parse the core models in the HL7 Fast Healthcare Interoperability Resources (FHIR) DSTU2 Ballot and produce corresponding Shape Expressions (ShEx) schemas. In it we describe the transformation process and how the resulting artifacts can be used in FHIR profile modeling, validation, and information mapping. We also discuss our plans to integrate this process into the OpenRefine platform to provide a user-friendly interface to support RDF/FHIR data element harmonization and model transformation.

1 citation


Proceedings Article
01 Jan 2015
TL;DR: SCRY, the authors' SPARQL-compatible service layer, improves this by executing services at query time and making their outputs query-accessible, generating RDF data on demand.
Abstract: The inability to include quantitative reasoning in SPARQL queries slows down the application of Semantic Web technology in the life sciences. SCRY, our SPARQL-compatible service layer, improves this by executing services at query time and making their outputs query-accessible, generating RDF data on demand. The power of this approach is demonstrated with two use cases, in which we use SCRY to calculate standard deviations and to find homologous proteins.

1 citation
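
SCRY's own interface is not given in the abstract; the sketch below only illustrates the underlying idea of a service that computes a statistic on demand and exposes the result as RDF, using Python's statistics module and rdflib. All URIs are placeholders.

```python
import statistics
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/scry-demo/")

def stdev_service(values):
    """Toy 'service': compute a standard deviation and return the
    result as RDF triples, generated on demand at query time."""
    g = Graph()
    result = statistics.stdev(values)
    g.add((EX.input1, EX.standardDeviation,
           Literal(result, datatype=XSD.double)))
    return g

g = stdev_service([4.1, 4.4, 3.9, 4.6, 4.0])
print(g.serialize(format="turtle"))
```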


Proceedings Article
01 Jan 2015
TL;DR: OBOPedia offers a view into a field of interest that is based on a collection of ontologies as a reference resource, but one in which the user need not know they are using an ontology.
Abstract: Ontologies contain knowledge about a domain for use by tools or humans. A source of knowledge should be usable by a human to ‘find out about’ or learn about the entities of a domain and their relationships to each other. The corpus of biomedical ontologies now contains ‘encyclopaedic’ knowledge about biology and should be capable of being used by humans to learn about entities in molecular biology. Yet multiple separate ontologies and the typical style of presentation of the knowledge in the ontologies mean that their use as a learning resource is sub-optimal. To address this issue we have created OBOPedia, a web-based encyclopaedia of biology as seen by the Open Biomedical Ontologies (OBO) Consortium. OBOPedia exploits the OBO Consortium's use of standard representations, meaningful human-readable terms, and natural-language definitions to create a basic OBOPedia encyclopaedia entry. An entry is supplemented with an ontology's synonyms and uses the ontology's taxonomic links to provide ‘see also’ cross-references within the alphabetical list of entries. Currently, OBOPedia has access to ten OBO ontologies, including all the OBO Foundry ontologies, with a total of over 210,000 entries. Our evaluations indicate that an OBOPedia style of presentation has a role as an alternative way of presenting knowledge of a domain collected as an ontology or ontologies. OBOPedia thus offers a view into a field of interest that is based on a collection of ontologies as a reference resource, but one in which the user need not know they are using an ontology. OBOPedia may be used via http://www.obopedia.org.uk. The source code and documentation for OBOPedia are available via https://bitbucket.org/adam944/ontologyencyclopaedia.

1 citation
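
OBOPedia's actual code lives at the Bitbucket URL above. Purely to show the kind of information an encyclopaedia-style entry draws on (name, definition, synonyms, ‘see also’ links), here is a toy parser over an invented OBO-format stanza:

```python
# Toy OBO stanza parser: extracts the fields an encyclopaedia-style
# entry needs (name, definition, synonyms, is_a cross-references).
# Real OBO files are much richer than this.
OBO_TEXT = """\
[Term]
id: XX:0000001
name: example term
def: "An invented definition for illustration." []
synonym: "sample term" EXACT []
is_a: XX:0000000 ! parent term
"""

def parse_terms(text):
    entries = []
    entry = None
    for line in text.splitlines():
        line = line.strip()
        if line == "[Term]":
            entry = {"synonyms": [], "see_also": []}
            entries.append(entry)
        elif entry is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "synonym":
                entry["synonyms"].append(value.split('"')[1])
            elif key == "is_a":
                entry["see_also"].append(value.split("!")[0].strip())
            elif key in ("id", "name", "def"):
                entry[key] = value
    return entries

for term in parse_terms(OBO_TEXT):
    print(term["name"], "-", term["def"], "| see also:", term["see_also"])
```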


Proceedings Article
01 Jan 2015
TL;DR: A semantic alerting platform that rapidly forms a view of the current health status of a patient and automatically notifies staff about anomalies, with the goal of letting physicians enter the rules/axioms without the intervention of an IT specialist.
Abstract: Do you know Dr. Gregory House? He and his staff are medical experts. Together they can tackle almost any medical problem. At the Intensive Care department of the University Hospital of Ghent, we have developed a digital Dr. House. In the Intensive Care department there is a lot of data [1]. More than 20,000 values are generated per patient each day [4]. Studies show that people can interpret at most 7 parameters at once [3]. As many doctors have different expertise, they look at different parameters, but they still cannot interpret them all together. A computer, however, has no real constraints (besides memory) on how much data it can process. To efficiently analyze all the clinical data, a semantic alerting platform, called the digital Dr. House, was built, as shown in Figure 1. The platform is able to rapidly consolidate all the gathered parameters and link them together by using an ontology. The ontology describes all the knowledge about the medical domain and is thus able to semantically enrich the gathered data. A reasoner, which acts like a human brain, can process data against the ontology and infer new facts from it. If it finds interesting facts, it can trigger an alert that is sent to a doctor. As such, this digital Dr. House is able to derive and filter interesting knowledge from a huge amount of clinical data. This allows the platform to rapidly form a view of the current health status of a patient and automatically notify the staff about any anomalies. Afterwards, physicians are able to give feedback on alerts so that the system could eventually become self-learning. However, to generate the appropriate alerts, these alerts need to be expressed as rules/axioms in the ontology. These are usually specified by an IT specialist, which often leads to communication problems between the physicians and the IT specialists. The IT specialists talk about databases, queries, etc.; the ICU specialists talk about nosocomial respiratory infections, comorbidity, Staphylococcus aureus, etc. They almost speak different languages. An interface was therefore created so that the IT specialist and the physician can view the ontology and easily insert new alerts together. Eventually, the goal is to let the physicians enter the rules/axioms without the intervention of an IT specialist.
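
The platform's ontology and reasoner are not included in the abstract. As a loose illustration of rule-driven alerting over consolidated patient parameters, here is a small Python sketch; the thresholds and parameter names are invented and are not clinical guidance.

```python
# Loose sketch of rule-driven alerting: each rule inspects the
# consolidated patient parameters and may yield an alert. In the
# actual platform, rules are axioms evaluated by an OWL reasoner.
patient = {  # invented values for illustration only
    "heart_rate": 131,
    "body_temp_c": 39.2,
    "spo2": 93,
}

rules = [
    ("tachycardia",    lambda p: p["heart_rate"] > 120),
    ("fever",          lambda p: p["body_temp_c"] > 38.5),
    ("low saturation", lambda p: p["spo2"] < 90),
]

alerts = [name for name, condition in rules if condition(patient)]
for alert in alerts:
    print(f"ALERT: {alert}")  # would be sent to the physician
```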

Proceedings Article
01 Jan 2015
TL;DR: Spatial descriptions take advantage of existing ontological spatial information, such as in anatomy ontologies and spatial relations ontologies, e.g. BSPO, to specify regions of interest to search 3D anatomical space.

Proceedings Article
01 Jan 2015
TL;DR: Over the past three years there has been massive growth in genetic testing, both in the scope of testing and in the number of individuals offered it; as a result, the ability to capture and share key clinical information, as well as genetic information, is becoming increasingly important.
Abstract: Over the past 3 years there has been massive growth in genetic testing, both in terms of the scope of testing and the number of individuals offered genetic testing. Targeted sequencing of small genomic regions has been replaced by panel testing, whole exome sequencing and, most recently, whole genome sequencing [4]. Furthermore, genetic testing, in research but also in clinical settings, has extended beyond small numbers of selected individuals with very rare, highly defined disorders to cover larger populations. While this information presents new clinical opportunities and opens the way for the development of novel therapies, it also presents major challenges. For clinicians, reliably identifying disease-associated genetic variants against the broader background of variants present in all human genomes that are rare but not actually pathogenic is a concern. It is likely that for many rare genetic disorders obtaining clarity will require a worldwide effort, and so the ability to capture and share key clinical information, as well as genetic information, is becoming increasingly important. However, at present there are major challenges with regard to the collection and storage of clinical information, particularly in the context of rare genetic disorders. The process of studying a patient with a possible rare genetic disorder typically involves many different clinical specialists with no “standard” patient route. During this process, it is very common to refer from one specialist to another in order to obtain a range of opinions and access different tests. The output of this process is usually clinical letters, which are used both to document the patient’s progress and to communicate findings between specialists (e.g. patient history, examination findings, investigation results and clinical impression). Since clinical letters are a key source of knowledge, their proper annotation and storage would enable access to the information they contain in a systematic way. Currently, the creation and processing of clinical letters requires the following steps:
1. Letters are dictated by the clinician using a speech recognition system.
2. The recording is uploaded to a server.
3. The voice data is transcribed using another application.
4. The text is downloaded and checked by qualified personnel.
5. The letter is tagged manually by a specialist responsible for reading and annotating the terms of interest.
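
Step 5 above (manual tagging of terms of interest) is the kind of task dictionary-based annotation can assist. Below is a minimal sketch with a tiny dictionary of HPO-style identifiers; a real annotator would use a full curated terminology such as HPO or SNOMED CT, and the letter text is invented.

```python
import re

# Tiny dictionary mapping surface forms to HPO-style term identifiers;
# a real annotator would load a full curated terminology.
TERMS = {
    "microcephaly": "HP:0000252",
    "seizure": "HP:0001250",
    "developmental delay": "HP:0001263",
}

def annotate(letter):
    """Return (surface form, term id, offset) for each dictionary hit."""
    hits = []
    for surface, term_id in TERMS.items():
        for match in re.finditer(re.escape(surface), letter, re.IGNORECASE):
            hits.append((match.group(0), term_id, match.start()))
    return sorted(hits, key=lambda hit: hit[2])

letter = ("On examination the patient shows microcephaly and a history "
          "of seizure activity with developmental delay.")
for surface, term_id, offset in annotate(letter):
    print(f"{offset:3d}  {surface}  ->  {term_id}")
```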

Proceedings Article
01 Jan 2015
TL;DR: A pipeline to identify and annotate additional entities within the SureChEMBL patent corpus using the Termite text-mining tool, producing the first large-scale, semantically annotated life-science patent knowledgebase.
Abstract: SureChEMBL (https://www.surechembl.org) is a patent chemistry resource, originally a commercial product developed by SureChem/Digital Science, and recently made freely available at EMBL-EBI [1]. SureChEMBL uses a live and fully automated cloud-based pipeline that combines text-mining and chemistry tools to extract compounds named or depicted in patent documents and make them readily structure-searchable by users. Over 50,000 new patent documents and 80,000 new compounds are entered into the system per month, and new chemical annotations are usually available in the SureChEMBL interface within 1-7 days of the patent being released by the patent office. While the current SureChEMBL system addresses several chemistry use-cases, such as the identification of novel scaffolds and chemistry, there is an enormous amount of additional knowledge captured within the patent corpus. Much of this information will never be published elsewhere and may be of great value to the drug-discovery and broader life-science community. The Open PHACTS Discovery Platform is a semantic-web data integration platform, developed for the purpose of providing both the pharmaceutical industry and academic researchers with open access to interoperable drug discovery information [2, 3]. The platform currently includes data from a wide variety of public databases and provides API access to the integrated information. However, the further addition of biological and chemical patent information to the platform was considered to be of great potential utility. We have therefore developed a pipeline to identify and annotate additional entities (namely genes and diseases) within the SureChEMBL patent corpus using the Termite text-mining tool (https://scibite.com/content/termite.html). Since patent documents are often designed to obfuscate the key subject matter, it was essential to also develop an algorithm to assess the relevance of each gene or disease within a particular patent document, allowing users to restrict results to only highly relevant entities if they wish. An RDF model has been developed to capture the relationships between patent documents and annotated compounds, genes and diseases, and annotations for more than 6 million life-science patents have been made available in this format via the Open PHACTS platform (https://dev.openphacts.org/). A series of API calls have been developed to allow users of the platform to query the data and to integrate it with the extensive range of other data resources included in the platform (e.g., protein, pathway, bioactivity and disease information). In addition, KNIME and Pipeline Pilot nodes have been created to facilitate the construction of workflows using patent data, for example, identifying all of the compounds from patents that mention a particular target or disease with high relevance. This represents the first large-scale, semantically annotated life-science patent knowledgebase, freely available to both industrial and academic researchers.
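
Neither the relevance algorithm nor the RDF model is spelled out in the abstract; the sketch below shows one plausible shape for patent–entity annotations with rdflib, scoring relevance naively by mention share. All URIs, properties, and numbers are placeholders, not the actual Open PHACTS model.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/patent-demo/")

def annotate_patent(patent_id, entity_mentions):
    """Record entity annotations for a patent, with a naive relevance
    score (mention count / total mentions). Placeholder model only."""
    g = Graph()
    patent = EX[patent_id]
    total = sum(entity_mentions.values())
    for entity, count in entity_mentions.items():
        annotation = EX[f"{patent_id}-{entity}"]
        g.add((patent, EX.hasAnnotation, annotation))
        g.add((annotation, EX.entity, EX[entity]))
        g.add((annotation, EX.relevance,
               Literal(count / total, datatype=XSD.double)))
    return g

g = annotate_patent("US-1234567", {"EGFR": 12, "asthma": 1})
print(g.serialize(format="turtle"))
```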