Open Access Journal Article

BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results

TLDR
The authors develop the first ontology to describe HTS experiments and screening results using expressive description logic. BAO opens new functionality for annotating, querying, and analyzing HTS datasets, and the potential for discovering new knowledge by means of inference.
Abstract
High-throughput screening (HTS) is one of the main strategies to identify novel entry points for the development of small molecule chemical probes and drugs and is now commonly accessible to public sector research. Large amounts of data generated in HTS campaigns are submitted to public repositories such as PubChem, which is growing at an exponential rate. The diversity and quantity of available HTS assays and screening results pose enormous challenges to organizing, standardizing, integrating, and analyzing the datasets, and thus to maximizing the scientific and ultimately the public health impact of the huge investments made to implement public sector HTS capabilities. Novel approaches to organize, standardize, and access HTS data are required to address these challenges. We developed the first ontology to describe HTS experiments and screening results using expressive description logic. The BioAssay Ontology (BAO) serves as a foundation for the standardization of HTS assays and data and as a semantic knowledge model. In this paper we show important examples of formalizing HTS domain knowledge and point out the advantages of this approach. The ontology is available online at the NCBO BioPortal: http://bioportal.bioontology.org/ontologies/44531. After a large manual curation effort, we loaded BAO-mapped data triples into an RDF database store and used a reasoner in several case studies to demonstrate the benefits of formalized domain knowledge representation in BAO. The examples illustrate semantic querying capabilities where BAO enables the retrieval of inferred search results that are relevant to a given query but are not explicitly defined. BAO thus opens new functionality for annotating, querying, and analyzing HTS datasets and the potential for discovering new knowledge by means of inference.
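The idea of retrieving results that are "relevant to a given query, but are not explicitly defined" can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses no RDF store or description logic reasoner, only plain Python tuples and a transitive walk over subclass links. All class names and assay identifiers below are hypothetical and are not taken from BAO or PubChem.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples for illustration only;
# these are NOT actual BAO terms or PubChem assay records.
TRIPLES = [
    ("luciferase_assay", "subClassOf", "luminescence_assay"),
    ("luminescence_assay", "subClassOf", "optical_assay"),
    ("fluorescence_assay", "subClassOf", "optical_assay"),
    ("assay_001", "type", "luciferase_assay"),
    ("assay_002", "type", "fluorescence_assay"),
    ("assay_003", "type", "radiometric_assay"),
]


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via subClassOf."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].add(o)
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, triples, infer=True):
    """List subjects typed as cls; with infer=True, subclasses also count."""
    result = set()
    for s, p, o in triples:
        if p != "type":
            continue
        classes = superclasses(o, triples) if infer else {o}
        if cls in classes:
            result.add(s)
    return sorted(result)


if __name__ == "__main__":
    # Without inference, no assay is explicitly typed as optical_assay.
    print(instances_of("optical_assay", TRIPLES, infer=False))  # []
    # With subclass inference, two assays are retrieved.
    print(instances_of("optical_assay", TRIPLES))  # ['assay_001', 'assay_002']
```

A production setup would more plausibly express the same query against a triple store with a reasoner attached; the sketch only shows why a class hierarchy lets a query return annotations that were never stated directly.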

