BioInfer: a corpus for information extraction in the biomedical domain

doi:10.1186/1471-2105-8-50

Journal Article (Open Access)

Sampo Pyysalo, +6 more

- 09 Feb 2007 -

BMC Bioinformatics

- Vol. 8, Iss. 1, p. 50

TLDR

A corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers is introduced.

Abstract:

Lately, there has been great interest in applying information extraction methods to the biomedical domain, in particular to extracting relationships between genes, proteins, and RNA from scientific publications. The development and evaluation of such methods require annotated domain corpora. We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships, along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles, annotated for relationships, named entities, and syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of its relationship annotation. We introduce a corpus targeted at protein, gene, and RNA relationships, which serves as a resource for the development of information extraction systems and their components, such as parsers and domain analyzers. The corpus will be maintained and further developed, with the current version available at http://www.it.utu.fi/BioInfer.
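The abstract describes three annotation layers over the same sentences. The following minimal Python sketch shows one way an annotated sentence might be held in memory:

- named entities as typed character spans,
- relationships as typed links between entity ids,
- syntactic dependencies as labelled head/dependent token pairs.

All class, field, and label names are hypothetical illustrations, including `Protein`, `BIND`, and the dependency labels. This is not BioInfer's actual file format or the API of its supporting software. The sketch also assumes flat relationships between entities as a simplification.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of one annotated sentence. All names here are
# illustrative assumptions, not BioInfer's real schema or software API.


@dataclass(frozen=True)
class Entity:
    id: str
    type: str               # entity type from an ontology (label assumed)
    span: tuple[int, int]   # character offsets [start, end) in the sentence


@dataclass(frozen=True)
class Relationship:
    type: str               # relationship type from an ontology (label assumed)
    args: tuple[str, ...]   # ids of the participating entities


@dataclass(frozen=True)
class Dependency:
    label: str              # syntactic dependency type (label assumed)
    head: int               # token index of the governor
    dependent: int          # token index of the dependent


@dataclass
class AnnotatedSentence:
    text: str
    tokens: list[str]
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)
    dependencies: list[Dependency] = field(default_factory=list)

    def entity_text(self, entity_id: str) -> str:
        """Return the surface string of the entity with the given id."""
        for e in self.entities:
            if e.id == entity_id:
                start, end = e.span
                return self.text[start:end]
        raise KeyError(entity_id)

    def relation_pairs(self):
        """Yield (relationship type, tuple of argument strings) per relationship."""
        for r in self.relationships:
            yield r.type, tuple(self.entity_text(a) for a in r.args)


# Toy example sentence; the annotations are invented for illustration.
example = AnnotatedSentence(
    text="Profilin binds actin.",
    tokens=["Profilin", "binds", "actin", "."],
    entities=[Entity("e1", "Protein", (0, 8)), Entity("e2", "Protein", (15, 20))],
    relationships=[Relationship("BIND", ("e1", "e2"))],
    dependencies=[
        Dependency("subj", 1, 0),
        Dependency("obj", 1, 2),
        Dependency("punct", 1, 3),
    ],
)

if __name__ == "__main__":
    print(list(example.relation_pairs()))  # [('BIND', ('Profilin', 'actin'))]
```

The sketch stores entities as character offsets rather than token indices, so entity boundaries are not tied to tokenization. This is a design choice of the sketch, not a claim about how the corpus encodes them.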

Citations

Proceedings ArticleDOI

The Stanford Typed Dependencies Representation

Marie-Catherine de Marneffe, +1 more

TL;DR: This paper examines the Stanford typed dependencies representation, which was designed to provide a straightforward description of grammatical relations for any user who could benefit from automatic text understanding, and considers the underlying design principles of the Stanford scheme.
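In its plain-text form, the Stanford scheme writes each grammatical relation as `relation(governor-i, dependent-j)`, where `i` and `j` are token positions. The following rough Python sketch assumes that form and parses such lines into tuples. The names `TypedDependency` and `parse_dependency` are hypothetical and not part of any Stanford tool. The example relations are invented for illustration.

```python
import re
from typing import NamedTuple


class TypedDependency(NamedTuple):
    """One grammatical relation between a governor and a dependent token."""
    relation: str
    governor: str
    gov_index: int
    dependent: str
    dep_index: int


# Matches lines such as "nsubj(activates-2, IL-2-1)". Words may contain
# hyphens, so each index is read from the "-<digits>" that ends its argument.
_DEP_LINE = re.compile(r"^([\w:]+)\((.+?)-(\d+), (.+)-(\d+)\)$")


def parse_dependency(line: str) -> TypedDependency:
    """Parse one 'relation(governor-i, dependent-j)' line."""
    m = _DEP_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a typed dependency: {line!r}")
    rel, gov, gov_i, dep, dep_i = m.groups()
    return TypedDependency(rel, gov, int(gov_i), dep, int(dep_i))


if __name__ == "__main__":
    # Invented parse of the sentence "IL-2 activates T-cells ."
    for line in ["nsubj(activates-2, IL-2-1)", "dobj(activates-2, T-cells-3)"]:
        print(parse_dependency(line))
```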

Proceedings ArticleDOI

Overview of BioNLP'09 Shared Task on Event Extraction

Jin-Dong Kim, +4 more

TL;DR: The design and implementation of the BioNLP'09 Shared Task are presented, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.

Journal ArticleDOI

Special Report: NCBI disease corpus: A resource for disease name recognition and concept normalization

Rezarta Islamaj Dogan, +2 more

- 01 Feb 2014 -

Journal of Biomedical Informatics

TL;DR: The results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard thus enabling the development of machine-learning based approaches for such tasks.

Journal ArticleDOI

Deep learning with word embeddings improves biomedical named entity recognition.

Maryam Habibi, +4 more

- 15 Jul 2017 -

Bioinformatics

TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings [called long short‐term memory network‐conditional random field (LSTM‐CRF)] outperforms state‐of‐the‐art entity‐specific NER tools, and often by a large margin.

Journal ArticleDOI

The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes

Veronika Vincze, +4 more

- 19 Nov 2008 -

BMC Bioinformatics

TL;DR: A corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts, which is also a good resource for the linguistic analysis of scientific and clinical texts.

References

Book

Nonparametric statistics for the behavioral sciences

Sidney Siegel

TL;DR: This revision of the classic text in the field adds two new chapters and thoroughly updates all others; the original structure is retained, and the book continues to serve as a combined text/reference.

Journal ArticleDOI

A Coefficient of Agreement for Nominal Scales

Jacob Cohen

- 01 Apr 1960 -

Educational and Psychological Measuremen...

TL;DR: In this article, the author presents a procedure for having two or more judges independently categorize a sample of units and for determining the degree and significance of their agreement, i.e., the extent to which these judgments are reproducible (reliable).
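The coefficient from this paper, commonly called kappa, corrects the observed agreement for the agreement expected by chance:

kappa = (p_o - p_e) / (1 - p_e)

Here p_o is the observed proportion of units on which two judges agree, and p_e is the proportion expected by chance from each judge's marginal category frequencies. The following Python function is a minimal sketch of the two-judge case only. The function name and the toy labels are illustrative, and the sketch omits the significance test and standard error.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two judges labelling the same units on a nominal scale.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from the two judges'
    marginal category frequencies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    # Observed agreement: fraction of units given the same category.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    judge_1 = ["yes", "yes", "no", "no"]
    judge_2 = ["yes", "no", "no", "no"]
    # p_o = 3/4, p_e = (2*1 + 2*3) / 16 = 1/2, so kappa = (0.75 - 0.5) / 0.5
    print(cohens_kappa(judge_1, judge_2))  # 0.5
```

A kappa of 1 indicates perfect agreement, and 0 indicates agreement no better than the chance level implied by the marginals.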

Proceedings ArticleDOI

The Berkeley FrameNet Project

Collin F. Baker, +2 more

TL;DR: This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

Journal ArticleDOI

The Database of Interacting Proteins: 2004 update

Lukasz Salwinski, +5 more

- 01 Jan 2004 -

Nucleic Acids Research

TL;DR: The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions.
