BioInfer: a corpus for information extraction in the biomedical domain
Sampo Pyysalo, Filip Ginter, Juho Heimonen, Jari Björne, Jorma Boberg, Jouni Järvinen, Tapio Salakoski
TLDR
A corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers is introduced.
Abstract:
Recently, there has been great interest in applying information extraction methods to the biomedical domain, in particular to extracting relationships among genes, proteins, and RNA from scientific publications. The development and evaluation of such methods require annotated domain corpora. We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles, annotated for relationships, named entities, and syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components, such as parsers and domain analyzers. The corpus will be maintained and further developed, with a current version available at http://www.it.utu.fi/BioInfer.
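To make the combination of annotation layers concrete, the following is a minimal sketch of how one sentence carrying entity, relationship, and dependency annotation might be held in memory. It is not the BioInfer file format or its supporting software: the class names, the `Protein` and `BIND` labels, the `nsubj`/`dobj` dependency labels, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str
    type: str   # hypothetical entity type label, e.g. "Protein"
    start: int  # index of the first token of the mention (inclusive)
    end: int    # index of the last token of the mention (inclusive)


@dataclass(frozen=True)
class Relationship:
    type: str         # hypothetical relationship type label, e.g. "BIND"
    arguments: tuple  # ids of the participating entities, in order


@dataclass(frozen=True)
class Dependency:
    head: int       # token index of the governing word
    dependent: int  # token index of the dependent word
    label: str      # hypothetical dependency type label


@dataclass
class AnnotatedSentence:
    tokens: list
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)

    def entity_text(self, entity_id):
        """Return the surface text of the entity with the given id."""
        ent = next(e for e in self.entities if e.id == entity_id)
        return " ".join(self.tokens[ent.start:ent.end + 1])

    def relationship_triples(self):
        """Return each relationship as (type, tuple of argument texts)."""
        return [
            (r.type, tuple(self.entity_text(a) for a in r.arguments))
            for r in self.relationships
        ]


# Illustrative sentence, not taken from the corpus.
sentence = AnnotatedSentence(
    tokens=["Profilin", "binds", "actin", "."],
    entities=[Entity("e1", "Protein", 0, 0), Entity("e2", "Protein", 2, 2)],
    relationships=[Relationship("BIND", ("e1", "e2"))],
    dependencies=[Dependency(1, 0, "nsubj"), Dependency(1, 2, "dobj")],
)

print(sentence.relationship_triples())
```

Keeping all three layers indexed against the same token list is what lets a relationship be traced back both to the entity mentions and to the syntactic path between them, which is the property the abstract highlights.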
Citations
Proceedings Article
The Stanford Typed Dependencies Representation
TL;DR: This paper examines the Stanford typed dependencies representation, which was designed to provide a straightforward description of grammatical relations for any user who could benefit from automatic text understanding, and considers the underlying design principles of the Stanford scheme.
Proceedings Article
Overview of BioNLP'09 Shared Task on Event Extraction
TL;DR: The design and implementation of the BioNLP'09 Shared Task is presented, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.
Journal Article
Special Report: NCBI disease corpus: A resource for disease name recognition and concept normalization
TL;DR: The results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard thus enabling the development of machine-learning based approaches for such tasks.
Journal Article
Deep learning with word embeddings improves biomedical named entity recognition.
TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings [called long short‐term memory network‐conditional random field (LSTM‐CRF)] outperforms state‐of‐the‐art entity‐specific NER tools, and often by a large margin.
Journal Article
The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes
TL;DR: A corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts, which is also a good resource for the linguistic analysis of scientific and clinical texts.
References
Book
Nonparametric statistics for the behavioral sciences
TL;DR: This revision of the classic text in the field adds two new chapters and thoroughly updates all others; the original structure is retained, and the book continues to serve as a combined text/reference.
Journal Article
Gene Ontology: tool for the unification of biology
M. Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J. T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Journal Article
A Coefficient of Agreement for Nominal Scales
TL;DR: This article presents a coefficient of agreement (kappa) for studies in which two judges independently categorize a sample of units into nominal categories, and addresses the degree, significance, and sampling stability of their agreement, i.e., the extent to which such judgments are reproducible (reliable).
Proceedings Article
The Berkeley FrameNet Project
TL;DR: This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.
Journal Article
The Database of Interacting Proteins: 2004 update
Lukasz Salwinski, Christopher S. Miller, Adam J. Smith, Frank K. Pettit, James U. Bowie, David Eisenberg
TL;DR: The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions.