Open Access | Proceedings Article

Integrated Annotation for Biomedical Information Extraction

TLDR
This paper presents an approach to two areas of biomedical information extraction, drug development and cancer genomics, using a framework in which corpus annotation is integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and the relations among them.
Abstract
We describe an approach to two areas of biomedical information extraction: drug development and cancer genomics. We have developed a framework in which corpus annotation is integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and the relations among them. Crucial to this approach is the proper characterization of entities as relation components, which allows the entity annotation to be integrated with the syntactic structure while retaining the capacity to annotate and extract more complex events. We are training statistical taggers on this annotation, both to perform such extraction and to improve the annotation process itself.
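
To make the idea of annotation integrated at multiple levels more concrete, the following is a minimal sketch, not the authors' actual annotation format. It assumes that every layer is stored as standoff spans over one shared token sequence, so an entity can be checked against the syntactic constituents it should coincide with. The sentence, the labels, and the names `Span`, `aligned_constituent`, and `unaligned_entities` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # index of the first token in the span
    end: int     # index one past the last token in the span
    label: str   # layer-specific label (hypothetical label set)


# Hypothetical example sentence, invented for illustration only.
tokens = ["Ketoconazole", "inhibits", "CYP3A4", "strongly", "."]

# Treebank layer: syntactic constituents as token spans.
constituents = [
    Span(0, 5, "S"),
    Span(0, 1, "NP"),
    Span(1, 4, "VP"),
    Span(2, 3, "NP"),
    Span(3, 4, "ADVP"),
]

# Propbank layer: one predicate with its arguments, as token spans.
predicate_frames = [
    {"predicate": 1, "ARG0": (0, 1), "ARG1": (2, 3)},
]

# Entity layer: entities as token spans, keyed by an identifier.
entities = {
    "e1": Span(0, 1, "substance"),
    "e2": Span(2, 3, "enzyme"),
}

# Relation layer: a relation is defined over entity identifiers,
# so entities act as the components of relations.
relations = [("inhibits", "e1", "e2")]


def aligned_constituent(entity, constituents):
    """Return the constituent with exactly the entity's span, or None."""
    for constituent in constituents:
        if (constituent.start, constituent.end) == (entity.start, entity.end):
            return constituent
    return None


def unaligned_entities(entities, constituents):
    """List the ids of entities that match no syntactic constituent."""
    return sorted(
        entity_id
        for entity_id, entity in entities.items()
        if aligned_constituent(entity, constituents) is None
    )


for entity_id, entity in entities.items():
    match = aligned_constituent(entity, constituents)
    text = " ".join(tokens[entity.start:entity.end])
    print(entity_id, text, entity.label, match.label if match else "no constituent")
```

Under these assumptions, a check such as `unaligned_entities` could flag entity spans that cross constituent boundaries. That is one plausible way a shared token index would let the layers be compared during annotation.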

