Integrated Annotation for Biomedical Information Extraction

Open Access Proceedings Article

Seth Kulick, +9 more

- pp 61-68

TLDR

An approach to two areas of biomedical information extraction, drug development and cancer genomics, built on a framework in which corpus annotation is integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and relations among the entities.

Abstract:

We describe an approach to two areas of biomedical information extraction, drug development and cancer genomics. We have developed a framework which includes corpus annotation integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and relations among the entities. Crucial to this approach is the proper characterization of entities as relation components, which allows the integration of the entity annotation with the syntactic structure while retaining the capacity to annotate and extract more complex events. We are training statistical taggers using this annotation for such extraction as well as using them for improving the annotation process.
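The multi-level integration described in the abstract can be pictured as several annotation layers sharing one token index. The sketch below is only an illustration under stated assumptions, not the authors' annotation format: the class names, the example sentence, and the entity labels `SUBSTANCE` and `ENZYME` are all hypothetical. It shows how an entity that lines up with a syntactic constituent can be reached through a predicate's argument.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """Half-open token span [start, end) within one sentence."""
    start: int
    end: int

    def contains(self, other: "Span") -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass
class Constituent:
    """Treebank-style layer: a labeled syntactic constituent."""
    label: str
    span: Span


@dataclass
class Entity:
    """Entity layer: a typed mention (types here are hypothetical)."""
    type: str
    span: Span


@dataclass
class Predicate:
    """Propbank-style layer: a predicate and its role-labeled arguments."""
    lemma: str
    span: Span
    args: dict = field(default_factory=dict)  # role name -> Span


@dataclass
class Sentence:
    tokens: list
    constituents: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    predicates: list = field(default_factory=list)

    def text(self, span: Span) -> str:
        return " ".join(self.tokens[span.start:span.end])

    def entity_constituent(self, entity: Entity):
        """Return the constituent whose span equals the entity's, or None."""
        for c in self.constituents:
            if c.span == entity.span:
                return c
        return None

    def entities_in_argument(self, predicate: Predicate, role: str) -> list:
        """Entities whose spans fall inside the given predicate argument."""
        arg = predicate.args.get(role)
        if arg is None:
            return []
        return [e for e in self.entities if arg.contains(e.span)]


# Invented example sentence, used purely for illustration.
pred = Predicate("inhibit", Span(2, 3),
                 {"ARG0": Span(0, 2), "ARG1": Span(3, 5)})
sent = Sentence(
    tokens=["Compound", "X", "inhibits", "enzyme", "Y"],
    constituents=[Constituent("S", Span(0, 5)), Constituent("NP", Span(0, 2)),
                  Constituent("VP", Span(2, 5)), Constituent("NP", Span(3, 5))],
    entities=[Entity("SUBSTANCE", Span(0, 2)), Entity("ENZYME", Span(3, 5))],
    predicates=[pred],
)

for e in sent.entities:
    c = sent.entity_constituent(e)
    print(e.type, sent.text(e.span), "->", c.label if c else None)
```

Because every layer refers to the same token offsets, a relation between entities can be read off the predicate's arguments without a separate alignment step, which is one plausible reading of why aligning entities with syntactic structure matters.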

Citations

Proceedings Article

The CoNLL 2007 Shared Task on Dependency Parsing

Joakim Nivre, +6 more

TL;DR: This paper defines the tasks of the different tracks, describes how the data sets were created from existing treebanks for ten languages, characterizes the approaches of the participating systems, reports the test results, and provides a first analysis of these results.

Book Chapter

Developing a robust part-of-speech tagger for biomedical text

Yoshimasa Tsuruoka, +6 more

TL;DR: Experimental results on the Wall Street Journal corpus, the GENIA corpus, and the PennBioIE corpus revealed that adding training data from a different domain does not hurt the performance of a tagger, and the authors' tagger exhibits very good precision on all these corpora.

Journal Article

Special Report: NCBI disease corpus: A resource for disease name recognition and concept normalization

Rezarta Islamaj Dogan, +2 more

- 01 Feb 2014 -

Journal of Biomedical Informatics

TL;DR: The results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard thus enabling the development of machine-learning based approaches for such tasks.

Journal Article

Deep learning with word embeddings improves biomedical named entity recognition.

Maryam Habibi, +4 more

- 15 Jul 2017 -

Bioinformatics

TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings [called long short‐term memory network‐conditional random field (LSTM‐CRF)] outperforms state‐of‐the‐art entity‐specific NER tools, and often by a large margin.

Book

Dependency Parsing

Sandra Kübler, +3 more

TL;DR: This book surveys the three major classes of parsing models that are in current use: transition-based, graph-based, and grammar-based models, and gives a thorough introduction to the methods that are most widely used today.

References

Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

John Lafferty, +2 more

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

Probabilistic Models for Segmenting and Labeling Sequence Data

John Lafferty, +3 more

Proceedings Article

Shallow parsing with conditional random fields

Fei Sha, +1 more

TL;DR: This work shows how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model.
