Open Access Proceedings Article
Integrated Annotation for Biomedical Information Extraction
Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew I. Schein, Lyle H. Ungar, Scott Winters, Pete White
- pp 61-68
Abstract:
We describe an approach to two areas of biomedical information extraction: drug development and cancer genomics. We have developed a framework that includes corpus annotation integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and the relations among them. Crucial to this approach is the proper characterization of entities as relation components, which allows the entity annotation to be integrated with the syntactic structure while retaining the capacity to annotate and extract more complex events. We are training statistical taggers on this annotation, both to perform such extraction and to improve the annotation process.
Citations
Proceedings Article
The CoNLL 2007 Shared Task on Dependency Parsing
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, Deniz Yuret
TL;DR: The tasks of the different tracks are defined, the creation of the data sets from existing treebanks for ten languages is described, the approaches of the participating systems are characterized, and the test results are reported together with a first analysis.
Book Chapter
Developing a robust part-of-speech tagger for biomedical text
Yoshimasa Tsuruoka, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, Jun'ichi Tsujii
TL;DR: Experimental results on the Wall Street Journal corpus, the GENIA corpus, and the PennBioIE corpus revealed that adding training data from a different domain does not hurt the performance of a tagger, and the authors' tagger exhibits very good precision on all these corpora.
Journal Article
Special Report: NCBI disease corpus: A resource for disease name recognition and concept normalization
TL;DR: The results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard thus enabling the development of machine-learning based approaches for such tasks.
Journal Article
Deep learning with word embeddings improves biomedical named entity recognition.
TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings [called long short‐term memory network‐conditional random field (LSTM‐CRF)] outperforms state‐of‐the‐art entity‐specific NER tools, and often by a large margin.
Book
Dependency Parsing
TL;DR: This book surveys the three major classes of parsing models that are in current use: transition-based, graph-based, and grammar-based models, and gives a thorough introduction to the methods that are most widely used today.
References
Journal Article
Gene Ontology: tool for the unification of biology
M. Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Proceedings Article
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Proceedings Article
Shallow parsing with conditional random fields
Fei Sha, Fernando Pereira
TL;DR: This work shows how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model.