Proceedings Article
Address extraction using hidden Markov models
Kazem Taghva, Jeffrey Coombs, Ray Pereda, Thomas A. Nartker
Vol. 5676, pp. 119-126
TL;DR: The implementation and evaluation of a hidden Markov model (HMM) for extracting addresses from OCR text are presented, and the results show that this type of information extraction task seems to be negatively affected by the use of OCR text.
Citations
Proceedings Article
Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling
TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
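To make the contrast in this entry concrete, here is a minimal Python sketch of both decoders on a hypothetical two-label toy HMM. The labels `O`/`ADDR`, the vocabulary, and all probabilities are invented for illustration. `viterbi` computes the exact most likely label sequence by dynamic programming. `annealed_gibbs` instead resamples one label at a time from its conditional distribution raised to the power 1/T while the temperature T is lowered. The linear cooling schedule and the sweep count are assumptions and are not taken from the paper. Because each Gibbs step needs only the unnormalized joint score, extra non-local factors could in principle be added to `log_score` without changing the sampler.

```python
import math
import random

# Hypothetical toy HMM (illustrative numbers only): two labels, four words.
STATES = ["O", "ADDR"]
START = {"O": 0.8, "ADDR": 0.2}
TRANS = {"O": {"O": 0.8, "ADDR": 0.2}, "ADDR": {"O": 0.3, "ADDR": 0.7}}
EMIT = {
    "O": {"call": 0.4, "at": 0.3, "main": 0.1, "st": 0.2},
    "ADDR": {"call": 0.05, "at": 0.15, "main": 0.4, "st": 0.4},
}


def log_score(labels, words):
    """Joint log-probability log P(labels, words) under the toy HMM."""
    s = math.log(START[labels[0]]) + math.log(EMIT[labels[0]][words[0]])
    for i in range(1, len(words)):
        s += math.log(TRANS[labels[i - 1]][labels[i]])
        s += math.log(EMIT[labels[i]][words[i]])
    return s


def viterbi(words):
    """Exact MAP label sequence by dynamic programming with backpointers."""
    best = [{y: math.log(START[y]) + math.log(EMIT[y][words[0]]) for y in STATES}]
    back = []
    for w in words[1:]:
        prev = best[-1]
        col, ptr = {}, {}
        for y in STATES:
            p = max(STATES, key=lambda x: prev[x] + math.log(TRANS[x][y]))
            col[y] = prev[p] + math.log(TRANS[p][y]) + math.log(EMIT[y][w])
            ptr[y] = p
        best.append(col)
        back.append(ptr)
    y = max(STATES, key=lambda s: best[-1][s])
    path = [y]
    for ptr in reversed(back):  # follow backpointers from the last position
        y = ptr[y]
        path.append(y)
    return path[::-1]


def annealed_gibbs(words, sweeps=200, t0=2.0, t_min=0.05, seed=0):
    """Gibbs sampling with simulated annealing: resample each label from
    P(y_i | other labels, words) ** (1 / T) while T is lowered linearly."""
    rng = random.Random(seed)
    labels = [rng.choice(STATES) for _ in words]
    for k in range(sweeps):
        temp = max(t_min, t0 * (1 - k / sweeps))  # assumed cooling schedule
        for i in range(len(words)):
            scores = []
            for y in STATES:
                labels[i] = y
                # Unnormalized joint score; factors not involving y_i cancel.
                scores.append(log_score(labels, words) / temp)
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            labels[i] = rng.choices(STATES, weights=weights)[0]
    return labels


if __name__ == "__main__":
    words = ["call", "at", "main", "st"]
    print(viterbi(words))  # ['O', 'O', 'ADDR', 'ADDR']
    print(annealed_gibbs(words))
```

On a chain this small the annealed sampler usually reaches the Viterbi answer, but it is not guaranteed to. Its appeal is that it keeps working when the score contains long-distance terms that break the dynamic program.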
Book
The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Ronen Feldman, James Sanger
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text also covers advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Journal Article
GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data
Andrey Rzhetsky, Ivan Iossifov, Tomohiro Koike, Michael Krauthammer, Pauline Kra, Mitzi Morris, Hong Yu, Pablo Ariel Duboué, Wubin Weng, W. John Wilbur, Vasileios Hatzivassiloglou, Carol Friedman
TL;DR: GeneWays analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks, and is designed as an open platform, allowing researchers to query, review, and critique stored information.
Proceedings Article
An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition
TL;DR: This paper shows that a simple two-stage approach to handling non-local dependencies in named entity recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient.
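In a two-stage setup, the second stage consumes features computed from the first stage's output over the whole document. Below is a minimal Python sketch of one feature of this kind, a document-level "token majority" label. It assumes the first-stage tagger has already produced one label per token. Each position receives the label that the first stage most often assigned to the same token string anywhere in the document. The function name and the feature-string format are hypothetical. A real second stage would add these features to its usual local features and be trained on them.

```python
from collections import Counter, defaultdict


def token_majority_features(tokens, stage1_labels):
    """Non-local feature for a second-stage tagger (hypothetical helper):
    for every position, the label the first stage assigned most often to the
    same token string in the document. Ties go to the label first encountered."""
    if len(tokens) != len(stage1_labels):
        raise ValueError("need exactly one first-stage label per token")
    counts = defaultdict(Counter)
    for tok, label in zip(tokens, stage1_labels):
        counts[tok][label] += 1
    features = []
    for tok in tokens:
        majority, _ = counts[tok].most_common(1)[0]
        features.append({"token_majority=" + majority: 1.0})
    return features


if __name__ == "__main__":
    # Hypothetical first-stage output: "Smith" is tagged inconsistently.
    tokens = ["Smith", "said", "Smith", "Corp", "Smith"]
    stage1 = ["PER", "O", "ORG", "ORG", "PER"]
    for tok, feats in zip(tokens, token_majority_features(tokens, stage1)):
        print(tok, feats)
```

The second "Smith" was tagged ORG by the first stage, but its feature points to PER, the document-level majority. The second-stage model can then learn how far to trust that signal.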
Journal Article
New directions in biomedical text annotation: definitions, guidelines and corpus construction
TL;DR: This article reports the results of an inquiry into properties of scientific text that are general enough to transcend the confines of a narrow subject area while still supporting practical mining of text for factual information.
References
Book
Fundamentals of speech recognition
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the very labor-intensive, and therefore time-consuming and expensive, process of manually modeling speech.
Posted Content
Nymble: a High-Performance Learning Name-finder
TL;DR: This paper presents a statistical, learned approach to finding names and other non-recursive entities in text, using a variant of the standard hidden Markov model; the approach was applied to the MUC-6 NE task.
Proceedings Article
Nymble: a High-Performance Learning Name-finder
TL;DR: This paper presents a statistical, learned approach to finding names and other nonrecursive entities in text (as per the MUC-6 definition of the NE task), using a variant of the standard hidden Markov model.
Information Extraction with HMMs and Shrinkage
Dayne Freitag, Andrew McCallum
TL;DR: A statistical technique called shrinkage is used that significantly improves parameter estimation of the HMM emission probabilities in the face of sparse training data and the resulting HMM outperforms a state-of-the-art rule-learning system.
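Shrinkage replaces a sparsely trained state's emission distribution with a mixture of that distribution and more general ones estimated from pooled data. Below is a minimal Python sketch that assumes a three-level hierarchy: the state itself, a hypothetical parent that pools counts from related states, and a uniform distribution over the vocabulary. It also uses fixed mixture weights. In the paper the weights are learned from data with EM, so `lambdas` here is only a placeholder. All counts and words are invented for illustration.

```python
from collections import Counter


def ml_estimate(counts):
    """Maximum-likelihood (relative-frequency) estimate from raw counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def shrink_emissions(state_counts, parent_counts, vocab, lambdas=(0.6, 0.3, 0.1)):
    """Interpolate state, parent, and uniform estimates:
        P(w) = l_state * P_state(w) + l_parent * P_parent(w) + l_uniform / |V|
    Assumes both count tables are non-empty, every counted word is in `vocab`,
    and the (placeholder) weights sum to 1."""
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    l_state, l_parent, l_uniform = lambdas
    p_state = ml_estimate(state_counts)
    p_parent = ml_estimate(parent_counts)
    uniform = 1.0 / len(vocab)
    return {
        w: l_state * p_state.get(w, 0.0)
        + l_parent * p_parent.get(w, 0.0)
        + l_uniform * uniform
        for w in vocab
    }


if __name__ == "__main__":
    # Hypothetical counts: a rarely visited "street suffix" state and a parent
    # that pools counts from related states.
    state = Counter({"street": 3, "st": 1})
    parent = Counter({"street": 5, "st": 4, "avenue": 3, "road": 2})
    vocab = {"street", "st", "avenue", "road", "lane"}
    probs = shrink_emissions(state, parent, vocab)
    print(round(probs["lane"], 3))  # 0.02: unseen everywhere, uniform share only
```

Words the state never emitted but its parent did ("avenue", "road") borrow mass from the parent. Words unseen everywhere ("lane") still receive a small uniform floor, which is the sparse-data benefit the entry describes.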
Proceedings Article
Automatic segmentation of text into structured records
TL;DR: A tool, DATAMOLD, is described that learns to extract structure automatically when seeded with a small number of training examples; it builds on hidden Markov models (HMMs) to construct a powerful probabilistic model that corroborates multiple sources of information.