Proceedings Article

Address extraction using hidden Markov models

TLDR
The implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text is presented, and it is shown that this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
Abstract
This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
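As an illustration of the general technique named in the abstract, the following is a minimal sketch of HMM-based address tagging with Viterbi decoding. It is not the paper's implementation: the two states, the coarse token features, the `STREET_WORDS` list, and every probability are hypothetical values chosen by hand for the example. A real system would estimate them from labeled training text.

```python
import math

# Hypothetical two-state model: each token is part of an address or it is not.
STATES = ("OTHER", "ADDRESS")

# Illustrative, hand-set probabilities (assumptions, not taken from the paper).
START = {"OTHER": 0.8, "ADDRESS": 0.2}
TRANS = {
    "OTHER": {"OTHER": 0.8, "ADDRESS": 0.2},
    "ADDRESS": {"OTHER": 0.3, "ADDRESS": 0.7},
}
# Emissions are defined over coarse token features rather than raw words.
EMIT = {
    "OTHER": {"number": 0.10, "street_word": 0.02, "capitalized": 0.28, "word": 0.60},
    "ADDRESS": {"number": 0.35, "street_word": 0.25, "capitalized": 0.30, "word": 0.10},
}

# Hypothetical list of street-type keywords.
STREET_WORDS = {"street", "st", "avenue", "ave", "road", "rd", "blvd", "drive"}


def feature(token):
    """Map a raw token to one of the four coarse features used by EMIT."""
    bare = token.strip(".,").lower()
    if any(ch.isdigit() for ch in bare):
        return "number"
    if bare in STREET_WORDS:
        return "street_word"
    if token[:1].isupper():
        return "capitalized"
    return "word"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens (log-space Viterbi)."""
    obs = [feature(t) for t in tokens]
    if not obs:
        return []
    # score[i][s]: best log-probability of any path ending in state s at position i.
    score = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    tokens = "Please send mail to 12 Main St. before Friday".split()
    for token, state in zip(tokens, viterbi(tokens)):
        print(f"{token}\t{state}")
```

Because the emissions depend on surface features such as digits and keywords, character-level OCR errors (for example a digit misread as a letter) change the feature a token maps to. That is one plausible way OCR noise could hurt extraction, offered here only as an illustration of the abstract's observation.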


Citations
Proceedings Article

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
Book

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Journal Article

GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data

TL;DR: GeneWays analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks, and is designed as an open platform, allowing researchers to query, review, and critique stored information.
Proceedings Article

An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition

TL;DR: This paper shows that a simple two-stage approach to handling non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient.
Journal Article

New directions in biomedical text annotation: definitions, guidelines and corpus construction

TL;DR: The results of the inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area are reported, while supporting practical mining of text for factual information.
References
Book

Fundamentals of speech recognition

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, and therefore time-consuming and expensive, process of manually modeling speech.
Posted Content

Nymble: a High-Performance Learning Name-finder

TL;DR: This paper presents a statistical, learned approach to finding names and other non-recursive entities in text using a variant of the standard hidden Markov model, which was used for the MUC-6 NE task.
Proceedings Article

Nymble: a High-Performance Learning Name-finder

TL;DR: This paper presents a statistical, learned approach to finding names and other nonrecursive entities in text (as per the MUC-6 definition of the NE task), using a variant of the standard hidden Markov model.

Information Extraction with HMMs and Shrinkage

TL;DR: A statistical technique called shrinkage is used that significantly improves parameter estimation of the HMM emission probabilities in the face of sparse training data and the resulting HMM outperforms a state-of-the-art rule-learning system.
Proceedings Article

Automatic segmentation of text into structured records

TL;DR: A tool, DATAMOLD, is described that learns to automatically extract structure when seeded with a small number of training examples; it extends Hidden Markov Models (HMMs) to build a powerful probabilistic model that corroborates multiple sources of information.