Address extraction using hidden Markov models

doi:10.1117/12.587799

Proceedings Article

- Vol. 5676, pp 119-126

TLDR

The implementation and evaluation of a Hidden Markov Model for extracting addresses from OCR text are presented, and it is shown that this type of information extraction task seems to be negatively affected when the input is OCR text.

Abstract:

This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
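
As a rough illustration of how an HMM can label address tokens, the sketch below runs log-space Viterbi decoding over a two-state model. It is not the paper's model: the states (OTHER, ADDRESS), the coarse token classes, the word list, and every probability are hand-picked assumptions for illustration only.

```python
import math

# Assumed two-state tagger; the paper's actual state set is not given here.
STATES = ("OTHER", "ADDRESS")

# Hypothetical hand-set parameters (a real system would estimate them from data).
START = {"OTHER": 0.8, "ADDRESS": 0.2}
TRANS = {
    "OTHER": {"OTHER": 0.8, "ADDRESS": 0.2},
    "ADDRESS": {"OTHER": 0.3, "ADDRESS": 0.7},
}
EMIT = {
    "OTHER": {"NUM": 0.05, "STREET": 0.02, "CAP": 0.23, "WORD": 0.70},
    "ADDRESS": {"NUM": 0.35, "STREET": 0.25, "CAP": 0.35, "WORD": 0.05},
}

# Illustrative street-suffix vocabulary (an assumption, not from the paper).
STREET_WORDS = {"st", "street", "ave", "avenue", "rd", "road", "blvd"}


def token_class(token):
    """Map a raw token to one of four coarse observation classes."""
    t = token.lower().strip(".,")
    if t.isdigit():
        return "NUM"
    if t in STREET_WORDS:
        return "STREET"
    if token[:1].isupper():
        return "CAP"
    return "WORD"


def viterbi(tokens):
    """Return the most likely state sequence for the given tokens."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    sentence = "Send mail to 4505 Maryland Ave today".split()
    for token, label in zip(sentence, viterbi(sentence)):
        print(f"{token}\t{label}")
```

Because the observation classes depend on exact digits and spellings, a single OCR substitution (for example a digit read as a letter) moves a token into a different class, which is one plausible way recognition errors could degrade extraction.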

Citations

Proceedings Article

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

Jenny Rose Finkel, +2 more

TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.

Book

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

Ronen Feldman, +1 more

TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.

Journal Article

GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data

Andrey Rzhetsky, +11 more

- 01 Feb 2004 -

Journal of Biomedical Informatics

TL;DR: GeneWays analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks, and is designed as an open platform, allowing researchers to query, review, and critique stored information.

Proceedings Article

An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition

Vijay Krishnan, +1 more

TL;DR: This paper shows that a simple two-stage approach to handling non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient.

Journal Article

New directions in biomedical text annotation: definitions, guidelines and corpus construction

W. John Wilbur, +2 more

- 25 Jul 2006 -

BMC Bioinformatics

TL;DR: The results of the inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area are reported, while supporting practical mining of text for factual information.

References

Book

Fundamentals of speech recognition

Lawrence R. Rabiner, +1 more

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.

Posted Content

Nymble: a High-Performance Learning Name-finder

Daniel M. Bikel, +3 more

- 27 Mar 1998 -

arXiv: Computation and Language

TL;DR: This paper presented a statistical, learned approach to finding names and other non-recursive entities in text using a variant of the standard hidden Markov model, which was used for the MUC-6 NE task.

Proceedings Article

Nymble: a High-Performance Learning Name-finder

Daniel M. Bikel, +3 more

TL;DR: This paper presents a statistical, learned approach to finding names and other nonrecursive entities in text (as per the MUC-6 definition of the NE task), using a variant of the standard hidden Markov model.

Information Extraction with HMMs and Shrinkage

Dayne Freitag, +1 more

TL;DR: A statistical technique called shrinkage is used that significantly improves parameter estimation of the HMM emission probabilities in the face of sparse training data and the resulting HMM outperforms a state-of-the-art rule-learning system.

Proceedings Article

Automatic segmentation of text into structured records

Vinayak Borkar, +2 more

TL;DR: A tool, DATAMOLD, is described that learns to automatically extract structure when seeded with a small number of training examples and builds on Hidden Markov Models (HMMs) to form a powerful probabilistic model that corroborates multiple sources of information.

Related Papers (5)

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

A tutorial on hidden Markov models and selected applications in speech recognition

Maximum Entropy Markov Models for Information Extraction and Segmentation

Nymble: a High-Performance Learning Name-finder

Learning Information Extraction Rules for Semi-Structured and Free Text