An improved error model for noisy channel spelling correction
Eric D. Brill, Robert C. Moore
- pp 286-293
TLDR
A new channel model for spelling correction, based on generic string to string edits, is described, which gives significant performance improvements compared to previously proposed models.
Abstract:
The noisy channel model has been applied to a wide range of problems, including spelling correction. These models consist of two components: a source model and a channel model. Very little research has gone into improving the channel model for spelling correction. This paper describes a new channel model for spelling correction, based on generic string to string edits. Using this model gives significant performance improvements compared to previously proposed models.
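The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word list, all probabilities, the `MAX_LEN` window, and the helper names (`edit_prob`, `channel_prob`, `correct`) are assumptions chosen for illustration. The source model is a toy table of word probabilities, and the channel model scores a typed string by the best way to split the intended word into substrings and rewrite each one.

```python
from functools import lru_cache

# Hypothetical source model P(word): toy values, not taken from the paper.
SOURCE = {"physical": 0.6, "fiscal": 0.4}

# Hypothetical channel parameters P(alpha -> beta) for substring edits.
# All numbers are invented for illustration.
EDITS = {
    ("ph", "f"): 0.1,
    ("y", "i"): 0.1,
    ("c", "k"): 0.05,
    ("al", "le"): 0.05,
}
MATCH = 0.9     # assumed probability of copying one character unchanged
DEFAULT = 1e-4  # assumed floor for any edit not listed above
MAX_LEN = 2     # assumed maximum substring length on either side of an edit


def edit_prob(alpha: str, beta: str) -> float:
    """Probability of rewriting substring alpha as substring beta."""
    if (alpha, beta) in EDITS:
        return EDITS[(alpha, beta)]
    if alpha == beta and len(alpha) == 1:
        return MATCH
    return DEFAULT


def channel_prob(typed: str, word: str) -> float:
    """Best-scoring segmentation of word into substrings rewritten to typed."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        # Score of producing typed[j:] from word[i:].
        if i == len(word) and j == len(typed):
            return 1.0
        result = 0.0
        for a in range(1, MAX_LEN + 1):      # characters consumed from word
            for b in range(0, MAX_LEN + 1):  # characters consumed from typed
                if i + a > len(word) or j + b > len(typed):
                    continue
                p = edit_prob(word[i:i + a], typed[j:j + b])
                result = max(result, p * best(i + a, j + b))
        return result

    return best(0, 0)


def correct(typed: str) -> str:
    """Pick the word maximizing P(word) * P(typed | word)."""
    return max(SOURCE, key=lambda w: SOURCE[w] * channel_prob(typed, w))


if __name__ == "__main__":
    print(correct("fisikle"))
```

With these toy numbers, the multi-character edits let "fisikle" align with "physical" far more cheaply than a sequence of unrelated single-character changes would, which is the intuition behind using string-to-string edits in the channel model.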
Citations
Proceedings Article
Lexical Normalisation of Short Text Messages: Makn Sens a #twitter
Bo Han, Timothy Baldwin
TL;DR: This paper targets out-of-vocabulary words in short text messages and proposes a method for identifying and normalising ill-formed words, which achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
Proceedings ArticleDOI
Challenges in Data-to-Document Generation
TL;DR: A new, large-scale corpus of data records paired with descriptive documents is introduced, a series of extractive evaluation methods for analyzing performance are proposed, and baseline results are obtained using current neural generation methods.
Proceedings ArticleDOI
A Phrase-Based Statistical Model for SMS Text Normalization
TL;DR: This paper views the task of SMS normalization as a translation problem from the SMS language to the English language and proposes to adapt a phrase-based statistical MT model for the task, which can largely boost SMS translation performance.
Proceedings Article
Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users.
Silviu Cucerzan, Eric D. Brill
TL;DR: This paper presents an approach that uses an iterative transformation of the input query strings into other strings that correspond to more and more likely queries according to statistics extracted from internet search query logs.
Journal ArticleDOI
Investigation and modeling of the structure of texting language
TL;DR: The nature and type of compressions used in SMS texts are investigated, and a Hidden Markov Model based word-model for TL is developed, which results in a 35% reduction of the relative word level error rates.
References
Journal ArticleDOI
A mathematical theory of communication
TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Journal Article
Binary codes capable of correcting deletions, insertions, and reversals
Book
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Dan Jurafsky, James Martin
TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.