Open Access Proceedings Article

An improved error model for noisy channel spelling correction

Eric D. Brill, +1 more
- pp 286-293
TLDR
A new channel model for spelling correction, based on generic string-to-string edits, is described; it gives significant performance improvements over previously proposed models.
Abstract
The noisy channel model has been applied to a wide range of problems, including spelling correction. These models consist of two components: a source model and a channel model. Very little research has gone into improving the channel model for spelling correction. This paper describes a new channel model for spelling correction, based on generic string to string edits. Using this model gives significant performance improvements compared to previously proposed models.
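
The two components named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the probability tables, the names `SOURCE`, `EDITS`, `channel_prob`, and `correct`, the `match_prob` default, and the choice to score only the single best segmentation are all assumptions made for illustration. A candidate word w is ranked by P(w) * P(typo | w), where the channel term is a product of substring-to-substring edit probabilities.

```python
from typing import Dict, Tuple

# Hypothetical toy source model P(w); real systems estimate this from a corpus.
SOURCE: Dict[str, float] = {"their": 0.6, "there": 0.4}

# Hypothetical toy edit table P(alpha -> beta) over substrings (assumed values).
EDITS: Dict[Tuple[str, str], float] = {("ei", "ie"): 0.05}


def channel_prob(word: str, typo: str, edits: Dict[Tuple[str, str], float],
                 match_prob: float = 0.9, max_len: int = 2) -> float:
    """Best-segmentation probability of rewriting `word` into `typo`.

    best[i][j] holds the highest product of edit probabilities that turns
    word[:i] into typo[:j], using substring edits of length <= max_len.
    Copying a single character unchanged costs `match_prob` (an assumption).
    """
    n, m = len(word), len(typo)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            for a in range(min(i, max_len) + 1):
                for b in range(min(j, max_len) + 1):
                    if a == 0 and b == 0:
                        continue  # an edit must consume something
                    alpha, beta = word[i - a:i], typo[j - b:j]
                    if (alpha, beta) in edits:
                        p = edits[(alpha, beta)]
                    elif a == 1 and alpha == beta:
                        p = match_prob
                    else:
                        continue  # edit not in the table: probability 0
                    best[i][j] = max(best[i][j], best[i - a][j - b] * p)
    return best[n][m]


def correct(typo: str, source: Dict[str, float],
            edits: Dict[Tuple[str, str], float]) -> str:
    """Return the candidate maximizing source prior times channel probability."""
    return max(source, key=lambda w: source[w] * channel_prob(w, typo, edits))


print(correct("thier", SOURCE, EDITS))  # their
```

With these toy numbers, "their" wins because the table contains the transposition-like edit "ei" to "ie", while no sequence of listed edits turns "there" into "thier".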


Citations
Proceedings Article

Lexical Normalisation of Short Text Messages: Makn Sens a #twitter

TL;DR: This paper targets out-of-vocabulary words in short text messages and proposes a method for identifying and normalising ill-formed words, which achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
Proceedings Article

Challenges in Data-to-Document Generation

TL;DR: A new, large-scale corpus of data records paired with descriptive documents is introduced, a series of extractive evaluation methods for analyzing performance are proposed, and baseline results are obtained using current neural generation methods.
Proceedings Article

A Phrase-Based Statistical Model for SMS Text Normalization

TL;DR: This paper views the task of SMS normalization as a translation problem from the SMS language to the English language and proposes to adapt a phrase-based statistical MT model for the task, which can largely boost SMS translation performance.
Proceedings Article

Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users.

TL;DR: This paper presents an approach that uses an iterative transformation of the input query strings into other strings that correspond to more and more likely queries according to statistics extracted from internet search query logs.
Journal Article

Investigation and modeling of the structure of texting language

TL;DR: The nature and type of compressions used in SMS texts are investigated, and a Hidden Markov Model based word-model for TL is developed, which results in a 35% reduction of the relative word level error rates.