Open Access Journal Article

An empirical study of smoothing techniques for language modeling

TLDR
This work surveys the most widely used smoothing algorithms for n-gram language modeling, presents an extensive empirical comparison of several of these techniques, including those described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing algorithm efficacy in detail.
About
This article was published in Computer Speech & Language on 1999-10-01 and is currently open access. It has received 1948 citations to date. The article focuses on the topics Kneser–Ney smoothing and Smoothing.


Citations
Journal Article

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words that allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, together with a probability function for word sequences expressed in terms of these representations.
Proceedings Article

SRILM – An Extensible Language Modeling Toolkit

TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation are discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Journal Article

A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

TL;DR: This paper examines the sensitivity of retrieval performance to the smoothing parameters and compares several popular smoothing methods on different test collections.
Proceedings Article

Character-aware neural language models

TL;DR: A simple neural language model that relies only on character-level inputs is able to encode, from characters only, both semantic and orthographic information, which suggests that for many languages character inputs are sufficient for language modeling.
Journal Article

A study of smoothing methods for language models applied to information retrieval

TL;DR: Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to or better than the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
References

Numerical recipes in C

Journal Article

Class-based n-gram models of natural language

TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
Proceedings Article

SWITCHBOARD: telephone speech corpus for research and development

TL;DR: SWITCHBOARD is a large multispeaker corpus of conversational speech and text that should be of interest to researchers in speaker authentication and large-vocabulary speech recognition.