
Mingjing Li

Researcher at Microsoft

Publications: 6
Citations: 314

Mingjing Li is an academic researcher from Microsoft. The author has contributed to research in topics including language modeling and perplexity, has an h-index of 6, and has co-authored 6 publications receiving 307 citations.

Papers
Journal Article · DOI

Toward a unified approach to statistical language modeling for Chinese

TL;DR: This article presents a unified approach to Chinese statistical language modeling, which automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model, all using the maximum likelihood principle, which is consistent with trigram model training.
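The segmentation step is the easiest part of this pipeline to make concrete. Below is a minimal, illustrative sketch of lexicon-driven, maximum-likelihood word segmentation of the kind the paper describes; the toy lexicon, its probabilities, and the unknown-character floor are invented for the example and are not taken from the paper.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative values;
# a real system would estimate these from the Web-gathered training data).
LEXICON = {
    "北京": 0.004, "大学": 0.006, "北京大学": 0.003,
    "生": 0.002, "学生": 0.005,
}
UNKNOWN_LOGPROB = math.log(1e-8)  # assumed floor for out-of-lexicon characters

def ml_segment(text):
    """Segment `text` into lexicon words maximizing total log-probability
    (Viterbi dynamic programming over all candidate segmentations)."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i] = best score of text[:i]
    back = [0] * (n + 1)              # backpointer to start of last word
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - 8), i):  # cap candidate word length at 8
            word = text[j:i]
            if word in LEXICON:
                lp = math.log(LEXICON[word])
            elif i - j == 1:
                lp = UNKNOWN_LOGPROB
            else:
                continue
            if best[j] + lp > best[i]:
                best[i], back[i] = best[j] + lp, j
    # Recover the segmentation from the backpointers.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))

print(ml_segment("北京大学生"))  # ['北京大学', '生'] under these toy numbers
```

Dynamic programming finds the globally best-scoring segmentation rather than a greedy longest match, which is what makes the segmentation consistent with a probability-based model.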
Proceedings Article

Discriminative training on language model.

TL;DR: This paper proposes a discriminative training method that minimizes the recognizer's error rate rather than estimating the distribution of the training data, achieving approximately 5%-25% recognition error reduction through discriminative training of the language model.
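The TL;DR does not spell out the update rule, so the following only illustrates the contrast it draws: instead of estimating probabilities from counts (maximum likelihood), discriminative training adjusts language model scores so the recognizer makes fewer errors. The perceptron-style update and all names below are assumptions for illustration, not the paper's algorithm.

```python
from collections import defaultdict

weights = defaultdict(float)  # n-gram -> learned discriminative adjustment

def ngrams(sentence, n=3):
    """Trigrams of a whitespace-tokenized sentence."""
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def lm_score(sentence, base_logprob):
    """Base trigram log-probability plus the discriminative corrections."""
    return base_logprob + sum(weights[ng] for ng in ngrams(sentence))

def perceptron_update(reference, best_hypothesis, lr=0.1):
    """If the recognizer preferred an erroneous hypothesis, shift n-gram
    scores toward the reference transcript and away from the error."""
    if best_hypothesis == reference:
        return
    for ng in ngrams(reference):
        weights[ng] += lr
    for ng in ngrams(best_hypothesis):
        weights[ng] -= lr

# Illustrative update: the recognizer confused "two" with "to".
perceptron_update("i want to go home", "i want two go home")
```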
Patent

A system and method for joint optimization of language model performance and size

TL;DR: In this article, a method for joint optimization of language model performance and size is presented, comprising developing a language model from a tuning set of information, segmenting at least a subset of a received textual corpus, calculating a perplexity value for each segment, and refining the language model with one or more segments of the received corpus based, at least in part, on the calculated perplexity values.
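As a sketch of how such a perplexity criterion might operate, the snippet below computes per-word perplexity for each candidate corpus segment under an existing trigram model and keeps segments scoring under a threshold. The helper name trigram_logprob and the thresholding policy are assumptions for illustration; the patent does not commit to this exact selection rule.

```python
import math

def perplexity(segment_words, trigram_logprob):
    """Per-word perplexity of one corpus segment. `trigram_logprob(w, hist)`
    is an assumed callable returning log P(w | hist) under the current model."""
    total, hist = 0.0, ("<s>", "<s>")
    for w in segment_words:
        total += trigram_logprob(w, hist)
        hist = (hist[1], w)
    return math.exp(-total / max(len(segment_words), 1))

def refine_training_set(segments, trigram_logprob, threshold):
    """Keep segments whose perplexity falls below a threshold -- one plausible
    reading of refining the model 'based on the calculated perplexity values'."""
    return [s for s in segments if perplexity(s, trigram_logprob) < threshold]

# Stub model assigning a uniform log-probability to every word:
stub = lambda w, hist: math.log(1e-4)
print(perplexity("the model is refined".split(), stub))  # ~10000.0
```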
Proceedings Article · DOI

A unified approach to statistical language modeling for Chinese

TL;DR: The paper presents a unified approach to Chinese statistical language modeling, which automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, and segments the training data using this lexicon, all using a maximum likelihood principle that is consistent with trigram training.

Lexicon Optimization for Chinese Language Modeling

TL;DR: The method is an iterative procedure consisting of two phases, lexicon generation and lexicon pruning; the pruning phase reduces the lexicon to a preset memory limit using a perplexity-minimization criterion.
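A greedy version of the pruning phase is easy to sketch: repeatedly remove the lexicon entry whose removal hurts held-out perplexity the least, until the lexicon fits the memory budget. The helper perplexity_fn (rebuild the model on the reduced lexicon and score a held-out corpus) is assumed; the paper's actual procedure may batch removals or use approximations that this brute-force loop does not capture.

```python
def prune_lexicon(lexicon, max_size, held_out, perplexity_fn):
    """Greedy perplexity-driven pruning sketch. `perplexity_fn(lex, corpus)`
    is an assumed helper that re-segments and re-scores `corpus` using only
    the words in `lex`. Note the loop is O(|lexicon|^2) model rebuilds, so a
    real system would approximate the per-word perplexity impact instead."""
    lexicon = set(lexicon)
    while len(lexicon) > max_size:
        best_word, best_ppl = None, float("inf")
        for w in lexicon:
            ppl = perplexity_fn(lexicon - {w}, held_out)
            if ppl < best_ppl:
                best_word, best_ppl = w, ppl
        lexicon.discard(best_word)  # drop the least damaging word
    return lexicon
```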