Mingjing Li
Researcher at Microsoft
Publications - 6
Citations - 314
Mingjing Li is an academic researcher from Microsoft. The author has contributed to research in topics: Language model & Perplexity. The author has an h-index of 6 and has co-authored 6 publications receiving 307 citations.
Papers
Journal ArticleDOI
Toward a unified approach to statistical language modeling for Chinese
TL;DR: This article presents a unified approach to Chinese statistical language modeling, which automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model using the maximum likelihood principle, which is consistent with trigram model training.
Proceedings Article
Discriminative training on language model.
TL;DR: This paper proposed a discriminative training method that minimizes the error rate of the recognizer rather than estimating the distribution of the training data, achieving approximately 5%-25% recognition-error reduction with discriminative training of the language model.
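The idea of training an LM against recognizer errors rather than by maximum likelihood can be illustrated with a generic perceptron-style update; this is a minimal sketch of discriminative LM training in general, not the paper's exact method, and all names and data here are hypothetical:

```python
def discriminative_update(weights, reference, hypothesis, lr=0.1):
    """One perceptron-style update on word weights: reward the words of the
    reference transcript and penalize those of an erroneous recognizer
    hypothesis, so the model learns to rank the reference higher."""
    if hypothesis == reference:
        return weights  # no error, nothing to correct
    w = dict(weights)
    for word in reference:
        w[word] = w.get(word, 0.0) + lr
    for word in hypothesis:
        w[word] = w.get(word, 0.0) - lr
    return w

def score(weights, sentence):
    """Language-model score of a sentence under the learned word weights."""
    return sum(weights.get(word, 0.0) for word in sentence)

# Toy example: the recognizer confused "I" with "eye".
w = discriminative_update({}, ["I", "saw"], ["eye", "saw"])
```

After the update, the reference transcript scores higher than the confusable hypothesis, which is the objective the paper pursues (lower recognition error) as opposed to fitting the training-data distribution.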
Patent
A system and method for joint optimization of language model performance and size
TL;DR: In this article, a method for the joint optimization of language model performance and size is presented, comprising developing a language model from a tuning set of information, segmenting at least a subset of a received textual corpus, calculating a perplexity value for each segment, and refining the language model with one or more segments of the received corpus based, at least in part, on the calculated perplexity values for those segments.
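The per-segment perplexity criterion described above can be sketched as follows; this is a minimal illustration with a toy unigram model and hypothetical data standing in for the patent's language model and corpus:

```python
import math

def perplexity(model, segment):
    """Perplexity of a word segment under a word -> probability model."""
    log_prob = 0.0
    for word in segment:
        # Unseen words get a small floor probability (an assumption of this sketch).
        log_prob += math.log(model.get(word, 1e-6))
    return math.exp(-log_prob / len(segment))

# Toy model and corpus segments (hypothetical data).
model = {"the": 0.5, "cat": 0.25, "sat": 0.25}
segments = [["the", "cat", "sat"], ["the", "dog", "ran"]]

# Keep only segments whose perplexity falls below a threshold, mirroring the
# step of refining the model with segments selected by their perplexity values.
selected = [s for s in segments if perplexity(model, s) < 10.0]
```

Low-perplexity segments are the ones the current model already explains well, so selecting on perplexity filters the incoming corpus before it is folded back into the model.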
Proceedings ArticleDOI
A unified approach to statistical language modeling for Chinese
TL;DR: The paper presents a unified approach to Chinese statistical language modeling, which automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, and segments the training data using this lexicon, all using a maximum likelihood principle, which is consistent with the trigram training.
Lexicon Optimization for Chinese Language Modeling
TL;DR: The method is an iterative procedure consisting of two phases, lexicon generation and lexicon pruning; the pruning phase reduces the lexicon to a preset memory limit using a perplexity-minimization criterion.
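The pruning phase can be sketched as a greedy loop that, at each step, drops the lexicon entry whose removal raises corpus perplexity the least; this is a simplification of the paper's perplexity-minimization criterion, with a toy word-probability lexicon and hypothetical data:

```python
import math

def corpus_perplexity(lexicon_probs, corpus, floor=1e-6):
    """Perplexity of the corpus under a word -> probability lexicon."""
    log_prob = 0.0
    for word in corpus:
        # Words pruned from the lexicon fall back to a small floor probability.
        log_prob += math.log(lexicon_probs.get(word, floor))
    return math.exp(-log_prob / len(corpus))

def prune_lexicon(lexicon_probs, corpus, max_size):
    """Greedily remove the entry whose removal increases corpus perplexity
    least, until the lexicon fits the preset size limit."""
    lex = dict(lexicon_probs)
    while len(lex) > max_size:
        best_word, best_ppl = None, None
        for word in lex:
            trial = {w: p for w, p in lex.items() if w != word}
            ppl = corpus_perplexity(trial, corpus)
            if best_ppl is None or ppl < best_ppl:
                best_word, best_ppl = word, ppl
        del lex[best_word]
    return lex
```

A usage example: pruning `{"a": 0.5, "b": 0.25, "c": 0.25}` against the corpus `["a", "a", "b", "c"]` down to two entries keeps the frequent word "a", since removing it would hurt perplexity most. A real system would alternate this with the lexicon-generation phase rather than prune once.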