George Saon
Researcher at IBM
Publications - 161
Citations - 6882
George Saon is an academic researcher at IBM. The author has contributed to research on topics including word error rate and language models. The author has an h-index of 40 and has co-authored 149 publications receiving 6185 citations.
Papers
Proceedings Article
Speaker adaptation of neural network acoustic models using i-vectors
TL;DR: This work proposes adapting deep neural network acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network in parallel with the regular acoustic features for ASR. The resulting DNNs are comparable in performance to DNNs trained on speaker-adapted features, with the advantage that only one decoding pass is needed.
Journal Article
Deep Convolutional Neural Networks for Large-scale Speech Tasks
Tara N. Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdelrahman Mohamed, George E. Dahl, Bhuvana Ramabhadran
TL;DR: This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and investigates how to incorporate speaker-adapted features, which cannot directly be modeled by CNNs as they do not obey locality in frequency, into the CNN framework.
Proceedings Article
Boosted MMI for model and feature-space discriminative training
Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Karthik Visweswariah
TL;DR: This work presents a modified form of the maximum mutual information (MMI) objective function that improves discriminative training by boosting the likelihoods of paths in the denominator lattice that have a higher phone error relative to the correct transcript.
Proceedings Article
fMPE: discriminatively trained features for speech recognition
TL;DR: In this paper, a matrix is trained to project posteriors of Gaussians to a normal-size feature space, and the projected features are then added to normal features such as PLP.
Proceedings Article
English Conversational Telephone Speech Recognition by Humans and Machines
George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
TL;DR: In this article, a set of acoustic and language modeling techniques is used to lower the word error rate of a conversational telephone LVCSR system to 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000 evaluation.