Kartik Audhkhasi

Researcher at IBM

Publications - 92
Citations - 2916

Kartik Audhkhasi is an academic researcher from IBM. The author has contributed to research on topics including language models and word error rate. The author has an h-index of 27 and has co-authored 86 publications receiving 2310 citations. Previous affiliations of Kartik Audhkhasi include University of Southern California and Google.

Papers
Proceedings Article

English Conversational Telephone Speech Recognition by Humans and Machines

TL;DR: In this article, a set of acoustic and language modeling techniques was used to lower the word error rate of a conversational telephone LVCSR system to 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000 evaluation.
Journal Article

Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

TL;DR: Proposed best practices for using machine learning in autism research are highlighted, along with some especially promising areas for collaborative work at the intersection of computational and behavioral science.
Proceedings Article

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

TL;DR: This paper presents the first results employing direct acoustics-to-word CTC models on two well-known public benchmark tasks, Switchboard and CallHome. It also presents rescoring results on CTC word model lattices to quantify the performance benefits of an LM, and contrasts the performance of word and phone CTC models.
Journal Article

Noise-enhanced convolutional neural networks

TL;DR: The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average.
Proceedings Article

Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition

TL;DR: In this paper, a joint word-character A2W model was proposed to learn to first spell the word and then recognize it, achieving a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder, pronunciation lexicon, or externally-trained language model.