
Showing papers by "Pedro J. Moreno published in 2016"


Journal ArticleDOI
TL;DR: This work presents a comprehensive study of deep neural networks for automatic language identification, including a detailed performance analysis across data selection strategies and DNN architectures, and introduces a novel approach that combines DNN and i-vector systems by using bottleneck features.
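As an illustration of the bottleneck idea mentioned in the TL;DR, the sketch below runs a toy feedforward DNN and returns the activations of a narrow hidden layer as features for a downstream i-vector backend. The layer sizes, the 64-dimensional bottleneck, and the NumPy formulation are assumptions for illustration only, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 40-dim acoustic input -> wide hidden -> narrow
# 64-dim bottleneck -> wide hidden -> 8 language targets.
sizes = [40, 512, 64, 512, 8]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, return_bottleneck=False):
    """Run the toy DNN; optionally return the bottleneck activations."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)        # ReLU on hidden layers
        if return_bottleneck and i == 1:  # activations of the 64-unit bottleneck layer
            return h
    return h

frames = rng.normal(size=(100, 40))            # 100 frames of 40-dim features
bottleneck_feats = forward(frames, return_bottleneck=True)
print(bottleneck_feats.shape)                  # (100, 64) features for an i-vector system
```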

67 citations


Proceedings ArticleDOI
Mohamed G. Elfeky, Meysam Bastani, Xavier Velez, Pedro J. Moreno, Austin Waters
01 Dec 2016
TL;DR: Two techniques are presented, Distillation and MultiTask Learning (MTL); both are shown to be superior to the jointly-trained model trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively.
Abstract: Acoustic model performance typically decreases when evaluated on a dialectal variation of the same language that was not used during training. Similarly, models simultaneously trained on a group of dialects tend to underperform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented: Distillation and MultiTask Learning (MTL). In Distillation, we use an ensemble of dialect-specific acoustic models and distill its knowledge into a single model. In MTL, we utilize multitask learning to train a unified acoustic model that learns to distinguish dialects as a side task. We show that both techniques are superior to the jointly-trained model that is trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively. While achieving this improvement, neither technique degrades the performance of the dialect-specific models by more than 3.4%.
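A minimal sketch of the distillation idea described above: the dialect-specific teachers' posteriors are averaged into soft targets, and the unified student model is scored against them with a cross-entropy loss. The temperature value, the layer-free NumPy formulation, and the toy dimensions are assumptions for illustration; the paper's actual acoustic-model training setup is not reproduced here.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits, temperature=2.0):
    """Average the softened posteriors of the dialect-specific teachers."""
    # teacher_logits: (num_teachers, num_frames, num_states)
    return softmax(teacher_logits, temperature).mean(axis=0)

def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Cross-entropy of the student against the ensemble's soft targets."""
    log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    return -(soft_targets * log_probs).sum(axis=-1).mean()

# Toy example: 3 dialect-specific teachers, 5 frames, 4 context-dependent states.
rng = np.random.default_rng(0)
teachers = rng.normal(size=(3, 5, 4))
student = rng.normal(size=(5, 4))
targets = distillation_targets(teachers)
print("distillation loss:", distillation_loss(student, targets))
```

In the MTL variant, the same unified model would instead carry an auxiliary output head trained to predict the dialect label alongside the main acoustic targets.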

30 citations


Proceedings ArticleDOI
21 Mar 2016
TL;DR: This paper presents two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers, following a Machine Learning approach: features are extracted from the Speech Recognition output along with Word Embeddings, and Shallow Neural Networks are used for classification.
Abstract: While research has often shown that building dialect-specific Automatic Speech Recognizers is the optimal approach to dealing with dialectal variations of the same language, we have observed that dialect-specific recognizers do not always output the best recognitions. Often enough, another dialectal recognizer outputs a better recognition than the dialect-specific one. In this paper, we present two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers. We follow a Machine Learning approach and extract features from the Speech Recognition output along with Word Embeddings and use Shallow Neural Networks for classification. Our experiments using Dictation and Voice Search data from the main four Arabic dialects show good WER improvements for the hypothesis selection scheme, reducing the WER by 2.1 to 12.1% depending on the test set, and promising results for the hypotheses combination scheme.
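A minimal sketch of the hypothesis-selection classifier described above, assuming a small, hand-picked feature set (recognizer costs, hypothesis length, mean word embedding) and scikit-learn's MLPClassifier as the shallow neural network; the actual features, embeddings, and network used in the paper may differ.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
EMB_DIM = 16
vocab_embeddings = {}  # hypothetical word-embedding lookup

def embed(word):
    """Toy word-embedding lookup; a real system would use trained embeddings."""
    if word not in vocab_embeddings:
        vocab_embeddings[word] = rng.normal(size=EMB_DIM)
    return vocab_embeddings[word]

def hypothesis_features(text, am_cost, lm_cost):
    """Features from the ASR output plus a mean word embedding."""
    words = text.split()
    mean_emb = np.mean([embed(w) for w in words], axis=0)
    return np.concatenate(([am_cost, lm_cost, len(words)], mean_emb))

# Toy training data: one row per (utterance, dialectal recognizer) pair,
# labeled 1 if that recognizer produced the lowest-WER hypothesis.
X = np.stack([
    hypothesis_features("ahlan bikum", am_cost=4.2, lm_cost=1.1),
    hypothesis_features("ahlan bikum fi al barnamaj", am_cost=3.1, lm_cost=0.9),
])
y = np.array([0, 1])

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)
# At test time, score each recognizer's hypothesis and keep the highest-scoring one.
print(clf.predict_proba(X))
```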

15 citations


Proceedings ArticleDOI
01 Dec 2016
TL;DR: This paper describes a new technique for automatically obtaining large, high-quality training speech corpora whose transcripts are more accurate than even those manually transcribed by humans; acoustic models trained on the resulting data outperform those trained solely on previously available data sets.
Abstract: This paper describes a new technique to automatically obtain large high-quality training speech corpora for acoustic modeling. Traditional approaches select utterances based on confidence thresholds and other heuristics. We propose instead to use an ensemble approach: we transcribe each utterance using several recognizers, and only keep those on which they agree. The recognizers we use are trained on data from different dialects of the same language, and this diversity leads them to make different mistakes in transcribing speech utterances. In this work we show, however, that when they agree, this is an extremely strong signal that the transcript is correct. This allows us to produce automatically transcribed speech corpora that are superior in transcript correctness even to those manually transcribed by humans. Furthermore, we show that using the produced semi-supervised data sets, we can train new acoustic models which outperform those trained solely on previously available data sets.
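A minimal sketch of the agreement-based selection described above: each utterance is transcribed by several dialect-specific recognizers, and only utterances on which all transcripts agree (after simple normalization) are kept. The recognizer stand-ins and the normalization rule are hypothetical placeholders for illustration.

```python
def normalize(text):
    """Simple text normalization before comparing transcripts."""
    return " ".join(text.lower().split())

def select_by_agreement(utterances, recognizers):
    """Keep utterances whose transcripts are identical across all recognizers."""
    selected = []
    for utt in utterances:
        transcripts = {normalize(rec(utt)) for rec in recognizers}
        if len(transcripts) == 1:
            selected.append((utt, transcripts.pop()))
    return selected

# Toy stand-ins for dialect-specific recognizers (hypothetical).
rec_egyptian = lambda utt: {"utt1": "sabah el kheir", "utt2": "ezayak"}[utt]
rec_gulf     = lambda utt: {"utt1": "sabah el kheir", "utt2": "shlonak"}[utt]

corpus = select_by_agreement(["utt1", "utt2"], [rec_egyptian, rec_gulf])
print(corpus)  # only "utt1" survives, paired with its agreed transcript
```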

14 citations