The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices

doi:10.1109/ASRU.2015.7404828

Proceedings Article

Takuya Yoshioka, +11 more

- pp 436-443

TLDR

This paper describes NTT's CHiME-3 system, which integrates advanced speech enhancement and recognition techniques and achieves a 3.45% development error rate and a 5.83% evaluation error rate.

Abstract:

CHiME-3 is a research community challenge organised in 2015 to evaluate speech recognition systems for mobile multi-microphone devices used in noisy daily environments. This paper describes NTT's CHiME-3 system, which integrates advanced speech enhancement and recognition techniques. Newly developed techniques include the use of spectral masks for acoustic beam-steering vector estimation and acoustic modelling with deep convolutional neural networks based on the "network in network" concept. In addition to these improvements, our system has several key differences from the official baseline system. The differences include multi-microphone training, dereverberation, and cross adaptation of neural networks with different architectures. The impacts that these techniques have on recognition performance are investigated. By combining these advanced techniques, our system achieves a 3.45% development error rate and a 5.83% evaluation error rate. Three simpler systems are also developed to perform evaluations with constrained set-ups.
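The idea of using spectral masks to estimate a beam-steering vector can be illustrated with a minimal NumPy sketch. This is not NTT's implementation: the function names (`masked_covariance`, `steering_vectors`, `mvdr_weights`) are hypothetical, the mask is assumed to be given, the steering vector is taken as the principal eigenvector of a mask-weighted spatial covariance matrix, and an MVDR beamformer is assumed as the consumer of that vector.

```python
import numpy as np

def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array (F, T, M) = (frequency, frame, microphone)
    mask: real array (F, T) with values in [0, 1]
    returns: complex array (F, M, M)
    """
    weighted = stft * mask[..., None]
    cov = np.einsum("ftm,ftn->fmn", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)  # avoid division by zero
    return cov / norm[:, None, None]

def steering_vectors(speech_cov):
    """Principal eigenvector of each (M, M) covariance, shape (F, M)."""
    _, vecs = np.linalg.eigh(speech_cov)  # eigenvalues in ascending order
    return vecs[..., -1]

def mvdr_weights(noise_cov, steer, diag_load=1e-6):
    """MVDR weights w = R^-1 h / (h^H R^-1 h), shape (F, M).

    diag_load is an assumed regularisation constant for numerical stability.
    """
    n_mics = noise_cov.shape[-1]
    loaded = noise_cov + diag_load * np.eye(n_mics)
    num = np.linalg.solve(loaded, steer[..., None])[..., 0]
    den = np.einsum("fm,fm->f", steer.conj(), num)
    return num / den[:, None]

# Toy usage with synthetic data and an oracle mask (assumption: speech is
# present only in the first half of the frames).
rng = np.random.default_rng(0)
F, T, M = 4, 200, 3
true_steer = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
source = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
speech_mask = np.zeros((F, T))
speech_mask[:, : T // 2] = 1.0
noise = 0.1 * (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M)))
observed = (source * speech_mask)[..., None] * true_steer[:, None, :] + noise

speech_cov = masked_covariance(observed, speech_mask)
noise_cov = masked_covariance(observed, 1.0 - speech_mask)
steer = steering_vectors(speech_cov)
weights = mvdr_weights(noise_cov, steer)
enhanced = np.einsum("fm,ftm->ft", weights.conj(), observed)  # beamformer output
```

In practice the mask would come from a statistical or neural estimator rather than an oracle, and the eigenvector is only defined up to a complex scale factor, so a reference-microphone normalisation is often applied afterwards.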

Citations

Proceedings Article

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei, +68 more

TL;DR: In this article, an end-to-end deep learning approach is used to recognize either English or Mandarin Chinese speech, two vastly different languages, with HPC techniques enabling experiments that previously took weeks to run in days.

Journal Article

A survey of the recent architectures of deep convolutional neural networks

Asifullah Khan, +3 more

- 01 Dec 2020 -

Artificial Intelligence Review

TL;DR: Deep convolutional neural networks (CNNs) are a special type of neural network that has shown exemplary performance in several competitions related to computer vision and image processing.

Journal Article

Supervised Speech Separation Based on Deep Learning: An Overview

DeLiang Wang, +1 more

- 01 Oct 2018 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper gives a comprehensive overview of deep learning-based supervised speech separation and discusses its three main components: learning machines, training targets, and acoustic features.

Journal Article

Deep Learning in Mobile and Wireless Networking: A Survey

Chaoyun Zhang, +2 more

- 13 Mar 2019 -

IEEE Communications Surveys and Tutorial...

TL;DR: This paper bridges the gap between deep learning and mobile and wireless networking research by presenting a comprehensive survey of the crossovers between the two areas, and provides an encyclopedic review of mobile and wireless networking research based on deep learning, categorized by domain.

References

Proceedings Article

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception is a deep convolutional neural network architecture that set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Journal Article

A fast learning algorithm for deep belief nets

Geoffrey E. Hinton, +2 more

- 01 Jul 2006 -

Neural Computation

TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

Proceedings Article

Recurrent neural network based language model

Tomas Mikolov, +4 more

TL;DR: Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.

Journal Article

Training products of experts by minimizing contrastive divergence

Geoffrey E. Hinton

- 01 Aug 2002 -

Neural Computation

TL;DR: A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Maximum-likelihood training is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule, so the PoE is trained by minimizing contrastive divergence instead.

Posted Content

Network In Network

Min Lin, +2 more

- 16 Dec 2013 -

arXiv: Neural and Evolutionary Computing

TL;DR: With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

Morten Kolbæk, +3 more

- 01 Oct 2017 -

IEEE Transactions on Audio, Speech, and ...

Related Papers (5)

The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines

Neural network based spectral mask estimation for acoustic beamforming

Acoustic Beamforming for Speaker Diarization of Meetings

The Kaldi Speech Recognition Toolkit

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks