Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM.

doi:10.21437/INTERSPEECH.2018-1802

Proceedings Article

Szu-Wei Fu, +3 more

pp. 1873-1877

TL;DR: In this paper, an end-to-end, non-intrusive speech quality evaluation model, termed Quality-Net, based on bidirectional long short-term memory (BLSTM), is proposed.

Abstract:

Most current objective speech quality assessment tools (e.g., perceptual evaluation of speech quality (PESQ)) are based on comparing the degraded/processed speech with its clean counterpart. The need for a "golden" reference considerably restricts the practicality of such tools in real-world scenarios, since the clean reference usually cannot be accessed. On the other hand, human beings can readily evaluate speech quality without any reference (e.g., in mean opinion score (MOS) tests), implying the existence of an objective and non-intrusive (no clean reference needed) quality assessment mechanism. In this study, we propose a novel end-to-end, non-intrusive speech quality evaluation model, termed Quality-Net, based on bidirectional long short-term memory (BLSTM). In Quality-Net, the utterance-level quality evaluation is based on frame-level assessment. Frame constraints and sensible initializations of forget gate biases are applied so that meaningful frame-level quality assessment can be learned from the utterance-level quality label. Experimental results show that Quality-Net yields high correlation with PESQ (0.9 for noisy speech and 0.84 for speech processed by speech enhancement). We believe that Quality-Net has the potential to be used in a wide variety of speech signal processing applications.
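The abstract states that Quality-Net derives its utterance-level score from frame-level assessments and uses frame constraints so that frame scores stay meaningful when only an utterance-level label is available. A minimal pure-Python sketch of that idea follows, assuming (1) the utterance score is the mean of the frame scores and (2) the frame constraint is a squared-error term pulling every frame score toward the utterance label, weighted by a hypothetical constant `alpha`. The BLSTM that would produce the frame scores is omitted, and the function names are illustrative, not the authors' code.

```python
from typing import Sequence


def utterance_score(frame_scores: Sequence[float]) -> float:
    """Aggregate frame-level quality scores into one utterance-level score.

    Assumption: simple averaging over frames.
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    return sum(frame_scores) / len(frame_scores)


def quality_loss(frame_scores: Sequence[float], label: float, alpha: float = 1.0) -> float:
    """Utterance-level squared error plus a hypothetical frame-constraint term.

    The frame term penalizes each frame score for deviating from the
    utterance-level label, so frame scores remain interpretable even though
    only an utterance-level label is available. `alpha` is a hypothetical weight.
    """
    predicted = utterance_score(frame_scores)
    utterance_term = (label - predicted) ** 2
    frame_term = sum((label - q) ** 2 for q in frame_scores) / len(frame_scores)
    return utterance_term + alpha * frame_term


if __name__ == "__main__":
    smooth = [3.0, 3.5, 4.0]  # frames close to the label
    spiky = [2.5, 4.5]        # same mean, frames far from the label
    print(utterance_score(smooth), utterance_score(spiky))  # 3.5 3.5
    print(round(quality_loss(smooth, 3.5), 4))              # 0.1667
    print(quality_loss(spiky, 3.5))                         # 1.0
```

Both frame sequences in the usage example average to 3.5, so the utterance-level term alone cannot tell them apart; the frame term penalizes the sequence whose individual frames stray further from the label.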

Citations

Proceedings Article

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.

Szu-Wei Fu, +3 more

TL;DR: The authors propose MetricGAN, a novel approach that optimizes the generator with respect to one or more evaluation metrics; the target metric scores of the generated data can also be specified arbitrarily by users.

Posted Content

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

Chen-Chou Lo, +6 more

17 Apr 2019

arXiv: Sound

TL;DR: Results confirm that the proposed deep learning-based assessment models could be used as a computational evaluator to measure the MOS of VC systems to reduce the need for expensive human rating.

Posted Content

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

Chandan K A Reddy, +2 more

28 Oct 2020

arXiv: Sound

TL;DR: This paper introduces a multi-stage self-teaching based perceptual objective metric that is designed to evaluate noise suppressors and generalizes well in challenging test conditions with a high correlation to human ratings.

Proceedings ArticleDOI

Non-intrusive Speech Quality Assessment Using Neural Networks

Anderson R. Avila, +5 more

TL;DR: In this article, three neural network-based approaches for mean opinion score (MOS) estimation were proposed, with a fully connected deep neural network using Mel-frequency features providing the best correlation and lowest mean squared error.
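This entry ranks models by correlation and mean squared error (MSE) against reference scores, the same kind of figure the Quality-Net abstract reports against PESQ. A minimal stdlib sketch of both metrics follows, assuming Pearson's linear correlation coefficient is the correlation in question (the entry does not say which coefficient is used); the function names are illustrative.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two score lists."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length sequences with at least two scores")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


def mean_squared_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average squared difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]
    reference = [2.0, 4.0, 6.0]
    print(pearson_r(predicted, reference))           # 1.0
    print(mean_squared_error(predicted, reference))  # about 4.667 (14 / 3)
```

The example shows why both metrics are usually reported: predictions perfectly correlated with the reference can still have a large MSE when they are offset or scaled.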

Posted Content

Non-intrusive speech quality assessment using neural networks

Anderson R. Avila, +5 more

16 Mar 2019

arXiv: Audio and Speech Processing

TL;DR: This work presents an investigation of the applicability of neural networks for non-intrusive audio quality assessment, and proposes three neural network-based approaches for mean opinion score (MOS) estimation.

References

Posted Content

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

Djork-Arné Clevert, +2 more

23 Nov 2015

arXiv: Learning

TL;DR: The exponential linear unit (ELU) is proposed; it alleviates the vanishing gradient problem via the identity for positive values and has improved learning characteristics compared to units with other activation functions.
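For reference, the ELU can be written as f(x) = x for x > 0 and f(x) = α(e^x - 1) otherwise. A minimal stdlib sketch follows, with α = 1.0 as an assumed default; the derivative is included to show the "identity for positive values" the entry credits with avoiding vanishing gradients.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: identity for x > 0, alpha * (e^x - 1) otherwise."""
    # math.expm1 computes e^x - 1 accurately for x near 0.
    return x if x > 0 else alpha * math.expm1(x)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative of elu: 1 for x > 0 (gradient passes unchanged), alpha * e^x otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)


if __name__ == "__main__":
    for value in (2.0, 0.0, -1.0, -10.0):
        print(value, elu(value), elu_grad(value))
```

For large negative inputs the output saturates toward -α rather than growing without bound, which is the property that distinguishes ELU from a leaky ReLU.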

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (TIMIT), NIST

John S. Garofolo, +5 more

Journal ArticleDOI

An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

Cees H. Taal, +3 more

01 Sep 2011

IEEE Transactions on Audio, Speech, and ...

TL;DR: A short-time objective intelligibility measure (STOI) is presented. It shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments, and it correlates better with speech intelligibility than five other reference objective intelligibility models.

Proceedings Article

An Empirical Exploration of Recurrent Network Architectures

Rafal Jozefowicz, +3 more

TL;DR: It is found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the recently introduced Gated Recurrent Unit (GRU) on some, but not all, tasks.
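The forget-gate bias matters because, when the input carries no signal, the forget gate alone decides how quickly the LSTM cell state decays. The Quality-Net abstract also mentions initializing forget gate biases, though the value it uses is not given here. The pure-Python sketch below isolates that effect under strong simplifying assumptions (zero inputs, zero hidden state, zero candidate bias), so it illustrates the mechanism rather than reproducing either paper's experiments; the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function used for LSTM gates."""
    return 1.0 / (1.0 + math.exp(-z))


def retained_memory(forget_bias: float, steps: int, initial_cell: float = 1.0) -> float:
    """Cell state left after `steps` updates when only the forget-gate bias is active.

    Simplifying assumptions (illustrative only): inputs, hidden state and the
    candidate bias are all zero, so the candidate g_t = tanh(0) = 0 and the
    update c_t = f_t * c_{t-1} + i_t * g_t reduces to c_t = sigmoid(b_f) * c_{t-1}.
    """
    forget_gate = sigmoid(forget_bias)
    cell = initial_cell
    for _ in range(steps):
        cell = forget_gate * cell
    return cell


if __name__ == "__main__":
    for bias in (0.0, 1.0, 3.0):
        print(bias, retained_memory(bias, steps=10))
```

With b_f = 0 the forget gate is 0.5 and only about 0.001 of the state survives ten steps; with b_f = 1 the gate is about 0.73 and roughly 0.04 survives. The usual motivation for a positive initial bias is exactly this slower decay, which lets information and gradients persist over longer spans early in training.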

Journal ArticleDOI

An Experimental Study on Speech Enhancement Based on Deep Neural Networks

Yong Xu, +3 more

01 Jan 2014

IEEE Signal Processing Letters

TL;DR: This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.

Related Papers (5)

Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs

Antony William Rix, +3 more

An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

Cees H. Taal, +3 more

01 Sep 2011

IEEE Transactions on Audio, Speech, and ...

P.563—The ITU-T Standard for Single-Ended Speech Quality Assessment

Ludovic Malfait, +2 more

01 Nov 2006

IEEE Transactions on Audio, Speech, and ...

Speech enhancement based on deep denoising autoencoder.

Xugang Lu, +3 more

A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech

Tiago H. Falk, +2 more

01 Sep 2010

IEEE Transactions on Audio, Speech, and ...
