Open Access, Proceedings Article

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM.

TLDR
This paper proposes Quality-Net, an end-to-end, non-intrusive speech quality evaluation model based on bidirectional long short-term memory (BLSTM).
Abstract
Most objective speech quality assessment tools (e.g., perceptual evaluation of speech quality (PESQ)) compare the degraded or processed speech with its clean counterpart. The need for a "golden" reference considerably restricts the practicality of such tools in real-world scenarios, since the clean reference usually cannot be accessed. Human listeners, on the other hand, can readily evaluate speech quality without any reference (e.g., in mean opinion score (MOS) tests), implying the existence of an objective and non-intrusive (no clean reference needed) quality assessment mechanism. In this study, we propose a novel end-to-end, non-intrusive speech quality evaluation model, termed Quality-Net, based on bidirectional long short-term memory. Quality-Net derives the utterance-level quality from frame-level assessments. Frame constraints and sensible initializations of the forget gate biases are applied so that meaningful frame-level quality assessments can be learned from the utterance-level quality label. Experimental results show that Quality-Net achieves a high correlation with PESQ (0.9 for noisy speech and 0.84 for speech processed by speech enhancement). We believe that Quality-Net has the potential to be used in a wide variety of speech signal processing applications.
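The abstract does not spell out how frame-level scores become an utterance-level score, what form the frame constraint takes, or which correlation measure is reported. The following is only a minimal sketch under stated assumptions: the utterance score is taken as the mean of the frame scores, the frame constraint is a squared penalty pulling each frame score toward the utterance label, and the correlation is Pearson's. The function names and the `frame_weight` parameter are hypothetical and not taken from the paper.

```python
import math
from statistics import fmean


def utterance_score(frame_scores):
    """Average frame-level quality scores into one utterance-level score.

    Assumption: simple averaging; the abstract only says the utterance
    score is "based on" the frame-level assessment.
    """
    return fmean(frame_scores)


def quality_loss(frame_scores, label, frame_weight=1.0):
    """Utterance-level squared error plus an assumed frame constraint.

    The constraint term and `frame_weight` are illustrative choices,
    not the loss used by the authors.
    """
    utt_error = (label - utterance_score(frame_scores)) ** 2
    frame_error = fmean([(label - q) ** 2 for q in frame_scores])
    return utt_error + frame_weight * frame_error


def pearson(xs, ys):
    """Pearson correlation between predicted and reference scores."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    frames = [2.0, 3.0, 4.0]           # made-up frame-level scores
    print(utterance_score(frames))      # mean of the frame scores
    print(quality_loss(frames, 3.0))    # only the frame term is non-zero here
```

In this toy case the utterance-level error is zero, yet the frame term still penalizes frames that drift away from the label. That is one way an utterance-level label could shape frame-level predictions.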


Citations
Proceedings Article

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.

TL;DR: In this article, the authors propose MetricGAN, an approach that optimizes the generator with respect to one or multiple evaluation metrics; the metric scores of the generated data can also be arbitrarily specified by users.
Posted Content

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

TL;DR: Results confirm that the proposed deep learning-based assessment models could be used as a computational evaluator to measure the MOS of VC systems to reduce the need for expensive human rating.
Posted Content

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

TL;DR: This paper introduces a multi-stage self-teaching based perceptual objective metric that is designed to evaluate noise suppressors and generalizes well in challenging test conditions with a high correlation to human ratings.
Proceedings Article

Non-intrusive Speech Quality Assessment Using Neural Networks

TL;DR: In this article, three neural network-based approaches for mean opinion score (MOS) estimation were proposed, with a fully connected deep neural network using Mel-frequency features providing the best correlation and lowest mean squared error.
Posted Content

Non-intrusive speech quality assessment using neural networks

TL;DR: This work presents an investigation of the applicability of neural networks for non-intrusive audio quality assessment, and proposes three neural network-based approaches for mean opinion score (MOS) estimation.
References
Posted Content

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

TL;DR: This paper proposes the exponential linear unit (ELU), which alleviates the vanishing gradient problem via the identity for positive values and has improved learning characteristics compared to units with other activation functions.
Journal Article

An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

TL;DR: A short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments, and correlates better with speech intelligibility than five other reference objective intelligibility models.
Proceedings Article

An Empirical Exploration of Recurrent Network Architectures

TL;DR: It is found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the recently introduced Gated Recurrent Unit (GRU) on some but not all tasks.
Journal Article

An Experimental Study on Speech Enhancement Based on Deep Neural Networks

TL;DR: This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.