Mean opinion score
About: Mean opinion score (MOS) is a research topic. Over its lifetime, 1739 publications have been published within this topic, receiving 25606 citations.
20 Aug 2017
TL;DR: Tacotron is an end-to-end generative text-to-speech model that synthesizes speech directly from characters; given <text, audio> pairs, the model can be trained completely from scratch with random initialization.
Abstract: A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it's substantially faster than sample-level autoregressive methods.
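To make the "5-scale mean opinion score" concrete, here is a minimal sketch of how a MOS is commonly computed: the arithmetic mean of listener ratings on a 1 to 5 scale, reported with a confidence interval. The function name, the normal-approximation interval, and the ratings are illustrative assumptions; the abstract does not describe the evaluation protocol in this detail.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings on a 1-5 scale and a normal approximation for the
    interval; both are illustrative choices, not taken from the paper.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for one synthesized utterance.
ratings = [4, 4, 3, 5, 4, 3, 4, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.3f} +/- {ci:.3f}")
```

With so few ratings the interval is wide, which is why listening tests usually collect many ratings per system before comparing scores.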
TL;DR: In this article, a new underwater color image quality evaluation (UCIQE) metric is proposed to quantify the non-uniform color cast, blurring, and low contrast that characterize underwater engineering and monitoring images.
Abstract: Quality evaluation of underwater images is a key goal of underwater video image retrieval and intelligent processing. To date, no metric has been proposed for underwater color image quality evaluation (UCIQE). The special absorption and scattering characteristics of the water medium do not allow direct application of natural color image quality metrics, especially across different underwater environments. In this paper, subjective testing for underwater image quality has been organized. The statistical distribution of the underwater image pixels in the CIELab color space related to subjective evaluation indicates that the sharpness and colorfulness factors correlate well with subjective image quality perception. Based on these, a new UCIQE metric, which is a linear combination of chroma, saturation, and contrast, is proposed to quantify the non-uniform color cast, blurring, and low contrast that characterize underwater engineering and monitoring images. Experiments are conducted to illustrate the performance of the proposed UCIQE metric and its capability to measure the underwater image enhancement results. They show that the proposed metric has comparable performance to the leading natural color image quality metrics and the underwater grayscale image quality metrics available in the literature, and can predict with higher accuracy the relative amount of degradation with similar image content in underwater environments. Importantly, UCIQE is a simple and fast solution for real-time underwater video processing. The effectiveness of the presented measure is also demonstrated by subjective evaluation. The results show better correlation between the UCIQE and the subjective mean opinion score.
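The abstract reports correlation between an objective metric and the subjective mean opinion score without naming the coefficient. As a sketch under that caveat, the two coefficients most often used for this kind of comparison, Pearson's linear correlation and Spearman's rank correlation, can be computed as follows. The function names and the data are hypothetical and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least two items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical metric outputs and MOS values for five images.
metric_scores = [0.41, 0.52, 0.47, 0.63, 0.58]
mos_values = [2.1, 3.0, 2.8, 4.2, 3.9]
print(f"Pearson:  {pearson(metric_scores, mos_values):.4f}")
print(f"Spearman: {spearman(metric_scores, mos_values):.4f}")
```

Spearman only checks that the metric orders the images the same way the viewers did, while Pearson also rewards a linear relationship between metric values and MOS.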
TL;DR: This work demonstrates that modeling the periodic patterns of audio is crucial for enhancing sample quality, and shows that HiFi-GAN generalizes to mel-spectrogram inversion of unseen speakers and to end-to-end speech synthesis.
Abstract: Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) of a single speaker dataset indicates that our proposed method demonstrates similarity to human quality while generating 22.05 kHz high-fidelity audio 167.9 times faster than real-time on a single V100 GPU. We further show the generality of HiFi-GAN to the mel-spectrogram inversion of unseen speakers and end-to-end speech synthesis. Finally, a small footprint version of HiFi-GAN generates samples 13.4 times faster than real-time on CPU with comparable quality to an autoregressive counterpart.
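The "times faster than real-time" figures can be read as the ratio of generated audio duration to synthesis time. The sketch below works through that arithmetic; the function name and the timing in the example are hypothetical, and only the 22.05 kHz sample rate and the 167.9x factor come from the abstract.

```python
def speedup_over_real_time(num_samples, sample_rate_hz, synthesis_seconds):
    """Ratio of audio duration to the wall-clock time spent generating it."""
    if sample_rate_hz <= 0 or synthesis_seconds <= 0:
        raise ValueError("sample rate and synthesis time must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return audio_seconds / synthesis_seconds


# Hypothetical measurement: 10 s of 22.05 kHz audio generated in 0.05 s.
speedup = speedup_over_real_time(220_500, 22_050, 0.05)
print(f"{speedup:.1f}x faster than real time")

# Throughput implied by the figures in the abstract: 167.9x at 22.05 kHz.
implied_samples_per_second = 22_050 * 167.9
print(f"about {implied_samples_per_second:,.0f} samples per second")
```

Read this way, 167.9x real time at 22.05 kHz corresponds to roughly 3.7 million waveform samples generated per second.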
TL;DR: The proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks and can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline.
Abstract: Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications, such as evaluating image capture pipelines, storage techniques, and sharing media. Despite the subjective nature of this problem, most existing methods only predict the mean opinion score provided by data sets, such as AVA and TID2013. Our approach differs from others in that we predict the distribution of human opinion scores using a convolutional neural network. Our architecture also has the advantage of being significantly simpler than other methods with comparable performance. Our proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks. Our resulting network can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline. All this is done without need for a “golden” reference image, consequently allowing for single-image, semantic- and perceptually-aware, no-reference quality assessment.
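Predicting a distribution of opinion scores rather than a single number still lets the mean opinion score be recovered, along with a measure of how much raters disagree. A minimal sketch of that step follows, assuming the model outputs one probability per score bucket; the function name, the 1 to 5 bucket range, and the probabilities are hypothetical and do not come from the paper.

```python
import math


def summarize_score_distribution(probabilities, scores=None):
    """Return (mean, standard deviation) of a discrete score distribution.

    `probabilities[i]` is the predicted share of raters choosing `scores[i]`.
    If `scores` is omitted, buckets are assumed to be 1, 2, ..., N.
    """
    if scores is None:
        scores = range(1, len(probabilities) + 1)
    scores = list(scores)
    if len(scores) != len(probabilities):
        raise ValueError("scores and probabilities must have the same length")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize in case the predicted values do not sum exactly to 1.
    probs = [p / total for p in probabilities]
    mean = sum(s * p for s, p in zip(scores, probs))
    variance = sum(p * (s - mean) ** 2 for s, p in zip(scores, probs))
    return mean, math.sqrt(variance)


# Hypothetical predicted distribution over the score buckets 1..5.
predicted = [0.05, 0.10, 0.20, 0.40, 0.25]
mean_score, score_std = summarize_score_distribution(predicted)
print(f"mean = {mean_score:.2f}, std = {score_std:.2f}")
```

Two images with the same mean can have very different spreads, which is the extra information a distribution-predicting model provides over a MOS-only regressor.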