Journal ArticleDOI
Alias-and-Separate: Wideband Speech Coding Using Sub-Nyquist Sampling and Speech Separation
TL;DR: The authors propose a novel method for low-rate wideband speech coding that uses a standard narrowband codec, achieving subjective quality comparable to speech coded by wideband codecs at higher bitrates in a MUSHRA test.
Abstract:
Decimation of a discrete-time signal below the Nyquist rate without an appropriate lowpass filter produces a distortion called aliasing. If wideband speech sampled at 16 kHz is decimated by 2 to yield a signal sampled at 8 kHz with aliasing, the decimated signal is the sum of two speech-like signals: the narrowband speech covering 0-4 kHz and a spectrally flipped aliasing component folded down from the 4-8 kHz band. Recently, deep learning-based approaches have remarkably improved speech separation performance, suggesting that the narrowband and aliasing components can be separated. In this letter, we propose a novel method for low-rate wideband speech coding that uses a standard narrowband codec. Instead of coding wideband speech with a wideband codec at a limited bitrate, we decimate the input wideband speech, deliberately incurring aliasing, and encode it with a narrowband codec, allocating the entire bitrate budget to 0-4 kHz. After decoding the bitstream, we apply a speech separation technique to recover the narrowband and aliasing signals, which are then used to reconstruct the wideband speech by expansion, low/highpass filtering, and summation. Experimental results show that the proposed method achieves subjective quality comparable to speech coded by wideband codecs at higher bitrates in a subjective MUSHRA test.
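As a quick illustration of the folding effect the method exploits (a sketch, not the paper's code, assuming NumPy): decimating a 16 kHz signal by 2 without an anti-alias filter folds content above 4 kHz back into the 0-4 kHz band, spectrally flipped, so a 6 kHz tone reappears at 8 − 6 = 2 kHz.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs              # 1 second of signal at 16 kHz
x = np.sin(2 * np.pi * 6000 * t)    # 6 kHz tone, above the new Nyquist of 4 kHz

y = x[::2]                          # decimate by 2 with no anti-alias lowpass
fs_dec = fs // 2                    # decimated rate: 8 kHz

# Locate the spectral peak of the decimated signal (1 Hz bin resolution here)
spectrum = np.abs(np.fft.rfft(y))
peak_hz = np.argmax(spectrum) * fs_dec / len(y)
print(peak_hz)  # 2000.0: the 6 kHz tone aliases to 8000 - 6000 = 2000 Hz
```

The same folding applied to the 4-8 kHz band of speech yields the spectrally flipped, speech-like aliasing component that the separation network is trained to extract.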
References
Journal ArticleDOI
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo, Nima Mesgarani, et al.
TL;DR: A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures.
Journal ArticleDOI
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
TL;DR: Proposes the utterance-level permutation invariant training (uPIT) technique for speaker-independent multitalker speech separation; RNNs trained with uPIT can separate multitalker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender.
Proceedings ArticleDOI
SDR – Half-baked or Well Done?
TL;DR: Argues that the SDR metric from the BSS_eval toolkit is problematic for single-channel separation and proposes the scale-invariant signal-to-distortion ratio (SI-SDR) as a more robust measure.
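A minimal sketch of the SI-SDR computation described in the entry above, assuming NumPy (the function name `si_sdr` is mine, not the toolkit's): the estimate is projected onto the reference, and the ratio of the scaled target's power to the residual's power is reported in dB, which makes the measure invariant to rescaling of the estimate.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB: project the estimate onto the
    reference, then compare the scaled target to the residual."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10(np.sum(target**2) / np.sum(residual**2))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)           # reference signal
noisy = s + 0.1 * rng.standard_normal(16000)
print(round(si_sdr(noisy, s), 1))        # roughly 20 dB for 10% added noise
```

Because of the projection, `si_sdr(3 * noisy, s)` returns the same value as `si_sdr(noisy, s)`, which is the scale invariance the paper argues for.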
Proceedings ArticleDOI
TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation
Yi Luo, Nima Mesgarani, et al.
TL;DR: TasNet directly models the signal in the time domain using an encoder-decoder framework and performs source separation on nonnegative encoder outputs, which are then synthesized back to waveforms by the decoder.
Proceedings ArticleDOI
The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
TL;DR: The motivation and the processes involved in the design and recording of the Voice Bank corpus, specifically designed for the creation of personalised synthetic voices for individuals with speech disorders, are described.
Related Papers (5)
Non-intrusive Speech Quality Assessment for Super-wideband Speech Communication Networks
Gabriel Mittag, Sebastian Möller, et al.