Journal Article

Alias-and-Separate: Wideband Speech Coding Using Sub-Nyquist Sampling and Speech Separation

- 01 Jan 2022
- Vol. 29, pp. 2003-2007
TLDR
In this article, the authors propose a novel method for low-rate wideband speech coding that utilizes a standard narrowband codec and achieves subjective quality comparable to speech coded by wideband codecs at higher bitrates in a MUSHRA listening test.
Abstract
Decimation of a discrete-time signal below the Nyquist rate without applying an appropriate lowpass filter results in a distortion called aliasing. If wideband speech sampled at 16 kHz is decimated by 2 to yield a signal sampled at 8 kHz with aliasing, the decimated signal is the sum of two speech-like signals: the narrowband speech covering 0-4 kHz and the spectrally flipped aliasing component folded down from the 4-8 kHz band. Recently, the performance of speech separation has improved remarkably with deep learning-based approaches, suggesting that the narrowband and aliasing components may likewise be separable. In this letter, we propose a novel method for low-rate wideband speech coding that utilizes a standard narrowband codec. Instead of coding wideband speech with a wideband codec at a limited bitrate, we decimate the input wideband speech, deliberately incurring aliasing, and encode the result with a narrowband codec, allocating the entire allowed bitrate to 0-4 kHz. After decoding the bitstream, we apply a speech separation technique to recover the narrowband and aliasing signals, which are then used to reconstruct the wideband speech by expansion, low/highpass filtering, and summation. Experimental results showed that the proposed method achieves subjective quality comparable to speech coded by wideband codecs at higher bitrates in a MUSHRA listening test.
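
The signal chain described above lends itself to a short sketch. The following numpy/scipy fragment illustrates only the two deterministic stages, decimation without an anti-aliasing filter and reconstruction from the two separated components; the narrowband codec and the learned separator are omitted, and the function names and filter parameters are illustrative assumptions rather than details from the paper.

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS_WB = 16000                                      # wideband rate (Hz)
lp = firwin(101, 4000, fs=FS_WB)                   # lowpass, 4 kHz cutoff
hp = firwin(101, 4000, fs=FS_WB, pass_zero=False)  # highpass, 4 kHz cutoff

def decimate_with_aliasing(x_wb):
    """Drop every other sample WITHOUT lowpass filtering first.
    The result is the sum of the 0-4 kHz narrowband speech and the
    spectrally flipped 4-8 kHz band folded down on top of it."""
    return x_wb[::2]

def reconstruct(nb, alias):
    """Rebuild 16 kHz speech from the two separated 8 kHz components
    by zero-insertion expansion, low/highpass filtering, and summation."""
    up_nb = np.zeros(2 * len(nb))
    up_nb[::2] = nb
    up_al = np.zeros(2 * len(alias))
    up_al[::2] = alias
    # The gain of 2 compensates for the energy lost to zero insertion.
    low = 2 * lfilter(lp, 1.0, up_nb)   # keep the 0-4 kHz baseband copy
    high = 2 * lfilter(hp, 1.0, up_al)  # keep the 4-8 kHz image, which
                                        # un-flips the aliasing component
    return low + high
```

With ideal separation (e.g., obtaining the two components by filtering the wideband input with lp and hp before decimation), reconstruct returns the input up to the filter delay and a small transition-band error around 4 kHz.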


References
Journal Article

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

TL;DR: Conv-TasNet, a fully convolutional time-domain audio separation network, is a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures.
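
As a rough orientation, the encoder/mask/decoder data flow that Conv-TasNet shares with other TasNet-style models can be sketched in a few lines of PyTorch. This is not the published architecture: in Conv-TasNet the mask estimator is a deep temporal convolutional network, for which a single 1x1 convolution stands in here, and the class name and all sizes are placeholders.

```python
import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    """Illustrative encoder / mask / decoder skeleton (not the real model)."""
    def __init__(self, n_src=2, n_filters=256, kernel=16, stride=8):
        super().__init__()
        self.n_src = n_src
        # Learned analysis filterbank; ReLU keeps encoder outputs nonnegative.
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        # Stand-in for Conv-TasNet's temporal convolutional network:
        # estimates one sigmoid mask per source over the encoder outputs.
        self.masker = nn.Sequential(
            nn.Conv1d(n_filters, n_src * n_filters, 1), nn.Sigmoid())
        # Learned synthesis filterbank (transposed convolution).
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride,
                                          bias=False)

    def forward(self, mix):                             # mix: (batch, samples)
        w = torch.relu(self.encoder(mix.unsqueeze(1)))  # (B, F, T)
        m = self.masker(w).view(-1, self.n_src, w.size(1), w.size(2))
        srcs = [self.decoder(m[:, k] * w).squeeze(1) for k in range(self.n_src)]
        return torch.stack(srcs, dim=1)                 # (B, n_src, samples)
```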
Journal Article

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

TL;DR: Proposes utterance-level permutation invariant training (uPIT) for speaker-independent multitalker speech separation; RNNs trained with uPIT can separate multitalker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender.
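
The core of (u)PIT is compact enough to sketch: evaluate the training loss under every assignment of network outputs to reference speakers and keep only the smallest, so the network is not forced into a fixed output order. Below is a minimal sketch using a plain mean-squared error over whole utterances; the function name is hypothetical, and the paper's actual loss is defined on masked spectral estimates rather than raw waveforms.

```python
from itertools import permutations

import torch

def upit_loss(est, ref):
    """Utterance-level PIT: est and ref have shape (n_src, samples).
    Returns the minimum mean loss over all output-to-speaker assignments."""
    per_perm = []
    for perm in permutations(range(ref.size(0))):
        pair_losses = [torch.mean((est[i] - ref[j]) ** 2)
                       for i, j in enumerate(perm)]
        per_perm.append(torch.stack(pair_losses).mean())
    return torch.stack(per_perm).min()
```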
Proceedings Article

SDR – Half-baked or Well Done?

TL;DR: Proposes the scale-invariant signal-to-distortion ratio (SI-SDR) as a more robust alternative to the SDR measure implemented in the BSS_eval toolkit for evaluating single-channel separation.
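
For reference, SI-SDR is straightforward to compute: the reference is rescaled by the orthogonal projection of the estimate onto it, which makes the measure invariant to the overall gain of the estimate. A minimal numpy version follows (the function name and epsilon guard are our own; zero-mean signals are assumed, as in the paper).

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB for 1-D signals est (estimate) and ref."""
    est = est - est.mean()                 # the definition assumes zero mean
    ref = ref - ref.mean()
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref                   # projection of est onto ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))
```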
Proceedings Article

TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation

TL;DR: TasNet directly models the signal in the time domain using an encoder-decoder framework and performs source separation on nonnegative encoder outputs, from which the decoder synthesizes the separated waveforms.
Proceedings Article

The voice bank corpus: Design, collection and data analysis of a large regional accent speech database

TL;DR: Describes the motivation and processes involved in the design and recording of the Voice Bank corpus, a large regional-accent speech database specifically designed for creating personalised synthetic voices for individuals with speech disorders.