Journal ArticleDOI
Alias-and-Separate: Wideband Speech Coding Using Sub-Nyquist Sampling and Speech Separation
TL;DR: The authors propose a novel method for low-rate wideband speech coding that uses a standard narrowband codec, achieving subjective quality comparable to speech coded by wideband codecs at higher bitrates in a MUSHRA test.
Abstract:
Decimation of a discrete-time signal below the Nyquist rate without an appropriate lowpass filter produces a distortion called aliasing. If wideband speech sampled at 16 kHz is decimated by 2 to yield a signal sampled at 8 kHz with aliasing, the decimated signal is the sum of two speech-like signals: the narrowband speech covering 0-4 kHz and a spectrally flipped aliasing component folded down from the 4-8 kHz band. Recently, deep learning-based approaches have remarkably improved speech separation performance, suggesting that the narrowband and aliasing components can be separated. In this letter, we propose a novel method for low-rate wideband speech coding that uses a standard narrowband codec. Instead of coding wideband speech with a wideband codec at a limited bitrate, we decimate the input wideband speech, deliberately incurring aliasing, and encode it with a narrowband codec, allocating the entire bitrate budget to 0-4 kHz. After decoding the bitstream, we apply a speech separation technique to recover the narrowband and aliasing signals, which are then used to reconstruct the wideband speech by expansion, low/highpass filtering, and summation. Experimental results show that the proposed method achieves subjective quality comparable to speech coded by wideband codecs at higher bitrates in a subjective MUSHRA test.
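As a quick illustration of the folding effect the method exploits (a sketch, not the paper's code, assuming NumPy): decimating a 16 kHz signal by 2 without an anti-alias filter folds content above 4 kHz back into the 0-4 kHz band, spectrally flipped, so a 6 kHz tone reappears at 8 − 6 = 2 kHz.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs              # 1 second of signal at 16 kHz
x = np.sin(2 * np.pi * 6000 * t)    # 6 kHz tone, above the new Nyquist of 4 kHz

y = x[::2]                          # decimate by 2 with no anti-alias lowpass
fs_dec = fs // 2                    # decimated rate: 8 kHz

# Locate the spectral peak of the decimated signal (1 Hz bin resolution here)
spectrum = np.abs(np.fft.rfft(y))
peak_hz = np.argmax(spectrum) * fs_dec / len(y)
print(peak_hz)  # 2000.0: the 6 kHz tone aliases to 8000 - 6000 = 2000 Hz
```

The same folding applied to the 4-8 kHz band of speech yields the spectrally flipped, speech-like aliasing component that the separation network is trained to extract.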
References
Journal ArticleDOI
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo, Nima Mesgarani, et al.
TL;DR: A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures.
Journal ArticleDOI
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
TL;DR: Proposes the utterance-level permutation invariant training (uPIT) technique for speaker-independent multitalker speech separation; RNNs trained with uPIT can separate multitalker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender.
Proceedings ArticleDOI
SDR – Half-baked or Well Done?
TL;DR: Argues that the SDR metric from the BSS_eval toolkit is problematic for single-channel separation and proposes the scale-invariant signal-to-distortion ratio (SI-SDR) as a more robust measure.
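A minimal sketch of the SI-SDR computation described in the entry above, assuming NumPy (the function name `si_sdr` is mine, not the toolkit's): the estimate is projected onto the reference, and the ratio of the scaled target's power to the residual's power is reported in dB, which makes the measure invariant to rescaling of the estimate.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB: project the estimate onto the
    reference, then compare the scaled target to the residual."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10(np.sum(target**2) / np.sum(residual**2))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)           # reference signal
noisy = s + 0.1 * rng.standard_normal(16000)
print(round(si_sdr(noisy, s), 1))        # roughly 20 dB for 10% added noise
```

Because of the projection, `si_sdr(3 * noisy, s)` returns the same value as `si_sdr(noisy, s)`, which is the scale invariance the paper argues for.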
Proceedings ArticleDOI
TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation
Yi Luo, Nima Mesgarani, et al.
TL;DR: TasNet directly models the signal in the time domain using an encoder-decoder framework and performs source separation on nonnegative encoder outputs, which are then synthesized back to waveforms by the decoder.
Proceedings ArticleDOI
The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
TL;DR: The motivation and the processes involved in the design and recording of the Voice Bank corpus, specifically designed for the creation of personalised synthetic voices for individuals with speech disorders, are described.
Related Papers (5)
Non-intrusive Speech Quality Assessment for Super-wideband Speech Communication Networks
Gabriel Mittag, Sebastian Möller, et al.