Proceedings Article

Speech coding and enhancement using quantized compressive sensing measurements

TL;DR: This work characterizes a speech codec in a Compressive Sensing (CS) framework, demonstrating simultaneous compression and de-noising of speech by CS and appropriate quantization of CS measurements to design a medium bit-rate codec.
Abstract: Medium bit-rate hybrid speech coding schemes have gained much interest in recent years, and many of them have been standardized for various applications. This work characterizes a speech codec in a Compressive Sensing (CS) framework. We mainly demonstrate two aspects: 1) simultaneous compression and de-noising of speech by CS, and 2) appropriate quantization of CS measurements to design a medium bit-rate codec. The proposed scheme renders better quality speech than CELP, the widely used hybrid coding scheme, at the same bit rates. The CS speech codec has the added advantages of inherent noise suppression and easy scalability, without complex parameter extraction or voice activity detection.
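
The pipeline outlined in the abstract (take CS measurements, quantize them, reconstruct a sparse estimate at the receiver) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic frame that is exactly sparse in a DCT basis, the Gaussian measurement matrix, the 6-bit uniform quantizer, and the use of iterative soft thresholding (ISTA) as a simple stand-in for the reconstruction solver. It does not model the de-noising experiments or the codec's actual bit allocation.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II matrix C, so coeffs = C @ x and x = C.T @ coeffs."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def quantize_uniform(y, bits):
    """Mid-rise uniform scalar quantizer with 2**bits levels over [-peak, peak]."""
    peak = np.max(np.abs(y))
    if peak == 0.0:
        return y.copy()
    levels = 2 ** bits
    step = 2.0 * peak / levels
    idx = np.clip(np.floor(y / step), -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step

def ista(a, y, lam, iters=1000):
    """Iterative soft thresholding for min_s 0.5*||y - a s||_2^2 + lam*||s||_1."""
    lip = np.linalg.norm(a, 2) ** 2          # Lipschitz constant of the gradient
    s = np.zeros(a.shape[1])
    for _ in range(iters):
        z = s - a.T @ (a @ s - y) / lip      # gradient step on the quadratic term
        s = np.sign(z) * np.maximum(np.abs(z) - lam / lip, 0.0)  # shrinkage
    return s

# Illustrative sizes: frame length n, measurements m, sparsity k, bits per measurement.
rng = np.random.default_rng(0)
n, m, k, bits = 256, 96, 8, 6

c = dct_basis(n)
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = rng.normal(0.0, 1.0, k)
x = c.T @ s_true                              # synthetic frame, sparse in the DCT domain

phi = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))  # M x N measurement matrix, M << N
y = phi @ x                                   # "encoder": M measurements
y_q = quantize_uniform(y, bits)               # quantized measurements sent to the receiver

a = phi @ c.T                                 # receiver model: y = phi @ C.T @ s
s_hat = ista(a, y_q, lam=0.02)
x_hat = c.T @ s_hat

snr_db = 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
print(f"bits per frame: {m * bits}, reconstruction SNR: {snr_db:.1f} dB")
```

In a sketch like this, the bit rate scales with the number of measurements and the bits per measurement, which gives one intuition for why a CS codec is easy to scale.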
Citations
Journal Article
TL;DR: Comparison with recent state-of-the-art methods is performed in terms of segmental signal to noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility.

40 citations

Journal Article
TL;DR: This study introduces a new method for speech signal encryption and compression in a single step using compressive sensing (CS), and the contourlet transform is used to increase the sparsity of the signal required by CS.
Abstract: This study introduces a new method for speech signal encryption and compression in a single step. The combined compression/encryption procedures are accomplished using compressive sensing (CS). The contourlet transform is used to increase the sparsity of the signal required by CS. Due to its randomness properties and very high sensitivity to initial conditions, a chaotic system is used to generate the sensing matrix of CS. This largely increases the key size of encryption, to 10^135 when the logistic map is used. A spectral segmental signal-to-noise ratio of -36.813 dB is obtained as a measure of encryption strength. The quality of reconstructed speech is measured by signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ). For a 60% compression ratio, the proposed method gives 48.203 dB SNR and 4.437 PESQ for voiced speech segments. However, for continuous speech (voiced and unvoiced), it gives 41.097 dB SNR and 4.321 PESQ.
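
To make the idea of a chaos-generated sensing matrix concrete, here is a small sketch that fills a matrix from a logistic-map sequence and shows its sensitivity to the initial condition. The function names, the map parameter r = 3.99, the burn-in length, and the centring and scaling are all illustrative assumptions; the abstract does not specify how the authors construct or normalize their matrix.

```python
import numpy as np

def logistic_sequence(x0, r, length, burn_in=1000):
    """Iterate x <- r*x*(1-x), discard burn_in samples, return the next `length`."""
    x = float(x0)
    out = np.empty(length)
    for i in range(burn_in + length):
        x = r * x * (1.0 - x)
        if i >= burn_in:
            out[i - burn_in] = x
    return out

def chaotic_sensing_matrix(m, n, x0, r=3.99):
    """Hypothetical helper: m x n matrix built from a logistic-map orbit.

    The pair (x0, r) plays the role of the secret key in this sketch.
    """
    seq = logistic_sequence(x0, r, m * n)
    seq = (seq - seq.mean()) / seq.std()      # zero mean, unit variance (assumed scaling)
    return seq.reshape(m, n) / np.sqrt(m)

phi_a = chaotic_sensing_matrix(32, 128, x0=0.3)
phi_b = chaotic_sensing_matrix(32, 128, x0=0.3 + 1e-12)  # key perturbed very slightly
phi_c = chaotic_sensing_matrix(32, 128, x0=0.3)          # same key again

print("same key reproduces the matrix:", np.array_equal(phi_a, phi_c))
print("max entry difference for perturbed key:", np.max(np.abs(phi_a - phi_b)))
```

A receiver holding the exact key can regenerate the matrix, while a key that is off by a tiny amount produces an unrelated matrix, which is the property the encryption relies on.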

29 citations

Journal Article
01 May 2019 - Heliyon
TL;DR: A novel scalable speech coding scheme based on Compressive Sensing, which can operate at bit rates from 3.275 to 7.275 kbps, is designed and implemented; it offers listening quality for reconstructed speech similar to that of the Adaptive Multi-Rate Narrowband (AMR-NB) codec at 6.7 kbps and Enhanced Voice Services (EVS) at 7.2 kbps.

7 citations


Cites background or methods from "Speech coding and enhancement using..."

  • ...In [14], the CS concept is used for both speech coding...

    [...]

  • ...[14], the low bit rate of our proposed scheme is due to the vector quantization of measurements....

    [...]

  • ...A comparison of the proposed speech coding with CS based compression schemes given in [14] and [33] in terms of bit rate, Compression Ratio (CR), complexity and reconstruction accuracy has been carried out....

    [...]

  • ...In both [14] and [33], the signal transformation prior to sensing measurements increases complexity....

    [...]

  • ...The proposed method is also compared with CS based speech compression mechanisms proposed in [14] and [33]....

    [...]

Proceedings Article
01 Oct 2018
TL;DR: CS-based enhancement is compared with three state-of-the-art methods in terms of segmental SNR, root mean square error, and perceptual evaluation of speech quality (PESQ), and shows promising performance.
Abstract: A procedure based on compressed sensing (CS) is proposed for speech enhancement. The procedure requires no prior noise estimation. This approach is motivated by the fact that CS allows the recovery of only the sparse signal in the presence of a non-sparse signal (noise). An application to an Arabic speech signal corrupted with white Gaussian noise is studied. CS-based enhancement is compared with three state-of-the-art methods in terms of segmental SNR, root mean square error, and perceptual evaluation of speech quality (PESQ), and shows promising performance.

4 citations


Cites methods from "Speech coding and enhancement using..."

  • ...A few CS-based speech enhancement methods have also been proposed [29]–[33]....

    [...]

Journal Article
TL;DR: Results show that an optimal selection of the frame length, the shift rate, and the number of iterations enhances the quality of the reconstructed signals.
Abstract: Signal reconstruction from a given sequence of short-time Fourier transform magnitude spectra without phase information has been a challenging topic for many years. The key issue is how to invert a sequence of overlapping magnitude spectra containing minimal phase data to generate a real-valued signal free of audible artifacts. Yet practical implementations are still not able to do that accurately. Based on an implementation of the classical RTISI method for a variety of signal types, including both monophonic and polyphonic audio signals such as speech and music, this study aims to determine the optimal conditions required to reconstruct a signal from magnitude spectra, to understand the relevance of the contribution of each parameter, and to take into account the recording conditions of the original signal. Results show that an optimal selection of the frame length, the shift rate, and the number of iterations enhances the quality of the reconstructed signals.

4 citations

References
Journal Article
TL;DR: This paper proposes gradient projection algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems and tests variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method.
Abstract: Many problems in signal processing and statistical inference involve finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparseness-inducing regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is de-emphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance.
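
For orientation, the objective and its bound-constrained reformulation mentioned in the abstract can be written out as follows. The symbols (y for the observations, A for the system matrix, x for the sparse unknown of length n, τ for the regularization weight) are notation chosen here and do not appear in the listing; this is a sketch of the standard positive/negative-part splitting rather than a full account of the paper.

```latex
% Quadratic error term plus sparseness-inducing (l1) regularizer
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\,\lVert y - A x \rVert_2^2 + \tau\,\lVert x \rVert_1

% Split x into nonnegative parts: x = u - v, with u \ge 0 and v \ge 0
\min_{u,\,v} \; \tfrac{1}{2}\,\lVert y - A(u - v) \rVert_2^2
      + \tau\,\mathbf{1}_n^{\top} u + \tau\,\mathbf{1}_n^{\top} v
\quad \text{s.t.} \quad u \ge 0,\; v \ge 0

% Stacked form: a bound-constrained quadratic program (constant term dropped)
z = \begin{bmatrix} u \\ v \end{bmatrix}, \qquad
b = A^{\top} y, \qquad
c = \tau\,\mathbf{1}_{2n} + \begin{bmatrix} -b \\ \phantom{-}b \end{bmatrix}, \qquad
B = \begin{bmatrix} A^{\top}A & -A^{\top}A \\ -A^{\top}A & \phantom{-}A^{\top}A \end{bmatrix}

\min_{z} \; c^{\top} z + \tfrac{1}{2}\, z^{\top} B z
\quad \text{s.t.} \quad z \ge 0
```

Because the only constraint is z ≥ 0, projecting onto the feasible set is just clipping negative entries to zero, which is what makes gradient projection inexpensive per iteration.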

3,488 citations


"Speech coding and enhancement using..." refers methods in this paper

  • ...At the receiver, the sparse estimate of input signal was obtained using any of the non-linear CS reconstruction methods (GPSR [12] was...

    [...]

Journal Article
TL;DR: A noisy speech corpus is developed suitable for evaluation of speech enhancement algorithms encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms.

634 citations


"Speech coding and enhancement using..." refers background in this paper

  • ...Further, let Φ be an M × N matrix, (M≪N), which is typically called sensing or measurement matrix which satisfy certain mathematical properties such as Restricted Isometric Property (RIP), spark etc [13]....

    [...]

Journal Article
01 Oct 1994
TL;DR: The objective of this paper is to provide a tutorial overview of speech coding methodologies with emphasis on those algorithms that are part of the recent low-rate standards for cellular communications.
Abstract: The past decade has witnessed substantial progress towards the application of low-rate speech coders to civilian and military communications as well as computer-related voice applications. Central to this progress has been the development of new speech coders capable of producing high-quality speech at low data rates. Most of these coders incorporate mechanisms to: represent the spectral properties of speech, provide for speech waveform matching, and "optimize" the coder's performance for the human ear. A number of these coders have already been adopted in national and international cellular telephony standards. The objective of this paper is to provide a tutorial overview of speech coding methodologies with emphasis on those algorithms that are part of the recent low-rate standards for cellular communications. Although the emphasis is on the new low-rate coders, we attempt to provide a comprehensive survey by covering some of the traditional methodologies as well. We feel that this approach will not only point out key references but will also provide valuable background to the beginner. The paper starts with a historical perspective and continues with a brief discussion on the speech properties and performance measures. We then proceed with descriptions of waveform coders, sinusoidal transform coders, linear predictive vocoders, and analysis-by-synthesis linear predictive coders. Finally, we present concluding remarks followed by a discussion of opportunities for future research.

461 citations


"Speech coding and enhancement using..." refers background in this paper

  • ...They are simple but they provide lesser compression [1]....

    [...]

  • ...Parametric coders are error prone and they yield poor quality speech [1]....

    [...]

Book
01 Feb 1995
TL;DR: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems, including an in-depth examination of the important topic of code excited linear prediction (CELP).
Abstract: From the Publisher: A detailed account of the most recently developed digital speech coders designed specifically for use in evolving communications systems. Discusses the variety of speech coders utilized with such new systems as MBE INMARSAT-M. Includes an in-depth examination of the important topic of code excited linear prediction (CELP).

453 citations

Book
11 Jun 2008
TL;DR: This book reviews relevant backgrounds and reports research aimed at increasing the robustness of single- and multi-modal biometric identity verification systems and can serve as a useful primer for face and speech processing, as well as information fusion.
Abstract: Over the last decade, interest in biometric based identification and verification systems has increased considerably. One application is the use of speech signals, face images or fingerprints in order to supplement security systems based on passwords. Biometric recognition can also be applied to other areas, such as passport control (immigration checkpoints), forensic work (to determine whether a biometric sample belongs to a suspect) and law enforcement applications (e.g. surveillance). While biometric systems based on face images and/or speech signals can be effective, their performance can degrade in the presence of challenging conditions. In face based systems this can be in the form of a change in the illumination direction and/or face pose variations. Multi-modal systems use more than one biometric at the same time. This is done for two main reasons -- to achieve better robustness and to increase discrimination power. This book can serve as a useful primer for face and speech processing, as well as information fusion. It reviews relevant backgrounds and reports research aimed at increasing the robustness of single- and multi-modal biometric identity verification systems.

105 citations