Speech/Silence segmentation for real-time coding via rule based adaptive endpoint detection

doi:10.1109/ICASSP.1987.1169516

Proceedings Article

- Vol. 12, pp 1348-1351

TLDR

A new algorithmic technique is presented for efficiently making the endpoint decisions needed to separate and segment speech from noisy background environments. The technique is applied to silence compression, in which speech segments are encoded with a low bit-rate encoding scheme and silence information is characterized by a set of parameters.

Abstract:

A new algorithmic technique is presented for efficiently implementing the endpoint decisions necessary to separate and segment speech from noisy background environments. The algorithm utilizes a set of computationally efficient production rules that generate speech and noise metrics continuously from the input speech waveform. These production rules are based on statistical assumptions about the characteristics of the speech and noise waveforms and are generated via time-domain processing to achieve a zero-delay decision. An endpointer compares the speech and silence metrics using an adaptive thresholding scheme with a hysteresis characteristic to control the switching speed of the speech/silence decision. The paper further describes the application of this algorithm to silence compression of speech, in which speech segments are encoded with a low bit-rate encoding scheme and silence information is characterized by a set of parameters. In the receiver, the resulting packetized speech is reconstructed by decoding the speech segments and reconstructing the silence intervals through a noise substitution process in which the amplitude and duration of the background noise are defined by the silence parameters. A noise generation technique is described which utilizes an 18th-order polynomial to generate a spectrally flat pseudo-random sequence that is filtered to match the mean coloration of acoustical background noise. A technique is further described in which the speech/silence transitions are merged rather than switched to achieve maximum subjective performance of the compression technique. The silence compression algorithm has been implemented in a single DSP-20 signal processing chip using sub-band coding for speech encoding. Using this system, experiments were conducted to evaluate the performance of the technique and to verify the robustness of the endpoint detection and silence compression over a wide range of background noise conditions.
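The adaptive thresholding with hysteresis mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the production rules, so the metrics here are assumed to be simple smoothed sample magnitudes, and the class name `HysteresisEndpointer` and all parameter names and values are hypothetical. The sketch only shows the general idea of a per-sample (zero-delay) decision whose threshold follows the noise estimate and whose on and off ratios differ so the decision does not chatter.

```python
class HysteresisEndpointer:
    """Toy speech/silence detector: adaptive threshold plus hysteresis.

    Assumption: both metrics are exponentially smoothed magnitudes,
    standing in for the paper's (unspecified) production rules.
    """

    def __init__(self, alpha_speech=0.5, alpha_noise=0.02,
                 alpha_noise_in_speech=0.001, on_ratio=4.0, off_ratio=2.0):
        self.alpha_speech = alpha_speech              # fast tracker for the speech metric
        self.alpha_noise = alpha_noise                # slow tracker for the noise metric
        self.alpha_noise_in_speech = alpha_noise_in_speech  # even slower during speech
        self.on_ratio = on_ratio                      # threshold factor to enter speech
        self.off_ratio = off_ratio                    # lower factor to leave speech
        self.speech_metric = None
        self.noise_metric = None
        self.in_speech = False

    def step(self, sample):
        """Consume one sample and return True while speech is detected."""
        mag = abs(sample)
        if self.speech_metric is None:
            # Seed both metrics from the first sample.
            self.speech_metric = mag
            self.noise_metric = mag
        self.speech_metric += self.alpha_speech * (mag - self.speech_metric)
        # The noise estimate adapts much more slowly while speech is active.
        alpha = self.alpha_noise_in_speech if self.in_speech else self.alpha_noise
        self.noise_metric += alpha * (mag - self.noise_metric)
        # Hysteresis: the threshold factor depends on the current state.
        ratio = self.off_ratio if self.in_speech else self.on_ratio
        self.in_speech = self.speech_metric > ratio * self.noise_metric
        return self.in_speech


# Usage: a quiet background, a loud burst, then quiet again.
signal = [0.01] * 200 + [0.5] * 100 + [0.01] * 200
endpointer = HysteresisEndpointer()
decisions = [endpointer.step(x) for x in signal]
```

Because the threshold is a multiple of the running noise estimate, the same settings apply at different background levels, and the gap between `on_ratio` and `off_ratio` limits how quickly the decision can switch back and forth.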
