Proceedings Article

Speech/Silence segmentation for real-time coding via rule based adaptive endpoint detection

J. Lynch and two co-authors
Vol. 12, pp. 1348-1351
TL;DR
A new algorithmic technique is presented for efficiently implementing the end-point decisions needed to separate and segment speech from noisy background environments. It is applied to silence compression of speech, in which speech segments are encoded with a low-bit-rate encoding scheme and silence information is characterized by a set of parameters.
Abstract
A new algorithmic technique is presented for efficiently implementing the end-point decisions necessary to separate and segment speech from noisy background environments. The algorithm uses a set of computationally efficient production rules to generate speech and noise metrics continuously from the input speech waveform. These production rules are based on statistical assumptions about the characteristics of the speech and noise waveforms and are generated via time-domain processing to achieve a zero-delay decision. An end-pointer compares the speech and silence metrics using an adaptive thresholding scheme with a hysteresis characteristic that controls the switching speed of the speech/silence decision. The paper further describes the application of this algorithm to silence compression of speech, in which speech segments are encoded with a low-bit-rate encoding scheme and silence information is characterized by a set of parameters. At the receiver, the resulting packetized speech is reconstructed by decoding the speech segments and reconstructing the silence intervals through a noise substitution process in which the amplitude and duration of the background noise are defined by the silence parameters. A noise generation technique is described that uses an 18th-order polynomial to generate a spectrally flat pseudo-random sequence, which is then filtered to match the mean coloration of acoustical background noise. A technique is also described in which the speech/silence transitions are merged rather than switched, to maximize the subjective performance of the compression technique. This silence compression algorithm has been implemented on a single DSP-20 signal processing chip, using sub-band coding for speech encoding. Using this system, experiments were conducted to evaluate the performance of the technique and to verify the robustness of the endpoint detection and silence compression over a wide range of background noise conditions.
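The abstract describes the end-pointer only at a high level: speech and noise metrics are computed continuously in the time domain, compared against an adaptive threshold, and a hysteresis characteristic limits how quickly the speech/silence decision can switch. The following is a minimal sketch of that general idea, not the paper's production-rule set. The frame-energy metric, the noise-floor tracking rule, and every name and value here (`Endpointer`, `on_db`, `off_db`, `hangover`, `rise`, `rise_speech`) are hypothetical. The decision for each frame uses only the current and past frames, which mirrors the zero-delay property the abstract claims; the sketch also assumes the input starts in silence so the first frame can seed the noise estimate.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Endpointer:
    """Hypothetical frame-based speech/silence end-pointer (not the paper's rules)."""
    on_db: float = 9.0          # assumed margin above the noise floor to enter "speech"
    off_db: float = 4.0         # assumed lower margin for leaving "speech" (hysteresis gap)
    hangover: int = 8           # assumed frames below off_db before declaring silence
    rise: float = 0.01          # assumed upward noise-floor adaptation rate in silence
    rise_speech: float = 0.001  # assumed slower upward adaptation during speech
    noise_db: Optional[float] = field(default=None, init=False)
    in_speech: bool = field(default=False, init=False)
    _hold: int = field(default=0, init=False)

    def update(self, frame: Sequence[float]) -> bool:
        """Return the decision for this frame using only current and past frames."""
        energy = sum(x * x for x in frame) / max(len(frame), 1)
        level_db = 10.0 * math.log10(energy + 1e-12)

        # Noise metric: follow drops immediately, follow rises slowly.
        if self.noise_db is None or level_db < self.noise_db:
            self.noise_db = level_db
        else:
            rate = self.rise_speech if self.in_speech else self.rise
            self.noise_db += rate * (level_db - self.noise_db)

        # Adaptive threshold with hysteresis: enter above the high margin,
        # leave only after `hangover` consecutive frames below the low margin.
        if level_db > self.noise_db + self.on_db:
            self.in_speech = True
            self._hold = self.hangover
        elif self.in_speech:
            if level_db < self.noise_db + self.off_db:
                self._hold -= 1
                if self._hold <= 0:
                    self.in_speech = False
            else:
                self._hold = self.hangover
        return self.in_speech
```

Using two thresholds plus a hangover count is one common way to get hysteresis: brief dips between words do not end a speech segment, and a level that hovers near a single threshold does not cause the decision to chatter.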

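The abstract says comfort noise is produced from an 18th-order polynomial that yields a spectrally flat pseudo-random sequence, which is then filtered to match the average coloration of room noise. A maximal-length linear feedback shift register (LFSR) is the usual way to get such a sequence from a polynomial. The sketch below assumes the primitive trinomial x^18 + x^7 + 1, because the abstract does not give the actual polynomial. It also uses a one-pole low-pass filter as a stand-in for the noise-matching filter and an `rms` argument as a stand-in for the transmitted silence amplitude parameter. The names `lfsr_step`, `lfsr_bits`, `comfort_noise`, `pole`, and `rms` are hypothetical.

```python
import math
from typing import Iterator, List

# Assumption: x^18 + x^7 + 1 is a primitive trinomial, so every nonzero
# 18-bit state lies on one cycle of length 2^18 - 1 = 262143.
# The polynomial actually used in the paper is not stated in the abstract.
DEGREE = 18
TAP = 7
MASK = (1 << DEGREE) - 1


def lfsr_step(state: int) -> int:
    """Advance a Fibonacci LFSR implementing s[n+18] = s[n+7] XOR s[n]."""
    feedback = (state ^ (state >> TAP)) & 1
    return (state >> 1) | (feedback << (DEGREE - 1))


def lfsr_bits(seed: int = 1) -> Iterator[int]:
    """Yield the maximal-length bit sequence, one bit per step."""
    state = seed & MASK
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        yield state & 1
        state = lfsr_step(state)


def comfort_noise(n: int, rms: float, pole: float = 0.9, seed: int = 1) -> List[float]:
    """Spectrally flat +/-1 sequence, colored by a one-pole low-pass, scaled to `rms`.

    The one-pole filter is a placeholder for a filter matched to the mean
    coloration of room noise; `rms` stands in for the transmitted silence
    amplitude parameter. Both are hypothetical.
    """
    gain = math.sqrt(1.0 - pole * pole)  # unit output variance for unit-variance white input
    bits = lfsr_bits(seed)
    y = 0.0
    out: List[float] = []
    for _ in range(n):
        x = 1.0 if next(bits) else -1.0
        y = pole * y + gain * x
        out.append(rms * y)
    return out
```

Because the register cycles through every nonzero state before repeating, each full period contains 2^17 ones and 2^17 - 1 zeros, so the mapped +/-1 sequence is almost exactly zero-mean. This is what makes it a reasonable flat-spectrum source before coloring.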

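The abstract states that speech/silence transitions are merged rather than switched, but it does not say how. One simple way to merge two signals is a linear crossfade over a short overlap. The sketch below assumes that both the decoded speech and the substituted noise are available as sample lists at the transition. The linear ramp, the `overlap` length, and the name `merge_transition` are assumptions for illustration.

```python
from typing import List, Sequence


def merge_transition(outgoing: Sequence[float], incoming: Sequence[float],
                     overlap: int) -> List[float]:
    """Join two segments with a linear crossfade instead of a hard switch.

    The last `overlap` samples of `outgoing` are blended with the first
    `overlap` samples of `incoming`; the ramp weights exclude 0 and 1.
    """
    if overlap <= 0:
        return list(outgoing) + list(incoming)
    if overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap longer than a segment")
    head = list(outgoing[:-overlap])
    tail = list(incoming[overlap:])
    blended: List[float] = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight given to the incoming segment
        a = outgoing[len(outgoing) - overlap + i]
        b = incoming[i]
        blended.append((1.0 - w) * a + w * b)
    return head + blended + tail
```

The same function can be used in both directions, speech into noise and noise into speech, so the output never jumps abruptly from one signal to the other.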
Citations
Journal Article

Single channel speech enhancement based on masking properties of the human auditory system

TL;DR: This paper addresses the problem of single-channel speech enhancement at very low signal-to-noise ratios (SNRs below 10 dB) with a new, computationally efficient algorithm based on masking properties of the human auditory system, which gives improved results over classical subtractive-type algorithms.
Patent

Method and apparatus for performing speech frame encoding mode selection in a variable rate encoding system

TL;DR: This patent describes a method for dynamically adjusting the average output data rate of a speech coder with minimal impact on the quality of the input speech; the method can be applied to both unvoiced speech and temporally masked speech.
Journal Article

A robust algorithm for accurate endpointing of speech signals

TL;DR: After an overview of the literature, this paper describes a robust new algorithm for accurate endpointing of speech signals that uses simple measures based on energy and zero-crossing rate for speech/silence detection.
Proceedings Article

SpeechSkimmer: interactively skimming recorded speech

TL;DR: This paper presents a multi-level approach to auditory skimming, along with user interface techniques for interacting with the audio and providing feedback, and a prototype user interface for skimming speech.
Dissertation

Interactively skimming recorded speech

TL;DR: SpeechSkimmer uses simple speech processing techniques to let a user hear recorded sounds quickly and at several levels of detail, and provides continuous real-time control of the speed and detail level of the audio presentation.
References
Journal Article

A statistical analysis of on-off patterns in 16 conversations

TL;DR: This is a summary of data from an extensive analysis of on-off speech patterns in 16 experimental telephone conversations, determined by a fixed threshold speech detector having certain rules for rejecting noise and for filling in short gaps.
Journal Article

A technique for investigating on-off patterns of speech

TL;DR: The detection technique developed here is considered to be an improvement over conventional methods, but still yields data whose significance is uncertain, and it may be that a simple automatic speech detecting technique using fixed parameters is inadequate for some purposes.
Journal Article

Real-Time Speech Coding

TL;DR: The approach and methodology for building real-time hardware for coding techniques ranging from low to high complexity are discussed, with examples of realizations given for each approach.
Journal Article

Room Noise Spectra at Subscribers' Telephone Locations

TL;DR: An average spectrum giving the absolute sound pressure as a function of frequency is presented for a given sound level of room noise. Results of similar measurements made in two Bell System office buildings in New York are also given and compared with the Philadelphia data.
Journal Article

Digital Voice Storage in a Microprocessor

TL;DR: It is found that silent intervals corresponding to 20-40 percent of the duration of continuous input speech can be usefully eliminated and this latitude can significantly alleviate buffer requirements in a packet transmission system for voice.