
Showing papers on "Voice activity detection published in 1975"


Journal ArticleDOI
01 Apr 1975
TL;DR: This paper presents several digital signal processing methods for representing speech, including simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques.
Abstract: This paper presents several digital signal processing methods for representing speech. Included among the representations are simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques. The advantages and disadvantages of each of these representations for various speech processing applications are discussed.
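
As a rough, illustrative companion to this list (not taken from the paper), the sketch below computes one simple example of each representation family on a synthetic frame, assuming an 8 kHz sampling rate and NumPy as the implementation vehicle:

```python
# Illustrative sketch only: one toy example per representation family.
import numpy as np

fs = 8000                                     # assumed sampling rate, Hz
t = np.arange(256) / fs
frame = 0.5 * np.sin(2 * np.pi * 150 * t) + 0.01 * np.random.randn(t.size)

# 1. Waveform coding: quantize the samples directly (8-bit uniform PCM).
pcm8 = np.round(frame * 127).astype(np.int8)

# 2. Time-domain features: short-time energy and zero-crossing rate.
energy = np.sum(frame ** 2)
zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2

# 3. Frequency-domain representation: windowed magnitude spectrum.
spectrum = np.abs(np.fft.rfft(frame * np.hamming(frame.size)))

# 4. Homomorphic representation: real cepstrum (inverse FFT of log spectrum).
cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))

# 5. Linear predictive coding: order-10 coefficients from the
#    autocorrelation normal equations (Levinson-Durbin in practice).
order = 10
r = np.correlate(frame, frame, "full")[frame.size - 1:frame.size + order]
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
lpc = np.linalg.solve(R, r[1:order + 1])
```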

238 citations


Patent
07 Jul 1975
TL;DR: In this article, a plurality of specialized digital signal processing techniques are employed to analyze in real time four speech channels in parallel and multiplex speech frame parameters of the channels into a single data output channel for transmission through a suitable medium.
Abstract: Method and apparatus for speech analysis and synthesis adapted for analyzing and multiplexing speech signals from a plurality of voice grade telephone lines for further transmission through a single voice grade telephone line. A plurality of specialized digital signal processing techniques are employed to analyze in real time four speech channels in parallel and multiplex speech frame parameters of the channels into a single data output channel for transmission through a suitable medium. The received data channel is demultiplexed and the speech frame parameters for the individual channels are utilized to synthesize, in parallel, the four speech signals. Certain of the digital processing techniques utilize the characteristics of speech signals to truncate conventional signal processing time while other processing techniques are substantially statistical analyses of speech to resolve ambiguities, particularly in making the voiced/unvoiced decision for a frame of analyzed speech data.
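
The multiplexing step can be pictured with a very small sketch; the channel count matches the abstract, but the frame-parameter fields and tagging scheme below are hypothetical, not the patent's actual frame format:

```python
# Hypothetical sketch of four-channel frame-parameter multiplexing; the
# patent's real frame format and parameter set are not reproduced here.
from typing import Dict, List, Tuple

Frame = Dict[str, float]            # e.g. {"pitch": 120.0, "gain": 0.3}

def multiplex(channel_frames: List[Frame]) -> List[Tuple[int, Frame]]:
    """Tag each channel's frame with its channel index and interleave."""
    return list(enumerate(channel_frames))

def demultiplex(stream: List[Tuple[int, Frame]], n_channels: int = 4) -> List[List[Frame]]:
    """Route tagged frames back to per-channel synthesizer queues."""
    queues: List[List[Frame]] = [[] for _ in range(n_channels)]
    for ch, frame in stream:
        queues[ch].append(frame)
    return queues
```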

60 citations



Journal ArticleDOI
TL;DR: A lattice representation of the segmentation is devised which allows for multiple choices that can be sorted out by higher-level processes, in order to deal effectively with acoustic recognition errors.
Abstract: Errors in acoustic-phonetic recognition occur not only because of the limited scope of the recognition algorithm, but also because certain ambiguities are inherent in analyzing the speech signal. Examples of such ambiguities in segmentation and labeling (feature extraction) are given. In order to allow for these phenomena and to deal effectively with acoustic recognition errors, we have devised a lattice representation of the segmentation which allows for multiple choices that can be sorted out by higher level processes. A description of the current acoustic-phonetic recognition program in the Bolt Beranek and Newman (BBN) Speech Understanding System is given, along with a specification of the parameters used in the recognition.
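
A segmentation lattice of this kind can be held in a very simple structure; the sketch below is illustrative only (not the BBN system's actual data structure) and just records competing labeled segments so that higher-level processes can choose among them:

```python
# Illustrative segment-lattice sketch: each edge spans a time interval and
# carries alternative phonetic labels with scores for later processes to rank.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SegmentEdge:
    start_ms: int
    end_ms: int
    labels: Dict[str, float]               # candidate label -> acoustic score

@dataclass
class SegmentLattice:
    edges: List[SegmentEdge] = field(default_factory=list)

    def add(self, start_ms: int, end_ms: int, labels: Dict[str, float]) -> None:
        self.edges.append(SegmentEdge(start_ms, end_ms, labels))

    def alternatives(self, t_ms: int) -> List[Tuple[str, float]]:
        """All (label, score) pairs whose segments cover time t_ms."""
        return [(lab, score)
                for e in self.edges if e.start_ms <= t_ms < e.end_ms
                for lab, score in e.labels.items()]
```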

23 citations


Journal ArticleDOI
TL;DR: The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling.
Abstract: This paper considers a sequential strategy for acoustic-phonetic speech analysis. Each analysis process is applied to an appropriately labeled speech segment and results in a possible sub-segmentation of the original segment. The segments resulting from the analysis are labeled according to the analysis results. An advantage of the strategy is that no more segments are considered than are actually differentiated by the analysis steps. The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling. The analysis sequence yields a structure for the syllabic units of the speech signal that may be used to retrieve similar syllabic units for detailed comparison.
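
The sequential strategy can be sketched as a small driver loop; the analyzers and labels below are hypothetical placeholders, not the paper's actual analysis steps:

```python
# Hedged sketch of the sequential strategy: each analysis step is tuned to
# one segment label and replaces matching segments by a finer sub-segmentation.
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]                    # (start_ms, end_ms, label)
Analyzer = Tuple[str, Callable[[Segment], List[Segment]]]

def sequential_analysis(segments: List[Segment],
                        analyzers: List[Analyzer]) -> List[Segment]:
    for applies_to, analyze in analyzers:
        refined: List[Segment] = []
        for seg in segments:
            # Only segments carrying the label this step is tuned for are split.
            refined.extend(analyze(seg) if seg[2] == applies_to else [seg])
        segments = refined
    return segments

# Purely illustrative step: split every "speech" segment into halves.
split_demo: Analyzer = ("speech",
                        lambda s: [(s[0], (s[0] + s[1]) // 2, "sonorant"),
                                   ((s[0] + s[1]) // 2, s[1], "obstruent")])
print(sequential_analysis([(0, 100, "speech")], [split_demo]))
```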

20 citations


Patent
Raymond H. Lanier
21 Aug 1975
TL;DR: In this article, a digital voice switch for detecting voice PCM samples in the presence of noise samples is disclosed, which includes a digital variable threshold generating means which adapts the threshold level to changes in the noise level.
Abstract: A digital voice switch for detecting voice PCM samples in the presence of noise samples is disclosed. The switch includes a digital variable threshold generating means which adapts the threshold level to changes in the noise level. Advantage is taken of the fact that, over a given interval of time T, speech will occur as random talk spurts separated by periods of silence, while noise (Gaussian distributed) will be continuous. This difference between speech and noise makes it possible to detect the noise level with respect to the voice switch threshold level.
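
The adaptive-threshold principle can be sketched roughly as follows; this is a hedged, floating-point illustration of the general idea, not the patented digital circuit, and the margin and smoothing constant are assumed values:

```python
# Sketch of an adaptive-threshold voice switch: the noise floor is tracked
# from sub-threshold frames (continuous noise), and a frame is declared
# voice when it exceeds that floor by a fixed margin (talk spurt).
import numpy as np

def voice_switch(frames, margin_db=9.0, alpha=0.95):
    """frames: iterable of PCM sample arrays; yields True when voice is detected."""
    noise = None
    for frame in frames:
        level = np.mean(np.abs(np.asarray(frame, dtype=float)))
        if noise is None:
            noise = level                            # seed the noise estimate
        threshold = noise * 10 ** (margin_db / 20)
        if level < threshold:
            # Sub-threshold frame: treat as noise and update the noise estimate.
            noise = alpha * noise + (1 - alpha) * level
            yield False
        else:
            yield True                               # talk spurt detected
```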

16 citations


Journal ArticleDOI
TL;DR: A real-time computer implementation of a fourth-order adaptive predictive coder (APC) is described which transmits speech at 6400 bits/s, achieves an intelligibility score of 87 percent on the diagnostic rhyme tests (DRT), and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
Abstract: This paper describes a real-time computer implementation of a fourth-order adaptive predictive coder (APC) which transmits speech at 6400 bits/s. To reduce the demands on the central processor, the terminal uses the average magnitude difference function (AMDF) for pitch extraction and calculates the predictor coefficients and reflection coefficients assuming stationarity of the input data. With 16-bit fixed-point arithmetic processing, the coder achieves an intelligibility score of 87 percent on the diagnostic rhyme tests (DRT) and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
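
For reference, the AMDF pitch measure mentioned above is simple to state; the sketch below is a floating-point illustration with assumed pitch search limits, not the terminal's 16-bit fixed-point implementation:

```python
# AMDF pitch-period sketch: the average magnitude difference function dips
# near multiples of the pitch period; its minimum over a lag range gives the
# pitch estimate. Search limits here (60-400 Hz) are assumed values.
import numpy as np

def amdf_pitch_period(frame, fs=8000, f_min=60, f_max=400):
    """Return an estimated pitch period in samples (the frame should span
    at least a couple of pitch periods)."""
    frame = np.asarray(frame, dtype=float)
    lags = np.arange(int(fs / f_max), int(fs / f_min) + 1)
    amdf = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    return lags[np.argmin(amdf)]
```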

13 citations


Journal ArticleDOI
M. Knudsen
TL;DR: The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s and introduces a new autocorrelation scheme with several valuable properties.
Abstract: The autocorrelation method for linear-predictive coding of speech [1] has been implemented in real time on the SPS-41, a commercially available system composed of three dissimilar microprocessors working in parallel. Using user-written microcode, one processor performs I/O and master control, the second handles loop indexing and counting, and the third does the actual arithmetic on data. Such parallelism allows 2 × 10^6 I/O operations and 4 × 10^6 multiplications/s, but actually realizing this potential requires fresh approaches to some old algorithms. Most important is a new autocorrelation scheme with several valuable properties. Using 16-bit fixed-point single-precision arithmetic to accumulate autocorrelation sums and invert the autocorrelation matrix presents problems which have been solved reasonably well. The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s.
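
The per-frame pipeline described here (256 samples in, 14 coefficients, 128-point log power spectrum out) looks roughly like this in floating point; the SPS-41 fixed-point scaling and the paper's modified autocorrelation scheme are not reproduced:

```python
# Floating-point sketch of the frame pipeline: autocorrelation ->
# Levinson-Durbin -> order-14 LPC -> 128-point log power spectrum.
import numpy as np

def lpc_log_spectrum(frame, order=14, nfft=256):
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, "full")[len(w) - 1:len(w) + order]

    # Levinson-Durbin recursion on r[0..order].
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0] + 1e-12
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a[1:i].copy()
        a[1:i] += k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k

    # All-pole model spectrum err / |A(e^jw)|^2 sampled at nfft//2 points.
    A = np.fft.rfft(a, nfft)[:nfft // 2]
    return 10.0 * np.log10(err / (np.abs(A) ** 2 + 1e-12))

spectrum_db = lpc_log_spectrum(np.random.randn(256))     # 128 values per frame
```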

12 citations



25 Aug 1975
TL;DR: An ultra-high performance programmable speech processor, consisting in the main of a custom designed 55-nsec microcomputer, has been designed and constructed.
Abstract: An ultra-high performance programmable speech processor, consisting in the main of a custom designed 55-nsec microcomputer, has been designed and constructed. To date, five real-time speech compression programs have been implemented and evaluated.

7 citations


Journal ArticleDOI
R. Becker, F. Poza
TL;DR: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system in which, through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream.
Abstract: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system. A major characteristic of this system is that through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream. The purpose of the acoustic processor is to verify or reject these hypotheses. This verification is done in two stages. First, digital filtering is done to classify each 10-ms segment as one of ten primitive classes. If the proposed word is consistent with the pattern of primitive classes at the corresponding point in the acoustic stream, further analysis is done using linear predictive coding and other digital filters. The results of this analysis are used to segment the acoustic signal and to further classify the voiced segments. Because this segmentation and classification can be tailored for each word, difficult analysis problems caused by coarticulation between adjacent sounds can be successfully solved. When combined with a sophisticated linguistic processor, these acoustical processing methods yield correct understanding of natural language utterances.
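
A toy version of the first (screening) stage might look like the following; the actual ten primitive classes, digital filters, and thresholds are not given in the abstract, so the labels and thresholds here are illustrative assumptions:

```python
# Illustrative screening stage: label each 10-ms frame coarsely, then check
# whether a hypothesized word's expected label pattern occurs in order.
import numpy as np

def coarse_label(frame):
    """Very coarse frame label from energy and zero-crossing rate."""
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    if energy < 1e-4:                 # threshold assumes samples scaled to [-1, 1]
        return "silence"
    return "fricative-like" if zcr > 0.25 else "voiced-like"

def hypothesis_consistent(frame_labels, expected_pattern):
    """True if the expected coarse pattern appears, in order, in the labels."""
    it = iter(frame_labels)
    return all(any(lab == want for lab in it) for want in expected_pattern)
```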

Patent
27 May 1975
TL;DR: In this article, a delay line which disappears under voice control is described; it comprises a plurality of shift registers that delay the incoming voice signal to permit transmission of the voice scrambler preamble.
Abstract: A delay line which disappears under voice control is disclosed. It is comprised of a plurality of shift registers that delay the incoming voice signal to permit transmission of the voice scrambler preamble. The delay is removed over a period of time during speech transmission by removing a segment of the delay line each time a pause of a specified length is detected in the speech.
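
In software terms the "disappearing" delay behaves roughly as sketched below; the patent describes shift-register hardware, and the segment size, initial delay, and pause detector here are illustrative assumptions:

```python
# Software analogue of the disappearing delay line: a FIFO whose length is
# cut by one segment whenever a sufficiently long pause is detected, until
# no delay remains.
from collections import deque

class DisappearingDelay:
    def __init__(self, total_delay=4000, segment=500, pause_needed=800):
        self.buf = deque([0] * total_delay)   # initial delay, in samples
        self.segment = segment                # samples removed per detected pause
        self.pause_needed = pause_needed      # pause length that triggers a cut
        self.silent_run = 0

    def process(self, sample, is_speech):
        """Push one sample in, return one (delayed) sample out."""
        self.silent_run = 0 if is_speech else self.silent_run + 1
        if self.silent_run >= self.pause_needed and len(self.buf) > self.segment:
            # Long enough pause: discard one segment of the remaining delay.
            for _ in range(self.segment):
                self.buf.popleft()
            self.silent_run = 0
        self.buf.append(sample)
        return self.buf.popleft()
```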

01 Jun 1975
TL;DR: In this report, INTEL, a technique for enhancing speech obscured by wideband noise, has been improved to the point that it will be realistic to implement it as a real-time system for use in practical applications.
Abstract: Two of the most serious types of interference in speech communications are wideband noise and the speech of a competing talker. This report describes research on techniques for coping with these types of interference. INTEL, a technique for enhancing speech obscured by wideband noise, has now been improved to the point that it will be realistic to implement it as a real-time system for use in practical applications. A technique has also been developed for dealing with interference from a competing talker. (The current technique is restricted to vocalic utterances). Separation is done by selecting the components of the desired voice in the Fourier transform of the input. In implementing this process, techniques have been developed for resolving overlapping spectrum components, for determining pitches of both talkers, and for assuring consistent separation. This report describes the improvements in INTEL and the implementation of the two-talker process, and discusses the prospects for basing a general two-talker process on these techniques.
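
The component-selection step can be caricatured as harmonic selection given a pitch estimate for the desired talker; the sketch below keeps only FFT bins near that talker's harmonics and is offered only as an illustration (the report's techniques for resolving overlapping components and tracking both pitches are not reproduced):

```python
# Illustrative harmonic-selection sketch: retain FFT bins near harmonics of
# the desired talker's pitch and resynthesize the frame.
import numpy as np

def select_talker(frame, pitch_hz, fs=8000, half_width_hz=25.0):
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # Distance from each bin to the nearest harmonic of the desired pitch.
    dist = np.abs(freqs - pitch_hz * np.round(freqs / pitch_hz))
    mask = dist <= half_width_hz
    return np.fft.irfft(spec * mask, len(frame))
```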

01 Mar 1975
TL;DR: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time by selecting a method that intrinsically is independent of the spectrum characteristics of the communication channel or tape being monitored.
Abstract: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time. Two approaches to reducing auditory fatigue were studied: (1) automatic detection of speech and (2) automatic enhancement of the S/N of speech. The first approach was aimed at reducing the amount of time spent in simply listening for speech to occur. After examining several methods of detecting speech, we selected a method that intrinsically is independent of the spectrum characteristics of the communication channel or tape being monitored, of the speech characteristics of the talker, and of the language. The technique proved to be capable of detecting speech in wideband noise at an S/N of -6 dB. Its major disadvantage appears to be that the complexity of the required computations demands the use of a computer to implement the method.

Journal ArticleDOI
TL;DR: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.
Abstract: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.
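
The abstract does not name the algorithm, but one well-known real-time algorithm for adaptive linear prediction is the LMS (stochastic-gradient) predictor; the sketch below is offered only as an example of that family, not as the specific algorithm the paper discusses:

```python
# LMS adaptive linear predictor sketch: coefficients are updated each sample
# in the direction that reduces the instantaneous prediction error.
import numpy as np

def lms_predict(signal, order=8, mu=0.01):
    """Return the per-sample prediction residual (mu assumes roughly unit-power input)."""
    signal = np.asarray(signal, dtype=float)
    w = np.zeros(order)                          # predictor coefficients
    residual = np.zeros(len(signal))
    for n in range(order, len(signal)):
        past = signal[n - order:n][::-1]         # most recent sample first
        e = signal[n] - np.dot(w, past)
        w += mu * e * past                       # stochastic-gradient update
        residual[n] = e
    return residual
```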

Journal ArticleDOI
TL;DR: Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance, which confirm the superior speech quality attainable by this method.
Abstract: The first‐formant region of the speech spectrum must be characterized with a high degree of accuracy in order to attain good speech quality and naturalness in a narrow‐band speech processing system. This consideration led to the concept of using linear predictive coding for the first‐formant region, vocoder channels for the upper spectrum, and detection and coding of the excitation function, to establish a narrow‐band voice digitizer. This concept has advantages in the tradeoff of speech quality versus data rate, in comparison with the use of either channel vocoding or linear prediction alone. Such a triple‐function voice coder (TRIVOC) has been implemented and investigated using a CSP‐30 Digital Signal Processor. Processing algorithms have been established to permit comparison of various combinations of baseband and vocoder channels, LPC coefficients, and data rates ranging from 2400 to 4800 bits per second. Subjective results have confirmed the superior speech quality attainable by this method. Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance.
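
The hybrid analysis can be pictured with the sketch below; the band edge, LPC order, and channel count are illustrative assumptions rather than TRIVOC's actual parameters, and excitation coding is omitted entirely:

```python
# Illustrative hybrid analysis in the spirit of a baseband-LPC-plus-vocoder
# split: LPC models the low band containing the first formant, and simple
# band energies ("vocoder channels") describe the upper spectrum.
import numpy as np

def hybrid_analysis(frame, fs=8000, split_hz=1000, lpc_order=4, n_channels=8):
    w = frame * np.hamming(len(frame))
    spec = np.abs(np.fft.rfft(w)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

    # Low band: autocorrelation via the inverse FFT of the masked power
    # spectrum (Wiener-Khinchin), then the LPC normal equations.
    low_power = np.where(freqs <= split_hz, spec, 0.0)
    r = np.fft.irfft(low_power, len(frame))[:lpc_order + 1]
    R = np.array([[r[abs(i - j)] for j in range(lpc_order)]
                  for i in range(lpc_order)]) + 1e-9 * np.eye(lpc_order)
    lpc = np.linalg.solve(R, r[1:lpc_order + 1])

    # Upper spectrum: mean energy in equal-width vocoder channels.
    edges = np.linspace(split_hz, fs / 2, n_channels + 1)
    channels = np.array([spec[(freqs >= lo) & (freqs < hi)].mean()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    return lpc, channels
```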


ReportDOI
10 Jan 1975
TL;DR: This study, using as a research vehicle the homomorphic vocoder algorithm, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal.
Abstract: A problem basic to the development of all digital telecommunication systems is the efficient digital encoding of speech signals for transmission. Many speech digitization algorithms have been proposed. This study, using as a research vehicle the homomorphic vocoder algorithm, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal. The results reported are fundamental in that they are applicable to any speech digitization algorithm. The resolutions required for typical speaker situations are summarized.
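
For background, the homomorphic (cepstral) analysis at the heart of such a vocoder can be sketched as follows; the frame length fixes the time resolution and the number of retained cepstral coefficients fixes the spectral resolution, which are exactly the knobs the study varies (the particular values below are illustrative, not the study's):

```python
# Homomorphic analysis sketch: log-magnitude spectrum -> real cepstrum ->
# low-time liftering -> smoothed spectral envelope. The frame length and
# the lifter cutoff n_cepstral are the time/frequency resolution controls.
import numpy as np

def homomorphic_envelope(frame, n_cepstral=30):
    w = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(w)) + 1e-12)
    cep = np.fft.irfft(log_mag)                # real cepstrum of the frame
    cep[n_cepstral:-n_cepstral] = 0.0          # keep only the envelope part
    return np.fft.rfft(cep).real               # smoothed log-magnitude spectrum
```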