
Showing papers on "Voice activity detection published in 1975"


Journal ArticleDOI
01 Apr 1975
TL;DR: This paper presents several digital signal processing methods for representing speech, including simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques.
Abstract: This paper presents several digital signal processing methods for representing speech. Included among the representations are simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques. The advantages and disadvantages of each of these representations for various speech processing applications are discussed.
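
As a rough, illustrative companion to this list (not taken from the paper), the sketch below computes one simple example of each representation family on a synthetic frame, assuming an 8 kHz sampling rate and NumPy as the implementation vehicle:

```python
# Illustrative sketch only: one toy example per representation family.
import numpy as np

fs = 8000                                     # assumed sampling rate, Hz
t = np.arange(256) / fs
frame = 0.5 * np.sin(2 * np.pi * 150 * t) + 0.01 * np.random.randn(t.size)

# 1. Waveform coding: quantize the samples directly (8-bit uniform PCM).
pcm8 = np.round(frame * 127).astype(np.int8)

# 2. Time-domain features: short-time energy and zero-crossing rate.
energy = np.sum(frame ** 2)
zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2

# 3. Frequency-domain representation: windowed magnitude spectrum.
spectrum = np.abs(np.fft.rfft(frame * np.hamming(frame.size)))

# 4. Homomorphic representation: real cepstrum (inverse FFT of log spectrum).
cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))

# 5. Linear predictive coding: order-10 coefficients from the
#    autocorrelation normal equations (Levinson-Durbin in practice).
order = 10
r = np.correlate(frame, frame, "full")[frame.size - 1:frame.size + order]
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
lpc = np.linalg.solve(R, r[1:order + 1])
```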

238 citations


Patent
07 Jul 1975
TL;DR: In this article, a plurality of specialized digital signal processing techniques are employed to analyze in real time four speech channels in parallel and multiplex speech frame parameters of the channels into a single data output channel for transmission through a suitable medium.
Abstract: Method and apparatus for speech analysis and synthesis adapted for analyzing and multiplexing speech signals from a plurality of voice grade telephone lines for further transmission through a single voice grade telephone line. A plurality of specialized digital signal processing techniques are employed to analyze in real time four speech channels in parallel and multiplex speech frame parameters of the channels into a single data output channel for transmission through a suitable medium. The received data channel is demultiplexed and the speech frame parameters for the individual channels are utilized to synthesize, in parallel, the four speech signals. Certain of the digital processing techniques utilize the characteristics of speech signals to truncate conventional signal processing time while other processing techniques are substantially statistical analyses of speech to resolve ambiguities, particularly in making the voiced/unvoiced decision for a frame of analyzed speech data.
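
The multiplexing step can be pictured with a very small sketch; the channel count matches the abstract, but the frame-parameter fields and tagging scheme below are hypothetical, not the patent's actual frame format:

```python
# Hypothetical sketch of four-channel frame-parameter multiplexing; the
# patent's real frame format and parameter set are not reproduced here.
from typing import Dict, List, Tuple

Frame = Dict[str, float]            # e.g. {"pitch": 120.0, "gain": 0.3}

def multiplex(channel_frames: List[Frame]) -> List[Tuple[int, Frame]]:
    """Tag each channel's frame with its channel index and interleave."""
    return list(enumerate(channel_frames))

def demultiplex(stream: List[Tuple[int, Frame]], n_channels: int = 4) -> List[List[Frame]]:
    """Route tagged frames back to per-channel synthesizer queues."""
    queues: List[List[Frame]] = [[] for _ in range(n_channels)]
    for ch, frame in stream:
        queues[ch].append(frame)
    return queues
```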

60 citations



Journal ArticleDOI
TL;DR: A lattice representation of the segmentation is devised which allows for multiple choices that can be sorted out by higher-level processes, in order to deal effectively with acoustic recognition errors.
Abstract: Errors in acoustic-phonetic recognition occur not only because of the limited scope of the recognition algorithm, but also because certain ambiguities are inherent in analyzing the speech signal. Examples of such ambiguities in segmentation and labeling (feature extraction) are given. In order to allow for these phenomena and to deal effectively with acoustic recognition errors, we have devised a lattice representation of the segmentation which allows for multiple choices that can be sorted out by higher level processes. A description of the current acoustic-phonetic recognition program in the Bolt Beranek and Newman (BBN) Speech Understanding System is given, along with a specification of the parameters used in the recognition.
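
A segmentation lattice of this kind can be held in a very simple structure; the sketch below is illustrative only (not the BBN system's actual data structure) and just records competing labeled segments so that higher-level processes can choose among them:

```python
# Illustrative segment-lattice sketch: each edge spans a time interval and
# carries alternative phonetic labels with scores for later processes to rank.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SegmentEdge:
    start_ms: int
    end_ms: int
    labels: Dict[str, float]               # candidate label -> acoustic score

@dataclass
class SegmentLattice:
    edges: List[SegmentEdge] = field(default_factory=list)

    def add(self, start_ms: int, end_ms: int, labels: Dict[str, float]) -> None:
        self.edges.append(SegmentEdge(start_ms, end_ms, labels))

    def alternatives(self, t_ms: int) -> List[Tuple[str, float]]:
        """All (label, score) pairs whose segments cover time t_ms."""
        return [(lab, score)
                for e in self.edges if e.start_ms <= t_ms < e.end_ms
                for lab, score in e.labels.items()]
```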

23 citations


Journal ArticleDOI
TL;DR: The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling.
Abstract: This paper considers a sequential strategy for acoustic-phonetic speech analysis. Each analysis process is applied to an appropriately labeled speech segment and results in a possible sub-segmentation of the original segment. The segments resulting from the analysis are labeled according to the analysis results. An advantage of the strategy is that no more segments are considered than are actually differentiated by the analysis steps. The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling. The analysis sequence yields a structure for the syllabic units of the speech signal that may be used to retrieve similar syllabic units for detailed comparison.
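
The sequential strategy can be sketched as a small driver loop; the analyzers and labels below are hypothetical placeholders, not the paper's actual analysis steps:

```python
# Hedged sketch of the sequential strategy: each analysis step is tuned to
# one segment label and replaces matching segments by a finer sub-segmentation.
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]                    # (start_ms, end_ms, label)
Analyzer = Tuple[str, Callable[[Segment], List[Segment]]]

def sequential_analysis(segments: List[Segment],
                        analyzers: List[Analyzer]) -> List[Segment]:
    for applies_to, analyze in analyzers:
        refined: List[Segment] = []
        for seg in segments:
            # Only segments carrying the label this step is tuned for are split.
            refined.extend(analyze(seg) if seg[2] == applies_to else [seg])
        segments = refined
    return segments

# Purely illustrative step: split every "speech" segment into halves.
split_demo: Analyzer = ("speech",
                        lambda s: [(s[0], (s[0] + s[1]) // 2, "sonorant"),
                                   ((s[0] + s[1]) // 2, s[1], "obstruent")])
print(sequential_analysis([(0, 100, "speech")], [split_demo]))
```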

20 citations


Patent
Raymond H. Lanier
21 Aug 1975
TL;DR: In this article, a digital voice switch for detecting voice PCM samples in the presence of noise samples is disclosed, which includes a digital variable threshold generating means which adapts the threshold level to changes in the noise level.
Abstract: A digital voice switch for detecting voice PCM samples in the presence of noise samples is disclosed. The switch includes a digital variable threshold generating means which adapts the threshold level to changes in the noise level. Advantage is taken of the fact that, over a given interval of time T, speech will occur as random talk spurts separated by periods of silence, while noise (Gaussian distributed) will be continuous. This difference between speech and noise makes it possible to detect the noise level with respect to the voice switch threshold level.
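
The adaptive-threshold principle can be sketched roughly as follows; this is a hedged, floating-point illustration of the general idea, not the patented digital circuit, and the margin and smoothing constant are assumed values:

```python
# Sketch of an adaptive-threshold voice switch: the noise floor is tracked
# from sub-threshold frames (continuous noise), and a frame is declared
# voice when it exceeds that floor by a fixed margin (talk spurt).
import numpy as np

def voice_switch(frames, margin_db=9.0, alpha=0.95):
    """frames: iterable of PCM sample arrays; yields True when voice is detected."""
    noise = None
    for frame in frames:
        level = np.mean(np.abs(np.asarray(frame, dtype=float)))
        if noise is None:
            noise = level                            # seed the noise estimate
        threshold = noise * 10 ** (margin_db / 20)
        if level < threshold:
            # Sub-threshold frame: treat as noise and update the noise estimate.
            noise = alpha * noise + (1 - alpha) * level
            yield False
        else:
            yield True                               # talk spurt detected
```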

16 citations


Journal ArticleDOI
TL;DR: A real-time computer implementation of a fourth-order adaptive predictive coder (APC) is described which transmits speech at 6400 bits/s, achieves an intelligibility score of 87 percent on the diagnostic rhyme tests (DRT), and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
Abstract: This paper describes a real-time computer implementation of a fourth-order adaptive predictive coder (APC) which transmits speech at 6400 bits/s. To reduce the demands on the central processor, the terminal uses the average magnitude difference function (AMDF) for pitch extraction and calculates the predictor coefficients and reflection coefficients assuming stationarity of the input data. With 16-bit fixed-point arithmetic processing, the coder achieves an intelligibility score of 87 percent on the diagnostic rhyme tests (DRT) and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
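
For reference, the AMDF pitch measure mentioned above is simple to state; the sketch below is a floating-point illustration with assumed pitch search limits, not the terminal's 16-bit fixed-point implementation:

```python
# AMDF pitch-period sketch: the average magnitude difference function dips
# near multiples of the pitch period; its minimum over a lag range gives the
# pitch estimate. Search limits here (60-400 Hz) are assumed values.
import numpy as np

def amdf_pitch_period(frame, fs=8000, f_min=60, f_max=400):
    """Return an estimated pitch period in samples (the frame should span
    at least a couple of pitch periods)."""
    frame = np.asarray(frame, dtype=float)
    lags = np.arange(int(fs / f_max), int(fs / f_min) + 1)
    amdf = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    return lags[np.argmin(amdf)]
```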

13 citations


Journal ArticleDOI
M. Knudsen
TL;DR: The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s and introduces a new autocorrelation scheme with several valuable properties.
Abstract: The autocorrelation method for linear-predictive coding of speech [1] has been implemented in real time on the SPS-41, a commercially available system composed of three dissimilar microprocessors working in parallel. Using user-written microcode, one processor performs I/O and master control, the second handles loop indexing and counting, and the third does the actual arithmetic on data. Such parallelism allows 2 × 10^6 I/O operations and 4 × 10^6 multiplications/s, but actually realizing this potential requires fresh approaches to some old algorithms. Most important is a new autocorrelation scheme with several valuable properties. Using 16-bit fixed-point single-precision arithmetic to accumulate autocorrelation sums and invert the autocorrelation matrix presents problems which have been solved reasonably well. The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s.
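
The per-frame pipeline described here (256 samples in, 14 coefficients, 128-point log power spectrum out) looks roughly like this in floating point; the SPS-41 fixed-point scaling and the paper's modified autocorrelation scheme are not reproduced:

```python
# Floating-point sketch of the frame pipeline: autocorrelation ->
# Levinson-Durbin -> order-14 LPC -> 128-point log power spectrum.
import numpy as np

def lpc_log_spectrum(frame, order=14, nfft=256):
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, "full")[len(w) - 1:len(w) + order]

    # Levinson-Durbin recursion on r[0..order].
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0] + 1e-12
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a[1:i].copy()
        a[1:i] += k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k

    # All-pole model spectrum err / |A(e^jw)|^2 sampled at nfft//2 points.
    A = np.fft.rfft(a, nfft)[:nfft // 2]
    return 10.0 * np.log10(err / (np.abs(A) ** 2 + 1e-12))

spectrum_db = lpc_log_spectrum(np.random.randn(256))     # 128 values per frame
```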

12 citations



25 Aug 1975
TL;DR: An ultra-high performance programmable speech processor, consisting in the main of a custom designed 55-nsec microcomputer, has been designed and constructed.
Abstract: An ultra-high performance programmable speech processor, consisting in the main of a custom designed 55-nsec microcomputer, has been designed and constructed. To date, five real-time speech compression programs have been implemented and evaluated.

7 citations


Journal ArticleDOI
R. Becker, F. Poza
TL;DR: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system in which, through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream.
Abstract: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system. A major characteristic of this system is that through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream. The purpose of the acoustic processor is to verify or reject these hypotheses. This verification is done in two stages. First, digital filtering is done to classify each 10-ms segment as one of ten primitive classes. If the proposed word is consistent with the pattern of primitive classes at the corresponding point in the acoustic stream, further analysis is done using linear predictive coding and other digital filters. The results of this analysis are used to segment the acoustic signal and to further classify the voiced segments. Because this segmentation and classification can be tailored for each word, difficult analysis problems caused by coarticulation between adjacent sounds can be successfully solved. When combined with a sophisticated linguistic processor, these acoustical processing methods yield correct understanding of natural language utterances.
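
A toy version of the first (screening) stage might look like the following; the actual ten primitive classes, digital filters, and thresholds are not given in the abstract, so the labels and thresholds here are illustrative assumptions:

```python
# Illustrative screening stage: label each 10-ms frame coarsely, then check
# whether a hypothesized word's expected label pattern occurs in order.
import numpy as np

def coarse_label(frame):
    """Very coarse frame label from energy and zero-crossing rate."""
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    if energy < 1e-4:                 # threshold assumes samples scaled to [-1, 1]
        return "silence"
    return "fricative-like" if zcr > 0.25 else "voiced-like"

def hypothesis_consistent(frame_labels, expected_pattern):
    """True if the expected coarse pattern appears, in order, in the labels."""
    it = iter(frame_labels)
    return all(any(lab == want for lab in it) for want in expected_pattern)
```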

Patent
27 May 1975
TL;DR: In this article, a delay line which disappears under voice control is described; it comprises a plurality of shift registers that delay the incoming voice signal to permit transmission of the voice scrambler preamble.
Abstract: A delay line which disappears under voice control is disclosed. It is comprised of a plurality of shift registers that delay the incoming voice signal to permit transmission of the voice scrambler preamble. The delay is removed over a period of time during speech transmission by removing a segment of the delay line each time a pause of a specified length is detected in the speech.
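
In software terms the "disappearing" delay behaves roughly as sketched below; the patent describes shift-register hardware, and the segment size, initial delay, and pause detector here are illustrative assumptions:

```python
# Software analogue of the disappearing delay line: a FIFO whose length is
# cut by one segment whenever a sufficiently long pause is detected, until
# no delay remains.
from collections import deque

class DisappearingDelay:
    def __init__(self, total_delay=4000, segment=500, pause_needed=800):
        self.buf = deque([0] * total_delay)   # initial delay, in samples
        self.segment = segment                # samples removed per detected pause
        self.pause_needed = pause_needed      # pause length that triggers a cut
        self.silent_run = 0

    def process(self, sample, is_speech):
        """Push one sample in, return one (delayed) sample out."""
        self.silent_run = 0 if is_speech else self.silent_run + 1
        if self.silent_run >= self.pause_needed and len(self.buf) > self.segment:
            # Long enough pause: discard one segment of the remaining delay.
            for _ in range(self.segment):
                self.buf.popleft()
            self.silent_run = 0
        self.buf.append(sample)
        return self.buf.popleft()
```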

01 Jun 1975
TL;DR: In this report, INTEL, a technique for enhancing speech obscured by wideband noise, has been improved to the point that it will be realistic to implement it as a real-time system for use in practical applications.
Abstract: Two of the most serious types of interference in speech communications are wideband noise and the speech of a competing talker. This report describes research on techniques for coping with these types of interference. INTEL, a technique for enhancing speech obscured by wideband noise, has now been improved to the point that it will be realistic to implement it as a real-time system for use in practical applications. A technique has also been developed for dealing with interference from a competing talker. (The current technique is restricted to vocalic utterances). Separation is done by selecting the components of the desired voice in the Fourier transform of the input. In implementing this process, techniques have been developed for resolving overlapping spectrum components, for determining pitches of both talkers, and for assuring consistent separation. This report describes the improvements in INTEL and the implementation of the two-talker process, and discusses the prospects for basing a general two-talker process on these techniques.
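
The component-selection step can be caricatured as harmonic selection given a pitch estimate for the desired talker; the sketch below keeps only FFT bins near that talker's harmonics and is offered only as an illustration (the report's techniques for resolving overlapping components and tracking both pitches are not reproduced):

```python
# Illustrative harmonic-selection sketch: retain FFT bins near harmonics of
# the desired talker's pitch and resynthesize the frame.
import numpy as np

def select_talker(frame, pitch_hz, fs=8000, half_width_hz=25.0):
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # Distance from each bin to the nearest harmonic of the desired pitch.
    dist = np.abs(freqs - pitch_hz * np.round(freqs / pitch_hz))
    mask = dist <= half_width_hz
    return np.fft.irfft(spec * mask, len(frame))
```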

01 Mar 1975
TL;DR: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time by selecting a method that intrinsically is independent of the spectrum characteristics of the communication channel or tape being monitored.
Abstract: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time. Two approaches to reducing auditory fatigue were studied: (1) automatic detection of speech and (2) automatic enhancement of the S/N of speech. The first approach was aimed at reducing the amount of time spent in simply listening for speech to occur. After examining several methods of detecting speech, we selected a method that intrinsically is independent of the spectrum characteristics of the communication channel or tape being monitored, of the speech characteristics of the talker, and of the language. The technique proved to be capable of detecting speech in wideband noise at an S/N of -6 dB. Its major disadvantage appears to be that the complexity of the required computations demands the use of a computer to implement the method.

Journal ArticleDOI
TL;DR: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.
Abstract: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.
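
The abstract does not name the algorithm, but one well-known real-time algorithm for adaptive linear prediction is the LMS (stochastic-gradient) predictor; the sketch below is offered only as an example of that family, not as the specific algorithm the paper discusses:

```python
# LMS adaptive linear predictor sketch: coefficients are updated each sample
# in the direction that reduces the instantaneous prediction error.
import numpy as np

def lms_predict(signal, order=8, mu=0.01):
    """Return the per-sample prediction residual (mu assumes roughly unit-power input)."""
    signal = np.asarray(signal, dtype=float)
    w = np.zeros(order)                          # predictor coefficients
    residual = np.zeros(len(signal))
    for n in range(order, len(signal)):
        past = signal[n - order:n][::-1]         # most recent sample first
        e = signal[n] - np.dot(w, past)
        w += mu * e * past                       # stochastic-gradient update
        residual[n] = e
    return residual
```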

Journal ArticleDOI
TL;DR: Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance, which confirm the superior speech quality attainable by this method.
Abstract: The first‐formant region of the speech spectrum must be characterized with a high degree of accuracy in order to attain good speech quality and naturalness in a narrow‐band speech processing system. This consideration led to the concept of using linear predictive coding for the first‐formant region, vocoder channels for the upper spectrum, and detection and coding of the excitation function, to establish a narrow‐band voice digitizer. This concept has advantages in the tradeoff of speech quality versus data rate, in comparison with the use of either channel vocoding or linear prediction alone. Such a triple‐function voice coder (TRIVOC) has been implemented and investigated using a CSP‐30 Digital Signal Processor. Processing algorithms have been established to permit comparison of various combinations of baseband and vocoder channels, LPC coefficients, and data rates ranging from 2400 to 4800 bits per second. Subjective results have confirmed the superior speech quality attainable by this method. Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance.
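
The hybrid analysis can be pictured with the sketch below; the band edge, LPC order, and channel count are illustrative assumptions rather than TRIVOC's actual parameters, and excitation coding is omitted entirely:

```python
# Illustrative hybrid analysis in the spirit of a baseband-LPC-plus-vocoder
# split: LPC models the low band containing the first formant, and simple
# band energies ("vocoder channels") describe the upper spectrum.
import numpy as np

def hybrid_analysis(frame, fs=8000, split_hz=1000, lpc_order=4, n_channels=8):
    w = frame * np.hamming(len(frame))
    spec = np.abs(np.fft.rfft(w)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

    # Low band: autocorrelation via the inverse FFT of the masked power
    # spectrum (Wiener-Khinchin), then the LPC normal equations.
    low_power = np.where(freqs <= split_hz, spec, 0.0)
    r = np.fft.irfft(low_power, len(frame))[:lpc_order + 1]
    R = np.array([[r[abs(i - j)] for j in range(lpc_order)]
                  for i in range(lpc_order)]) + 1e-9 * np.eye(lpc_order)
    lpc = np.linalg.solve(R, r[1:lpc_order + 1])

    # Upper spectrum: mean energy in equal-width vocoder channels.
    edges = np.linspace(split_hz, fs / 2, n_channels + 1)
    channels = np.array([spec[(freqs >= lo) & (freqs < hi)].mean()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    return lpc, channels
```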


ReportDOI
10 Jan 1975
TL;DR: This study, using as a research vehicle the homomorphic vocoder algorithm, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal.
Abstract: A problem basic to the development of all digital telecommunication systems is the efficient digital encoding of speech signals for transmission. Many speech digitization algorithms have been proposed. This study, using as a research vehicle the homomorphic vocoder algorithm, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal. The results reported are fundamental in that they are applicable to any speech digitization algorithm. The resolutions required for typical speaker situations are summarized.
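
For background, the homomorphic (cepstral) analysis at the heart of such a vocoder can be sketched as follows; the frame length fixes the time resolution and the number of retained cepstral coefficients fixes the spectral resolution, which are exactly the knobs the study varies (the particular values below are illustrative, not the study's):

```python
# Homomorphic analysis sketch: log-magnitude spectrum -> real cepstrum ->
# low-time liftering -> smoothed spectral envelope. The frame length and
# the lifter cutoff n_cepstral are the time/frequency resolution controls.
import numpy as np

def homomorphic_envelope(frame, n_cepstral=30):
    w = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(w)) + 1e-12)
    cep = np.fft.irfft(log_mag)                # real cepstrum of the frame
    cep[n_cepstral:-n_cepstral] = 0.0          # keep only the envelope part
    return np.fft.rfft(cep).real               # smoothed log-magnitude spectrum
```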