
Showing papers on "Voice activity detection" published in 1971


Journal Article
S. Das, W. Mohn
TL;DR: Experiments investigating adaptive pattern recognition in automatic speaker verification are reported, indicating that the utterances used for training purposes should preferably be collected over a relatively long period of time.
Abstract: Experiments investigating adaptive pattern recognition in automatic speaker verification are reported. A binary decision confirming or rejecting a speaker's purported identity is required. The experiments involve 7000 phrase-length utterances of 118 speakers. An average misclassification rate of one percent with a "no decision" rate of ten percent is obtained. Other experiments indicate that the utterances used for training purposes should preferably be collected over a relatively long period of time.
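Reporting a misclassification rate alongside a "no decision" rate implies a reject option between accepting and rejecting a claimed identity. The sketch below illustrates that bookkeeping under stated assumptions: each utterance is reduced to a single similarity score, and the two thresholds (`reject_below`, `accept_above`) are hypothetical names. The misclassification rate is computed here over decided trials only, which is also an assumption; the abstract does not specify the denominator or how scores were produced.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    score: float       # hypothetical similarity of utterance to the claimed speaker's reference
    is_genuine: bool   # True if the speaker really is who they claim to be

def decide(score: float, reject_below: float, accept_above: float) -> str:
    """Three-way decision: 'accept', 'reject', or 'none' (no decision)."""
    if score >= accept_above:
        return "accept"
    if score < reject_below:
        return "reject"
    return "none"  # score falls in the uncertain band between thresholds

def error_rates(trials, reject_below, accept_above):
    """Return (misclassification rate among decided trials, no-decision rate)."""
    decided = errors = undecided = 0
    for t in trials:
        d = decide(t.score, reject_below, accept_above)
        if d == "none":
            undecided += 1
            continue
        decided += 1
        # An error is accepting an impostor or rejecting a genuine speaker.
        if (d == "accept") != t.is_genuine:
            errors += 1
    miss = errors / decided if decided else 0.0
    return miss, undecided / len(trials)

trials = [Trial(0.9, True), Trial(0.2, False), Trial(0.5, True), Trial(0.7, False)]
print(error_rates(trials, 0.4, 0.6))
```

Widening the band between the two thresholds trades a higher no-decision rate for fewer misclassifications, which is the kind of trade-off the reported 1% / 10% figures describe.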

41 citations


Journal Article
TL;DR: A computer technique for synthesizing continuous messages by concatenating formant data for word-length utterances is described and the results show the synthesized numbers to be comparable in communicative effectiveness to naturally spoken digits.
Abstract: Speech signals can be described in terms of the resonances of the vocal tract. These resonances, or formants, change at rates comparable to the motions of the vocal tract. They therefore can be sampled and quantized to low bit-rates, and hence constitute an economical form for digital storage of speech information. Formant coding also permits flexible arrangement of speech elements into various contexts. This report describes a computer technique for synthesizing continuous messages by concatenating formant data for word-length utterances. The stored data for the synthesis corresponds to a bit-rate of 533 b/s. A Honeywell DDP-516 computer is used to experimentally evaluate a voice response system. In an initial application, the system is used to synthesize 7-digit telephone numbers. To assess the synthesis, an interactive dialing experiment, also conducted by the computer, is described. The results show the synthesized numbers to be comparable in communicative effectiveness to naturally spoken digits.

37 citations


Journal Article
TL;DR: Application of a type of predictive coding to the channel signals of a homomorphic vocoder has produced sizable bit rate reductions, and a technique for obtaining the formant frequencies from the predictive coding parameters is described; this approach promises further bit rate reductions.
Abstract: Application of a type of predictive coding to the channel signals of a homomorphic vocoder has produced sizable bit rate reductions. With only slight degradation in speech quality, reduction (for the spectral envelope information) from 7800 to 4000 bits/s was achieved. A technique for obtaining the formant frequencies from the predictive coding parameters is described; this approach promises further bit rate reductions. As a by-product of this study of predictive coding, direct and cascade form speech synthesizers are compared on the basis of differing quantization effects.
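The reported drop from 7800 to 4000 bits/s for the spectral envelope is a reduction of roughly 49%. As a generic illustration of why predictive coding helps with slowly varying channel signals, the sketch below implements plain first-order DPCM: the encoder quantizes only the prediction error and tracks the decoder's reconstruction so errors do not accumulate. The predictor coefficient `a`, the step size, and the sample values are hypothetical; the paper's specific "type of predictive coding" is not described in the abstract, so this is not the authors' scheme.

```python
def dpcm_encode(samples, step, a=1.0):
    """Quantize the prediction error of each sample; return integer codes."""
    codes = []
    recon = 0.0  # encoder mirrors the decoder's reconstruction (closed loop)
    for x in samples:
        pred = a * recon                 # first-order prediction
        q = round((x - pred) / step)     # uniform quantizer on the error
        codes.append(q)
        recon = pred + q * step
    return codes

def dpcm_decode(codes, step, a=1.0):
    """Rebuild the signal by adding dequantized errors to the prediction."""
    out = []
    recon = 0.0
    for q in codes:
        recon = a * recon + q * step
        out.append(recon)
    return out

channel = [0.0, 0.3, 0.7, 1.2, 1.0, 0.4]  # hypothetical slowly varying channel signal
codes = dpcm_encode(channel, 0.1)
rebuilt = dpcm_decode(codes, 0.1)
```

Because the loop is closed, each reconstructed sample differs from the input by at most half a quantizer step, and the codes for a smooth signal stay small, so fewer bits per sample are needed than for coding the raw values.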

18 citations


Journal Article
TL;DR: A digital hardware realization of a formant synthesizer is described that multiplexes a single arithmetic unit among several digital filter sections to produce speech in real time.
Abstract: Terminal analog or formant speech synthesizers have found many applications in speech research. These include investigation of computer voice response, speech synthesis-by-rule, and speech perception studies, among others. Many types of formant synthesizers have been designed and realized either in analog circuitry or as a computer program. In this paper we describe a digital hardware realization of a formant synthesizer which utilizes the technique of digital multiplexing of a single arithmetic unit among several digital filter sections. The advantages of this hardware over conventional analog hardware include: precise control over center frequencies and bandwidths of the resonators in the synthesizer, stability and reliability of the hardware, light weight, small size, and low power consumption. The synthesizer is capable of producing speech in real time at sampling rates up to 12.8 kHz, using 24 bits to process the digital signals internal to the synthesizer. A 12-bit digital-to-analog convertor supplies an immediate analog output for monitoring the speech and a provision is included for returning 16 bits of the output signal to the computer for future processing such as waveform display or spectrum analysis.
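The resonators whose center frequencies and bandwidths the hardware controls can be sketched in software as second-order digital sections. The example below uses the widely used resonator form y[n] = A·x[n] + B·y[n−1] + C·y[n−2], with C = −exp(−2π·BW·T), B = 2·exp(−π·BW·T)·cos(2π·F·T) and A = 1 − B − C (unity gain at DC), and connects the sections in cascade at the 12.8 kHz rate quoted in the abstract. The formant values are hypothetical, floating point stands in for the 24-bit fixed-point arithmetic, and the per-sample loop over stages only loosely mirrors the time-multiplexing of one arithmetic unit; this is not the paper's circuit.

```python
import math

class Resonator:
    """Second-order digital resonator with unity gain at DC."""

    def __init__(self, freq_hz, bw_hz, fs_hz):
        t = 1.0 / fs_hz
        self.c = -math.exp(-2.0 * math.pi * bw_hz * t)
        self.b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c
        self.y1 = 0.0  # y[n-1]
        self.y2 = 0.0  # y[n-2]

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y

def synthesize(excitation, formants, fs_hz=12800):
    """Pass an excitation through resonators connected in cascade.

    Each input sample visits every filter section in turn, the software
    analogue of sharing one arithmetic unit among the sections.
    """
    stages = [Resonator(f, bw, fs_hz) for f, bw in formants]
    out = []
    for x in excitation:
        for r in stages:
            x = r.step(x)
        out.append(x)
    return out

# Hypothetical vowel-like formants (Hz, bandwidth Hz) and a 100 Hz impulse train.
formants = [(500, 60), (1500, 90), (2500, 120)]
pulses = [1.0 if n % 128 == 0 else 0.0 for n in range(1280)]
speech = synthesize(pulses, formants)
```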

16 citations


Journal Article
R.W. Berry
01 Feb 1971
TL;DR: An instrument has been designed that measures the r.m.s. voltage of the speech signal integrated over several seconds of active speech, stopping the timing during silent periods to cater for fragmentary speech; it has been used to estimate the activity factor of conversational speech.
Abstract: Knowledge of the electrical levels of speech signals at the input to speech-transmission systems is required for (a) study of the talking behaviour in telephone conversations and (b) provision, from measurements under service conditions in the field, of basic data for estimating the power loading of multiplex transmission systems. For these purposes, an instrument has been designed which measures the r.m.s. voltage of the signal when integrated over a given period of several seconds. To cater for fragmentary speech, the total integration period is not fixed, the timing being stopped during silent periods. The period of 'active' speech over which integration takes place is 10 s. Accurate square-law integration over a sufficiently wide range of amplitude caters for the dynamic range of speech from a talker under conversation conditions. The operate and release levels, and operate and hangover times, of the timer switch have been chosen so that timing is not stopped during structural pauses, but only when a listener would have judged that the talker had stopped. Calibration is in decibels relative to 1 V (dB V), and so conversion to decibels relative to 1 mW (dBm) into an impedance of 600 Ω requires the addition of 2.2 dB to each reading. With this allowance, the reading may be regarded as 'the long-term mean power while the talker is active'. The instrument has been used to measure the distribution of speech volumes at the input to a transatlantic cable system and to make an estimate of the activity factor of the conversational speech so measured.
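The timer switch described above is essentially a voice activity gate with hysteresis (separate operate and release levels), an operate delay and a hangover. The sketch below is a frame-based approximation under stated assumptions: the frame length, the thresholds in dB V, and the operate and hangover counts are hypothetical defaults, not the instrument's settings, and frames are digital samples in volts rather than an analogue square-law integrator. It also checks the 2.2 dB figure: 1 mW into 600 Ω is √0.6 ≈ 0.775 V, i.e. about −2.2 dB V, so dBm = dB V + 2.2.

```python
import math

def dbv_to_dbm(dbv, impedance_ohms=600.0):
    """Convert a level in dB re 1 V to dB re 1 mW across the given impedance."""
    # 0 dBm across R ohms corresponds to V = sqrt(1e-3 * R) volts.
    ref_dbv = 20.0 * math.log10(math.sqrt(1e-3 * impedance_ohms))
    return dbv - ref_dbv

def active_speech_level(samples, fs_hz, frame_len=128,
                        operate_dbv=-40.0, release_dbv=-45.0,
                        operate_frames=2, hangover_frames=25,
                        target_active_s=10.0):
    """Mean-square level (dB re 1 V) over active speech, plus activity factor.

    Integration runs only while the gate judges the talker active; frames in
    the hangover (short pauses) are still integrated. All parameter defaults
    are hypothetical.
    """
    active = False
    above = 0       # consecutive frames at or above the operate level
    below = 0       # consecutive frames below the release level while active
    sum_sq = 0.0
    n_active = 0    # samples integrated
    n_total = 0     # samples examined
    target = int(target_active_s * fs_hz)
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        ms = sum(x * x for x in frame) / frame_len
        level = 10.0 * math.log10(ms) if ms > 0 else -math.inf
        n_total += frame_len
        if not active:
            above = above + 1 if level >= operate_dbv else 0
            if above >= operate_frames:
                active, below = True, 0
        else:
            below = below + 1 if level < release_dbv else 0
            if below > hangover_frames:
                active, above = False, 0
        if active:
            sum_sq += ms * frame_len
            n_active += frame_len
            if n_active >= target:   # stop after the target amount of active speech
                break
    if n_active == 0:
        return -math.inf, 0.0
    return 10.0 * math.log10(sum_sq / n_active), n_active / n_total

tone = [0.1 * math.sin(2 * math.pi * 250 * n / 8000) for n in range(8000)]
level_dbv, activity = active_speech_level(tone, 8000)
```

The hangover is what keeps the timer running through structural pauses; lengthening it raises the measured activity factor and lowers the measured level, which is why the abstract stresses how these times were chosen.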

8 citations


Journal Article
TL;DR: A new pattern recognition technique is proposed that avoids the exhaustive comparison process associated with pattern matching and some preliminary results obtained show that a performance very similar to that obtained from the exhaustive compare process is attainable with a significant saving in computational effort.
Abstract: A description is given of an unusual pattern recognition technique which has been used in an experimental speech recognition system. Preliminary results obtained using this technique are reported. The speech analyzer produces a multichannel ternary signal at its output, which is the short-term digital autocorrelation function of the input signal. This output is sampled at regular intervals and this sampled information is transferred to a computer. A new pattern recognition technique is proposed that avoids the exhaustive comparison process associated with pattern matching. The technique is similar to a tree-structured process in that decisions are taken that exclude certain master patterns from further processing as it becomes apparent that these are sufficiently dissimilar to the unknown pattern. However, retracing within the structure and the substitution of an alternative path are permitted if the current path appears unlikely to lead to a correct decision. Some preliminary results obtained using this technique are described. These show that a performance very similar to that obtained from the exhaustive comparison process is attainable with a significant saving in computational effort. The effect of varying certain parameters within the recognition process is also considered and some preliminary optimization of parameter values is reported.
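The saving over exhaustive comparison comes from dropping master patterns once they are clearly dissimilar to the unknown. The sketch below contrasts the two under simplifying assumptions: patterns are equal-length sequences of ternary frames (values −1, 0, +1), `frame_distance` is a hypothetical city-block measure, and elimination uses a fixed `margin` over the best running distance. The retracing and path substitution described in the abstract are not implemented, so this shows only the elimination idea, not the paper's algorithm.

```python
def frame_distance(a, b):
    """Hypothetical city-block distance between two ternary frames."""
    return sum(abs(x - y) for x, y in zip(a, b))

def exhaustive_match(unknown, masters):
    """Compare the unknown pattern with every master over all frames."""
    totals = {name: sum(frame_distance(u, m) for u, m in zip(unknown, pat))
              for name, pat in masters.items()}
    return min(totals, key=totals.get), len(unknown) * len(masters)

def pruned_match(unknown, masters, margin):
    """Frame-by-frame elimination of masters that fall behind the best by more than margin."""
    running = {name: 0.0 for name in masters}
    comparisons = 0
    for i, u in enumerate(unknown):
        for name in running:
            running[name] += frame_distance(u, masters[name][i])
            comparisons += 1
        best = min(running.values())
        # Exclude masters that are already sufficiently dissimilar.
        running = {n: d for n, d in running.items() if d <= best + margin}
        if len(running) == 1:
            break
    return min(running, key=running.get), comparisons

masters = {
    "one":   [[1, 0, -1], [1, 1, 0], [0, 1, 1], [0, 0, 1]],
    "two":   [[-1, 0, 1], [-1, -1, 0], [0, -1, -1], [0, 0, -1]],
    "three": [[1, 0, -1], [1, 1, 0], [0, -1, -1], [0, 0, -1]],
}
unknown = [[1, 0, -1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
```

Here both functions pick the same master, but the pruned search performs 7 frame comparisons instead of 12; a smaller margin saves more work at a greater risk of discarding the correct pattern early, which is the kind of parameter trade-off the abstract mentions optimizing.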

5 citations


Journal Article
TL;DR: A large set of vocoded speech signals has been evaluated in terms of preference and it is shown that, in certain respects, reliable system evaluations pose formidable problems.
Abstract: Starting from an IEEE Recommended Practice for Speech Quality Measurements and from previous work of the authors, a large set of vocoded speech signals has been evaluated in terms of preference. The set of speech samples has been taken from the vocoder survey of the 1967 Conference on Speech Communication and Processing, Boston, Mass. The test samples are evaluated by several methods: direct comparisons, the isopreference method, the relative preference method, the category judgment method, and the absolute preference judgment method. Due to the size of the test material, not all the test samples could be evaluated by all these methods. The test results are discussed and it is shown that, in certain respects, reliable system evaluations pose formidable problems. An attempt to rank-order the systems, each represented by a small set of test samples whose quality often varies widely, understandably meets with only limited success. The majority of the systems are of about equal preference with only insignificant differences. There are only a few systems that are outside this group and are either significantly better or worse than the rest.

4 citations


Journal Article
TL;DR: The preliminary results obtained indicate that most of the necessary adaptation can be achieved in a relatively short time, provided that the speakers are instructed in how to change their articulations to produce the desired effects.
Abstract: This study examines the feasibility and limitations of speaker adaptation in improving the performance of a fixed (speaker-independent) automatic speech recognition system. A fixed vocabulary of 55 [əCVd] syllables is used in the recognition system, where C is 1 of 11 stops and fricatives, and V is 1 of 5 tense vowels. The results of the experiment on speaker adaptation, performed with 6 male and 6 female adult speakers, show that speakers can learn to change their articulations to appreciably improve recognition scores. The preliminary results obtained also indicate that most of the necessary adaptation can be achieved in a relatively short time, provided that the speakers are instructed in how to change their articulations to produce the desired effects.

4 citations


Journal Article
TL;DR: The use of a small digital computer to process speech signals is described with illustrations, including increasing intelligibility by converting speech signals into dichotic signals with an interaural time delay.
Abstract: An increase in the rate and the intelligibility of sound is highly desirable in speech communication. Also, it is useful to have an accurate and efficient method of obtaining desired segments of a speech sample. In this paper, the use of a small digital computer in processing the speech signal to achieve the above purposes is described with illustrations. On‐line simulation of the method of Fairbanks et al. [G. Fairbanks et al., IRE Trans. Audio 2, 7–12, (1954)] of increasing the speech rate has been achieved with flexible speed‐up ratios and sampling intervals. Increase of intelligibility in speech signals by converting them into dichotic signals with an interaural time delay is discussed. These dichotic signals have been obtained from the computer for time delays between 0 and 1 sec. To obtain different segments of a speech sample, the computer is programmed to store the speech sample and display its waveform on an oscilloscope, so that various segments of the speech sample can be extracted and also joi...
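Two of the operations in this abstract are simple enough to sketch on sampled data. The first function below shows the discard-and-abut idea commonly associated with the Fairbanks sampling method: keep one sampling interval, discard enough of the following signal to reach the chosen speed-up ratio, and repeat. The second builds a dichotic pair in which one ear receives the signal delayed. Parameter names are hypothetical, intervals and delays are given in samples, and no smoothing is applied at the joins; this is an illustrative sketch, not the authors' on-line program.

```python
def time_compress(samples, speedup, interval):
    """Keep `interval` samples, discard the next (speedup - 1) * interval, repeat."""
    if speedup < 1:
        raise ValueError("speed-up ratio must be >= 1")
    discard = round((speedup - 1) * interval)
    period = interval + discard
    out = []
    for start in range(0, len(samples), period):
        out.extend(samples[start:start + interval])  # abut retained segments
    return out

def dichotic(samples, delay_samples):
    """Return (left, right) where right is the same signal delayed by delay_samples."""
    left = list(samples) + [0.0] * delay_samples
    right = [0.0] * delay_samples + list(samples)
    return left, right

x = list(range(100))
faster = time_compress(x, 2.0, 10)  # roughly half the original duration
```

Because pitch is carried within each retained interval, the discard approach shortens duration without the pitch rise that faster playback would cause; the interval length sets how audible the joins become.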

3 citations


Proceedings Article
01 Dec 1971

2 citations


01 Sep 1971
TL;DR: The capability and flexibility of the Voice Data Processor System (VDPS) were increased to include the following features: a modification was designed to provide a real-time CRT display of selected VDPS parameters, and the AFCRL Linc Processor was interfaced with the VDPS.
Abstract: The report contains the results of an investigation of speech pattern matching using a digital voice data processor to evaluate pattern-matching speech bandwidth compression techniques. The capability and flexibility of the Voice Data Processor System (VDPS) were increased to include the following features: a modification was designed to provide a real-time CRT display of selected VDPS parameters; the AFCRL Linc Processor was interfaced with the VDPS; a modification was made to reinsert silent frames into the output from silence-edited digital tapes as they are run in a tape input mode; and the capability of editing out the onset portions of words of speech in real time has been added to the system. In addition, the results of studies on increasing VDPS flexibility and memory capacity, to accommodate research on the effects of speaker selection, are presented. (Author)
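Silence editing and later reinsertion only round-trip if the lengths of the removed silent runs are kept somewhere. The report does not say how the VDPS tapes stored this, so the sketch below assumes a hypothetical scheme: frames are classified by a caller-supplied `is_silent` test, silent frames are dropped, and the count of silent frames preceding each kept frame (plus any trailing run) is recorded so the decoder can reinsert a placeholder silent frame the right number of times.

```python
def silence_edit(frames, is_silent):
    """Drop silent frames; record how many were removed before each kept frame."""
    kept, gaps = [], []
    run = 0
    for f in frames:
        if is_silent(f):
            run += 1
        else:
            kept.append(f)
            gaps.append(run)  # length of the silent run preceding this frame
            run = 0
    return kept, gaps, run    # run = trailing silent frames

def reinsert_silence(kept, gaps, trailing, silent_frame):
    """Restore the original timing by inserting silent frames into each gap."""
    out = []
    for f, g in zip(kept, gaps):
        out.extend([silent_frame] * g)
        out.append(f)
    out.extend([silent_frame] * trailing)
    return out

frames = [0, 0, 5, 7, 0, 3, 0, 0]  # hypothetical frames; 0 stands for a silent frame
kept, gaps, trailing = silence_edit(frames, lambda f: f == 0)
restored = reinsert_silence(kept, gaps, trailing, 0)
```

Storing only run lengths keeps the edited tape compact while still letting playback preserve the original pauses, which matters when timing, not just content, is being studied.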