Showing papers on "Voice activity detection" published in 1978

Open Access

Book•

05 Sep 1978

TL;DR: This book covers digital speech processing, from digital models of the speech signal through time-domain methods, digital waveform representation, short-time Fourier analysis, homomorphic processing, and linear predictive coding, to speech processing for man-machine communication by voice.

Abstract: 1. Introduction. 2. Fundamentals of Digital Speech Processing. 3. Digital Models for the Speech Signal. 4. Time-Domain Models for Speech Processing. 5. Digital Representation of the Speech Waveform. 6. Short-Time Fourier Analysis. 7. Homomorphic Speech Processing. 8. Linear Predictive Coding of Speech. 9. Digital Speech Processing for Man-Machine Communication by Voice.

3,103 citations

Journal Article•DOI•

Optimizing digital speech coders by exploiting masking properties of the human ear

Manfred R. Schroeder, B. S. Atal, J. L. Hall

01 Nov 1978-Journal of the Acoustical Society of America

TL;DR: New results of masking and loudness reduction of noise are reported and the design principles of speech coding systems exploiting auditory masking are described.

Abstract: In any speech coding system that adds noise to the speech signal, the primary goal should not be to reduce the noise power as much as possible, but to make the noise inaudible or to minimize its subjective loudness. "Hiding" the noise under the signal spectrum is feasible because of human auditory masking: sounds whose spectrum falls near the masking threshold of another sound are either completely masked by the other sound or reduced in loudness. In speech coding applications, the "other sound" is, of course, the speech signal itself. In this paper we report new results of masking and loudness reduction of noise and describe the design principles of speech coding systems exploiting auditory masking.

434 citations

Patent•DOI•

Continuous speech recognition method

Stephen L. Moshier

27 Apr 1978-Journal of the Acoustical Society of America

TL;DR: In this patent, a speech recognition method for detecting and recognizing one or more keywords in a continuous audio signal is disclosed, where each keyword is represented by a keyword template representing one or more target patterns, and each target pattern comprises statistics of each of at least one spectrum selected from plural short-term spectra generated according to a predetermined system for processing the incoming audio.

Abstract: A speech recognition method for detecting and recognizing one or more keywords in a continuous audio signal is disclosed. Each keyword is represented by a keyword template representing one or more target patterns, and each target pattern comprises statistics of each of at least one spectrum selected from plural short-term spectra generated according to a predetermined system for processing of the incoming audio. The spectra are processed by a frequency equalization and normalizing method to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into spectral patterns, are transformed to reduce dimensionality of the patterns, and are compared by means of likelihood statistics with the target patterns of the keyword templates. A concatenation technique employing a loosely set detection threshold makes it very unlikely that a correct pattern will be rejected.

78 citations
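
As an illustration of the comparison step only, the sketch below scores a reduced-dimensionality spectral pattern against keyword templates with a diagonal-Gaussian log-likelihood and keeps every keyword that clears a loose threshold. Modelling each target pattern as per-dimension means and variances is our assumption (the abstract says only "statistics"), and the names `log_likelihood` and `spot_keywords`, the toy templates, and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed model: a target pattern is a pair (means, variances), one entry per
# dimension of the reduced spectral pattern.
Pattern = Tuple[Sequence[float], Sequence[float]]


def log_likelihood(x: Sequence[float], pattern: Pattern) -> float:
    """Diagonal-Gaussian log-likelihood of observation x under one pattern."""
    means, variances = pattern
    total = 0.0
    for xi, mu, var in zip(x, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
    return total


def spot_keywords(x: Sequence[float], templates: Dict[str, Pattern],
                  threshold: float) -> List[str]:
    """Return every keyword whose score reaches a deliberately loose threshold."""
    return [name for name, pat in templates.items()
            if log_likelihood(x, pat) >= threshold]


# Usage with made-up two-dimensional templates:
templates = {"yes": ([1.0, 0.0], [1.0, 1.0]), "no": ([5.0, 5.0], [1.0, 1.0])}
print(spot_keywords([1.0, 0.0], templates, threshold=-5.0))
```

Lowering `threshold` trades more false candidates for fewer missed keywords, which is the role the abstract gives its loosely set detection threshold.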

Patent•DOI•

Speech recognition system

Hiroaki Sakoe

28 Dec 1978-Journal of the Acoustical Society of America

TL;DR: This patent describes a speech recognition system adaptable to noisy environments, which includes a recognition unit for recognizing input speech signals and a noise measuring unit for measuring the intensity of ambient noises.

Abstract: A speech recognition system adaptable to noisy environments is disclosed. The system includes a recognition unit for recognizing input speech signals and a noise measuring unit for measuring the intensity of ambient noises. The system also includes a rejection unit, responsive to a rejection standard controlled by the intensity of the measured noise, for rejecting the recognition results given by the recognition unit when the rejection standard is exceeded.

54 citations

Journal Article•DOI•

Digital Dynamic Speech Detectors

P. Drago, Alcide Molinari, F. Vagliani

01 Jan 1978-IEEE Transactions on Communications

TL;DR: This paper proposes two dynamic-type speech detectors based on the same operational principle: the presence of the speech signal is detected by analyzing the dynamic variations of the short-time power of the channel signal.

Abstract: This paper proposes two dynamic-type speech detectors; their performance is also described by means of experimental results obtained in the field. The two detectors are based on the same operational principle: the presence of the speech signal is detected by analyzing the dynamic variations of the short-time power of the channel signal.

35 citations
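
A minimal sketch of a detector in this spirit: split the signal into frames, compute each frame's short-time power in dB, track a slowly adapting noise floor, and flag frames whose power rises a fixed margin above it. The frame length, margin, and floor-tracking rule are our assumptions for illustration, not the two detectors described in the paper.

```python
import math
from typing import List, Sequence


def short_time_power_db(x: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square power of each non-overlapping frame, in dB."""
    out = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        out.append(10.0 * math.log10(power + 1e-12))  # guard against log(0)
    return out


def detect_speech(x: Sequence[float],
                  frame_len: int = 160,     # 20 ms at 8 kHz (assumed)
                  margin_db: float = 10.0,  # rise over the floor that counts as speech (assumed)
                  rise: float = 0.05,       # fraction of the gap the floor closes per frame (assumed)
                  ) -> List[bool]:
    """Flag frames whose short-time power jumps well above a tracked floor."""
    powers = short_time_power_db(x, frame_len)
    decisions = []
    floor = powers[0] if powers else 0.0
    for p in powers:
        if p < floor:
            floor = p                    # follow drops immediately
        else:
            floor += rise * (p - floor)  # follow rises slowly
        decisions.append(p - floor > margin_db)
    return decisions
```

Letting the floor fall instantly but climb slowly makes the decision react to changes in short-time power rather than to its absolute level; other tracking rules would serve equally well in a sketch like this.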

Journal Article•DOI•

Time-encoded speech

R.A. King¹, W. Gosling²•Institutions (2)

United Kingdom Ministry of Defence¹, University of Bath²

20 Jul 1978-Electronics Letters

TL;DR: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.

Abstract: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.

34 citations

Proceedings Article•DOI•

Perceptual and objective evaluation of speech processed by adaptive differential PCM

B. McDermott¹, C. Scagliola, D. Goodman•Institutions (1)

Bell Labs¹

01 Apr 1978

TL;DR: Overall subjective quality of speech processed by adaptive differential PCM is well predicted by segmental signal-to-noise ratio and even better by a linear combination of measures of granular distortion and overload distortion.

Abstract: An experiment has been performed to study the perceptual characteristics of speech processed by ADPCM. We created 18 three-bit and four-bit coders spanning a wide range of quantizer adaptation parameters. Subjects judged the difference between each pair of coders and rated the quality of each coder individually. The difference data reveal three important perceptual dimensions (overall clarity, signal vs. background distortion, muffled vs. hoarse) which are related to various objective measures of coder performance. Overall subjective quality is well predicted by segmental SNR and even better by a linear combination of measures of granular distortion and overload distortion.

30 citations
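
Segmental SNR averages per-frame SNRs in dB instead of computing one global ratio, so quiet passages count as much as loud ones. The sketch below assumes non-overlapping frames and skips frames whose SNR is undefined; some variants also clamp each frame's value to a fixed range, which is not done here.

```python
import math
from typing import Sequence


def segmental_snr_db(clean: Sequence[float], coded: Sequence[float],
                     frame_len: int = 128) -> float:
    """Average of per-frame SNRs (dB) between a reference and a coded signal."""
    if len(clean) != len(coded):
        raise ValueError("signals must have equal length")
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = coded[start:start + frame_len]
        signal = sum(v * v for v in s)
        noise = sum((a - b) ** 2 for a, b in zip(s, y))
        if signal == 0.0 or noise == 0.0:
            continue  # silent or error-free frame: SNR undefined, skip it
        snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame had a finite SNR")
    return sum(snrs) / len(snrs)
```

For example, a loud frame coded at 20 dB SNR and a quiet frame whose error is as large as the signal (0 dB) average to a segmental SNR of 10 dB, whereas a single global SNR would be dominated by the loud frame.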

Proceedings Article•DOI•

9.6/7.2 Kbps Voice excited predictive coder (VEPC)

Daniel Esteban¹, Claude Galand, D. Mauduit, J. Menez•Institutions (1)

IBM¹

01 Apr 1978

TL;DR: This coding scheme, in addition to the baseband excitation concepts, takes advantage of a combination of recently published digital speech processing techniques such as transversal predictive coding, split-band coding by signal decimation/interpolation, and adaptive block quantization.

Abstract: This paper describes a common voice coding architecture based on a Voice Excited Predictive Coding (VEPC) scheme that allows operation at different bit rates (9600 bps, 7200 bps, or below) by simply modifying the bandwidth allocated to the coding of the baseband excitation signal. This coding scheme, in addition to the baseband excitation concepts, takes advantage of a combination of recently published digital speech processing techniques such as transversal predictive coding, split-band coding by signal decimation/interpolation, and adaptive block quantization. Simulations have shown that the proposed architecture achieves 'standard telephone quality', assuming a 300-3400 Hz telephone bandwidth, at transmission rates below 9600 bps.

28 citations
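
Adaptive block quantization can be sketched as block companding: every block of samples shares one scale factor, sent as side information, and each sample is quantized uniformly relative to it. Using the block's peak magnitude as the scale factor, and the block length and bit count in the example, are our assumptions; the abstract does not give the paper's exact scheme.

```python
from typing import List, Sequence, Tuple


def abq_encode(x: Sequence[float], block_len: int, bits: int
               ) -> List[Tuple[float, List[int]]]:
    """Quantize each block to signed integer codes relative to its peak."""
    levels = 2 ** (bits - 1) - 1          # e.g. bits=4 -> codes in [-7, 7]
    blocks = []
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        codes = [round(v / scale * levels) for v in block]
        blocks.append((scale, codes))
    return blocks


def abq_decode(blocks: List[Tuple[float, List[int]]], bits: int) -> List[float]:
    """Rebuild samples from per-block scale factors and integer codes."""
    levels = 2 ** (bits - 1) - 1
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(c * scale / levels for c in codes)
    return out
```

Because each block carries its own scale, the reconstruction error in a block is at most half a quantizer step of that block, so quiet blocks are reproduced with proportionally finer resolution than loud ones.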

Proceedings Article•DOI•

Studies on pattern recognition approach to voiced-unvoiced-silence classification

V. Sarma¹, D. Venugopal¹•Institutions (1)

Indian Institute of Science¹

10 Apr 1978

TL;DR: It is demonstrated that it is possible to achieve pattern recognition classification with much less computational effort by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.

Abstract: A pattern recognition approach for deciding whether a given segment of speech should be classified as voiced speech, unvoiced speech or silence based on a set of five measurements of the signal is given by Atal and Rabiner [1]. In this paper, we demonstrate that it is possible to achieve this classification with much less computational effort. These computational savings are mainly achieved by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.

22 citations
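
A sketch of frame-level voiced/unvoiced/silence classification from three cheap features. We assume log energy, zero-crossing rate, and the normalized autocorrelation at lag one; we believe these are among the measurements in the cited Atal and Rabiner method, but the paper's own three features may differ. A plain minimum-distance rule over user-supplied class means stands in for the paper's variable-decision-space scheme, which the abstract does not detail, and the class means in the test are invented.

```python
import math
from typing import Dict, Sequence, Tuple

Features = Tuple[float, float, float]


def frame_features(frame: Sequence[float]) -> Features:
    """Log energy (dB), zero-crossing rate, and lag-1 normalized autocorrelation."""
    n = len(frame)
    energy = sum(s * s for s in frame)
    log_energy = 10.0 * math.log10(energy / n + 1e-12)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    rho1 = r1 / energy if energy > 0 else 0.0
    return (log_energy, zcr, rho1)


def classify(frame: Sequence[float], means: Dict[str, Features],
             scales: Features) -> str:
    """Label a frame with the class whose mean is nearest (scaled Euclidean)."""
    f = frame_features(frame)

    def dist(m: Features) -> float:
        return sum(((fi - mi) / si) ** 2 for fi, mi, si in zip(f, m, scales))

    return min(means, key=lambda label: dist(means[label]))
```

Skipping linear prediction analysis keeps the per-frame cost to a few passes over the samples, which is the kind of saving the abstract emphasizes.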

Fast Algorithms for Speech Modeling.

Martin Morf, D T Lee

15 Dec 1978

TL;DR: The usefulness of the new approach for speech modeling has been successfully established after several parameter quantization methods were considered to achieve the desired low bit rates.

Abstract: This constitutes our final report on a research program aimed at the development of a high quality low data rate speech transmission system based on new types of speech modeling algorithms. Several such algorithms were developed and tested on simulated and real speech data. These algorithms have many desirable features including the capability of rapidly tracking time-varying model parameters. The best algorithm was used as the basis of a speech transmission system in order to test the quality of the speech models. The model parameters (reflection coefficients) together with pitch information and speech energy form a speech parameter vector to be transmitted and used to reconstruct the original speech. Several parameter quantization methods were considered to achieve the desired low bit rates. The various algorithms as well as the complete transmission system were coded and tested. Simulation results are very promising and the usefulness of our new approach for speech modeling has been successfully established. (Author)

19 citations
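
The report's fast algorithms are not reproduced here. As a point of reference only, the classical Levinson-Durbin recursion below, a standard and different method, turns autocorrelation values into the same kind of parameters the abstract says are transmitted: reflection coefficients. The sign convention, with the inverse filter written as A(z) = 1 + a1 z^-1 + ... + ap z^-p, is our assumption; texts differ.

```python
from typing import List, Sequence, Tuple


def levinson_durbin(r: Sequence[float], order: int
                    ) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations from autocorrelation r[0..order].

    Returns (a, k, err): predictor coefficients a1..ap of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, reflection coefficients k1..kp,
    and the final prediction-error power.
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    k: List[float] = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        prev = a[:]                     # order i-1 coefficients
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki            # error power shrinks at every order
    return a[1:], k, err
```

For a first-order process with r = [1.0, 0.5, 0.25], the recursion gives a1 = -0.5, a second reflection coefficient of zero, and an error power of 0.75.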

Journal Article•DOI•

An approach to secure voice communication based on the data encryption standard

M. Orceyre¹, R. Heller•Institutions (1)

IBM¹

01 Nov 1978-IEEE Communications Magazine

TL;DR: Secure voice communication, enabling speakers to converse naturally over telephone media without fear that their conversation can be usefully intercepted, poses special problems and is receiving close attention within both the commercial and the Government sectors.

Abstract: Telephone communications have been understood from their beginnings to be vulnerable to interception (unauthorized reception). In recent years, with increasing public and private sector reliance upon electronic media for communicating sensitive technical, financial, military, political, economic, and personal information, and with the rapidly increasing use of microwave and satellite telephone carrier media, concern about these vulnerabilities has mounted dramatically. Starting in mid-1977 there has been considerable attention given in the news media to the matter of wholesale interception by foreign governments of American private and commercial voice and data communications. Publicly available documents note the ease with which such common carrier transmissions can be “captured” for subsequent analysis and use by unauthorized listeners. Fig. 1 illustrates the many vulnerabilities of a typical public switched telephone network. Within this broad framework, the matter of secure voice communication, enabling speakers to converse naturally over telephone media without fear that their conversation can be usefully intercepted, poses special problems and is receiving close attention within both the commercial and the Government sectors.

Patent•DOI•

Speech transmission system

Osamu Fujimura¹•Institutions (1)

Bell Labs¹

14 Jun 1978-Journal of the Acoustical Society of America

TL;DR: In this patent, a speech transmission system is improved in intelligibility and naturalness by separating voiced from unvoiced speech segments prior to application to a transmission channel of restricted bandwidth.

Abstract: A speech transmission system is improved in intelligibility and naturalness by separating voiced from unvoiced speech segments prior to application to a transmission channel of restricted bandwidth. At the transmitter, unprocessed voiced segments are combined with discrete-frequency-coded unvoiced segments, which are processed in analog or digital fashion to conform to the limited channel bandwidth. At the receiver, voiced segments are reproduced conventionally, while unvoiced segments are simulated by noise sources triggered by the decoded discrete frequencies. The reconstructed speech signal can thus occupy substantially more than the limited channel bandwidth.

Patent•DOI•

Method of communicating digital speech data and a memory for storing such data

Richard H. Wiggins¹, George L. Brantingham¹•Institutions (1)

Texas Instruments¹

19 Jun 1978-Journal of the Acoustical Society of America

TL;DR: In this patent, a method of communicating digital speech data to a speech synthesis circuit is described, in which the data is stored in a memory coupled to the speech synthesis circuit.

Abstract: A method of communicating digital speech data to a speech synthesis circuit. The data is compressed to on the order of 1000-1200 bits per second for normal human speech. The speech synthesis circuit uses linear predictive coding techniques to produce high-quality speech or other sounds. The data is preferably stored in a memory that is coupled to the speech synthesis circuit. The data has variable frame lengths; the disclosed embodiment describes four different frame lengths, ranging from four bits to forty-nine bits. The memory stores the variable-frame-length data and communicates it to the speech synthesis circuit in response to certain control signals.

Proceedings Article•DOI•

Linear predictive coding of speech signals in a high ambient noise environment

H. Kobatake¹, J. Inari², S. Kakuta²•Institutions (2)

Tokyo University of Agriculture and Technology¹, University of Tokyo²

10 Apr 1978

TL;DR: This paper describes a method of speech coding in a high ambient noise environment and shows that, when the proposed noise reduction method is used, the spectral envelope of the speech signal is highly reliable information.

Abstract: Preserving both the spectral distribution and the periodicity of speech signals is essential in speech processing. This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is highly reliable information when the noise reduction method proposed in this paper is used. The paper also reports comparisons of several pitch extraction methods using extensive experimental data, on the basis of which a pitch extraction method suited to noisy speech signals is proposed.
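
For orientation, a basic autocorrelation pitch estimator is sketched below. It is a generic textbook method, not the noise-robust extractor the paper proposes: it picks the lag with the largest normalized autocorrelation within an assumed 60-400 Hz pitch range and reports the frame as unvoiced when that peak falls below an assumed threshold.

```python
import math
from typing import Optional, Sequence


def autocorr_pitch(x: Sequence[float], fs: float,
                   fmin: float = 60.0, fmax: float = 400.0,
                   voicing_threshold: float = 0.3) -> Optional[float]:
    """Estimate F0 in Hz from one frame, or None if the frame looks unvoiced."""
    n = len(x)
    mean = sum(x) / n
    y = [s - mean for s in x]                  # remove DC offset
    r0 = sum(s * s for s in y)
    if r0 == 0.0:
        return None
    lag_min = max(1, int(fs / fmax))
    lag_max = min(n - 1, int(fs / fmin))
    best_lag, best_val = 0, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(y[i] * y[i + lag] for i in range(n - lag))
        val = r / r0                           # normalized (biased) autocorrelation
        if val > best_val:
            best_lag, best_val = lag, val
    if best_val < voicing_threshold:
        return None
    return fs / best_lag
```

The biased estimate shrinks with lag, which mildly favours the true period over its multiples; in heavy noise the peak itself becomes unreliable, which is the problem the paper addresses.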

Journal Article•

Comparison of speech detection and spondee thresholds and half- versus full-list intelligibility scores with MLV and taped presentations of NU-6.

R C Beattie, D A Svihovec, B J Edgerton

01 May 1978-Journal of the American Audiology Society

TL;DR: Although the data support the use of MLV testing, verification with a standardized recording should be considered when unusually poor SDSs are obtained, and half-list testing can be an effective screening procedure to determine whether full-list testing is advisable.

Abstract: Several speech audiometric measurements were made on 212 ears with mild sensorineural hearing loss. An 8-dB difference between speech detection and spondee thresholds was observed, which is the same relationship that has been found in normal ears. No significant differences in speech discrimination scores (SDSs) were observed when NU-6 was administered via monitored live voice (MLV) and the Auditec recordings. Although our data support the use of MLV testing, verification with a standardized recording should be considered when unusually poor SDSs are obtained. Half-list and full-list SDSs were analyzed for both taped and MLV presentation modes. This analysis showed that both the MLV and taped stimuli exhibited very similar variability and that about 96% of the half-list scores were within 6% of the full-list scores. The clinician should be cautious, however, because 4% of the ears had half-list/full-list discrepancies ranging from 8 to 14%, and differences as large as 28% have been reported by Raffin and Thornton (1977). Furthermore, the variability between half-list and full-list SDSs depends on the degree of intelligibility impairment, being least for scores approaching the extremes of 0 and 100% and greatest for scores in the 30 to 70% range. Finally, our data suggest that half-list testing can be an effective screening procedure to determine whether full-list testing is advisable.

Proceedings Article•DOI•

Providing channel error protection for a 2400 bps linear predictive coded voice system

J. Fussell, B. Abzug, P. Boudra, M. Cowing

01 Apr 1978

TL;DR: Several techniques for reducing the effect of channel bit errors on the synthesized speech are described, which cause no measurable degradation of the LPC speech transmitted over an error-free channel and they require less than a one percent increase in computer execution time.

Abstract: The U.S. Government has developed a real-time 2400 bps Linear Predictive Coded (LPC) voice algorithm which was designed to provide maximum intelligibility and quality within the time and accuracy limitations imposed by modern high-speed minicomputers. The algorithm which resulted provides excellent intelligibility and quality when transmitted over an ideal channel. However, the speech is significantly degraded in an error environment. This paper describes several techniques for reducing the effect of channel bit errors on the synthesized speech. These techniques cause no measurable degradation of the LPC speech transmitted over an error-free channel and they require less than a one percent increase in computer execution time.

Proceedings Article•DOI•

An automatic word spotting system for conversational speech

M. Medress, T. Diller, D. Kloker, L. Lutton, H. Oredson, T. Skinner

01 Apr 1978

TL;DR: Sperry Univac is developing a linguistically oriented system for locating important words in conversational speech that uses acoustic, prosodic, and phonetic analyses to produce a phonetic description of the incoming speech.

Abstract: Sperry Univac is developing a linguistically oriented system for locating important words in conversational speech. The system uses acoustic, prosodic, and phonetic analyses to produce a phonetic description of the incoming speech. Next, phonetic dictionary representations of the keywords to be found are compared to all portions of the phonetic analysis. High scoring matches are then verified by aligning prestored spectral patterns with the spectral information found during analysis, and resulting good matches are announced as likely keyword occurrences. Current results are presented for this system, which is being developed and tested on bandlimited, conversational speech from a large, diverse speaker population.

Proceedings Article•DOI•

High quality adaptive predictive coding of speech

M. Berouti¹, J. Makhoul•Institutions (1)

BBN Technologies¹

01 Apr 1978

TL;DR: The pitch predictor is not useful on balance and should be eliminated, and the residual should be quantized with no clipping and encoded using a variable-length code; a single coding scheme seems to be adequate for all speech and all conditions.

Abstract: We report on the results of research to code speech at 16 kbps under the condition that the quality of the transmitted speech be equal to that of the original. Some of the original speech had been corrupted by noise and distortions typical of long distance telephone lines. The rigorous requirements of this work led to a new outlook on adaptive predictive coding. We have found that the pitch predictor is not useful on balance and should be eliminated, and that the residual should be quantized with no clipping and encoded using a variable-length code. A single coding scheme seems to be adequate for all speech and all conditions. In addition, the adaptive predictive coding system has been modified to include a noise spectral shaping filter that effectively eliminates the perception of background granular noise.
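
The abstract does not say which variable-length code was used. As an illustration only, a Huffman code built from observed symbol frequencies gives the most frequent residual quantizer levels (typically the small-magnitude ones) the shortest codewords; the level counts in the test are invented.

```python
import heapq
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Dict


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free binary code from observed symbol frequencies."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:                  # degenerate case: only one symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

With 100 invented residual levels drawn from five values, this code spends 190 bits in total, against 300 bits for a fixed 3-bit code.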

Journal Article•DOI•

Bit Rate Per Channel Halving in PCM Multiplexes by Speech Interpolation and Adaptive Quantization

Alcide Molinari, F. Vagliani

01 May 1978-IEEE Transactions on Communications

TL;DR: An all-digital system, labeled PCM.RR, is presented that doubles the traffic capacity of PCM links by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.

Abstract: An all-digital system, labeled PCM.RR, is presented, which enables the doubling of traffic capacity of PCM links. This is achieved, while keeping the transmission quality impairment very close to normal PCM standards, by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.
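
Working directly on A-law codes is convenient because the 3-bit segment field of each 8-bit codeword already approximates the log magnitude of the sample. The sketch below is our assumption, not the PCM.RR detector: it averages that field over a frame and compares the result with an assumed threshold. `alaw_to_linear` follows the ITU-T G.711 A-law decoding rule and is included only to show what the segment field encodes.

```python
from typing import Sequence


def alaw_to_linear(code: int) -> int:
    """Decode an 8-bit G.711 A-law codeword to a 16-bit-range linear sample."""
    code ^= 0x55                          # undo the even-bit inversion
    mantissa = (code & 0x0F) << 4
    segment = (code >> 4) & 0x07
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    return magnitude if code & 0x80 else -magnitude  # sign bit set -> positive


def alaw_segment(code: int) -> int:
    """Segment number 0..7 of an A-law codeword: a coarse log magnitude."""
    return ((code ^ 0x55) >> 4) & 0x07


def frame_is_active(codes: Sequence[int], threshold: float = 3.0) -> bool:
    """Declare speech when the mean segment number of a frame reaches threshold."""
    return sum(alaw_segment(c) for c in codes) / len(codes) >= threshold
```

Each segment step roughly doubles the sample magnitude, so a threshold on the mean segment number behaves like a coarse threshold on average log level without expanding the samples.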

Patent•

Multiplexing speech signals

Cochrane P

14 Mar 1978

TL;DR: In this patent, the frequency range of each speech channel is broken into sub-channels, each of which is considered separately for operational activity; composite speech signals are then formed from the active frequency sub-channels of individual speech channels and transmitted with coding signals indicative of their composition.

Abstract: To transmit a number of individual speech channels over a smaller number of transmission channels, the frequency range of each speech channel is broken into sub-channels and each of these is considered separately for operational activity. Composite speech signals are then formed from the active frequency sub-channels of the individual speech channels and these are transmitted with coding signals indicative of their composition.

Journal Article•DOI•

Research on low bit rate speech coding at the Electrical Communication Laboratory, NTT

Fumitada Itakura

01 Nov 1978-Journal of the Acoustical Society of America

TL;DR: The Parcor analysis‐synthesis method is being applied to a wide range of speech coding, from 1200 bps variable frame‐rate coding to high-quality 16 kbps adaptive predictive coding.

Abstract: Since the introduction of speech analysis-synthesis based on maximum likelihood spectrum estimation in 1966, we have been conducting research on low bit rate speech coding techniques and their application to audio response and low bit rate digital speech transmission. Parcor analysis‐synthesis, demonstrated in 1969, was one of the most fundamental methods, and it has formed the basis of the present development of linear predictive coding. Recently, various techniques have been proposed to improve speech quality, such as interpolation and nonlinear quantization of parameters, spectral smoothing, etc. They have been applied in the hardware realization of a 4-channel multiplexed 2400 bps vocoder. At present, the Parcor method is being applied to a wide range of speech coding, from 1200 bps variable frame‐rate coding to high-quality 16 kbps adaptive predictive coding.

Journal Article•DOI•

A Microprocessor Log PCM/ADPCM Code Converter

J. Dubnowski¹•Institutions (1)

Bell Labs¹

01 May 1978-IEEE Transactions on Communications

TL;DR: A microprocessor has been used to translate between Log PCM and ADPCM (Adaptive Differential PCM) code forms; by bridging the gap between simulation and prototyping, the system provides real-time speech processing with user interaction.

Abstract: A microprocessor has been used to translate between Log PCM and ADPCM (Adaptive Differential PCM) code forms. This system, in bridging the gap between simulation and prototyping, provides real-time speech processing with user interaction. Continuously coded speech can be subjectively evaluated while switching the values of code word length, step size, or predictor coefficients. Translations of additional code forms such as Δ-Mod, NIC, or Tree Codes could easily be implemented with the micro-codable system. The processor is configured as a stand-alone device competitive with special-purpose hardware in size, speed, and cost.
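
To make the adjustable quantities concrete (code word length, step size, predictor coefficient), here is a minimal ADPCM sketch with a fixed first-order predictor and a Jayant-style adaptive step that expands after large codes and contracts after small ones. The class name `AdpcmState`, the multiplier values, the step limits, and the 2-bit example are illustrative assumptions, not values from the paper; the code word length is implied by the number of multipliers.

```python
from typing import List, Sequence, Tuple

Code = Tuple[int, int]  # (sign: +1 or -1, magnitude index)


class AdpcmState:
    """Predictor and step-size state; encoder and decoder update it identically."""

    def __init__(self, step: float, a: float, multipliers: Sequence[float],
                 step_min: float = 1e-4, step_max: float = 1e4) -> None:
        self.step, self.a = step, a
        self.mult = list(multipliers)    # one multiplier per magnitude level
        self.step_min, self.step_max = step_min, step_max
        self.prev = 0.0                  # last reconstructed sample

    def reconstruct(self, code: Code) -> float:
        sign, m = code
        y = self.a * self.prev + sign * (m + 0.5) * self.step  # midrise level
        # Jayant-style adaptation: scale the step by the level's multiplier.
        self.step = min(self.step_max,
                        max(self.step_min, self.step * self.mult[m]))
        self.prev = y
        return y


def adpcm_encode(x: Sequence[float], state: AdpcmState) -> List[Code]:
    codes = []
    top = len(state.mult) - 1
    for s in x:
        d = s - state.a * state.prev                 # prediction error
        m = min(top, int(abs(d) / state.step))       # magnitude index
        code = (1 if d >= 0 else -1, m)
        state.reconstruct(code)                      # keep encoder in sync
        codes.append(code)
    return codes


def adpcm_decode(codes: Sequence[Code], state: AdpcmState) -> List[float]:
    return [state.reconstruct(c) for c in codes]
```

Because the encoder runs the same reconstruction as the decoder, both step sizes evolve identically from the transmitted codes alone, so no step-size side information has to be sent.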

Journal Article•DOI•

Speech output from a computer-controlled water-supply network

F. Fallside¹, Steve Young¹•Institutions (1)

University of Cambridge¹

01 Feb 1978

TL;DR: A system is described which provides for the output of information from a real engineering database in spoken form, using its own predefined knowledge of the information domain and a knowledge of simple English.

Abstract: A system is described which provides for the output of information from a real engineering database in spoken form. Data extracted from the database is converted by the system, using its own predefined knowledge of the information domain and a knowledge of simple English, into a sequence of words and an associated pitch contour. The spoken output is then generated by the concatenation and resynthesis of previously analysed stored isolated words using a hardware digital speech synthesiser.

Report•DOI•

The Speech Enhancement Advanced Development Model.

Mark R. Weiss, Ernest Aschkenasy

01 Nov 1978

TL;DR: In this report, the authors describe the design, principles of operation, and performance characteristics of an Advanced Development Model of a speech enhancement unit, which improves the quality and intelligibility of speech by removing frequently encountered interference or noise from received or recorded speech signals.

Abstract: This report describes the design, principles of operation, and performance characteristics of an Advanced Development Model of a speech enhancement unit. This unit improves the quality and intelligibility of speech signals by the removal of frequently encountered interference or noise from received or recorded speech signals. A high speed digital array processor and various time and frequency domain algorithms permit the detection and attenuation of narrowband noise (such as tones, hums, whistles, etc.) and impulse noise (such as ignition pulses, static, etc.) with minimum degradation to the speech signals. The enhancement unit provides automatic tracking and attenuation of interfering signals in real time, with a maximum lag of 0.15 second. The heart of the speech enhancement unit is a powerful computer known as a macro-array processor, or MAP, that performs all of the measurement, analysis, and processing of the input signal. It is supported by a digital magnetic tape unit used to program the MAP and a minicomputer which reads the program into the MAP. Tests on the unit showed attenuation of 30 to 50 dB on both narrowband and impulse noise. Operational tests performed by trained Air Force personnel showed the unit to be highly effective in providing improved intelligibility and listenability, which significantly reduced listener fatigue. Provision has been made in the design and fabrication of the speech enhancement unit to implement a technique for attenuating wideband random noise. This technique, known as INTEL, is one of the few known methods of suppressing this commonly encountered noise without severely distorting co-existing speech.
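
Narrowband interference such as a hum can be attenuated with a notch filter centred on the tone. The sketch below uses the widely published biquad notch formulas from Robert Bristow-Johnson's Audio EQ Cookbook with a fixed, assumed centre frequency and Q. The unit described in the report tracked interfering signals automatically; this sketch does not attempt that.

```python
import math
from typing import List, Sequence


def notch(x: Sequence[float], fs: float, f0: float, q: float = 30.0) -> List[float]:
    """Second-order IIR notch at f0 Hz (Audio EQ Cookbook coefficients)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cw = math.cos(w0)
    a0 = 1.0 + alpha
    # Normalize every coefficient by a0 so the recursion uses a0 = 1.
    b0, b1, b2 = 1.0 / a0, -2.0 * cw / a0, 1.0 / a0
    a1, a2 = -2.0 * cw / a0, (1.0 - alpha) / a0
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2  # direct form I
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y
```

A larger Q narrows the notch and spares more of the nearby speech spectrum, but the filter then takes longer to settle after the interference appears.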

Proceedings Article•DOI•

A voice-input programming system using basic-like language

Y. Niimi, Y. Kobayashi

01 Apr 1978

TL;DR: A modified version of BASIC, which the authors call 'SPOKEN-BASIC-1', is selected as the task for speech recognition, and two search strategies, the depth-first method and the best-first method, are compared quantitatively in their effectiveness.

Abstract: This paper describes a speech recognition system developed as a voice-input programming system. A modified version of BASIC, which we call 'SPOKEN-BASIC-1', is selected as the task for speech recognition. The system consists of four major components: acoustic, lexical matching, syntactic, and semantic processors. Seventy-one sentences spoken by each of four speakers have been applied to the system to gather several statistics on system performance. The system has achieved a sentence recognition rate of 85.5%. The average time required to recognize an utterance is from 1/4 to 1/5 times real time on the large-scale computer. The effectiveness of two search strategies, the depth-first method and the best-first method, has been compared quantitatively. Further, the contributions of various types of knowledge sources to system performance have been investigated.

Proceedings Article•DOI•

Text-independent speaker identification from a large linguistically unconstrained time-spaced data base

J. Markel, S. Davis

01 Apr 1978

TL;DR: A very large data base consisting of over thirty-six hours of linguistically unconstrained extemporaneous speech, from seventeen speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker identification.

Abstract: A very large data base consisting of over thirty-six hours of linguistically unconstrained extemporaneous speech, from seventeen speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker identification. The results were strongly dependent on the voiced speech averaging interval, L_v. Monotonic increases in the probability of correct identification were obtained as L_v increased, even with substantial time periods between successive sessions. Speaker identification performance in open tests improved if features with small between-class to within-class variance ratios were eliminated. For L_v corresponding to approximately thirty-nine seconds of speech, true text-independent results (no linguistic constraints embedded into the data base) of 98.05% for speaker identification were obtained.
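
The feature-pruning step can be sketched with an F-ratio: the variance of the per-speaker means of a feature divided by the average within-speaker variance. This exact formula, the function names, and the `keep_top` parameter are our assumptions about how such a ratio might be computed; the abstract does not specify it.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def f_ratios(data: Dict[str, List[Sequence[float]]]) -> List[float]:
    """Per-feature ratio of between-speaker to mean within-speaker variance.

    data maps each speaker to a list of long-term-average feature vectors.
    """
    speakers = list(data)
    n_feat = len(data[speakers[0]][0])
    ratios = []
    for f in range(n_feat):
        per_speaker = [[vec[f] for vec in data[s]] for s in speakers]
        between = pvariance([mean(vals) for vals in per_speaker])
        within = mean(pvariance(vals) for vals in per_speaker)
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def select_features(data: Dict[str, List[Sequence[float]]], keep_top: int) -> List[int]:
    """Indices of the keep_top features with the largest ratios."""
    r = f_ratios(data)
    return sorted(range(len(r)), key=lambda i: r[i], reverse=True)[:keep_top]
```

A feature whose values differ much more between speakers than across one speaker's sessions gets a large ratio and is kept; features with small ratios are the ones the abstract reports eliminating.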

Speech compression and evaluation

R. Viswanathan, John Makhoul, A. W. F. Huggins

01 Apr 1978

TL;DR: The development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network and a reliable method for measuring subjective speech quality are described.

Abstract: This report describes our work in the past three years on data compression and quality evaluation of digital speech. We developed and implemented linear predictive coding (LPC) techniques with the overall objective of digitally transmitting high-quality speech at the lowest possible average data rates over packet-switched communication media. Major techniques reported include: the covariance lattice method of linear prediction analysis, adaptive lattice methods, linear predictive spectral warping, improved quantization of LPC parameters, variable-frame-rate transmission of LPC parameters based on a functional perceptual model of speech, and a mixed-source model for the LPC synthesizer to produce more natural-sounding speech. We also developed a reliable method for measuring subjective speech quality. This method was employed to formally demonstrate the quality improvements provided by our speech analysis/synthesis techniques, as well as to study speech quality as a function of LPC parameters. Because subjective procedures are generally expensive and time-consuming, we developed and tested several objective procedures for speech quality evaluation; their results were found to be highly correlated with the corresponding subjective quality judgments. Another highlight of our work is the development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network.
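
The report's covariance lattice method is not detailed in this abstract. For orientation only, the sketch below shows a related but different technique: the autocorrelation method solved with the Levinson-Durbin recursion, which also yields the reflection (lattice) coefficients k_i that LPC coders commonly quantize. It is not the report's covariance lattice algorithm, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def autocorr(x: Sequence[float], p: int) -> List[float]:
    """Autocorrelation r[0..p] of a (windowed) frame x."""
    n = len(x)
    return [float(sum(x[i] * x[i + k] for i in range(n - k))) for k in range(p + 1)]


def levinson_durbin(r: Sequence[float], p: int) -> Tuple[List[float], List[float], float]:
    """Solve for predictor a[1..p] with x[n] ~ sum_i a[i] * x[n - i].

    Returns (predictor coefficients, reflection coefficients, final prediction error).
    """
    a = [0.0] * (p + 1)
    k = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        # Correlation not yet explained by the order-(i-1) predictor.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err
        k[i] = ki
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        err *= 1.0 - ki * ki  # error shrinks as long as |k_i| < 1
    return a[1:], k[1:], err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
print(levinson_durbin([1.0, 0.5, 0.25], 2))
```

For that input the second reflection coefficient is zero, as expected for a first-order process. For any valid autocorrelation sequence every |k_i| stays below 1, a property that makes reflection coefficients convenient to quantize.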

Book Chapter•DOI•

Speech Processing for Low Data Rate Digital Voice Communications

[...]

E. V. Stansfield

01 Jan 1978

Study and Development of Speech-Separation Techniques.

[...]

Thomas W Parsons

01 May 1978

TL;DR: A technique is described for suppressing interference from the speech of a competing talker by examining the Fourier transform of the input and selecting the harmonics of the desired voice, together with improvements that extend it to a natural-speech environment.

Abstract: One of the most common types of interference in speech communication is that caused by the speech of a competing talker. A technique has been developed for suppressing such interference by examining the Fourier transform of the input and selecting the harmonics of the desired voice. The initial version of this process was applicable only to vocalic speech (i.e., speech consisting only of vowels and vowel-like sounds), but subsequent research has taken steps to extend the process to natural (i.e., unrestricted) speech. This report describes the improvements made in this research: first, ruggedizing the process so that it can perform in a natural-speech environment; second, improving the intelligibility and naturalness of the recovered speech; and third, enabling the process to handle the non-vocalic speech sounds (such as plosives and fricatives) that occur in natural speech. (Author)
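
A minimal sketch of the harmonic-selection idea, under simplifying assumptions not stated in the report: a single frame, a known fundamental `f0` for the desired talker, and a plain DFT in place of whatever transform and windowing the actual system used. Bins within `tol_hz` of a multiple of `f0` are kept; all others, including the competing talker's components, are zeroed before the inverse transform.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X: Sequence[complex]) -> List[float]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def select_harmonics(x: Sequence[float], fs: float, f0: float, tol_hz: float) -> List[float]:
    """Keep only DFT bins lying within tol_hz of a harmonic of f0 (hypothetical parameters)."""
    n = len(x)
    X = dft(x)
    kept = []
    for k in range(n):
        # Map upper-half bins to negative frequencies, then use the magnitude.
        f = abs((k if k <= n // 2 else k - n) * fs / n)
        h = round(f / f0)
        keep = h >= 1 and abs(f - h * f0) <= tol_hz
        kept.append(X[k] if keep else 0j)
    return idft(kept)


# Toy mixture: desired voice at 4 Hz, competing component at 7 Hz (fs = N gives 1 Hz bins).
N, FS = 64, 64.0
target = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
mixture = [target[t] + math.sin(2 * math.pi * 7 * t / N) for t in range(N)]
recovered = select_harmonics(mixture, FS, f0=4.0, tol_hz=0.5)
```

In this idealized case the 7 Hz component falls between harmonics of 4 Hz and is removed exactly. Real voices have time-varying pitch and overlapping harmonics, which is why the report's extensions to natural speech were needed.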

Journal Article•DOI•

A Low-Rate Digital Formant Vocoder

[...]

Chong Un

01 Mar 1978-IEEE Transactions on Communications

TL;DR: A complete algorithm for a 1200-bit/s digital formant vocoder system is described, which draws heavily on the results of recent research in linear predictive coding.

Abstract: A complete algorithm for a 1200-bit/s digital formant vocoder system is described. This vocoder algorithm draws heavily on the results of recent research in linear predictive coding. The transmitted parameters are the frequencies and amplitudes of the first three formants, the pitch period, the voiced/unvoiced decision, and the gain. Formant bandwidths are estimated at the synthesizer by using the amplitude information. The synthesizer structure is in the parallel form. The synthetic speech quality at 1200 bits/s is reasonably good; most of the speech is intelligible and speaker-recognizable.
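
The abstract says the synthesizer is in parallel form but does not give its equations or the rule for deriving bandwidths from amplitudes. The sketch below assumes a standard two-pole digital resonator normalized to unity gain at 0 Hz, one branch per formant, with bandwidths passed in as fixed inputs rather than estimated. The formant values in `FORMANTS` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def resonator(x: Sequence[float], freq_hz: float, bw_hz: float, fs: float) -> List[float]:
    """Two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(n: int, period: int) -> List[float]:
    """Voiced excitation: one unit impulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def parallel_formant_synth(excitation: Sequence[float],
                           formants: Sequence[Tuple[float, float, float]],
                           fs: float) -> List[float]:
    """Sum of amplitude-weighted resonator outputs, one branch per (freq, bw, amp)."""
    out = [0.0] * len(excitation)
    for freq, bw, amp in formants:
        for i, v in enumerate(resonator(excitation, freq, bw, fs)):
            out[i] += amp * v
    return out


# Hypothetical frame: 100 Hz pitch at 8 kHz, three invented formants.
FS = 8000.0
FORMANTS = [(500.0, 60.0, 1.0), (1500.0, 90.0, 0.5), (2500.0, 150.0, 0.25)]
speech = parallel_formant_synth(impulse_train(800, 80), FORMANTS, FS)
```

In the parallel form each formant has its own amplitude control, which fits a vocoder that transmits formant amplitudes directly; a cascade synthesizer would instead derive relative amplitudes from the formant frequencies and bandwidths.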
