
Showing papers on "Voice activity detection published in 1990"



Journal ArticleDOI
TL;DR: With Xspeak, window navigation tasks usually performed with a mouse can be controlled by voice, and an improved version, Xspeak II, which incorporates a language for translating spoken commands, is introduced.
Abstract: Some necessary background in speech recognition and window systems is given, with an analysis of how they might be combined. Xspeak, a navigation application, and its operation and a field study of its use are described. With Xspeak, window navigation tasks usually performed with a mouse can be controlled by voice. An improved version, Xspeak II, which incorporates a language for translating spoken commands, is introduced.

249 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: Switching adaptive filters, suitable for speech beamforming, with no prior knowledge about the speech source are presented, and the most robust solution, i.e. a delay and sum beamformer that cues in on the direct path only and neglects all multipath contributions is given.
Abstract: Switching adaptive filters, suitable for speech beamforming, with no prior knowledge about the speech source are presented. The filters have two sections, of which only one section at any given time is allowed to adapt its coefficients. The switch between both is controlled by a speech detection function. The first section implements an adaptive look direction and cues in on the desired speech. This section only adapts when speech is present. The second section acts as a multichannel adaptive noise canceller. The obtained noise references are typically very bad; hence, adaptation must be restricted to silence-only periods. Several ideas were explored for the first section. The most robust solution, and the one with the best sound quality, was given by the simplest solution, i.e. a delay and sum beamformer that cues in on the direct path only and neglects all multipath contributions. Tests were performed with a four-microphone array in a highly reverberant room with both music and fan-type noise as jammers; SNR improvements of 10 dB were typical, with no audible distortion.
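The delay-and-sum stage the abstract singles out as the most robust choice can be sketched as follows. This is a minimal illustration assuming known integer sample delays per microphone; the function name and signature are invented for this sketch:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel on the direct-path arrival and average.

    channels: list of 1-D arrays, one per microphone.
    delays:   per-channel integer sample delay of the direct path.
    Multipath contributions are deliberately ignored, as in the paper's
    most robust configuration.
    """
    # Longest segment available once every channel is shifted by its delay.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = [ch[d:d + n] for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

In practice the delays would come from a direction-of-arrival estimate; fractional delays require interpolation, which this sketch omits.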

141 citations


PatentDOI
TL;DR: In this article, an enrollment process creates a set of speaker-specific parameters for normalizing analysis parameters, including the speaker's pitch, the frequency spectrum of the speech as a function of time, and certain time-domain measurements of the speech signal.
Abstract: The present invention processes an independent body of speech during an enrollment process and creates a set of speaker specific enrollment parameters for normalizing analysis parameters including the speaker's pitch, the frequency spectrum of the speech as a function of time, and certain measurements of the speech signal in the time-domain. A particular objective of the invention is to make these analysis parameters have the same meaning from speaker to speaker. Thus after the pre-processing performed by this invention, the parameters would look much the same for the same word independent of speaker. In this manner, variations in the speech signal caused by the physical makeup of a speaker's throat, mouth, lips, teeth, and nasal cavity would be, at least in part, reduced by the pre-processing.
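The idea of deriving per-speaker normalization constants during enrollment and applying them to later analysis frames can be sketched minimally. The patent's actual parameters (pitch, spectral, and time-domain measures) are richer; plain per-dimension mean/scale normalization here is an illustrative stand-in:

```python
import numpy as np

def enroll(frames):
    """Derive per-speaker normalization constants from enrollment speech.

    frames: 2-D array, one analysis frame (feature vector) per row.
    Returns per-dimension mean and scale over the enrollment material.
    """
    mean = np.mean(frames, axis=0)
    scale = np.std(frames, axis=0) + 1e-8  # guard against zero variance
    return mean, scale

def normalize(frame, mean, scale):
    """Map a new analysis frame into the speaker-normalized space, so the
    same word looks much the same independent of speaker."""
    return (frame - mean) / scale
```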

92 citations


Journal ArticleDOI
TL;DR: In this article, five approaches that can be used to control and simplify the speech recognition task are examined: isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions.
Abstract: Five approaches that can be used to control and simplify the speech recognition task are examined. They entail the use of isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions. The five components of a speech recognition system are described: a speech capture device, a digital signal processing module, preprocessed signal storage, reference speech patterns, and a pattern-matching algorithm. Current speech recognition systems are reviewed and categorized. Speaker recognition approaches and systems are also discussed.

87 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a microphone-array adaptive beamformer with a dual function: it is suited both to transmission and to use as input to speech recognition systems. The performance of the beamformer was, however, limited.

84 citations


PatentDOI
TL;DR: A voice response unit for transmitting voice prompt messages to customers and for receiving messages generated by customers in response to the voice prompt messages has a speech recognizer for recognizing customer commands used to control operation of the unit.
Abstract: A voice response unit for transmitting voice prompt messages to customers and for receiving messages generated by customers in response to the voice prompt messages. The unit has a speech recognizer for recognizing customer commands used to control operation of the voice response unit. Apparatus interconnects a voice decoder and voice recorder with a telephone line to transmit a generated voice prompt message and to receive a customer message in response thereto from a calling customer coupled with the telephone line. The apparatus is also coupled with the speech recognizer and responds to receipt of a customer command combined with a portion of the transmitted voice prompt message reflected from the telephone line by cancelling the reflected voice prompt message with the transmitted voice prompt message, thereby enabling the speech recognizer to respond to the customer command during transmission of the voice prompt message and interrupt the voice prompt message.
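The cancellation of the reflected prompt can be illustrated with an adaptive echo canceller: subtract an adaptively filtered copy of the known prompt from the line signal so the recognizer hears only the caller. LMS adaptation is one standard way to realize such cancellation; the patent does not specify this algorithm, and all names and constants below are illustrative:

```python
import numpy as np

def lms_echo_cancel(far_end, mic, n_taps=8, mu=0.05):
    """Estimate the echo of the known prompt (far_end) present in the
    line signal (mic) with an adaptive FIR filter and subtract it,
    leaving the caller's barge-in command as the residual."""
    w = np.zeros(n_taps)
    out = np.zeros(len(mic))
    for n in range(n_taps - 1, len(mic)):
        x = far_end[n - n_taps + 1:n + 1][::-1]  # most recent sample first
        e = mic[n] - w @ x                        # residual after echo estimate
        w += mu * e * x                           # LMS coefficient update
        out[n] = e
    return out
```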

68 citations


PatentDOI
TL;DR: Protection of a digital multi-pulse speech coder from fading pattern bit errors common in a digital mobile radio channel is accomplished with error detection techniques which are simple to implement and require no error correcting codes.
Abstract: Protection of a digital multi-pulse speech coder from fading pattern bit errors common in a digital mobile radio channel is accomplished with error detection techniques which are simple to implement and require no error correcting codes. A synthetic regeneration algorithm is employed which uses only the perceptually significant bits in the transmitted frame. Separate parity checksums for line spectrum pair frequency data, pitch lag data and pulse amplitude data are added to each frame of speech coder bits in the transmitter. The bits are then transmitted through a mobile environment susceptible to fading that induces bursty error patterns in the stream. At the receiving station, the parity checksum bits and speech coder bits are used to determine if an error has occurred in a particular section of the bit stream. Detected errors are flagged and supplied to the speech decoder. The speech decoder uses the error flags to modify its output signal so as to minimize perceptual artifacts in the output speech. Separate checksums are developed for subsets of line spectrum pair (LSP) coefficients and related speech data, whereby a single subset may be error-detected and replaced, rather than an entire frame.
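The per-subset parity scheme can be sketched in a few lines. The grouping below (LSP, pitch lag, pulse amplitudes) follows the abstract, but the actual bit allocation in the coder frame is not reproduced here:

```python
def group_parities(groups):
    """One even-parity bit per perceptually significant bit group
    (e.g. LSP frequencies, pitch lag, pulse amplitudes).

    groups: list of lists of 0/1 bits, one list per parameter group.
    """
    return [sum(g) % 2 for g in groups]

def flag_errors(received_groups, sent_parities):
    """Recompute parities at the receiver; a mismatch flags only the
    affected group, so the decoder can replace that subset rather
    than discard the entire frame."""
    return [sum(g) % 2 != p for g, p in zip(received_groups, sent_parities)]
```

The error flags would then drive the decoder's concealment logic (e.g. repeating the previous frame's values for the flagged group only).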

59 citations


PatentDOI
TL;DR: In this article, a harmonic signal is created from a limited spectral representation of a voice signal and combined with at least a portion of the delayed limited spectral signal to provide a reconstructed speech signal having perceptually improved audio quality.
Abstract: A harmonic signal is created from a limited spectral representation of a voice signal. The harmonic signal is combined with at least a portion of the delayed limited spectral signal to provide a reconstructed speech signal having perceptually improved audio quality.

53 citations


Journal ArticleDOI
TL;DR: The historical and theoretical bases of contemporary high-performance text-to-speech (TTS) systems and their current design are discussed, with particular reference to vocal tract models.
Abstract: The historical and theoretical bases of contemporary high-performance text-to-speech (TTS) systems and their current design are discussed. The major elements of a TTS system are described, with particular reference to vocal tract models. The stages involved in the process of converting text into speech parameters are examined, covering text normalization, word pronunciation, prosody, phonetic rules, voice tables, and hardware implementation. Examples are drawn mainly from Berkeley Speech Technologies' proprietary text-to-speech system, T-T-S, but other approaches are indicated briefly.

51 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A PDA is presented which outperforms the other methods regarding correct voicing decision and pitch estimation in quasi-periodic as well as in aperiodic speech signals.
Abstract: The problem of pitch determination in aperiodic speech signals and its relevance for practical computer speech applications is discussed. Four patterns of aperiodic voice excitation are distinguished systematically with respect to their acoustical characteristics and their distributional properties between different speakers (female and male) and different kinds of text boundaries. Several pitch determination algorithms (PDAs), including both time-domain and short-term analysis approaches such as the Gold-Rabiner algorithm, the SIFT algorithm, and the cepstrum method, are evaluated for their capacity to detect and identify these patterns correctly in continuous human speech. A PDA is presented which outperforms the other methods regarding correct voicing decision and pitch estimation in quasi-periodic as well as in aperiodic speech signals.
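For orientation, a classical time-domain PDA baseline of the kind the paper evaluates, a short-term autocorrelation estimator with a crude voicing decision, can be sketched as follows. This is not the paper's own algorithm, and the voicing threshold is illustrative:

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Autocorrelation pitch estimate over one analysis frame.

    Returns the pitch in Hz, or 0.0 when the frame is judged unvoiced.
    The voicing decision compares the autocorrelation peak against the
    zero-lag energy (0.3 is an illustrative threshold).
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)     # candidate lag range
    lag = lo + int(np.argmax(ac[lo:hi]))        # strongest periodicity
    voiced = ac[lag] > 0.3 * ac[0]
    return fs / lag if voiced else 0.0
```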

Journal ArticleDOI
TL;DR: Digital speech technology is reviewed, with the emphasis on applications demanding high-quality reproduction of the speech signal, which include the important subclass of wideband speech.
Abstract: Digital speech technology is reviewed, with the emphasis on applications demanding high-quality reproduction of the speech signal. Examples of such applications are network telephony, ISDN terminals for audio teleconferencing, and systems for the storage of audio signals, which include the important subclass of wideband speech. Depending on the application, the bandwidth of input speech can vary from about 3 kHz to nearly 20 kHz. Coding for digital telephony at 4 and 8 kb/s, network quality coding at 16 kb/s, and coding for audio at 7 and 20 kHz are examined. Future directions in the field are discussed with respect to anticipated technology applications and the algorithms needed to support these technologies.

Journal ArticleDOI
TL;DR: The use of speaker-independent speech recognition in the development of Northern Telecom's automated alternate billing service (AABS) for collect calls, third-number-billed calls, and calling-card-billed calls is discussed.
Abstract: The use of speaker-independent speech recognition in the development of Northern Telecom's automated alternate billing service (AABS) for collect calls, third-number-billed calls, and calling-card-billed calls is discussed. The AABS system automates a collect call by recording the calling party's name, placing a call to the called party, playing back the calling party's name to the called party, informing the called party that he or she has a collect call from that person, and asking, 'Will you pay for the call?' The operation of AABS, the architecture of the voice interface, and the speech recognition algorithm are described, and the accuracy of the recognizer is discussed. AABS relies on isolated-word recognition, although more advanced techniques that can recognize continuous speech are being pursued.

Journal ArticleDOI
TL;DR: This work uses cross-validation to increase the effective training size and introduces a near-miss sentence hypothesization algorithm for continuous speech training that resulted in over 20% error reductions both with and without grammar.

PatentDOI
Masanobu Shimanuki1
TL;DR: It is desirable for this device to be provided with a circuit that prevents generation of ringing tones when an incoming call arrives and a circuit to reduce the level of signals sent from a telephone network to the receiver when the speech recognition unit receives speech signals from the transmitter microphone.
Abstract: A telephone terminal device equipped with a transmitter microphone, a receiver, a speech recognition unit that receives and recognizes speech signals from the transmitter microphone, and a circuit to reduce the level of signals sent from a telephone network to the receiver when the speech recognition unit receives speech signals from the transmitter microphone. Further, this device is preferably equipped with a speech reproduction unit that reproduces the speech information stored in a memory, in response to the information of recognition result from the speech recognition unit, and a circuit that prevents transmission of signals from the telephone network to the receiver when the regenerated speech information is sent to the receiver. Furthermore, it is desirable for this device to be provided with a circuit that prevents generation of ringing tones when an incoming call arrives.

Proceedings ArticleDOI
05 Nov 1990
TL;DR: Different excitation signals are discussed, as well as procedures for determining the various coder parameters, which are based on analysis-by-synthesis techniques.
Abstract: This paper presents an overview of analysis-by-synthesis techniques used for low bit rate coding of speech signals. Analysis-by-synthesis procedures use linear predictors to remove the redundancies in the speech signal. The remaining difference signal is not quantized directly, but is replaced by an excitation signal that can be represented with a low number of bits. The selection of this signal is typically based on an exhaustive search procedure, in which for each prototype excitation the corresponding speech signal is constructed. The average mean-squared error between the original and the reconstructed signal is used as a criterion to determine the best choice of the excitation signal. In this paper, different excitation signals are discussed, as well as procedures for determining the various coder parameters. In addition, the paper discusses some recently proposed speech coding standards, which are based on analysis-by-synthesis techniques.
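The exhaustive search loop at the heart of analysis-by-synthesis can be sketched as follows. The synthesis filter is passed in as a stand-in for the LP synthesis filter, and real coders also search an associated gain; names are illustrative:

```python
import numpy as np

def select_excitation(target, codebook, synth):
    """Exhaustive analysis-by-synthesis search: run every candidate
    excitation through the synthesis filter and keep the one whose
    reconstruction minimizes mean-squared error against the target.

    target:   the speech frame to be matched.
    codebook: iterable of prototype excitation vectors.
    synth:    callable mapping an excitation to a synthesized frame.
    """
    errs = [np.mean((target - synth(exc)) ** 2) for exc in codebook]
    best = int(np.argmin(errs))
    return best, errs[best]
```

Only the winning codebook index (plus gain and predictor parameters) needs to be transmitted, which is where the low bit rate comes from.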

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A model for cross-language voice conversion is described and the converted speech from male to female is as understandable as the unconverted speech and, moreover, it is recognized as female speech.
Abstract: First, the part of spectral difference that is due to the difference in language is assessed. This is investigated using a bilingual speaker's speech data. It is found that the interlanguage (between English and Japanese) difference is smaller than the interspeaker difference. Listening tests indicate that the difference between English and Japanese is very small. Second, a model for cross-language voice conversion is described. In this approach, voice conversion is considered a mapping problem between two speakers' spectrum spaces. The spectrum spaces are represented by codebooks. From this point of view, a cross-language voice conversion model and measures for the model are proposed. The converted speech from male to female is as understandable as the unconverted speech and, moreover, it is recognized as female speech.
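The codebook-mapping step can be sketched per frame as follows. The pairing between source and target codebook entries is assumed to have been learned from aligned training utterances; names are illustrative:

```python
import numpy as np

def convert_frame(frame, src_codebook, tgt_codebook):
    """Codebook-mapping voice conversion for one spectral frame:
    vector-quantize the frame in the source speaker's spectrum space,
    then emit the paired entry from the target speaker's codebook.

    src_codebook, tgt_codebook: 2-D arrays with corresponding rows,
    i.e. row i of each describes the "same" sound for each speaker.
    """
    i = int(np.argmin(np.linalg.norm(src_codebook - frame, axis=1)))
    return tgt_codebook[i]
```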

Proceedings ArticleDOI
24 Jun 1990
TL;DR: An empirical study in which users were asked to enter digit strings into the computer by voice and by keyboard shows that speech is preferable for strings that require more than a few keystrokes.
Abstract: Meaningful evaluation of spoken language interfaces must be based on detailed comparisons with an alternate, well-understood input modality, such as the keyboard. This paper presents an empirical study in which users were asked to enter digit strings into the computer by voice and by keyboard. Two different ways of verifying and correcting the spoken input were also examined using either voice or keyboard. Timing analyses were performed to determine which aspects of the interface were critical to speedy completion of the task. The results show that speech is preferable for strings that require more than a few keystrokes. The results emphasize the need for fast and accurate speech recognition, but also demonstrate how error correction and input validation are crucial components of a speech interface.

Journal ArticleDOI
TL;DR: In this article, the authors discuss various design issues related to developing an integrated voice/data mobile radio system, including high speed digital radio frequency modulation in a mobile environment, statistics for the talkspurt/silence gap composition of speech, switching schemes for voice/Data integration, encoding techniques, and voice and data traffic statistics.
Abstract: The various design issues related to developing an integrated voice/data mobile radio system, including high speed digital radio frequency modulation in a mobile environment, statistics for the talkspurt/silence gap composition of speech, switching schemes for voice/data integration, encoding techniques, and voice and data traffic statistics are discussed. A performance analysis is conducted for a typical design, showing that a voice-only mobile radio system can be upgraded to an integrated voice/data system capable of carrying the full voice and data loads without requiring additional radio channels and without compromising voice performance. Data traffic is only minimally delayed (46.2 ms mean delay) for a fully loaded system.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: Techniques for combating two types of distortion that degrade the quality of vector excitation coded (VXC) speech are presented and it is shown that the first technique can benefit any VXC coder, whereas the second is applicable specifically when phonetic segmentation is used as a front end to V XC coders.
Abstract: Techniques for combating two types of distortion that degrade the quality of vector excitation coded (VXC) speech are presented. One degradation, the presence of noiselike components between the intended harmonics in voiced speech segments, is reduced by adaptive comb filtering, controlled by a smoothed pitch estimate. The other degradation arises with front vowel sounds, whose second and third formants tend to be attenuated in VXC coders; this is improved by adding high-frequency emphasis to the perceptual weighting when computing the distortion between the original and reconstructed speech. It is shown that the first technique can benefit any VXC coder, whereas the second is applicable specifically when phonetic segmentation is used as a front end to VXC coders.
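A plain comb filter illustrates the principle: reinforcing samples one pitch period apart boosts the harmonics and attenuates the noise-like components between them. The paper's version adapts the lag from a smoothed pitch estimate; the fixed-lag sketch below is a simplification with an illustrative mixing weight:

```python
import numpy as np

def comb_filter(x, pitch_lag, alpha=0.3):
    """Pitch-synchronous comb filter: mix each sample with the sample
    one pitch period earlier. Periodic (harmonic) content adds
    coherently; inter-harmonic noise is attenuated.

    pitch_lag: pitch period in samples (fixed here; adaptive in the paper).
    alpha:     weight of the delayed tap (illustrative value).
    """
    y = np.copy(x).astype(float)
    y[pitch_lag:] = (x[pitch_lag:] + alpha * x[:-pitch_lag]) / (1 + alpha)
    return y
```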

Journal ArticleDOI
TL;DR: Speech perception testing and speech discrimination results indicate that, given sufficient training, children can utilize speech feature information provided through the Tickle Talker to improve discrimination of words and sentences.
Abstract: Fourteen prelinguistically profoundly hearing-impaired children were fitted with the multichannel electrotactile speech processor (Tickle Talker) developed by Cochlear Pty. Ltd. and the University of Melbourne. Each child participated in an ongoing training and evaluation program, which included measures of speech perception and production. Results of speech perception testing demonstrate clear benefits for children fitted with the device. Thresholds for detection of pure tones were lower for the Tickle Talker than for hearing aids across the frequency range 250-4000 Hz, with the greatest tactual advantage in the high-frequency consonant range (above 2000 Hz). Individual and mean speech detection thresholds for the Ling 5-sound test confirmed that speech sounds were detected by the electrotactile device at levels consistent with normal conversational speech. Results for three speech feature tests showed significant improvement when the Tickle Talker was used in combination with hearing aids (TA) as compared with hearing aids alone (A). Mean scores in the TA condition increased by 11% for vowel duration, 20% for vowel formant, and 25% for consonant manner as compared with hearing aids alone. Mean TA score on a closed-set word test (WIPI) was 48%, as compared with 32% for hearing aids alone. Similarly, mean WIPI score for the combination of Tickle Talker, lipreading, and hearing aids (TLA) increased by 6% as compared with combined lipreading and hearing aid (LA) scores. Mean scores on open-set sentences (BKB) showed a significant increase of 21% for the tactually aided condition (TLA) as compared with unaided (LA).
These results indicate that, given sufficient training, children can utilize speech feature information provided through the Tickle Talker to improve discrimination of words and sentences. These results are consistent with improvement in speech discrimination previously reported for normally hearing and hearing-impaired adults using the device. Anecdotal evidence also indicates some improvements in speech production for children fitted with the Tickle Talker.

Journal ArticleDOI
TL;DR: Advances in coding algorithms and digital signal processing have led to sophisticated technologies for speech communication for a variety of applications, as well as to greater flexibilities in the design of ISDN terminals, which implies stereo teleconferencing or dual-language programming over a 64-kb/s channel.
Abstract: Advances in coding algorithms and digital signal processing have led to sophisticated technologies for speech communication for a variety of applications, as well as to greater flexibilities in the design of ISDN terminals for integrated communication of speech, images, and data. For traditional telephony with a signal bandwidth of 3.2 kHz, the transmission rate for network-quality speech is now down to 16 kb/s. Robust communications-quality speech appropriate for cellular radio has been realized at 8 kb/s. Research attention is shifting toward 4 kb/s, focused on improving speaker identification and the naturalness of coded speech. For wideband audio with a signal bandwidth of 7 kHz, high-quality coding is now possible at 32 kb/s, which implies stereo teleconferencing or dual-language programming over a 64-kb/s channel. Transparent coding of 20-kHz audio has been demonstrated at 128 kb/s, with near-transparent performance at rates as low as 64 kb/s for some classes of signals.

Patent
21 Jun 1990
TL;DR: In this paper, a speech detector has an intensity detector that indicates whether the intensity of a PCM signal exceeds a first threshold, and a normal-zero-crossing-count detector, whose outputs are combined by AND logic to produce the output of the speech detector.
Abstract: A speech detector has an intensity detector that indicates whether the intensity of a PCM signal exceeds a first threshold, and a normal-zero-crossing-count detector that indicates whether the zero-crossing count of the PCM signal exceeds a second threshold. The outputs of the intensity detector and normal-zero-crossing-count detector are combined by AND logic to produce the output of the speech detector. The second threshold is set well below the minimum zero-crossing count occurring in normal speech, the function of the normal-zero-crossing-count detector being to disable speech detection during line faults.
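The detector's AND logic can be sketched directly from this description. Thresholds below are illustrative, not from the patent:

```python
import numpy as np

def speech_detect(frame, energy_thresh=1e4, zcr_thresh=5):
    """AND-combine an intensity detector and a zero-crossing detector.

    As in the patent, the zero-crossing threshold is set well below
    counts seen in normal speech: its job is to veto line faults
    (e.g. a stuck DC level), not to classify speech by itself.
    """
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2)                        # intensity detector
    zcr = np.count_nonzero(np.diff(np.signbit(frame)))  # zero-crossing count
    return bool(energy > energy_thresh) and zcr > zcr_thresh
```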

PatentDOI
TL;DR: A portable voice or speech aid enabling a deaf or voice impaired user to make sounds into a microphone to output intelligible speech through a built-in speaker or to a text display screen is described in this article.
Abstract: A portable voice or speech aid enabling a deaf or voice impaired user to make sounds into a microphone to output intelligible speech through a built-in speaker or to a text display screen.

Journal ArticleDOI
TL;DR: The performance levels for increasing cell loss are compared for various speech coding methods, in combination with methods for dividing coded speech signals into cells and discarding cells.
Abstract: A type of speech coding for asynchronous transfer mode (ATM) is described. Cell processing, which improves service quality, is taken into account. Missing-cell recovery methods are discussed, and the distinctive features of missing-cell recovery methods used with low-bit-rate coding are examined. An example of the speech quality obtained using speech coding techniques in the ATM networks is described. The performance levels for increasing cell loss are compared for various speech coding methods, in combination with methods for dividing coded speech signals into cells and discarding cells. Representative feasible network applications of coding technologies are considered.

PatentDOI
John W. Jackson1
TL;DR: In this article, a method and apparatus for speech analysis and speech recognition is described, where each speech utterance under examination in accordance with the method of the present invention is digitally sampled and represented as a temporal sequence of data frames.
Abstract: A method and apparatus are disclosed for speech analysis and speech recognition. Each speech utterance under examination in accordance with the method of the present invention is digitally sampled and represented as a temporal sequence of data frames. Each data frame is then analyzed by the application of a Fast Fourier Transform (FFT) to obtain an indication of the energy content of each data frame in a plurality of frequency bands or bins. An indication of each of the most significant frequency bands, in terms of energy content, is then plotted by bin number for all data frames, and these indications are graphically combined to create a power content signature for the speech utterance which is indicative of the movement of audio power through the audio spectrum over time for that utterance. By comparing the power content signature of an unknown speech utterance to a number of previously stored power content signatures, each associated with a known utterance, it is possible to identify an unknown speech utterance with a high degree of accuracy. In one preferred embodiment of the present invention, comparisons of power content signatures from unknown speech utterances are made with stored power content signatures utilizing a least squares fit or other suitable technique.
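The per-frame signature construction can be sketched as follows. The frame length and the single-strongest-bin choice are illustrative simplifications of the patent's "most significant bands":

```python
import numpy as np

def power_signature(samples, frame_len=256):
    """Per-frame index of the strongest FFT bin, forming a signature of
    how audio power moves through the spectrum over time.

    A real matcher would then compare signatures of unknown utterances
    against stored reference signatures (e.g. with a least-squares fit).
    """
    n_frames = len(samples) // frame_len
    sig = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame)) ** 2  # per-bin energy
        sig.append(int(np.argmax(spectrum)))        # dominant bin this frame
    return sig
```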

PatentDOI
TL;DR: It is proposed to encode speech signals by means of a residual signal speech encoder to reduce the quantity of data to be stored without noticeably affecting the acoustic quality of the speech.
Abstract: Devices for the digital recording and reproduction of speech signals are used, for example, in answering apparatus. In order to reduce the quantity of data to be stored without noticeably affecting the acoustic quality of the speech, it is proposed to encode speech signals by means of a residual signal speech encoder.

PatentDOI
TL;DR: In speech decoding, a transmission code is received and the presence of a code error is detected on the basis of the error correcting code; when an error cannot be corrected, artificial background sound corresponding to the decoded speech is generated from characteristic parameters indicating unvoiced sound in the decoded speech.
Abstract: In speech decoding, a transmission code, which includes an error correcting code added to a speech code, is received, and whether or not there is a code error is detected on the basis of the error correcting code. When there is no code error, or when the detected code error has been corrected, normal speech decoding is executed. On the other hand, when there is a code error that cannot be corrected, artificial background sound corresponding to the decoded speech is generated from characteristic parameters indicating unvoiced sound in the decoded speech. The parameters are continuously extracted from the decoded speech, stored in a memory, and used to replace the erroneous portion of the speech code.


Journal ArticleDOI
01 Jul 1990
TL;DR: Predictive coding of speech, multipulse and code-excited coders, and frequency-domain coders are examined for the coding of speech signals, and intraframe and still-image coding and interframe coding are examined for the coding of image and video signals.
Abstract: Some digital source coding techniques for speech and video are reviewed. Predictive coding of speech, multipulse and code-excited coders, and frequency-domain coders are discussed and compared for the coding of speech signals, and intraframe and still-image coding and interframe coding are examined for the coding of image and video signals. The emphasis is on algorithms that offer high compression while maintaining the perceptual quality of the source signals. Some algorithms that are general waveform coding algorithms and do not strictly depend on the input source are included.