
Showing papers on "Voice activity detection published in 1995"


Journal ArticleDOI
TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

712 citations


Patent
07 Nov 1995
TL;DR: In this article, a knowledge-based speech recognition apparatus and methods are provided for translating an input speech signal to text, which employ a largely speaker independent dictionary based upon the application of phonological and phonetic/acoustic rules to generate acoustic event transcriptions against which the series of hypothesized acoustic feature vectors are compared to select word choices.
Abstract: Knowledge based speech recognition apparatus and methods are provided for translating an input speech signal to text. The speech recognition apparatus captures an input speech signal, segments it based on the detection of pitch period, and generates a series of hypothesized acoustic feature vectors for the input speech signal that characterizes the signal in terms of primary acoustic events, detectable vowel sounds and other acoustic features. The apparatus and methods employ a largely speaker-independent dictionary based upon the application of phonological and phonetic/acoustic rules to generate acoustic event transcriptions against which the series of hypothesized acoustic feature vectors are compared to select word choices. Local and global syntactic analysis of the word choices is provided to enhance the recognition capability of the methods and apparatus.

483 citations


Book
01 Feb 1995
TL;DR: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems, including an in-depth examination of the important topic of code excited linear prediction (CELP).
Abstract: From the Publisher: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems. Discusses the variety of speech coders utilized with such new systems as MBE IMMARSAT-M. Includes an in-depth examination of the important topic of code excited linear prediction (CELP).

453 citations


Journal ArticleDOI
TL;DR: A new mixed excitation LPC vocoder model is presented that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech.
Abstract: Traditional pitch-excited linear predictive coding (LPC) vocoders use a fully parametric model to efficiently encode the important information in human speech. These vocoders can produce intelligible speech at low data rates (800-2400 b/s), but they often sound synthetic and generate annoying artifacts such as buzzes, thumps, and tonal noises. These problems increase dramatically if acoustic background noise is present at the speech input. This paper presents a new mixed excitation LPC vocoder model that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech. The new model also eliminates the traditional requirement for a binary voicing decision so that the vocoder performs well even in the presence of acoustic background noise. A 2400-b/s LPC vocoder based on this model has been developed and implemented in simulations and in a real-time system. Formal subjective testing of this coder confirms that it produces natural sounding speech even in a difficult noise environment. In fact, diagnostic acceptability measure (DAM) test scores show that the performance of the 2400-b/s mixed excitation LPC vocoder is close to that of the government standard 4800-b/s CELP coder.

352 citations
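The key departure from a binary voicing decision can be sketched in a few lines of Python. The mixing here is a single broadband weight, whereas the actual vocoder mixes pulse and noise contributions per frequency band; all names and constants below are illustrative, not from the paper:

```python
import random

def mixed_excitation(n_samples, pitch_period, voicing_strength):
    """Blend a periodic pulse train with white noise.

    voicing_strength in [0, 1] replaces the traditional binary
    voiced/unvoiced decision: 1.0 gives pure pulses, 0.0 pure noise,
    and intermediate values mix the two.
    """
    excitation = []
    for n in range(n_samples):
        pulse = 1.0 if n % pitch_period == 0 else 0.0
        noise = random.gauss(0.0, 0.3)
        excitation.append(voicing_strength * pulse
                          + (1.0 - voicing_strength) * noise)
    return excitation

# A partially voiced 20 ms frame at 8 kHz: pulses and noise are mixed
# rather than switched, avoiding the buzzy artifacts of a hard decision.
frame = mixed_excitation(160, pitch_period=80, voicing_strength=0.7)
```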


Proceedings ArticleDOI
09 May 1995
TL;DR: Two new techniques are presented to estimate the noise spectra or the noise characteristics for noisy speech signals and can be combined with a nonlinear spectral subtraction scheme to enhance noisy speech and to improve the performance of speech recognition systems.
Abstract: Two new techniques are presented to estimate the noise spectra or the noise characteristics for noisy speech signals. No explicit speech pause detection is required. Past noisy segments of only about 400 ms duration are needed for the estimation, so the algorithm is able to quickly adapt to slowly varying noise levels or slowly changing noise spectra. These techniques can be combined with a nonlinear spectral subtraction scheme; they can be shown to enhance noisy speech and to improve the performance of speech recognition systems. Another application is the realization of a robust voice activity detector.

273 citations
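The pause-free noise estimation can be sketched as minimum tracking of the smoothed frame power over a short sliding window (roughly the 400 ms of past segments mentioned in the abstract). The smoothing constant, window length, oversubtraction factor, and spectral floor below are illustrative, and a real implementation works per frequency bin rather than on a single power value:

```python
from collections import deque

def track_noise_floor(power_frames, window=40, alpha=0.9):
    """Estimate the noise power per frame as the minimum of the
    exponentially smoothed power over the last `window` frames
    (~400 ms at 10 ms frames), so no speech-pause detector is needed:
    even during speech, the window usually contains a speech gap."""
    history = deque(maxlen=window)
    smoothed = 0.0
    estimates = []
    for p in power_frames:
        smoothed = alpha * smoothed + (1 - alpha) * p
        history.append(smoothed)
        estimates.append(min(history))
    return estimates

def spectral_subtract(power, noise, oversubtract=2.0, floor=0.01):
    """Simple nonlinear spectral subtraction with a spectral floor."""
    return max(power - oversubtract * noise, floor * power)
```

A speech burst raises the smoothed power only briefly, so the windowed minimum (and hence the noise estimate) stays near the true noise level.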


Patent
13 Nov 1995
TL;DR: In this article, a word tagging and editing system for speech recognition receives recognized speech text from a speech recognition engine, and creates tagging information that follows the speech text as it is received by a word processing program or other program.
Abstract: A word tagging and editing system for speech recognition receives recognized speech text from a speech recognition engine, and creates tagging information that follows the speech text as it is received by a word processing program or other program. The body of text to be edited in connection with the word processing program may be selected and cut and pasted and otherwise manipulated, and the tags follow the speech text. A word may be selected by a user, and the tag information used to point to a sound bite within the audio data file created initially by the speech recognition engine. The sound bite may be replayed to the user through a speaker. A practical result is that the user may confirm the correctness of a particular recognized word in real time while editing text in the word processor. If the recognition is manually corrected, the correction information may be supplied to the engine for use in updating a user profile for the user who dictated the audio that was recognized. Particular tagging approaches are employed depending on the particular word processor being used.

188 citations


Patent
20 Sep 1995
TL;DR: In this paper, an adaptive speech recognition and control system and method for controlling various mechanisms and systems in response to spoken instructions is presented, in which spoken commands are effective to direct the system into appropriate memory nodes, and to respective appropriate memory templates corresponding to the voiced command.
Abstract: An adaptive speech recognition and control system and method for controlling various mechanisms and systems in response to spoken instructions, in which spoken commands are effective to direct the system into appropriate memory nodes, and to respective appropriate memory templates corresponding to the voiced command. Spoken commands from any of a group of operators for which the system is trained may be identified, and voice templates are updated as required in response to changes in pronunciation and voice characteristics over time of any of the operators for which the system is trained. Provisions are made both for near-real-time retraining of the system with respect to individual terms which are determined not to be positively identified, and for an overall system training and updating process in which recognition of each command and vocabulary term is checked, and in which the memory templates are retrained if necessary for respective commands or vocabulary terms with respect to an operator currently using the system. In one embodiment, the system includes input circuitry connected to a microphone and including signal processing and control sections for sensing the level of vocabulary recognition over a given period and, if recognition performance falls below a given level, processing audio-derived signals to enhance recognition performance of the system.

188 citations


Book
31 Oct 1995
TL;DR: This work analyzes the nature and perception of speech sounds, application domain, human factors, and dialogue, and the current technology and its limits: an overview of automatic speech recognition (ASR).
Abstract: About the authors. Foreword. Preface. Part A: Speech communication by humans and machines. 1. Nature and perception of speech sounds. 2. Background on speech analysis. 3. Fundamentals of automatic speech recognition. Part B: Robustness in ASR: Problems and issues. 4. Speaker variability and specificity. 5. Dealing with noisy speech and channel distortions. Part C: Possible solutions and some perspectives. 6. The current technology and its limits: an overview. 7. Towards robust speech analysis. 8. On the use of a robust speech representation. 9. ASR of noisy, stressed, and channel distorted speech. 10. Word-spotting and rejection. 11. Spontaneous speech. 12. On the use of knowledge in ASR. 13. Application domain, human factors, and dialogue. Appendix. Index.

178 citations


Patent
31 Jul 1995
TL;DR: In this article, a speech recognition system provides a user with graphical and textual feedback, which is displayed in windows but occupies little of the available display space and is displayed only for a short period of time.
Abstract: A speech recognition system provides a user with graphical and textual feedback. The textual feedback is displayed in windows that occupy little of the available display space and are displayed only for a short period of time. The graphical feedback is displayed in a designated notification area and does not obscure any other displayed items. The feedback provided by the speech recognition system may indicate a current mode of operation of the speech recognition system as well as a state of processing of audio input by the speech recognition system.

166 citations


Proceedings ArticleDOI
27 Sep 1995
TL;DR: Analysis of the transmission of voice and data over an 802.11 WLAN shows that a larger superframe length provides the opportunity for more voice conversations or a higher data throughput, but requires increasing the time to live for the speech bits to retain an acceptable quality.
Abstract: This paper analyzes the transmission of voice and data over an 802.11 wireless local area network (WLAN). The data is transmitted in a contention based access period, while the voice samples are transmitted during a contention free period, based on a polling scheme. Because statistical multiplexing can be utilized, speech may be outdated when a poll arrives. The portion of outdated speech is then clipped to decrease the load on the channel. We analyze the quality of the voice conversations in terms of the percentage of bits clipped as well as the throughput of the data for various parameters. We show the boundary conditions involved in the transmission of voice over the WLAN and demonstrate the impact of a time-bounded service on the throughput during the contention period. The results show that the high overhead introduced by the 802.11 WLAN standard results in a low number of possible voice conversations. It can also be concluded that the cooperation of the contention based and contention free periods results in a poor performance. Further, variation of the maximum payload size reveals that the largest possible maximum payload size must be selected to minimize the percentage of clipped bits and maximize the throughput. Finally, we show that a larger superframe length provides the opportunity for more voice conversations or a higher data throughput, but requires increasing the time to live for the speech bits to retain an acceptable quality.

164 citations


Patent
19 Jun 1995
TL;DR: In this paper, a personal communications system enables the operator to simultaneously transmit voice and data communication to a remote site, using a modified supervisory packet for negotiating communication parameters, including speech compression algorithm, the speech compression ratio, the communication multiplex scheme, and other operations needed for control of remote hardware interfaces.
Abstract: A personal communications system enables the operator to simultaneously transmit voice and data communication to a remote site. The personal communications system is equipped with two telephone line interfaces to allow connection between two remote sites. The connection between the first remote site and the local site may operate in a voice over data communications mode to simultaneously send compressed voice and data. A digital transmission protocol which is consistent with current packet standards is used to create an independent channel through use of a modified supervisory packet for negotiating communication parameters, including the speech compression algorithm, the speech compression ratio, the communication multiplex scheme, and other operations needed for control of remote hardware interfaces.

Proceedings ArticleDOI
09 May 1995
TL;DR: It is suggested that phone rate is a more meaningful measure of speech rate than the more common word rate, and it is found that when data sets are clustered according to the phone rate metric, recognition errors increase when the phone rate is more than 1 standard deviation greater than the mean.
Abstract: It is well known that a higher-than-normal speech rate will cause the rate of recognition errors in large vocabulary automatic speech recognition (ASR) systems to increase. In this paper we attempt to identify and correct for errors due to fast speech. We first suggest that phone rate is a more meaningful measure of speech rate than the more common word rate. We find that when data sets are clustered according to the phone rate metric, recognition errors increase when the phone rate is more than 1 standard deviation greater than the mean. We propose three methods to improve the recognition accuracy of fast speech, each addressing different aspects of performance degradation. The first method is an implementation of Baum-Welch codebook adaptation. The second method is based on the adaptation of HMM state-transition probabilities. In the third method, the pronunciation dictionaries are modified using rule-based techniques and compound words are added. We compare improvements in recognition accuracy for each method using data sets clustered according to the phone rate metric. Adaptation of the HMM state-transition probabilities to fast speech improves recognition of fast speech by a relative amount of 4 to 6 percent.
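The clustering criterion can be sketched directly; the utterance representation below is hypothetical:

```python
import statistics

def phone_rate(n_phones, duration_s):
    """Phones per second -- a finer-grained speech-rate measure
    than the more common words per second."""
    return n_phones / duration_s

def flag_fast_utterances(utterances):
    """Flag utterances whose phone rate exceeds the corpus mean by
    more than one standard deviation, the region where recognition
    errors were observed to increase.

    Each utterance is a (n_phones, duration_s) pair.
    """
    rates = [phone_rate(n, d) for n, d in utterances]
    mu = statistics.mean(rates)
    sigma = statistics.stdev(rates)
    return [r > mu + sigma for r in rates]

# Three normal-rate utterances and one fast one (18 phones/s).
utts = [(50, 5.0), (52, 5.0), (48, 5.0), (90, 5.0)]
fast = flag_fast_utterances(utts)
```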

PatentDOI
TL;DR: In this article, a voice activity detector uses an energy estimate to detect the presence of speech in a received speech signal in a noise environment, and a set of high pass filters are used to filter the signal based upon the background noise level.
Abstract: A method and apparatus for improving sound quality in a digital cellular radio system receiver. A voice activity detector uses an energy estimate to detect the presence of speech in a received speech signal in a noise environment. When no speech is present the system attenuates the signal and inserts low pass filtered white noise. In addition, a set of high pass filters are used to filter the signal based upon the background noise level. This high pass filtering is applied to the signal regardless of whether speech is present. Thus, a combination of signal attenuation with insertion of low pass filtered white noise during periods of non-speech, along with high pass filtering of the signal, improves sound quality when decoding speech which has been encoded in a noisy environment.
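The energy-based detection step can be sketched as follows; the fixed margin is illustrative, since a practical detector adapts its threshold to the estimated background noise level:

```python
def frame_energy(frame):
    """Average short-term energy of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def energy_vad(frames, noise_energy, margin=4.0):
    """Declare speech when frame energy exceeds the estimated noise
    energy by a fixed margin. In the patent's scheme, frames judged
    non-speech would then be attenuated and replaced with low-pass
    filtered comfort noise downstream."""
    return [frame_energy(f) > margin * noise_energy for f in frames]

silence = [0.01, -0.01] * 40   # low-energy frame
speech = [0.5, -0.5] * 40      # high-energy frame
decisions = energy_vad([silence, speech], noise_energy=0.0001)
```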

PatentDOI
TL;DR: An instantaneous context switching speech recognition system is disclosed which enables a speech recognition application to be changed without loading new pattern matching data into the system.
Abstract: An instantaneous context switching speech recognition system is disclosed which enables a speech recognition application to be changed without loading new pattern matching data into the system. Selectable pointer maps are included in the memory of the system which selectively change the relationship between words and phonemes between a first application context and the pattern matching logic to a second application context and the pattern matching logic.

Proceedings ArticleDOI
09 May 1995
TL;DR: In this article, a modular system for flexible human-computer interaction via speech is presented, which integrates acoustic and visual information (automatic lip-reading) improving overall recognition, especially in noisy environments.
Abstract: We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading) improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. Finally, the speaker's face is automatically acquired and followed by the face tracker sub-system. Integration of the three functions results in the first bi-modal speech recognizer allowing the speaker reasonable freedom of movement within a possibly noisy room while continuing to communicate with the computer via voice. Compared to audio-alone recognition, the combined system achieves a 20 to 50 percent error rate reduction for various signal/noise conditions.

Proceedings ArticleDOI
Xuedong Huang1, Alejandro Acero1, F. Alleva1, Mei-Yuh Hwang1, Li Jiang1, Milind Mahajan1 
09 May 1995
TL;DR: The Whisper (Windows Highly Intelligent Speech Recognizer) represents significantly improved recognition efficiency, usability, and accuracy, when compared with the Sphinx-II system.
Abstract: Since January 1993, the authors have been working to refine and extend Sphinx-II technologies in order to develop practical speech recognition at Microsoft. The result of that work has been the Whisper (Windows Highly Intelligent Speech Recognizer). Whisper represents significantly improved recognition efficiency, usability, and accuracy, when compared with the Sphinx-II system. In addition Whisper offers speech input capabilities for Microsoft Windows and can be scaled to meet different PC platform configurations. It provides features such as continuous speech recognition, speaker-independence, on-line adaptation, noise robustness, dynamic vocabularies and grammars. For typical Windows Command-and-Control applications (less than 1000 words), Whisper provides a software-only solution on PCs equipped with a 486DX, 4 MB of memory, a standard sound card, and a desktop microphone.

PatentDOI
TL;DR: A speech circuit is disclosed which solves the serious problem of the degradation of the articulation of received speech voice in conventional circuits and permits pleasant communications at places where the background noise level is high.
Abstract: A speech circuit is disclosed which solves the serious problem of the degradation of the articulation of received speech voice in conventional circuits and permits pleasant communications at places where the background noise level is high. The circuit has a construction in which an input signal from a microphone is attenuated in correspondence to the background noise level to form a sidetone signal and a received speech signal from a speech channel is amplified in correspondence to the background noise level to form a new received speech signal.

Patent
05 Sep 1995
TL;DR: In this article, the identification of a caller is determined upon connection to the network via standard caller identification circuitry and upon detection of a spoken utterance, that utterance is processed against the core library, if the caller's identity cannot be determined, or against a particular caller-specific library.
Abstract: A method and system are disclosed for reducing perplexity in a speech recognition system within a telephonic network based upon determined caller identity. In a speech recognition system which processes input frames of speech against stored templates representing speech, a core library of speech templates is created and stored representing a basic vocabulary of speech. Multiple caller-specific libraries of speech templates are also created and stored, each library containing speech templates which represent a specialized vocabulary and pronunciations for a specific geographic location and a particular individual. Additionally, the caller-specific libraries of speech templates are preferably processed to reflect the reduced bandwidth, transmission channel variations and other signal variations introduced into the system via a telephonic network. The identification of a caller is determined upon connection to the network via standard caller identification circuitry and upon detection of a spoken utterance, that utterance is processed against the core library, if the caller's identity cannot be determined, or against a particular caller-specific library, if the caller's identity can be determined, thereby greatly enhancing the efficiency and accuracy of speech recognition by the system.
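The library-selection logic reduces to a lookup with a fallback; the data structures below are hypothetical:

```python
def select_library(caller_id, caller_libraries, core_library):
    """Pick the caller-specific template library when the caller is
    identified via caller ID; otherwise fall back to the core
    vocabulary. Restricting the active vocabulary this way is what
    reduces the recognizer's perplexity."""
    return caller_libraries.get(caller_id, core_library)

core = ["yes", "no", "operator"]
libs = {"+1-555-0100": ["route", "transfer", "voicemail"]}

known = select_library("+1-555-0100", libs, core)
unknown = select_library("unlisted", libs, core)
```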

Journal ArticleDOI
01 Jun 1995
TL;DR: Basic approaches to speech, wideband speech, and audio bit rate compression in audiovisual communications are explained, and it will become obvious that the use of knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit rate coding algorithms which can achieve substantially more compression than was thought possible only a few years ago.
Abstract: Current and future visual communications for applications such as broadcasting, videotelephony, video- and audiographic-conferencing, and interactive multimedia services assume a substantial audio component. Even text, graphics, fax, still images, email documents, etc. will gain from voice annotation and audio clips. A wide range of speech, wideband speech, and wideband audio coders is available for such applications. In the context of audiovisual communications, the quality of telephone-bandwidth speech is acceptable for some videotelephony and videoconferencing services. Higher bandwidths (wideband speech) may be necessary to improve the intelligibility and naturalness of speech. High quality audio coding including multichannel audio will be necessary in advanced digital TV and multimedia services. This paper explains basic approaches to speech, wideband speech, and audio bit rate compression in audiovisual communications. These signal classes differ in bandwidth, dynamic range, and in listener expectation of offered quality. It will become obvious that the use of our knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit rate coding algorithms which can achieve substantially more compression than was thought possible only a few years ago. The paper concentrates on worldwide source coding standards beneficial for consumers, service providers, and manufacturers.

Proceedings ArticleDOI
09 May 1995
TL;DR: Informal listening indicates that finite impulse response (FIR) Wiener-like filters applied to time trajectories of the cubic-root compressed short-term power spectrum of noisy speech recorded over cellular telephone communications bring a noticeable improvement to the quality of processed noisy speech.
Abstract: Finite impulse response (FIR) Wiener-like filters are applied to time trajectories of the cubic-root compressed short-term power spectrum of noisy speech recorded over cellular telephone communications. Informal listenings indicate that the technique brings a noticeable improvement to the quality of processed noisy speech while not causing any significant degradation to clean speech. Alternative filter structures are being investigated as well as other potential applications in cellular channel compensation and narrowband to wideband speech mapping.

PatentDOI
TL;DR: In this article, the authors propose a conference bridge that receives speech data in the form of data packets, and transmits data without transforming it in the conference bridge, based on the loudest speaker.
Abstract: A conference bridge that receives speech data in the form of data packets, and transmits data in the same form, without transforming the data in the conference bridge. The conference bridge according to this invention includes a plurality of inputs that have speech detectors that detect the presence of speech data. The speech detectors report the presence of speech to a controller. The controller causes data packets from one of the inputs detecting speech to be replicated for all outputs. If there is speech at more than one input at a time, then a decision is made as to which input to replicate. Advantageously, the decision is based on who is the loudest speaker. Further, the data that is replicated is not sent to the output for the originator in order to prevent echo.
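The bridge's selection and replication logic can be sketched as: pick the loudest input that currently carries speech, then copy its packets to every output except the originator's to prevent echo. The structures below are hypothetical:

```python
def select_active_input(inputs):
    """inputs: dict of input_id -> (speech_present, level).
    Return the loudest input that currently carries speech,
    or None if every speech detector reports silence."""
    active = [(level, iid)
              for iid, (speech, level) in inputs.items() if speech]
    if not active:
        return None
    return max(active)[1]

def replicate(packet, source, outputs):
    """Copy the selected packet to every output except its
    originator, so the talker does not hear their own echo."""
    return {out: packet for out in outputs if out != source}

# "c" is loud but its speech detector reports silence, so "b" wins.
inputs = {"a": (True, 0.3), "b": (True, 0.8), "c": (False, 0.9)}
chosen = select_active_input(inputs)
fanout = replicate(b"pkt", chosen, ["a", "b", "c"])
```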

Journal ArticleDOI
Richard Rose1
TL;DR: Decision tree based allophone clustering resulted in a significant increase in keyword detection performance over that obtained using tri-phone based subword units while at the same time reducing the size of the inventory of subword acoustic models by 40%.

Journal ArticleDOI
TL;DR: The effects of additive background noise on speech quality and recognition parameters are discussed, and a source generator based framework to address stress and noise is proposed.


Patent
04 Apr 1995
TL;DR: In this paper, the authors exploit the synergy between operations performed by a speech rate modification system and those operations performed in a speech coding system to provide a speech-rate modification system with reduced hardware requirements.
Abstract: Synergy between operations performed by a speech-rate modification system and those operations performed in a speech coding system is exploited to provide a speech-rate modification system with reduced hardware requirements. The speech rate of an input signal is modified based on a signal representing a predetermined change in speech rate. The modified speech-rate signal is then filtered to generate a speech signal having increased short-term correlation. Modification of the input speech signal may be performed by inserting in the input speech signal a previous sequence of samples corresponding substantially to a pitch cycle. Alternatively, the input speech signal may be modified by removing from the input speech signal a sequence of samples corresponding substantially to a pitch cycle.
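The pitch-cycle insertion/removal idea can be sketched on raw samples; a real system operates pitch-synchronously and smooths the splice points, so the function below is only illustrative:

```python
def change_rate(samples, pitch_period, slow_down=True):
    """Modify speech rate by one pitch cycle: repeat a cycle to slow
    the speech down, or delete a cycle to speed it up. Working in
    whole pitch cycles keeps the waveform locally periodic, so the
    splice is far less audible than an arbitrary cut."""
    cycle = samples[:pitch_period]
    if slow_down:
        return cycle + samples        # repeat one pitch cycle
    return samples[pitch_period:]     # drop one pitch cycle

signal = list(range(10))              # stand-in for speech samples
slower = change_rate(signal, 3, slow_down=True)
faster = change_rate(signal, 3, slow_down=False)
```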

Journal ArticleDOI
T. Chen1, H.P. Graf1, Kuansan Wang2
TL;DR: The marriage of speech analysis and image processing can solve problems related to lip synchronization and speech information is utilized to improve the quality of audio-visual communications such as videotelephony and videoconferencing.
Abstract: We utilize speech information to improve the quality of audio-visual communications such as videotelephony and videoconferencing. In particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion. Demonstration sequences are presented. Other applications, including speech-assisted video coding, are outlined.


Patent
17 Aug 1995
TL;DR: In this paper, an apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.
Abstract: An apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.

Patent
27 Nov 1995
TL;DR: A speech recognition test system comprising a single host processing system having a host processor and a memory device, wherein the memory device contains a plurality of audio files accessible by the host processor, is described in this paper.
Abstract: A speech recognition test system comprising a single host processing system having a host processor and a memory device, wherein the memory device contains a plurality of audio files accessible by the host processor. The test system also includes a speech recognition application having a vocabulary, an independent test application, a means for concurrently executing the speech recognition application and the independent test application on the host processor, a means for queuing the audio files as input to the speech recognition application by way of the test application, a means for programming the test application at configuration time to expand the vocabulary of the application being tested and/or to set other playback parameters such as voice gender, volume, and speed, and a means for capturing and evaluating test results from the speech recognition application by way of the test application. In an alternative embodiment the speech recognition test system includes an audio input/output device operatively connected to the host processor, and a means for redirecting output from the audio input/output system as input to itself.

Proceedings ArticleDOI
05 Nov 1995
TL;DR: The paper describes the Informedia Digital Video Library project and discusses how speech recognition is used for transcript creation from video, alignment with hand-generated transcripts, query interface and audio paragraph segmentation.
Abstract: In principle, speech recognition technology can make any spoken data useful for library indexing and retrieval. The paper describes the Informedia Digital Video Library project and discusses how speech recognition is used for transcript creation from video, alignment with hand-generated transcripts, query interface and audio paragraph segmentation. The results show that speech recognition accuracy varies dramatically depending on the quality and type of data used. Our information retrieval experiments also show that reasonable recall and precision can be obtained with moderate speech recognition accuracy. Finally we discuss some active areas of speech research relevant to the digital video library problem.