
Showing papers on "Voice activity detection published in 2006"


Patent
02 Feb 2006
TL;DR: In this paper, a speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user; under certain conditions, information about the response expected from the user may be available and is used to modify the recognizer's behavior.
Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response.

517 citations
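
To make the idea concrete, here is a minimal Python sketch of the expected-response adjustment: the rejection threshold is relaxed only when the hypothesis matches the predetermined expected response. The constants and function names are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of the patent's idea: relax the rejection threshold
# when the recognizer's hypothesis matches a known expected response.
# BASE_THRESHOLD and RELAXED_THRESHOLD are illustrative values.

BASE_THRESHOLD = 0.70     # assumed default acceptance confidence
RELAXED_THRESHOLD = 0.50  # assumed lower bar when the hypothesis was expected

def accept_hypothesis(hypothesis: str, confidence: float,
                      expected_response: str | None) -> bool:
    """Accept a recognition result, lowering the rejection threshold
    when the hypothesis matches the response we already expected."""
    threshold = BASE_THRESHOLD
    if expected_response is not None and hypothesis == expected_response:
        threshold = RELAXED_THRESHOLD  # expected match: be more permissive
    return confidence >= threshold

# Example: a warehouse picker is expected to read back check digits "42".
print(accept_hypothesis("42", 0.55, expected_response="42"))  # True
print(accept_hypothesis("42", 0.55, expected_response=None))  # False
```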


PatentDOI
TL;DR: In this paper, a real-time speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user, where the partitioning of responsibility for speech recognition operations can be done on a client-by-client or connection-by-connection basis.
Abstract: A real-time speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. Both the client and server can dedicate a variable number of processing resources to performing speech recognition functions. The partitioning of responsibility for speech recognition operations can be done on a client-by-client or connection-by-connection basis.

279 citations


Journal ArticleDOI
TL;DR: This paper proposes a class of VAD algorithms based on several statistical models; in addition to the Gaussian model, it incorporates the complex Laplacian and Gamma probability density functions into the analysis of statistical properties.
Abstract: One of the key issues in practical speech processing is to achieve robust voice activity detection (VAD) against the background noise. Most statistical model-based approaches have employed the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from real observations. In this paper, we propose a class of VAD algorithms based on several statistical models. In addition to the Gaussian model, we also incorporate the complex Laplacian and Gamma probability density functions into our analysis of statistical properties. With goodness-of-fit tests, we analyze the statistical properties of the DFT spectra of noisy speech under various noise conditions. Based on this statistical analysis, the likelihood ratio test under the given statistical models is established for the purpose of VAD. Since the statistical characteristics of the speech signal are affected differently by noise types and levels, our approach aims to adaptively find an appropriate statistical model in an online fashion to cope with time-varying environments. The performance of the proposed VAD approaches in both stationary and nonstationary noise environments is evaluated with the aid of an objective measure.

241 citations
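
As a concrete illustration of the likelihood ratio test under the complex Gaussian model (one of the several models the paper analyzes; the Laplacian and Gamma variants replace the per-bin likelihoods), here is a minimal Python sketch. The mean-LLR decision rule and the threshold value are assumptions for illustration.

```python
# A minimal sketch of a likelihood-ratio-test VAD under the complex
# Gaussian model. The noise variance estimate, a priori SNR estimate,
# and the threshold eta are assumed inputs.
import numpy as np

def gaussian_lrt_vad(frame: np.ndarray, noise_psd: np.ndarray,
                     snr_prior: np.ndarray, eta: float = 0.2) -> bool:
    """Return True if the frame is voice-active.

    frame:     complex DFT coefficients X_k of the noisy frame
    noise_psd: per-bin noise variance lambda_N(k)
    snr_prior: per-bin a priori SNR xi_k (e.g. decision-directed estimate)
    """
    gamma = np.abs(frame) ** 2 / noise_psd          # a posteriori SNR
    # Per-bin log likelihood ratio log p(X|H1)/p(X|H0) for complex Gaussians
    llr = gamma * snr_prior / (1.0 + snr_prior) - np.log1p(snr_prior)
    return float(np.mean(llr)) > eta                # geometric-mean decision rule

# Toy usage with white-noise statistics (illustrative values only)
rng = np.random.default_rng(0)
x = rng.normal(size=256) + 1j * rng.normal(size=256)
print(gaussian_lrt_vad(x, noise_psd=np.full(256, 2.0),
                       snr_prior=np.full(256, 0.5)))  # False: noise only
```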


Journal ArticleDOI
TL;DR: This paper introduces a method for nonintrusive assessment of speech quality for narrow-band telephony, which was approved by the International Telecommunication Union (ITU-T) in May 2004, based on models of voice production and perception.
Abstract: Objective voice quality assessment has been the subject of research for many years. Until very recently, objective models required a copy of the unprocessed signal for estimating the quality of a signal transmitted across a telecommunication network, making live call monitoring impossible. This paper introduces a method for nonintrusive assessment of speech quality for narrow-band telephony, which was approved by the International Telecommunication Union (ITU-T) in May 2004. Essentially based on models of voice production and perception, the algorithm demonstrates good performance on more than 48 subjective experiments representing most distortions that occur on voice networks.

218 citations


Patent
21 Jul 2006
TL;DR: In this paper, a method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided, where a signal separation process (180) is associated with a voice activity detector (185).
Abstract: A method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided. In one approach, a signal separation process (180) is associated with a voice activity detector (185). The voice activity detector (185) is a two-channel (178,182) detector, which enables particularly robust and accurate detection of voice activity. When speech is detected, the voice activity detector generates a control signal (411). The control signal (411) is used to activate, adjust, or control signal separation processes or post-processing operations (195) to improve the quality of the resulting speech signal. In another approach, a signal separation process (180) is provided as a learning stage (752) and an output stage (756). The learning stage (752) aggressively adjusts to current acoustic conditions and passes coefficients to the output stage (756). The output stage (756) adapts more slowly and generates a speech-content signal (181,770) and a noise-dominant signal (407,773). When the learning stage (752) becomes unstable, only the learning stage (752) is reset, allowing the output stage (756) to continue outputting a high-quality speech signal.

217 citations
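
The two-stage structure lends itself to a compact sketch: a fast, possibly unstable learning stage hands its coefficients to a slowly adapting output stage, so resetting the learning stage never interrupts the output. The Python below is a hedged approximation; the adaptation rule and stability test are stand-ins, not the patent's.

```python
# Hedged sketch of the two-stage idea: fast learning weights leak slowly
# into the output weights, and only the learning stage is reset on
# instability. All class and attribute names are illustrative.
import numpy as np

class TwoStageSeparator:
    def __init__(self, taps: int = 64, smooth: float = 0.02):
        self.learn_w = np.zeros(taps)   # fast, possibly unstable weights
        self.out_w = np.zeros(taps)     # slow, stable weights used for output
        self.smooth = smooth            # leakage of learned weights into output

    def update(self, learned_step: np.ndarray) -> None:
        self.learn_w += learned_step    # aggressive adaptation (stand-in rule)
        if not np.all(np.isfinite(self.learn_w)) or np.max(np.abs(self.learn_w)) > 1e3:
            self.learn_w[:] = 0.0       # reset only the learning stage
        # output stage drifts slowly toward the learned coefficients
        self.out_w += self.smooth * (self.learn_w - self.out_w)

    def filter(self, x: np.ndarray) -> np.ndarray:
        return np.convolve(x, self.out_w, mode="same")  # stable output path
```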


Book
01 Jan 2006
TL;DR: This book examines the quality elements and quality features of VoIP speech transmission and develops extensions of the E-model covering packet loss, wideband speech, and linear and non-linear distortion.
Abstract: Preface. List of Abbreviations. Introduction. 1 Speech Quality in Telephony. 1.1 Speech. 1.2 Speech Quality. 2 Speech Quality Measurement Methods. 2.1 Auditory Methods. 2.2 Instrumental Methods. 2.3 Speech Quality Measurement Methods: Summary. 3 Quality Elements and Quality Features of VoIP. 3.1 Speech Transmission Using Internet Protocol. 3.2 Overview of Quality Elements. 3.3 Quality Elements and Related Features. 3.4 Quality Dimensions. 3.5 Combined Elements and Combined Features. 3.6 Listening and Conversational Features. 3.7 Desired Nature. 3.8 Open Questions. 3.9 From Elements to Features: Modeling VoIP Speech Quality. 3.10 Quality Elements and Quality Features of VoIP: Summary. 4 Time-Varying Distortion: Quality Features and Modeling. 4.1 Microscopic Loss Behavior. 4.2 Macroscopic Loss Behavior. 4.3 Interactivity. 4.4 Packet Loss and Combined Impairments. 4.5 Time-Varying Distortion: Summary. 5 Wideband Speech, Linear and Non Linear Distortion: Quality Features and Modeling. 5.1 Wideband Speech: Improvement Over Narrowband. 5.2 Bandpass-Filtered Speech. 5.3 Wideband Codecs. 5.4 Desired Nature. 6 From Elements to Features: Extensions of the E-model. 6.1 E-model: Packet Loss. 6.2 E-model: Additivity. 6.3 E-model: Wideband, Linear and Non-Linear Distortion. 7 Summary and Conclusions. 8 Outlook. A Aspects of a Parametric Description of Time-Varying Distortion. B Simulation of Quality Elements. C Frequency Responses. D Test Data Normalization and Transformation. E E-model Algorithm. F Interactive Short Conversation Test Scenarios (iSCTs). G Auditory Test Settings and Results. H Modeling Details. I Glossary. Bibliography. Index.

211 citations


Patent
23 Aug 2006
TL;DR: In this article, the authors proposed a speech enhancement system that is able to suppress highly non-stationary noise, which can be adapted to a hearing aid or a headset, using a speech model and a noise model having at least one shape and gain.
Abstract: A central aspect of the invention relates to a method of enhancing speech, the method comprising the steps of receiving noisy speech comprising a clean speech component and a non-stationary noise component, providing a speech model, providing a noise model having at least one shape and a gain, dynamically modifying the noise model based on the speech model and the received noisy speech, and enhancing the noisy speech at least based on the modified noise model. Hereby a method of speech enhancement is achieved that is able to suppress highly non-stationary noise. Another aspect of the invention relates to a speech enhancement system that may be adapted to be used in a hearing system, such as a hearing aid or a headset.

211 citations
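
A rough Python sketch of the shape-and-gain noise model follows: the shape is a fixed spectral template, only the gain is re-fit on frames judged unlikely to contain speech, and a Wiener-style gain rule (an assumption, not the patent's rule) performs the enhancement.

```python
# Sketch of enhancement with a noise model that has a shape and a gain.
# speech_prob stands in for the speech model's output; the 0.3 cutoff
# and the least-squares gain fit are illustrative assumptions.
import numpy as np

def enhance_frame(noisy_psd, noise_shape, noise_gain, speech_prob):
    """noisy_psd: |X_k|^2; noise_shape: spectral template; speech_prob in [0,1]."""
    if speech_prob < 0.3:
        # dynamically refit the noise gain on a (likely) non-speech frame
        noise_gain = np.sum(noisy_psd * noise_shape) / np.sum(noise_shape ** 2)
    noise_psd = noise_gain * noise_shape
    # Wiener-style suppression with a floor, applied in the power domain
    gain = np.maximum(1.0 - noise_psd / np.maximum(noisy_psd, 1e-12), 0.05)
    return gain * noisy_psd, noise_gain    # enhanced PSD and updated gain
```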


Patent
28 Sep 2006
TL;DR: In this paper, a speech recognition engine that utilizes portable voice profiles for converting recorded speech to text is presented, where each portable voice profile includes speaker-dependent data, and is configured to be accessible to a plurality of speech recognition engines through a common interface.
Abstract: An embodiment of the present invention provides a speech recognition engine that utilizes portable voice profiles for converting recorded speech to text. Each portable voice profile includes speaker-dependent data, and is configured to be accessible to a plurality of speech recognition engines through a common interface. A voice profile manager receives the portable voice profiles from other users who have agreed to share their voice profiles. The speech recognition engine includes speaker identification logic to dynamically select a particular portable voice profile, in real-time, from a group of portable voice profiles. The speaker-dependent data included with the portable voice profile enhances the accuracy with which speech recognition engines recognize spoken words in recorded speech from a speaker associated with a portable voice profile.

201 citations
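
A hypothetical sketch of the portable-profile structure: profiles carry speaker-dependent data behind a common interface, and speaker identification selects the matching profile at run time. The API below is illustrative; the patent does not specify one.

```python
# Illustrative data structures for portable voice profiles; the field
# names and manager API are assumptions, not the patent's definitions.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class VoiceProfile:
    speaker_id: str
    adaptation_data: Dict[str, float] = field(default_factory=dict)  # speaker-dependent data

class ProfileManager:
    def __init__(self):
        self.shared: Dict[str, VoiceProfile] = {}   # profiles shared by other users

    def register(self, profile: VoiceProfile) -> None:
        self.shared[profile.speaker_id] = profile

    def select(self, identified_speaker: str) -> VoiceProfile:
        """Dynamically pick the profile for the speaker the identifier reports."""
        return self.shared[identified_speaker]
```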


Patent
Jung-Eun Kim1, Jeong-Su Kim1
16 Feb 2006
TL;DR: In this paper, a user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to the user; the method includes calculating a confidence score of a recognition candidate according to the result of speech recognition.
Abstract: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to a user. The user adaptive speech recognition method includes calculating a confidence score of a recognition candidate according to the result of speech recognition, setting a new threshold value adapted to the user based on a result of user confirmation of the recognition candidate and the confidence score of the recognition candidate, and outputting a corresponding recognition candidate as a result of the speech recognition if the calculated confidence score is higher than the new threshold value. Thus, the need for user confirmation of the result of speech recognition is reduced and the probability of speech recognition success is increased.

181 citations
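
The adaptation loop can be sketched in a few lines: confirmations at a given confidence nudge the threshold down, rejections nudge it up, so this user is asked to confirm less often over time. The update rule and constants below are assumptions, not taken from the patent.

```python
# Hypothetical sketch of the user-adaptive confidence threshold.

class AdaptiveThreshold:
    def __init__(self, initial: float = 0.8, rate: float = 0.1):
        self.threshold = initial
        self.rate = rate

    def needs_confirmation(self, confidence: float) -> bool:
        return confidence < self.threshold

    def observe(self, confidence: float, user_confirmed: bool) -> None:
        """Adapt after the user confirms or corrects a candidate."""
        if user_confirmed:
            # correct results at this confidence: safe to lower the bar
            self.threshold -= self.rate * max(self.threshold - confidence, 0.0)
        else:
            # a wrong result slipped through: raise the bar toward it
            self.threshold += self.rate * max(confidence - self.threshold + 0.1, 0.0)

t = AdaptiveThreshold()
t.observe(0.6, user_confirmed=True)
print(round(t.threshold, 3))  # 0.78: the threshold drifts below the initial 0.8
```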


Journal ArticleDOI
TL;DR: A novel statistical method for voice activity detection using a signal-to-noise ratio measure that employs a low-variance spectrum estimate and determines an optimal threshold based on the estimated noise statistics.
Abstract: Traditionally, voice activity detection algorithms are based on some combination of general speech properties such as temporal energy variations, periodicity, and spectrum. This paper describes a novel statistical method for voice activity detection using a signal-to-noise ratio measure. The method employs a low-variance spectrum estimate and determines an optimal threshold based on the estimated noise statistics. A possible implementation is presented, evaluated over a large test set, and compared to current standardized algorithms. The evaluations indicate promising results, with the proposed scheme being comparable or favorable over the whole test set.

173 citations
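
A minimal sketch along the paper's outline: a low-variance (Welch) spectrum estimate, an SNR measure against tracked noise statistics, and a threshold derived from those statistics. The specific threshold rule (mean plus three standard deviations of the noise-only scores) is an assumption.

```python
# Sketch of an SNR-measure VAD with a low-variance spectrum estimate.
# noise_psd must be a Welch PSD computed at the same resolution (nperseg).
import numpy as np
from scipy.signal import welch

def snr_measure(frame, noise_psd, fs=8000):
    f, pxx = welch(frame, fs=fs, nperseg=128)   # low-variance PSD estimate
    return float(np.mean(10 * np.log10(pxx / noise_psd + 1e-12)))

def make_detector(noise_frames, noise_psd):
    scores = [snr_measure(fr, noise_psd) for fr in noise_frames]
    thr = np.mean(scores) + 3 * np.std(scores)  # stand-in for the optimal threshold
    return lambda frame: snr_measure(frame, noise_psd) > thr

rng = np.random.default_rng(0)
noise_frames = [rng.normal(size=512) for _ in range(20)]
_, noise_psd = welch(np.concatenate(noise_frames), fs=8000, nperseg=128)
is_speech = make_detector(noise_frames, noise_psd)
print(is_speech(rng.normal(size=512) * 4.0))    # louder frame: likely True
```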


Patent
27 Oct 2006
TL;DR: In this article, a privacy apparatus adds a privacy sound that may be based on the speaker's own voice or on another voice; a characteristic of the speaker may be used to access a database of the speaker's or another's voice and to form one or more voice streams that make up the privacy sound.
Abstract: A privacy apparatus adds a privacy sound into the environment, thereby confusing listeners as to which of the sounds is the real source. The privacy sound may be based on the speaker's own voice or may be based on another voice. At least one characteristic of the speaker (such as a characteristic of the speaker's speech) may be identified. The characteristic may then be used to access a database of the speaker's own voice or another's voice, and to form one or more voice streams to form the privacy sound. The privacy sound may thus permit disruption of the ability to understand the source speech of the user by eliminating segregation cues that the auditory system uses to interpret speech.

Patent
12 Jul 2006
TL;DR: In this article, a simple and efficient method is described for producing an obfuscated speech signal that may be used to mask a stream of speech; a speech signal representing the speech stream to be masked is obtained.
Abstract: A simple and efficient method for producing an obfuscated speech signal which may be used to mask a stream of speech, is disclosed. A speech signal representing the speech stream to be masked is obtained. The speech signal is then temporally partitioned into segments, preferably corresponding to phonemes within the speech stream. The segments are then stored in a memory, and some or all of the segments are subsequently selected, retrieved, and assembled into an obfuscated speech signal representing an unintelligible speech stream that, when combined with the speech signal or reproduced and combined with the speech stream, provides a masking effect. While the presently preferred embodiment finds application most readily in an open plan office, embodiments suitable for use in restaurants, classrooms, and in telecommunications systems are also disclosed.
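
The segment-shuffling pipeline reduces to a few array operations. In the sketch below, fixed-length segments stand in for the phoneme-aligned partitioning the patent prefers, and the segment length is an illustrative choice.

```python
# Sketch of the masking idea: partition the speech signal into short
# segments, then reassemble a random selection into an unintelligible
# babble-like masker of the same voice.
import numpy as np

def obfuscate(speech: np.ndarray, fs: int, seg_ms: float = 80.0,
              seed: int = 0) -> np.ndarray:
    seg = int(fs * seg_ms / 1000)
    n = len(speech) // seg
    segments = speech[: n * seg].reshape(n, seg)      # temporal partition
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)                        # select/retrieve segments
    return segments[order].reshape(-1)                # assemble the masker

fs = 8000
masker = obfuscate(np.sin(2 * np.pi * 220 * np.arange(fs) / fs), fs)
print(masker.shape)  # same duration, truncated to whole segments
```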

Patent
21 Aug 2006
TL;DR: In this paper, a phonetic transcription is generated in the language system from the given spelling of words from outside the language system, allowing input and/or output signals that are not handled within the language system to be processed.
Abstract: The method involves receiving input signals by voice input units (2) and outputting output signals by voice output units (3). For processing input and/or output signals that are not handled within the language system, a phonetic transcription is generated in the language system from the given spelling of words from outside the language system. The phonetic transcription is recorded in a data file or list, the phonetic transcription in the language system being determined by replacing sound sequences from outside the language system. An independent claim is also included for a speech recognition and/or speech rendering device for conducting a speech dialogue between a person and a machine.

Proceedings Article
01 Jan 2006
TL;DR: This paper demonstrates how to train phoneme-based acoustic models with carefully designed electromyographic feature extraction methods; by decomposing the signal into different feature spaces, the useful information is kept while the noise is reduced.
Abstract: We present our research on continuous speech recognition of the surface electromyographic signals that are generated by the human articulatory muscles. Previous research on electromyographic speech recognition was limited to isolated word recognition because it was very difficult to train phoneme-based acoustic models for the electromyographic speech recognizer. In this paper, we demonstrate how to train phoneme-based acoustic models with carefully designed electromyographic feature extraction methods. By decomposing the signal into different feature spaces, we successfully keep the useful information while reducing the noise. Additionally, we model the anticipatory effect of the electromyographic signals relative to the speech signal. With a 108-word decoding vocabulary, the experimental results show that the word error rate improves from 86.8% to 32.0% with our novel feature extraction methods. Index Terms: speech recognition, electromyography, articulatory muscles, feature extraction.

Patent
17 Nov 2006
TL;DR: In one embodiment, the invention provides a method for building a voice response system: developing voice content (prompts and information to be played to a user) and integrating that content with logic to define a voice user-interface capable of interacting with the user in the manner of a conversation, receiving an utterance and presenting a selection of the voice content in response.
Abstract: In one embodiment, the invention provides a method for building a voice response system. The method comprises developing voice content for the voice response system, the voice content including prompts and information to be played to a user; and integrating the voice content with logic to define a voice user-interface that is capable of interacting with the user in a manner of a conversation in which the voice user-interface receives an utterance from the user and presents a selection of the voice content to the user in response to the utterance.

Proceedings ArticleDOI
14 May 2006
TL;DR: A new approach is presented that applies unit selection to find corresponding time frames in source and target speech, achieving the same performance as conventional text-dependent training.
Abstract: So far, most of the voice conversion training procedures are text-dependent, i.e., they are based on parallel training utterances of source and target speaker. Since several applications (e.g. speech-to-speech translation or dubbing) require text-independent training, over the last two years, training techniques that use non-parallel data were proposed. In this paper, we present a new approach that applies unit selection to find corresponding time frames in source and target speech. By means of a subjective experiment it is shown that this technique achieves the same performance as the conventional text-dependent training.
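
The frame-pairing step can be sketched as a nearest-neighbour search over target frames in a spectral feature space; a full unit-selection system would add concatenation costs, which this sketch omits.

```python
# Rough sketch of pairing source and target frames for non-parallel
# voice conversion training: unit selection degenerates here to a
# nearest-neighbour search over MFCC-like feature vectors.
import numpy as np

def pair_frames(src_feats: np.ndarray, tgt_feats: np.ndarray) -> np.ndarray:
    """src_feats: (Ns, D), tgt_feats: (Nt, D); returns the index of the
    selected target frame for every source frame."""
    # squared Euclidean distances between all source/target frame pairs
    d2 = ((src_feats[:, None, :] - tgt_feats[None, :, :]) ** 2).sum(-1)
    return np.argmin(d2, axis=1)   # best-matching target unit per frame

src = np.random.default_rng(1).normal(size=(100, 13))
tgt = np.random.default_rng(2).normal(size=(120, 13))
print(pair_frames(src, tgt)[:5])
```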

01 Jan 2006
TL;DR: The evaluation results show that the proposed USTC speech synthesis system is able to synthesize speech with high naturalness and intelligibility using either the full database or only the ARCTIC subset.
Abstract: This paper introduces the USTC speech synthesis system for Blizzard Challenge 2006. The HMM-based parametric synthesis approach was adopted for its convenience and effectiveness in building a new voice, especially for non-native developers. Some useful techniques were also integrated into our system, such as minimum generation error (MGE) training, phone duration modeling, and line spectral pair (LSP) based formant enhancement. The evaluation results show that the proposed system is able to synthesize speech with high naturalness and intelligibility using either the full database or only the ARCTIC subset.

Proceedings ArticleDOI
14 May 2006
TL;DR: A digital signal processing algorithm to improve intelligibility of clean far end speech for the near end listener who is located in an environment with background noise is presented.
Abstract: In contrast to common noise reduction systems, this contribution presents a digital signal processing algorithm to improve intelligibility of clean far end speech for the near end listener who is located in an environment with background noise. Since the noise reaches the ears of the near end listener directly and therefore can hardly be influenced, a sensible option is to manipulate the far end speech. The proposed algorithm raises the average speech spectrum over the average noise spectrum and takes precautions to prevent hearing damage. Informal listening tests and the Speech Intelligibility Index indicate an improved speech intelligibility.
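
The core gain computation can be sketched per frequency band: lift the long-term speech spectrum a margin above the measured noise spectrum, with a hard cap standing in for the hearing-protection precautions. The 6 dB margin and 12 dB cap are illustrative values, not from the paper.

```python
# Per-band gains pushing the far-end speech spectrum above the near-end
# noise spectrum; margin and cap values are illustrative assumptions.
import numpy as np

def band_gains(speech_psd_db: np.ndarray, noise_psd_db: np.ndarray,
               margin_db: float = 6.0, max_gain_db: float = 12.0) -> np.ndarray:
    """Per-band gains (dB) pushing speech margin_db above the noise."""
    needed = (noise_psd_db + margin_db) - speech_psd_db
    return np.clip(needed, 0.0, max_gain_db)  # never attenuate, cap the boost

speech = np.array([60.0, 55.0, 50.0])   # dB per band (illustrative)
noise = np.array([58.0, 40.0, 52.0])
print(band_gains(speech, noise))        # [4. 0. 8.]
```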

Patent
18 Jul 2006
TL;DR: In this article, a method of transferring a real-time audio signal transmission, including registering voice patterns (or other characteristics) of one or more users to be used to identify the voices of the users, accepting an audio signal as it is created as a sequence of segments, analyzing each segment of the accepted audio signal to determine if it contains voice activity, determining a probability level that the voice activity of the segment is of a registered user, and selectively transferring the contents of a segment responsive to the determined probability level.
Abstract: A method of transferring a real-time audio signal transmission, including: registering voice patterns (or other characteristics) of one or more users to be used to identify the voices of the users; accepting an audio signal as it is created as a sequence of segments; analyzing each segment of the accepted audio signal to determine if it contains voice activity (314); determining a probability level that the voice activity of the segment is of a registered user (320 & 322); and selectively transferring the contents of a segment responsive to the determined probability level (324).
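
The per-segment gating reduces to a short loop: a VAD check, then a registered-speaker probability, then selective transfer. The two detector callables below are assumptions standing in for the patent's components; the numbers in the comments refer to the claim's reference numerals.

```python
# Sketch of the selective-transfer pipeline over audio segments.
from typing import Callable, Iterable, List

def transfer_segments(segments: Iterable[bytes],
                      has_voice: Callable[[bytes], bool],
                      prob_registered: Callable[[bytes], float],
                      min_prob: float = 0.6) -> List[bytes]:
    sent = []
    for seg in segments:
        if not has_voice(seg):                 # step (314): voice activity check
            continue
        if prob_registered(seg) >= min_prob:   # steps (320/322): identity score
            sent.append(seg)                   # step (324): selective transfer
    return sent
```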

Journal ArticleDOI
TL;DR: The results show that ERVU successfully increased the intelligibility of speech using a simple automated segmentation algorithm, applicable to a wide variety of communication systems such as cell phones and public address systems.

PatentDOI
TL;DR: In this article, a voice browsing system maintains a database containing a list of information sources, such as web sites, connected to a network; each information source is assigned a rank number, which is listed in the database along with the record for that source.
Abstract: The present invention relates to a system for acquiring information from sources on a network, such as the Internet. A voice browsing system maintains a database containing a list of information sources, such as web sites, connected to a network. Each of the information sources is assigned a rank number which is listed in the database along with the record for the information source. In response to a speech command received from a user, a network interface system accesses the information source with the highest rank number in order to retrieve information requested by the user.
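
A minimal sketch of the ranked lookup: the database maps a request topic to candidate sources, each carrying a rank number, and the interface fetches from the highest-ranked one. The schema is illustrative, not the patent's.

```python
# Illustrative ranked-source database and lookup; the dict schema and
# example URLs are assumptions for the sketch.
sources = {
    "weather": [{"url": "http://example.com/wx", "rank": 2},
                {"url": "http://example.net/weather", "rank": 5}],
}

def pick_source(topic: str) -> str:
    candidates = sources[topic]
    return max(candidates, key=lambda s: s["rank"])["url"]

print(pick_source("weather"))  # the rank-5 source wins
```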

Journal ArticleDOI
TL;DR: This paper presents an expressive voice conversion model (DeBi-HMM) as the post-processing of a text-to-speech (TTS) system for expressive speech synthesis, named for the duration-embedded characteristic of its two HMMs, which model the source and target speech signals, respectively.
Abstract: This paper presents an expressive voice conversion model (DeBi-HMM) as the post-processing of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for the duration-embedded characteristic of its two HMMs, which model the source and target speech signals, respectively. Joint estimation of source and target HMMs is exploited for spectrum conversion from neutral to expressive speech. A Gamma distribution is embedded as the duration model for each state in the source and target HMMs. Expressive style-dependent decision trees achieve prosodic conversion. The STRAIGHT algorithm is adopted for the analysis and synthesis process. A set of small-sized speech databases for each expressive style was designed and collected to train the DeBi-HMM voice conversion models. Several experiments with statistical hypothesis testing were conducted to evaluate the quality of synthetic speech as perceived by human subjects. Compared with previous voice conversion methods, the proposed method exhibits encouraging potential in expressive speech synthesis.

Journal ArticleDOI
TL;DR: A brain-based mechanism is suggested that uses the voice pitch cue in the low-frequency sound to first segregate the target voice from the competing voice and then to group appropriate temporal envelope cues in the target voice for robust speech recognition under realistic listening situations.
Abstract: Speech can be recognized by multiple acoustic cues in both frequency and time domains. These acoustic cues are often thought to be redundant. One example is the low-frequency sound component below 300 Hz, which is not even transmitted by the majority of communication devices, including telephones. Here, we showed that this low-frequency sound component, although unintelligible when presented alone, could improve the functional signal-to-noise ratio (SNR) by 10-15 dB for speech recognition in noise when presented in combination with a cochlear-implant simulation. A similar low-frequency enhancement effect could be obtained by presenting the low-frequency sound component to one ear and the cochlear-implant simulation to the other ear. However, a high-frequency sound could not produce a similar speech enhancement in noise. We argue that this low-frequency enhancement effect cannot be due to linear addition of intelligibility between low- and high-frequency components or an increase in the physical SNR. We suggest a brain-based mechanism that uses the voice pitch cue in the low-frequency sound to first segregate the target voice from the competing voice and then to group appropriate temporal envelope cues in the target voice for robust speech recognition under realistic listening situations.

01 Jan 2006
TL;DR: The paper proposes the use of synthetic speech coding algorithms (vocoders) to provide redundancy, since the algorithms produce a very low bit-rate stream, which only adds a small overhead to a packet.
Abstract: This paper describes current problems found with audio applications over the MBONE (Multicast Backbone), and investigates possible solutions to the most common one: packet loss. The principles of packet speech systems are discussed, along with how their structure allows the use of redundancy to design viable solutions to the problem. The paper proposes the use of synthetic speech coding algorithms (vocoders) to provide redundancy, since these algorithms produce a very low bit-rate stream, which adds only a small overhead to a packet. Preliminary experiments show that normal speech repaired with synthetic-quality speech is intelligible, even at very high loss rates.
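
The redundancy scheme can be sketched as packets that carry the primary payload for frame n plus a tiny vocoder rendering of frame n-1, so a lost packet can be patched with synthetic-quality speech from its successor. The payloads below are placeholders, not real codec output.

```python
# Sketch of vocoder-based redundancy for packet-loss repair.
from typing import List, Optional, Tuple

def build_packets(primary: List[bytes], vocoded: List[bytes]) -> List[Tuple[int, bytes, bytes]]:
    """Pair each primary frame with the low-bit-rate copy of the previous one."""
    return [(n, primary[n], vocoded[n - 1] if n > 0 else b"")
            for n in range(len(primary))]

def receive(packets: List[Optional[Tuple[int, bytes, bytes]]]) -> List[bytes]:
    """None marks a lost packet; repair it from the redundancy that follows."""
    out: List[bytes] = []
    for i, pkt in enumerate(packets):
        if pkt is not None:
            out.append(pkt[1])                       # normal-quality frame
        elif i + 1 < len(packets) and packets[i + 1] is not None:
            out.append(packets[i + 1][2])            # synthetic-quality patch
        else:
            out.append(b"")                          # unrecoverable gap
    return out
```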

Book
01 Jan 2006
TL;DR: This book discusses speech recognition with HMMs, alternative representations of the LPC coefficients, front-end processing for robust feature extraction, and a review of channel coding techniques.
Abstract: Foreword. Preface. 1 Introduction. 1.1 Introduction. 1.2 RSR over Digital Channels. 1.3 Organization of the Book. 2 Speech Recognition with HMMs. 2.1 Introduction. 2.2 Some General Issues. 2.3 Analysis of Speech Signals. 2.4 Vector Quantization. 2.5 Approaches to ASR. 2.6 Hidden Markov Models. 2.7 Application of HMMs to Speech Recognition. 2.8 Model Adaptation. 2.9 Dealing with Uncertainty. 3 Networks and Degradation. 3.1 Introduction. 3.2 Mobile and Wireless Networks. 3.3 IP Networks. 3.4 The Acoustic Environment. 4 Speech Compression and Architectures for RSR. 4.1 Introduction. 4.2 Speech Coding. 4.3 Recognition from Decoded Speech. 4.4 Recognition from Codec Parameters. 4.5 Distributed Speech Recognition. 4.6 Comparison between NSR and DSR. 5 Robustness Against Transmission Channel Errors. 5.1 Introduction. 5.2 Channel Coding Techniques. 5.3 Error Concealment (EC). 6 Front-end Processing for Robust Feature Extraction. 6.1 Introduction. 6.2 Noise Reduction Techniques. 6.3 Voice Activity Detection. 6.4 Feature Normalization. 7 Standards for Distributed Speech Recognition. 7.1 Introduction. 7.2 Signal Preprocessing. 7.3 Feature Extraction. 7.4 Feature Compression and Encoding. 7.5 Feature Decoding and Postprocessing. A Alternative Representations of the LPC Coefficients. B Basic Digital Modulation Concepts. C Review of Channel Coding Techniques. C.1 Media-independent FEC. C.2 Interleaving. Bibliography. List of Acronyms. Index.

Patent
01 Mar 2006
TL;DR: In this paper, a transcript associated with the speech processing may be displayed to a user with a first visual indication of words having a confidence level within a first predetermined confidence range, and an error correction facility may be provided for the user to correct errors in the displayed transcript.
Abstract: A method, a processing device, and a machine-readable medium are provided for improving speech processing. A transcript associated with the speech processing may be displayed to a user with a first visual indication of words having a confidence level within a first predetermined confidence range. An error correction facility may be provided for the user to correct errors in the displayed transcript. Error correction information, collected from use of the error correction facility, may be provided to a speech processing module to improve speech processing accuracy.
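
A toy rendering of the confidence-range display: words whose confidence falls inside the low range get a visual marker so the user knows where to apply corrections. The bracket notation and the 0.8 cutoff are assumptions for the sketch.

```python
# Illustrative confidence-range rendering of a transcript.
from typing import List, Tuple

def render_transcript(words: List[Tuple[str, float]], low_cutoff: float = 0.8) -> str:
    return " ".join(w if c >= low_cutoff else f"[{w}?]" for w, c in words)

hyp = [("send", 0.95), ("it", 0.91), ("to", 0.88), ("Jon", 0.42)]
print(render_transcript(hyp))   # send it to [Jon?]
```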

Patent
28 Oct 2006
TL;DR: A speech recognition method includes receiving input speech from a user, processing the input speech to obtain at least one parameter value, and determining an experience level of the user using the parameter value(s) as discussed by the authors.
Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech to obtain at least one parameter value, and determining an experience level of the user using the parameter value(s). The method can also include prompting the user based upon the determined experience level of the user to assist the user in delivering speech commands.
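
A small sketch of the experience-level mapping: derive parameter values from the input speech and use them to choose the prompting style. Speaking rate and pause ratio are assumed examples; the patent leaves the parameters open, and the thresholds here are illustrative.

```python
# Hypothetical mapping from speech-derived parameter values to an
# experience level that controls prompting verbosity.
def experience_level(speaking_rate_wps: float, pause_ratio: float) -> str:
    if speaking_rate_wps > 2.5 and pause_ratio < 0.2:
        return "experienced"    # fluent commands: terse prompts
    return "novice"             # hesitant speech: detailed guidance

def prompt_for(level: str) -> str:
    return "Say a command." if level == "experienced" else \
           "Please say one of: call, text, or navigate. Speak after the tone."

print(prompt_for(experience_level(3.0, 0.1)))
```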

Proceedings ArticleDOI
14 May 2006
TL;DR: A system combination approach using different models and features for deception detection is proposed, resulting in improved accuracy over the individual systems.
Abstract: We report on machine learning experiments to distinguish deceptive from nondeceptive speech in the Columbia-SRI-Colorado (CSC) corpus. Specifically, we propose a system combination approach using different models and features for deception detection. Scores from an SVM system based on prosodic/lexical features are combined with scores from a Gaussian mixture model system based on acoustic features, resulting in improved accuracy over the individual systems. Finally, we compare results from the prosodic-only SVM system using features derived either from recognized words or from human transcriptions.
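
Score-level fusion of this kind is a weighted sum of normalized per-system scores. The sketch below uses z-normalization and equal weighting as assumptions; the paper tunes the combination rather than fixing weights.

```python
# Sketch of combining an SVM score (prosodic/lexical) with a GMM
# log-likelihood ratio (acoustic) at the score level.
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / (x.std() + 1e-12)

def fuse(svm_scores: np.ndarray, gmm_llrs: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Higher fused score = more likely deceptive."""
    return w * zscore(svm_scores) + (1 - w) * zscore(gmm_llrs)

svm = np.array([0.2, 1.4, -0.7])
gmm = np.array([3.1, 0.4, -2.0])
print(fuse(svm, gmm))
```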

Patent
13 Mar 2006
TL;DR: In this paper, a method for providing help to voice-enabled applications, including multimodal applications, can include a step of identifying at least one speech grammar associated with a voice-enabled application.
Abstract: A method for providing help to voice-enabled applications, including multimodal applications, can include a step of identifying at least one speech grammar associated with a voice-enabled application. Help fields can be defined within the speech grammar. The help fields can include available speech commands for the voice enabled application. When the speech grammar is activated for use by the voice-enabled application, the available speech commands can be presented to a user of the voice-enabled application. The presented speech commands can be obtained from the help fields.
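
A toy sketch of help fields attached to a grammar: when the grammar is activated, its help field supplies the available commands to present. The dict representation is an assumption; a real system would use a grammar format such as SRGS.

```python
# Illustrative grammar with an embedded help field.
grammar = {
    "name": "media-player",
    "rules": ["play", "pause", "next track"],
    "help": ["You can say: play, pause, or next track."],  # help field
}

def on_grammar_activated(g: dict) -> None:
    for line in g["help"]:          # present the available speech commands
        print(line)

on_grammar_activated(grammar)
```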

Patent
06 Mar 2006
TL;DR: In this article, a speech processing method can automatically and dynamically adjust speech grammar weights at runtime based upon usage data, which can indicate a relative frequency with which each of the available speech commands is utilized.
Abstract: A speech processing method can automatically and dynamically adjust speech grammar weights at runtime based upon usage data. Each of the speech grammar weights can be associated with an available speech command contained within a speech grammar to which the speech grammar weights apply. The usage data can indicate a relative frequency with which each of the available speech commands is utilized.
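
The runtime adjustment can be sketched as re-deriving weights from observed command counts, i.e., relative frequencies with smoothing so unused commands keep a nonzero weight. The add-one smoothing and the direct frequency-to-weight mapping are assumptions for the sketch.

```python
# Sketch of usage-driven grammar weight adaptation.
from collections import Counter

def adjust_weights(commands: list[str], usage: Counter) -> dict[str, float]:
    total = sum(usage[c] + 1 for c in commands)   # add-one smoothing
    return {c: (usage[c] + 1) / total for c in commands}

usage = Counter({"call": 40, "text": 9})
print(adjust_weights(["call", "text", "navigate"], usage))
# 'call' ~0.79 dominates; 'navigate' keeps a small smoothed weight
```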