
Showing papers on "Voice activity detection published in 2002"


Patent
04 Jan 2002
TL;DR: In this paper, the system includes a voice recognition unit and a speech processing server that work together to enable users to interact with the system using voice commands guided by navigation context sensitive voice prompts, and provide user-requested data in a verbalized format back to the users.
Abstract: A system and method for providing access to CRM data via a voice interface. In one embodiment, the system includes a voice recognition unit and a speech processing server that work together to enable users to interact with the system using voice commands guided by navigation context sensitive voice prompts, and provide user-requested data in a verbalized format back to the users. Digitized voice waveform data are processed to determine the voice commands of the user. The system also uses a “grammar” that enables users to retrieve data using intuitive natural language speech queries. In response to such a query, a corresponding data query is generated by the system to retrieve one or more data sets corresponding to the query. The user is then enabled to browse the data that are returned through voice command navigation, wherein the system “reads” the data back to the user using text-to-speech (TTS) conversion.

1,188 citations


01 Jan 2002
TL;DR: It is shown that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective; compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system it achieves improved speech quality and lower residual noise.
Abstract: Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper, we present an Improved Minima Controlled Recursive Averaging (IMCRA) approach for noise estimation in adverse environments involving non-stationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability. The speech presence probability is controlled by the minima values of a smoothed periodogram. The proposed procedure comprises two iterations of smoothing and minimum tracking. The first iteration provides a rough voice activity detection in each frequency band. Then, smoothing in the second iteration excludes relatively strong speech components, which makes the minimum tracking during speech activity robust. We show that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective. In particular, compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system achieves improved speech quality and lower residual noise.

834 citations
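
The heart of the IMCRA estimator is a recursive average whose smoothing parameter varies per frequency bin with the speech presence probability. The minimal Python sketch below shows only that update step, assuming the presence probability is supplied externally; in the actual method it is derived from the two iterations of smoothing and minimum tracking described above, and the function and parameter names here are illustrative.

```python
import numpy as np

def recursive_noise_update(noise_psd, frame_psd, p_speech, alpha_min=0.85):
    """One recursive-averaging update of the noise PSD estimate.

    noise_psd : current noise PSD estimate, one value per frequency bin
    frame_psd : periodogram of the current noisy frame
    p_speech  : speech presence probability per bin, in [0, 1]
    """
    # Time-varying, frequency-dependent smoothing: where speech is likely
    # present, alpha -> 1 and the old estimate is kept; where speech is
    # likely absent, alpha -> alpha_min and the estimate tracks the frame.
    alpha = alpha_min + (1.0 - alpha_min) * p_speech
    return alpha * noise_psd + (1.0 - alpha) * frame_psd
```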


Patent
25 Jun 2002
TL;DR: In this paper, the authors present systems and methods for building distributed conversational applications using a Web services-based model where speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways.
Abstract: Systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

619 citations


PatentDOI
TL;DR: In this article, a distributed voice user interface system includes a local device which receives speech input issued from a user, such speech input may specify a command or a request by the user, and the local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself.
Abstract: A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input.

441 citations
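
The local-first control flow lends itself to a compact sketch. The Python snippet below assumes, purely for illustration, that the local recognizer returns a confidence score and that a fixed floor decides when to escalate; the patent itself does not specify how the local device judges whether it can respond.

```python
def handle_utterance(speech, local_asr, remote_asr, confidence_floor=0.6):
    """Local-first speech handling with remote fallback (sketch).

    local_asr  : callable returning (response, confidence) for the input
    remote_asr : callable invoked only when the local device cannot respond
    """
    response, confidence = local_asr(speech)   # preliminary local processing
    if confidence >= confidence_floor:
        return response                        # local device responds itself
    return remote_asr(speech)                  # escalate to the remote system
```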


Journal ArticleDOI
TL;DR: In this paper, the adaptive multirate wideband (AMR-WB) speech codec was selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services.
Abstract: This paper describes the adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000 and the corresponding specifications were approved in March 2001. The AMR-WB codec was also selected by the International Telecommunication Union-Telecommunication Sector (ITU-T) in July 2001 in the standardization activity for wideband speech coding around 16 kb/s and was approved in January 2002 as Recommendation G.722.2. The adoption of AMR-WB by ITU-T is of significant importance since for the first time the same codec is adopted for wireless as well as wireline services. AMR-WB uses an extended audio bandwidth from 50 Hz to 7 kHz and gives superior speech quality and voice naturalness compared to existing second- and third-generation mobile communication systems. The wideband speech service provided by the AMR-WB codec will give mobile communication speech quality that also substantially exceeds (narrowband) wireline quality. The paper details AMR-WB standardization history, algorithmic description including novel techniques for efficient ACELP wideband speech coding and subjective quality performance of the codec.

312 citations


PatentDOI
TL;DR: In this article, the authors propose a speech recognition technique for video and audio signals that consists of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal using at least a portion of the processed video signal to generate an output signal representative of the audio signal.
Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

302 citations


Patent
06 Sep 2002
TL;DR: In this paper, the authors present an approach for speech recognition using selectable recognition modes, using choice lists in large-vocabulary speech recognition, enabling users to select word transformations, and speech recognition that automatically turns recognition off in one or more specified ways.
Abstract: The present invention relates to: speech recognition using selectable recognition modes; using choice lists in large-vocabulary speech recognition; enabling users to select word transformations; speech recognition that automatically turns recognition off in one or more specified ways; phone key control of large-vocabulary speech recognition; speech recognition using phone key alphabetic filtering and spelling; speech recognition that enables a user to perform re-utterance recognition; the combination of speech recognition and text-to-speech (TTS) generation; the combination of speech recognition with handwriting and/or character recognition; and the combination of large-vocabulary speech recognition with audio recording and playback.

284 citations


Patent
25 Mar 2002
TL;DR: In this article, a distributed speech recognition system comprises a client device and a central server, where the client device is equipped with two speech recognition modules: a foreground speaker and a background speaker.
Abstract: An apparatus and a concomitant method for speech recognition. In one embodiment, a distributed speech recognition system provides speech-driven control and remote service access. The distributed speech recognition system comprises a client device and a central server, where the client device is equipped with two speech recognition modules: a foreground speech recognizer and a background speech recognizer. The foreground speech recognizer is implementing a particular spoken language application (SLA) to handle a particular task, whereas the background speech recognizer is monitoring a change in the topic and/or a change in the intent of the user. Upon detection of a change in topic or intent of the user, the background speech recognizer will effect the routing to a new SLA to address the new topic or intent.

233 citations
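
The routing behavior can be outlined in a few lines. In this illustrative Python sketch, the recognizers and the SLA registry are hypothetical objects standing in for components the patent only describes abstractly.

```python
def process_turn(audio, current_sla, background_recognizer, sla_registry):
    """One dialogue turn: the foreground SLA handles the task while the
    background recognizer watches for a topic/intent change (sketch)."""
    response = current_sla.handle(audio)                 # foreground task
    new_topic = background_recognizer.detect_change(audio)
    if new_topic is not None:
        current_sla = sla_registry[new_topic]            # route to new SLA
    return response, current_sla
```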


PatentDOI
Soshiro Kuzunuki1, Shinya Ohtsuji1, Michio Morioka1, Tadashi Kamiwaki1, Mariko Okude1 
TL;DR: In this article, a speech input system is described that allows a mobile terminal such as a PDA or a portable phone, or a stationary terminal such as a home telephone, a TV set, or a PC, to access a network through speech and to receive services from providers of map information, music information, broadcast program information, and telephone information.
Abstract: In order to provide a speech input system that allows access from a mobile terminal such as a PDA or a portable phone, or from a stationary terminal such as a home telephone, a TV set, or a PC, to a network through speech, and that receives services from providers of map information, music information, broadcast program information, and telephone information, the speech input system comprises: speech input terminals 10, 30 provided with speech input/output means and access status display means; a speech portal server 50 provided with speech recognizing means for receiving speech and recognizing it as text, command converting means for checking the recognized text against a command text dictionary and separating it into a command text and an object text, and conversation control means for accessing and receiving a service from a provider which provides different information based on the separated texts and providing the speech input terminal with the service; and a provider 60 for searching information based on the command text and the object text received from the speech portal server, and serving the speech portal server with the search result.

220 citations


Journal ArticleDOI
TL;DR: An algorithm is proposed which detects speech pauses by adaptively tracking minima in a noisy signal's power envelope, both for the broadband signal and for the high-pass and low-pass filtered signals, and which maintains a low false-alarm rate even in poor signal-to-noise ratios (SNRs).
Abstract: A speech pause detection algorithm is an important and sensitive part of most single-microphone noise reduction schemes for enhancement of speech signals corrupted by additive noise as an estimate of the background noise is usually determined when speech is absent. An algorithm is proposed which detects speech pauses by adaptively tracking minima in a noisy signal's power envelope both for the broadband signal and for the high-pass and low-pass filtered signal. In poor signal-to-noise ratios (SNRs), the proposed algorithm maintains a low false-alarm rate in the detection of speech pauses while the standardized algorithm of ITU G.729 shows an increasing false-alarm rate in unfavorable situations. These characteristics are found with different types of noise and indicate that the proposed algorithm is better suited to be used for noise estimation in noise reduction algorithms, as speech deterioration may thus be kept at a low level. It is shown that in connection with the Ephraim-Malah (1984) noise reduction scheme, the speech pause detection performance can even be further increased by using the noise-reduced signal instead of the noisy signal as input for the speech pause decision unit.

219 citations
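
A stripped-down version of the broadband branch of such a detector is sketched below in Python. The window length, margin, and decision rule are illustrative assumptions; the published algorithm additionally runs the same test on high-pass and low-pass filtered signals and combines the decisions.

```python
import numpy as np

def detect_speech_pauses(power_env, win=100, margin_db=5.0):
    """Flag frames as speech pauses when the power envelope sits close to
    its adaptively tracked minimum (broadband branch only, sketch)."""
    power_env = np.asarray(power_env, dtype=float)
    pauses = np.zeros(len(power_env), dtype=bool)
    for i in range(len(power_env)):
        lo = max(0, i - win + 1)
        floor = power_env[lo:i + 1].min()        # minimum over sliding window
        ratio_db = 10 * np.log10((power_env[i] + 1e-12) / (floor + 1e-12))
        pauses[i] = ratio_db < margin_db         # near the floor => pause
    return pauses
```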


Patent
12 Feb 2002
TL;DR: In this paper, two or more signal detectors (e.g., microphones) are used to detect respective signals having speech and noise components, with the magnitude of each component being dependent on various factors such as the distance between the speech source and the microphone.
Abstract: Techniques to suppress noise from a signal comprised of speech plus noise. In accordance with aspects of the invention, two or more signal detectors (e.g., microphones) are used to detect respective signals having speech and noise components, with the magnitude of each component being dependent on various factors such as the distance between the speech source and the microphone. Signal processing is then used to process the detected signals to generate the desired output signal having predominantly speech with a large portion of the noise removed. The techniques described herein may be advantageously used for both near-field and far-field applications, and may be implemented in various mobile communication devices such as cellular phones.
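
The patent leaves the signal processing unspecified; one generic way to exploit a speech-dominant and a noise-dominant microphone is per-frame spectral subtraction, sketched below in Python. The scaling factor, the spectral floor, and the reuse of the primary microphone's phase are all assumptions for illustration.

```python
import numpy as np

def two_mic_suppress(primary_fft, reference_fft, beta=1.0, floor=0.05):
    """Dual-microphone spectral subtraction for one frame (sketch).

    primary_fft   : FFT of the near, speech-dominant microphone
    reference_fft : FFT of the far, noise-dominant microphone
    """
    mag = np.abs(primary_fft) - beta * np.abs(reference_fft)  # remove noise
    mag = np.maximum(mag, floor * np.abs(primary_fft))        # spectral floor
    return mag * np.exp(1j * np.angle(primary_fft))           # keep phase
```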

Patent
24 Oct 2002
TL;DR: In this article, the authors present a system and method for reviewing inputted voice instructions in a vehicle-based telematics control unit, which includes a microphone, a speech recognition processor, and an output device.
Abstract: A system and method for reviewing inputted voice instructions in a vehicle-based telematics control unit. The system includes a microphone, a speech recognition processor, and an output device. The microphone receives voice instructions from a user. Coupled to the microphone is the speech recognition processor that generates a voice signal by performing speech recognition processing of the received voice instructions. The output device outputs the generated voice signal to the user. The system also includes a user interface for allowing the user to approve the outputted voice signal, and a communication component for wirelessly sending the generated voice signal to a server over a wireless network upon approval by the user.

Patent
30 May 2002
TL;DR: In this paper, a voice operated portable information management system that is substantially language independent and capable of supporting a substantially unlimited vocabulary is presented, which includes a microphone, speaker, clock and GPS connected to a speech processing system.
Abstract: A voice operated portable information management system that is substantially language independent and capable of supporting a substantially unlimited vocabulary. The system (Fig. 1) includes a microphone, speaker, clock and GPS connected to a speech processing system. The speech processing system (Fig. 4): 1) generates and stores compressed speech data corresponding to a user's speech received through the microphone, 2) compares the stored speech data, 3) re-synthesizes the stored speech data for output as speech through the speaker, 4) provides an audible user interface including a speech assistant for providing instructions in the user's language, 5) stores user-specific compressed speech data, including commands, received in response to prompts from the speech assistant for purposes of adapting the system to the user's speech, 6) identifies memo management commands spoken by the user, and stores and organizes compressed speech data as a function of the identified commands, and 7) identifies memo retrieval commands spoken by the user, and retrieves and outputs the stored speech data as a function of the commands.

PatentDOI
TL;DR: In this article, a speech synthesiser is provided with a plurality of synthesis engines each having different characteristics and each including a text-to-speech converter for converting text-form utterances into speech form.
Abstract: A speech synthesiser is provided with a plurality of synthesis engines each having different characteristics and each including a text-to-speech converter for converting text-form utterances into speech form. A synthesis-engine selector selects one of the synthesis engines as the current operative engine for producing speech-form utterances for a speech application. If the overall quality of the speech-form utterance produced by the text-to-speech converter of the current operative synthesis engine becomes inadequate, the selector is caused to select a different engine as the current operative synthesis engine.

Proceedings ArticleDOI
13 May 2002
TL;DR: A new method for reducing the transcription effort for training in automatic speech recognition (ASR), which automatically estimates a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data.
Abstract: State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor intensive and time-consuming. In this paper, we describe a new method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. We automatically estimate a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. We compute utterance confidence scores based on these word confidence scores, then selectively sample the utterances to be transcribed using the utterance confidence scores. In our experiments, we show that we reduce the amount of labeled data needed for a given word accuracy by 27%.
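
The selective-sampling step reduces to ranking utterances by confidence and sending the least confident ones to annotators. The Python sketch below uses a plain average of word confidences as the utterance score; that choice, like the names, is illustrative, and the paper evaluates its own scoring functions.

```python
def select_for_transcription(utterances, budget):
    """Pick the utterances a human should transcribe next (sketch).

    utterances : list of (utterance_id, word_confidences) pairs, where the
                 word confidences come from the recognizer's lattice output
    budget     : number of utterances to select for labeling
    """
    scored = [(sum(conf) / len(conf), uid) for uid, conf in utterances]
    scored.sort(key=lambda pair: pair[0])      # least confident first
    return [uid for _, uid in scored[:budget]]
```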

PatentDOI
TL;DR: In this article, an ultrasonic injection of sound into the vocal cavity and subsequent decoding of the emitted sound after injection is used to provide audible clues as to the unvoiced sound formed by speaking when the vocal cords are not energized.
Abstract: A voice recognition device and method allow position-stabilized capture of spoken sounds with great repeatability and accuracy. The voice recognition device may additionally provide two channels of lip movement information to supplement the usual audible speech component recognition system in selecting the proper pairing of data input to text output. The voice recognition device may provide a further channel of information about the speech generating motions via an ultrasonic injection of sound into the vocal cavity and subsequent decoding of the emitted sound after injection. The ultrasonic injection and decoding may also be used to provide audible clues as to the unvoiced sounds formed by speaking when the vocal cords are not energized. The ensemble of electronic equipment upon the bail band may be in microcircuit form, including placing the components on a copper layer polyimide flexible strip. The side camera and “other side” illuminator LED may be on thin copper polyimide strips attached to the main electronics ensemble, and a set of thin polyimide conductors would conduct power into the ensemble and the signals out of the ensemble through one of the bail band ends, into the ear piece and down the connector to the associated computer equipment, and may also supply the power for the electronic ensemble. The electronic ensemble may be potted with a thin layer of elastomer, such as translucent silicone, providing a moisture barrier and physical protection for the ensemble, while still offering a very light visual weight to the combination of the electronic ensemble and the bail band.

Patent
Donald T. Tang1, Ligin Shen1, Qin Shi1, Wei Zhang1
05 Apr 2002
TL;DR: In this paper, a method for generating personalized speech from text includes the steps of analyzing the input text to get standard parameters of the speech to be synthesized from a standard text-to-speech database.
Abstract: A method for generating personalized speech from text includes the steps of analyzing the input text to get standard parameters of the speech to be synthesized from a standard text-to-speech database; mapping the standard speech parameters to the personalized speech parameters via a personalization model obtained in a training process; and synthesizing speech of the input text based on the personalized speech parameters. The method can be used to simulate the speech of the target person so as to make the speech produced by a TTS system more attractive and personalized.

PatentDOI
TL;DR: In this article, a transform coding method for music signals was proposed, which is suitable for use in a hybrid codec, whereby a common linear predictive (LP) synthesis filter was employed for both speech and music signals.
Abstract: The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.
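
The switching structure around the shared synthesis filter can be outlined briefly. In the Python sketch below, the frame classifier and the two excitation generators are passed in as hypothetical callables, since the abstract does not detail them; only the control flow reflects the described design.

```python
def hybrid_decode(frames, lp_synthesize, celp_excitation, transform_excitation):
    """Decode a frame sequence with one common LP synthesis filter and two
    selectable excitation generators (sketch; all helpers are assumptions)."""
    output = []
    for frame in frames:
        if frame.is_speech:                       # CELP path for speech
            excitation = celp_excitation(frame)
        else:                                     # transform path for music
            excitation = transform_excitation(frame)
        output.append(lp_synthesize(excitation, frame.lp_coeffs))
    return output
```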

PatentDOI
TL;DR: In this paper, a method for the training or adaptation of a speech recognition device used to act upon functions of an electrical appliance, for example the triggering of a voice dial in a mobile telephone terminal, is proposed.
Abstract: The invention relates to a method for the training or adaptation of a speech recognition device used to act upon functions of an electrical appliance, for example the triggering of voice dialing in a mobile telephone terminal. In order to structure the training and/or adaptation of the speech recognition device so as to improve user comfort, a method is proposed with the following steps: receiving a speech input; processing the speech input by means of the speech recognition device to produce a speech recognition result; if the speech recognition result can be allocated to a function of the electrical appliance, acting upon that function; and training or adapting the speech recognition device on the basis of the speech recognition result associated with the speech input, provided the action upon the allocated function does not cause a user input expressing rejection.

PatentDOI
Senaka Balasuriya1
TL;DR: In this paper, a method and apparatus for selective speech recognition includes receiving a media file (112) having a media type indicator (114) and a browser (104) that receives the media file and a speech recognition engine selector (106) receiving the media type indicators from the browser.
Abstract: A method and apparatus for selective speech recognition includes receiving a media file (112) having a media type indicator (114). The method and apparatus further includes a browser (104) that receives the media file and a speech recognition engine selector (106) that receives the media type indicator from the browser (104). The speech recognition engine selector (106) then selects either a first speech recognition engine (108) or a second speech recognition engine (110), in response to the media type indicator. The method and apparatus further includes an audio receiver (102) that receives an audio input (116) which is provided to the enabled first speech recognition engine (108) or second speech recognition engine (110), thereby allowing power consumption to be reduced by disabling a speech recognition engine (108 or 110) until it is actively selected by the speech recognition engine selector (106).

PatentDOI
TL;DR: In this paper, a method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition is proposed, where a primary transformation is applied to a source speech signal to extract primary features therefrom.
Abstract: A method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition. A primary transformation is applied to a source speech signal to extract primary features therefrom. Each of at least one secondary transformation is applied to the source speech signal or the extracted primary features to yield at least one set of secondary features statistically dependent on the primary features. At least one predetermined function is then applied to combine the primary features with the secondary features. A recognition answer is generated by pattern matching this combination against predetermined voice recognition templates.

Proceedings ArticleDOI
01 Jul 2002
TL;DR: A comparison of the merits and demerits along with the subjective quality of speech after removal of silence periods for three time-domain and three frequency-domain VAD algorithms is presented.
Abstract: We discuss techniques for voice activity detection (VAD) for voice over Internet Protocol (VoIP). VAD aids in reducing the bandwidth requirement of a voice session, thereby using bandwidth efficiently. We compare the quality of speech, level of compression, and computational complexity for three time-domain and three frequency-domain VAD algorithms. Implementation of the time-domain algorithms is computationally simple; however, better speech quality is obtained with the frequency-domain algorithms. A comparison of the merits and demerits, along with the subjective quality of speech after removal of silence periods, is presented for all the algorithms. A quantitative measurement of speech quality for the different algorithms is also presented.
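
To make the complexity contrast concrete, here is about the simplest possible time-domain VAD in Python: frame energy against a threshold set relative to the loudest frame. The frame length and threshold are illustrative; the six algorithms compared in the paper are more elaborate than this.

```python
import numpy as np

def energy_vad(x, fs, frame_ms=20, thresh_db=-40.0):
    """Mark frames as voice when their energy is within thresh_db of the
    peak frame energy (sketch).  x: mono samples; fs: sample rate in Hz."""
    x = np.asarray(x, dtype=float)
    n = int(fs * frame_ms / 1000)                 # samples per frame
    frames = x[:len(x) // n * n].reshape(-1, n)   # drop the partial tail
    energy_db = 10 * np.log10(np.mean(frames**2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + thresh_db
```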

Journal ArticleDOI
TL;DR: A spectral-domain speech enhancement algorithm based on a mixture model for the short-time spectrum of the clean speech signal, and on a maximum assumption in the production of the noisy speech spectrum, that shows improved performance compared to alternative speech enhancement algorithms.
Abstract: We present a spectral-domain speech enhancement algorithm. The new algorithm is based on a mixture model for the short-time spectrum of the clean speech signal, and on a maximum assumption in the production of the noisy speech spectrum. In the past this model was used in the context of noise-robust speech recognition. In this paper we show that this model is also effective for improving the quality of speech signals corrupted by additive noise. The computational requirements of the algorithm can be significantly reduced, essentially without paying performance penalties, by incorporating a dual codebook scheme with tied variances. Experiments, using recorded speech signals and actual noise sources, show that in spite of its low computational requirements, the algorithm shows improved performance compared to alternative speech enhancement algorithms.

Journal ArticleDOI
TL;DR: In this article, a method for the real-time reconstruction of normal speech from whispers is proposed, where the normal speech is synthesized using the mixed excitation linear prediction model.

Patent
23 Dec 2002
TL;DR: In this paper, the authors proposed a method and apparatus for reducing the noise in a speech signal, which includes a microphone, a receiver, and a speech filter for suppressing noise in the auditory signal and sound.
Abstract: A method and apparatus for reducing noise in a speech signal. A handset or remote unit provides users with a hearing deficiency a first mode of operation in which noise suppression/speech enhancement algorithms are used during any auditory-related service. There is also provided, in a related mode of operation, speech filtering for reducing noise in a speech signal received through the microphone and outputting the filtered sound to the speaker. The handset includes a microphone for receiving an auditory sound, a receiver for receiving an auditory signal, and a speech filter for suppressing noise in the auditory signal and sound. The speech filter may also be configured to shift the frequency and/or alter the intensity of the auditory signal and sound. The speaker is used for amplifying and outputting the enhanced speech component as an audible sound.

Journal ArticleDOI
TL;DR: A performance evaluation and comparison of G.729, AMR, and fuzzy voice activity detection (FVAD) algorithms was made using objective, psychoacoustic, and subjective parameters to evaluate the extent to which VADs depend on language, the signal-to-noise ratio, or the power level.
Abstract: The paper proposes a performance evaluation and comparison of G.729, AMR, and fuzzy voice activity detection (FVAD) algorithms. The comparison was made using objective, psychoacoustic, and subjective parameters. A highly varied speech database was also set up to evaluate the extent to which VADs depend on language, the signal-to-noise ratio (SNR), or the power level.

PatentDOI
TL;DR: In this article, the server evaluates the user speech in context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again, in approximate real-time, with minimum latency delays.
Abstract: Methods and systems for handling speech recognition processing in effectively real-time, via the internet, so that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech in the context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again in approximate real-time, with minimum latency delays. The client, upon receiving responsive feedback from the server, displays or otherwise provides the feedback to the user.

Proceedings ArticleDOI
13 May 2002
TL;DR: The characteristics of voiced speech can be used to derive a coherently added signal from the linear prediction (LP) residuals of the degraded speech data from different microphones to enhance speech degraded by noise and reverberation.
Abstract: This paper proposes an approach for processing speech from multiple microphones to enhance speech degraded by noise and reverberation. The approach is based on exploiting the features of the excitation source in speech production. In particular, the characteristics of voiced speech can be used to derive a coherently added signal from the linear prediction (LP) residuals of the degraded speech data from different microphones. A weight function is derived from the coherently added signal. For coherent addition the time-delay between a pair of microphones is estimated using the knowledge of the source information present in the LP residual. The enhanced speech is generated by exciting the time varying all-pole filter with the weighted LP residual.
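
Coherent addition hinges on estimating the time delay between microphone pairs from their LP residuals. The Python sketch below does this with a plain cross-correlation peak search; the paper exploits excitation-source features in the residual, which this generic version does not model.

```python
import numpy as np

def estimate_delay(residual_a, residual_b, max_lag):
    """Return the lag (in samples) maximizing the cross-correlation of two
    LP residual segments, restricted to |lag| <= max_lag (sketch)."""
    residual_a = np.asarray(residual_a, dtype=float)
    residual_b = np.asarray(residual_b, dtype=float)
    xcorr = np.correlate(residual_a, residual_b, mode="full")
    lags = np.arange(-(len(residual_b) - 1), len(residual_a))
    keep = np.abs(lags) <= max_lag                # plausible delays only
    return int(lags[keep][np.argmax(xcorr[keep])])
```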

Patent
30 Dec 2002
TL;DR: In this paper, the authors present a method for processing at least one voice signal in which a centralized voice processing unit controls operation of a plurality of voice processing blocks, and a centralized signal characteristic estimator is used to estimate the signal characteristic of a voice signal.
Abstract: Methods and apparatus for processing at least one voice signal in which a centralized voice processing unit controls operation of a plurality of voice processing blocks. In a first embodiment, the centralized voice processing unit comprises a centralized voice activity detector that provides at least one voice activity indication to the plurality of voice processing blocks. In a second embodiment, the centralized voice processing unit comprises a centralized noise estimator that provides at least one noise estimate to the plurality of voice processing blocks. In a third embodiment, the centralized voice processing unit comprises a centralized signal characteristic estimator that provides at least one signal characteristic estimate to the plurality of voice processing blocks.
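
The first embodiment amounts to computing one VAD decision and fanning it out. The small Python class below illustrates that structure; the block interface, the energy-based decision, and the threshold are all assumptions, not details from the patent.

```python
class CentralizedVAD:
    """Single shared voice-activity decision fanned out to several voice
    processing blocks (e.g., noise suppressor, codec DTX); a sketch."""

    def __init__(self, blocks, thresh=1e-4):
        self.blocks = blocks        # objects exposing on_vad(active: bool)
        self.thresh = thresh

    def process_frame(self, frame):
        energy = sum(s * s for s in frame) / len(frame)
        active = energy > self.thresh          # one centralized decision
        for block in self.blocks:
            block.on_vad(active)               # shared indication to all
        return active
```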

Proceedings ArticleDOI
07 Nov 2002
TL;DR: A comparison of four time-domain VAD algorithms in terms of speech quality, compression level, and computational complexity, presenting their relative merits and demerits along with the subjective quality of speech after the pruning of silence periods.
Abstract: We discuss techniques for voice activity detection (VAD) for voice over Internet Protocol (VoIP). VAD aids in reducing the bandwidth requirement of a voice session, thereby using bandwidth efficiently. Such a scheme would be implemented in the application layer. Thus the VAD is independent of the lower layers in the network stack (see Flood, J.E., "Telecommunications Switching - Traffic and Networks", Prentice Hall India). We compare four time-domain VAD algorithms in terms of speech quality, compression level and computational complexity. A comparison of the relative merits and demerits along with the subjective quality of speech after the pruning of silence periods is presented for all the algorithms. A quantitative measurement of speech quality for different algorithms is also presented.