
Showing papers on "Voice activity detection published in 2002"


Patent
04 Jan 2002
TL;DR: In this paper, the system includes a voice recognition unit and a speech processing server that work together to enable users to interact with the system using voice commands guided by navigation context sensitive voice prompts, and provide user-requested data in a verbalized format back to the users.
Abstract: A system and method for providing access to CRM data via a voice interface. In one embodiment, the system includes a voice recognition unit and a speech processing server that work together to enable users to interact with the system using voice commands guided by navigation context sensitive voice prompts, and provide user-requested data in a verbalized format back to the users. Digitized voice waveform data are processed to determine the voice commands of the user. The system also uses a “grammar” that enables users to retrieve data using intuitive natural language speech queries. In response to such a query, a corresponding data query is generated by the system to retrieve one or more data sets corresponding to the query. The user is then enabled to browse the data that are returned through voice command navigation, wherein the system “reads” the data back to the user using text-to-speech (TTS) conversion.

1,188 citations


01 Jan 2002
TL;DR: It is shown that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective; compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system it achieves improved speech quality and lower residual noise.
Abstract: Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper, we present an Improved Minima Controlled Recursive Averaging (IMCRA) approach for noise estimation in adverse environments involving non-stationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability. The speech presence probability is controlled by the minima values of a smoothed periodogram. The proposed procedure comprises two iterations of smoothing and minimum tracking. The first iteration provides a rough voice activity detection in each frequency band. Then, smoothing in the second iteration excludes relatively strong speech components, which makes the minimum tracking during speech activity robust. We show that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective. In particular, compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system achieves improved speech quality and lower residual noise.

834 citations
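
The heart of the IMCRA estimator is a recursive average whose smoothing parameter varies per frequency bin with the speech presence probability. The minimal Python sketch below shows only that update step, assuming the presence probability is supplied externally; in the actual method it is derived from the two iterations of smoothing and minimum tracking described above, and the function and parameter names here are illustrative.

```python
import numpy as np

def recursive_noise_update(noise_psd, frame_psd, p_speech, alpha_min=0.85):
    """One recursive-averaging update of the noise PSD estimate.

    noise_psd : current noise PSD estimate, one value per frequency bin
    frame_psd : periodogram of the current noisy frame
    p_speech  : speech presence probability per bin, in [0, 1]
    """
    # Time-varying, frequency-dependent smoothing: where speech is likely
    # present, alpha -> 1 and the old estimate is kept; where speech is
    # likely absent, alpha -> alpha_min and the estimate tracks the frame.
    alpha = alpha_min + (1.0 - alpha_min) * p_speech
    return alpha * noise_psd + (1.0 - alpha) * frame_psd
```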


Patent
25 Jun 2002
TL;DR: In this paper, the authors present systems and methods for building distributed conversational applications using a Web services-based model where speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways.
Abstract: Systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

619 citations


PatentDOI
TL;DR: In this article, a distributed voice user interface system includes a local device which receives speech input issued from a user, such speech input may specify a command or a request by the user, and the local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself.
Abstract: A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input.

441 citations
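
The local-first control flow lends itself to a compact sketch. The Python snippet below assumes, purely for illustration, that the local recognizer returns a confidence score and that a fixed floor decides when to escalate; the patent itself does not specify how the local device judges whether it can respond.

```python
def handle_utterance(speech, local_asr, remote_asr, confidence_floor=0.6):
    """Local-first speech handling with remote fallback (sketch).

    local_asr  : callable returning (response, confidence) for the input
    remote_asr : callable invoked only when the local device cannot respond
    """
    response, confidence = local_asr(speech)   # preliminary local processing
    if confidence >= confidence_floor:
        return response                        # local device responds itself
    return remote_asr(speech)                  # escalate to the remote system
```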


Journal ArticleDOI
TL;DR: In this paper, the adaptive multirate wideband (AMR-WB) speech codec was selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services.
Abstract: This paper describes the adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000 and the corresponding specifications were approved in March 2001. The AMR-WB codec was also selected by the International Telecommunication Union-Telecommunication Sector (ITU-T) in July 2001 in the standardization activity for wideband speech coding around 16 kb/s and was approved in January 2002 as Recommendation G.722.2. The adoption of AMR-WB by ITU-T is of significant importance since for the first time the same codec is adopted for wireless as well as wireline services. AMR-WB uses an extended audio bandwidth from 50 Hz to 7 kHz and gives superior speech quality and voice naturalness compared to existing second- and third-generation mobile communication systems. The wideband speech service provided by the AMR-WB codec will give mobile communication speech quality that also substantially exceeds (narrowband) wireline quality. The paper details AMR-WB standardization history, algorithmic description including novel techniques for efficient ACELP wideband speech coding and subjective quality performance of the codec.

312 citations


PatentDOI
TL;DR: In this article, the authors propose a speech recognition technique for video and audio signals that consists of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal using at least a portion of the processed video signal to generate an output signal representative of the audio signal.
Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

302 citations


Patent
06 Sep 2002
TL;DR: In this paper, the authors present an approach for speech recognition using selectable recognition modes, using choice lists in large-vocabulary speech recognition, enabling users to select word transformations, and speech recognition that automatically turns recognition off in one or more specified ways.
Abstract: The present invention relates to: speech recognition using selectable recognition modes; using choice lists in large-vocabulary speech recognition; enabling users to select word transformations; speech recognition that automatically turns recognition off in one or more specified ways; phone key control of large-vocabulary speech recognition; speech recognition using phone key alphabetic filtering and spelling; speech recognition that enables a user to perform re-utterance recognition; the combination of speech recognition and text-to-speech (TTS) generation; the combination of speech recognition with handwriting and/or character recognition; and the combination of large-vocabulary speech recognition with audio recording and playback.

284 citations


Patent
25 Mar 2002
TL;DR: In this article, a distributed speech recognition system comprises a client device and a central server, where the client device is equipped with two speech recognition modules: a foreground speaker and a background speaker.
Abstract: An apparatus and a concomitant method for speech recognition. In one embodiment, a distributed speech recognition system provides speech-driven control and remote service access. The distributed speech recognition system comprises a client device and a central server, where the client device is equipped with two speech recognition modules: a foreground speech recognizer and a background speech recognizer. The foreground speech recognizer is implementing a particular spoken language application (SLA) to handle a particular task, whereas the background speech recognizer is monitoring a change in the topic and/or a change in the intent of the user. Upon detection of a change in topic or intent of the user, the background speech recognizer will effect the routing to a new SLA to address the new topic or intent.

233 citations
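
The routing behavior can be outlined in a few lines. In this illustrative Python sketch, the recognizers and the SLA registry are hypothetical objects standing in for components the patent only describes abstractly.

```python
def process_turn(audio, current_sla, background_recognizer, sla_registry):
    """One dialogue turn: the foreground SLA handles the task while the
    background recognizer watches for a topic/intent change (sketch)."""
    response = current_sla.handle(audio)                 # foreground task
    new_topic = background_recognizer.detect_change(audio)
    if new_topic is not None:
        current_sla = sla_registry[new_topic]            # route to new SLA
    return response, current_sla
```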


PatentDOI
Soshiro Kuzunuki1, Shinya Ohtsuji1, Michio Morioka1, Tadashi Kamiwaki1, Mariko Okude1 
TL;DR: In this article, a speech input system is described that allows a mobile terminal such as a PDA or a portable phone, or a stationary terminal such as a home telephone, a TV set, or a PC, to access a network through speech and to receive services from providers of map information, music information, broadcast program information, and telephone information.
Abstract: In order to provide a speech input system that allows access from a mobile terminal such as a PDA or a portable phone, or from a stationary terminal such as a home telephone, a TV set, or a PC, to a network through speech, and that receives services from providers of map information, music information, broadcast program information, and telephone information, the speech input system comprises: speech input terminals 10, 30 provided with speech input/output means and access status display means; a speech portal server 50 provided with speech recognizing means for receiving speech and recognizing it as text, command converting means for checking the recognized text against a command text dictionary and separating it into a command text and an object text, and conversation control means for accessing and receiving a service from a provider which provides different information based on the separated texts and providing the speech input terminal with the service; and a provider 60 for searching information based on the command text and the object text received from the speech portal server, and serving the speech portal server with the search result.

220 citations


Journal ArticleDOI
TL;DR: An algorithm is proposed which detects speech pauses by adaptively tracking minima in a noisy signal's power envelope, both for the broadband signal and for the high-pass and low-pass filtered signals, and which maintains a low false-alarm rate even in poor signal-to-noise ratios (SNRs).
Abstract: A speech pause detection algorithm is an important and sensitive part of most single-microphone noise reduction schemes for enhancement of speech signals corrupted by additive noise as an estimate of the background noise is usually determined when speech is absent. An algorithm is proposed which detects speech pauses by adaptively tracking minima in a noisy signal's power envelope both for the broadband signal and for the high-pass and low-pass filtered signal. In poor signal-to-noise ratios (SNRs), the proposed algorithm maintains a low false-alarm rate in the detection of speech pauses while the standardized algorithm of ITU G.729 shows an increasing false-alarm rate in unfavorable situations. These characteristics are found with different types of noise and indicate that the proposed algorithm is better suited to be used for noise estimation in noise reduction algorithms, as speech deterioration may thus be kept at a low level. It is shown that in connection with the Ephraim-Malah (1984) noise reduction scheme, the speech pause detection performance can even be further increased by using the noise-reduced signal instead of the noisy signal as input for the speech pause decision unit.

219 citations
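
A stripped-down version of the broadband branch of such a detector is sketched below in Python. The window length, margin, and decision rule are illustrative assumptions; the published algorithm additionally runs the same test on high-pass and low-pass filtered signals and combines the decisions.

```python
import numpy as np

def detect_speech_pauses(power_env, win=100, margin_db=5.0):
    """Flag frames as speech pauses when the power envelope sits close to
    its adaptively tracked minimum (broadband branch only, sketch)."""
    power_env = np.asarray(power_env, dtype=float)
    pauses = np.zeros(len(power_env), dtype=bool)
    for i in range(len(power_env)):
        lo = max(0, i - win + 1)
        floor = power_env[lo:i + 1].min()        # minimum over sliding window
        ratio_db = 10 * np.log10((power_env[i] + 1e-12) / (floor + 1e-12))
        pauses[i] = ratio_db < margin_db         # near the floor => pause
    return pauses
```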


Patent
12 Feb 2002
TL;DR: In this paper, two or more signal detectors (e.g., microphones) are used to detect respective signals having speech and noise components, with the magnitude of each component being dependent on various factors such as the distance between the speech source and the microphone.
Abstract: Techniques to suppress noise from a signal comprised of speech plus noise. In accordance with aspects of the invention, two or more signal detectors (e.g., microphones) are used to detect respective signals having speech and noise components, with the magnitude of each component being dependent on various factors such as the distance between the speech source and the microphone. Signal processing is then used to process the detected signals to generate the desired output signal having predominantly speech with a large portion of the noise removed. The techniques described herein may be advantageously used for both near-field and far-field applications, and may be implemented in various mobile communication devices such as cellular phones.
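
The patent leaves the signal processing unspecified; one generic way to exploit a speech-dominant and a noise-dominant microphone is per-frame spectral subtraction, sketched below in Python. The scaling factor, the spectral floor, and the reuse of the primary microphone's phase are all assumptions for illustration.

```python
import numpy as np

def two_mic_suppress(primary_fft, reference_fft, beta=1.0, floor=0.05):
    """Dual-microphone spectral subtraction for one frame (sketch).

    primary_fft   : FFT of the near, speech-dominant microphone
    reference_fft : FFT of the far, noise-dominant microphone
    """
    mag = np.abs(primary_fft) - beta * np.abs(reference_fft)  # remove noise
    mag = np.maximum(mag, floor * np.abs(primary_fft))        # spectral floor
    return mag * np.exp(1j * np.angle(primary_fft))           # keep phase
```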

Patent
24 Oct 2002
TL;DR: In this article, the authors present a system and method for reviewing inputted voice instructions in a vehicle-based telematics control unit, which includes a microphone, a speech recognition processor, and an output device.
Abstract: A system and method for reviewing inputted voice instructions in a vehicle-based telematics control unit. The system includes a microphone, a speech recognition processor, and an output device. The microphone receives voice instructions from a user. Coupled to the microphone is the speech recognition processor that generates a voice signal by performing speech recognition processing of the received voice instructions. The output device outputs the generated voice signal to the user. The system also includes a user interface for allowing the user to approve the outputted voice signal, and a communication component for wirelessly sending the generated voice signal to a server over a wireless network upon approval by the user.

Patent
30 May 2002
TL;DR: In this paper, a voice operated portable information management system that is substantially language independent and capable of supporting a substantially unlimited vocabulary is presented, which includes a microphone, speaker, clock and GPS connected to a speech processing system.
Abstract: A voice operated portable information management system that is substantially language independent and capable of supporting a substantially unlimited vocabulary. The system (Fig. 1) includes a microphone, speaker, clock and GPS connected to a speech processing system. The speech processing system (Fig. 4): 1) generates and stores compressed speech data corresponding to a user's speech received through the microphone, 2) compares the stored speech data, 3) re-synthesizes the stored speech data for output as speech through the speaker, 4) provides an audible user interface including a speech assistant for providing instructions in the user's language, 5) stores user-specific compressed speech data, including commands, received in response to prompts from the speech assistant for purposes of adapting the system to the user's speech, 6) identifies memo management commands spoken by the user, and stores and organizes compressed speech data as a function of the identified commands, and 7) identifies memo retrieval commands spoken by the user, and retrieves and outputs the stored speech data as a function of the commands.

PatentDOI
TL;DR: In this article, a speech synthesiser is provided with a plurality of synthesis engines each having different characteristics and each including a text-to-speech converter for converting text-form utterances into speech form.
Abstract: A speech synthesiser is provided with a plurality of synthesis engines each having different characteristics and each including a text-to-speech converter for converting text-form utterances into speech form. A synthesis-engine selector selects one of the synthesis engines as the current operative engine for producing speech-form utterances for a speech application. If the overall quality of the speech-form utterance produced by the text-to-speech converter of the current operative synthesis engine becomes inadequate, the selector is caused to select a different engine as the current operative synthesis engine.

Proceedings ArticleDOI
13 May 2002
TL;DR: A new method for reducing the transcription effort for training in automatic speech recognition (ASR), which automatically estimates a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data.
Abstract: State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor intensive and time-consuming. In this paper, we describe a new method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. We automatically estimate a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. We compute utterance confidence scores based on these word confidence scores, then selectively sample the utterances to be transcribed using the utterance confidence scores. In our experiments, we show that we reduce the amount of labeled data needed for a given word accuracy by 27%.
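
The selective-sampling step reduces to ranking utterances by confidence and sending the least confident ones to annotators. The Python sketch below uses a plain average of word confidences as the utterance score; that choice, like the names, is illustrative, and the paper evaluates its own scoring functions.

```python
def select_for_transcription(utterances, budget):
    """Pick the utterances a human should transcribe next (sketch).

    utterances : list of (utterance_id, word_confidences) pairs, where the
                 word confidences come from the recognizer's lattice output
    budget     : number of utterances to select for labeling
    """
    scored = [(sum(conf) / len(conf), uid) for uid, conf in utterances]
    scored.sort(key=lambda pair: pair[0])      # least confident first
    return [uid for _, uid in scored[:budget]]
```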

PatentDOI
TL;DR: In this article, an ultrasonic injection of sound into the vocal cavity and subsequent decoding of the emitted sound after injection is used to provide audible clues as to the unvoiced sound formed by speaking when the vocal cords are not energized.
Abstract: A voice recognition device and method allow position-stabilized capture of spoken sounds with great repeatability and accuracy. The voice recognition device may additionally provide two channels of lip movement information to supplement the usual audible speech component recognition system in selecting the proper pairing of data input to text output. The voice recognition device may provide a further channel of information about the speech generating motions via an ultrasonic injection of sound into the vocal cavity and subsequent decoding of the emitted sound after injection. The ultrasonic injection and decoding may also be used to provide audible clues as to the unvoiced sounds formed by speaking when the vocal cords are not energized. The ensemble of electronic equipment upon the bail band may be in microcircuit form, including placing the components on a copper layer polyimide flexible strip. The side camera and “other side” illuminator LED may be on thin copper polyimide strips attached to the main electronics ensemble, and a set of thin polyimide conductors would conduct power into the ensemble and the signals out of the ensemble through one of the bail band ends, into the ear piece and down the connector to the associated computer equipment, and may also supply the power for the electronic ensemble. The electronic ensemble may be potted with a thin layer of elastomer, such as translucent silicone, providing a moisture barrier and physical protection for the ensemble, while still offering a very light visual weight to the combination of the electronic ensemble and the bail band.

Patent
Donald T. Tang1, Ligin Shen1, Qin Shi1, Wei Zhang1
05 Apr 2002
TL;DR: In this paper, a method for generating personalized speech from text includes the steps of analyzing the input text to get standard parameters of the speech to be synthesized from a standard text-to-speech database.
Abstract: A method for generating personalized speech from text includes the steps of analyzing the input text to get standard parameters of the speech to be synthesized from a standard text-to-speech database; mapping the standard speech parameters to the personalized speech parameters via a personalization model obtained in a training process; and synthesizing speech of the input text based on the personalized speech parameters. The method can be used to simulate the speech of the target person so as to make the speech produced by a TTS system more attractive and personalized.

PatentDOI
TL;DR: In this article, a transform coding method for music signals was proposed, which is suitable for use in a hybrid codec, whereby a common linear predictive (LP) synthesis filter was employed for both speech and music signals.
Abstract: The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.
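
The switching structure around the shared synthesis filter can be outlined briefly. In the Python sketch below, the frame classifier and the two excitation generators are passed in as hypothetical callables, since the abstract does not detail them; only the control flow reflects the described design.

```python
def hybrid_decode(frames, lp_synthesize, celp_excitation, transform_excitation):
    """Decode a frame sequence with one common LP synthesis filter and two
    selectable excitation generators (sketch; all helpers are assumptions)."""
    output = []
    for frame in frames:
        if frame.is_speech:                       # CELP path for speech
            excitation = celp_excitation(frame)
        else:                                     # transform path for music
            excitation = transform_excitation(frame)
        output.append(lp_synthesize(excitation, frame.lp_coeffs))
    return output
```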

PatentDOI
TL;DR: In this paper, a method for the training or adaptation of a speech recognition device used to act upon functions of an electrical appliance, for example the triggering of a voice dial in a mobile telephone terminal, is proposed.
Abstract: The invention relates to a method for the training or adaptation of a speech recognition device used to act upon functions of an electrical appliance, for example the triggering of voice dialing in a mobile telephone terminal. In order to structure the training and/or adaptation of the speech recognition device so as to improve user comfort, a method is proposed with the following steps: receiving a speech input; processing the speech input by means of the speech recognition device to produce a speech recognition result; if the speech recognition result can be allocated to a function of the electrical appliance, acting upon that function; and training or adapting the speech recognition device on the basis of the speech recognition result associated with the speech input, provided the action upon the allocated function does not cause a user input expressing rejection.

PatentDOI
Senaka Balasuriya1
TL;DR: In this paper, a method and apparatus for selective speech recognition includes receiving a media file (112) having a media type indicator (114) and a browser (104) that receives the media file and a speech recognition engine selector (106) receiving the media type indicators from the browser.
Abstract: A method and apparatus for selective speech recognition includes receiving a media file (112) having a media type indicator (114). The method and apparatus further includes a browser (104) that receives the media file and a speech recognition engine selector (106) that receives the media type indicator from the browser (104). The speech recognition engine selector (106) then selects either a first speech recognition engine (108) or a second speech recognition engine (110), in response to the media type indicator. The method and apparatus further includes an audio receiver (102) that receives an audio input (116) which is provided to the enabled first speech recognition engine (108) or second speech recognition engine (110), thereby allowing power consumption to be reduced by disabling a speech recognition engine (108 or 110) until it is actively selected by the speech recognition engine selector (106).

PatentDOI
TL;DR: In this paper, a method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition is proposed, where a primary transformation is applied to a source speech signal to extract primary features therefrom.
Abstract: A method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition. A primary transformation is applied to a source speech signal to extract primary features therefrom. Each of at least one secondary transformation is applied to the source speech signal or the extracted primary features to yield at least one set of secondary features statistically dependent on the primary features. At least one predetermined function is then applied to combine the primary features with the secondary features. A recognition answer is generated by pattern matching this combination against predetermined voice recognition templates.

Proceedings ArticleDOI
01 Jul 2002
TL;DR: A comparison of the merits and demerits along with the subjective quality of speech after removal of silence periods for three time-domain and three frequency-domain VAD algorithms is presented.
Abstract: We discuss techniques for voice activity detection (VAD) for voice over Internet Protocol (VoIP). VAD aids in reducing the bandwidth requirement of a voice session, thereby using bandwidth efficiently. We compare the quality of speech, level of compression, and computational complexity for three time-domain and three frequency-domain VAD algorithms. Implementation of the time-domain algorithms is computationally simple; however, better speech quality is obtained with the frequency-domain algorithms. A comparison of the merits and demerits, along with the subjective quality of speech after removal of silence periods, is presented for all the algorithms. A quantitative measurement of speech quality for the different algorithms is also presented.
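
To make the complexity contrast concrete, here is about the simplest possible time-domain VAD in Python: frame energy against a threshold set relative to the loudest frame. The frame length and threshold are illustrative; the six algorithms compared in the paper are more elaborate than this.

```python
import numpy as np

def energy_vad(x, fs, frame_ms=20, thresh_db=-40.0):
    """Mark frames as voice when their energy is within thresh_db of the
    peak frame energy (sketch).  x: mono samples; fs: sample rate in Hz."""
    x = np.asarray(x, dtype=float)
    n = int(fs * frame_ms / 1000)                 # samples per frame
    frames = x[:len(x) // n * n].reshape(-1, n)   # drop the partial tail
    energy_db = 10 * np.log10(np.mean(frames**2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + thresh_db
```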

Journal ArticleDOI
TL;DR: A spectral-domain speech enhancement algorithm based on a mixture model for the short-time spectrum of the clean speech signal, and on a maximum assumption in the production of the noisy speech spectrum, that shows improved performance compared to alternative speech enhancement algorithms.
Abstract: We present a spectral-domain speech enhancement algorithm. The new algorithm is based on a mixture model for the short-time spectrum of the clean speech signal, and on a maximum assumption in the production of the noisy speech spectrum. In the past this model was used in the context of noise-robust speech recognition. In this paper we show that this model is also effective for improving the quality of speech signals corrupted by additive noise. The computational requirements of the algorithm can be significantly reduced, essentially without paying performance penalties, by incorporating a dual codebook scheme with tied variances. Experiments, using recorded speech signals and actual noise sources, show that in spite of its low computational requirements, the algorithm shows improved performance compared to alternative speech enhancement algorithms.

Journal ArticleDOI
TL;DR: In this article, a method for the real-time reconstruction of normal speech from whispers is proposed, where the normal speech is synthesized using the mixed excitation linear prediction model.

Patent
23 Dec 2002
TL;DR: In this paper, the authors proposed a method and apparatus for reducing the noise in a speech signal, which includes a microphone, a receiver, and a speech filter for suppressing noise in the auditory signal and sound.
Abstract: A method and apparatus for reducing noise in a speech signal. A handset or remote unit provides users with a hearing deficiency a first mode of operation in which noise suppression/speech enhancement algorithms are used during any auditory-related service. There is also provided, in a related mode of operation, speech filtering for reducing noise in a speech signal received through the microphone and outputting the filtered sound to the speaker. The handset includes a microphone for receiving an auditory sound, a receiver for receiving an auditory signal, and a speech filter for suppressing noise in the auditory signal and sound. The speech filter may also be configured to shift the frequency and/or alter the intensity of the auditory signal and sound. The speaker is used for amplifying and outputting the enhanced speech component as an audible sound.

Journal ArticleDOI
TL;DR: A performance evaluation and comparison of G.729, AMR, and fuzzy voice activity detection (FVAD) algorithms was made using objective, psychoacoustic, and subjective parameters to evaluate the extent to which VADs depend on language, the signal-to-noise ratio, or the power level.
Abstract: The paper proposes a performance evaluation and comparison of G.729, AMR, and fuzzy voice activity detection (FVAD) algorithms. The comparison was made using objective, psychoacoustic, and subjective parameters. A highly varied speech database was also set up to evaluate the extent to which VADs depend on language, the signal-to-noise ratio (SNR), or the power level.

PatentDOI
TL;DR: In this article, the server evaluates the user speech in context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again, in approximate real-time, with minimum latency delays.
Abstract: Methods and systems for handling speech recognition processing in effectively real-time, via the internet, so that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech in the context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again in approximate real-time, with minimum latency delays. The client, upon receiving responsive feedback from the server, displays or otherwise provides the feedback to the user.

Proceedings ArticleDOI
13 May 2002
TL;DR: The characteristics of voiced speech can be used to derive a coherently added signal from the linear prediction (LP) residuals of the degraded speech data from different microphones to enhance speech degraded by noise and reverberation.
Abstract: This paper proposes an approach for processing speech from multiple microphones to enhance speech degraded by noise and reverberation. The approach is based on exploiting the features of the excitation source in speech production. In particular, the characteristics of voiced speech can be used to derive a coherently added signal from the linear prediction (LP) residuals of the degraded speech data from different microphones. A weight function is derived from the coherently added signal. For coherent addition the time-delay between a pair of microphones is estimated using the knowledge of the source information present in the LP residual. The enhanced speech is generated by exciting the time varying all-pole filter with the weighted LP residual.
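
Coherent addition hinges on estimating the time delay between microphone pairs from their LP residuals. The Python sketch below does this with a plain cross-correlation peak search; the paper exploits excitation-source features in the residual, which this generic version does not model.

```python
import numpy as np

def estimate_delay(residual_a, residual_b, max_lag):
    """Return the lag (in samples) maximizing the cross-correlation of two
    LP residual segments, restricted to |lag| <= max_lag (sketch)."""
    residual_a = np.asarray(residual_a, dtype=float)
    residual_b = np.asarray(residual_b, dtype=float)
    xcorr = np.correlate(residual_a, residual_b, mode="full")
    lags = np.arange(-(len(residual_b) - 1), len(residual_a))
    keep = np.abs(lags) <= max_lag                # plausible delays only
    return int(lags[keep][np.argmax(xcorr[keep])])
```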

Patent
30 Dec 2002
TL;DR: In this paper, the authors present a method for processing at least one voice signal in which a centralized voice processing unit controls operation of a plurality of voice processing blocks, and a centralized signal characteristic estimator is used to estimate the signal characteristic of a voice signal.
Abstract: Methods and apparatus for processing at least one voice signal in which a centralized voice processing unit controls operation of a plurality of voice processing blocks. In a first embodiment, the centralized voice processing unit comprises a centralized voice activity detector that provides at least one voice activity indication to the plurality of voice processing blocks. In a second embodiment, the centralized voice processing unit comprises a centralized noise estimator that provides at least one noise estimate to the plurality of voice processing blocks. In a third embodiment, the centralized voice processing unit comprises a centralized signal characteristic estimator that provides at least one signal characteristic estimate to the plurality of voice processing blocks.
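
The first embodiment amounts to computing one VAD decision and fanning it out. The small Python class below illustrates that structure; the block interface, the energy-based decision, and the threshold are all assumptions, not details from the patent.

```python
class CentralizedVAD:
    """Single shared voice-activity decision fanned out to several voice
    processing blocks (e.g., noise suppressor, codec DTX); a sketch."""

    def __init__(self, blocks, thresh=1e-4):
        self.blocks = blocks        # objects exposing on_vad(active: bool)
        self.thresh = thresh

    def process_frame(self, frame):
        energy = sum(s * s for s in frame) / len(frame)
        active = energy > self.thresh          # one centralized decision
        for block in self.blocks:
            block.on_vad(active)               # shared indication to all
        return active
```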

Proceedings ArticleDOI
07 Nov 2002
TL;DR: A comparison of four time-domain VAD algorithms in terms of speech quality, compression level, and computational complexity, presenting their relative merits and demerits along with the subjective quality of speech after the pruning of silence periods.
Abstract: We discuss techniques for voice activity detection (VAD) for voice over Internet Protocol (VoIP). VAD aids in reducing the bandwidth requirement of a voice session, thereby using bandwidth efficiently. Such a scheme would be implemented in the application layer. Thus the VAD is independent of the lower layers in the network stack (see Flood, J.E., "Telecommunications Switching - Traffic and Networks", Prentice Hall India). We compare four time-domain VAD algorithms in terms of speech quality, compression level and computational complexity. A comparison of the relative merits and demerits along with the subjective quality of speech after the pruning of silence periods is presented for all the algorithms. A quantitative measurement of speech quality for different algorithms is also presented.