
Showing papers on "Linear predictive coding published in 2000"


Journal ArticleDOI
TL;DR: A new method of processing speech degraded by reverberation based on analysis of short segments of data to enhance the regions in the speech signal having a high signal-to-reverberant component ratio (SRR).
Abstract: We propose a new method of processing speech degraded by reverberation. The method is based on analysis of short (2 ms) segments of data to enhance the regions in the speech signal having a high signal-to-reverberant component ratio (SRR). The short segment analysis shows that SRR is different in different segments of speech. The processing method involves identifying and manipulating the linear prediction residual signal in three different regions of the speech signal, namely, high SRR region, low SRR region, and only reverberation component region. A weight function is derived to modify the linear prediction residual signal. The weighted residual signal samples are used to excite a time-varying all-pole filter to obtain perceptually enhanced speech. The method is robust to noise present in the recorded speech signal. The performance is illustrated through spectrograms, subjective and objective evaluations.
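
A minimal sketch of the processing chain described above, assuming frame-wise autocorrelation LPC and a simple 2-ms energy measure as a crude stand-in for the paper's SRR analysis; the weight function below is illustrative, not the one derived in the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=10):
    """Autocorrelation-method LPC; returns the error filter [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def enhance(x, fs, frame_ms=20, order=10):
    n = int(fs * frame_ms / 1000)
    hop2ms = max(1, int(fs * 0.002))            # 2-ms analysis segments
    out = np.zeros(len(x))
    for start in range(0, len(x) - n, n):
        frame = x[start:start + n].astype(float)
        a = lpc(frame, order)
        res = lfilter(a, [1.0], frame)          # LP residual (inverse filtering)
        w = np.ones(n)
        for s in range(0, n, hop2ms):           # segment energy as an SRR proxy
            w[s:s + hop2ms] = res[s:s + hop2ms].std() / (res.std() + 1e-12)
        w = np.clip(w, 0.2, 1.5)                # attenuate low-SRR regions
        out[start:start + n] = lfilter([1.0], a, res * w)  # all-pole synthesis
    return out
```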

210 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: Both the objective and subjective test results show that the proposed algorithm outperforms the conventional codebook mapping method.
Abstract: Reconstruction of wideband speech from its narrowband version is an attractive problem, since it can enhance speech quality without modifying the existing communication networks. This paper proposes a new method for recovering wideband speech from narrowband speech. In the proposed method, the narrowband spectral envelope of the input speech is transformed to a wideband spectral envelope based on the Gaussian mixture model (GMM), whose parameters are calculated by a joint density estimation technique. Then the lowband and highband speech signals are reconstructed by the LPC synthesizer using the reconstructed spectral envelope. This paper also proposes a codeword-dependent power estimation method. Both the objective and subjective test results show that the proposed algorithm outperforms the conventional codebook mapping method.
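
A sketch of the joint-density GMM mapping, assuming paired narrowband/wideband envelope features (e.g. LSF vectors) are available for training; the component count and feature choice are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(X, Y, n_components=8):
    """Fit a GMM on joint [narrowband, wideband] feature vectors."""
    Z = np.hstack([X, Y])                       # joint density estimation
    gmm = GaussianMixture(n_components, covariance_type='full').fit(Z)
    return gmm, X.shape[1]

def map_to_wideband(gmm, dx, x):
    """Conditional mean E[y | x] under the joint GMM."""
    mu, S, w = gmm.means_, gmm.covariances_, gmm.weights_
    # posterior responsibilities from the narrowband marginal
    px = np.array([w[k] * multivariate_normal.pdf(x, mu[k, :dx], S[k, :dx, :dx])
                   for k in range(len(w))])
    px /= px.sum()
    y = np.zeros(mu.shape[1] - dx)
    for k in range(len(w)):
        Sxx, Syx = S[k, :dx, :dx], S[k, dx:, :dx]
        y += px[k] * (mu[k, dx:] + Syx @ np.linalg.solve(Sxx, x - mu[k, :dx]))
    return y
```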

197 citations


Journal ArticleDOI
TL;DR: A new subband-based classification scheme is developed for classifying underwater mines and mine-like targets from the acoustic backscattered signals using a feature extractor using wavelet packets in conjunction with linear predictive coding, a feature selection scheme, and a backpropagation neural-network classifier.
Abstract: In this paper, a new subband-based classification scheme is developed for classifying underwater mines and mine-like targets from the acoustic backscattered signals. The system consists of a feature extractor using wavelet packets in conjunction with linear predictive coding (LPC), a feature selection scheme, and a backpropagation neural-network classifier. The data set used for this study consists of the backscattered signals from six different objects: two mine-like targets and four nontargets for several aspect angles. Simulation results on ten different noisy realizations and for signal-to-noise ratio (SNR) of 12 dB are presented. The receiver operating characteristic (ROC) curve of the classifier generated based on these results demonstrated excellent classification performance of the system. The generalization ability of the trained network was demonstrated by computing the error and classification rate statistics on a large data set. A multiaspect fusion scheme was also adopted in order to further improve the classification performance.
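
A sketch of the feature-extraction chain (wavelet packets plus a per-subband LPC fit) feeding a backpropagation-style classifier; the wavelet, decomposition depth, LPC order, and network size are assumptions for illustration.

```python
import numpy as np
import pywt
from scipy.linalg import solve_toeplitz
from sklearn.neural_network import MLPClassifier

def lpc_coeffs(x, order=4):
    """Autocorrelation-method LPC coefficients for one subband signal."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def subband_lpc_features(signal, level=3, order=4):
    wp = pywt.WaveletPacket(data=signal, wavelet='db4', maxlevel=level)
    nodes = wp.get_level(level, order='freq')   # subband signals, low to high
    return np.concatenate([lpc_coeffs(n.data, order) for n in nodes])

# X_train: list of backscattered returns, y_train: target / non-target labels
# feats = np.array([subband_lpc_features(x) for x in X_train])
# clf = MLPClassifier(hidden_layer_sizes=(20,)).fit(feats, y_train)
```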

189 citations


Patent
09 Aug 2000
TL;DR: In this article, the authors present methods and systems for testing speech recognition systems using a text-to-speech (TTS) device, in which the speech recognition device to be tested is directly monitored in accordance with a TTS device.
Abstract: Methods and systems for testing speech recognition systems are disclosed in which the speech recognition device to be tested is directly monitored in accordance with a text-to-speech device. The collection of reference texts to be used by the speech recognition device is provided by a text-to-speech device, preferably, in one embodiment, implemented within the same computer system. In such an embodiment, a digital audio file stored within a storage area of a computer system is generated from a reference text using a text-to-speech device. The digital audio file is later read using a speech recognition device to generate a decoded (or recognized) text representative of the reference text. The reference text and the decoded text are compared in an alignment operation, and an error report representative of the recognition rate of the speech recognition device is finally generated.
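
The comparison step lends itself to a compact sketch: align the reference text against the decoded text and report an error rate. Here difflib stands in for the patent's alignment operation; production systems typically use word-level Levenshtein alignment.

```python
import difflib

def recognition_error_report(reference: str, decoded: str):
    ref, hyp = reference.lower().split(), decoded.lower().split()
    sm = difflib.SequenceMatcher(a=ref, b=hyp)
    # every non-matching opcode contributes substitutions/insertions/deletions
    errors = sum(max(i2 - i1, j2 - j1)
                 for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != 'equal')
    return {'words': len(ref), 'errors': errors,
            'accuracy': 1.0 - errors / max(len(ref), 1)}

print(recognition_error_report("testing speech recognition systems",
                               "testing speech recondition systems"))
```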

178 citations


Patent
Yoshinori Shiga1
11 Jan 2000
TL;DR: In this paper, a text analysis section reads, from a text file, a text to be subjected to speech synthesis, and analyzes the text using a morphological analysis section, a syntactic structure analysis section, a semantic analysis section, and a similarly-pronounced-word detecting section.
Abstract: A text analysis section reads, from a text file, a text to be subjected to speech synthesis, and analyzes the text using a morphological analysis section, a syntactic structure analysis section, a semantic analysis section and a similarly-pronounced-word detecting section. A speech segment selecting section incorporated in a speech synthesizing section obtains the degree of intelligibility of synthetic speech for each accent phrase on the basis of the text analysis result of the text analysis section, thereby selecting a speech segment string corresponding to each accent phrase on the basis of the degree of intelligibility from one of a 0th-rank speech segment dictionary, a first-rank speech segment dictionary and a second-rank speech segment dictionary. A speech segment connecting section connects selected speech segment strings and subjects the connection result to speech synthesis performed by a synthesizing filter section.

138 citations


Proceedings ArticleDOI
28 Mar 2000
TL;DR: A lossless delta-compression algorithm that attempts to predict the next point from previous points using higher-order polynomial extrapolation; in contrast to traditional predictive coding, it takes into account varying (non-equidistant) domain steps.
Abstract: Summary form only given. We propose a lossless algorithm of delta compression (a variant of predictive coding) that attempts to predict the next point from previous points using higher-order polynomial extrapolation. In contrast to traditional predictive coding, our method takes into account varying (non-equidistant) domain (typically, time) steps. To save space and guarantee lossless compression, the actual and predicted values are converted to 64-bit integers. The residual (difference between actual and predicted values) is computed as a difference of integers. The unnecessary bits of the residual are truncated, e.g., 1111110101 is replaced by 10101. The length of the bit sequence (5₁₀ = (000101)₂) is prepended.
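
The predictor and residual coding translate almost directly into code. A sketch, with the 64-bit integer conversion simplified to a fixed-point scaling factor:

```python
import numpy as np

def predict_next(ts, vals, t_next, degree=2):
    """Polynomial extrapolation over non-equidistant time steps."""
    coeffs = np.polyfit(ts[-(degree + 1):], vals[-(degree + 1):], degree)
    return np.polyval(coeffs, t_next)

def encode_residual(actual, predicted, scale=1 << 20):
    # lossless: compare quantized integers, never floats
    r = int(round(actual * scale)) - int(round(predicted * scale))
    bits = max(r.bit_length() + 1, 1)     # +1 for the sign bit
    return bits, r                        # emit bit-length prefix, then residual

ts, vals = [0.0, 0.1, 0.25, 0.4], [1.0, 1.2, 1.45, 1.7]
pred = predict_next(ts, vals, t_next=0.5)
print(encode_residual(actual=1.81, predicted=pred))
```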

108 citations


PatentDOI
TL;DR: A method and apparatus for first training and then recognizing speech, using subband cepstral features to improve the recognition string accuracy rates for speech inputs.
Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.

107 citations


PatentDOI
Xiaobo Pi1, Ying Jia1
TL;DR: In this paper, an interactive voice response system is described that supports full-duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user.
Abstract: An interactive voice response system is described that supports full-duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve the spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
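
A sketch of the prompt-echo spectrum subtraction, assuming the time-delay estimation has already aligned the two frames; the spectral floor is an illustrative safeguard against negative magnitudes.

```python
import numpy as np

def subtract_prompt_echo(dirty_frame, echo_frame, floor=0.01):
    """Magnitude-domain subtraction of the prompt echo from dirtied speech."""
    D, E = np.fft.rfft(dirty_frame), np.fft.rfft(echo_frame)
    mag = np.maximum(np.abs(D) - np.abs(E), floor * np.abs(D))
    # reuse the dirtied-speech phase for resynthesis
    return np.fft.irfft(mag * np.exp(1j * np.angle(D)), n=len(dirty_frame))
```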

87 citations


PatentDOI
Huan-Yu Su1, Eyal Shlomot1, Jes Thyssen1, Adil Benyassine1, Yang Gao1 
TL;DR: There is provided a conference bridge or transcoder configured to intelligently handle multiple speech channels in the context of a packet network, wherein the various speech channels may adhere to a variety of speech encoding standards.
Abstract: There is provided a conference bridge or transcoder configured to intelligently handle multiple speech channels in the context of a packet network, wherein various speech channels may adhere to a variety of speech encoding standards. For example, the conference bridge establishes framing and alignment of multiple incoming speech channels associated with multiple participants, extracts parameters from the speech samples, mixes the parameters, and re-encodes the resulting speech samples for transmission to the participants. In one aspect, a speech processing method comprises decoding a first bitstream according to a first coding scheme to generate first speech samples and first side information; generating second speech samples and second side information using the first speech samples and the first side information, for use according to a second coding scheme; and creating a second bitstream, encoded based on the second coding scheme, using the second speech samples and the second side information.

81 citations


Proceedings ArticleDOI
17 Sep 2000
TL;DR: An algorithm to recover wideband speech from lowpass-bandlimited speech that needs only a single wideband codebook and inherently guarantees the transparency of the system in the baseband.
Abstract: In this paper we propose an algorithm to recover wideband speech from lowpass-bandlimited speech. The narrowband input signal is classified into a limited number of speech sounds for which the information about the wideband spectral envelope is taken from a pre-trained codebook. For the codebook search, a statistical approach based on a hidden Markov model is used, which takes different features of the bandlimited speech into account and minimizes a mean squared error criterion. The new algorithm needs only a single wideband codebook and inherently guarantees the transparency of the system in the baseband. The enhanced speech exhibits a significantly larger bandwidth than the input speech without introducing objectionable artifacts.
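
A sketch of the HMM-driven codebook lookup: a standard forward recursion yields state posteriors from per-frame observation likelihoods, and the minimum-mean-square-error envelope estimate is the posterior-weighted sum of codebook entries. The transition matrix, priors, and observation model are assumed pre-trained.

```python
import numpy as np

def mmse_envelope(obs_lik, trans, prior, codebook):
    """obs_lik: (T, S) per-state observation likelihoods;
    trans: (S, S) transition matrix; codebook: (S, D) wideband envelopes."""
    alpha = prior * obs_lik[0]                 # forward (filtering) recursion
    alpha /= alpha.sum()
    out = [alpha @ codebook]
    for t in range(1, len(obs_lik)):
        alpha = (alpha @ trans) * obs_lik[t]
        alpha /= alpha.sum()
        out.append(alpha @ codebook)           # posterior-weighted MMSE estimate
    return np.array(out)
```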

80 citations


Patent
05 Jun 2000
TL;DR: In this article, a method of estimating a confidence measure for a speech recognition system, involves comparing an input speech signal with a number of predetermined models of possible speech signals, and best scores indicating the degree of similarity between the input speech signals and each of the predetermined models are then used to determine a normalized variance, which is used as the Confidence Measure, in order to determine whether the input signal has been correctly recognized.
Abstract: A method of estimating a confidence measure for a speech recognition system, involves comparing an input speech signal with a number of predetermined models of possible speech signals. Best scores indicating the degree of similarity between the input speech signal and each of the predetermined models are then used to determine a normalized variance, which is used as the Confidence Measure, in order to determine whether the input speech signal has been correctly recognized, the Confidence Measure is compared to a threshold value. The threshold value is weighted according to the Signal to Noise Ratio of the input speech signal and according to the number of predetermined models used.
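
A sketch of the decision rule, assuming a normalized variance of the best scores; the abstract only states that the threshold depends on the SNR and the number of models, so the weighting function below is an assumption.

```python
import numpy as np

def accept_recognition(best_scores, snr_db, base_threshold=0.5):
    scores = np.asarray(best_scores, dtype=float)
    # normalized variance: high spread among model scores suggests a clear winner
    confidence = scores.var() / (scores.mean() ** 2 + 1e-12)
    # hypothetical weighting: stricter at low SNR and with many competing models
    threshold = base_threshold * (1.0 + 1.0 / max(snr_db, 1.0)) \
                * np.log(len(scores) + 1)
    return confidence > threshold
```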

Proceedings ArticleDOI
17 Sep 2000
TL;DR: An algorithm to generate wideband speech from narrowband speech using as little as 500 bit/s of side information; the resulting wideband speech has enhanced quality compared to narrowband speech.
Abstract: Wireless telephone speech is usually limited to the 300-3400 Hz band, which reduces its quality. There is thus a growing demand for wideband speech systems that transmit from 50 Hz to 8000 Hz. This paper presents an algorithm to generate wideband speech from narrowband speech using as little as 500 bit/s of side information. The 50-300 Hz band is predicted from the narrowband signal. A source-excitation model is used for the 3400-8000 Hz band, where the excitation is extrapolated at the receiver and the spectral envelope is transmitted. Though some artifacts are present, the resulting wideband speech has enhanced quality compared to narrowband speech.
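
The receiver-side excitation extrapolation can be sketched with spectral folding (zero-insertion upsampling), one common way to regenerate highband excitation; the paper's exact extrapolation method may differ.

```python
import numpy as np
from scipy.signal import lfilter

def extrapolate_excitation(nb_excitation):
    """Zero insertion doubles the rate and mirrors (folds) the narrowband
    excitation spectrum into the upper band."""
    up = np.zeros(2 * len(nb_excitation))
    up[::2] = nb_excitation
    return up

def shape_highband(excitation, lpc_env):
    # lpc_env: transmitted envelope as LPC error-filter coefficients [1, a1..ap]
    return lfilter([1.0], lpc_env, excitation)
```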

Proceedings Article
01 Jan 2000
TL;DR: A series of tools based on current and newly developed techniques for AIR integrated under MARSYAS, the authors' framework for audio analysis are described, based on Signal Processing, Pattern Recognition and Visualization techniques.
Abstract: Most of the work in music information retrieval (MIR) and analysis has been performed using symbolic representations such as MIDI. The recent advances in computing power and network connectivity have made large amounts of raw digital audio data available in the form of unstructured monolithic sound files. In this work the focus is on tools that work directly on real-world audio data without attempting to transcribe the music. To distinguish from symbolic-based music IR, for the remainder of the paper we use the term audio IR (AIR) to refer to techniques that work directly on raw audio signals. Obviously these signals can contain music as well as other types of audio, such as speech. We describe a series of tools based on current and newly developed techniques for AIR, integrated under MARSYAS, our framework for audio analysis. For related work refer to (Foote, 1999). The tools developed are based on signal processing, pattern recognition and visualization techniques. Finally, due to the immature state of the available techniques and to the inherent complexity of the task, it is important to take advantage of the human user in the system. We have developed two user interfaces to integrate and improve our techniques: an augmented sound editor and TimbreGrams, a novel graphical representation for sound files. The previously unpublished contributions of this paper are the genre classification method, the segmentation-based retrieval and summarization, and the definition of the TimbreGram. Feature-based audio analysis: the developed analysis tools are based on the calculation of short-time feature vectors. The signal is processed in small chunks so that its statistical characteristics are relatively stable. For each chunk some form of spectral analysis is performed, and based on that analysis a vector of feature values is calculated. In our system, features based on FFT (fast Fourier transform) analysis, MPEG filterbank analysis, LPC (linear predictive coding) and MFCC (mel-frequency cepstrum coefficients) are supported. In addition, derivatives and running statistics are used to express temporal changes. The flexible architecture of MARSYAS allows easy integration of, and experimentation with, new features. Based on the calculated features, different types of audio analysis processes can be performed.
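
A sketch of the short-time feature computation described above (FFT-based features per chunk, plus running differences for temporal change); the window sizes and feature set are illustrative choices, not MARSYAS's exact configuration.

```python
import numpy as np

def short_time_features(x, fs, win=512, hop=256):
    feats = []
    for s in range(0, len(x) - win, hop):
        mag = np.abs(np.fft.rfft(x[s:s + win] * np.hanning(win)))
        freqs = np.fft.rfftfreq(win, 1.0 / fs)
        centroid = (freqs * mag).sum() / (mag.sum() + 1e-12)
        rolloff = freqs[np.searchsorted(np.cumsum(mag), 0.85 * mag.sum())]
        feats.append([centroid, rolloff, mag.std()])
    f = np.array(feats)
    # first differences express temporal change, as the paper describes
    return np.hstack([f, np.vstack([np.zeros(3), np.diff(f, axis=0)])])
```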

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Subjective tests show that the proposed bandwidth-scalable coding scheme based on the G.729 standard as a base layer coder achieves better performance than the 16 kbit/s MPEG-4 CELP with bandwidth scalability.
Abstract: This paper proposes a bandwidth-scalable coding scheme based on the G.729 standard as a base layer coder. In the scheme, according to the channel conditions, the output speech of the decoder can be selected to be narrowband (4-kHz bandwidth) or wideband (8-kHz bandwidth). The proposed scheme consists of two layers: base and enhancement. The base coder uses the G.729 algorithm to encode narrowband speech. The enhancement coder is based on a full-band CELP model and it encodes wideband speech while making use of the available base layer information. Two bandwidth-scalable coders are designed: one is scalable with the 8 kbit/s G.729 base coder and another with the 6.4 kbit/s G.729 (Annex D) base coder. Subjective tests show that, for wideband speech, the proposed coders at 16 kbit/s achieve better performance than the 16 kbit/s MPEG-4 CELP with bandwidth scalability.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: It was found that the low LPC order used in GSM coding is responsible for most of the performance degradation; by extracting features directly from the encoded bit stream, a speaker recognition system is obtained that is equivalent in performance to the original one, which decodes and reanalyzes speech before performing recognition.
Abstract: This paper investigates the influence of GSM speech coding on text-independent speaker recognition performance. The three existing GSM speech coder standards were considered. The whole TIMIT database was passed through these coders, obtaining three transcoded databases. In a first experiment, it was found that the use of GSM coding significantly degrades the identification and verification performance, with the degradation corresponding to the perceptual speech quality of each coder. In a second experiment, the features for the speaker recognition system were calculated directly from the information available in the encoded bit stream. It was found that the low LPC order used in GSM coding is responsible for most of the performance degradation. By extracting the features directly from the encoded bit stream, we also obtained a speaker recognition system equivalent in performance to the original one, which decodes and reanalyzes speech before performing recognition.
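
The second experiment's idea, deriving recognition features from codec parameters rather than from resynthesized speech, can be sketched with the standard LPC-to-cepstrum recursion, assuming LPC coefficients decoded from the bit stream:

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """a: predictor coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k]);
    returns cepstral coefficients c_1..c_n via the all-pole recursion."""
    p, c = len(a), np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```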

Proceedings ArticleDOI
17 Sep 2000
TL;DR: A speech/music discrimination procedure for multi-mode wideband coding that is suitable for combined speech and audio coding and shows improved performance when compared to single-mode encoding is described.
Abstract: We propose in this paper a general solution for combined speech and audio coding. Particularly, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected, and kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance when compared to single-mode encoding.

PatentDOI
TL;DR: In this article, an information unit (4) enabling a speech input is stored on a server (5) and can be retrieved by a client (1, 2, 3) that can be coupled to one or more speech recognizers (7, 8, 9) through a communications network (6); the information unit is assigned additional information (12) for determining a combination of a client and at least one of the speech recognizers (7, 8, 9) for recognizing an uttered speech signal.
Abstract: In a method in which an information unit (4) enabling a speech input is stored on a server (5) and can be retrieved by a client (1, 2, 3), and in which the client can be coupled to one or more speech recognizers (7, 8, 9) through a communications network (6), the information unit (4) is assigned additional information (12) which is provided for determining a combination of a client (1, 2, 3) and at least one of the speech recognizers (7, 8, 9) for recognizing an uttered speech signal. This makes it possible to dynamically assign the speech recognizers (7, 8, 9) in the communications network (6) to the information units (4) and thus ensure an acceptable processing time for the recognition of a speech input with high recognition quality.

Proceedings ArticleDOI
Alan V. McCree1
05 Jun 2000
TL;DR: This paper describes a new 14 kb/s wideband speech coder, which uses a split-band approach, and is capable of producing high quality output speech.
Abstract: This paper describes a new 14 kb/s wideband speech coder. The coder uses a split-band approach, where the input signal, sampled at 16 kHz, is split into two equal frequency bands from 0-4 kHz and 4-8 kHz, each of which is decimated to an 8 kHz sampling rate. The lower band is coded with a high-quality narrowband speech coder, the 11.8 kb/s G.729 Annex E, while the higher band is represented by a simple but effective parametric model. Two new features facilitate efficient coding of the high-band signal: noise modulation and high-frequency reversal. Since the encoding of the lower band is independent of the high-band signal, the narrowband encoder output can be embedded in the overall bitstream. Subjective test results show that this wideband speech coder is capable of producing high quality output speech.
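
A sketch of the split-band front end, using simple Butterworth filters where a QMF pair would typically be used; note that decimating the 4-8 kHz band by two aliases it into 0-4 kHz with an inherent frequency reversal.

```python
from scipy.signal import butter, sosfilt

def split_bands(x, fs=16000):
    low = sosfilt(butter(8, 4000, btype='low', fs=fs, output='sos'), x)[::2]
    high = sosfilt(butter(8, 4000, btype='high', fs=fs, output='sos'), x)[::2]
    # low: 0-4 kHz band at an 8 kHz rate, fed to the narrowband coder.
    # high: 4-8 kHz band aliased to 0-4 kHz (frequency-reversed) at 8 kHz,
    # suitable for a simple parametric model.
    return low, high
```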

Proceedings Article
01 Jan 2000
TL;DR: In this article, a single-pass adaptive algorithm that uses context classification and multiple linear predictors, locally optimized on a pixel-by-pixel basis, is proposed to obtain a compression ratio comparable to CALIC while improving on some images.
Abstract: In the past years, there have been several improvements in lossless image compression. All the recently proposed state-of-the-art lossless image compressors can be roughly divided into two categories: single-pass and double-pass compressors. Linear prediction is rarely used in the first category, while TMW [7], a state-of-the-art double-pass image compressor, relies on linear prediction for its performance. We propose a single-pass adaptive algorithm that uses context classification and multiple linear predictors, locally optimized on a pixel-by-pixel basis. Locality is also exploited in the entropy coding of the prediction error. The results we obtained on a test set of several standard images are encouraging. On average, our ALPC obtains a compression ratio comparable to CALIC [20] while improving on some images.
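
A sketch of locally optimized linear prediction over a causal neighborhood; the context classification and entropy coding stages are omitted, and the window shape and neighbor set are assumptions.

```python
import numpy as np

def causal_neighbors(img, y, x):
    # W, N, NW, NE neighbors (already decoded in raster order)
    return np.array([img[y, x - 1], img[y - 1, x],
                     img[y - 1, x - 1], img[y - 1, x + 1]], dtype=float)

def predict_pixel(img, y, x, radius=6):
    """Least-squares predictor trained on a causal window; assumes y >= 2
    and 1 <= x <= width - 2 so every neighbor exists."""
    A, b = [], []
    for j in range(max(1, y - radius), y + 1):
        for i in range(1, img.shape[1] - 1):
            if j == y and i >= x:          # causal constraint within the row
                break
            A.append(causal_neighbors(img, j, i))
            b.append(float(img[j, i]))
    w, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return float(causal_neighbors(img, y, x) @ w)

# residual to entropy-code:
# int(img[y, x]) - int(round(predict_pixel(img, y, x)))
```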

Patent
18 Oct 2000
TL;DR: In this article, a speech coding method and device for encoding and decoding an input signal and providing synthesized speech is presented, where the higher frequency components of the synthesised speech are achieved by high-pass filtering and coloring an artificial signal to provide a processed artificial signal.
Abstract: A speech coding method and device for encoding and decoding an input signal and providing synthesized speech, wherein the higher frequency components of the synthesized speech are obtained by high-pass filtering and coloring an artificial signal to provide a processed artificial signal. The processed artificial signal is scaled by a first scaling factor during the active speech periods of the input signal and a second scaling factor during the non-active speech periods, wherein the first scaling factor is characteristic of the higher frequency band of the input signal and the second scaling factor is characteristic of the lower frequency band of the input signal. In particular, the second scaling factor is estimated based on the lower frequency components of the synthesized speech, and the coloring of the artificial signal is based on the linear predictive coding coefficients characteristic of the lower frequency band of the input signal.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A system that generates a low-band signal from a telephone-band speech signal to obtain an extended-band speech signal, increasing signal naturalness and listening comfort while maintaining compatibility with all current telephone networks.
Abstract: This paper describes a system that generates a low-band signal (100-300 Hz) from a telephone-band (300-3400 Hz) speech signal to obtain an extended-band speech signal (100-3400 Hz). The low-band increases signal naturalness and listening comfort. This system is applied at the receiving end such that compatibility with all current telephone networks is maintained. The described technique splits the telephone-band speech signal into a spectral envelope and a short-term residual. The spectral envelope and the residual are extended separately and recombined to create an extended band signal. This system is evaluated by listening tests and distortion measurement.

PatentDOI
Huang Pengjun1
TL;DR: In this article, a speech classification technique for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques is presented, where a speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements.
Abstract: A speech classification technique for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques. A speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements. Highly accurate speech classification produces a lower average encoded bit rate, and higher quality decoded speech. The speech classifier considers a maximum number of parameters for each frame of speech, producing numerous and accurate speech mode classifications for each frame. The speech classifier correctly classifies numerous modes of speech under varying environmental conditions. The speech classifier inputs classification parameters from external components, generates internal classification parameters from the input parameters, sets a Normalized Auto-correlation Coefficient Function threshold and selects a parameter analyzer according to the signal environment, and then analyzes the parameters to produce a speech mode classification.
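
One of the classifier's named parameters, the normalized autocorrelation coefficient function (NACF), is easy to sketch; the lag range and the environment-dependent threshold policy below are assumptions.

```python
import numpy as np

def nacf(frame, lag):
    """Normalized autocorrelation of a frame at a given lag, in [-1, 1]."""
    x = frame - frame.mean()
    num = np.dot(x[:-lag], x[lag:])
    den = np.sqrt(np.dot(x[:-lag], x[:-lag]) * np.dot(x[lag:], x[lag:]))
    return num / (den + 1e-12)

# voiced if the peak NACF over plausible pitch lags exceeds the threshold
# set according to the signal environment (clean vs. noisy):
# voiced = max(nacf(frame, l) for l in range(20, 160)) > threshold
```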

Proceedings ArticleDOI
05 Jun 2000
TL;DR: The proposed method proved able to significantly improve (by more than 10% in all adverse mixing situations) the performance of a continuous phoneme-based speech recognition system, and can therefore be used as a front end to separate the simultaneous speech of speakers who are moving in arbitrary directions in reverberant rooms.
Abstract: In this paper we present a new on-line blind signal separation method capable of separating the convolutive speech signals of moving speakers in highly reverberant rooms. The separation network used is a recurrent network which performs separation of convolutive speech mixtures in the time domain, without any prior knowledge of the propagation media, based on the maximum likelihood estimation (MLE) principle. The proposed method proved able to significantly improve (by more than 10% in all adverse mixing situations) the performance of a continuous phoneme-based speech recognition system, and can therefore be used as a front end to separate the simultaneous speech of speakers who are moving in arbitrary directions in reverberant rooms.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A 1.2 kbps speech coder based on the mixed excitation linear prediction (MELP) analysis algorithm that achieves approximately the same quality as the proposed federal standard 2.4 kbps MELP coder.
Abstract: This paper presents a 1.2 kbps speech coder based on the mixed excitation linear prediction (MELP) analysis algorithm. In the proposed coder, the MELP parameters of three consecutive frames are grouped into a superframe and jointly quantized to obtain a high coding efficiency. The interframe redundancy is exploited with distinct quantization schemes for different unvoiced/voiced (U/V) frame combinations in the superframe. Novel techniques for improving performance make use of the superframe structure. These include pitch vector quantization using pitch differentials, joint quantization of pitch and U/V decisions and LSF quantization with a forward-backward interpolation method. Subjective test results indicate that the 1.2 kbps speech coder achieves approximately the same quality as the proposed federal standard 2.4 kbps MELP coder.


Journal ArticleDOI
TL;DR: An algorithm which determines the optimal segmentation with respect to a cost function relating prediction error to modeling cost is presented, whereby the segmentation is implicitly computed while minimizing the modelization distortion for a given modelization cost.
Abstract: A common technique to extend linear prediction to nonstationary signals is time segmentation: the signal is split into small portions and the modelization is carried out locally. The accuracy of the analysis is, however, dependent on the window size and on the signal characteristics, so that the problem of finding a good segmentation is crucial to the entire modeling scheme. In this paper, we present an algorithm which determines the optimal segmentation with respect to a cost function relating prediction error to modeling cost. The proposed approach casts the problem in a rate/distortion (R/D) framework, whereby the segmentation is implicitly computed while minimizing the modelization distortion for a given modelization cost. The algorithm is implemented by means of dynamic programming and takes the form of a trellis-based Lagrangian minimization. The optimal linear predictor, when applied to speech coding, dramatically reduces the number of bits per second devoted to the modeling parameters in comparison to fixed-window schemes.
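
The trellis-based Lagrangian minimization reduces to a short dynamic program over candidate segment boundaries; the cost function below is a placeholder where a local LP model fit (prediction error plus parameter rate) would go.

```python
import numpy as np

def segment(x, cost, lam, min_len=32):
    """Optimal segmentation minimizing sum of D(i, j) + lam * R(i, j)."""
    n = len(x)
    best = np.full(n + 1, np.inf)
    back = np.zeros(n + 1, dtype=int)
    best[0] = 0.0
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            d, r = cost(x, i, j)               # (distortion, modeling rate)
            if best[i] + d + lam * r < best[j]:
                best[j], back[j] = best[i] + d + lam * r, i
    cuts, j = [], n                            # backtrack the optimal trellis path
    while j > 0:
        cuts.append((back[j], j))
        j = back[j]
    return cuts[::-1]

# placeholder cost: segment variance as distortion, a fixed per-segment rate
seg = segment(np.random.randn(256),
              lambda x, i, j: (np.var(x[i:j]) * (j - i), 1.0), lam=5.0)
```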


Proceedings ArticleDOI
17 Sep 2000
TL;DR: Preliminary subjective test results indicate that the CELP technique provides very high quality speech that meets or exceeds all requirements for both the ETSI/3GPP and ITU-T wideband speech standardization efforts.
Abstract: Recent standardization efforts have provided motivation for research in the area of wideband speech coding (50 Hz to 7 kHz). Both the European Telecommunications Standards Institute (ETSI) in conjunction with the Third Generation Partnership Project (3GPP), and the ITU-T are currently in the process of evaluating candidate algorithms at bit rates from around 12 kbps to 32 kbps. This paper describes a code-excited linear prediction (CELP) technique used in the Motorola candidate algorithm that is scalable to a wide range of bit rates, thereby allowing the use of a ubiquitous speech model. Preliminary subjective test results indicate that the technique provides very high quality speech that meets or exceeds all requirements for both the ETSI/3GPP and ITU-T wideband speech standardization efforts.

Patent
27 Sep 2000
TL;DR: In this paper, a speech encoder selects a quantized speech spectral parameter vector as a current anchor vector, and perturbs the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors.
Abstract: A system controller (106) includes a speech encoder (107) that dynamically segments frames of a low bit rate digital voice message. Speech model parameters have been generated in a sequence of frames. The speech model parameters include quantized speech spectral parameter vectors. The speech encoder selects (1820) a first quantized speech spectral parameter vector as a current anchor vector, selects (1820, 1830) a second quantized speech spectral parameter vector located a predetermined number of frames (LMAX) from the current anchor vector as a target speech parameter vector, and perturbs (1840) the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: The experimental results show that the proposed space diversity speech recognition system can attain about 80% accuracy, while the performance of conventional HMMs using close-talking microphones is less than 50%, indicating that the space diversity approach is promising for robust speech recognition in a real acoustic environment.
Abstract: This paper proposes a space diversity speech recognition technique using distributed multi-microphones in a room, as a new paradigm of speech recognition. The key technologies for realizing the system are (1) distant-talking speech recognition and (2) a method for integrating the multiple inputs. In this paper, we propose the use of a distant speech model for distant-talking speech recognition, and feature-based and likelihood-based integration methods for multi-microphones distributed in the room. The distant speech model is a set of HMMs learned using speech data convolved with the impulse responses measured at several points in the room. The experimental results of simulated distant-talking speech recognition show that the proposed space diversity speech recognition system can attain about 80% accuracy, while the performance of conventional HMMs using close-talking microphones is less than 50%. These results indicate that the space diversity approach is promising for robust speech recognition in a real acoustic environment.
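
The likelihood-based integration can be sketched as combining per-channel HMM log-likelihoods before choosing a model; equal channel weights are an assumption, and weighting channels by a per-channel confidence is a natural refinement.

```python
import numpy as np

def fuse_and_decode(log_likelihoods):
    """log_likelihoods: (n_mics, n_models) per-channel HMM scores."""
    # summing log-likelihoods across microphones = product of likelihoods
    combined = np.asarray(log_likelihoods).sum(axis=0)
    return int(np.argmax(combined))
```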