
Showing papers in "IEICE technical report. Speech in 2004"


Journal Article
TL;DR: In this paper, a robust speech recognition method based on position-dependent Cepstral Mean Normalization (CMN) is proposed, which also compensates for the mismatch between utterances spoken by a human and those emitted from a loudspeaker.
Abstract: In a distant-talking environment, channel distortion may dramatically degrade speech recognition performance. In this paper, we propose a robust speech recognition method based on position-dependent Cepstral Mean Normalization (CMN). First, the system measures in advance the transmission characteristics for speaker positions at a set of grid points in the room. In the recognition stage, the system estimates the speaker position in 3-D space based on the time delay of arrival (TDOA) between distinct microphone pairs. The system then selects the pre-measured transmission characteristics corresponding to the estimated position, applies a channel distortion compensation method to the speech, and recognizes it. The proposed method also compensates for the mismatch between the cepstral means of utterances spoken by a human and those emitted from a loudspeaker. Our experiments showed that the proposed method efficiently improved the performance of a speech recognition system in a distant-talking environment and also compensated well for the mismatch between human and loudspeaker voices.

10 citations
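The position-dependent CMN step described in the abstract above can be illustrated with a minimal sketch. It assumes the per-position transmission characteristics are stored as one cepstral mean vector per grid point and that the nearest grid point to the TDOA-estimated position is selected; the names `grid_means`, `nearest_grid_point`, and `position_dependent_cmn` and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def nearest_grid_point(position, grid_means):
    """Return the stored grid point closest (Euclidean distance) to the estimated position."""
    return min(grid_means, key=lambda point: math.dist(point, position))

def position_dependent_cmn(frames, position, grid_means):
    """Subtract the cepstral mean of the nearest grid point from every frame."""
    mean = grid_means[nearest_grid_point(position, grid_means)]
    return [[c - m for c, m in zip(frame, mean)] for frame in frames]

# Hypothetical cepstral means measured in advance at two grid points (x, y, z in metres).
grid_means = {
    (0.0, 0.0, 1.0): [1.0, -0.5],
    (2.0, 0.0, 1.0): [0.5, 0.25],
}

# Two toy cepstral frames and a speaker position assumed to come from TDOA estimation.
frames = [[1.5, 0.0], [0.5, -1.0]]
normalized = position_dependent_cmn(frames, (1.8, 0.1, 1.0), grid_means)
```

Selecting a stored mean avoids having to estimate the cepstral mean from the utterance itself, which is one plausible reading of why the measurements are taken a priori.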



Journal Article
TL;DR: A novel approach to alleviate the PAPR problem of OFDM is proposed and a generalized OFDM (GOFDM) with frequency-domain equalization is presented and its performance in a frequency-selective fading channel is evaluated and compared with those of conventional OFDM and single carrier (SC) systems.
Abstract: A known problem of orthogonal frequency division multiplexing (OFDM) is its high peak-to-average power ratio (PAPR). Recently, single carrier (SC) transmission with frequency-domain equalization has been attracting much attention. In this paper, a novel approach to alleviate the PAPR problem of OFDM is proposed. A generalized OFDM (GOFDM) with frequency-domain equalization is presented, and its performance in a frequency-selective fading channel is evaluated and compared with those of conventional OFDM and SC systems.

8 citations
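To make the PAPR problem mentioned in the abstract above concrete, the following sketch computes the peak-to-average power ratio of a time-domain OFDM symbol obtained with a naive inverse DFT. It only illustrates the standard PAPR definition (peak instantaneous power divided by mean power); it does not implement GOFDM, and the helper names `idft` and `papr` are illustrative assumptions.

```python
import cmath

def idft(symbols):
    """Naive inverse DFT: map frequency-domain subcarrier symbols to time-domain samples."""
    n = len(symbols)
    return [
        sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(symbols)) / n
        for t in range(n)
    ]

def papr(samples):
    """Peak-to-average power ratio (linear scale) of a block of complex samples."""
    powers = [abs(x) ** 2 for x in samples]
    return max(powers) / (sum(powers) / len(powers))

n_subcarriers = 8

# Worst case: identical symbols on every subcarrier add coherently into one time-domain peak.
worst_case = papr(idft([1 + 0j] * n_subcarriers))

# A single active subcarrier gives a constant-envelope signal.
single_tone = papr(idft([0j, 1 + 0j] + [0j] * (n_subcarriers - 2)))
```

With these inputs the worst case approaches the number of subcarriers while the single tone stays near 1, which shows why multicarrier signals can stress power amplifiers more than constant-envelope ones.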


Journal Article
Zhipeng Zhang1, Sadaoki Furui1
TL;DR: In this article, the authors propose a new robust noisy speech recognition method based on robust end-point detection and online model adaptation using tree-structured noisy speech HMMs, which consists of: 1) blind speech segmentation; 2) best-matching GMM selection; 3) recognition of the speech with the HMM that corresponds to the GMM; 4) end-point detection based on the recognition results; 5) HMM adaptation based on the recognition results; and 6) re-recognition using the adapted HMM.
Abstract: How to detect speech periods in noisy speech and how to cope with the temporal variation of noise characteristics are challenging problems. This paper proposes a new robust noisy speech recognition method based on robust end-point detection and online model adaptation using tree-structured noisy speech HMMs. The basic algorithm consists of: 1) blind speech segmentation; 2) best-matching GMM selection; 3) recognition of the speech with the HMM that corresponds to the GMM; 4) end-point detection based on the recognition results; 5) HMM adaptation based on the recognition results; and 6) re-recognition using the adapted HMM. Steps 1) through 6) are repeated, shifting the blind segmentation window, until the end of the sequence of utterances is detected. The proposed method is evaluated on noisy speech collected by a Japanese dialogue system. Experimental results show that the proposed method is effective in recognizing noisy speech under various noise conditions.

6 citations




Journal Article
TL;DR: In this article, the performance of multicarrier code division multiplexing (MC-CDM) and cyclically prefixed direct-sequence CDM in multipath fading channels was theoretically analyzed.
Abstract: This paper theoretically analyzes the performance of multicarrier code division multiplexing (MC-CDM) and cyclically prefixed direct-sequence code division multiplexing (CP-DS-CDM) in multipath fading channels. The paper shows the relationship among the SNIR, the diversity order obtained, and the BER lower bound for MC-CDM and CP-DS-CDM, and presents computer simulation results comparing their performance.

4 citations




Journal Article
TL;DR: This paper proposes a new talker localization method based on subband CSP analysis with weighting by an average speech spectrum. It consists of subband analysis with equal bandwidth on the mel-frequency scale and analysis weight coefficients based on an average speech spectrum, which are trained in advance on a speech database.
Abstract: Summary form only given, as follows. It is very important to capture distant-talking speech with high quality for hands-free speech acquisition systems. Microphone array steering is an ideal candidate for capturing distant-talking speech with high quality. However, it requires localizing the target talker before capturing the speech. Conventional talker localization methods cannot localize a target talker accurately in highly noisy environments. To deal with this problem, we propose a new talker localization method based on subband CSP analysis with weighting by an average speech spectrum. It consists of subband analysis with equal bandwidth on the mel-frequency scale and analysis weight coefficients based on an average speech spectrum, which are trained in advance on a speech database. In evaluation experiments in a real room, we confirmed that the proposed method provides better talker localization performance than the conventional methods.

3 citations
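As background for the CSP analysis named in the abstract above, here is a minimal sketch of conventional full-band CSP (cross-power spectrum phase) delay estimation between two microphone signals. It is the baseline technique only: the subband splitting on the mel-frequency scale and the average-speech-spectrum weighting proposed in the paper are not implemented. The function names and the toy signal are assumptions made for illustration.

```python
import cmath

def dft(signal, inverse=False):
    """Naive DFT (or inverse DFT) of a sequence of real or complex samples."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def csp_coefficients(x1, x2, eps=1e-12):
    """Phase-only cross-correlation: whiten the cross-power spectrum, then invert it."""
    whitened = []
    for a, b in zip(dft(x1), dft(x2)):
        cross = a.conjugate() * b
        magnitude = abs(cross)
        # Keep only the phase; bins with negligible energy contribute nothing.
        whitened.append(cross / magnitude if magnitude > eps else 0j)
    return [c.real for c in dft(whitened, inverse=True)]

def estimate_delay(x1, x2):
    """Circular lag (in samples) by which x2 trails x1, taken at the CSP peak."""
    csp = csp_coefficients(x1, x2)
    return max(range(len(csp)), key=csp.__getitem__)

# Toy signal and a copy delayed circularly by 3 samples (second microphone).
x1 = [1.0, 0.5, -0.3, 0.8, -1.2, 0.1, 0.0, 0.4, -0.6, 0.9, 0.2, -0.7, 0.3, -0.1, 0.6, -0.4]
x2 = x1[-3:] + x1[:-3]
delay = estimate_delay(x1, x2)
```

The returned lag is circular, so values above half the block length would correspond to negative delays; a talker direction could then be derived from the lag and the microphone spacing.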


Journal Article
TL;DR: In this article, a trigger-based language model is proposed to model dependencies between words longer than those modeled by the n-gram language model, where task-dependent trigger pairs are first extracted from the corpus that matches the task, and then the occurrence probabilities of the pairs are estimated from both the task corpus and a large text corpus to avoid the data sparseness problem.
Abstract: In this paper, we study the trigger-based language model, which can model dependencies between words longer than those modeled by the n-gram language model. In language modeling, a training corpus that matches the target task is typically small and therefore insufficient for providing reliable probability estimates. On the other hand, large corpora are often too general to capture task dependency. The proposed approach tries to overcome this generality-sparseness trade-off by constructing a trigger-based language model in which task-dependent trigger pairs are first extracted from the corpus that matches the task, and then the occurrence probabilities of the pairs are estimated from both the task corpus and a large text corpus to avoid the data sparseness problem. We report evaluation results on the Corpus of Spontaneous Japanese (CSJ).







Journal Article
TL;DR: It is theoretically shown that the optimal solution to maximize the DS-CDMA forward link capacity under the condition of constant total transmit power is to transmit only from the best base station that has the maximum channel gain.
Abstract: In this paper, it is theoretically shown that the optimal solution to maximize the DS-CDMA forward link capacity under the condition of constant total transmit power is to transmit only from the best base station (BS), that is, the one with the maximum channel gain (this is called site selection diversity transmit (SSDT)). This theoretical analysis is confirmed by Monte Carlo computer simulation of an instantaneous SSDT, which is based on the instantaneous channel gain. In addition, we consider average-power-based SSDT to study the difference between these two types of SSDT and their effects on forward link capacity.






Journal Article
TL;DR: In this article, generalized posterior probability (GPP) is used for verification of large vocabulary continuous speech recognition (LVCSR) output at subword, word and utterance levels.
Abstract: Generalized posterior probability (GPP), a statistical confidence measure, is used for verification of large vocabulary continuous speech recognition (LVCSR) output at subword, word and utterance levels. GPP is obtained by combining exponentially and optimally weighted products of acoustic and language model scores for reappeared units in the reduced search space (e.g., word graph). Experimental results have demonstrated the effectiveness of GPP for verifying LVCSR output at all three levels.
Keywords: confidence measure, posterior probability, large vocabulary continuous speech recognition
Footnote 1: The author is now with Microsoft Research Asia.

Journal Article
TL;DR: A novel spatial filtering technique for orthogonal frequency division multiplexing (OFDM) signals, called "Virtual Subcarrier Assignment (VISA)," can easily achieve space division multiple access (SDMA) by assigning a different spectral position of the virtual subcarrier to each user.