Conference

International Symposium on Chinese Spoken Language Processing 

About: The International Symposium on Chinese Spoken Language Processing is an academic conference. It publishes mainly in the areas of speech processing and computer science. Over its lifetime, the conference has published 1,000 papers, which have received 4,711 citations.


Papers
Book ChapterDOI
13 Dec 2006
TL;DR: The paper describes the design, collection, transcription and analysis of 200 hours of HKUST Mandarin Telephone Speech Corpus (HKUST/MTS), the largest and first of its kind for Mandarin conversational telephone speech, providing abundant and diversified samples for Mandarin speech recognition and other application-dependent tasks.
Abstract: The paper describes the design, collection, transcription and analysis of 200 hours of the HKUST Mandarin Telephone Speech Corpus (HKUST/MTS), gathered from over 2100 Mandarin speakers in mainland China under the DARPA EARS framework. The corpus includes speech data, transcriptions and speaker demographic information. The speech data comprise 1206 ten-minute natural Mandarin conversations between either strangers or friends. Each conversation focuses on a single topic. All calls are recorded over public telephone networks and are manually annotated with standard Chinese characters (GBK) as well as specific mark-ups for spontaneous speech. A file with speaker demographic information is also provided. The corpus is the largest and first of its kind for Mandarin conversational telephone speech, providing abundant and diversified samples for Mandarin speech recognition and other application-dependent tasks, such as topic detection, information retrieval, keyword spotting, speaker recognition, etc. In a 2004 evaluation test by NIST, the corpus was found to improve system performance significantly.

124 citations

Proceedings ArticleDOI
Pan Jia, Cong Liu, Zhi-Guo Wang, Yu Hu, Hui Jiang
01 Dec 2012
TL;DR: This paper investigates DNN for several large vocabulary speech recognition tasks and proposes a few ideas to reconfigure the DNN input features, such as using logarithm spectrum features or VTLN normalized features in DNN.
Abstract: Recently, it has been reported that context-dependent deep neural networks (DNNs) have achieved unprecedented gains in many challenging ASR tasks, including the well-known Switchboard task. In this paper, we first investigate DNNs for several large vocabulary speech recognition tasks. Our results confirm that DNNs can consistently achieve about 25–30% relative error reduction over the best discriminatively trained GMMs, even in ASR tasks with up to 700 hours of training data. Next, we conduct a series of experiments to study where the unprecedented gain of DNNs comes from. Our experiments show that the gain is almost entirely attributable to the DNN's input feature vectors, which are concatenated from several consecutive speech frames within a relatively long context window. Finally, we propose a few ideas to reconfigure the DNN input features, such as using logarithm spectrum features or VTLN-normalized features. Our results show that each of these methods yields over 3% relative error reduction over the traditional MFCC or PLP features in DNNs.
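The frame concatenation the abstract credits for most of the DNN's gain can be sketched as follows. This is a minimal illustration of input-feature splicing, not the authors' implementation; the function name and the edge-padding choice are assumptions.

```python
import numpy as np

def splice_frames(features, context=5):
    """Concatenate each frame with its +/- `context` neighbours,
    repeating the first/last frame at the utterance edges."""
    T, D = features.shape
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    # Each output row stacks 2*context+1 consecutive frames.
    return np.stack(
        [padded[t : t + 2 * context + 1].reshape(-1) for t in range(T)]
    )

# e.g. 100 frames of 39-dim MFCCs -> 100 spliced vectors of 39 * 11 dims
mfcc = np.random.randn(100, 39)
spliced = splice_frames(mfcc, context=5)
assert spliced.shape == (100, 39 * 11)
```

With `context=5`, each DNN input covers 11 frames, which is the kind of "relatively long context window" the abstract refers to.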

102 citations

Proceedings ArticleDOI
27 Oct 2014
TL;DR: Experimental results show that the proposed DNN improves separation performance on several objective measures in a semi-supervised setting: training data is available for the target speaker, while unseen interferers at separation time are handled by mixing multiple interfering speakers with the target speaker during training.
Abstract: In this paper, a novel deep neural network (DNN) architecture is proposed to generate the speech features of both the target speaker and interferer for speech separation. DNN is adopted here to directly model the highly nonlinear relationship between speech features of the mixed signals and the two competing speakers. With the modified output speech features for learning the parameters of the DNN, the generalization capacity to unseen interferers is improved for separating the target speech. Meanwhile, without any prior information from the interferer, the interfering speech can also be separated. Experimental results show that the proposed new DNN enhances the separation performance in terms of different objective measures under the semi-supervised mode where the training data of the target speaker is provided while the unseen interferer in the separation stage is predicted by using multiple interfering speakers mixed with the target speaker in the training stage.
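The dual-output architecture described above, mapping mixed-speech features to estimates for both competing speakers, can be illustrated with a toy forward pass. This is a hypothetical sketch with random weights, not the paper's trained network; all dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: D-dim speech features, H hidden units.
D, H = 40, 128

# One hidden layer; the single output layer is split into two
# heads: target-speaker features and interferer features.
W1 = rng.standard_normal((D, H)) * 0.1
b1 = np.zeros(H)
W2 = rng.standard_normal((H, 2 * D)) * 0.1
b2 = np.zeros(2 * D)

def separate(mixed):
    """Map mixed-speech features to (target, interferer) estimates."""
    h = np.maximum(0.0, mixed @ W1 + b1)  # ReLU hidden layer
    out = h @ W2 + b2
    return out[..., :D], out[..., D:]     # split the joint output

target_hat, interferer_hat = separate(rng.standard_normal((10, D)))
assert target_hat.shape == interferer_hat.shape == (10, D)
```

Training such a network on mixtures of the target speaker with many different interferers is what gives the semi-supervised generalization to unseen interferers that the abstract reports.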

62 citations

Book ChapterDOI
13 Dec 2006
TL;DR: The listening test results show that LSP and its dynamic counterpart, both in time and frequency, are preferred for the resultant higher synthesized speech quality.
Abstract: In this paper we present our Hidden Markov Model (HMM)-based Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese, or Putonghua ("the common spoken language"), is a tone language in which each of the 400-plus base syllables can carry up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3 corresponding HMMs, covering: (1) spectral envelope and gain; (2) voiced/unvoiced decision and fundamental frequency; and (3) segment duration. The HMMs are trained on a read-speech database of 1,000 sentences recorded by a female speaker. Specifically, the spectral information is derived from short-time LPC spectral analysis. Among all LPC parameters, the Line Spectrum Pair (LSP) has the closest relevance to the natural resonances, or "formants", of a speech sound, and it is selected to parameterize the spectral information. Furthermore, the clustering of LSPs around spectral peaks justifies augmenting them with their dynamic counterparts, both in time and in frequency, in both HMM modeling and parameter-trajectory synthesis. One hundred sentences synthesized by 4 LSP-based systems were subjectively evaluated with an AB comparison test. The listening-test results show that LSP and its dynamic counterparts, in both time and frequency, are preferred for the resulting higher synthesized speech quality.
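The "dynamic counterparts in time" mentioned above are conventionally computed with the standard regression (delta) formula used throughout HMM-based speech synthesis and recognition. A minimal sketch, assuming the common regression-window form; the function name and edge handling are illustrative.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based delta (dynamic) features over time, as commonly
    used to augment static parameters such as LSPs in HMM synthesis."""
    T, D = features.shape
    denom = 2 * sum(n * n for n in range(1, window + 1))
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    out = np.zeros_like(features)
    for n in range(1, window + 1):
        # Weighted difference of frames n steps ahead and behind.
        out += n * (padded[window + n : window + n + T]
                    - padded[window - n : window - n + T])
    return out / denom

# A linear ramp has a constant delta away from the padded edges.
ramp = np.arange(10, dtype=float).reshape(-1, 1)
d = delta(ramp, window=2)
assert np.allclose(d[2:-2], 1.0)
```

The same operator applied across the LSP index rather than across time would give a frequency-direction dynamic feature, in the spirit of the paper's time-and-frequency augmentation.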

57 citations

Proceedings ArticleDOI
15 Dec 2004
TL;DR: This paper presents an effective method to detect the language boundary (LB) in code-switching utterances, which are mainly produced in Cantonese, a commonly used Chinese dialect, with English words occasionally inserted between Cantonese words.
Abstract: In this paper, we present an effective method to detect the language boundary (LB) in code-switching utterances. The utterances are mainly produced in Cantonese, a commonly used Chinese dialect, whilst occasionally English words are inserted between Cantonese words. Bi-phone probabilities are calculated to measure the confidence that the recognized phones are in Cantonese. Two sets of context-independent mono-phone models are trained by monolingual Cantonese and monolingual English data separately. Both knowledge-based and data-driven model selection approaches are studied in order to retain the language-dependent characteristics and to merge duplicated phone sets between the two languages. The LB detection accuracy is 75.12% for utterances that contain one single code-switching word or phrase.
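The bi-phone confidence idea above can be sketched as a phone-bigram language model scored over a recognized phone sequence: a sequence scores higher under the model of the language it actually belongs to. This is a hypothetical illustration with add-alpha smoothing, not the paper's exact formulation; all names are assumptions.

```python
import math
from collections import defaultdict

def train_biphone_model(phone_sequences, alpha=1.0):
    """Train an add-alpha smoothed bi-phone (phone-bigram) model and
    return a log-probability function logprob(prev_phone, cur_phone)."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in phone_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
            vocab.update((prev, cur))
    V = len(vocab)

    def logprob(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * V))

    return logprob

def biphone_confidence(logprob, seq):
    """Average bi-phone log-probability of a recognized phone sequence;
    higher means the sequence looks more like the training language."""
    pairs = list(zip(seq, seq[1:]))
    return sum(logprob(p, c) for p, c in pairs) / len(pairs)
```

In the paper's setting, a model trained on Cantonese phone data would assign low confidence to stretches of recognized phones that are actually English, and such dips mark candidate language boundaries.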

54 citations

Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2022    101
2021    70
2018    94
2016    135
2014    137
2012    96