Showing papers in "Computer Speech & Language in 2009"
••
TL;DR: RavenClaw isolates the domain-specific aspects of the dialog control logic from domain-independent conversational skills, and in the process facilitates rapid development of mixed-initiative systems operating in complex, task-oriented domains.
284 citations
••
TL;DR: This work proposes a statistical approach to improving content selection in automatic text summarization, which takes into account several features, including sentence position, positive keywords, negative keywords, sentence centrality, sentence resemblance to the title, sentence inclusion of named entities, sentence inclusion of numerical data, relative sentence length and aggregated similarity.
235 citations
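The feature-combination idea above can be illustrated with a minimal Python sketch. The feature names and weights here are hypothetical stand-ins, not the paper's actual features or learned values:

```python
# Illustrative sketch of feature-weighted sentence scoring for
# extractive summarization. Feature names and weights are invented
# for illustration; they are not the paper's actual parameters.

def sentence_score(features, weights):
    """Combine per-sentence feature values into one relevance score."""
    return sum(weights[name] * value for name, value in features.items())

features = {
    "position": 1.0,           # e.g. first sentence of its paragraph
    "positive_keyword": 0.6,   # overlap with frequent document terms
    "title_resemblance": 0.4,  # word overlap with the title
    "numerical_data": 0.0,     # no numbers in this sentence
}
weights = {"position": 0.3, "positive_keyword": 0.3,
           "title_resemblance": 0.3, "numerical_data": 0.1}

print(sentence_score(features, weights))
```

Sentences are then ranked by this score and the top ones extracted into the summary.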
••
TL;DR: This paper uses support vector machines to combine features from n-gram language models, parses, and traditional reading level measures to produce a better method of assessing reading level, and explores ways that multiple human annotations can be used in comparative assessments of system performance.
205 citations
••
TL;DR: A model-domain environment robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task without discriminative training of the HMM system, using the clean-trained complex HMM backend as the baseline system for the unsupervised model adaptation.
104 citations
••
TL;DR: It is observed that agglutinative languages are somewhat more amenable to morphosyntax-based natural language watermarking and the free word order property of a language, like Turkish, is an extra bonus.
102 citations
••
TL;DR: Experimental results show that the reliability of automatic sentence-level scoring by the system is almost as high as that of the average human evaluator, and that the system gives the highest pronunciation quality scores to 90% of native speakers' utterances.
83 citations
••
TL;DR: A novel integrated dialog simulation technique for evaluating spoken dialog systems using a linear-chain conditional random field, and a two-phase data-driven domain-specific user utterance simulation method and a linguistic knowledge-based ASR channel simulation method are presented.
82 citations
••
TL;DR: It is shown that modeling the sequence of acoustic-prosodic values as n-gram features with a maximum entropy model for dialog act (DA) tagging can perform better than conventional approaches that use a coarse representation of the prosodic contour through summative statistics of the prosodic contour.
68 citations
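The n-gram feature extraction step can be sketched in a few lines of Python. The quantized symbol labels ("rise", "fall", etc.) are illustrative assumptions, not the paper's actual symbol inventory:

```python
# Sketch: turn a sequence of quantized acoustic-prosodic symbols into
# n-gram count features, as could feed a maximum-entropy DA tagger.
# The symbol vocabulary here is hypothetical.

def ngram_features(symbols, n=2):
    """Count n-grams over a symbol sequence, returned as a feature dict."""
    feats = {}
    for i in range(len(symbols) - n + 1):
        key = " ".join(symbols[i:i + n])
        feats[key] = feats.get(key, 0) + 1
    return feats

print(ngram_features(["rise", "fall", "rise", "fall"]))
```

The contrast drawn in the TL;DR is that such sequence features preserve contour shape, whereas summative statistics (mean, range, slope of F0) collapse it.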
••
TL;DR: The proposed intonation models using neural networks are compared with Classification and Regression Tree (CART) models and it is found that 88% of the F0 (pitch) values of the syllables could be predicted from the models within 15% of the actual F0.
65 citations
••
TL;DR: A comparison of the approach to previously published techniques is shown and the effectiveness of this technique in restoring diacritics in different kind of data such as the dialectal Iraqi Arabic scripts is demonstrated.
59 citations
••
TL;DR: This paper evaluates the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models of varying complexity, and the results consistently show that using the GPU reduces recognition time.
••
TL;DR: The performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation, and it is shown that the approximations involved do not lead to any performance degradation.
••
TL;DR: This article shows how sparse codes can be used to do continuous speech recognition by using an iterative subset selection algorithm with quadratic programming to find a sparse code for a spectrogram.
••
TL;DR: A fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed, which is a one-pass search technique which generates a dynamic shortlist of Gaussians for each state during the procedure of likelihood computation.
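The shortlist idea behind dynamic Gaussian selection can be sketched as follows. This is a simplified 1-D illustration under assumed component parameters, not the paper's algorithm; real systems use diagonal-covariance Gaussians over feature vectors and a cheaper pre-selection criterion:

```python
import math

def gmm_loglik_dgs(x, gaussians, shortlist_size=2):
    """Approximate a GMM log-likelihood by evaluating only a dynamic
    shortlist of Gaussians, ranked here by squared distance to the mean.
    `gaussians` is a list of (weight, mean, variance) for 1-D components."""
    # Cheap pre-selection: rank components by distance from the observation.
    ranked = sorted(gaussians, key=lambda g: (x - g[1]) ** 2)
    total = 0.0
    for w, mu, var in ranked[:shortlist_size]:
        total += w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return math.log(total)

components = [(0.5, 0.0, 1.0), (0.3, 5.0, 1.0), (0.2, 10.0, 1.0)]
print(gmm_loglik_dgs(0.1, components, shortlist_size=2))
```

Because distant components contribute almost nothing to the mixture sum, the shortlist likelihood is typically very close to the full computation at a fraction of the cost.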
••
TL;DR: An evaluation of the classification performances of the learning models for pronoun resolution in Turkish suggests that non-linear models properly tuned to avoid overfitting outperform linear ones when applied to the data used in the authors' experiments.
••
TL;DR: RavenClaw isolates the domain-specific aspects of the dialog control logic from domain-independent logic in a plan-based, task-independent dialog management framework.
••
TL;DR: Results show that data-driven methods can also outperform rule-based methods on Italian syllabification, a language of low syllabic complexity.
••
TL;DR: A discriminative feedback adaptation (DFA) framework is proposed that reinforces the discriminability between the target speaker model and the anti-model, while preserving the generalization ability of the GMM-UBM approach.
••
TL;DR: Evaluation on the ACE RDC corpora shows that the proposed semi-supervised learning method can integrate the advantages of both SVM bootstrapping and label propagation, and significantly outperforms the normal LP algorithm applied to all the available data without SVM bootstrapping.
••
TL;DR: In this paper, generalized decision trees are used to predict places in the word where substitution, deletion and insertion of phonemes may occur, and appropriate statistical contextual rules are applied to the permitted places, in order to specifically determine word variants.
••
TL;DR: This paper presents a data-driven Korean grapheme-to-phoneme conversion method including alignment, rule extraction, and rule pruning procedures that effectively handle the exceptional pronunciation of speech databases.
••
TL;DR: The proposed algorithm builds a lexicon enriched with topic information in three steps: transcription of an audio stream into phone sequences with a speaker- and task-independent phone recogniser, automatic lexical acquisition based on approximate string matching, and hierarchical topic clustering of the lexical entries based on a knowledge-poor co-occurrence approach.
••
TL;DR: This Ngram-based reordering (NbR) approach uses the powerful techniques of SMT systems to generate a weighted reordering graph, and this allows an extension to the SMT decoding search.
••
TL;DR: This work proposes to take into account frequency and temporal dependencies in order to improve the masks' estimation accuracy, and develops Bayesian models of the masks, which leads to a new architecture of a missing data mask estimator.
••
TL;DR: An algorithm is described that produces a linear combination of MLLR transformations from cluster-specific trees using weights estimated by maximizing the likelihood of a speaker's adaptation data to realize gains in unsupervised adaptation.
••
TL;DR: It is shown that utterance mood can be predicted from intonational information, and that this mood information can then be used to recognize the dialogue act.
••
TL;DR: It is concluded that accent identification is more successful for Xhosa and Zulu utterances because accents are more pronounced for English embedded in mother-tongue speech than for English spoken as part of a monolingual dialogue by non-native speakers.
••
TL;DR: The method jointly optimizes the generated clusters and the required number of clusters by estimating and minimizing the Rand index and uses a genetic algorithm to determine the cluster in which each utterance should be located.
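The Rand index optimized above measures pairwise agreement between two clusterings. A minimal Python sketch of the metric itself (not of the paper's genetic-algorithm search):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two clusterings of the same items: the fraction
    of item pairs on which the clusterings agree (both place the pair in
    the same cluster, or both place it in different clusters)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))
```

Note the index depends only on the partition structure, not the label names, so relabeled but identical partitions score 1.0.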
••
TL;DR: This work presents an approach that focuses on the sentence extraction phase of the distillation process, and selects document sentences with respect to their relevance to a query via statistical classification with support vector machines.