Conference of the International Speech Communication Association

About: The Conference of the International Speech Communication Association is an academic conference. It publishes mainly in the areas of speaker recognition and speech synthesis. Over its lifetime, the conference has published 21,820 papers, which have received 342,095 citations.


Open access | Proceedings Article
Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocký, +1 more (2 institutions)
01 Jan 2010
Abstract: A new recurrent neural network based language model (RNN LM) with applications to speech recognition is presented. Results indicate that it is possible to obtain around 50% reduction of perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model. Speech recognition experiments show around 18% reduction of word error rate on the Wall Street Journal task when comparing models trained on the same amount of data, and around 5% on the much harder NIST RT05 task, even when the backoff model is trained on much more data than the RNN LM. We provide ample empirical evidence to suggest that connectionist language models are superior to standard n-gram techniques, except for their high computational (training) complexity. Index Terms: language modeling, recurrent neural networks, speech recognition

Topics: Time delay neural network (61%), Language model (60%), Recurrent neural network (59%)

4,971 Citations

Open access | Proceedings Article
01 Jan 2002
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.

Topics: Modeling language (56%)

4,783 Citations

Proceedings Article | DOI: 10.21437/INTERSPEECH.2014-80
01 Jan 2014
Abstract: Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNNs for acoustic modeling, considering moderately-sized models trained on a single machine. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines. We show that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance. This architecture makes more effective use of model parameters than the others considered, converges quickly, and outperforms a deep feed forward neural network having an order of magnitude more parameters. Index Terms: Long Short-Term Memory, LSTM, recurrent neural network, RNN, speech recognition, acoustic modeling.

2,030 Citations

Open access | Proceedings Article
David Pearce, Hans-Günter Hirsch (1 institution)
01 Jan 2000
Abstract: This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may be used to evaluate either front-end feature extraction algorithms with a defined HMM recognition back-end or complete recognition systems. The source speech for this database is TIdigits, a connected-digits task spoken by American English talkers (downsampled to 8 kHz). A selection of 8 different real-world noises has been added to the speech over a range of signal-to-noise ratios, and special care has been taken to control the filtering of both the speech and noise. The framework was prepared as a contribution to the ETSI STQ-AURORA DSR Working Group [1]. Aurora is developing standards for Distributed Speech Recognition (DSR), where the speech analysis is done in the telecommunication terminal and the recognition at a central location in the telecom network. The framework is currently being used to evaluate alternative proposals for front-end feature extraction. The database has been made publicly available through ELRA so that other speech researchers can evaluate and compare the performance of noise-robust algorithms. Recognition results are presented for the first standard DSR feature extraction scheme, which is based on a cepstral analysis.

Topics: Speech processing (69%), Voice activity detection (66%), Speaker recognition (63%)

1,860 Citations

Proceedings Article | DOI: 10.21437/INTERSPEECH.2005-446
04 Sep 2005
Abstract: The article describes a database of emotional speech. Ten actors (5 female and 5 male) simulated the emotions, producing 10 German utterances (5 short and 5 longer sentences) which could be used in everyday communication and are interpretable in all applied emotions. The recordings were taken in an anechoic chamber with high-quality recording equipment. In addition to the sound, electro-glottograms were recorded. The speech material comprises about 800 sentences (seven emotions × ten actors × ten sentences + some second versions). The complete database was evaluated in a perception test regarding the recognisability of emotions and their naturalness. Utterances recognised better than 80% and judged as natural by more than 60% of the listeners were phonetically labelled in a narrow transcription with special markers for voice-quality, phonatory and articulatory settings and articulatory features. The database can be accessed by the public via the internet (...

1,558 Citations

[Chart: number of papers from the conference in previous years]

Top Attributes

Conference's top 5 most impactful authors

Shrikanth S. Narayanan

265 papers, 6.2K citations

John H. L. Hansen

164 papers, 2.5K citations

Haizhou Li

144 papers, 2.2K citations

Hermann Ney

127 papers, 6.2K citations

Satoshi Nakamura

123 papers, 1.1K citations

Network Information

Related Conferences (5)

International Conference on Spoken Language Processing

802 papers, 19.1K citations

95% related

IEEE Automatic Speech Recognition and Understanding Workshop

907 papers, 32.3K citations

94% related

Spoken Language Technology Workshop

822 papers, 14.4K citations

94% related