SciSpace - formerly Typeset
Conference

International Conference on Acoustics, Speech, and Signal Processing

About: The International Conference on Acoustics, Speech, and Signal Processing (ICASSP) is an academic conference. It publishes mainly in the areas of speech processing and adaptive filters. Over its lifetime, the conference has published 46,388 papers, which have received 794,416 citations.

Papers

46,388 results found


Open access, Proceedings Article, DOI: 10.1109/ICASSP.2013.6638947
26 May 2013
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

Topics: Recurrent neural network (64%), Deep learning (62%), Time delay neural network (60%)

5,938 Citations


Proceedings Article, DOI: 10.1109/ICASSP.2015.7178964
19 Apr 2015
Abstract: This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.

Topics: Speech corpus (64%), VoxForge (61%), Language model (51%)

2,611 Citations


Proceedings Article, DOI: 10.1109/ICASSP.2001.941023
07 May 2001
Abstract: Previous objective speech quality assessment models, such as bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay. Known as perceptual evaluation of speech quality (PESQ), it is the result of integration of the perceptual analysis measurement system (PAMS) and PSQM99, an enhanced version of PSQM. PESQ is expected to become a new ITU-T recommendation P.862, replacing P.861 which specified PSQM and MNB.

Topics: PSQM (72%), PESQ (71%), POLQA (63%)

1,914 Citations


Proceedings Article, DOI: 10.1109/ICASSP.1992.225858
J.J. Godfrey, E. Holliman, J. McDaniel
23 Mar 1992
Abstract: SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition. About 2500 conversations by 500 speakers from around the US were collected automatically over T1 lines at Texas Instruments. Designed for training and testing of a variety of speech processing algorithms, especially in speaker verification, it has over 1 h of speech from each of 50 speakers, and several minutes each from hundreds of others. A time-aligned word-for-word transcription accompanies each recording.

Topics: Speech corpus (66%), Speech processing (65%), VoxForge (62%)

1,881 Citations


Proceedings Article, DOI: 10.1109/ICASSP.1995.479394
Reinhard Kneser, Hermann Ney
09 May 1995
Abstract: In stochastic language modeling, backing-off is a widely used method to cope with the sparse data problem. In case of unseen events this method backs off to a less specific distribution. In this paper we propose to use distributions which are especially optimized for the task of backing-off. Two different theoretical derivations lead to distributions which are quite different from the probability distributions that are usually used for backing-off. Experiments show an improvement of about 10% in terms of perplexity and 5% in terms of word error rate.

Topics: Perplexity (57%), Probability distribution (55%), Kneser–Ney smoothing (53%)

1,708 Citations
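
The backing-off idea in the Kneser and Ney abstract above (fall back to a less specific distribution when an event is unseen or rare) can be illustrated with a small sketch. The code below is not the paper's derivation: it assumes the interpolated bigram formulation with a continuation-count lower-order distribution that is commonly associated with Kneser–Ney smoothing. The function name `train_bigram_kn`, the discount value 0.75, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_kn(tokens, discount=0.75):
    """Return (prob, vocab), where prob(w, v) estimates P(w | v).

    Illustrative sketch only: interpolated absolute discounting with a
    continuation-count lower-order distribution. The discount value is
    an assumption, not a value taken from the paper.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter()       # c(v): how often v occurs as a context
    followers = defaultdict(set)     # distinct words seen after v
    preceders = defaultdict(set)     # distinct words seen before w
    for (v, w), count in bigrams.items():
        context_counts[v] += count
        followers[v].add(w)
        preceders[w].add(v)
    n_bigram_types = len(bigrams)
    vocab = set(tokens)

    def prob(w, v):
        # Lower-order distribution: in how many distinct contexts does w appear?
        p_cont = len(preceders[w]) / n_bigram_types
        c_v = context_counts[v]
        if c_v == 0:
            # Context never observed: rely entirely on the lower-order model.
            return p_cont
        # Discounted bigram estimate plus the mass redistributed to p_cont.
        discounted = max(bigrams[(v, w)] - discount, 0.0) / c_v
        backoff_weight = discount * len(followers[v]) / c_v
        return discounted + backoff_weight * p_cont

    return prob, vocab


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate".split()
    prob, vocab = train_bigram_kn(corpus)
    for word in sorted(vocab):
        print(f"P({word} | the) = {prob(word, 'the'):.3f}")
```

The lower-order term counts distinct preceding contexts rather than raw word frequency, so a word that is frequent but only ever follows one particular word receives little of the redistributed probability mass. An unseen bigram such as ("the", "sat") still gets a non-zero probability through the back-off weight.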


Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    1,723
2020    1,863
2019    1,738
2018    1,390
2017    1,333
2016    1,325

Top Attributes

Conference's top 5 most impactful authors

Georgios B. Giannakis

116 papers, 2.8K citations

Shrikanth S. Narayanan

111 papers, 1.8K citations

Yonina C. Eldar

98 papers, 861 citations

Martin Vetterli

79 papers, 1.8K citations

Abdelhak M. Zoubir

75 papers, 648 citations

Network Information
Related Conferences (5)
European Signal Processing Conference

11.3K papers, 82K citations

94% related
Asilomar Conference on Signals, Systems and Computers

11.5K papers, 128.8K citations

91% related
International Conference on Digital Signal Processing

3.3K papers, 23K citations

91% related