Topic
Transcription (software)
About: Transcription (software) is a research topic. Over its lifetime, 2,678 publications on this topic have received 57,837 citations.
[Chart: papers published on a yearly basis]
Papers
Proceedings Article
21 Jun 2014
TL;DR: A speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation, is presented; it is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function.
Abstract: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information, 21.9% with only a lexicon of allowed words, and 8.2% with a trigram language model. Combining the network with a baseline system further reduces the error rate to 6.7%.
2,131 citations
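The word error rate reported in the abstract above is the standard metric in this area: the word-level edit distance between a reference and a hypothesis transcript, divided by the reference length. This is not code from the paper, just a minimal Python sketch of how WER is typically computed:

```python
def edit_distance(ref, hyp):
    # Levenshtein distance over word lists, via classic dynamic programming:
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # i deletions
    for j in range(n + 1):
        d[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[m][n]

def wer(reference, hypothesis):
    ref = reference.split()
    hyp = hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

# one deletion against six reference words
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 8.2% on Wall Street Journal, as reported above, thus means roughly one word-level edit per twelve reference words.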
23 Mar 1992
TL;DR: SWITCHBOARD is a large multispeaker corpus of conversational speech and text that should interest researchers in speaker authentication and large-vocabulary speech recognition.
Abstract: SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition. About 2500 conversations by 500 speakers from around the US were collected automatically over T1 lines at Texas Instruments. Designed for training and testing of a variety of speech processing algorithms, especially in speaker verification, it has over an hour of speech from each of 50 speakers, and several minutes each from hundreds of others. A time-aligned word-for-word transcription accompanies each recording.
2,102 citations
TL;DR: This work was supported by a grant from the European Commission BIOMED2 (BMH4-CT96-0656).
Abstract: I would like to thank Martin Bootman for preparing the figures. This work was supported by a grant from the European Commission BIOMED2 (BMH4-CT96-0656).
2,059 citations
Proceedings Article
19 Jun 2016
TL;DR: An end-to-end deep learning approach is used to recognize either English or Mandarin Chinese speech, two vastly different languages, with HPC techniques enabling experiments that previously took weeks to now run in days.
Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
1,531 citations
TL;DR: It is suggested that researchers incorporate reflection into their research design by interrogating their transcription decisions and the possible impact these decisions may have on participants and research outcomes.
Abstract: In this paper we discuss the complexities of interview transcription. While often seen as a behind-the-scenes task, we suggest that transcription is a powerful act of representation. Transcription is practiced in multiple ways, often using naturalism, in which every utterance is captured in as much detail as possible, and/or denaturalism, in which grammar is corrected, interview noise (e.g., stutters, pauses, etc.) is removed and nonstandard accents (i.e., non-majority) are standardized. In this article, we discuss the constraints and opportunities of our transcription decisions and point to an intermediate, reflective step. We suggest that researchers incorporate reflection into their research design by interrogating their transcription decisions and the possible impact these decisions may have on participants and research outcomes.
877 citations
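The denaturalized transcription style described above, in which filled pauses and interview noise are removed, can be illustrated with a toy text filter. The filler list and regular expression here are illustrative assumptions, not the authors' method:

```python
import re

# Illustrative filler inventory; a real transcription protocol would
# define its own conventions for what counts as removable noise.
FILLERS = re.compile(r"\b(?:um+|uh+|er+|you know)\b[,.]?\s*", flags=re.IGNORECASE)

def denaturalize(utterance):
    # Strip filled pauses and collapse leftover whitespace;
    # a crude stand-in for denaturalized transcription.
    cleaned = FILLERS.sub("", utterance)
    return re.sub(r"\s+", " ", cleaned).strip()

print(denaturalize("So, um, I think, uh, we should, you know, start over."))
# So, I think, we should, start over.
```

Note how even this trivial filter changes what a reader can recover from the utterance, which is precisely the representational choice the paper asks researchers to reflect on.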