Journal Article

An automatic caption alignment mechanism for off-the-shelf speech recognition technologies

Maria Federico, +1 more
01 Sep 2014, Vol. 72, Iss. 1, pp. 21-40
TLDR
The paper proposes a novel, automatic, simple and low-cost caption alignment mechanism that requires neither human transcriptions nor special dedicated software and that can help expand video content accessibility.
Abstract
As the number of online videos grows, many producers want to caption their videos to expand content accessibility, and in doing so they face two main issues: producing the textual transcript and aligning it with the video. Both activities are expensive, either because of the human labor involved or because they require dedicated software. In this paper, we focus on caption alignment and propose a novel, automatic, simple and low-cost mechanism that aligns captions without requiring human transcriptions or special dedicated software. Our mechanism uses a unique audio markup and intelligently inserts copies of it into the audio stream before passing the stream to an off-the-shelf automatic speech recognition (ASR) application; it then transforms the plain transcript produced by the ASR application into a timecoded transcript, which tells video players when to display each caption during playback. The experimental evaluation shows that our proposal is effective in producing timecoded transcripts and can therefore help expand video content accessibility.
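The transcript-transformation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the ASR application renders each copy of the audio markup as a recognizable token (the hypothetical `<mark>` below), that the positions where the copies were inserted are known on the original audio timeline, and that an SRT-style output is wanted; the abstract specifies none of these details, and the names `Cue`, `timecode_transcript`, `format_time` and `to_srt` are invented for illustration.

```python
from dataclasses import dataclass

MARKER = "<mark>"  # hypothetical token the ASR emits for each audio markup copy


@dataclass
class Cue:
    start: float  # seconds on the original (markup-free) audio timeline
    end: float
    text: str


def timecode_transcript(plain_transcript, marker_times, total_duration):
    """Split the plain ASR transcript at marker tokens and attach times.

    marker_times[i] is the (assumed known) position, in seconds, where the
    i-th markup copy was inserted. Text between two consecutive markers is
    assumed to have been spoken between their insertion times.
    """
    segments = [s.strip() for s in plain_transcript.split(MARKER)]
    # Text before the first marker starts at time 0.
    boundaries = [0.0] + list(marker_times) + [total_duration]
    if len(segments) != len(boundaries) - 1:
        raise ValueError("marker count in transcript does not match insertions")
    cues = []
    for i, text in enumerate(segments):
        if text:  # skip intervals in which nothing was recognized
            cues.append(Cue(boundaries[i], boundaries[i + 1], text))
    return cues


def format_time(seconds):
    """Format seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Render cues as SRT-style blocks: index, time range, caption text."""
    blocks = []
    for n, cue in enumerate(cues, start=1):
        time_range = f"{format_time(cue.start)} --> {format_time(cue.end)}"
        blocks.append(f"{n}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    transcript = "hello everyone <mark> today we talk about captions <mark> thank you"
    print(to_srt(timecode_transcript(transcript, [2.5, 7.0], 10.0)))
```

In this sketch the timing information comes entirely from the known insertion points, so the ASR application only needs to return plain text. How the markup is designed so that it is recognized reliably, and where the copies are placed, are the parts the paper itself addresses.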

