
Showing papers on "Closed captioning" published in 2010


Journal Article
TL;DR: Investigation of the effects of captioning during video-based listening activities revealed that learners used captions to increase their attention, improve processing, reinforce previous knowledge, and analyze language, and reported using captions as a crutch.
Abstract: This study investigated the effects of captioning during video-based listening activities. Second- and fourth-year learners of Arabic, Chinese, Spanish, and Russian watched three short videos with and without captioning in randomized order. Spanish learners had two additional groups: one watched the videos twice with no captioning, and another watched them twice with captioning. After the second showing of the video, learners took comprehension and vocabulary tests based on the video. Twenty-six learners participated in interviews following the actual experiment. They were asked about their general reactions to the videos (captioned and noncaptioned). Results from t-tests and two-way ANOVAs indicated that captioning was more effective than no captioning. Captioning during the first showing of the videos was more effective for performance on aural vocabulary tests. For Spanish and Russian, captioning first was generally more effective than captioning second; while for Arabic and Chinese, there was a trend toward captioning second being more effective. The interview data revealed that learners used captions to increase their attention, improve processing, reinforce previous knowledge, and analyze language. Learners also reported using captions as a crutch.

298 citations
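The study's quantitative comparison rests on t-tests and two-way ANOVAs across captioning condition and language. As a minimal illustration of that kind of analysis (with made-up scores and placeholder column names, not the study's data), a two-way ANOVA can be run in Python with pandas and statsmodels:

```python
# Hypothetical sketch of a two-way ANOVA over caption order and language,
# in the spirit of the analysis described above (toy data, not the study's).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "score": [72, 65, 80, 58, 75, 69, 83, 61,
              70, 68, 77, 62, 74, 66, 79, 63],        # placeholder scores
    "order": ["captions_first", "captions_second"] * 8,
    "language": (["Spanish"] * 4 + ["Russian"] * 4 +
                 ["Arabic"] * 4 + ["Chinese"] * 4),
})

# Main effects of caption order and language, plus their interaction.
model = ols("score ~ C(order) * C(language)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```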


Proceedings ArticleDOI
25 Oct 2010
TL;DR: A video accessibility enhancement scheme with a Dynamic Captioning approach, which explores a rich set of technologies including face detection and recognition, visual saliency analysis, text-speech alignment, etc, to help hearing impaired audience better recognize the speaking characters.
Abstract: There are more than 66 million people suffering from hearing impairment and this disability brings them difficulty in video content understanding due to the loss of audio information. If scripts are available, captioning technology can help them to a certain degree by synchronously illustrating the scripts during the playing of videos. However, we show that the existing captioning techniques are far from satisfactory in assisting the hearing impaired audience to enjoy videos. In this paper, we introduce a video accessibility enhancement scheme with a Dynamic Captioning approach, which explores a rich set of technologies including face detection and recognition, visual saliency analysis, text-speech alignment, etc. Different from the existing methods that are categorized as static captioning here, dynamic captioning puts scripts at suitable positions to help the hearing impaired audience better recognize the speaking characters. In addition, it progressively highlights the scripts word-by-word via aligning them with the speech signal and illustrates the variation of voice volume. In this way, the special audience can better track the scripts and perceive the moods that are conveyed by the variation of volume. We implement the technology on 20 video clips and conduct an in-depth study with 60 real hearing impaired users, and the results have demonstrated the effectiveness and usefulness of the video accessibility enhancement scheme.

98 citations
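The dynamic captioning approach combines face detection with text-speech alignment so that captions sit near the speaking character and the active word is highlighted. The sketch below illustrates that idea only in outline; the data structures, anchor offsets, and example timings are assumptions for illustration, not the paper's implementation:

```python
# Illustrative sketch (not the paper's system): place a caption near the
# detected speaker's face and mark the word being spoken at playback time t.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class AlignedWord:
    text: str
    start: float   # seconds
    end: float

@dataclass
class FaceBox:
    speaker_id: str
    x: int
    y: int
    w: int
    h: int

def caption_anchor(face: FaceBox, frame_h: int) -> Tuple[int, int]:
    """Anchor the caption just below the speaker's face, clamped to the frame."""
    return face.x, min(face.y + face.h + 10, frame_h - 40)

def render_state(words: List[AlignedWord], t: float,
                 face: Optional[FaceBox], frame_h: int) -> dict:
    """Return what should be drawn at playback time t."""
    highlighted = next((i for i, w in enumerate(words)
                        if w.start <= t < w.end), None)
    pos = caption_anchor(face, frame_h) if face else (0, frame_h - 40)
    return {"position": pos,
            "words": [w.text for w in words],
            "highlight_index": highlighted}

# Example: one aligned script line and one detected face (toy values).
line = [AlignedWord("Where", 1.0, 1.2), AlignedWord("are", 1.2, 1.35),
        AlignedWord("you", 1.35, 1.5), AlignedWord("going?", 1.5, 1.9)]
print(render_state(line, 1.4, FaceBox("A", 320, 120, 80, 80), frame_h=480))
```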


Patent
15 Jan 2010
TL;DR: In this paper, the authors present a system for processing 3D or pseudo-3D programming with closed caption (CC) information that includes caption data and a location identifier that specifies a location for the caption data within the 3D programming.
Abstract: Systems and methods are presented for processing three-dimensional (3D or 3-D) or pseudo-3D programming. The programming includes closed caption (CC) information that includes caption data and a location identifier that specifies a location for the caption data within the 3D programming. The programming information is processed to render the caption data at the specified location and to present the programming on the display. By encoding location identification information into the three-dimensional programming, a high level of configurability can be provided and the 3D experience can be preserved while captions are displayed.

57 citations
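A rough sketch of the kind of caption record the patent describes, carrying a location identifier alongside the caption data, is shown below; the field names, normalized coordinates, and disparity-based depth rendering are illustrative assumptions rather than the patent's actual encoding:

```python
# Illustrative sketch of caption data carrying a 3D location identifier
# (x/y position plus a disparity value); field names are assumptions.
from dataclasses import dataclass

@dataclass
class Caption3D:
    text: str
    start: float       # seconds into the programme
    end: float
    x: float           # normalized horizontal position (0..1)
    y: float           # normalized vertical position (0..1)
    disparity: float   # left/right eye offset used to set apparent depth

def render_plan(caption: Caption3D, width: int, height: int) -> dict:
    """Compute per-eye pixel positions so the caption appears at the intended depth."""
    px, py = int(caption.x * width), int(caption.y * height)
    offset = int(caption.disparity * width / 2)
    return {"left_eye": (px - offset, py),
            "right_eye": (px + offset, py),
            "text": caption.text}

print(render_plan(Caption3D("Hello.", 12.0, 14.5, 0.5, 0.85, 0.01), 1920, 1080))
```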


Patent
22 Sep 2010
TL;DR: In this article, a synchronization process parses the associated media file, correlates captioning data and corresponding metatags with segments of the media, and provides a capability for textual search and selection of particular segments.
Abstract: A synchronization process between captioning data and/or corresponding metatags and the associated media file parses the media file, correlates the caption information and/or metatags with segments of the media file, and provides a capability for textual search and selection of particular segments. A time-synchronized version of the captions is created that is synchronized to the moment that the speech is uttered in the recorded media. The caption data is leveraged to enable search engines to index not merely the title of a video, but the entirety of what was said during the video as well as any associated metatags relating to contents of the video. Further, because the entire media file is indexed, a search can request a particular scene or occurrence within the event recorded by the media file, and the exact moment within the media relevant to the search can be accessed and played for the requester.

55 citations
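The core of such a system is an index that ties caption text to playback offsets so a textual search can jump to the exact moment. A minimal sketch of that idea (assumed data model and toy cues, not the patented process) follows:

```python
# Minimal sketch of a time-synchronized caption index that lets a text
# search return exact playback offsets. Cue data is illustrative.
from collections import defaultdict

# Each cue: (start_seconds, end_seconds, caption_text)
cues = [
    (12.0, 15.5, "welcome to the quarterly review"),
    (64.2, 67.0, "our revenue grew in the third quarter"),
    (131.8, 134.9, "questions from the audience"),
]

index = defaultdict(list)   # word -> list of start offsets
for start, _end, text in cues:
    for word in text.lower().split():
        index[word].append(start)

def search(query: str):
    """Return playback offsets whose captions contain every query word."""
    hits = [set(index.get(w, [])) for w in query.lower().split()]
    return sorted(set.intersection(*hits)) if hits else []

print(search("third quarter"))   # -> [64.2]
```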


Journal ArticleDOI
TL;DR: This paper will demonstrate how the processes and practices associated with CC and audio description, in their current form, violate some of the main principles of universal design and are thus not such good examples of it and introduce an alternative process and set of practices.
Abstract: To abide by the tenets of universal design theory, the design of a product or service needs not only to consider the inclusion of as many potential users and uses as possible but also to do so from conception. Control over the creation and adaptation of the design should, therefore, fall under the purview of the original designer. Closed captioning (CC) has always been touted as an excellent example of a design or electronic curb cut because it is a system designed for people who are deaf or hard of hearing, yet is used by many others for access to television in noisy environments such as gyms or pubs, or to learn a second language. Audio description is poised to have a similar image. In this paper, we will demonstrate how the processes and practices associated with CC and audio description, in their current form, violate some of the main principles of universal design and are thus not such good examples of it. In addition, we will introduce an alternative process and set of practices through which direct...

45 citations


Patent
13 Sep 2010
TL;DR: In this paper, an N-gram analysis is used to compare each word of the closed-captioned text associated with a multimedia file with words generated by an automated speech recognition (ASR) analysis of the file's audio, creating an accurate, time-based metadata file in which each closed-captioned word is associated with the point on the timeline at which the word is actually spoken in the audio and occurs within the video.
Abstract: Method, systems, and computer program products for synchronizing text with audio in a multimedia file, wherein the multimedia file is defined by a timeline having a start point and end point and respective points in time therebetween, wherein an N-gram analysis is used to compare each word of a closed-captioned text associated with the multimedia file with words generated by an automated speech recognition (ASR) analysis of the audio of the multimedia file to create an accurate, time-based metadata file in which each closed-captioned word is associated with a respective point on the timeline corresponding to the same point in time on the timeline in which the word is actually spoken in the audio and occurs within the video.

44 citations
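The essential step is transferring ASR word timings onto the closed-caption words. The sketch below approximates this with difflib sequence matching as a stand-in for the patent's N-gram analysis; the word lists and timings are made up for illustration:

```python
# Hedged sketch: transfer ASR word timings onto closed-caption words by
# aligning the two word sequences. difflib stands in for the patent's
# N-gram analysis; the data below is illustrative.
from difflib import SequenceMatcher

caption_words = "we are pleased to announce record results".split()
asr_words = [("we", 0.4), ("are", 0.6), ("please", 0.8), ("to", 1.1),
             ("announce", 1.3), ("record", 1.9), ("results", 2.2)]

matcher = SequenceMatcher(a=caption_words, b=[w for w, _ in asr_words])
timed_captions = {}
for block in matcher.get_matching_blocks():
    for k in range(block.size):
        word = caption_words[block.a + k]
        timed_captions[block.a + k] = (word, asr_words[block.b + k][1])

# Each matched caption word now carries the moment it is spoken in the audio.
for _i, (word, t) in sorted(timed_captions.items()):
    print(f"{t:5.1f}s  {word}")
```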


Proceedings ArticleDOI
21 Sep 2010
TL;DR: A set of temporal transformations for multimedia documents that allow end-users to create and share personalized timed-text comments on third party videos, and a predictive timing model for synchronizing unstructured comments with specific events within a video(s).
Abstract: This paper introduces a multimedia document model that can structure community comments about media. In particular, we describe a set of temporal transformations for multimedia documents that allow end-users to create and share personalized timed-text comments on third party videos. The benefit over current approaches lies in the use of a rich captioning format that is not embedded into a specific video encoding format. Using a Web-based video annotation tool as an example, this paper describes the possibility of merging video clips from different video providers into a logical unit to be captioned, and tailoring the annotations to specific friends or family members. In addition, the described transformations allow for selective viewing and navigation through temporal links, based on end-users' comments. We also report on a predictive timing model for synchronizing unstructured comments with specific events within a video(s). The contributions described in this paper bring significant implications to be considered in the analysis of rich media social networking sites and the design of next generation video annotation tools.

33 citations
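As a toy illustration of predictive timing for unstructured comments (an assumed heuristic, not the paper's model), one can anchor a comment at the event it refers to and estimate its display duration from a nominal reading speed:

```python
# Toy sketch of a predictive timing heuristic (an assumption, not the paper's
# model): anchor a free-text comment at an event time and estimate how long
# it should stay on screen from a nominal reading speed.
READING_SPEED_WPS = 3.0   # words per second a viewer is assumed to read

def timed_comment(text: str, event_time: float, min_duration: float = 2.0) -> dict:
    words = len(text.split())
    duration = max(min_duration, words / READING_SPEED_WPS)
    return {"text": text, "start": event_time, "end": event_time + duration}

print(timed_comment("Watch the goalkeeper's positioning here", event_time=42.5))
```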


Patent
01 Dec 2010
TL;DR: In this article, a method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals, where a video and audio portion of a video signal are acquired during a time period that a closed caption occurs, and a degree of matching between the strings is evaluated based on a threshold to determine when a caption error occurs.
Abstract: A method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals. A video and audio portion of a video signal are acquired during a time period that a closed caption occurs. A first text string is extracted from a text portion of a video image, while a second text string is extracted from speech content in the audio portion. A degree of matching between the strings is evaluated based on a threshold to determine when a caption error occurs. Various operations may be performed when the caption error occurs, including logging caption error data and sending notifications of the caption error.

28 citations
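The comparison step can be pictured as a string-similarity check against a threshold. The sketch below uses difflib's ratio as a stand-in for the patent's matching measure; the threshold value and example strings are assumptions:

```python
# Illustrative check (not the patent's algorithm): compare the caption text
# read from the video image with the text recognized from the audio, and flag
# a caption error when the match falls below a threshold.
from difflib import SequenceMatcher
import logging

MATCH_THRESHOLD = 0.8   # assumed value for illustration

def check_caption(ocr_text: str, asr_text: str) -> bool:
    """Return True if the caption matches the speech closely enough."""
    score = SequenceMatcher(a=ocr_text.lower(), b=asr_text.lower()).ratio()
    if score < MATCH_THRESHOLD:
        # Log caption error data; a real system might also send notifications.
        logging.warning("caption error: match=%.2f ocr=%r asr=%r",
                        score, ocr_text, asr_text)
        return False
    return True

print(check_caption("Storm warnings remain in effect",
                    "storm warnings remain in effect tonight"))
```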


Patent
27 Sep 2010
TL;DR: In this article, the authors present a method for converting speech to text based on analyzing multimedia content to determine the presence of closed captioning data and then indexing the closed captioned data as associated with the multimedia content.
Abstract: Methods and systems for converting speech to text are disclosed. One method includes analyzing multimedia content to determine the presence of closed captioning data. The method includes, upon detecting closed captioning data, indexing the closed captioning data as associated with the multimedia content. The method also includes, upon failure to detect closed captioning data in the multimedia content, extracting audio data from multimedia content, the audio data including speech data, performing a plurality of speech to text conversions on the speech data to create a plurality of transcripts of the speech data, selecting text from one or more of the plurality of transcripts to form an amalgamated transcript, and indexing the amalgamated transcript as associated with the multimedia content.

24 citations
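The fallback path, running several ASR engines and amalgamating their outputs, could be sketched as below; the word-level majority vote is a simple stand-in for the patent's unspecified selection criteria, and the engine callables are hypothetical:

```python
# Sketch of the fallback path described above, with a naive word-level vote
# as a stand-in amalgamation step (the patent's selection criteria are not
# reproduced here). The ASR engines are hypothetical callables.
from collections import Counter
from itertools import zip_longest

def amalgamate(transcripts):
    """Pick the most common word at each position across candidate transcripts."""
    merged = []
    for column in zip_longest(*(t.split() for t in transcripts), fillvalue=""):
        word, _count = Counter(w for w in column if w).most_common(1)[0]
        merged.append(word)
    return " ".join(merged)

def index_media(media, detect_captions, asr_engines, index):
    captions = detect_captions(media)
    if captions:                       # captions present: index them directly
        index[media] = captions
    else:                              # otherwise run every ASR engine and merge
        transcripts = [engine(media) for engine in asr_engines]
        index[media] = amalgamate(transcripts)

print(amalgamate(["the court will recess until noon",
                  "the court will regress until noon",
                  "the court will recess until noon today"]))
```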


Patent
20 May 2010
TL;DR: In this paper, a system for identification of video content in a video signal is provided via the use of DVS or SAP information or other data in the video signal or transport stream such as MPEG-x.
Abstract: A system for identification of video content in a video signal is provided via the use of DVS or SAP information or other data in a video signal or transport stream such as MPEG-x. Sampling of the received video signal or transport stream allows capture of dialog from a movie or video program. The captured dialog is compared to a reference library or database for identification purposes. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.

23 citations
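The identification step amounts to matching captured dialog against a reference library. A hedged sketch using word-trigram overlap (an illustrative scoring choice, with a placeholder library) is shown below:

```python
# Hedged sketch of the identification step: score captured dialog against a
# reference library of known programme dialog by word-trigram overlap.
# The library contents are placeholders.
def trigrams(text):
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

REFERENCE_LIBRARY = {
    "Movie A": "frankly my dear I don't give a damn",
    "Movie B": "may the force be with you always",
}

def identify(captured_dialog):
    probe = trigrams(captured_dialog)
    scores = {title: len(probe & trigrams(ref))
              for title, ref in REFERENCE_LIBRARY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(identify("the force be with you"))   # -> "Movie B"
```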


Book
04 Apr 2010
TL;DR: A practical guide covering writing news stories, publishing platforms for advanced multimedia storytelling, and getting a job as a multimedia journalist.
Abstract: Multimedia Journalism: A Practical Guide offers clear advice on working across multiple media platforms and includes guides to creating and using video, audio, text and pictures. It contains all the essentials of good practice and is supported by a Companion Website at www.multimedia-journalism.co.uk, which demonstrates how to apply the skills covered in the book, gives examples of good and bad practice, and keeps the material up-to-date and in line with new hardware, software, methods of working and legislation. The book is fully cross-referenced and interlinked with the website, which offers the chance to test your learning and send in questions for industry experts to answer in their masterclasses. Split into three levels – getting started, building proficiency and professional standards – this book builds on the knowledge attained in each part, and ensures that skills are introduced one step at a time until professional competency is achieved. This three-stage structure means it can be used from initial to advanced level to learn the key skill areas of video, audio, text, and pictures and how to combine them to create multimedia packages. Skills covered include: writing news reports, features, email bulletins and blogs; building a website using a content management system; measuring the success of your website or blog; shooting, cropping, editing and captioning pictures; recording, editing and publishing audio reports and podcasts; shooting, editing and streaming video and creating effective packages; creating breaking news tickers and using Twitter; using and encouraging user generated content; interviewing and conducting advanced online research; subediting, proofreading and headlining, including search engine optimisation; geo-tagging, geo-coding and geo-broadcasting. Website access is free when the book or ebook is purchased. The registration key is on the final page of all editions of the book and ebook and is also on the inside front cover of the paperback edition.

Thomas Steiner
09 Nov 2010
TL;DR: The final result is a deep-linkable RDF description of the video, and a "scroll-along" view of the video as an example of video visualization formats.
Abstract: SemWebVid is an online Ajax application that allows for the automatic generation of Resource Description Framework (RDF) video descriptions. These descriptions are based on two pillars: first, on a combination of user-generated metadata such as title, summary, and tags; and second, on closed captions which can be user-generated, or be auto-generated via speech recognition. The plaintext contents of both pillars are analyzed using multiple Natural Language Processing (NLP) Web services in parallel whose results are then merged and where possible matched back to concepts in the sense of Linking Open Data (LOD). The final result is a deep-linkable RDF description of the video, and a "scroll-along" view of the video as an example of video visualization formats.
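A minimal sketch of producing a deep-linkable RDF description with rdflib is shown below; the Dublin Core terms and the W3C Media Fragment URI (#t=start,end) are assumed vocabulary choices for illustration, not necessarily SemWebVid's actual schema:

```python
# Minimal sketch with rdflib: describe a video in RDF and deep-link an
# annotated moment with a W3C Media Fragment URI (#t=start,end). The terms
# and example URIs are illustrative choices, not SemWebVid's schema.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, RDF

video = URIRef("http://example.org/videos/42")
fragment = URIRef("http://example.org/videos/42#t=75,82")   # seconds 75-82

g = Graph()
g.add((video, RDF.type, URIRef("http://purl.org/dc/dcmitype/MovingImage")))
g.add((video, DCTERMS.title, Literal("Interview with the author")))
g.add((video, DCTERMS.description,
       Literal("Description generated from tags and closed captions")))
g.add((fragment, DCTERMS.isPartOf, video))
g.add((fragment, DCTERMS.subject, Literal("Linked Open Data")))

print(g.serialize(format="turtle"))
```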

Patent
27 Oct 2010
TL;DR: In this article, closed captioning data can be packaged in IP packets and transmitted over an HDMI interface to enable closed-captioning data to be rendered at a television set or other HDMI sink closer to the TV set rather than at a source so as to more closely match the capabilities of the display device.
Abstract: In certain implementations, closed captioning data can be packaged in IP packets and transmitted over an HDMI interface to permit closed captioning data to be rendered at a television set or other HDMI sink closer to the TV set rather than at a source so as to more closely match the capabilities of the display device. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.

Book ChapterDOI
14 Jul 2010
TL;DR: This paper discusses how Synote, a web application for annotating and captioning multimedia, can enhance learning for all students, and how improving the accuracy and readability of automatic captioning can encourage its widespread adoption and so greatly benefit disabled students.
Abstract: Although manual transcription and captioning can increase the accessibility of multimedia for deaf students, it is rarely provided in educational contexts in the UK due to the cost and the shortage of highly skilled and trained stenographers. Speech recognition has the potential to reduce the cost and increase the availability of captioning if it could satisfy accuracy and readability requirements. This paper discusses how Synote, a web application for annotating and captioning multimedia, can enhance learning for all students, and how finding ways to improve the accuracy and readability of automatic captioning can encourage its widespread adoption and so greatly benefit disabled students.

Proceedings ArticleDOI
29 Mar 2010
TL;DR: This paper proposes a solution to automatically generate captions (including place name, keywords and summary) from the web content based on image location information via synergetic techniques from Geographic Information System, Web IR and multi-document summarisation.
Abstract: Increasing quantities of images are indexed by GPS coordinates. However, it is difficult to search within such pictures. In this paper, we propose a solution to automatically generate captions (including place name, keywords and summary) from the web content based on image location information. The richer descriptions have great potential to help image organisation, indexing and search. The solution is realised through synergetic techniques from Geographic Information Systems, Web IR and multi-document summarisation.

Book ChapterDOI
06 Sep 2010
TL;DR: Two different approaches to automatic captioning of geo-tagged images by summarizing multiple web-documents that contain information related to an image's location are presented: a graph-based and a statistical-based approach.
Abstract: This paper presents two different approaches to automatic captioning of geo-tagged images by summarizing multiple web-documents that contain information related to an image's location: a graph-based and a statistical-based approach. The graph-based method uses text cohesion techniques to identify information relevant to a location. The statistical-based technique relies on different word or noun phrases frequency counting for identifying pieces of information relevant to a location. Our results show that summaries generated using these two approaches lead indeed to higher ROUGE scores than n-gram language models reported in previous work.
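A minimal stand-in for the statistical approach is frequency-based sentence scoring over location-related documents, as sketched below with toy texts and an assumed stop-word list:

```python
# Minimal stand-in for the statistical approach described above: score
# sentences from location-related web documents by content-word frequency
# and keep the top ones as the image caption. Texts and stop words are toy data.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "it", "was"}

def summarize(documents, n_sentences=2):
    sentences = [s.strip() for d in documents
                 for s in re.split(r"[.!?]", d) if s.strip()]
    words = [w for s in sentences for w in re.findall(r"[a-z]+", s.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z]+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    return ". ".join(sorted(sentences, key=score, reverse=True)[:n_sentences]) + "."

docs = ["The Eiffel Tower is a wrought-iron lattice tower in Paris. "
        "It was named after the engineer Gustave Eiffel.",
        "The tower is the most visited paid monument in Paris."]
print(summarize(docs))
```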

Book ChapterDOI
14 Jul 2010
TL;DR: An enhanced captioning system that uses graphical elements, speaker names and caption placement techniques for speaker identification was developed and results indicate that viewers are distracted when the caption follows the character on-screen regardless of whether this should assist in identifying who is speaking.
Abstract: The current method for speaker identification in closed captioning on television is ineffective and difficult in situations with multiple speakers, offscreen speakers, or narration. An enhanced captioning system that uses graphical elements (e.g., avatar and colour), speaker names and caption placement techniques for speaker identification was developed. A comparison between this system and conventional closed captions was carried out with deaf and hard-of-hearing participants. Results indicate that viewers are distracted when the caption follows the character on-screen regardless of whether this should assist in identifying who is speaking. Using the speaker's name for speaker identification is useful for viewers who are hard of hearing but not for deaf viewers. There was no significant difference in understanding, distraction, or preference for the avatar with the coloured border component.

Patent
19 Mar 2010
TL;DR: In this article, a computer server receives, from a remote device, a request for closed caption data, the request specifying media content for which the closed- caption data is to be provided.
Abstract: A computer server receives, from a remote device, a request for closed caption data, the request specifying media content for which the closed caption data is to be provided. In the server, it is determined whether closed-captioned data for the specified media content is available, and if closed-captioned data for the media content is available, a sequence of records associated with the media content is provided to the device from the server.

Patent
29 Mar 2010
TL;DR: In this paper, a system for identification of video content in a video signal is provided via the use of closed caption or other data in MPEG-x, such as time code information, histograms, and rendered video or pictures.
Abstract: A system for identification of video content in a video signal is provided via the use of closed caption or other data in a video signal or transport stream such as MPEG-x. Sampling of the received video signal or transport stream allows capture of dialog from a movie or video program. The captured dialog is compared to a reference library or database for identification purposes. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include time code information, histograms, and/or rendered video or pictures.

Proceedings Article
22 May 2010
TL;DR: In this article, the authors examine some of the major problems linked to the task of designing appropriate multilingual e-learning environments for deaf learners (DL) and present and discuss ongoing research aimed at overcoming these limitations.
Abstract: This paper examines some of the major problems linked to the task of designing appropriate multilingual e-learning environments for deaf learners (DL). Due to their hearing disability most DL experience dramatic difficulties in acquiring appropriate literacy skills. E-learning tools could in principle be very useful for facilitating access to web based knowledge and promoting literacy development in DL. However, designing appropriate e-learning environments for DL is a complex task especially because of the different linguistic background and experience DL may have, and of the multimodal language resources that need to be provided and integrated (e.g. language produced in the visual-gestural or signed modality, in written texts, closed captioning for vocal language information). The purpose of this paper is twofold: (1) describe and discuss issues we believe need to be addressed, focusing on the limitations that appear to characterize several e-learning platforms that have been proposed for DL; (2) present and discuss ongoing research aimed at overcoming these limitations.

Proceedings ArticleDOI
26 Oct 2010
TL;DR: The feasibility of using text available from video content to obtain high quality keywords suitable for matching advertisements, using statistical and generative methods to identify dominant terms in the source text is studied.
Abstract: With the proliferation of online distribution methods for videos, content owners require easier and more effective methods for monetization through advertising. Matching advertisements with related content has a significant impact on the effectiveness of the ads, but current methods for selecting relevant advertising keywords for videos are limited by reliance on manually supplied metadata. In this paper we study the feasibility of using text available from video content to obtain high quality keywords suitable for matching advertisements. In particular, we tap into three sources of text for ad keyword generation: production scripts, closed captioning tracks, and speech-to-text transcripts. We address several challenges associated with using such data. To overcome the high error rates prevalent in automatic speech recognition and the lack of an explicit structure to provide hints about which keywords are most relevant, we use statistical and generative methods to identify dominant terms in the source text. To overcome the sparsity of the data and resulting vocabulary mismatches between source text and the advertiser's chosen keywords, these terms are then expanded into a set of related keywords using related term mining methods. Our evaluations present a comprehensive analysis of the relative performance for these methods across a range of videos, including professionally produced films and popular videos from YouTube.
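The two stages, identifying dominant terms and expanding them into related keywords, can be sketched as follows; the simple term-frequency score and the hand-written related-terms mapping are stand-ins for the paper's statistical/generative models and related-term mining:

```python
# Sketch of the two stages described above, with a plain term-frequency score
# standing in for the paper's statistical/generative models and a hand-written
# dictionary standing in for related-term mining. All data is illustrative.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "we", "you", "our"}
RELATED_TERMS = {   # assumed output of a related-term mining step
    "mortgage": ["home loan", "refinance"],
    "wedding": ["bridal", "venue hire"],
}

def ad_keywords(caption_text, top_n=3):
    tokens = [w for w in re.findall(r"[a-z']+", caption_text.lower())
              if w not in STOPWORDS]
    dominant = [w for w, _ in Counter(tokens).most_common(top_n)]
    expanded = {w for term in dominant for w in RELATED_TERMS.get(term, [])}
    return dominant + sorted(expanded)

captions = ("Today we talk about getting a mortgage, comparing mortgage rates, "
            "and planning a wedding on a budget.")
print(ad_keywords(captions))
```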

Book ChapterDOI
06 Sep 2010
TL;DR: The research project is targeted at incorporation of speech technologies into the CTV environment and developed a fully automatic captioning pilot system making the broadcasting of Parliamentary meetings of the Chamber of Deputies accessible to the hearing impaired viewers.
Abstract: In the paper we introduce the on-line captioning system developed by our teams and used by the Czech Television (CTV), the public service broadcaster in the Czech Republic. The research project is targeted at the incorporation of speech technologies into the CTV environment. One of the key missions is the development of a captioning system supporting captioning of a "live" acoustic track. It can be either the real audio stream or the audio stream produced by a shadow speaker. Another key mission is to develop software tools and techniques usable for training the shadow speakers. During the initial phases of the project we concluded that the broadcasting of the Parliamentary meetings of the Chamber of Deputies fulfills the necessary conditions that enable it to be captioned without the aid of the shadow speaker. We developed a fully automatic captioning pilot system making the broadcasting of Parliamentary meetings of the Chamber of Deputies accessible to the hearing impaired viewers. The pilot run enabled us and our partners in the Czech TV to develop and evaluate the complete captioning infrastructure and collect, review and possibly implement opinions and suggestions of the targeted audience. This paper presents our experience gathered during the first years of the project to the public audience.

Patent
08 Mar 2010
TL;DR: In this article, phone captioning is used to facilitate communication through the use of traditional phone or VOIP or internet telephone system between people of hearing impaired and those who can hear.
Abstract: IP text relay or phone captioning is described herein, to facilitate communication through the use of traditional phone or VOIP or internet telephone system between people of hearing impaired and those who can hear. This service and device will enable users to communicate with users of hearing via assistance of an operator who will transcribe the call, while also receiving the Caller ID of the calling party and not the relay center.

Patent
19 Oct 2010
TL;DR: In this paper, an automated closed captioning, captioning or subtitle generation system that automatically generates the captioning text from the audio signal in a submitted online video and then allows the user to type in any corrections after which it adds the captioned text to the video allowing users to enable or disable captioning as needed.
Abstract: An automated closed captioning, captioning, or subtitle generation system that automatically generates the captioning text from the audio signal in a submitted online video and then allows the user to type in any corrections after which it adds the captioning text to the video allowing users to enable the captioning as needed. The user text review and correction step allows the text prediction model to accumulate additional corrected data with each use thereby improving the accuracy of the text generation over time and use of the system.

Patent
17 Aug 2010
TL;DR: Closed caption data for video programs, such as television programs, may be used to implement a video search as discussed by the authors, where a device may perform a search to obtain video programs that are relevant to the search.
Abstract: Closed caption data for video programs, such as television programs, may be used to implement a video search. In one implementation, a device may perform a search to obtain video programs that are relevant to the search. The search may be performed using an index generated from closed caption data of video programs. The device may additionally present the video programs that are relevant to the search as a matrix of reduced-in-size images sampled from the video programs that are relevant to the search query. The images may be sampled from the video programs near a position in the video programs corresponding to the positions at which the search query is relevant to the video program.
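A hedged sketch of the search path described here: an inverted index built from caption cues returns the programme and playback offset where the query matches, which is also where a reduced-size thumbnail would be sampled (data and structure are illustrative):

```python
# Hedged sketch of caption-based video search: an inverted index built from
# caption cues returns the programme and the playback offset where the query
# is relevant, near which a thumbnail image would be sampled.
from collections import defaultdict

def build_index(programs):
    """programs: {title: [(offset_seconds, caption_text), ...]}"""
    index = defaultdict(list)
    for title, cues in programs.items():
        for offset, text in cues:
            for word in text.lower().split():
                index[word].append((title, offset))
    return index

programs = {
    "Evening News": [(95.0, "severe weather moves into the region"),
                     (410.5, "sports scores after the break")],
    "Cooking Show": [(120.0, "add the saffron to the stock")],
}
index = build_index(programs)
print(index["weather"])   # -> [('Evening News', 95.0)]  (thumbnail sampled near 95 s)
```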


Patent
30 Jun 2010
TL;DR: In this article, a system for identification of video content in a video signal via a sound track audio signal is provided via filtering and non linear transformations to extract voice signals from the sound track channel.
Abstract: A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering and non-linear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which are later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: An efficient training scheme of acoustic and language models, which does not require faithful transcripts and thus is scalable to enormous data, is realized by exploiting official records made by human stenographers.
Abstract: Applications of automatic speech recognition (ASR) have been extended to a variety of tasks and domains, including spontaneous human-human speech. We have developed an ASR system for the Japanese Parliament (Diet), which is deployed this year. By exploiting official records made by human stenographers, we have realized an efficient training scheme of acoustic and language models, which does not require faithful transcripts and thus is scalable to enormous data. Evaluation results of the semi-automated model update are presented. We are also working on an ASR system for classroom lectures, which is intended for assisting hearing impaired students. As the classroom lectures in universities are very technical, efficient adaptation methods of acoustic and language models are investigated. A trial of realtime captioning for a hearing impaired student in our university is reported.

Patent
19 May 2010
TL;DR: In this paper, an apparatus, head end controller, server, and method for automatically selecting a user's preferred language in Video-on-Demand (VOD), cable and broadcast TV, and other communication and entertainment systems is presented.
Abstract: An apparatus, Head End controller, server, and method for automatically selecting a user's preferred language in Video-on-Demand (VOD), cable and broadcast TV, and other communication and entertainment systems. The user provides language preferences for audio, captioning, and textual menus to a service provider. The language preferences are stored in a user profile database. Thereafter, when the service provider receives a request from the user to view audiovisual content, the language preferences are retrieved from the database and are used by a Video Pump and a Catalog Player to automatically send the requested content and menus to the user in the user's preferred language.

Proceedings Article
01 Aug 2010
TL;DR: An initiative based on a B2B recommendation for the exchange, production, and broadcast of closed captioning information.
Abstract: The organization and structure of the motion picture industry has three major divisions: production, distribution, and exhibition or diffusion. Distribution is made with at least two soundtracks, “original” and “natural”: the first soundtrack carries the audio dialogue while the second carries the rest of the audio; both are mixed in the play-out room. Subtitling is used extensively by broadcasters both for foreign-language subtitling and as an access service to help people with a disability access television programmes, web, IPTV, Mobile TV, etc. Broadcasters have to subtitle each picture into their language through a subtitling company. Multiple subtitling files are generated for each picture in order to translate the information into all the languages, sometimes with several subtitling files for a single language. In this paper, we present an initiative based on a B2B recommendation for the exchange, production, and broadcast of closed captioning information. We propose a subtitling exchange workflow based mainly on the classification of “natural” and “original” soundtracks for subtitling exchange, using two separate subtitle data types: “Natural” represents audio such as music, smiles, laughter, knocking on the door, etc., while “Original” is the dialogue subtitle. Compared with the current workflow, this new workflow saves time and work.