
Showing papers on "Closed captioning" published in 2011


Patent
02 May 2011
TL;DR: In this paper, a caption and program information extractor monitors a broadcast media signal; the captions and program information encoded in the signal are extracted and stored in a content database, from which a moderator retrieves captions to generate interactive programming.
Abstract: A method and system for the creation of interactive programming using captions. A caption and program information extractor monitors a broadcast media signal having captions and program information encoded in the broadcast media signal. The captions and program information are extracted and stored in a content database. A moderator accesses the content database to retrieve captions for a program specified by the program information. The moderator uses the services of a moderator server to generate interactive programming from the captions and the moderator's own comments. The interactive programming is transmitted to a plurality of viewers who interact with the interactive programming by entering viewer comments. The viewer comments are received by the moderator along with additional captions, and new interactive programming is generated using the viewer comments, additional captions, and additional moderator commentary.
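
The extract-store-retrieve flow described above is straightforward to picture in code. Below is a minimal sketch of that flow; the SQLite schema and function names are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of the extract/store/retrieve flow: the extractor side
# persists decoded captions, the moderator side pulls them per program.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE captions (
    program_id TEXT,      -- from the extracted program information
    ts_seconds REAL,      -- when the caption appeared in the broadcast
    text       TEXT)""")

def store_caption(program_id: str, ts: float, text: str) -> None:
    """Extractor side: persist a caption decoded from the broadcast signal."""
    db.execute("INSERT INTO captions VALUES (?, ?, ?)", (program_id, ts, text))

def captions_for_program(program_id: str) -> list[tuple[float, str]]:
    """Moderator side: pull captions for one program to build
    interactive programming around them."""
    rows = db.execute(
        "SELECT ts_seconds, text FROM captions WHERE program_id = ? "
        "ORDER BY ts_seconds", (program_id,))
    return rows.fetchall()

store_caption("evening-news", 12.0, "Top story tonight...")
for ts, text in captions_for_program("evening-news"):
    print(f"{ts:7.1f}s  {text}")  # moderator adds commentary per caption
```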

65 citations


Patent
05 Aug 2011
TL;DR: In this paper, the authors propose a method for automatically identifying media content based on visually capturing a still or video image of media content being presented to a user via another device; identification can be further refined by the user's location, a captured audio portion of the media content, the date and time of the capture, or profile/behavioral characteristics of the user.
Abstract: Automatic identification of media content is at least partially based upon visually capturing a still or video image of media content being presented to a user via another device. The identification can be further refined by determining the location of the user, capturing an audio portion of the media content, the date and time of the capture, or profile/behavioral characteristics of the user. Identifying the media content can require (1) distinguishing a rectangular illumination that corresponds to a video display; (2) decoding a watermark presented within the displayed image/video; (3) characterizing the presentation sufficiently to determine a particular time stamp or portion of a program; and (4) determining user setting preferences for viewing the program (e.g., closed captioning, aspect ratio, language). Thus identified, the media content, appropriately formatted, can be received for continued presentation on a user interface of the mobile device.

62 citations


Journal ArticleDOI
TL;DR: In this article, the authors report an eye-tracking study of deaf and hard-of-hearing viewers reading different types of captions; by examining eye movement patterns while participants watched clips with verbatim, standard, and edited captions, they tested whether the three caption styles were read differently.
Abstract: One of the most frequently recurring themes in captioning is whether captions should be edited or verbatim. The authors report on the results of an eye-tracking study of captioning for deaf and hard of hearing viewers reading different types of captions. By examining eye movement patterns when these viewers were watching clips with verbatim, standard, and edited captions, the authors tested whether the three different caption styles were read differently by the study participants ( N = 40): 9 deaf, 21 hard of hearing, and 10 hearing individuals. Interesting interaction effects for the proportion of dwell time and fixation count were observed. In terms of group differences, deaf participants differed from the other two groups only in the case of verbatim captions. The results are discussed with reference to classical reading studies, audiovisual translation, and a new concept of viewing speed.

60 citations


Journal ArticleDOI
04 Nov 2011
TL;DR: A scheme to enhance video accessibility using a Dynamic Captioning approach, which explores a rich set of technologies including face detection and recognition, visual saliency analysis, and text-speech alignment, to help the hearing-impaired audience better recognize the speaking characters.
Abstract: More than 66 million people suffer from hearing impairment, a disability that makes video content difficult to understand due to the loss of audio information. If the scripts are available, captioning technology can help to a certain degree by synchronously displaying the scripts during video playback. However, we show that the existing captioning techniques are far from satisfactory in assisting the hearing-impaired audience to enjoy videos. In this article, we introduce a scheme to enhance video accessibility using a Dynamic Captioning approach, which explores a rich set of technologies including face detection and recognition, visual saliency analysis, and text-speech alignment. Different from the existing methods, which are categorized as static captioning, dynamic captioning puts scripts at suitable positions to help the hearing-impaired audience better recognize the speaking characters. In addition, it progressively highlights the scripts word by word by aligning them with the speech signal, and it illustrates the variation of voice volume. In this way, the special audience can better track the scripts and perceive the moods conveyed by the variation of volume. We implemented the technology on 20 video clips and conducted an in-depth study with 60 real hearing-impaired users. The results demonstrated the effectiveness and usefulness of the video accessibility enhancement scheme.
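
The word-by-word highlighting step reduces to a lookup over word-level alignment timestamps. A minimal sketch, assuming the text-speech alignment has already produced word onset times (the timings below are invented):

```python
# Given aligned word onset times, pick which script word to highlight
# at playback time t.
import bisect

words = ["I", "never", "said", "that"]
starts = [0.00, 0.40, 0.95, 1.30]   # aligned word onset times (seconds)

def highlighted_word(t: float) -> str | None:
    """Return the word being spoken at time t, or None before speech."""
    i = bisect.bisect_right(starts, t) - 1
    return words[i] if i >= 0 else None

for t in (0.2, 1.0, 1.5):
    print(t, highlighted_word(t))   # I, said, that
```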

59 citations


Patent
Elliot Smith1, Victor Szilagyi1
28 Dec 2011
TL;DR: In this article, the authors present a system for identifying and locating related content using natural language processing (NLP) and a dynamic interface that enables both user-interactive and automatic methods of obtaining and displaying related content.
Abstract: Systems and methods for identifying and locating related content using natural language processing are generally disclosed herein. One embodiment includes an HTML5/JavaScript user interface configured to execute scripting commands to perform natural language processing and related content searches, and to provide a dynamic interface that enables both user-interactive and automatic methods of obtaining and displaying related content. The natural language processing may extract one or more context-sensitive key terms of text associated with a set of content. Related content may be located and identified using keyword searches that include the context-sensitive key terms. For example, text associated with video of a first content, such as text originating from subtitles or closed captioning, may be used to perform searches and locate related content such as a video of a second content, or text of a third content.
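
A rough sketch of the key-term extraction and query-building idea follows, in Python rather than the patent's HTML5/JavaScript setting, with simple frequency scoring standing in for real natural language processing.

```python
# Score terms in caption/subtitle text, then form a keyword query that
# a search backend could use to locate related content.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "on", "as"}

def key_terms(caption_text: str, k: int = 3) -> list[str]:
    tokens = re.findall(r"[a-z']+", caption_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]

def related_content_query(caption_text: str) -> str:
    # This query string would be sent to a keyword search backend.
    return " ".join(key_terms(caption_text))

subs = ("The volcano erupted overnight. Ash from the volcano closed "
        "airports across the region as the eruption continued.")
print(related_content_query(subs))   # 'volcano' ranks first; ties follow in order seen
```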

39 citations


Patent
28 Jun 2011
TL;DR: In this paper, a system, method, and computer program product for automatically analyzing multimedia data audio content are disclosed; embodiments receive multimedia data, detect portions having specified audio features, and output a corresponding subset of the multimedia data along with generated metadata.
Abstract: A system, method, and computer program product for automatically analyzing multimedia data audio content are disclosed. Embodiments receive multimedia data, detect portions having specified audio features, and output a corresponding subset of the multimedia data and generated metadata. Audio content features, including voices, non-voice sounds, and closed captioning, from downloaded or streaming movies or video clips are identified much as a human would identify them, but in essentially real time. Particular speakers and the most meaningful content sounds and words, with corresponding time-stamps, are recognized via database comparison and may be presented in order of match probability. Embodiments responsively pre-fetch related data, recognize locations, and provide related advertisements. The content features may also be sent to search engines so that further related content may be identified. User feedback and verification may improve the embodiments over time.
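
A toy sketch of the detect-and-output idea, using frame energy above a threshold as a stand-in for the patent's voice, sound, and caption detectors; the frame values and metadata field names are invented.

```python
# Detect portions with a specified audio feature (here: loudness) and
# emit metadata records with time-stamps for the qualifying portions.
def loud_segments(frames: list[float], threshold: float,
                  frame_s: float = 0.5) -> list[dict]:
    """Return metadata records for frames whose energy clears the threshold."""
    segments = []
    for i, energy in enumerate(frames):
        if energy >= threshold:
            segments.append({"start_s": i * frame_s,
                             "end_s": (i + 1) * frame_s,
                             "energy": energy})
    return segments

frames = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]   # per-frame energy, synthetic
for meta in loud_segments(frames, threshold=0.6):
    print(meta)   # the subset of the media plus generated metadata
```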

39 citations


Proceedings ArticleDOI
28 Mar 2011
TL;DR: A tool is described that facilitates crowdsourced correction of speech recognition captioning errors, providing a sustainable method of making videos accessible to people who find it difficult to understand speech through hearing alone.
Abstract: In this paper, we describe a tool that facilitates crowdsourced correction of speech recognition captioning errors to provide a sustainable method of making videos accessible to people who find it difficult to understand speech through hearing alone.

29 citations


Patent
23 Aug 2011
TL;DR: In this article, a content receiver receives a captioning element and positional information regarding segments of a content instance; the captioning element can be used with the positional information to locate where the segments stop and/or start.
Abstract: A content receiver receives a captioning element and positional information regarding segments of a content instance. The captioning element corresponds to a component of captioning data included in content that can be utilized with the positional information to locate where the segments stop and/or start. The content receiver analyzes the content based on the captioning element and the positional information and alters how the content will be presented. Such alteration may involve skipping and/or deleting segments, starting/stopping presentation of content other than at the beginning and/or end of the content, altering recording timers, and/or replacing segments with alternative segments. In some implementations, the content may be recorded as part of recording multiple content instances received via at least one broadcast from a content provider, wherein the multiple content instances are all included in a same frequency band of the broadcast and are all encoded utilizing a same control word.
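
A small sketch of how a captioning element plus positional information could locate a segment to skip; the cue data, anchor string, and offsets below are illustrative assumptions.

```python
# A caption "element" (an anchor string known to appear at a segment
# break) plus positional offsets locates where a segment starts and stops.
cues = [  # (seconds, caption text) decoded from the recording
    (100.0, "...and we'll be right back."),
    (101.0, "[commercial jingle]"),
    (160.0, "Welcome back to the show."),
]

def segment_bounds(anchor: str, start_off: float, end_off: float):
    """Find the anchor cue, then apply offsets to get the span to skip."""
    for ts, text in cues:
        if anchor in text:
            return ts + start_off, ts + end_off
    return None

bounds = segment_bounds("right back", start_off=1.0, end_off=60.0)
print(bounds)   # (101.0, 160.0): presentation would jump over this span
```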

29 citations


Patent
21 Jan 2011
TL;DR: In this article, a computer-controlled system and method for inserting real-time targeted advertising in media content such as a video stream based upon contextual information occurring in the video stream is presented.
Abstract: A computer-controlled system and method for inserting real-time targeted advertising in media content such as a video stream based upon contextual information occurring in the video stream. The system detects and extracts contextual information such as subtitles, closed captions, and tags associated with images in the video stream and then determines whether the contextual information is related to any advertisements in a database of advertisements. The contextual information may be utilized together with other advertising criteria and audience qualifiers to prioritize the advertisements for automatic insertion at the next advertising splice point.
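
A minimal sketch of the contextual matching step, with invented ad records and a simple overlap-times-bid ranking standing in for the patent's advertising criteria and audience qualifiers.

```python
# Match recent caption text against advertisement keywords and rank
# candidates for insertion at the next splice point.
ads = [
    {"id": "ad-travel", "keywords": {"beach", "flight", "vacation"}, "bid": 2.0},
    {"id": "ad-pizza",  "keywords": {"pizza", "delivery", "cheese"},  "bid": 1.5},
]

def rank_ads(recent_captions: str) -> list[tuple[float, str]]:
    words = set(recent_captions.lower().split())
    scored = []
    for ad in ads:
        overlap = len(words & ad["keywords"])
        if overlap:                      # related to the caption context
            scored.append((overlap * ad["bid"], ad["id"]))
    return sorted(scored, reverse=True)  # best candidate first

print(rank_ads("book a cheap flight to the beach this summer"))
```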

29 citations


Patent
27 Apr 2011
TL;DR: In this article, a content receiver is used to provide supplemental content, such as news content, personal content and advertising content, to a user based on user preference information stored therein, and the formatted supplemental content is transmitted to a content display device.
Abstract: Systems and methods utilize a content receiver to provide supplemental content, such as news content, personal content and advertising content, to a user. Received data is formatted as supplemental content by the content receiver based on user preference information stored therein, and the formatted supplemental content is transmitted to a content display device. The supplemental content is provided to the user in addition or as an alternative to video content, and may replace or supplement closed captioning content. The supplemental content may be translated into another language and/or converted into audio signals utilizing the content receiver. Systems and methods also utilize a content receiver to translate data such as text data into another language. Text data may, in addition or alternatively, be converted into audio signals utilizing the content receiver.

26 citations


Patent
10 Jun 2011
TL;DR: An adaptive workflow system can be used to implement captioning projects, such as creating captions or subtitles for live and non-live broadcasts; workers repeat words spoken during a broadcast or other program into a voice recognition system, which outputs text that may be used as captions or subtitles.
Abstract: An adaptive workflow system can be used to implement captioning projects, such as projects for creating captions or subtitles for live and non-live broadcasts. Workers can repeat words spoken during a broadcast program or other program into a voice recognition system, which outputs text that may be used as captions or subtitles. The process of workers repeating these words to create such text can be referred to as respeaking. Respeaking can be used as an effective alternative to more expensive and hard-to-find stenographers for generating captions and subtitles.


Journal ArticleDOI
01 Apr 2011
TL;DR: Considerations for coding and transport of stereoscopic 3-D video, options for dual-channel encoding as well as frame-compatible delivery, and an overview of 3-D eyewear issues are discussed.
Abstract: This paper discusses considerations for coding and transport of stereoscopic 3-D video, options for dual-channel encoding as well as frame-compatible delivery. A description of the use of digital interfaces for stereoscopic 3-D delivery from set-top boxes (STBs) to displays is included along with an overview of 3-D eyewear issues. Complexities such as rendering captions without introducing depth conflicts and future directions are also discussed.
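
The caption depth-conflict problem mentioned above comes down to keeping the caption's disparity in front of the nearest scene object. A back-of-the-envelope sketch, with an assumed sign convention and example values not drawn from the paper:

```python
# To avoid a caption appearing "behind" scene objects, its disparity
# must place it at least as close to the viewer as the nearest object.
def caption_disparity(nearest_object_disparity: float,
                      margin: float = 2.0) -> float:
    """More negative disparity = closer to the viewer (crossed disparity).
    Put the caption a small margin in front of the nearest object."""
    return nearest_object_disparity - margin

# Nearest object sits at -10 px disparity; caption goes to -12 px.
print(caption_disparity(-10.0))
```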

Proceedings ArticleDOI
07 Nov 2011
TL;DR: A semiotic inspection on Half-Life 2 is performed, seeking to identify which strategies were used to convey information through audio, and how the loss of information in each of them may impact players' experience.
Abstract: Mainstream games usually lack support for accessibility to deaf and hard of hearing people. The popular FPS game Half-Life 2 is an exception, in that it provides well constructed closed captions to players. In this paper, we performed a semiotic inspection on Half-Life 2, seeking to identify which strategies were used to convey information through audio. We also evaluated how the loss of information in each of them may impact players' experience. Our findings reveal that six different strategies are used and how they may compromise player experience.

Journal ArticleDOI
TL;DR: Drawing on a range of Hollywood movies and television shows, this article offers a way of thinking about closed captioning as a rhetorical and interpretative practice that warrants further analysis and criticism from scholars in the humanities and social sciences.
Abstract: This article offers a way of thinking about closed captioning that goes beyond quality (narrowly defined in current style guides in terms of visual design) to consider captioning as a rhetorical and interpretative practice that warrants further analysis and criticism from scholars in the humanities and social sciences. A rhetorical perspective recasts quality in terms of how genre, audience, context, and purpose shape the captioning act. Drawing on a range of Hollywood movies and television shows, this article addresses a set of topics that are central to an understanding of the effectiveness, significance, and reception of captions: overcaptioning, undercaptioning, subtitles vs. captions, the manipulation of time, non-speech information, series awareness, and the backchannel.

Patent
22 Jun 2011
TL;DR: In this paper, a system for identification of video content in a video signal is provided via a sound track audio signal, which is processed with filtering, frequency translation, and/or nonlinear transformations to extract voice signals from the sound track channel.
Abstract: A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering, frequency translation, and/or nonlinear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which are later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.
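
A sketch of the final comparison stage, using difflib's similarity ratio as a stand-in for whatever matcher the patent contemplates; the reference library entries are invented.

```python
# Compare recognized dialog text against a reference library of known
# program dialog and return the best-matching title.
from difflib import SequenceMatcher

library = {
    "casablanca":    "here's looking at you kid we'll always have paris",
    "the godfather": "i'm gonna make him an offer he can't refuse",
}

def identify(recognized_text: str) -> tuple[str, float]:
    title, dialog = max(
        library.items(),
        key=lambda kv: SequenceMatcher(None, recognized_text, kv[1]).ratio())
    return title, SequenceMatcher(None, recognized_text, dialog).ratio()

print(identify("make him an offer he cant refuse"))
```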

Journal ArticleDOI
TL;DR: A reading comprehension test, the intermediate-level General English Proficiency Test (GEPT), was administered to participants in order to determine the influence of using subtitles on learners' reading comprehension.
Abstract: An increasing number of foreign language teaching techniques integrate the latest technology, such as computers and video materials. As the emphasis in multimedia shifts to success for all language learners, educators carry out various techniques to demonstrate benefits. Presenting subtitles adds a visual channel for communicating verbal information. The presenter examines whether English video captions improve or impede EFL students' reading comprehension. Instructional videos with English subtitles were used for one hour every two weeks over ten weeks. Two versions of the videos, one with captioning and one without, were used by two groups randomly selected among freshmen at a university in Taiwan. A reading comprehension test, the intermediate-level General English Proficiency Test (GEPT), was administered to participants in order to determine the influence of using subtitles on learners' reading comprehension.

Patent
25 Aug 2011
TL;DR: In this paper, a media processing device may include a processing component and a viewing context builder operative on the processing component, which can extract context relevant data from the analyzed media content and build a viewing preference profile from the context relevant data.
Abstract: A media processing device may include a processing component and a viewing context builder operative on the processing component. The viewing context builder may analyze media content comprising an audio stream, a video stream, and/or a closed captioning stream from a selected channel; extract context relevant data from the analyzed media content; and build a viewing preference profile from the context relevant data. Other embodiments are described and claimed.
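
A minimal sketch of a viewing-context builder that folds caption tokens into a per-channel preference profile; the data model is an assumption, not the patent's.

```python
# Accumulate terms seen in the closed captioning stream per channel
# into a viewing preference profile.
from collections import Counter, defaultdict

profile: dict[str, Counter] = defaultdict(Counter)

def observe(channel: str, caption_text: str) -> None:
    """Extract context-relevant data (here: simple tokens) and fold it
    into the viewing preference profile."""
    profile[channel].update(caption_text.lower().split())

observe("sports-1", "touchdown in the final quarter")
observe("sports-1", "quarter ends with a field goal")
print(profile["sports-1"].most_common(2))   # [('quarter', 2), ('touchdown', 1)]
```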


Patent
23 Dec 2011
TL;DR: In this paper, a system or method for generating subtitles (also known as "closed captioning") of an audio component of a multimedia presentation automatically for one or more stored presentations is described.
Abstract: One embodiment described herein may take the form of a system or method for generating subtitles (also known as "closed captioning") of an audio component of a multimedia presentation automatically for one or more stored presentations. In general, the system or method may access one or more multimedia programs stored on a storage medium, either as an entire program or in portions. Upon retrieval, the system or method may perform an analysis of the audio component of the program and generate a subtitle text file that corresponds to the audio component. In one embodiment, the system or method may perform a speech recognition analysis on the audio component to generate the subtitle text file.
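
A sketch of the generation step: a hypothetical transcribe() stands in for the speech recognizer, while the SRT formatting that follows is the standard subtitle file layout.

```python
# Generate a subtitle text file from the audio component of a stored
# program. transcribe() is a placeholder, not a real recognizer.
def transcribe(audio_portion: bytes) -> list[tuple[float, float, str]]:
    """Hypothetical recognizer: (start_s, end_s, text) per utterance."""
    return [(0.0, 2.5, "Good evening."), (2.5, 5.0, "Here is the news.")]

def to_srt_time(t: float) -> str:
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02}:{m:02}:{s:02},{int((t % 1) * 1000):03}"

def subtitle_file(audio: bytes) -> str:
    lines = []
    for i, (start, end, text) in enumerate(transcribe(audio), 1):
        lines += [str(i), f"{to_srt_time(start)} --> {to_srt_time(end)}",
                  text, ""]
    return "\n".join(lines)

print(subtitle_file(b""))
```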

Patent
23 Feb 2011
TL;DR: In this paper, a system for identification of video content in a video signal is provided by a filter bank that performs a real-time or near-real-time frequency analysis of the video signal.
Abstract: A system for identification of video content in a video signal is provided by a filter bank which performs a real time or near real time frequency analysis of a video signal to provide the identification. An alternative embodiment for video content identification uses frequency coefficients from one or more video frames along a curve, or from a region of the video frame. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.
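
A sketch of a frequency-coefficient signature taken over one row of a frame and compared against stored signatures; a real system would use many regions and frames, and the data here is synthetic.

```python
# Build a normalized spectrum signature from one frame row and match it
# against known signatures by dot product.
import numpy as np

def signature(row: np.ndarray, n_coeffs: int = 8) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(row))[:n_coeffs]
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)

known = {"program-a": signature(np.sin(np.linspace(0, 20, 256)))}

def best_match(row: np.ndarray) -> str:
    sig = signature(row)
    return max(known, key=lambda k: float(np.dot(sig, known[k])))

# A slightly perturbed version of the stored row still matches.
print(best_match(np.sin(np.linspace(0, 20, 256)) + 0.05))
```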

Journal Article
TL;DR: The U.S. judicial system has created a "have and have-not" dichotomy when it comes to persons with disabilities enjoying television, a divide that has left video description technology lagging behind closed captioning.
Abstract:
I. INTRODUCTION
II. REACHING THE DECISION
A. The Effects of the 1996 Telecommunications Act
B. Initial Reception to the Video Description Regulations--the Battle Begins
III. CONFLICT AND CHANGES
A. Why Is This a Problem?
B. The Effect of Video Descriptions on the Television Industry
C. Showdown: Video Description Versus Closed Captioning
D. The Transition to Digital Television's Effect on Video Descriptions
IV. SOLUTIONS TO THE CURRENT SITUATION
A. Stimulating the Video Description Market
B. Federal Regulation Mandating Implementation of Video Description Technology
V. CONCLUSION
I. INTRODUCTION
Many people take for granted the relatively simple action of sitting down at the end of the day and turning on the television. They can relax and let wave after wave of sounds and images wash over them, relieving their stress and tension. Regardless of whether the dial is set to sports or a soap opera, news or nonsense, drama or comedy, television is something that has become part of the fabric of almost every person's life. However, there are a significant number of people in the United States who are unable to enjoy this activity. The U.S. judicial system has created a "have and have-not" dichotomy when it comes to persons with disabilities enjoying television. As a result of the D.C. Circuit's 2002 decision in Motion Picture Association of America, Inc. v. Federal Communications Commission, the FCC is allowed to regulate closed captioning, forcing television manufacturers and broadcasters to implement technology that will allow deaf Americans to enjoy television more fully. (1) In the same decision, the court found that the FCC did not have power to promulgate regulations regarding video descriptions (2) that would allow blind and seeing-impaired Americans to have a more complete television experience, similar to those without a disability. (3) The Survey of Income and Program Participation is a national survey that collects data on a regular basis to identify the percentage of the American population with hearing loss or deafness. (4) This survey has found that "1 in 20 Americans are currently deaf or hard of hearing. In round numbers, nearly 10,000,000 persons are hard of hearing and close to 1,000,000 are functionally deaf." (5) Americans who suffer from hearing loss or complete deafness have become the "haves" when it comes to the FCC's ability to provide a satisfactory television experience; since 1993, the FCC has taken steps to make sure that closed captioning (6) is available to as many Americans as possible. (7) The ability of the FCC to help those with hearing problems is in stark contrast to its ability to help those with seeing problems through the use of video descriptions. Allowing the FCC to regulate video descriptions would help the 25.2 million Americans who have reported problems seeing, many of whom are unable to see at all. (8) This Note argues that the time has come to take action and increase availability of video descriptions. Part II of this Note examines the court's decision in Motion Picture Association of America. It considers both the views of the visually impaired community and the entertainment industry leading up to the court's decision. Part II further examines the major justifications that the court used in reaching its decision. Part III begins by exploring why the lack of video description technology is a problem.
As a result of the decision in Motion Picture Association of America, closed captioning and video description have been placed in juxtaposition to one another. This Section explores the divergence in treatment between the two and whether those differences justify their disparity in treatment under the current regulatory scheme. The Section ends by looking at the changes available for video description technology as a result of the digital transition and how the change affects the ease of implementing the technology. …


Patent
31 Oct 2011
TL;DR: In this article, a system is configured to receive, from a user device, a request for video content and closed captioning content associated with the video content; obtain, based on the request, device information that identifies a first video format supported by the user device; and transmit the converted video content and the converted closed captioning content to the user device.
Abstract: A system is configured to receive, from a user device, a request for video content and closed captioning content associated with the video content; obtain, based on the request, device information that identifies a first video format that is supported by the user device; obtain the video content and the closed captioning content, where the video content conforms to a second video format, and where the closed captioning content conforms to a text format; convert the video content from the second video format to the first video format and the closed captioning content from the text format to the first video format; and transmit the converted video content and the converted closed captioning content to the user device, where the converted video content and the converted closed captioning content enable the user device to play the converted video content and the converted closed captioning content without modifying the user device.
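
One plausible realization of the conversion step, assuming the ffmpeg command-line tool (which the patent does not name): transcode to the device-supported format and burn the text captions into the frames so the device can play them without modification.

```python
# Transcode the source and render text captions into the video frames.
# Requires an ffmpeg build with subtitle (libass) support.
import subprocess

def convert_with_captions(video_in: str, captions_srt: str,
                          video_out: str) -> None:
    cmd = [
        "ffmpeg", "-i", video_in,
        "-vf", f"subtitles={captions_srt}",  # burn captions into frames
        video_out,                           # container picked by extension
    ]
    subprocess.run(cmd, check=True)

# convert_with_captions("source.ts", "captions.srt", "device.mp4")
```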

Proceedings ArticleDOI
01 Nov 2011
TL;DR: The development of a closed captioning system for Filipino TV news programs is discussed and the highest average recognition accuracy achieved in developing for the test set was 57.36% using flat start context-dependent models and a language model with absolute discounting applied.
Abstract: In this paper, the development of a closed captioning system for Filipino TV news programs is discussed. The researchers tested the system for offline captioning and evaluated the performance of the system based on word error rate (WER). Carnegie Mellon University's open-source speech recognition system, Sphinx-III, was used as the primary training and recognition engine. A Filipino News Corpus was built consisting of speech and text data obtained from Filipino news videos. Training and testing sets were generated and from this, different training and decoding parameters of Sphinx were evaluated. Using the word error rate (WER) computation, the highest average recognition accuracy achieved in developing for the test set was 57.36% using flat start context-dependent models and a language model with absolute discounting applied. This project is a first step towards establishing the baseline accuracy for future development of the system.
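
Word error rate, the paper's metric, is the word-level Levenshtein distance between hypothesis and reference divided by the reference length. A self-contained computation with invented example sentences:

```python
# Standard WER: edit distance over words, normalized by reference length.
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("magandang gabi sa inyong lahat",
          "magandang gabi sa lahat"))    # one deletion -> 0.2
```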

Journal ArticleDOI
TL;DR: This issue addresses telecommunications policy gaps, including the Australian Government's role in addressing 'ability' in the digital divide, and introduces multi-user environments into Australia's virtual classrooms as a value proposition for Australia's National Broadband Network.
Abstract: This issue comprises the following articles:
Addressing the telecommunications policy gaps (Rosemary Sinclair)
Australia's missing accessible information and communications technology procurement policy (Wayne Hawkins)
Infrastructure gaps must be filled to make e-health a reality (Andrew Pesce)
The digital divide: the Australian Government's role in addressing 'ability' (Dave Lee)
Introducing multi-user environments into Australia's virtual classrooms: a value proposition for Australia's National Broadband Network (Mandy Salomon)


Patent
11 Nov 2011
TL;DR: In this article, the closed captioning data is parsed using metadata to identify portions of the video stream to skip during presentation and/or portions to output to a user; the portions to be skipped are filtered from the video stream, and the filtered video stream is presented to the user.
Abstract: Various embodiments of apparatus and/or methods are described for skipping and/or filtering content from a video stream using closed captioning data associated with the video stream. The closed captioning data is parsed using metadata to identify portions of the video stream to skip during presentation, and/or to identify portions of the video stream to output to a user. The portions of the video stream that are to be skipped are filtered from the video stream, and the filtered video stream is presented to a user.
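
A small sketch of metadata-driven filtering: metadata supplies caption patterns that delimit a span to drop before presentation. The cue list and pattern are illustrative assumptions.

```python
# Parse captioning cues against a metadata-supplied pattern and filter
# the delimited span out of the presented stream.
import re

metadata_skip = re.compile(r"\[sponsored segment")   # pattern from metadata
cues = [(0, "Headlines tonight"), (30, "[sponsored segment begins]"),
        (90, "[sponsored segment ends]"), (91, "Back to the news")]

def filtered(cues):
    skipping = False
    for ts, text in cues:
        if metadata_skip.search(text):
            skipping = not skipping     # toggle at begin/end markers
            continue
        if not skipping:
            yield ts, text

print(list(filtered(cues)))   # sponsored span removed before output
```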

Patent
28 Jul 2011
TL;DR: In this paper, a host captioning device converts the presenter's speech to text and communicates the text in real time for presentation on an audience member's client device.
Abstract: There is disclosed one or more methods, systems and components therefor for broadcasting captions of a presenter's speech to audience members to accompany the live viewing of the presentation. A host captioning device converts the presenter's speech to text and communicates the text to and for presentation by an audience member's client device. The communication session between the host captioning device and the client device is established by an invitation request from the host captioning device in response to a registration request from the client device. The captioning information may be communicated in real time as text. The host captioning device either connects to a network or provides one itself, thereby serving as an access point for the client devices.
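
A minimal simulation of the session handshake the abstract describes (registration request, invitation, then caption streaming); the message shapes and class names are assumptions for illustration.

```python
# Client registers, host answers with an invitation to establish the
# session, then the host streams caption text to registered clients.
class HostCaptioner:
    def __init__(self) -> None:
        self.clients: list[str] = []

    def handle_registration(self, client_id: str) -> dict:
        self.clients.append(client_id)
        return {"type": "invitation", "to": client_id}   # establishes session

    def broadcast(self, caption_text: str) -> list[tuple[str, str]]:
        # In the patent this text comes from real-time speech conversion.
        return [(c, caption_text) for c in self.clients]

host = HostCaptioner()
print(host.handle_registration("seat-14f"))
print(host.broadcast("Welcome, everyone, to tonight's talk."))
```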