
Showing papers on "Closed captioning published in 2006"


Patent
28 Feb 2006
TL;DR: In this article, a method for blocking scenes with objectionable content comprises receiving incoming content, namely a scene of a program; using closed captioning information, a determination is made whether the scene includes objectionable content and, if so, the scene is blocked from being displayed.
Abstract: According to one embodiment, a method for blocking scenes with objectionable content comprises receiving incoming content, namely a scene of a program. Thereafter, using closed captioning information, a determination is made whether the scene of the program includes objectionable content and, if so, the scene is blocked from being displayed.
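The patent describes the caption-based screening step only at a high level. A minimal sketch of the idea in Python, with a hypothetical Scene record and blocked-term list standing in for whatever criteria the patent actually claims, might look like this:

```python
import re
from dataclasses import dataclass

# Hypothetical blocked-term list; the patent does not specify how the
# objectionable-content criteria are defined or configured.
BLOCKED_TERMS = {"violence", "explicit"}

@dataclass
class Scene:
    start: float        # seconds into the program
    end: float
    caption_text: str   # closed captioning decoded for this scene
    blocked: bool = False

def screen_scene(scene: Scene, blocked_terms=BLOCKED_TERMS) -> Scene:
    """Mark a scene as blocked if its caption text contains a flagged term."""
    words = set(re.findall(r"[a-z']+", scene.caption_text.lower()))
    scene.blocked = bool(words & blocked_terms)
    return scene

# Only the second scene would be withheld from display in this toy example.
scenes = [
    Scene(0.0, 42.0, "Welcome back to the evening news."),
    Scene(42.0, 90.0, "Viewers may find the following explicit footage disturbing."),
]
for s in map(screen_scene, scenes):
    print(s.start, s.end, "BLOCKED" if s.blocked else "shown")
```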

74 citations


Patent
28 Feb 2006
TL;DR: In this article, the authors present a system and methods for integrated media monitoring, which enables users to analyze how a product or service is being advertised or otherwise conveyed to the general public.
Abstract: Systems and methods for integrated media monitoring are disclosed. The present invention enables users to analyze how a product or service is being advertised or otherwise conveyed to the general public. Via strategically placed servers, the present invention captures multiple types and sources of media for storage and analysis. Analysis includes both closed captioning analysis and human monitoring. Media search parameters are received over a network, and a near real-time hit list of occurrences of the parameters is produced and presented to a requesting user. Options for previewing and purchasing matching media segments are presented, along with corresponding reports and coverage analyses. Reports indicate the effectiveness of advertising, the tonality of editorials, and other information useful to a user looking to understand how a product or service is being conveyed to the public via the media.

63 citations


Patent
01 Dec 2006
TL;DR: In this paper, a multimedia server distributes closed captioning over a network to a client device running a media player that does not support standardized closed captioning, such as CEA-608-B or CEA-708-B, Advanced Television Systems Committee ATSC A/53, or the Society of Cable Telecommunications Engineers SCTE 20 and/or SCTE 21.
Abstract: A multimedia server distributes closed captioning over a network to a client device running a media player that does not support standardized closed captioning. The multimedia server receives a media stream including closed captioning that is encoded according to a closed captioning standard such as Consumer Electronics Association CEA-608-B or CEA-708-B, Advanced Television Systems Committee ATSC A/53, or the Society of Cable Telecommunications Engineers SCTE 20 and/or SCTE 21. The multimedia server transcodes the closed captioning into a format that is usable by the media player and transmits the transcoded closed captioning to the client device over the network so that the media player can render the closed captioning synchronously with programming content included in the media stream.
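Parsing the CEA-608/708 bitstream itself is beyond the scope of this listing; assuming the caption cues have already been decoded into (start, end, text) triples, a rough sketch of the transcoding step, using WebVTT purely as one plausible player-friendly target format, could be:

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def cues_to_webvtt(cues):
    """cues: iterable of (start_sec, end_sec, text) already extracted from
    the broadcast caption stream (CEA-608/708 decoding not shown here)."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{to_timestamp(start)} --> {to_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

print(cues_to_webvtt([(0.0, 2.5, "Good evening."), (2.5, 5.0, "Here is the news.")]))
```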

61 citations


Patent
25 Jul 2006
TL;DR: In this article, a decoder device decodes the closed captioning information to determine the position of the speaker within the video data, and the time index to correlate the captioning text and positioning information to a specific frame of video data.
Abstract: Closed captioning information is provided regarding the location of a speaker and when the text is spoken. An audio/video signal includes video data and the closed captioning information. The closed captioning information includes a time index, a closed captioning text, and positioning information. The positioning information indicates a position within a frame of the video data, and is associated with the closed captioning text for a given time index. The position corresponds to the speaker who is speaking the associated closed captioning text. A decoder device decodes the closed captioning information to determine the position of the speaker within the video data, and the time index to correlate the closed captioning text and positioning information to a specific frame of video data. The video data is preferably scaled to provide a less than full screen video. The scaled video is appropriately positioned on a display screen and talk bubbles, which provide a visual link between the closed captioning text and the speaker, are preferably displayed off the scaled video. Alternatively, the video is not scaled and the talk bubbles are superimposed on the full screen video in a blended fashion.
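A minimal sketch of the data involved, with every field name invented for illustration, is a caption record carrying a time index, text, and a frame-relative speaker position, plus a helper that maps that position into screen coordinates once the video has been scaled:

```python
from dataclasses import dataclass

@dataclass
class PositionedCaption:
    time_index: float   # seconds; correlates the caption with a video frame
    text: str
    x: float            # speaker position as a fraction of frame width
    y: float            # speaker position as a fraction of frame height

def bubble_anchor(caption: PositionedCaption, video_rect):
    """Map the frame-relative speaker position into screen coordinates.

    video_rect = (left, top, width, height) of the scaled video on screen;
    a talk bubble drawn outside this rectangle would point at the anchor.
    """
    left, top, w, h = video_rect
    return (left + caption.x * w, top + caption.y * h)

cap = PositionedCaption(12.3, "Look over there!", x=0.25, y=0.40)
print(bubble_anchor(cap, (240, 135, 1440, 810)))   # (600.0, 459.0)
```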

44 citations


Proceedings Article
01 Sep 2006
TL;DR: A method combining a linguistic analysis of the imperfect transcripts and a dynamic synchronization of these transcripts inside the search algorithm to improve the performance of an automatic speech recognition (ASR) system is proposed.
Abstract: In many cases, textual information can be associated with speech signals, such as movie subtitles, theater scenarios, broadcast news summaries, etc. This information can be considered an approximate transcript and rarely corresponds to the exact words uttered. The goal of this work is to use this kind of information to improve the performance of an automatic speech recognition (ASR) system. Multiple applications are possible: following a play with closed captions aligned to the voice signal (while allowing for performer variations) to help deaf people, watching a movie in another language using aligned and corrected closed captions, etc. We propose in this paper a method combining a linguistic analysis of the imperfect transcripts and a dynamic synchronization of these transcripts inside the search algorithm. The proposed technique is based on language model adaptation and on-line synchronization of the search algorithm. Experiments are carried out on an extract of the ESTER evaluation campaign [4] database, using the LIA Broadcast News system. The results show that the transcript-driven system significantly outperforms both the original recognizer and the imperfect transcript itself.
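The paper's actual method adapts the language model and synchronizes the approximate transcript inside the decoder's search; as a much simpler illustration of the synchronization idea, a word-level dynamic-programming alignment between an imperfect transcript and an ASR hypothesis can be produced with Python's standard difflib:

```python
import difflib

approx = "to be or not to be that is the question".split()   # imperfect transcript
asr    = "to be or nod to be that is question".split()        # recognizer output

# Word-level alignment between the approximate transcript and the ASR output;
# 'equal' spans are the anchors a transcript-driven decoder could synchronize on.
matcher = difflib.SequenceMatcher(a=approx, b=asr, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    print(f"{tag:>8}: {' '.join(approx[i1:i2])!r:<22} ~ {' '.join(asr[j1:j2])!r}")
```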

40 citations


Journal ArticleDOI
TL;DR: This paper describes the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition.
Abstract: Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Notetakers can only summarise what is being said while qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Synchronising the speech with text captions can ensure deaf students are not disadvantaged and assist all learners to search for relevant specific parts of the multimedia recording by means of the synchronised text. Real time stenography transcription is not normally available in UK higher education because of the shortage of stenographers wishing to work in universities. Captions are time consuming and expensive to create by hand and while Automatic Speech Recognition can be used to provide real time captioning directly from lecturers’ speech in classrooms it has proved difficult to obtain accuracy comparable to stenography. This paper describes the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition.

38 citations


Book ChapterDOI
11 Jul 2006
TL;DR: The development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition is described.
Abstract: Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes when lip-reading or watching a sign-language interpreter. Notetakers summarise what is being said while qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Real time captioning/transcription is not normally available in UK higher education because of the shortage of real time stenographers. Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Automatic Speech Recognition can provide real time captioning directly from lecturers' speech in classrooms but it is difficult to obtain accuracy comparable to stenography. This paper describes the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition.

37 citations


Patent
30 Oct 2006
TL;DR: In this paper, a system and method that integrates automated voice recognition technology and speech-to-text technology with automated translation and closed captioning technology to provide translations of live or real-time television content is disclosed.
Abstract: A system and method that integrates automated voice recognition technology and speech-to-text technology with automated translation and closed captioning technology to provide translations of “live” or “real-time” television content is disclosed. It converts speech to text, translates the converted text to other languages, and provides captions through a single device that may be installed at the broadcast facility. The device accepts broadcast quality audio, recognizes the speaker's voice, converts the audio to text, translates the text, processes the text for multiple caption outputs, and then sends multiple text streams out to caption encoders and/or other devices in the proper format. Because it automates the process, it dramatically reduces the cost and time traditionally required to package television programs for broadcast into foreign or multi-language U.S. markets.

35 citations


Proceedings Article
01 Jan 2006
TL;DR: This paper describes the solutions to the problems and the implementation of a live captioning system based on the CRIM speech recognizer and reports results from field deployment in several projects.
Abstract: Growing needs for French closed-captioning of live TV broadcasts in Canada cannot be met only with stenography-based technology because of a chronic shortage of skilled stenographers. Using speech recognition for live closed-captioning, however, requires several specific problems to be solved, such as the need for low-latency real-time recognition, remote operation, automated model updates, and collaborative work. In this paper we describe our solutions to these problems and the implementation of a live captioning system based on the CRIM speech recognizer. We report results from field deployment in several projects. The oldest in operation has been broadcasting real-time closed-captions for more than 2 years. Index Terms: speech recognition, closed-captioning, model adaptation.

35 citations


Journal Article
TL;DR: In this article, the authors describe the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition, which can provide real time captioning directly from lecturers' speech in classrooms.
Abstract: Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes when lip-reading or watching a sign-language interpreter. Notetakers summarise what is being said while qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Real time captioning/transcription is not normally available in UK higher education because of the shortage of real time stenographers. Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Automatic Speech Recognition can provide real time captioning directly from lecturers' speech in classrooms but it is difficult to obtain accuracy comparable to stenography. This paper describes the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition.

34 citations


Patent
28 Feb 2006
TL;DR: In this paper, a method for integrated media monitoring is disclosed, wherein multiple forms of media are monitored and searched according to user defined criteria, and options for previewing and purchasing matching media segments are presented along with corresponding reports and coverage analyses.
Abstract: A method for integrated media monitoring is disclosed, wherein multiple forms of media are monitored and searched according to user defined criteria. The method may be used by a business to understand how a product or service is being received by the general public. Monitoring includes analysis of closed captioning data and human monitoring so as to provide a business with a full understanding of advertising and editorial effectiveness. A user provides media search parameters via a network, and a near real-time hit list is produced and presented to the requesting user. Options for previewing and purchasing matching media segments are presented, along with corresponding reports and coverage analyses. Previewing can occur via a streamed video format, whereas purchasing allows for high quality video download. Reports include information about how the product was conveyed, audience watching, and value to the business. Reports can be created by the system or by the user, formatted for presentation, purchased, and downloaded.

Proceedings ArticleDOI
Yunxin Zhao, X. Zhang, Rusheng Hu, Jian Xue, Xiaolong Li, L. Che, Rong Hu, L. Schopp
14 May 2006
TL;DR: The captioning system is based on the state-of-the-art technology of large vocabulary conversational speech recognition, encompassing speech stream separation, acoustic modeling, language modeling, real-time decoding, confidence annotation, and human-computer interface.
Abstract: In this paper, we present a first exposition of an automatic closed captioning system designed to assist hearing impaired users in telemedicine. This system automatically separates telehealth conversation speech between a health care provider and a client into two streams and provides real-time captions of health care provider's speech to client. The captioning system is based on the state-of-the-art technology of large vocabulary conversational speech recognition, encompassing speech stream separation, acoustic modeling, language modeling, real-time decoding, confidence annotation, and human-computer interface, with innovations made in several components. The system currently handles a vocabulary size over 46 K. Real-time captioning performance at the average word accuracy of 77.95% is reported.

Book ChapterDOI
11 Sep 2006
TL;DR: In this article, the authors described a LVCSR system for automatic online subtitling (closed captioning) of TV transmissions of the Czech Parliament meetings based on Hidden Markov Models, lexical trees and bigram language model.
Abstract: This paper describes an LVCSR system for automatic online subtitling (closed captioning) of TV transmissions of the Czech Parliament meetings. The recognition system is based on Hidden Markov Models, lexical trees and a bigram language model. The acoustic model is trained on 40 hours of parliament speech and the language model on more than 10M tokens of parliament speech transcriptions. The first part of the article is focused on text normalization and class-based language model preparation. The second part describes the recognition network and its decoding with respect to real-time operation demands using up to a 100k vocabulary. The third part outlines the application framework allowing generation and displaying of subtitles for any audio/video source. Finally, experimental results obtained on parliament speeches with recognition accuracy varying from 80 to 95% (according to the discussed topic) are reported and discussed.
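The system itself uses lexical trees and a class-based bigram model trained on more than 10M tokens; purely as a toy illustration of the underlying model form, a maximum-likelihood bigram estimate over a tiny made-up corpus looks like this (real systems smooth these counts and back off):

```python
from collections import Counter

corpus = "the chamber will now vote the chamber approves the motion".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1: str, w2: str) -> float:
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "chamber"))   # 2 of the 3 occurrences of "the" precede "chamber"
```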

Patent
Fernando Incertis Carro
31 Jan 2006
TL;DR: In this article, the authors present a system, method and computer program for enabling a user receiving a broadcast program on a program receiver to download on his/her computer device time-stamped captions and synchronization data related to the received broadcast program, and to autonomously generate and display on his/her computer device a caption stream synchronized with the display of the broadcast program on the program receiver.
Abstract: The present invention relates to a system, method and computer program for enabling a user receiving a broadcast program on a program receiver to download on his/her computer device time-stamped captions and synchronization data related to the received broadcast program, and to autonomously generate and display on his/her computer device a caption stream synchronized with the display of the broadcast program on the program receiver.

Patent
27 Jan 2006
TL;DR: In this paper, a method and apparatus for processing closed caption information associated with a video program by identifying a parameter associated with the video program; and, formatting the appearance of the caption information in response to the identified parameter.
Abstract: A method and apparatus for processing closed caption information associated with a video program by identifying a parameter associated with the video program and formatting the appearance of the closed caption information in response to the identified parameter. The parameter may comprise genre information, and may be identified from program and system information protocol signals, extended data service information, or program guide data.
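A minimal sketch of the formatting step, with an entirely hypothetical genre-to-style table (the patent leaves the concrete formatting choices open), might be:

```python
# Hypothetical mapping from identified genre to caption styling.
CAPTION_STYLES = {
    "news":   {"font": "sans-serif", "size": 24, "background": "opaque"},
    "movie":  {"font": "serif",      "size": 28, "background": "translucent"},
    "sports": {"font": "sans-serif", "size": 22, "background": "none"},
}
DEFAULT_STYLE = {"font": "sans-serif", "size": 24, "background": "opaque"}

def style_for_program(genre: str) -> dict:
    """Pick a caption style from the genre identified in PSIP/XDS/guide data."""
    return CAPTION_STYLES.get(genre.lower(), DEFAULT_STYLE)

print(style_for_program("Movie"))
```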

Proceedings ArticleDOI
17 Sep 2006
TL;DR: In this paper, a new method is proposed to detect speech segments online while identifying gender attributes for efficient dual gender-dependent speech recognition and broadcast news captioning. The proposed method operates with only a very small delay from the audio input.
Abstract: This paper describes a new method to detect speech segments online while identifying gender attributes, for efficient dual gender-dependent speech recognition and broadcast news captioning. The proposed online speech detection performs dual-gender phoneme recognition and detects a start-point and an end-point based on the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood, with a very small delay from the audio input. While obtaining the speech segments, the phoneme recognizer also identifies gender attributes with high discrimination in order to guide the subsequent dual-gender continuous speech recognizer efficiently. As soon as the start-point is detected, the continuous speech recognizer with parallel gender-dependent acoustic models starts a search and allows search transitions between male and female in a speech segment based on the gender attributes. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that the proposed speech detection method was effective in reducing the false rejection rate from 4.6% to 0.53%, and also reduced recognition errors in comparison with a conventional method using adaptive energy thresholds. It was also effective in identifying the gender attributes, with a correct rate of 99.7% of words. With the new speech detection and the gender identification, the proposed dual-gender speech recognition significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost feasible for real-time operation.
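The paper's detector thresholds the ratio between cumulative phoneme and non-speech likelihoods; the sketch below is a simplified CUSUM-style variant of that idea in log-likelihood form, with all threshold values and per-frame scores invented for illustration:

```python
def detect_segment(speech_ll, nonspeech_ll, start_thr=3.0, end_thr=3.0):
    """Return (start_frame, end_frame) from per-frame log-likelihoods.

    While waiting for speech, evidence for speech is accumulated (floored at
    zero); once a start-point is found, evidence for non-speech is accumulated
    the same way to locate the end-point. Thresholds are illustrative only.
    """
    acc, start = 0.0, None
    for t, (s, n) in enumerate(zip(speech_ll, nonspeech_ll)):
        if start is None:
            acc = max(acc + (s - n), 0.0)
            if acc > start_thr:
                start, acc = t, 0.0
        else:
            acc = max(acc + (n - s), 0.0)
            if acc > end_thr:
                return start, t
    return start, None

# Frames 3-8 score higher under the speech model in this made-up example.
speech    = [0.1, 0.2, 0.1, 2.0, 2.2, 2.1, 2.0, 1.9, 2.0, 0.1, 0.1, 0.1, 0.1, 0.1]
nonspeech = [1.0, 1.1, 1.0, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2, 1.2, 1.3, 1.2, 1.1, 1.2]
print(detect_segment(speech, nonspeech))   # detected with a short delay: (4, 11)
```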

Journal ArticleDOI
TL;DR: The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.
Abstract: The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents students and staff to provide captioning of speech online or in classrooms for deaf or hard of hearing students and assist blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded real speech is also discussed and evaluated. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

Patent
Larry B. Pearson, Edward Walter
29 Mar 2006
TL;DR: In this article, a method includes receiving at a set-top box a broadcast signal including closed-captioning content including subtitles related to the broadcast signal and uniform resource locator (URL) data.
Abstract: A method includes receiving at a set-top box a broadcast signal including closed-captioning content. The closed-captioning content includes subtitles related to the broadcast signal and uniform resource locator (URL) data. The URL data is extracted from the closed-captioning content. The URL data is stored in a memory of the set-top box.
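A toy sketch of the extraction step, using a simple regular expression and a plain list standing in for the set-top box's memory, could look like this:

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

def extract_urls(caption_text: str):
    """Pull URL strings out of decoded closed-captioning text."""
    return URL_PATTERN.findall(caption_text)

# A list stands in here for the set-top box memory named in the claims.
stored_urls = []
stored_urls.extend(extract_urls(
    "For tonight's recipe visit www.example.com/recipes [applause]"
))
print(stored_urls)   # ['www.example.com/recipes']
```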

Patent
09 Jun 2006
TL;DR: In this article, an apparatus and method for displaying auxiliary information associated with a multimedia program is presented, where users with differing abilities and/or preferences related to display of caption text may customize display of the caption information transmitted with television programs.
Abstract: An apparatus and method is presented for displaying auxiliary information associated with a multimedia program. Specifically, the present invention is directed to receiving program signals, acquiring (104) auxiliary information (e.g. closed captioning information) from the program signals, associating (106) time information with the auxiliary information, storing (108) the auxiliary information in a file associated with program content, using (110) the auxiliary information file and program content to determine candidate program portions for customization of the auxiliary information, and displaying (112) customized auxiliary information with the program content in accordance with user selections (see FIG. 1). Using the present invention, users with differing abilities and/or preferences related to display of caption text may customize display of the caption information transmitted with television programs.

Patent
11 Jan 2006
TL;DR: In this article, a content detecting device for a digital broadcast signal receiver or a recording apparatus that records the digital broadcast signals is presented. But the detection of a commercial based on information on presence or absence of one of a closed captioning broadcast and a data broadcast is not considered.
Abstract: A content detecting device for a digital broadcast signal receiver or a recording apparatus that records the digital broadcast signal. A program-related-information acquiring unit acquires program specific information and information for creating an electronic program guide and causes a memory to store the information. A detecting unit detects a commercial based on information on presence or absence of one of a closed captioning broadcast and a data broadcast and causes the memory to store detection information. A discriminating unit reads out the detection information and outputs a signal for distinguishing the program and the commercial. When information in the program specific information and information in the electronic program guide information in the memory contradict each other concerning presence or absence of one of a closed captioning broadcast and a data broadcast, the detecting unit causes the memory to store information indicating the detection of the commercial.

Proceedings ArticleDOI
11 Dec 2006
TL;DR: A method to automatically index each video segment of the television program by the principal video object using closed-caption text information using Quinlan's C4.5 decision-tree learning algorithm and the predicted accuracies of production rule indicators.
Abstract: This paper proposes a method for automatically generating a multimedia encyclopedia from video clips using closed-caption text information. The goal is to automatically index each video segment of the television program by the principal video object. We focus on several features of the closed-caption text style in order to identify the principal video objects. Using Quinlan's C4.5 decision-tree learning algorithm and the predicted accuracies of production rule indicators, one object noun is extracted for each video shot. To show the effectiveness of the method, we conducted experiments on the extraction of video segments in which animals appear in twenty television programs on animals and nature. We obtained a precision rate of 74.6 percent and a recall rate of 51.4 percent on the extraction of video segments in which animals appear, and generated a multimedia encyclopedia comprising 322 video clips showing 82 kinds of animals.
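The paper trains Quinlan's C4.5 on features of the closed-caption text style; purely as an illustration with a different but related learner, a CART-style tree from scikit-learn can be fit on a made-up feature set for candidate object nouns (all features and data below are invented):

```python
# Stand-in for the paper's C4.5 learner: scikit-learn's CART-style tree.
from sklearn.tree import DecisionTreeClassifier

# Per-noun features: [appears_in_title_line, mentions_in_shot_captions, is_sentence_subject]
X = [
    [1, 3, 1],
    [0, 1, 0],
    [1, 2, 1],
    [0, 1, 1],
    [0, 4, 0],
    [1, 1, 0],
]
y = [1, 0, 1, 0, 1, 0]   # 1 = noun is the principal video object of the shot

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict([[1, 2, 1]]))   # -> [1]: a frequently mentioned noun is labelled principal
```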

Patent
04 Jan 2006
TL;DR: In this paper, video navigation is provided where a video stream encoded with captioning is processed to locate captions that match a search string, and video playback is implemented from a point in the video stream near the located caption to thereby navigate to a scene in a program containing dialogue or descriptive sounds that most nearly matches the search string.
Abstract: Video navigation is provided where a video stream encoded with captioning is processed to locate captions that match a search string. Video playback is implemented from a point in the video stream near the located caption to thereby navigate to a scene in a program containing dialogue or descriptive sounds that most nearly matches the search string.
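A minimal sketch of that navigation step, assuming the caption cues have already been decoded into (start_time, text) pairs, is a linear scan for the first cue containing the search string, backed up by a short lead-in so the matching dialogue is not clipped:

```python
def find_scene(cues, query, lead_in=3.0):
    """Return a playback start time for the first caption containing `query`.

    cues: list of (start_sec, text) pairs in time order; lead_in seconds of
    slack are subtracted so playback begins just before the matching dialogue.
    """
    q = query.lower()
    for start, text in cues:
        if q in text.lower():
            return max(0.0, start - lead_in)
    return None

cues = [(10.0, "We have lift-off."), (95.5, "Houston, we have a problem.")]
print(find_scene(cues, "we have a problem"))   # 92.5
```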

Journal ArticleDOI
TL;DR: This paper defines the roles of parties involved in the passage of closed-captioning legislation and highlights how social forces were successful in passing legislation beneficial to the deaf and hearing-impaired community.
Abstract: In 1990 the United States Congress approved the Television Decoder Circuitry Act, which mandated that all television sets 13 inches or larger for sale in the United States be manufactured with caption‐decoding microchips. This legislation allowed millions of deaf and hearing‐impaired people throughout the US access to captions on commercials and television programs. Access to technology is one determinant of who can participate in the social, cultural, political and economic facets of a society. Scholars recognize that communication processes in the public sphere often are unbalanced. Access to media outlets creates a gap between those with media power and those without. Using a contextual analysis framework supported by a social model of disability, this paper defines the roles of parties involved in the passage of closed‐captioning legislation and highlights how social forces were successful in passing legislation beneficial to the Deaf and Hearing‐impaired community.

01 Jun 2006
TL;DR: This presentation will explain and demonstrate how automatic speech recognition can enhance the quality of learning and teaching and help ensure that both face-to-face learning and e-learning are accessible to all through the cost-effective production of synchronised and captioned multimedia.
Abstract: Lectures can present barriers to learning for many students and although online multimedia materials have become technically easier to create and offer many benefits for learning and teaching, they also can be difficult to access, manage, and exploit. This presentation will explain and demonstrate how automatic speech recognition can enhance the quality of learning and teaching and help ensure that both face-to-face learning and e-learning are accessible to all through the cost-effective production of synchronised and captioned multimedia. This approach can: support preferred learning and teaching styles and assist those who, for cognitive, physical or sensory reasons, find notetaking difficult; assist learners to manage and search online digital multimedia resources; provide automatic captioning of speech for deaf learners, or for any learner when speech is not available or suitable; assist blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded real speech; and assist reflection by teachers and learners to improve their spoken communication skills.

Patent
15 Mar 2006
TL;DR: In this paper, a system for providing closed-caption text in a movie theater wherein the caption information is only visible to those patrons wishing to view the text, comprising a projector projecting a plurality of light applications onto a predetermined strip of light in a viewing area, encoding the caption text so that it will only be viewed within the viewing area and decoding the encoded text.
Abstract: A system for providing closed-caption text in a movie theater wherein the caption information is only visible to those patrons wishing to view the text, comprising a projector projecting a plurality of light applications onto a predetermined strip of light in a viewing area, encoding the caption text so that it will only be viewed within the predetermined strip of light in the viewing area and decoding the encoded text so that it will be viewed only by those patrons wishing to view the text.


01 Feb 2006
TL;DR: This document specifies an RTP payload format for the transmission of 3GPP (3rd Generation Partnership Project) timed text, a time-lined, decorated text media format with defined storage in a 3GP file.
Abstract: This document specifies an RTP payload format for the transmission of 3GPP (3rd Generation Partnership Project) timed text. 3GPP timed text is a time-lined, decorated text media format with defined storage in a 3GP file. Timed Text can be synchronized with audio/video contents and used in applications such as captioning, titling, and multimedia presentations. In the following sections, the problems of streaming timed text are addressed, and a payload format for streaming 3GPP timed text over RTP is specified. [STANDARDS-TRACK]
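The internal structure of the 3GPP timed-text payload is defined by the RFC and is not modelled here; as a minimal sketch of the RTP framing side only, the snippet below packs the 12-byte RTP fixed header from RFC 3550 around an opaque payload, with payload type 96 assumed as a dynamically negotiated value:

```python
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96, marker: bool = False) -> bytes:
    """Prepend the 12-byte RTP fixed header (RFC 3550) to an opaque payload.

    Payload type 96 stands in for a dynamic PT negotiated via SDP; the
    timed-text payload structure defined by the RFC is not modelled.
    """
    byte0 = 2 << 6                                   # version 2, no padding/extension, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

pkt = rtp_packet(b"Hello, captions", seq=1, timestamp=90000, ssrc=0x1234ABCD)
print(len(pkt), pkt[:12].hex())   # 27 bytes total, followed by the header bytes
```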

Proceedings ArticleDOI
21 May 2006
TL;DR: Results show that while the Fischlar-TRECVid-2004 system combining text- and image-based searching achieves greater retrieval effectiveness, users make more varied and extensive queries with the image-only search version.
Abstract: The Fischlar-TRECVid-2004 system was developed for Dublin City University's participation in the 2004 TRECVid video information retrieval benchmarking activity. The system allows search and retrieval of video shots from over 60 hours of content. The shot retrieval engine employed is based on a combination of query text matched against spoken dialogue combined with image-image matching, where a still image (sourced externally) or a keyframe (from within the video archive itself) is matched against all keyframes in the video archive. Three separate text retrieval engines are employed for closed caption text, automatic speech recognition and video OCR. Visual shot matching is primarily based on MPEG-7 low-level descriptors. The system supports relevance feedback at the shot level, enabling augmentation and refinement using relevant shots located by the user. Two variants of the system were developed: one that supports both text- and image-based searching and one that supports image-only search. A user evaluation experiment compared the use of the two systems. Results show that while the system combining text- and image-based searching achieves greater retrieval effectiveness, users make more varied and extensive queries with the image-only search version.

Proceedings Article
24 May 2006
TL;DR: The real-time closed-captioning system is based on a class-based language model designed after analysis of training data and OOV words in new (till now unseen) commentaries, with the goal of decreasing the OOV (Out-Of-Vocabulary) rate and increasing recognition accuracy.
Abstract: This article describes the real-time speech recognition system for closed-captioning of TV ice-hockey commentaries. Automatic transcription of TV commentary accompanying an ice-hockey match is usually a hard task due to the spontaneous speech of a commentator put often into a very loud background noise created by the public, music, siren, drums, whistle, etc. Data for building this system was collected from 41 matches that were played during World Championships in years 2000, 2001, and 2002 and were transmitted by the Czech TV channels. The real-time closed-captioning system is based on the class-based language model designed after careful analysis of training data and OOV words in new (till now unseen) commentaries with the goal to decrease an OOV (Out-Of-Vocabulary) rate and increase recognition accuracy.
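Since the abstract is primarily about driving down the OOV rate of the captioning vocabulary, a tiny helper for measuring that rate against a recognizer vocabulary (the vocabulary and commentary text below are invented) might read:

```python
def oov_rate(transcript_words, vocabulary) -> float:
    """Fraction of running words not covered by the recognizer's vocabulary."""
    vocab = set(vocabulary)
    misses = sum(1 for w in transcript_words if w not in vocab)
    return misses / len(transcript_words) if transcript_words else 0.0

vocab = {"the", "puck", "goal", "save", "penalty", "period", "shot", "by", "from"}
words = "great save by the goalie after the shot from the blue line".split()
print(f"OOV rate: {oov_rate(words, vocab):.1%}")   # player names and rare terms drive this up
```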

Patent
13 Nov 2006
TL;DR: In this paper, the authors present an approach for remotely controlling an image captioning device from a portable communication device by sending MMI data (SC1, VS) associated with a screen.
Abstract: The present invention is directed towards remotely controlling an image captioning device (16) from a portable communication device (10). MMI data (SC1, VS) is sent from the image captioning device to the portable communication device and includes at least one set of screen data (SC1) related to a screen intended for presentation in the portable communication device. The set comprises selectable commands related to the screen, format information specifying the format in which the commands are to be transferred, and possibly a pointer to a further screen. The MMI data for one screen includes media data (VS). If an image captioning command associated with a screen is selected and sent from the portable communication device to the image captioning device, an image (I) captured by the captioning device is returned.