
Showing papers on "Closed captioning" published in 2000


Patent
22 May 2000
TL;DR: In this paper, a five-step process is described for producing closed captions for a television program, subtitles for a movie, or other time-aligned transcripts, where the first step consists of identifying the portions of the input audio that contain spoken text.
Abstract: Disclosed is a five-step process for producing closed captions for a television program, subtitles for a movie, or other time-aligned transcripts. An operator transcribes the audio track while listening to the recorded material, and the system helps him or her work efficiently and produce precisely aligned captions. The first step consists of identifying the portions of the input audio that contain spoken text. Only the spoken parts are further processed by the system; the other parts may be used to generate non-spoken captions. The second step controls the rate of speech playback depending on how fast the operator types. While the operator types, the third module records the time at which each word was typed, providing a rough time alignment for the transcribed text. The fourth module then precisely realigns the transcribed text with the audio track. A final module segments the transcribed text into captions, based on acoustic clues and natural language constraints. Further, the speech rate-control component of the system may be used in other systems where transcripts must be generated from spoken audio.
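The five steps map naturally onto a staged pipeline. Below is a minimal sketch of steps 3 and 5 (rough keystroke-time alignment and caption segmentation); all names and thresholds are illustrative assumptions, not the patent's actual modules:

```python
from dataclasses import dataclass

@dataclass
class Caption:
    start: float  # seconds into the audio track
    end: float
    text: str

def segment_into_captions(aligned_words, max_chars=32, max_gap=1.0):
    """Step 5, simplified: break the roughly aligned transcript into
    captions at long pauses or when a caption line grows too long."""
    captions, line, line_start, prev_t = [], [], None, None
    for word, t in aligned_words:
        if line and (t - prev_t > max_gap or
                     len(" ".join(line + [word])) > max_chars):
            captions.append(Caption(line_start, prev_t, " ".join(line)))
            line, line_start = [], None
        if line_start is None:
            line_start = t
        line.append(word)
        prev_t = t
    if line:
        captions.append(Caption(line_start, prev_t, " ".join(line)))
    return captions

# Step 3 output: each word paired with the time (sec) it was typed.
typed = [("the", 0.2), ("weather", 0.7), ("today", 1.1),
         ("is", 2.8), ("mostly", 3.2), ("sunny", 3.6)]
for cap in segment_into_captions(typed, max_chars=18):
    print(f"{cap.start:4.1f}-{cap.end:4.1f}  {cap.text}")
```

In the patent, step 4 then snaps these rough keystroke times onto the audio itself; the pause-based split here is only a stand-in for its acoustic and natural-language segmentation cues.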

81 citations


Journal ArticleDOI
TL;DR: It was found that the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process.
Abstract: The eye movements of six subjects were recorded as they watched video segments with and without captions. It was found that the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process. Further, although people viewing a specific video segment are likely to have similar eye movement patterns, there are also distinct individual differences in these patterns. For example, someone accustomed to speechreading may spend more time looking at an actor's lips, while someone with poor English skills may spend more time reading the captions. Finally, there is some preliminary evidence to suggest that a higher captioning speed results in more time spent reading captions on a video segment.

74 citations


Patent
30 Nov 2000
TL;DR: In this paper, a system for finding URLs for sites having information related to topics in a video presentation is described: an extractor extracts closed-caption (CC) text from the video presentation, a parser parses the CC text for topic language, and a search function uses the topic language from the parser as search criteria.
Abstract: A system for finding URLs for sites having information related to topics in a video presentation has an extractor extracting closed-caption (CC) text from the video presentation, a parser parsing the CC text for topic language, and a search function using the topic language from the parser as search criteria. The search function searches for Web sites having information matching the topic language, returns URLs for the Web sites found, and associates the URLs with the topic language. In some cases there is a hyperlink generator for creating hyperlinks to the Web sites returned, and the system displays the hyperlinks with a display of the video presentation. In a preferred embodiment the video presentation is provided in a first window in the display, thumbnails are displayed in a second window, each thumbnail representing a new topic, and the hyperlinks are displayed in a third window. The hyperlinks are displayed in the third window when the video presentation in the first window is in the particular topic related to the hyperlinks, or when a user does a mouseover of a thumbnail representing the topic to which the hyperlinks are related.
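A rough sketch of the extract/parse/search loop described above; the topic parser here is a crude capitalization heuristic and search_web is a placeholder stub, not the patent's actual search function:

```python
import re

def extract_topic_language(cc_text):
    """Crude stand-in for the patent's parser: treat runs of two or
    more capitalized words in the caption text as candidate topics."""
    return [m.strip() for m in re.findall(r"(?:[A-Z][a-z]+\s?){2,}", cc_text)]

def search_web(topic):
    """Placeholder: a real system would query a search engine here and
    return the URLs of matching Web sites."""
    return ["http://example.com/search?q=" + topic.replace(" ", "+")]

def topics_to_urls(cc_text):
    """Associate each parsed topic with the URLs returned for it."""
    return {topic: search_web(topic) for topic in extract_topic_language(cc_text)}

cc = "Breaking news: Hurricane Season is approaching the Gulf Coast tonight."
print(topics_to_urls(cc))
# {'Hurricane Season': ['http://...'], 'Gulf Coast': ['http://...']}
```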

68 citations


Journal ArticleDOI
TL;DR: The Informedia Digital Video Library system as mentioned in this paper extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly identify whether a retrieved TV news story is indeed relevant to their query.
Abstract: The Informedia Digital Video Library system extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly identify whether a retrieved TV news story is indeed relevant to their query. This article highlights two unique features: named faces and location analysis. Named faces automatically associate a name with a face, while location analysis allows the user to visually follow the action in the news story on a map and also allows queries for news stories by graphically selecting a region on the map. The Informedia Digital Video Library project [1], initiated in 1994, uniquely utilizes integrated speech, image and natural language understanding to process broadcast video. The project's goal is to allow search and retrieval in the video medium, similar to what is available today for text only. To enable this access to video, fast, high-accuracy automatic transcriptions of broadcast news stories are generated through Carnegie Mellon's Sphinx speech recognition system, and closed captions are incorporated where available. Image processing determines scene boundaries, recognizes faces and allows for image similarity comparisons. Text visible on the screen is recognized through video OCR and can be searched. Everything is indexed into a searchable digital video library [2], where users can ask queries and retrieve relevant news stories as results.
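The search-and-retrieval core reduces to indexing the extracted metadata. A toy sketch over invented story records (nothing here reflects Informedia's actual implementation):

```python
from collections import defaultdict

# Toy metadata records in the spirit of the extracted data: each story
# carries a transcript (speech recognition or closed captions), names
# attached to faces, and recognized locations. All values are invented.
stories = [
    {"id": "news-001", "transcript": "flooding along the mississippi river",
     "faces": ["jane doe"], "locations": ["Mississippi"]},
    {"id": "news-002", "transcript": "election results announced tonight",
     "faces": [], "locations": ["Washington"]},
]

index = defaultdict(set)
for story in stories:
    fields = [story["transcript"], *story["faces"], *story["locations"]]
    for token in " ".join(fields).lower().split():
        index[token].add(story["id"])

def query(*terms):
    """Return the ids of stories matching every query term."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*hits) if hits else set()

print(query("mississippi"))          # {'news-001'}
print(query("election", "tonight"))  # {'news-002'}
```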

63 citations


Patent
David J. Marsh
21 Apr 2000
TL;DR: In this article, a time-dependent content buffering arrangement is proposed for selecting candidate television and multimedia programs for recording, recording the candidate programs, viewing the recorded programs, and archiving the recorded programs.
Abstract: Improved methods and arrangements are provided for use in selecting candidate television and multimedia programs for recording, recording the candidate programs, viewing the recorded programs, and archiving the recorded programs. At the center of this capability is a time-dependent content buffering arrangement that allows candidate programs to be selected by an intelligent content agent, with the assistance of a bubbling agent, an electronic program guide, a select library listing, and/or a personal profile associated with a particular user. The buffering arrangement selectively records candidate programs in a non-circular manner. Candidate programs may be dropped during recording based on certain information associated with the program. For example, examination of closed captioning information may reveal that the candidate program does not match the initial criteria for making it a candidate program. The buffering arrangement also allows the user to selectively view recorded programs on demand and/or archive certain programs. Archived programs are maintained locally or otherwise stored to another medium. Those recorded programs that are not archived will be erased in a time-dependent manner when a defined storage capacity is reached. The buffering arrangement also provides for feedback to various intelligent candidate-selecting agents, such as an intelligent content agent and a bubbling agent.
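The caption-based dropping rule can be pictured as a keyword check over a sliding caption window. A minimal sketch under assumed names and data (the real agents and criteria are more elaborate):

```python
def matches_criteria(cc_window, keywords, min_hits=2):
    """Do the recent closed captions still look like the program the
    agent selected? Count keyword hits in the caption window."""
    text = cc_window.lower()
    return sum(text.count(k) for k in keywords) >= min_hits

def record_candidate(cc_stream, keywords):
    """Record chunks, but drop the candidate as soon as the captions
    reveal it no longer matches the initial criteria."""
    kept = []
    for chunk, cc_window in cc_stream:  # (video chunk, caption text) pairs
        if not matches_criteria(cc_window, keywords):
            print("candidate dropped: captions no longer match criteria")
            return None
        kept.append(chunk)
    return kept

stream = [("chunk0", "tonight on cooking with fire we grill salmon"),
          ("chunk1", "the grill should be hot before the salmon goes on"),
          ("chunk2", "now a word from our sponsors about car insurance")]
print(record_candidate(stream, keywords=["grill", "salmon"]))
```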

60 citations


Proceedings ArticleDOI
01 Jun 2000
TL;DR: A 2-pass decoder is described that progressively outputs the latest available results and is used for real-time closed captioning of Japanese broadcast news; it practically eliminates the disadvantage of multiple-pass decoders, which delay a decision until the end of a sentence.
Abstract: This paper describes a 2-pass decoder that progressively outputs the latest available results, used for real-time closed captioning of Japanese broadcast news. The decoder practically eliminates the disadvantage of multiple-pass decoders, which delay a decision until the end of a sentence. During the first pass of the search, the proposed decoder periodically executes the second pass, which rescores partial N-best word sequences up to that time. If the rescored best word sequence has words in common with the previous one, that part is regarded as likely to be correct and is fixed as part of the final result. This method is not theoretically optimal but makes a quick response with a negligible increase in word errors. In a recognition experiment on Japanese broadcast news, the decoder worked with an average decision delay of 554 msec for each word and degraded word accuracy by only 0.22%.
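One reading of the commit rule: whatever prefix two successive rescored best hypotheses agree on is emitted as final output. A toy sketch with invented hypotheses (the actual decoder operates on N-best word sequences from lattice rescoring):

```python
def committed_prefix(prev_best, curr_best):
    """Words shared at the start of two successive rescored hypotheses
    are regarded as stable and can be emitted as final captions."""
    out = []
    for a, b in zip(prev_best, curr_best):
        if a != b:
            break
        out.append(a)
    return out

# Successive best partial hypotheses from periodic second-pass rescoring:
h1 = ["the", "prime", "minister", "said"]
h2 = ["the", "prime", "minister", "sad", "today"]   # tail still unstable
h3 = ["the", "prime", "minister", "said", "today", "that"]

print(committed_prefix(h1, h2))  # ['the', 'prime', 'minister'] -> emit now
print(committed_prefix(h2, h3))  # tail stays pending until it stabilizes
```

This is exactly the latency/accuracy trade the abstract describes: stable words go out within a fraction of a second instead of waiting for the end of the sentence, at the cost of occasionally freezing a word the full-sentence pass would have revised.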

44 citations


Patent
31 May 2000
TL;DR: In this article, an automated video editing and indexing system is presented, which has application for such tasks as creating a presentation for a video magazine and processing collections of video material in general.
Abstract: An automated video editing and indexing system is taught which has application for such tasks as creating a presentation for a video magazine and processing collections of video material in general. The system extracts text either from closed captions (CC) contained in the analog video presentation or from the speech contained in the presentation by voice-to-text techniques, time-stamps the CC text in text files according to position in the video presentation, and digitizes the video presentation. The text files and digitized video are sent to an editing station, where the CC text is analyzed using natural language processing and other techniques to determine topic changes in the presentation. Keyframes are selected to represent the topic changes and become thumbnails useful in indexing, indexing here meaning marking the video material at points of topic change and, in some cases, jumping the video presentation to the positions represented by the thumbnails. In some cases selected CC text is associated with the thumbnails and displayed in the video magazine as each thumbnail is selected by mouseover.
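Topic-change detection over time-stamped CC text might look like the following sketch, which substitutes a simple vocabulary-overlap test for the patent's natural language processing; the thresholds and block structure are assumptions:

```python
def vocabulary(text):
    return set(w for w in text.lower().split() if len(w) > 3)

def topic_boundaries(cc_blocks, threshold=0.2):
    """Mark a topic change where consecutive time-stamped caption blocks
    share too little vocabulary (Jaccard similarity below threshold)."""
    boundaries = []
    for prev, curr in zip(cc_blocks, cc_blocks[1:]):
        a, b = vocabulary(prev["text"]), vocabulary(curr["text"])
        sim = len(a & b) / len(a | b) if a | b else 1.0
        if sim < threshold:
            boundaries.append(curr["time"])  # select a keyframe/thumbnail here
    return boundaries

blocks = [
    {"time": 0.0,  "text": "the storm moved inland overnight flooding roads"},
    {"time": 30.0, "text": "flooding closed several roads near the coast"},
    {"time": 60.0, "text": "in sports the home team won its opening game"},
]
print(topic_boundaries(blocks))  # [60.0] -> thumbnail marks the new topic
```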

43 citations


Proceedings ArticleDOI
30 Jul 2000
TL;DR: This work presents an efficient technique for temporal segmentation and parsing of news recordings based on visual cues that can either be employed as a stand-alone application for non-closed captioned broadcasts or integrated with audio and textual cues of existing systems.
Abstract: Automatic content-based analysis and indexing of broadcast news recordings or digitized news archives is becoming an important tool in the framework of many multimedia interactive services such as news summarization, browsing, retrieval and news-on-demand (NoD) applications. Existing approaches have achieved high performance in such applications but heavily rely on textual cues such as closed caption tokens and teletext transcripts. We present an efficient technique for temporal segmentation and parsing of news recordings based on visual cues that can either be employed as a stand-alone application for non-closed captioned broadcasts or integrated with audio and textual cues of existing systems. The technique involves robust face detection by means of color segmentation, skin color matching and shape processing, and is able to identify typical news instances like anchor persons, reports and outdoor shots.
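The skin-color matching stage can be approximated with a simple chromaticity test. A sketch with illustrative thresholds (the paper's actual color segmentation and shape processing are more involved):

```python
def is_skin(r, g, b):
    """Crude skin-color test in normalized rg-chromaticity space, with a
    brightness floor. The thresholds are illustrative only."""
    total = r + g + b
    if total <= 300:          # too dark for reliable chromaticity
        return False
    rn, gn = r / total, g / total
    return 0.36 < rn < 0.47 and 0.28 < gn < 0.36 and r > g > b

def skin_ratio(frame):
    """Fraction of pixels matching skin color; a high ratio in the
    expected screen region suggests an anchor-person shot."""
    pixels = [p for row in frame for p in row]
    return sum(is_skin(*p) for p in pixels) / len(pixels)

# A toy 2x2 "frame": two skin-like pixels, sky, and dark clothing.
frame = [[(205, 160, 125), (198, 152, 118)],
         [( 90, 140, 200), ( 30,  25,  20)]]
print(f"skin ratio: {skin_ratio(frame):.2f}")  # 0.50
```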

40 citations


PatentDOI
TL;DR: In this article, a method and apparatus is provided to enable a user watching and/or listening to a program to search for new information in the stream of a telecommunications data, which can be accomplished by either changing the viewer's station or by bringing in a split screen display forward into the display.
Abstract: A method and apparatus is provided to enable a user watching and/or listening to a program to search for new information in a stream of telecommunications data. The apparatus includes a voice recognition system that recognizes the user's request and causes a search to be performed in the data stream of at least one other telecommunications channel. The system includes a storage device for storing and processing the request. Upon recognition of the request, the incoming signal or signals are scanned for matches with the request. Upon finding a match between the request and the incoming signal, information related to the data is brought to the viewer's attention. This can be accomplished either by changing the viewer's station or by bringing a split-screen display forward in the display.
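The scan step amounts to matching the recognized request against text carried on other channels (for example their caption streams). A minimal sketch with invented data:

```python
def scan_channels(request, channel_text):
    """Scan the text carried on other channels for the recognized
    request; return the first channel mentioning every request term."""
    terms = request.lower().split()
    for channel, text in channel_text.items():
        lowered = text.lower()
        if all(term in lowered for term in terms):
            return channel  # switch station or bring split screen forward
    return None

# Recognized user request (output of the voice recognition front end):
request = "hockey score"
channel_text = {
    5:  "coming up next the weather for the weekend",
    12: "and the final hockey score tonight is three to two",
}
match = scan_channels(request, channel_text)
print(f"match on channel {match}" if match else "no match yet")
```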

39 citations


Journal ArticleDOI
TL;DR: This paper describes an MT system that translates the closed captions that accompany most North American television broadcasts, and presents a fully automatic large-scale multilingual MT system, ALTo, based on Whitelock's Shake and Bake MT paradigm.
Abstract: Traditional Machine Translation (MT) systems are designed to translate documents. In this paper we describe an MT system that translates the closed captions that accompany most North American television broadcasts. This domain has two identifying characteristics. First, the captions themselves have properties quite different from the type of textual input that many MT systems have been designed for. This is due to the fact that captions generally represent speech and hence contain many of the phenomena that characterize spoken language. Second, the operational characteristics of the closed-caption domain are also quite distinctive. Unlike most other translation domains, the translated captions are only one of several sources of information that are available to the user. In addition, the user has limited time to comprehend the translation since captions only appear on the screen for a few seconds. In this paper, we look at some of the theoretical and implementational challenges that these characteristics pose for MT. We present a fully automatic large-scale multilingual MT system, ALTo. Our approach is based on Whitelock's Shake and Bake MT paradigm, which relies heavily on lexical resources. The system currently provides wide-coverage translation from English to Spanish. In addition to discussing the design of the system, we also address the evaluation issues that are associated with this domain and report on our current performance.
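As a toy illustration of the lexicalist Shake and Bake idea (not the ALTo system itself): translation looks up a bag of target lexical entries, then generation searches for an ordering that satisfies target-language constraints. The lexicon and the constraints below are invented for the example:

```python
from itertools import permutations

# Toy bilingual lexicon (English -> Spanish), with a coarse POS tag.
LEXICON = {
    "the": ("el", "det"), "dog": ("perro", "noun"),
    "white": ("blanco", "adj"), "barks": ("ladra", "verb"),
}

def target_ok(tagged):
    """Tiny stand-in for target-language constraints: in Spanish the
    determiner precedes the noun and the adjective follows it."""
    tags = [t for _, t in tagged]
    return (tags.index("det") < tags.index("noun") < tags.index("adj")
            and tags.index("noun") < tags.index("verb"))

def shake_and_bake(english):
    # "Bag" of translated lexical entries (the shake)...
    bag = [LEXICON[w] for w in english.lower().split()]
    # ...then search for an ordering the target constraints accept (the bake).
    for order in permutations(bag):
        if target_ok(list(order)):
            return " ".join(w for w, _ in order)
    return None

print(shake_and_bake("the white dog barks"))  # el perro blanco ladra
```

The brute-force permutation search is only for illustration; the paper's emphasis is precisely on making this paradigm practical at scale with large lexical resources.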

31 citations


Proceedings ArticleDOI
30 Jul 2000
TL;DR: A telop-on-demand system is presented that automatically recognizes text in video frames to create the indices needed for content-based video browsing and retrieval, together with a method for structuring a video based on the text attributes.
Abstract: The paper presents a telop-on-demand system that automatically recognizes text in video frames to create the indices needed for content-based video browsing and retrieval. Superimposed texts are important as they provide semantic information about scene contents. Their attributes, such as font, size, and position in a frame, are important because they are carefully designed by the video editor and so reflect the intent of captioning. In news programs, for instance, the headline text is displayed in larger fonts than the subtitles. Our system takes into account not only the texts themselves but also their attributes for structuring videos. We describe: (i) novel methods for detecting and extracting text that are robust against the presence of complex backgrounds and intensity degradation of the character patterns, and (ii) a method for structuring a video based on the text attributes.
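Using text attributes to structure a video might look like the sketch below, where large fonts are read as headlines and small bottom-strip text as subtitles; the thresholds and telop records are illustrative assumptions:

```python
def classify_telop(telop, frame_height=480):
    """Classify a detected on-screen text region by its attributes."""
    if telop["height"] > 40:                 # large fonts: headline
        return "headline"
    if telop["y"] > 0.8 * frame_height:      # bottom strip: subtitle
        return "subtitle"
    return "other"

def structure_video(telops):
    """Use headline telops as section markers for browsing/indexing."""
    return [(t["time"], t["text"]) for t in telops
            if classify_telop(t) == "headline"]

telops = [
    {"time": 12.0,  "text": "ELECTION 2000",    "height": 48, "y": 40},
    {"time": 15.0,  "text": "polls close at 8", "height": 22, "y": 420},
    {"time": 310.0, "text": "WEATHER",          "height": 52, "y": 40},
]
print(structure_video(telops))  # [(12.0, 'ELECTION 2000'), (310.0, 'WEATHER')]
```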

Patent
27 Dec 2000
TL;DR: In this paper, a system and method are presented by which a user can select the language in which on-screen displays are shown and audio programs are broadcast on a receiver by making a single on-screen selection.
Abstract: A system and method by which a user can select the language in which on-screen displays are displayed and audio programs are broadcast on a receiver by making a single on-screen selection. The on-screen displays may include the user menu, closed captioning and/or teletext.

Patent
21 Dec 2000
TL;DR: In this article, a method and apparatus are presented for analyzing the closed-caption portion of a video signal for specific undesirable words or phrases, muting the audio portion of those words or phrases while leaving the video portion unaffected, and simultaneously modifying the closed-caption signal to display an acceptable word or phrase.
Abstract: A method and apparatus for analyzing the closed-caption portion of a video signal for specific undesirable words or phrases and then muting the audio portion of those words or phrases, while not affecting the video portion, and simultaneously modifying the closed-caption signal in order to display an acceptable word or phrase.
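The core mute-and-substitute logic is easy to sketch. The word list, timing model and replacement table below are illustrative assumptions, not the patent's:

```python
REPLACEMENTS = {"darn": "gosh", "heck": "hmph"}  # illustrative word list

def filter_captions(cc_words):
    """For each time-stamped caption word, decide whether to mute the
    audio over its interval and substitute an acceptable word in the
    caption, leaving the video untouched."""
    mute_spans, clean_words = [], []
    for word, start, end in cc_words:
        key = word.lower().strip(".,!?")
        if key in REPLACEMENTS:
            mute_spans.append((start, end))        # mute audio here
            clean_words.append(REPLACEMENTS[key])  # rewrite caption
        else:
            clean_words.append(word)
    return mute_spans, " ".join(clean_words)

cc = [("well", 1.0, 1.2), ("darn", 1.2, 1.5), ("it", 1.5, 1.6)]
spans, caption = filter_captions(cc)
print(spans)    # [(1.2, 1.5)] -> audio intervals to mute
print(caption)  # "well gosh it"
```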

Proceedings Article
01 Jan 2000
TL;DR: This application will provide the hearing impaired with an option to read captions for live broadcast programs, i.e., when off-line captioning is not feasible.
Abstract: A system for on-line generation of closed captions (subtitles) for broadcast of live TV-programs is described. During broadcast, a commentator formulates a possibly condensed, but semantically correct version of the original speech. These compressed phrases are recognized by a continuous speech recognizer, and the resulting captions are fed into the teletext system. This application will provide the hearing impaired with an option to read captions for live broadcast programs, i.e., when off-line captioning is not feasible. The main advantage in using a speech recognizer rather than a stenography-based system (e.g., Velotype) is the relaxed requirements for commentator training. Also, the amount of text generated by a system based on stenography tends to be large, thus making it harder to read.


01 Nov 2000
TL;DR: In this paper, Davis et al. provide questions and answers for postsecondary educational institutions concerning provision of access and accommodations to individuals who are deaf and hard-of-hearing.
Abstract: This booklet provides questions and answers for postsecondary educational institutions concerning provision of access and accommodations to individuals who are deaf and hard of hearing. Questions are about assistive listening devices (ALDs), C-Print technology, real-time captioning, and policy issues. Preliminary information concerns the mission and members of the Postsecondary Education Program Network. Questions address such concerns as use of ALDs by students with cochlear implants, use of ALDs in a conference table setting, use of an ALD by a student who has a learning disability or attention deficit disorder, differences between C-Print and real-time captioning, qualifications and training of the C-Print captionist, costs of C-Print software and hardware, recommended accuracy percentages for real-time captioning, use of real-time captioning in group learning environments, and use of real-time captioning with visually impaired students. Some policy issues addressed include making C-Print notes available to faculty and other students, eligibility for services, and determination of appropriate accommodations. (Providing Real-Time Captioning, C-Print Speech-to-Print Transcription, Assistive Listening Devices and Other Technologies: Questions and Answers; Cheryl Davis, Pamela Francis, Denese Harlan.)

T. Imai, A. Kobayashi, S. Sato, H. Tanaka, A. Ando 
01 Jan 2000
TL;DR: A speech recognition engine is described that progressively outputs the latest available word results, used for real-time closed captioning of Japanese broadcast news; its progressive 2-pass decoder practically eliminates the disadvantage of conventional multiple-pass decoders.
Abstract: This paper describes a speech recognition engine that progressively outputs the latest available word results, used for real-time closed captioning of Japanese broadcast news. The search engine, called a progressive 2-pass decoder, practically eliminates the disadvantage of conventional multiple-pass decoders, which delay a decision until the end of a sentence. During the first pass of the search, the proposed decoder periodically executes the second pass on the input received up to that time and fixes part of the final word sequence. This method is not theoretically optimal but makes a quick decision with a negligible increase in word errors. In a recognition experiment on Japanese broadcast news, the decoder worked with an average decision delay of 554 msec for each word and degraded word accuracy by only 0.22%.

Patent
15 Dec 2000
TL;DR: In this article, an apparatus for displaying closed-caption data on a wide digital TV is provided, comprising a microcomputer that passes a digital MPEG-compressed video/audio transport stream through a tuner and an analog/digital converter, or generates a graphic image in response to a user command.
Abstract: PURPOSE: An apparatus and method for displaying closed-caption data on a wide digital TV are provided to divide the TV screen into two parts and display the image signal on one part and the closed-caption data on the other, thereby simultaneously providing two services to the user. CONSTITUTION: The apparatus comprises a microcomputer for passing a digital MPEG-compressed video/audio transport stream through a tuner and an analog/digital converter, or for generating a graphic image in response to a user command; an MPEG audio/video decoder, containing a display unit, for decoding the MPEG-compressed transport stream and outputting the result composited with the graphic image; an encoder for encoding the video information according to the broadcast standard; and a digital/analog converter for converting the audio information into an analog sound signal and outputting it. Within the apparatus, the display unit comprises an image superimposing portion (301) for superimposing closed-caption data on the decoded video data, a 4:3 format converter (302) for converting a 16:9 image into a 4:3 image, a first image mixer (303) for displaying the 4:3 image on the entire screen and compositing the closed-caption data image so that the image and captions are displayed together, and a second image mixer (304) for dividing the TV screen into two parts and displaying the image on one part and the closed-caption data on the other.


Patent
12 Jul 2000
TL;DR: In this paper, a television set is described having the option of displaying a main and one or more auxiliary pictures, ratings control capability including means for an authorized person to limit the display of programs having ratings beyond a selected level, closed captioning capability, and a single stripper for slicing data from the vertical blanking interval (VBI).
Abstract: A television set having the option of displaying a main and one or more auxiliary pictures; ratings control capability, which includes means for an authorized person to limit the display of programs having ratings beyond a selected level; closed captioning capability, which includes means for a viewer to choose to display closed captioning along with a picture; and a single stripper for slicing data from the vertical blanking interval (VBI) portion of one or more broadcasts. The single stripper processes ratings control information for both the main picture and the auxiliary picture, when the ratings control option has been invoked, by cycling between main and auxiliary; the auxiliary picture capability is disabled when the closed captioning option has been invoked.


Journal ArticleDOI
01 Jul 2000
TL;DR: An instructional method was implemented among a group of junior high school students who are deaf, to train them to be attentive to certain types of information by facilitating their acquisition of knowledge and critical viewing; the results bring into question the issue of true accessibility and the utility of the format of captioning as presently developed.
Abstract: Adding captions to televised programs was a media modification to enable accessibility of television to people with restricted access to the audio components, such as those who are deaf or hard of hearing. Captioned television for the deaf is becoming quite common. In the United States, television captions are generally in written English; however, the English literacy rate among people who are deaf is quite low. Therefore, this research explored a way to make captioned television usable by its intended audience. An instructional method was implemented among a group of junior high school students who are deaf, to train them to be attentive to certain types of information by facilitating their acquisition of knowledge and critical viewing. The initial implementation of the training suggests a need for improving deaf students' access to their prior knowledge to apply to reading comprehension and language-related skills. The results of the two-week training, which produced no increase in captioning comprehension, bring into question the issue of true accessibility and the utility of the format of captioning as presently developed.

Proceedings ArticleDOI
08 Jun 2000
TL;DR: In this paper, the authors present several heuristic algorithms that attempt to maximize the revenue generated by an e-commerce merchant providing a service, tailored to deal with different customer service policies.
Abstract: Free bandwidth in television channels, available in the form of the vertical blanking interval (VBI), is currently being utilized to broadcast programme information, HTML pages and closed captioning. This bandwidth can also be used to broadcast data and products like video clips, software, multimedia packages, etc., thereby satisfying several customers with a single transmission at very high speeds. We present several heuristic algorithms that attempt to maximize the revenue earned by an e-commerce merchant providing such a service. These algorithms are tailored to deal with different customer service policies.
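One plausible member of such a family of heuristics is a greedy rule that ranks requested items by revenue per second of VBI airtime; the paper's actual algorithms and customer-service policies differ, so this is only an illustrative sketch:

```python
def greedy_schedule(requests, capacity_sec):
    """Greedy heuristic: broadcast the items with the highest revenue
    per second of VBI airtime first, until the window is full."""
    ranked = sorted(requests, key=lambda r: r["revenue"] / r["seconds"],
                    reverse=True)
    chosen, used, revenue = [], 0, 0
    for r in ranked:
        if used + r["seconds"] <= capacity_sec:
            chosen.append(r["item"])
            used += r["seconds"]
            revenue += r["revenue"]
    return chosen, revenue

# Invented requests: airtime needed and total revenue across all the
# customers a single broadcast of the item would satisfy.
requests = [
    {"item": "video clip", "seconds": 120, "revenue": 90},
    {"item": "software",   "seconds": 300, "revenue": 150},
    {"item": "multimedia", "seconds": 60,  "revenue": 60},
]
print(greedy_schedule(requests, capacity_sec=360))
# (['multimedia', 'video clip'], 150)
```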

Journal ArticleDOI
TL;DR: Several methods for presenting closed captions (captions) on TV news programs for hearing-impaired people were subjectively evaluated; two-line captions that changed all at once were rated the best.
Abstract: Several methods for presenting closed captions (captions) on TV news programs for hearing-impaired people were subjectively evaluated. A total of 180 captioned programs, produced by combining 36 presentation methods and 5 kinds of TV news material, were used for the experiment. The evaluations were done by 35 hearing-impaired people and 36 people with normal hearing. Three factors were evaluated on a scale of 1 to 5: overall goodness, caption quality and picture quality. Two-line captions that changed all at once were rated the best. Separating the caption area from the picture area was found to improve the score, and captions were easier to understand when their display time was lengthened.