Topic

Closed captioning

About: Closed captioning is a research topic. Over the lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Patent
08 Mar 1996
TL;DR: In this patent, an electronic discussion group includes interaction with closed captioning from a media program in substantially real time; video clips from the program may also be displayed on users' computer terminals.
Abstract: An electronic discussion group includes interaction with closed captioning (10) from a media program, in substantially real time. Video clips from the program may also be displayed on users' computer terminals (5).

33 citations

Posted Content
TL;DR: This work tackles two fundamental language-and-vision tasks: image-text matching and image captioning, and demonstrates that neural scene graph generators can learn effective visual relation features to facilitate grounding language to visual relations and subsequently improve the two end applications.
Abstract: Grounding language to visual relations is critical to various language-and-vision applications. In this work, we tackle two fundamental language-and-vision tasks: image-text matching and image captioning, and demonstrate that neural scene graph generators can learn effective visual relation features to facilitate grounding language to visual relations and subsequently improve the two end applications. By combining relation features with the state-of-the-art models, our experiments show significant improvement on the standard Flickr30K and MSCOCO benchmarks. Our experimental results and analysis show that relation features improve downstream models' capability of capturing visual relations in end vision-and-language applications. We also demonstrate the importance of learning scene graph generators with visually relevant relations to the effectiveness of relation features.
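As a rough illustration of "combining relation features with the state-of-the-art models", the sketch below simply concatenates pooled visual-relation features from a scene graph generator with standard region features before they reach a downstream captioning or matching head. The feature dimensions, module names, and the fusion-by-concatenation scheme are assumptions made for illustration, not the paper's exact architecture.

```python
# Illustrative fusion of scene-graph relation features with region features
# (an assumed concatenation scheme, not the paper's exact method).
import torch
import torch.nn as nn

class RelationFeatureFusion(nn.Module):
    def __init__(self, region_dim=2048, relation_dim=512, out_dim=1024):
        super().__init__()
        self.project = nn.Linear(region_dim + relation_dim, out_dim)

    def forward(self, region_feats, relation_feats):
        # region_feats:   (batch, num_regions, region_dim)    from an object detector
        # relation_feats: (batch, num_relations, relation_dim) from a scene graph generator
        pooled_rel = relation_feats.mean(dim=1, keepdim=True)          # (batch, 1, relation_dim)
        pooled_rel = pooled_rel.expand(-1, region_feats.size(1), -1)   # broadcast to each region
        fused = torch.cat([region_feats, pooled_rel], dim=-1)
        return self.project(fused)  # features a captioning or matching model can consume
```

A downstream image-text matching or captioning model would consume the fused features in place of plain region features; mean-pooling the relations is only one simple aggregation choice.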

33 citations

Proceedings ArticleDOI
01 Aug 2019
TL;DR: This survey formulates the problem of video captioning, reviews state-of-the-art methods categorized by their emphasis on vision or language, and summarizes standard datasets and representative approaches.
Abstract: Deep learning has recently achieved great success in solving specific artificial intelligence problems, and substantial progress has been made in Computer Vision (CV) and Natural Language Processing (NLP). As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task naturally decomposes into two sub-tasks. One is to encode the video, that is, to understand its content thoroughly and learn a visual representation. The other is caption generation, which decodes the learned representation into a sequential sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight the challenges that are not yet fully understood in this task and present future research directions.
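The encode-then-decode split described in this abstract can be made concrete with a minimal sketch. The snippet below is illustrative only and does not correspond to any surveyed system: it assumes per-frame CNN features have already been extracted, and uses a GRU encoder plus a GRU decoder that emits the caption word by word; all class, parameter, and dimension names are hypothetical.

```python
# Minimal encoder-decoder video captioner (illustrative sketch, not a surveyed model).
# Assumes per-frame visual features were already extracted by a CNN.
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Encoder: summarizes the frame-feature sequence into a hidden state.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Decoder: generates the caption one word at a time.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, num_frames, feat_dim)
        # captions:    (batch, seq_len) token ids, teacher-forced during training
        _, h = self.encoder(frame_feats)   # h: (1, batch, hidden_dim) video summary
        emb = self.embed(captions)         # (batch, seq_len, hidden_dim)
        dec_out, _ = self.decoder(emb, h)  # condition the decoder on the video summary
        return self.out(dec_out)           # (batch, seq_len, vocab_size) logits
```

Training would minimize cross-entropy between these logits and the ground-truth caption tokens; frame-level attention, the refinement most surveyed methods add, is omitted to keep the sketch short.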

33 citations

Journal ArticleDOI
TL;DR: The authors investigated the effects of various captioning conditions (i.e. full captioning, keyword captioning and no captions), the number of word encounters (one and three), and the combinations of these two variables on incidental learning of new words while viewing a video.
Abstract: Within instructed second language research, there is growing interest in research focusing on primary school vocabulary learning. Research has emphasized classroom-based learning of vocabulary knowledge, with growing focus on the potential for using captioned videos and increased word encounters. The present study investigated the effects of various captioning conditions (i.e. full captioning, keyword captioning, and no captions), the number of word encounters (one and three), and the combinations of these two variables on incidental learning of new words while viewing a video. Six possible conditions were explored. A total of 257 primary school students learning English as a second language (ESL) were divided into six groups and randomly assigned to a condition in which 15 target lexical items were included. A post-test, measuring the recognition of word form/meaning and recall of word meaning, was administered immediately after participants viewed the video. The post-test was not disclosed to the learners in advance. The group viewing the full captioning video scored significantly higher than the keyword captioning group and the no-captioning group. Repeated encounters with the targeted lexical items led to more successful learning. The combination of full captioning and three encounters was most effective for incidental learning of lexical items. This quasi-experimental study contributes to the literature by providing evidence which suggests that captioned videos coordinate two domains (i.e. auditory and visual components) and help ESL learners to obtain greater depth of word form processing, identify meaning by unpacking language chunks, and reinforce the form-meaning link.

33 citations

Posted Content
TL;DR: The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation whose goal remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation, as described in this paper.
Abstract: The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks:
1. Ad-hoc Video Search (AVS)
2. Instance Search (INS)
3. Activities in Extended Video (ActEV)
4. Video to Text Description (VTT)
This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.

33 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance
Metrics
No. of papers in the topic in previous years
Year	Papers
2023	536
2022	1,030
2021	504
2020	530
2019	448
2018	334