Topic
Closed captioning
About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.
Papers published on a yearly basis
Papers
•
08 Mar 1996
TL;DR: An electronic discussion group includes interaction with closed captioning from a media program, in substantially real time; video clips from the program may also be displayed on users' computer terminals.
Abstract: An electronic discussion group includes interaction with closed captioning (10) from a media program, in substantially real time. Video clips from the program may also be displayed on users' computer terminals (5).
33 citations
•
TL;DR: This work tackles two fundamental language-and-vision tasks: image-text matching and image captioning, and demonstrates that neural scene graph generators can learn effective visual relation features to facilitate grounding language to visual relations and subsequently improve the two end applications.
Abstract: Grounding language to visual relations is critical to various language-and-vision applications. In this work, we tackle two fundamental language-and-vision tasks: image-text matching and image captioning, and demonstrate that neural scene graph generators can learn effective visual relation features to facilitate grounding language to visual relations and subsequently improve the two end applications. By combining relation features with the state-of-the-art models, our experiments show significant improvement on the standard Flickr30K and MSCOCO benchmarks. Our experimental results and analysis show that relation features improve downstream models' capability of capturing visual relations in end vision-and-language applications. We also demonstrate the importance of learning scene graph generators with visually relevant relations to the effectiveness of relation features.
33 citations
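The abstract above describes pooling relation features from a scene graph generator and combining them with global image features before matching against text. A minimal toy sketch of that fusion step follows; the hashing-based `embed_relations`, the random embeddings, and all dimensions are placeholder assumptions for illustration, not the paper's actual neural scene graph generator or matching model.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_relations(triplets, dim=8):
    """Toy stand-in for a scene graph generator's relation features:
    map each (subject, predicate, object) triplet to a vector via a
    random embedding table, then mean-pool over triplets."""
    table = {}
    per_triplet = []
    for s, p, o in triplets:
        vecs = []
        for tok in (s, p, o):
            if tok not in table:
                table[tok] = rng.standard_normal(dim)
            vecs.append(table[tok])
        per_triplet.append(np.mean(vecs, axis=0))
    return np.mean(per_triplet, axis=0)  # pooled relation feature for the image

def match_score(image_feat, relation_feat, text_feat):
    """Fuse global image features with relation features by concatenation,
    then score the image-text pair with cosine similarity."""
    fused = np.concatenate([image_feat, relation_feat])
    return float(fused @ text_feat /
                 (np.linalg.norm(fused) * np.linalg.norm(text_feat)))

# Hypothetical usage: one image with two detected relations, one caption.
triplets = [("man", "riding", "horse"), ("horse", "on", "field")]
rel = embed_relations(triplets)                # shape (8,)
img = rng.standard_normal(8)                  # fake global image feature
txt = rng.standard_normal(16)                 # fake caption embedding
score = match_score(img, rel, txt)            # cosine similarity in [-1, 1]
```

In the real models, the embedding table and fusion would be learned end to end; the point here is only the data flow: relations are encoded separately and concatenated into the matching score.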
••
01 Aug 2019
TL;DR: The problem of video captioning is formulated; state-of-the-art methods, categorized by their emphasis on vision or language, are reviewed; and a summary of standard datasets and representative approaches follows.
Abstract: Deep learning has recently achieved great success in solving specific artificial intelligence problems, with substantial progress in Computer Vision (CV) and Natural Language Processing (NLP). As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task naturally decomposes into two sub-tasks. One is to encode the video through a thorough understanding and learn a visual representation. The other is caption generation, which decodes the learned representation into a sequential sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight the challenges that are not yet fully understood in this task and present future research directions.
33 citations
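The two sub-tasks the survey describes, encoding the video into a representation and greedily decoding a caption word by word, can be sketched as follows. This is a deliberately tiny illustration: the mean-pooling encoder, the random weight tensor `W`, and the six-word vocabulary are hypothetical placeholders, not any model from the survey.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["<bos>", "<eos>", "a", "man", "rides", "horse"]
V, DIM = len(VOCAB), 16

def encode_video(frames):
    """Encoding sub-task: pool per-frame feature vectors into a
    single video representation (mean pooling as a toy choice)."""
    return np.asarray(frames).mean(axis=0)

def decode_caption(video_vec, W, max_len=8):
    """Decoding sub-task: emit the caption word by word, conditioning
    each step on the video representation and the previous word,
    choosing the highest-scoring vocabulary word (greedy decoding)."""
    words = ["<bos>"]
    while len(words) <= max_len:
        prev = VOCAB.index(words[-1])
        logits = W[prev] @ video_vec      # one score per vocabulary word
        nxt = VOCAB[int(np.argmax(logits))]
        words.append(nxt)
        if nxt == "<eos>":                # stop once the end token appears
            break
    return words

frames = rng.standard_normal((5, DIM))    # 5 fake per-frame feature vectors
W = rng.standard_normal((V, V, DIM))      # toy decoder weights, untrained
caption = decode_caption(encode_video(frames), W)
```

With untrained random weights the emitted tokens are meaningless; the sketch only shows the encode-then-decode loop structure that the surveyed methods share, with learned encoders (e.g. CNNs) and decoders (e.g. RNNs or Transformers) in place of these toys.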
••
TL;DR: The authors investigated the effects of various captioning conditions (i.e. full captioning, keyword captioning, and no captions), the number of word encounters (one and three), and the combinations of these two variables on incidental learning of new words while viewing a video.
Abstract: Within instructed second language research, there is growing interest in research focusing on primary school vocabulary learning. Research has emphasized classroom-based learning of vocabulary knowledge, with growing focus on the potential for using captioned videos and increased word encounters. The present study investigated the effects of various captioning conditions (i.e. full captioning, keyword captioning, and no captions), the number of word encounters (one and three), and the combinations of these two variables on incidental learning of new words while viewing a video. Six possible conditions were explored. A total of 257 primary school students learning English as a second language (ESL) were divided into six groups and randomly assigned to a condition in which 15 target lexical items were included. A post-test, measuring the recognition of word form/meaning and recall of word meaning, was administered immediately after participants viewed the video. The post-test was not disclosed to the learners in advance. The group viewing the full captioning video scored significantly higher than the keyword captioning group and the no-captioning group. Repeated encounters with the targeted lexical items led to more successful learning. The combination of full captioning and three encounters was most effective for incidental learning of lexical items. This quasi-experimental study contributes to the literature by providing evidence which suggests that captioned videos coordinate two domains (i.e. auditory and visual components) and help ESL learners to obtain greater depth of word form processing, identify meaning by unpacking language chunks, and reinforce the form-meaning link.
33 citations
•
TL;DR: The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation whose goal remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation.
Abstract: The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks:
1. Ad-hoc Video Search (AVS)
2. Instance Search (INS)
3. Activities in Extended Video (ActEV)
4. Video to Text Description (VTT)
This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.
33 citations