Topic

Closed captioning

About: Closed captioning is a research topic. Over the lifetime, 3011 publications have been published within this topic receiving 64494 citations. The topic is also known as: CC.


Papers
Posted Content
TL;DR: Building on the Transformer, the authors approach image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach, which explores and distills the source information in both vision and language.
Abstract: Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet current methods still struggle to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. Building on the Transformer, to perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach, which explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our Transformer-based model achieves a CIDEr score of 129.3 in offline evaluation on the COCO test set, with a strong balance of accuracy, speed, and parameter budget.

16 citations
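The cross-modal attention at the heart of the approach above can be illustrated with a short sketch. The following is a minimal, hypothetical example of Transformer-style cross-attention from caption tokens (queries) to image-region features (keys/values); it is not the authors' actual architecture, and all dimensions, names, and toy inputs are assumptions for illustration.

```python
# Hypothetical sketch of cross-modal attention between image regions and
# caption tokens, in the spirit of the Transformer-based captioner above.
# NOT the authors' implementation; shapes and names are assumptions.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Queries come from the language side; keys/values from vision."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, word_states, region_feats):
        # word_states:  (batch, n_words, d_model)   decoder hidden states
        # region_feats: (batch, n_regions, d_model) encoded image regions
        attended, weights = self.attn(query=word_states,
                                      key=region_feats,
                                      value=region_feats)
        return attended, weights  # weights: which regions each word attends to

# Toy usage with random tensors standing in for real encoder/decoder states.
regions = torch.randn(2, 36, 512)  # e.g. 36 detected regions per image
words = torch.randn(2, 10, 512)    # 10 caption token states
out, attn_w = CrossModalAttention()(words, regions)
print(out.shape, attn_w.shape)     # (2, 10, 512) and (2, 10, 36)
```

The attention weights make the "visual attention to correlated image regions" claim inspectable: each row shows which regions inform a given word.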

Proceedings ArticleDOI
30 Apr 2020
TL;DR: The paper compares two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model to determine which produces better image representations for caption generation; the results show that the encoder plays a large role and can significantly improve a model without any change to the decoder architecture.
Abstract: Recent models for image captioning are usually based on an encoder-decoder framework. Large pre-trained convolutional neural networks are often used as encoders. However, different authors use different encoder architectures for their image captioning models, which makes it difficult to determine the effect the encoder has on overall model performance. In this paper we compare two popular convolutional network architectures, VGG and ResNet, as encoders for the same image captioning model in order to find out which is better at the image representation used for caption generation. The results show that ResNet outperforms VGG, allowing the image captioning model to achieve a higher BLEU-4 score. Furthermore, ResNet allows the model to reach a score comparable to the VGG-based model in fewer training epochs. Based on these data we can state that the encoder plays a big role and can significantly improve a model without changing the decoder architecture.

16 citations
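Because the comparison above keeps the decoder fixed and varies only the CNN encoder, the experiment boils down to exposing both backbones through a common feature interface. Below is a minimal sketch of such an encoder swap using torchvision's real VGG-16 and ResNet-50 models; the 512-dimensional projection and pooling choices are assumptions, not the paper's exact configuration.

```python
# Sketch of swapping VGG vs. ResNet encoders in front of a fixed decoder.
# The torchvision models are real; the shared 512-d interface is an assumption.
import torch
import torch.nn as nn
from torchvision import models

def build_encoder(name="resnet50", out_dim=512):
    # weights=None keeps the sketch runnable offline; in practice the
    # encoders would be ImageNet-pretrained, as in the paper.
    if name == "resnet50":
        backbone = models.resnet50(weights=None)
        modules = list(backbone.children())[:-1]  # drop the final fc layer
        feat_dim = 2048
    elif name == "vgg16":
        backbone = models.vgg16(weights=None)
        modules = [backbone.features, nn.AdaptiveAvgPool2d(1)]
        feat_dim = 512
    else:
        raise ValueError(f"unknown encoder: {name}")
    # Project both backbones to a shared out_dim so the decoder is unchanged.
    return nn.Sequential(*modules, nn.Flatten(), nn.Linear(feat_dim, out_dim))

# Either encoder now emits identically shaped features for the same decoder.
images = torch.randn(4, 3, 224, 224)
for name in ("vgg16", "resnet50"):
    print(name, build_encoder(name)(images).shape)  # both: (4, 512)
```

Holding the decoder and feature dimensionality constant is what lets BLEU-4 differences be attributed to the encoder alone.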

Book ChapterDOI
23 Aug 2020
TL;DR: This article proposes a new metric, SPICE-U, which introduces a notion of uniqueness over the concepts generated in a caption and correlates better with human judgements than SPICE.
Abstract: Despite considerable progress, state-of-the-art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phenomena. We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compared to ground truth captions, and that evaluation metrics like SPICE can be 'topped' using simple captioning systems relying on object detectors. Inspired by these observations, we design a new metric (SPICE-U) by introducing a notion of uniqueness over the concepts generated in a caption. We show that SPICE-U is better correlated with human judgements compared to SPICE, and effectively captures notions of diversity and descriptiveness. Finally, we also demonstrate a general technique to improve any existing captioning model: using mutual information as a re-ranking objective during decoding. Empirically, this results in more unique and informative captions, and improves three different state-of-the-art models on SPICE-U as well as on the average score over existing metrics (Code is available at https://github.com/princetonvisualai/SPICE-U).

16 citations
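The re-ranking technique mentioned at the end of the abstract can be sketched concretely. The standard mutual-information (PMI) formulation scores a candidate caption c as log p(c | image) − λ · log p(c), so generic captions with a high unconditional prior are penalized. The helper names, λ value, and toy probabilities below are assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of mutual-information re-ranking over beam-search
# candidates. The scoring rule score(c) = log p(c|image) - lam * log p(c)
# is the standard PMI form; lam and the stand-in probability functions
# are illustrative assumptions.
def rerank_by_mutual_information(candidates, log_p_cond, log_p_prior, lam=0.5):
    """Re-rank candidate captions by approximate mutual information.

    candidates:     list of caption strings from the base decoder
    log_p_cond(c):  log p(caption | image) from the captioning model
    log_p_prior(c): log p(caption) from an unconditional language model

    Subtracting the prior penalizes generic captions that fit any image,
    favoring captions specific to this one.
    """
    scored = [(log_p_cond(c) - lam * log_p_prior(c), c) for c in candidates]
    scored.sort(reverse=True)
    return [c for _, c in scored]

# Toy example: the generic caption has a high prior, so it drops in rank.
cond = {"a man on a horse": -4.0,
        "a man riding a brown horse on a beach": -6.0}
prior = {"a man on a horse": -3.0,
         "a man riding a brown horse on a beach": -9.0}
print(rerank_by_mutual_information(list(cond), cond.get, prior.get)[0])
# -> "a man riding a brown horse on a beach"
```

Here the generic caption scores −4.0 − 0.5·(−3.0) = −2.5 while the specific one scores −6.0 − 0.5·(−9.0) = −1.5, so the more descriptive caption wins despite its lower raw likelihood.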

18 Apr 1996
TL;DR: This paper examines 22 empirical computer-assisted language learning (CALL) studies published between 1989 and 1994, and 13 reviews and syntheses published between 1987 and 1992, pertaining to CALL in higher education in the United States, and provides three general conclusions.
Abstract: This paper examines 22 empirical computer-assisted language learning (CALL) studies published between 1989 and 1994, and 13 reviews and syntheses published between 1987 and 1992, pertaining to CALL in higher education in the United States. A "three streams" framework helps to place CALL in a larger context and illustrate its several dimensions. Any specific CALL program involves decisions in relation to developments in at least three fields: educational psychology; linguistics; and computer technology. These three fields may be conceptualized as streams, where each stream flows more or less independently of the others, but where the practice of CALL at any given time requires making a passage across all three. An interpretive summary of five major findings from the review of the empirical CALL studies is offered: (1) captioning video segments can dramatically boost student comprehension; (2) CALL can connect students with other people inside and outside of the classroom, promoting natural and spontaneous communication in the target language; (3) the type of CALL feedback provided to students can play a central role in learning; (4) student attitudes toward CALL are not consistently linked to student achievement using CALL; and (5) CALL can substantially improve achievement as compared with traditional instruction. This paper also provides three general conclusions, each accompanied by recommendations for future CALL practice and research. Appendices include the material search procedure; captioning information; supplementary findings from the empirical studies; individual summaries of empirical studies; and individual summaries of CALL and Computer-Assisted Instruction (CAI) reviews. (Contains 43 references.)

16 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 536
2022: 1,030
2021: 504
2020: 530
2019: 448
2018: 334