Topic
Closed captioning
About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.
Papers published on a yearly basis
Papers
TL;DR: A Hierarchical & Multimodal Video Caption (HMVC) model is proposed that jointly learns the dynamics within both the visual and textual modalities for the video captioning task, inferring a sentence of arbitrary length from an input video with an arbitrary number of frames.
37 citations
11 Jul 2006
TL;DR: The development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition is described.
Abstract: Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes when lip-reading or watching a sign-language interpreter. Notetakers summarise what is being said, while qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Real time captioning/transcription is not normally available in UK higher education because of the shortage of real time stenographers. Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Automatic Speech Recognition can provide real time captioning directly from lecturers' speech in classrooms, but it is difficult to obtain accuracy comparable to stenography. This paper describes the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition.
37 citations
TL;DR: This work proposes an end-to-end pipeline named Fused GRU with Semantic-Temporal Attention (STA-FG), which can explicitly incorporate high-level visual concepts into the generation of semantic-temporal attention for video captioning.
37 citations
24 Apr 2018
TL;DR: A DNN for fine-grained action classification and video captioning that performs much better than the existing classification benchmark for Something-Something, with impressive fine-grained results, and it yields a strong baseline on the new Something-Something captioning task.
Abstract: We describe a DNN for video classification and captioning, trained end-to-end, with shared features, to solve tasks at different levels of granularity, exploring the link between granularity in a source task and the quality of learned features for transfer learning. For solving the new task domain in transfer learning, we freeze the trained encoder and fine-tune a neural net on the target domain. We train on the Something-Something dataset with over 220,000 videos, and multiple levels of target granularity, including 50 action groups, 174 fine-grained action categories and captions. Classification and captioning with Something-Something are challenging because of the subtle differences between actions, applied to thousands of different object classes, and the diversity of captions penned by crowd actors. Our model performs better than existing classification baselines for Something-Something, with impressive fine-grained results, and it yields a strong baseline on the new Something-Something captioning task. Experiments reveal that training with more fine-grained tasks tends to produce better features for transfer learning.
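The transfer-learning recipe the abstract describes (freeze the trained encoder, fine-tune only a head on the target domain) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the "encoder" here is a stand-in frozen random projection rather than a deep video network, and the head is a toy logistic-regression classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained" encoder: a frozen random projection + nonlinearity.
# (Hypothetical toy; the paper's encoder is a deep video network.)
W_enc = rng.normal(size=(8, 4))          # frozen: never updated below

def encode(x):
    return np.tanh(x @ W_enc)            # fixed features from the frozen encoder

# Trainable head: logistic regression on top of the frozen features.
W_head = np.zeros(4)
b_head = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary "target domain" data.
X = rng.normal(size=(64, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def loss():
    p = sigmoid(encode(X) @ W_head + b_head)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

loss_before = loss()
for _ in range(200):                     # gradient steps on the head ONLY
    H = encode(X)
    p = sigmoid(H @ W_head + b_head)
    g = p - y                            # dL/dlogits for logistic loss
    W_head -= 0.5 * (H.T @ g) / len(y)   # W_enc is untouched: encoder stays frozen
    b_head -= 0.5 * g.mean()
loss_after = loss()
```

Only `W_head` and `b_head` change during fine-tuning; the encoder parameters are excluded from every update, which is the essence of the frozen-encoder transfer setup.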
37 citations
19 Oct 2017
TL;DR: This work designs a key control unit, termed the visual gate, to adaptively decide "when" and "what" the language generator attends to during the word generation process, and employs a bottom-up workflow to learn a pool of semantic attributes that serve as the propositional attention resources.
Abstract: Visual content description has been attracting broad research attention in the multimedia community because it deeply uncovers the intrinsic semantic facets of visual data. Most existing approaches formulate visual captioning as a machine translation task (i.e., from vision to language) via a top-down paradigm with global attention, which fails to distinguish visual from non-visual parts during word generation. In this work, we propose a novel adaptive attention strategy for visual captioning, which can selectively attend to salient visual content based on linguistic knowledge. Specifically, we design a key control unit, termed the visual gate, to adaptively decide "when" and "what" the language generator attends to during the word generation process. We map all the preceding outputs of the language generator into a latent space to derive a representation of sentence structure, which assists the visual gate in choosing appropriate attention timing. Meanwhile, we employ a bottom-up workflow to learn a pool of semantic attributes that serve as the propositional attention resources. We evaluate the proposed approach on two commonly used benchmarks, i.e., MSCOCO and MSVD. The experimental results demonstrate the superiority of our proposed approach compared to several state-of-the-art methods.
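The "visual gate" idea described above can be sketched as one decoding step of sentinel-style adaptive attention: score the visual regions plus one extra non-visual (sentinel) slot, and read the gate off the sentinel's attention weight. This is a minimal NumPy illustration under assumed shapes and a simplified scoring function, not the authors' exact architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def visual_gate_step(V, h, s, w):
    """One decoding step of sentinel-style adaptive attention.

    V : (K, d) visual region features
    h : (d,)   current language-generator hidden state
    s : (d,)   sentinel vector summarizing non-visual (linguistic) context
    w : (d,)   scoring vector (a stand-in for the learned attention MLP)
    """
    # Score K visual slots plus one extra sentinel slot.
    scores = np.concatenate([np.tanh(V + h) @ w, [np.tanh(s + h) @ w]])
    alpha = softmax(scores)              # attention over K regions + sentinel
    beta = alpha[-1]                     # the "visual gate": weight on non-visual info
    # Mixed context: attended visual content plus gated sentinel.
    ctx = alpha[:-1] @ V + beta * s
    return ctx, beta, alpha

# Demo with random features (hypothetical dimensions).
rng = np.random.default_rng(1)
V = rng.normal(size=(5, 6))
h = rng.normal(size=6)
s = rng.normal(size=6)
w = rng.normal(size=6)
ctx, beta, alpha = visual_gate_step(V, h, s, w)
```

A `beta` near 1 means the generator is producing a non-visual word (e.g. "of", "the") from linguistic context alone; a `beta` near 0 means it is grounding the word in the attended visual regions.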
37 citations