Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published on this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Patent
Kuansan Wang
11 May 2004
TL;DR: A speech input mode dynamically reports partial semantic parses while audio captioning is still in progress, a significant departure from the turn-taking nature of a spoken dialogue.
Abstract: A method and system provide a speech input mode which dynamically reports partial semantic parses while audio captioning is still in progress. The semantic parses can be evaluated with an outcome immediately reported back to the user. The net effect is that tasks conventionally performed in the system turn are now carried out in the midst of the user turn, a significant departure from the turn-taking nature of a spoken dialogue.
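To make the idea concrete, here is a minimal Python sketch of reporting partial semantic parses while speech input is still streaming in. The recognizer hypotheses, the toy frame grammar, and the `report` callback are all illustrative assumptions, since the patent does not specify an API.

```python
# Hypothetical sketch: evaluate partial recognition hypotheses as they
# arrive and report semantic parses mid-utterance, instead of waiting
# for the user's turn to end. The grammar and names are illustrative.

import re
from typing import Callable, Optional


def parse_partial(transcript: str) -> Optional[dict]:
    """Extract a partial semantic frame from an in-progress transcript.

    A toy grammar: look for a command verb and, optionally, a target.
    """
    match = re.search(r"\b(play|stop|record)\b(?:\s+(.+))?", transcript)
    if not match:
        return None
    return {"action": match.group(1), "target": match.group(2)}


def feed_hypotheses(hypotheses: list, report: Callable[[dict], None]) -> None:
    """Parse each partial hypothesis and report the outcome immediately,
    while the user is still speaking."""
    for transcript in hypotheses:
        frame = parse_partial(transcript)
        if frame is not None:
            report(frame)  # outcome reported back in the midst of the user turn


# Partial hypotheses as they might stream in from a recognizer:
feed_hypotheses(
    ["pl", "play", "play the news", "play the news channel"],
    report=lambda frame: print("partial parse:", frame),
)
```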

174 citations

Posted Content
TL;DR: This work proposes an end-to-end transformer model whose self-attention mechanism enables an efficient non-recurrent structure during encoding and leads to performance improvements.
Abstract: Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e., an event proposal model and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevents direct influence of the language description on the event proposal, which is important for generating accurate descriptions. To address this problem, we propose an end-to-end transformer model for dense video captioning. The encoder encodes the video into appropriate representations. The proposal decoder decodes from the encoding with different anchors to form video event proposals. The captioning decoder employs a masking network to restrict its attention to the proposal event over the encoding feature. This masking network converts the event proposal into a differentiable mask, which ensures consistency between the proposal and captioning during training. In addition, our model employs a self-attention mechanism, which enables the use of an efficient non-recurrent structure during encoding and leads to performance improvements. We demonstrate the effectiveness of this end-to-end model on the ActivityNet Captions and YouCookII datasets, achieving METEOR scores of 10.12 and 6.58, respectively.
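The differentiable mask is the key coupling between the proposal and captioning decoders. Below is a minimal numpy sketch of one common construction, turning a proposed interval into a soft mask over frame positions; the two-sigmoid form and the `sharpness` parameter are assumptions, not necessarily the paper's exact formulation.

```python
# Sketch: a soft, differentiable indicator of a proposed event interval,
# so caption attention can be restricted to the proposal while gradients
# still flow back to the proposal parameters.

import numpy as np


def differentiable_mask(start: float, end: float, num_frames: int,
                        sharpness: float = 10.0) -> np.ndarray:
    """Soft indicator of positions inside [start, end) (fractions of the
    video length). The product of two sigmoids approaches a hard 0/1 mask
    as `sharpness` grows, yet stays differentiable in `start` and `end`."""
    t = np.linspace(0.0, 1.0, num_frames)
    rise = 1.0 / (1.0 + np.exp(-sharpness * (t - start)))
    fall = 1.0 / (1.0 + np.exp(-sharpness * (end - t)))
    return rise * fall


mask = differentiable_mask(start=0.3, end=0.6, num_frames=20)
# Attention scores over the encoding are weighted by the mask, so the
# caption attends (almost) only within the proposed event:
scores = np.random.rand(20)
masked_attention = mask * scores / np.sum(mask * scores)
print(np.round(mask, 2))
```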

171 citations

Proceedings ArticleDOI
19 Oct 2017
TL;DR: This work presents a novel deep framework to boost video captioning by learning Multimodal Attention Long Short-Term Memory networks (MA-LSTM), and designs a novel child-sum fusion unit in the MA-LSTM to effectively combine different encoded modalities into the initial decoding states.
Abstract: Automatic generation of video captions is a challenging task, as video is an information-intensive medium with complex variations. Most existing methods, whether based on language templates or sequence learning, have treated video as a flat data sequence while ignoring its intrinsic multimodal nature. Observing that different modalities (e.g., frame, motion, and audio streams), as well as the elements within each modality, contribute differently to sentence generation, we present a novel deep framework to boost video captioning by learning Multimodal Attention Long Short-Term Memory networks (MA-LSTM). Our proposed MA-LSTM fully exploits both multimodal streams and temporal attention to selectively focus on specific elements during sentence generation. Moreover, we design a novel child-sum fusion unit in the MA-LSTM to effectively combine different encoded modalities into the initial decoding states. Different from existing approaches that employ the same LSTM structure for different modalities, we train modality-specific LSTMs to capture the intrinsic representations of individual modalities. Experiments on two benchmark datasets (MSVD and MSR-VTT) show that our MA-LSTM significantly outperforms state-of-the-art methods, reaching 52.3 BLEU@4 and 70.4 CIDEr-D on the MSVD dataset.
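The child-sum fusion unit can be pictured as gating each modality's encoded state separately and summing the gated contributions, in the spirit of the child-sum Tree-LSTM. The numpy sketch below illustrates that shape; the dimensions, weights, and gating details are assumptions rather than the exact MA-LSTM equations.

```python
# Sketch of child-sum style fusion: each modality's encoder state gets
# its own gate, and the gated states are summed to initialize the
# decoder, rather than concatenated or forced through one shared LSTM.

import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def child_sum_fusion(modality_states: list, gate_weights: list) -> np.ndarray:
    """Fuse per-modality encoder states (e.g., frame, motion, audio)
    into one initial decoding state."""
    fused = np.zeros_like(modality_states[0])
    for h, w in zip(modality_states, gate_weights):
        gate = sigmoid(w @ h)   # modality-specific gate
        fused += gate * h       # gated contribution, then child-sum
    return fused


rng = np.random.default_rng(0)
dim = 8
states = [rng.standard_normal(dim) for _ in range(3)]        # frame, motion, audio
weights = [rng.standard_normal((dim, dim)) for _ in range(3)]
print(child_sum_fusion(states, weights))
```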

171 citations

Patent
David H. Sloo
21 May 2002
TL;DR: Closed captioning streams of textual data are extracted from video signals received by a client device and searched for occurrences of text that match one or more search terms.
Abstract: In some implementations, closed captioning streams of textual data are extracted from video signals received by a client device. The streams may be searched for occurrences of textual data that match one or more search terms. When the number of matches between the search terms and a particular closed captioning stream exceeds a threshold, a notification may be sent indicating that content programming determined to be of interest to a viewer has been located, and/or the content programming may be recorded.
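The matching logic described above is simple enough to sketch directly. The Python fragment below is an illustrative reading of the claim, not the patent's implementation; the function name and threshold semantics are assumptions.

```python
# Sketch: scan extracted closed-captioning text for search terms and
# trigger a notification once matches exceed a threshold.

from collections import Counter


def scan_caption_stream(caption_text: str, search_terms: set,
                        threshold: int) -> bool:
    """Return True (i.e., notify and/or record) when the number of
    search-term matches in the closed captioning stream exceeds the
    threshold."""
    words = Counter(caption_text.lower().split())
    matches = sum(words[term] for term in search_terms)
    return matches > threshold


stream = "breaking news on the election results election coverage continues"
if scan_caption_stream(stream, {"election", "results"}, threshold=2):
    print("content of interest located: notify viewer and start recording")
```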

166 citations

Journal ArticleDOI
TL;DR: This work introduces the psychological theory of attention to image caption generation, combining a convolutional neural network over images with a long short-term memory network over sentences.
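A minimal sketch of the attention step this describes: at each decoding step, the LSTM hidden state scores the CNN's spatial feature vectors, and the caption generator consumes their weighted average. The shapes and dot-product scoring are assumptions; the paper's exact attention form may differ.

```python
# Sketch: soft attention over CNN region features, conditioned on the
# decoder LSTM's hidden state.

import numpy as np


def attend(features: np.ndarray, hidden: np.ndarray) -> np.ndarray:
    """features: (num_regions, dim) CNN feature map; hidden: (dim,) LSTM
    state. Returns the attention-weighted context vector of shape (dim,)."""
    scores = features @ hidden                 # relevance of each region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over regions
    return weights @ features                  # soft-attended context


rng = np.random.default_rng(1)
context = attend(rng.standard_normal((49, 16)), rng.standard_normal(16))
print(context.shape)  # (16,)
```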

164 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years:
2023: 536
2022: 1,030
2021: 504
2020: 530
2019: 448
2018: 334