Shaoxiang Chen
Researcher at Fudan University
Publications - 20
Citations - 653
Shaoxiang Chen is an academic researcher from Fudan University. The author has contributed to research in topics: Closed captioning & Sentence. The author has an h-index of 9 and has co-authored 19 publications receiving 350 citations.
Papers
Journal ArticleDOI
Semantic Proposal for Activity Localization in Videos via Sentence Query
Shaoxiang Chen, Yu-Gang Jiang +1 more
TL;DR: This paper proposes a novel Semantic Activity Proposal (SAP), which integrates the semantic information of sentence queries into the proposal generation process to produce discriminative activity proposals, and evaluates the algorithm on the TACoS and Charades-STA datasets.
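The idea of conditioning proposal generation on the query can be illustrated with a minimal sketch. All feature shapes and the cosine-similarity scoring below are invented for illustration; this is not the paper's actual SAP architecture:

```python
import numpy as np

def semantic_proposals(frame_feats, query_feat, window_sizes=(4, 8), top_k=3):
    """Score sliding temporal windows by cosine similarity to a sentence-query
    embedding and keep the top-k segments as activity proposals."""
    proposals = []
    for w in window_sizes:
        for start in range(len(frame_feats) - w + 1):
            seg = frame_feats[start:start + w].mean(axis=0)  # pooled segment feature
            denom = np.linalg.norm(seg) * np.linalg.norm(query_feat) + 1e-8
            proposals.append((float(seg @ query_feat / denom), start, start + w))
    proposals.sort(key=lambda p: -p[0])  # highest similarity first
    return proposals[:top_k]

rng = np.random.default_rng(0)
frame_feats = rng.normal(size=(16, 32))  # 16 frames, 32-d visual features (made up)
query_feat = rng.normal(size=32)         # sentence embedding (made up)
top = semantic_proposals(frame_feats, query_feat)
```

The key point the sketch captures is that the query embedding participates in scoring *before* proposals are selected, rather than only in a later ranking stage.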
Journal ArticleDOI
Motion Guided Spatial Attention for Video Captioning
Shaoxiang Chen, Yu-Gang Jiang +1 more
TL;DR: The proposed MGSA exploits the motion between video frames by learning spatial attention from stacked optical flow images with a custom CNN; a Gated Attention Recurrent Unit (GARU) is designed to adaptively incorporate previous attention maps.
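The gated recurrence over attention maps can be sketched roughly as follows. The scalar gate weights and map shape are invented for illustration; the actual GARU is a learned neural unit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def garu_step(prev_attn, cur_attn, w_prev=1.0, w_cur=1.0):
    """A per-location gate decides how much of the previous spatial attention
    map to carry over into the current one, then renormalizes."""
    gate = sigmoid(w_prev * prev_attn + w_cur * cur_attn)
    fused = gate * prev_attn + (1.0 - gate) * cur_attn
    return fused / fused.sum()  # keep a valid attention distribution

rng = np.random.default_rng(1)
prev_attn = np.full((4, 4), 1.0 / 16)   # uniform previous attention map
cur_attn = np.abs(rng.normal(size=(4, 4)))
cur_attn /= cur_attn.sum()              # current attention map
fused = garu_step(prev_attn, cur_attn)
```

The sketch only shows the recurrent fusion idea: attention at each frame is a gated blend of the current map and the history, rather than being computed independently per frame.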
Proceedings ArticleDOI
Black-box Adversarial Attacks on Video Recognition Models
TL;DR: In this paper, the authors proposed the first black-box video attack framework, called V-BAD, which estimates the projection of the adversarial gradient onto a selected subspace.
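The core estimation step can be sketched with central finite differences, a generic zeroth-order method; the paper's actual framework builds its subspace from transferred gradients and uses a more elaborate estimator, so everything below is a simplified illustration:

```python
import numpy as np

def estimate_gradient(loss_fn, x, directions, delta=1e-3):
    """Estimate the projection of the loss gradient onto a small set of
    direction vectors via central finite differences, using only
    black-box queries of loss_fn."""
    coeffs = np.array([
        (loss_fn(x + delta * u) - loss_fn(x - delta * u)) / (2.0 * delta)
        for u in directions
    ])
    return coeffs @ directions  # gradient estimate within the spanned subspace

x = np.array([1.0, 2.0, 3.0, 4.0])
loss_fn = lambda v: float((v ** 2).sum())  # toy stand-in for the model's loss
grad_est = estimate_gradient(loss_fn, x, np.eye(4))
```

With a full basis the estimate recovers the true gradient; the practical gain in the black-box setting comes from choosing far fewer directions than the input dimension, which for video inputs is very large.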
Book ChapterDOI
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
TL;DR: In this paper, pairwise interactions between modalities in videos are explored to better exploit the complementary information in each pair of modalities, improving performance on both event captioning and temporal sentence localization.
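One common way to realize a pairwise interaction is cross-attention, sketched below with made-up feature dimensions. This illustrates the general mechanism only, not the specific interaction modules studied in the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(feats_a, feats_b):
    """Each element of modality A attends over modality B, pulling in
    complementary information from the paired stream."""
    scale = np.sqrt(feats_a.shape[1])
    attn = softmax(feats_a @ feats_b.T / scale)  # (len_a, len_b) weights
    return attn @ feats_b                        # A enriched with B

rng = np.random.default_rng(2)
appearance = rng.normal(size=(3, 8))  # e.g. appearance features (invented)
motion = rng.normal(size=(5, 8))      # e.g. motion features (invented)
enriched = cross_attend(appearance, motion)
```

Running such a block for every modality pair (appearance-motion, appearance-audio, and so on) is what makes the interaction "pairwise" rather than a single fusion of all streams at once.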
Book ChapterDOI
Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language
Shaoxiang Chen, Yu-Gang Jiang +1 more
TL;DR: A novel TALL method is proposed which builds a Hierarchical Visual-Textual Graph to model interactions between the objects and words, as well as among the objects themselves, to jointly understand the video content and the language.
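The graph-based interaction can be sketched as plain message passing over a joint set of object and word nodes. The adjacency and update rule below are generic inventions for illustration; the paper's hierarchical graph is considerably richer:

```python
import numpy as np

def message_pass(node_feats, adj, steps=2):
    """Each node (e.g. a detected object or a query word) mixes its own
    feature with the average of its neighbors' features."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    out = node_feats
    for _ in range(steps):
        out = 0.5 * out + 0.5 * (adj @ out) / deg
    return out

rng = np.random.default_rng(3)
feats = rng.normal(size=(4, 6))    # 4 nodes, 6-d features (invented)
adj = np.ones((4, 4)) - np.eye(4)  # fully connected graph, no self-loops
updated = message_pass(feats, adj)
```

After a few steps every node's representation reflects its neighborhood, which is how object-word and object-object edges let visual and textual evidence inform each other.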