Jun Xiao
Researcher at Zhejiang University
Publications - 168
Citations - 7135
Jun Xiao is an academic researcher at Zhejiang University. The author has contributed to research on topics including computer science and question answering. The author has an h-index of 27 and has co-authored 139 publications receiving 4438 citations. Previous affiliations of Jun Xiao include Nanyang Technological University.
Papers
Proceedings Article
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning
TL;DR: This paper introduces SCA-CNN, a novel convolutional neural network that incorporates spatial and channel-wise attention and significantly outperforms state-of-the-art visual attention-based image captioning methods.
Posted Content
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
TL;DR: SCA-CNN incorporates spatial and channel-wise attention in a CNN to dynamically modulate the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is.
Proceedings Article
Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
TL;DR: This paper proposes the attentional factorization machine (AFM), which learns the importance of each feature interaction from data via a neural attention network and outperforms Wide&Deep and DeepCross with a much simpler structure and fewer model parameters.
Proceedings Article
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction
TL;DR: A self-supervised spatiotemporal learning technique that leverages the chronological order of videos, learning the spatiotemporal representation of a video by predicting the order of shuffled clips from it.
Proceedings Article
Video Question Answering via Gradually Refined Attention over Appearance and Motion
TL;DR: This paper proposes an end-to-end model which gradually refines its attention over the appearance and motion features of the video using the question as guidance and demonstrates the effectiveness of the model by analyzing the refined attention weights during the question answering procedure.