Jun Xiao

Researcher at Zhejiang University

Publications - 168
Citations - 7135

Jun Xiao is an academic researcher from Zhejiang University. The author has contributed to research in the topics of computer science and question answering, has an h-index of 27, and has co-authored 139 publications receiving 4438 citations. Previous affiliations of Jun Xiao include Nanyang Technological University.

Papers
Proceedings Article

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

TL;DR: This paper introduces SCA-CNN, a convolutional neural network that incorporates spatial and channel-wise attention and significantly outperforms state-of-the-art visual attention-based image captioning methods (a minimal sketch of the attention mechanism follows the preprint entry below).
Posted Content

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

TL;DR: SCA-CNN, as described in this paper, incorporates spatial and channel-wise attention in a CNN to dynamically modulate the sentence-generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is.
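The two SCA-CNN entries above describe the same mechanism: a channel-wise attention step ("what" to attend to) followed by a spatial attention step ("where" to attend), both conditioned on the caption decoder's hidden state. Below is a minimal, hypothetical PyTorch sketch of that two-step attention; the module and argument names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of spatial + channel-wise attention over a CNN feature map,
# conditioned on a decoder hidden state. Hypothetical names; assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialChannelAttention(nn.Module):
    def __init__(self, channels: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        # Channel-wise attention ("what"): score each channel given the decoder state.
        self.ch_feat = nn.Linear(1, attn_dim)
        self.ch_hid = nn.Linear(hidden_dim, attn_dim)
        self.ch_score = nn.Linear(attn_dim, 1)
        # Spatial attention ("where"): score each location of the re-weighted map.
        self.sp_feat = nn.Linear(channels, attn_dim)
        self.sp_hid = nn.Linear(hidden_dim, attn_dim)
        self.sp_score = nn.Linear(attn_dim, 1)

    def forward(self, feats: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) CNN feature map; h: (B, hidden_dim) decoder state.
        B, C, H, W = feats.shape
        # Channel attention: describe each channel by its mean activation.
        ch_desc = feats.mean(dim=(2, 3)).unsqueeze(-1)                  # (B, C, 1)
        ch_energy = torch.tanh(self.ch_feat(ch_desc) + self.ch_hid(h).unsqueeze(1))
        beta = F.softmax(self.ch_score(ch_energy).squeeze(-1), dim=1)   # (B, C)
        feats = feats * beta.view(B, C, 1, 1)                           # re-weight channels
        # Spatial attention on the channel-modulated feature map.
        locs = feats.view(B, C, H * W).transpose(1, 2)                  # (B, HW, C)
        sp_energy = torch.tanh(self.sp_feat(locs) + self.sp_hid(h).unsqueeze(1))
        alpha = F.softmax(self.sp_score(sp_energy).squeeze(-1), dim=1)  # (B, HW)
        return torch.bmm(alpha.unsqueeze(1), locs).squeeze(1)           # (B, C) context vector
```

The returned context vector would feed the caption decoder at each word step; applying such a module at several CNN layers gives the multi-layer modulation the TL;DR mentions.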
Proceedings Article

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

TL;DR: In this article, the attentional factorization machine (AFM) is proposed to learn the importance of each feature interaction from data via a neural attention network; AFM outperforms Wide&Deep and DeepCross with a much simpler structure and fewer model parameters.
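A hedged sketch of the AFM idea, assuming PyTorch: embed each active feature, form element-wise products of all feature pairs, and let a small attention network weight each interaction before pooling. The layer names and sizes below are illustrative, not the paper's reference implementation.

```python
# Illustrative AFM-style layer: attention-weighted pooling over pairwise interactions.
import itertools

import torch
import torch.nn as nn
import torch.nn.functional as F


class AFM(nn.Module):
    def __init__(self, num_features: int, embed_dim: int = 16, attn_dim: int = 8):
        super().__init__()
        self.embed = nn.Embedding(num_features, embed_dim)   # second-order embeddings
        self.linear = nn.Embedding(num_features, 1)          # first-order (linear) terms
        self.bias = nn.Parameter(torch.zeros(1))
        # Attention network scores each pairwise interaction vector.
        self.attn = nn.Sequential(
            nn.Linear(embed_dim, attn_dim), nn.ReLU(), nn.Linear(attn_dim, 1))
        self.project = nn.Linear(embed_dim, 1, bias=False)   # projects the pooled interaction vector

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # feature_ids: (B, F) integer ids of the active features per sample.
        emb = self.embed(feature_ids)                        # (B, F, D)
        num_fields = emb.size(1)
        idx_i, idx_j = map(list, zip(*itertools.combinations(range(num_fields), 2)))
        inter = emb[:, idx_i, :] * emb[:, idx_j, :]          # (B, P, D) element-wise pair products
        alpha = F.softmax(self.attn(inter), dim=1)           # (B, P, 1) learned interaction weights
        pooled = (alpha * inter).sum(dim=1)                  # (B, D) attention-weighted pooling
        out = self.bias + self.linear(feature_ids).sum(dim=1) + self.project(pooled)
        return out.squeeze(-1)                               # (B,) prediction score
```

For example, `AFM(num_features=1000)(torch.randint(1000, (4, 10)))` would score a batch of four samples with ten active features each.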
Proceedings Article

Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction

TL;DR: A self-supervised spatiotemporal learning technique that leverages the chronological order of video clips, learning a spatiotemporal representation by predicting the order of shuffled clips sampled from the video.
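Read as a pretext task, this amounts to classification over clip permutations: sample a few clips from a video, shuffle them, encode each clip with a 3D CNN, and train a head to recover the original order. A minimal, hypothetical PyTorch sketch under those assumptions (the encoder and head sizes are placeholders, not the authors' released model):

```python
# Pretext task sketch: predict the chronological order of shuffled video clips.
import itertools

import torch
import torch.nn as nn


class ClipOrderPrediction(nn.Module):
    def __init__(self, clip_encoder: nn.Module, feat_dim: int, num_clips: int = 3):
        super().__init__()
        self.encoder = clip_encoder          # any 3D CNN mapping a clip to a feature vector
        self.perms = list(itertools.permutations(range(num_clips)))
        num_pairs = num_clips * (num_clips - 1) // 2
        # Pairwise feature concatenation, then a classifier over all possible orders.
        self.classifier = nn.Sequential(
            nn.Linear(num_pairs * 2 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, len(self.perms)))

    def shuffle(self, clips: torch.Tensor):
        # clips: (B, N, C, T, H, W) in chronological order -> shuffled clips + permutation labels.
        labels = torch.randint(len(self.perms), (clips.size(0),))
        shuffled = torch.stack(
            [clips[b, list(self.perms[p])] for b, p in enumerate(labels)])
        return shuffled, labels

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # Encode each clip independently, then classify the permutation from pairwise features.
        n = clips.size(1)
        feats = torch.stack([self.encoder(clips[:, i]) for i in range(n)], dim=1)  # (B, N, feat_dim)
        pairs = [torch.cat([feats[:, i], feats[:, j]], dim=-1)
                 for i, j in itertools.combinations(range(n), 2)]
        return self.classifier(torch.cat(pairs, dim=-1))     # logits over clip orders
```

Training would shuffle the clips, run the forward pass, and minimize cross-entropy against the permutation label; the pretrained encoder can then be reused for downstream video tasks.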
Proceedings Article

Video Question Answering via Gradually Refined Attention over Appearance and Motion

TL;DR: This paper proposes an end-to-end model that gradually refines its attention over the appearance and motion features of the video using the question as guidance, and demonstrates the model's effectiveness by analyzing the refined attention weights during the question answering procedure.
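A hedged sketch of such a refinement loop, assuming PyTorch: attend over appearance and motion features separately under the current question-conditioned memory, then fold the attended contexts back into the memory and repeat. The module names and the GRU-based update are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative question-guided attention over appearance and motion features,
# refined over several steps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinedAttention(nn.Module):
    def __init__(self, feat_dim: int, q_dim: int, steps: int = 2):
        super().__init__()
        self.steps = steps
        self.attn_app = nn.Linear(feat_dim + q_dim, 1)   # scores appearance (frame) features
        self.attn_mot = nn.Linear(feat_dim + q_dim, 1)   # scores motion (clip) features
        self.update = nn.GRUCell(2 * feat_dim, q_dim)    # refines the question-guided memory

    def attend(self, feats: torch.Tensor, memory: torch.Tensor, scorer: nn.Module) -> torch.Tensor:
        # feats: (B, T, feat_dim); memory: (B, q_dim) current question/memory vector.
        expanded = memory.unsqueeze(1).expand(-1, feats.size(1), -1)
        scores = scorer(torch.cat([feats, expanded], dim=-1))   # (B, T, 1)
        alpha = F.softmax(scores, dim=1)
        return (alpha * feats).sum(dim=1)                       # (B, feat_dim) attended context

    def forward(self, app_feats: torch.Tensor, mot_feats: torch.Tensor,
                q_emb: torch.Tensor) -> torch.Tensor:
        memory = q_emb                                          # start from the encoded question
        for _ in range(self.steps):
            app_ctx = self.attend(app_feats, memory, self.attn_app)
            mot_ctx = self.attend(mot_feats, memory, self.attn_mot)
            # Gradually refine: fold the attended appearance/motion contexts back into memory.
            memory = self.update(torch.cat([app_ctx, mot_ctx], dim=-1), memory)
        return memory                                           # question-aware video representation
```

The final memory vector would then be matched against candidate answers or fed to an answer decoder, depending on the question-answering setup.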