
Xiangpeng Li

Researcher at University of Electronic Science and Technology of China

Publications: 17
Citations: 942

Xiangpeng Li is an academic researcher at the University of Electronic Science and Technology of China. The author has contributed to research in topics including Computer science and Question answering, has an h-index of 8, and has co-authored 13 publications that have received 612 citations.

Papers
Journal Article

Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering

TL;DR: This work proposes a new architecture, Positional Self-Attention with Co-Attention (PSAC), which does not require RNNs for video question answering; it significantly outperforms the state of the art on three tasks and attains comparable results on the Count task.
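To make the core idea concrete, here is a minimal sketch of positional self-attention over frame features, assuming PyTorch; the module name, dimensions, and single-block structure are illustrative and not the paper's exact PSAC architecture.

```python
# Minimal sketch: self-attention over video frames with learned positional
# embeddings standing in for RNN ordering. Sizes are hypothetical.
import torch
import torch.nn as nn

class PositionalSelfAttention(nn.Module):
    def __init__(self, dim=512, num_frames=16, num_heads=8):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, num_frames, dim))  # position info
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frames):                # frames: (B, T, dim)
        x = frames + self.pos                 # inject absolute frame position
        out, _ = self.attn(x, x, x)           # every frame attends to all frames
        return self.norm(frames + out)        # residual connection + norm

feats = torch.randn(2, 16, 512)               # toy batch of frame features
print(PositionalSelfAttention()(feats).shape)  # torch.Size([2, 16, 512])
```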
Journal Article

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

TL;DR: This paper proposes a novel unsupervised video hashing framework, dubbed SSVH, which captures the temporal nature of videos in an end-to-end learning-to-hash fashion, and designs a hierarchical binary auto-encoder to model the temporal dependencies in videos at multiple granularities.
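As a rough illustration of the learning-to-hash idea, the sketch below binarizes an RNN encoder state with a straight-through sign function and reconstructs frame features from the code. It is a single-granularity simplification, assuming PyTorch, and omits the paper's hierarchical structure.

```python
# Minimal sketch: binary auto-encoder for video hashing. Layer sizes are
# illustrative, not the SSVH architecture.
import torch
import torch.nn as nn

class STESign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)                  # {-1, +1} binary code
    @staticmethod
    def backward(ctx, g):
        return g                              # straight-through gradient

class BinaryVideoAE(nn.Module):
    def __init__(self, feat_dim=2048, code_bits=256):
        super().__init__()
        self.enc = nn.LSTM(feat_dim, code_bits, batch_first=True)
        self.dec = nn.LSTM(code_bits, feat_dim, batch_first=True)

    def forward(self, frames):                # frames: (B, T, feat_dim)
        h, _ = self.enc(frames)
        code = STESign.apply(h[:, -1])        # hash the final encoder state
        # Feed the code at every step and reconstruct the frame features.
        rec, _ = self.dec(code.unsqueeze(1).expand(-1, frames.size(1), -1))
        return code, rec

code, rec = BinaryVideoAE()(torch.randn(2, 16, 2048))
print(code.shape, rec.shape)                  # (2, 256) (2, 16, 2048)
```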
Journal Article

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

TL;DR: A hierarchical LSTM with adaptive attention (hLSTMat) approach for image and video captioning that uses spatial or temporal attention to select specific regions or frames when predicting related words, while adaptive attention decides whether to rely on the visual information or on the language context.
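A minimal sketch of the adaptive-attention idea, assuming PyTorch: a learned gate mixes an attended visual context with the language hidden state, so the model can lean on either source per word. Names and sizes are hypothetical, not the hLSTMat implementation.

```python
# Minimal sketch: adaptive attention gate between visual context and
# language state. Dimensions are illustrative.
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Linear(dim, 1)        # attention over regions/frames
        self.gate = nn.Linear(dim, 1)         # visual-vs-language gate

    def forward(self, regions, hidden):       # regions: (B, N, dim); hidden: (B, dim)
        att = torch.softmax(self.score(regions + hidden.unsqueeze(1)), dim=1)
        visual = (att * regions).sum(dim=1)   # attended visual context
        beta = torch.sigmoid(self.gate(hidden))  # 1 -> language, 0 -> visual
        return beta * hidden + (1 - beta) * visual

ctx = AdaptiveAttention()(torch.randn(2, 16, 512), torch.randn(2, 512))
print(ctx.shape)                              # torch.Size([2, 512])
```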
Journal Article

Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder

TL;DR: Self-Supervised Video Hashing (SSVH), as discussed by the authors, proposes a hierarchical binary auto-encoder that models the temporal dependencies in videos at multiple granularities and embeds the videos into binary codes with less computation than a stacked architecture.
Proceedings Article

Learnable Aggregating Net with Diversity Learning for Video Question Answering

TL;DR: A novel architecture, Learnable Aggregating Net with Diversity learning (LAD-Net), for video question answering (V-VQA), which automatically aggregates adaptively weighted frame-level features to extract rich video (or question) contextual semantics by imitating Bag-of-Words (BoW) quantization.
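The BoW-style aggregation can be sketched as a soft assignment of frame features to learned "words" followed by weighted pooling, a NetVLAD-like simplification assuming PyTorch; the cluster count and dimensions are illustrative, not the LAD-Net design.

```python
# Minimal sketch: learnable BoW-style aggregation. Frames are softly
# assigned to learned "words" and pooled by assignment weight.
import torch
import torch.nn as nn

class SoftBoWAggregator(nn.Module):
    def __init__(self, dim=512, num_words=32):
        super().__init__()
        self.words = nn.Parameter(torch.randn(num_words, dim))  # codebook

    def forward(self, frames):                # frames: (B, T, dim)
        # Soft assignment of each frame to each learned word.
        assign = torch.softmax(frames @ self.words.t(), dim=-1)  # (B, T, K)
        # Weighted pooling: one aggregated descriptor per word.
        pooled = assign.transpose(1, 2) @ frames                 # (B, K, dim)
        return pooled.flatten(1)              # (B, K * dim) video descriptor

v = SoftBoWAggregator()(torch.randn(2, 16, 512))
print(v.shape)                                # torch.Size([2, 16384])
```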