Author

Qiankun Tang

Bio: Qiankun Tang is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in topics: Frame (networking) & Motion blur. The author has an h-index of 4 and has co-authored 7 publications receiving 83 citations.

Papers
Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes a video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information, decomposing each video frame into anisotropic sub-bands with multiple frequencies.
Abstract: Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current models, leading to image distortion and temporal inconsistency. We point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information. Specifically, multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and reserve fine details. On the other hand, multi-level temporal discrete wavelet transform which operates on the time axis decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over the state-of-the-art works. Source code and videos are available at https://github.com/Bei-Jin/STMFANet.

75 citations
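The wavelet machinery the abstract relies on can be illustrated directly on raw frame data. Below is a minimal sketch (not the authors' STMFANet code) using the PyWavelets library: a 2-D multi-level DWT splits a single frame into a low-frequency approximation plus anisotropic detail sub-bands, and a 1-D multi-level DWT applied along the time axis splits the frame sequence into temporal-frequency sub-band groups. The toy video shape and wavelet choice are arbitrary.

```python
# Minimal illustration of the two decompositions described in the abstract,
# using PyWavelets; this is not the authors' network, just the data-level idea.
import numpy as np
import pywt

# A toy grayscale video: 16 frames of 64x64 pixels.
video = np.random.rand(16, 64, 64).astype(np.float32)

# Multi-level spatial DWT: each frame is split into a low-frequency
# approximation plus horizontal/vertical/diagonal detail sub-bands at
# every level (the "anisotropic sub-bands with multiple frequencies").
frame = video[0]
approx, details_level2, details_level1 = pywt.wavedec2(frame, wavelet="haar", level=2)
print(approx.shape)                        # (16, 16) coarse structure
print([d.shape for d in details_level1])   # three 32x32 fine-detail sub-bands

# Multi-level temporal DWT: the 1-D transform applied along the time axis
# splits the sequence into sub-band groups of different temporal frequencies
# (slow vs. fast motion components under a fixed frame rate).
temporal_bands = pywt.wavedec(video, wavelet="haar", level=2, axis=0)
for band in temporal_bands:
    print(band.shape)                      # (4, 64, 64), (4, 64, 64), (8, 64, 64)
```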

Proceedings Article
01 Jan 2018
TL;DR: Experimental results show that regardless of whether the input is a single depth map or RGB-D, the proposed disentangled framework can generate high-quality semantic scene completion and outperforms state-of-the-art approaches on both synthetic and real datasets.
Abstract: Semantic scene completion predicts volumetric occupancy and object category of a 3D scene, which helps intelligent agents to understand and interact with the surroundings. In this work, we propose a disentangled framework, sequentially carrying out 2D semantic segmentation, 2D-3D reprojection and 3D semantic scene completion. This three-stage framework has three advantages: (1) explicit semantic segmentation significantly boosts performance; (2) flexible ways of fusing sensor data bring good extensibility; (3) progress in any subtask will promote the holistic performance. Experimental results show that regardless of whether the input is a single depth map or RGB-D, our framework can generate high-quality semantic scene completion, and outperforms state-of-the-art approaches on both synthetic and real datasets.

71 citations
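To make the middle stage of the three-stage pipeline concrete, here is a hypothetical sketch of the 2D-3D reprojection step: per-pixel semantic labels are lifted into a voxel grid using a depth map and pinhole camera intrinsics. The intrinsics, voxel size, and grid extents below are illustrative placeholders, not values from the paper.

```python
# Hypothetical 2D-3D reprojection: back-project labelled pixels into a voxel
# grid that a 3D completion network could then densify and refine.
import numpy as np

H, W = 480, 640
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5        # assumed pinhole intrinsics
depth = np.random.uniform(0.5, 4.0, size=(H, W))    # depth map in metres
labels = np.random.randint(0, 12, size=(H, W))      # per-pixel class ids

# Back-project every pixel to a 3D point in the camera frame.
u, v = np.meshgrid(np.arange(W), np.arange(H))
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Rasterise the labelled points into a coarse voxel volume (0 = empty).
voxel_size = 0.08
grid_shape = (60, 36, 60)                            # illustrative grid extents
origin = np.array([-2.4, -1.44, 0.0])                # grid origin in metres
idx = np.floor((points - origin) / voxel_size).astype(int)
valid = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
volume = np.zeros(grid_shape, dtype=np.int64)
volume[tuple(idx[valid].T)] = labels.reshape(-1)[valid] + 1
print(volume.shape, np.count_nonzero(volume), "occupied voxels")
```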

Proceedings ArticleDOI
Beibei Jin, Yu Hu, Zeng Yiming, Qiankun Tang, Shice Liu, Jing Ye
01 Oct 2018
TL;DR: The VarNet is presented to directly predict the variations between adjacent frames, which are then fused with the current frame to generate the future frame; an adaptive re-weighting mechanism for the loss function gives each pixel a weight according to the amplitude of its variation.
Abstract: Unsupervised video prediction is a very challenging task due to the complexity and diversity in natural scenes. Prior works directly predicting pixels or optical flows either have the blurring problem or require additional assumptions. We highlight that the crux for video frame prediction lies in precisely capturing the inter-frame variations which encompass the movement of objects and the evolution of the surrounding environment. We then present an unsupervised video prediction framework — Variation Network (VarNet) — to directly predict the variations between adjacent frames, which are then fused with the current frame to generate the future frame. In addition, we propose an adaptive re-weighting mechanism for the loss function to offer each pixel a fair weight according to the amplitude of its variation. Extensive experiments for both short-term and long-term video prediction are conducted on two advanced datasets — KTH and KITTI — with two evaluation metrics — PSNR and SSIM. For the KTH dataset, the VarNet outperforms the state-of-the-art works by up to 11.9% on PSNR and 9.5% on SSIM. As for the KITTI dataset, the performance boosts are up to 55.1% on PSNR and 15.9% on SSIM. Moreover, we verify that the generalization ability of our model exceeds that of other state-of-the-art methods by testing on the unseen CalTech Pedestrian dataset after being trained on the KITTI dataset. Source code and video are available at

23 citations
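The abstract's two ideas, predicting inter-frame variations and re-weighting the loss by variation amplitude, can be sketched in a few lines. The following is an illustrative PyTorch sketch, not the released VarNet: the fusion is shown as a simple addition and the per-pixel weighting formula is an assumed placeholder, not the paper's exact scheme.

```python
# Illustrative sketch of variation-based prediction and an amplitude-weighted
# loss; the weighting form is a placeholder assumption.
import torch

def fuse(current_frame, predicted_variation):
    # Next frame = current frame + predicted change, clamped to image range.
    return torch.clamp(current_frame + predicted_variation, 0.0, 1.0)

def reweighted_loss(pred_next, true_next, current_frame, eps=1e-6):
    true_variation = true_next - current_frame
    # Pixels that change more get proportionally larger weights (assumed form).
    weight = true_variation.abs()
    weight = weight / (weight.mean() + eps)
    return (weight * (pred_next - true_next).abs()).mean()

# Toy usage with random tensors shaped (batch, channels, height, width).
cur = torch.rand(2, 3, 64, 64)
nxt = torch.rand(2, 3, 64, 64)
var_hat = torch.randn(2, 3, 64, 64) * 0.1   # stand-in for the network output
loss = reweighted_loss(fuse(cur, var_hat), nxt, cur)
print(float(loss))
```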

Proceedings ArticleDOI
04 May 2020
TL;DR: A lightweight backbone captures rich low-level features through the proposed Detail-Preserving Module; an efficient Feature-Preserving and Refinement Module effectively aggregates bottom-up and top-down features; and a lightweight prediction head further reduces the overall network complexity.
Abstract: The extensive computational burden limits the usage of accurate but complex object detectors in resource-bounded scenarios. In this paper, we present a lightweight object detector, named LightDet, to address this dilemma. We design a lightweight backbone that is able to capture rich low-level features by the proposed Detail-Preserving Module. To effectively aggregate bottom-up and top-down features, we introduce an efficient Feature-Preserving and Refinement Module. A lightweight prediction head is employed to further reduce the entire network complexity. Experimental results show that our LightDet achieves 75.5% mAP on PASCAL VOC 2007 at a speed of 250 FPS and 24.0% mAP on the MS COCO dataset.

10 citations
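The abstract names a Detail-Preserving Module and a Feature-Preserving and Refinement Module without describing their internals, so the sketch below is only a generic example of the kind of lightweight bottom-up/top-down feature fusion and prediction head such a detector might use. The class name, channel widths, anchor count, and depthwise-separable refinement are assumptions, not the actual LightDet design.

```python
# Generic lightweight fusion + detection head sketch (not the LightDet modules).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightFusion(nn.Module):
    def __init__(self, low_ch=64, high_ch=128, out_ch=96, num_anchors=3, num_classes=21):
        super().__init__()
        self.lateral = nn.Conv2d(low_ch, out_ch, kernel_size=1)    # keep fine details
        self.top_down = nn.Conv2d(high_ch, out_ch, kernel_size=1)  # compress deep features
        self.refine = nn.Sequential(                                # cheap depthwise-separable refine
            nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch),
            nn.Conv2d(out_ch, out_ch, 1),
            nn.ReLU(inplace=True),
        )
        # Lightweight head predicting class scores and box offsets per anchor.
        self.cls_head = nn.Conv2d(out_ch, num_anchors * num_classes, 3, padding=1)
        self.reg_head = nn.Conv2d(out_ch, num_anchors * 4, 3, padding=1)

    def forward(self, low_feat, high_feat):
        top = F.interpolate(self.top_down(high_feat), size=low_feat.shape[-2:], mode="nearest")
        fused = self.refine(self.lateral(low_feat) + top)
        return self.cls_head(fused), self.reg_head(fused)

# Toy usage: a stride-8 low-level map fused with a stride-16 deeper map.
cls_out, reg_out = LightweightFusion()(torch.rand(1, 64, 40, 40), torch.rand(1, 128, 20, 20))
print(cls_out.shape, reg_out.shape)
```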

Posted Content
TL;DR: A video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information is proposed and shows significant improvements on fidelity and temporal consistency over the state-of-the-art works.
Abstract: Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to deal with spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and reserve fine details. On the other hand, multi-level temporal discrete wavelet transform which operates on time axis decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over state-of-the-art works.

7 citations


Cited by
Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, all sequences of the KITTI Vision Odometry Benchmark are annotated with dense point-wise labels covering the complete 360-degree field-of-view of the employed automotive LiDAR.
Abstract: Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360-degree field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.

669 citations
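For readers who want to work with the released data, the following is a small sketch of loading one annotated scan, assuming the commonly documented SemanticKITTI layout: points stored as N x 4 float32 (x, y, z, remission) and labels as one uint32 per point with the semantic class in the lower 16 bits and the instance id in the upper 16 bits. The file paths are examples only.

```python
# Hedged sketch of reading one annotated LiDAR scan in the assumed
# SemanticKITTI binary layout.
import numpy as np

def load_scan(bin_path, label_path):
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)  # x, y, z, remission
    raw = np.fromfile(label_path, dtype=np.uint32)
    semantic = raw & 0xFFFF          # class id per point (lower 16 bits)
    instance = raw >> 16             # instance id per point (upper 16 bits)
    assert semantic.shape[0] == points.shape[0]
    return points, semantic, instance

# Example paths (adjust to the local dataset copy):
# points, sem, inst = load_scan(
#     "sequences/00/velodyne/000000.bin",
#     "sequences/00/labels/000000.label",
# )
```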

Posted Content
TL;DR: A large dataset to propel research on laser-based semantic segmentation, which opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.
Abstract: Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360-degree field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.

532 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a review of deep learning methods for prediction in video sequences, covering mandatory background concepts and the most used datasets, and carefully analyze existing video prediction models organized according to a proposed taxonomy.
Abstract: The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it demonstrated potential capabilities for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review on the deep learning methods for prediction in video sequences. We first define the video prediction fundamentals, as well as mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied by experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper concludes by drawing some general conclusions, identifying open research challenges and pointing out future research directions.

141 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes a video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information, decomposing each video frame into anisotropic sub-bands with multiple frequencies.
Abstract: Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current models, leading to image distortion and temporal inconsistency. We point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information. Specifically, multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and reserve fine details. On the other hand, multi-level temporal discrete wavelet transform which operates on the time axis decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over the state-of-the-art works. Source code and videos are available at https://github.com/Bei-Jin/STMFANet.

75 citations

Proceedings ArticleDOI
20 Jun 2021
TL;DR: This paper proposes a long-term motion context memory (LMC-Memory) with memory alignment learning, which enables the model to store long-term motion contexts in the memory and to match them with sequences containing limited dynamics.
Abstract: Our work addresses long-term motion context issues for predicting future frames. To predict the future precisely, it is required to capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks arising when dealing with the long-term motion context are: (i) how to predict the long-term motion context naturally matching input sequences with limited dynamics, (ii) how to predict the long-term motion context with high-dimensionality (e.g., complex motion). To address the issues, we propose novel motion context-aware video prediction. To solve bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning enables storing long-term motion contexts in the memory and matching them with sequences including limited dynamics. As a result, the long-term context can be recalled from the limited input sequence. In addition, to resolve bottleneck (ii), we propose memory query decomposition to store local motion context (i.e., low-dimensional dynamics) and recall the suitable local context for each local part of the input individually. This boosts the alignment effects of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially in the long-term condition. Further, we validate the effectiveness of the proposed network designs by conducting ablation studies and memory feature analysis. The source code of this work is available†.

68 citations
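To make the store-and-recall idea concrete, here is an illustrative sketch of a generic external memory read using cosine-similarity addressing. It is not the paper's LMC-Memory or its alignment learning, which are more involved; the slot count and feature dimension below are arbitrary.

```python
# Generic external-memory read with soft cosine-similarity addressing,
# sketching how a query from a limited-dynamics input could recall a
# stored long-term motion context.
import torch
import torch.nn.functional as F

class MotionContextMemory(torch.nn.Module):
    def __init__(self, num_slots=128, dim=256):
        super().__init__()
        # Learnable memory slots intended to hold long-term motion contexts.
        self.slots = torch.nn.Parameter(torch.randn(num_slots, dim))

    def forward(self, query):
        # query: (batch, dim) feature extracted from the input sequence.
        sim = F.cosine_similarity(query.unsqueeze(1), self.slots.unsqueeze(0), dim=-1)
        attn = F.softmax(sim, dim=-1)        # soft addressing over memory slots
        recalled = attn @ self.slots         # (batch, dim) recalled context vector
        return recalled, attn

# Toy usage: recall context vectors for a batch of two query features.
memory = MotionContextMemory()
context, weights = memory(torch.rand(2, 256))
print(context.shape, weights.shape)
```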