Author
Jingkai Zhou
Other affiliations: Alibaba Group
Bio: Jingkai Zhou is an academic researcher from South China University of Technology. The author has contributed to research in topics: Object detection & Drone. The author has an h-index of 5 and has co-authored 13 publications receiving 160 citations. Previous affiliations of Jingkai Zhou include Alibaba Group.
Papers
Affiliations: Tianjin University; University at Albany, SUNY; General Electric; Temple University; Texas A&M University; Shanghai Jiao Tong University; Nanjing University of Science and Technology; National University of Defense Technology; Xidian University; Peking University; University of Kansas; Jiangnan University; Shandong University; University of Electronic Science and Technology of China; Chinese Academy of Sciences; University of Maryland, College Park; South China University of Technology; Tsinghua University; Indian Institute of Technology, Hyderabad; Sun Yat-sen University; Chongqing University; Tencent; Xiamen University; National Institute of Technology, Tiruchirappalli; University of Ottawa; Ocean University of China; Northwestern Polytechnical University; University of Illinois at Urbana–Champaign
TL;DR: A large-scale drone-based dataset of 8,599 images with rich annotations (object bounding boxes, object categories, occlusion, truncation ratios, etc.) is released to narrow the gap between current object detection performance and real-world requirements.
Abstract: Object detection is a hot topic with various applications in computer vision, e.g., image understanding, autonomous driving, and video surveillance. Much of this progress has been driven by the availability of object detection benchmark datasets, including PASCAL VOC, ImageNet, and MS COCO. However, object detection on the drone platform is still a challenging task, due to various factors such as viewpoint change, occlusion, and scale. To narrow the gap between current object detection performance and real-world requirements, we organized the Vision Meets Drone (VisDrone2018) Object Detection in Image challenge in conjunction with the 15th European Conference on Computer Vision (ECCV 2018). Specifically, we release a large-scale drone-based dataset of 8,599 images (6,471 for training, 548 for validation, and 1,580 for testing) with rich annotations, including object bounding boxes, object categories, occlusion, truncation ratios, etc. Featuring diverse real-world scenarios, the dataset was collected using various drone models, in different scenarios across 14 different cities spanning thousands of kilometres, and under various weather and lighting conditions. We mainly focus on ten object categories in object detection, i.e., pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle. Some rarely occurring special vehicles (e.g., machine-shop truck, forklift truck, and tanker) are ignored in evaluation. The dataset is extremely challenging due to various factors, including large scale and pose variations, occlusion, and cluttered backgrounds. We present the evaluation protocol of the VisDrone-DET2018 challenge and the comparison results of 38 detectors on the released dataset, which are publicly available on the challenge website: http://www.aiskyeye.com/. We expect the challenge to largely boost research and development in object detection in images on drone platforms.
146 citations
20 Jun 2021
Abstract: Convolution is one of the basic building blocks of CNN architectures. Despite its common use, standard convolution has two main shortcomings: it is content-agnostic and computation-heavy. Dynamic filters are content-adaptive but further increase the computational overhead. Depth-wise convolution is a lightweight variant, but it usually leads to a drop in CNN performance or requires a larger number of channels. In this work, we propose the Decoupled Dynamic Filter (DDF), which can simultaneously tackle both of these shortcomings. Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters. This decomposition considerably reduces the number of parameters and limits computational costs to the same level as depth-wise convolution. Meanwhile, we observe a significant boost in performance when replacing standard convolution with DDF in classification networks: ResNet-50/101 improve by 1.9% and 1.3% in top-1 accuracy, while their computational costs are reduced by nearly half. Experiments on detection and joint upsampling networks also demonstrate the superior performance of the DDF upsampling variant (DDF-Up) in comparison with standard convolution and specialized content-adaptive layers. The project page with code is available.
60 citations
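The decoupling described in the abstract above can be sketched in a few lines. This is a toy NumPy illustration under my own simplifications, not the authors' implementation: the filter values are passed in directly, whereas the paper predicts them from the input features with light attention-style branches (which is what makes the operation content-adaptive).

```python
import numpy as np

def ddf_depthwise(x, spatial_filters, channel_filters):
    """Naive sketch of a decoupled dynamic depth-wise filter.

    x:               (C, H, W) input feature map
    spatial_filters: (H, W, k, k) one kernel per spatial position
    channel_filters: (C, k, k)    one kernel per channel
    The effective filter at channel c, position (i, j) is the element-wise
    product spatial_filters[i, j] * channel_filters[c], so only
    H*W*k*k + C*k*k filter values are needed instead of the H*W*C*k*k
    of a fully dynamic depth-wise filter.
    """
    C, H, W = x.shape
    k = channel_filters.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # combine the two decoupled filters, then apply depth-wise
                filt = spatial_filters[i, j] * channel_filters[c]
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * filt)
    return out
```

The triple loop is only for clarity; an efficient version would vectorize the patch extraction, which is the part the paper keeps at depth-wise-convolution cost.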
Affiliations: University at Albany, SUNY; Xidian University; University of Electronic Science and Technology of China; Beijing Institute of Technology; Samsung; Chinese Academy of Sciences; Tianjin University; SK Telecom; Dalian University of Technology; Ocean University of China; Harbin Institute of Technology; Nanjing University of Posts and Telecommunications; Siemens; Sun Yat-sen University; Xi'an Jiaotong University; South China University of Technology; General Electric; Hanyang University; Stony Brook University; ShanghaiTech University; Chongqing University; Queen Mary University of London; Huazhong University of Science and Technology
TL;DR: The Vision Meets Drone Object Detection in Image Challenge (VisDrone-DET2019), held in conjunction with the 17th International Conference on Computer Vision (ICCV 2019), focuses on image object detection on drones.
Abstract: Recently, automatic visual data understanding from drone platforms has become highly demanding. To facilitate the study, the Vision Meets Drone Object Detection in Image Challenge was held for the second time in conjunction with the 17th International Conference on Computer Vision (ICCV 2019), focusing on image object detection on drones. Results of 33 object detection algorithms are presented. For each participating detector, a short description is provided in the appendix. Our goal is to advance the state-of-the-art detection algorithms and provide a comprehensive evaluation platform for them. The evaluation protocol of the VisDrone-DET2019 Challenge and the comparison results of all the submitted detectors on the released dataset are publicly available at the website: http://www.aiskyeye.com/. The results demonstrate that there remains large room for improvement for object detection algorithms on drones.
57 citations
Affiliations: Tianjin University; Siemens; Beijing Institute of Technology; Xi'an Jiaotong University; South China University of Technology; University at Albany, SUNY; Sun Yat-sen University; Samsung; Karlsruhe Institute of Technology; Xidian University; Northwestern Polytechnical University; General Electric; Tsinghua University; Stony Brook University
TL;DR: The goal is to advance the state-of-the-art detection algorithms and provide a comprehensive evaluation platform for them; the results demonstrate that there remains large room for improvement for object detection algorithms on drones.
Abstract: Video object detection has drawn great attention recently. The Vision Meets Drone Object Detection in Video Challenge 2019 (VisDrone-VID2019) was held to advance the state of the art in video object detection for videos captured by drones. Specifically, 13 teams participated in the challenge. We also report the results of 6 state-of-the-art detectors on the collected dataset. A short description is provided in the appendix for each participating detector. We present the analysis and discussion of the challenge results. Both the dataset and the challenge results are publicly available at the challenge website: http://www.aiskyeye.com/.
51 citations
TL;DR: A nighttime FIR pedestrian dataset, the largest of its kind at present, called the SCUT (South China University of Technology) dataset, is introduced; experiments show that convolutional neural network (CNN) based detectors obtain good performance on FIR images.
41 citations
Cited by
TL;DR: The superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, is shown, suggesting that the HRNet is a stronger backbone for computer vision problems.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at this https URL.
1,278 citations
TL;DR: The High-Resolution Network (HRNet) as mentioned in this paper maintains high-resolution representations through the whole process by connecting the high-to-low resolution convolution streams in parallel and repeatedly exchanging the information across resolutions.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel and (ii) repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at https://github.com/HRNet.
1,162 citations
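The two HRNet characteristics described above (parallel multi-resolution streams plus repeated cross-resolution exchange) can be illustrated schematically. This is a toy NumPy sketch under my own simplifications: two streams with equal channel counts, and nearest-neighbour resizing standing in for the strided convolutions and bilinear upsampling the actual network uses for the exchange.

```python
import numpy as np

def downsample2x(x):
    # nearest-neighbour 2x downsampling (stand-in for a strided convolution)
    return x[:, ::2, ::2]

def upsample2x(x):
    # nearest-neighbour 2x upsampling (stand-in for bilinear upsampling)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def exchange(high, low):
    """One cross-resolution exchange: each stream adds the other stream's
    features, resized to its own resolution, so both stay semantically
    rich (from low) and spatially precise (from high)."""
    return high + upsample2x(low), low + downsample2x(high)

# Two parallel streams (C, H, W) kept at their own resolutions throughout;
# exchange is applied repeatedly instead of encoding down and decoding up.
high, low = np.ones((8, 16, 16)), np.ones((8, 8, 8))
for _ in range(3):
    high, low = exchange(high, low)
```

The key design point the sketch preserves is that the high-resolution stream is never discarded: both resolutions exist at every stage, unlike a series encoder-decoder that must reconstruct spatial detail at the end.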
01 Oct 2019
TL;DR: A Clustered Detection (ClusDet) network is proposed, unifying a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet) in an end-to-end framework to detect small objects in aerial images.
Abstract: Detecting objects in aerial images is challenging for at least two reasons: (1) target objects like pedestrians are very small in pixels, making them hardly distinguished from surrounding background; and (2) targets are in general sparsely and non-uniformly distributed, making the detection very inefficient. In this paper, we address both issues inspired by observing that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces object cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region is fed into DetecNet for object detection. ClusDet has several advantages over previous solutions: (1) it greatly reduces the number of chips for final object detection and hence achieves high running time efficiency, (2) the cluster-based scale estimation is more accurate than previously used single-object based ones, hence effectively improves the detection for small objects, and (3) the final DetecNet is dedicated for clustered regions and implicitly models the prior context information so as to boost detection accuracy. The proposed method is tested on three popular aerial image datasets including VisDrone, UAVDT and DOTA. In all experiments, ClusDet achieves promising performance in comparison with state-of-the-art detectors.
161 citations
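The three-stage data flow described in the abstract above (CPNet → ScaleNet → DetecNet) can be sketched as plain control flow. The sub-networks are passed in as callables and stubbed out in the test, since only the pipeline shape is being illustrated; this is an interpretation of the abstract, not the authors' code.

```python
import numpy as np

def clusdet_pipeline(image, cpnet, scalenet, detecnet):
    """Sketch of the ClusDet data flow.

    1. CPNet proposes cluster regions (x, y, w, h) likely to contain objects,
       so only a few chips are cut from the image instead of a uniform grid.
    2. ScaleNet estimates a scale-normalization factor for each region.
    3. DetecNet detects objects in each scale-normalized chip; detections
       are mapped back to original image coordinates.
    """
    detections = []
    for (x, y, w, h) in cpnet(image):
        chip = image[y:y + h, x:x + w]   # crop the proposed cluster region
        zoom = scalenet(chip)            # factor used to normalize object scale
        for (dx, dy, dw, dh, score) in detecnet(chip, zoom):
            # shift chip-local boxes back into image coordinates
            detections.append((x + dx, y + dy, dw, dh, score))
    return detections
```

The efficiency argument in the abstract falls out of this structure: the number of DetecNet invocations equals the number of cluster proposals, which is small when targets are clustered.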
TL;DR: The VisDrone dataset, captured over various urban/suburban areas of 14 different cities across China from north to south, is described; being the largest such dataset ever published, it enables extensive evaluation and investigation of visual analysis algorithms on the drone platform.
Abstract: Drones, or general UAVs, equipped with cameras have been deployed rapidly in a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones becomes highly demanding, bringing computer vision and drones closer and closer. To promote and track the developments of object detection and tracking algorithms, we have organized two challenge workshops in conjunction with ECCV 2018 and ICCV 2019, attracting more than 100 teams around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. In this paper, we first present a thorough review of object detection and tracking datasets and benchmarks, and discuss the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from north to south. Being the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, conclude the challenge, and propose future directions. We expect the benchmark to largely boost research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from the website: this https URL.
129 citations
TL;DR: A comprehensive review of state-of-the-art deep-learning-based object detection algorithms is provided, along with an analysis of these algorithms' recent contributions to low-altitude UAV datasets.
116 citations