Home
/
Authors
/
Siyuan Qiao

Author

Siyuan Qiao

Other affiliations: Shanghai Jiao Tong University

Bio: Siyuan Qiao is an academic researcher from Johns Hopkins University. The author has contributed to research in topics: Computer science & Artificial neural network. The author has an hindex of 15, co-authored 44 publications receiving 1541 citations. Previous affiliations of Siyuan Qiao include Shanghai Jiao Tong University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2015

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Few-Shot Image Recognition by Predicting Parameters from Activations

[...]

Siyuan Qiao¹, Chenxi Liu¹, Wei Shen¹, Alan L. Yuille¹•Institutions (1)

Johns Hopkins University¹

18 Jun 2018

TL;DR: A novel method that can adapt a pre-trained neural network to novel categories by directly predicting the parameters from the activations is proposed, which achieves the state-of-the-art classification accuracy on Novel categories by a significant margin while keeping comparable performance on the large-scale categories.

...read moreread less

Abstract: In this paper, we are interested in the few-shot learning problem. In particular, we focus on a challenging scenario where the number of categories is large and the number of examples per novel category is very limited, e.g. 1, 2, or 3. Motivated by the close relationship between the parameters and the activations in a neural network associated with the same category, we propose a novel method that can adapt a pre-trained neural network to novel categories by directly predicting the parameters from the activations. Zero training is required in adaptation to novel categories, and fast inference is realized by a single forward pass. We evaluate our method by doing few-shot image recognition on the ImageNet dataset, which achieves the state-of-the-art classification accuracy on novel categories by a significant margin while keeping comparable performance on the large-scale categories. We also test our method on the MiniImageNet dataset and it strongly outperforms the previous state-of-the-art methods.

...read moreread less

579 citations

Posted Content•

DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.

[...]

Siyuan Qiao¹, Liang-Chieh Chen², Alan L. Yuille¹•Institutions (2)

Johns Hopkins University¹, Google²

03 Jun 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers and proposes Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions.

...read moreread less

Abstract: Many modern object detectors demonstrate outstanding performances by using the mechanism of looking and thinking twice. In this paper, we explore this mechanism in the backbone design for object detection. At the macro level, we propose Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers. At the micro level, we propose Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions. Combining them results in DetectoRS, which significantly improves the performances of object detection. On COCO test-dev, DetectoRS achieves state-of-the-art 55.7% box AP for object detection, 48.5% mask AP for instance segmentation, and 50.0% PQ for panoptic segmentation. The code is made publicly available.

...read moreread less

360 citations

Book Chapter•DOI•

Deep Co-Training for Semi-Supervised Image Recognition

[...]

Siyuan Qiao¹, Wei Shen¹, Zhishuai Zhang¹, Bo Wang, Alan L. Yuille¹ - Show less +1 more•Institutions (1)

Johns Hopkins University¹

08 Sep 2018

TL;DR: Deep Co-training as discussed by the authors exploits adversarial examples to encourage view difference, in order to prevent the networks from collapsing into each other and provide complementary information about the data, which is necessary for the co-training framework to achieve good results.

...read moreread less

Abstract: In this paper, we study the problem of semi-supervised image recognition, which is to learn classifiers using both labeled and unlabeled images. We present Deep Co-Training, a deep learning based method inspired by the Co-Training framework. The original Co-Training learns two classifiers on two views which are data from different sources that describe the same instances. To extend this concept to deep learning, Deep Co-Training trains multiple deep neural networks to be the different views and exploits adversarial examples to encourage view difference, in order to prevent the networks from collapsing into each other. As a result, the co-trained networks provide different and complementary information about the data, which is necessary for the Co-Training framework to achieve good results. We test our method on SVHN, CIFAR-10/100 and ImageNet datasets, and our method outperforms the previous state-of-the-art methods by a large margin.

...read moreread less

343 citations

Proceedings Article•DOI•

DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution

[...]

Siyuan Qiao¹, Liang-Chieh Chen², Alan L. Yuille¹•Institutions (2)

Johns Hopkins University¹, Google²

01 Jun 2021

TL;DR: DetectoRS as mentioned in this paper proposes recursive feature pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers, and switchable atrous convolution which convolves the features with different atrous rates and gathers the results using switch functions.

...read moreread less

213 citations

Proceedings Article•DOI•

UnrealCV: Virtual Worlds for Computer Vision

[...]

Weichao Qiu¹, Fangwei Zhong², Yi Zhang¹, Siyuan Qiao¹, Zihao Xiao¹, Tae Soo Kim¹, Yizhou Wang² - Show less +3 more•Institutions (2)

Johns Hopkins University¹, Peking University²

19 Oct 2017

TL;DR: UnrealCV is a project to help computer vision researchers build virtual worlds using Unreal Engine 4 (UE4).

...read moreread less

Abstract: UnrealCV is a project to help computer vision researchers build virtual worlds using Unreal Engine 4 (UE4). It extends UE4 with a plugin by providing (1) A set of UnrealCV commands to interact with the virtual world. (2) Communication between UE4 and an external program, such as Caffe. UnrealCV can be used in two ways. The first one is using a compiled game binary with UnrealCV embedded. This is as simple as running a game, no knowledge of Unreal Engine is required. The second is installing UnrealCV plugin to Unreal Engine 4 (UE4) and use the editor of UE4 to build a new virtual world. UnrealCV is an open-source software under the MIT license. Since the initial release in September 2016, it has gathered an active community of users, including students and researchers.

...read moreread less

181 citations

1
2
3
4
…
5
6
7
8
9
10
11
12

Collapse

Cited by

PDF

Open Access

More filters

Posted Content•

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

[...]

Ze Liu¹, Yutong Lin¹, Yue Cao¹, Han Hu¹, Yixuan Wei¹, Zheng Zhang¹, Stephen Lin¹, Baining Guo¹ - Show less +4 more•Institutions (1)

Microsoft¹

25 Mar 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: Wang et al. as mentioned in this paper proposed a new vision Transformer called Swin Transformer, which is computed with shifted windows to address the differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.

...read moreread less

Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (86.4 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The code and models will be made publicly available at~\url{this https URL}.

...read moreread less

3,518 citations

The PASCAL Visual Object Classes Challenge

[...]

Jianguo Zhang

01 Jan 2006

3,012 citations

Journal Article•DOI•

Deep Learning for Generic Object Detection: A Survey

[...]

Li Liu¹, Li Liu², Wanli Ouyang³, Xiaogang Wang⁴, Paul Fieguth⁵, Jie Chen², Xinwang Liu¹, Matti Pietikäinen² - Show less +4 more•Institutions (5)

National University of Defense Technology¹, University of Oulu², University of Sydney³, The Chinese University of Hong Kong⁴, University of Waterloo⁵

01 Feb 2020-International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

...read moreread less

Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

...read moreread less

1,897 citations

Proceedings Article•DOI•

Self-Training With Noisy Student Improves ImageNet Classification

[...]

Qizhe Xie¹, Minh-Thang Luong¹, Eduard Hovy², Quoc V. Le¹•Institutions (2)

Google¹, Carnegie Mellon University²

14 Jun 2020

TL;DR: A simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images.

...read moreread less

Abstract: We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as accurate as possible. However, during the learning of the student, we inject noise such as dropout, stochastic depth and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.

...read moreread less

1,696 citations

Posted Content•

MMDetection: Open MMLab Detection Toolbox and Benchmark.

[...]

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin - Show less +21 more

17 Jun 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper presents MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules, and conducts a benchmarking study on different methods, components, and their hyper-parameters.

...read moreread less

Abstract: We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from a codebase of MMDet team who won the detection track of COCO Challenge 2018. It gradually evolves into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features of this toolbox. In addition, we also conduct a benchmarking study on different methods, components, and their hyper-parameters. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors. Code and models are available at this https URL. The project is under active development and we will keep this document updated.

...read moreread less

1,573 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse