Chao Peng

Bio: Chao Peng is an academic researcher from Peking University. The author has contributed to research on the topics of segmentation and object detection. The author has an h-index of 16 and has co-authored 17 publications receiving 4852 citations. Previous affiliations of Chao Peng include Tsinghua University.

Papers


Book Chapter•DOI•

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Changqian Yu¹, Jingbo Wang², Chao Peng, Changxin Gao¹, Gang Yu, Nong Sang¹

Huazhong University of Science and Technology¹, Peking University²

08 Sep 2018

TL;DR: BiSeNet uses a spatial path with a small stride to preserve spatial information and generate high-resolution features, and a context path with a fast downsampling strategy to obtain a sufficient receptive field.

Abstract: Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture makes a right balance between the speed and segmentation performance on Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048×1024 input, we achieve 68.4% Mean IOU on the Cityscapes test dataset with speed of 105 FPS on one NVIDIA Titan XP card, which is significantly faster than the existing methods with comparable performance.

1,547 citations

Proceedings Article•DOI•

Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network

Chao Peng¹, Xiangyu Zhang, Gang Yu, Guiming Luo¹, Jian Sun

Tsinghua University¹

21 Jul 2017

TL;DR: This work proposes a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation and suggests a residual-based boundary refinement to further refine the object boundaries.

Abstract: One of recent trends [31, 32, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters is more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.

1,047 citations

Posted Content•

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

Chao Peng¹, Xiangyu Zhang, Gang Yu, Guiming Luo¹, Jian Sun

Tsinghua University¹

08 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a Global Convolutional Network (GCN) is proposed to address both the classification and localization issues for the semantic segmentation, which achieves state-of-the-art performance on two public benchmarks.

Abstract: One of recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters is more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.

935 citations

Proceedings Article•DOI•

Learning a Discriminative Feature Network for Semantic Segmentation

Changqian Yu¹, Jingbo Wang², Chao Peng, Changxin Gao¹, Gang Yu, Nong Sang¹

Huazhong University of Science and Technology¹, Peking University²

25 Apr 2018

TL;DR: This work proposes a Discriminative Feature Network (DFN) with two sub-networks: a Smooth Network, specially designed to handle the intra-class inconsistency problem, and a Border Network, which makes the bilateral features of the boundary distinguishable with deep semantic boundary supervision.

Abstract: Most existing methods of semantic segmentation still suffer from two aspects of challenges: intra-class inconsistency and inter-class indistinction. To tackle these two problems, we propose a Discriminative Feature Network (DFN), which contains two sub-networks: Smooth Network and Border Network. Specifically, to handle the intra-class inconsistency problem, we specially design a Smooth Network with Channel Attention Block and global average pooling to select the more discriminative features. Furthermore, we propose a Border Network to make the bilateral features of boundary distinguishable with deep semantic boundary supervision. Based on our proposed DFN, we achieve state-of-the-art performance 86.2% mean IOU on PASCAL VOC 2012 and 80.3% mean IOU on Cityscapes dataset.

652 citations

Posted Content•

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Changqian Yu¹, Jingbo Wang², Chao Peng, Changxin Gao¹, Gang Yu, Nong Sang¹

Huazhong University of Science and Technology¹, Peking University²

02 Aug 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: A novel Bilateral Segmentation Network (BiSeNet) is proposed that makes a right balance between the speed and segmentation performance on Cityscapes, CamVid, and COCO-Stuff datasets.

Abstract: Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture makes a right balance between the speed and segmentation performance on Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048x1024 input, we achieve 68.4% Mean IOU on the Cityscapes test dataset with speed of 105 FPS on one NVIDIA Titan XP card, which is significantly faster than the existing methods with comparable performance.

389 citations

Cited by

Book Chapter•DOI•

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen¹, Yukun Zhu¹, George Papandreou¹, Florian Schroff¹, Hartwig Adam¹

Google¹

08 Sep 2018

TL;DR: This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.

Abstract: Spatial pyramid pooling module or encode-decoder structure are used in deep neural networks for semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on PASCAL VOC 2012 and Cityscapes datasets, achieving the test set performance of 89% and 82.1% without any post-processing. Our paper is accompanied with a publicly available reference implementation of the proposed models in Tensorflow at https://github.com/tensorflow/models/tree/master/research/deeplab.

7,113 citations

Posted Content•

YOLOv4: Optimal Speed and Accuracy of Object Detection

Alexey Bochkovskiy, Chien-Yao Wang¹, Hong-Yuan Mark Liao¹

Academia Sinica¹

23 Apr 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work uses new features (WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss) and combines some of them to achieve state-of-the-art results: 43.5% AP for the MS COCO dataset at a real-time speed of ~65 FPS on Tesla V100.

Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at this https URL

5,709 citations

Posted Content•

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

17 Jun 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Abstract: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed `DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

5,691 citations

Proceedings Article•DOI•

Dual Attention Network for Scene Segmentation

Jun Fu¹, Jing Liu¹, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang¹, Hanqing Lu

Chinese Academy of Sciences¹

15 Jun 2019

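TL;DR: This work proposes a Dual Attention Network (DANet) that achieves new state-of-the-art segmentation performance on three challenging scene segmentation datasets (Cityscapes, PASCAL Context, and COCO Stuff), including a Mean IoU of 81.5% on the Cityscapes test set without using coarse data.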

Abstract: In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale features fusion, we propose a Dual Attention Networks (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of traditional dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attention module selectively aggregates the features at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset. In particular, a Mean IoU score of 81.5% on Cityscapes test set is achieved without using coarse data.

4,327 citations

Posted Content•

Momentum Contrast for Unsupervised Visual Representation Learning

Kaiming He¹, Haoqi Fan¹, Yuxin Wu¹, Saining Xie¹, Ross Girshick¹

Facebook¹

13 Nov 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This article proposed Momentum Contrast (MoCo) for unsupervised visual representation learning, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive learning.

Abstract: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.

4,272 citations
