Fengwei Yu

Bio: Fengwei Yu is an academic researcher at SenseTime. The author has contributed to research on quantization (signal processing) and artificial neural networks. The author has an h-index of 9 and has co-authored 29 publications receiving 811 citations. Previous affiliations of Fengwei Yu include Beihang University.

Papers

Proceedings Article

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Ruihao Gong¹, Xianglong Liu¹, Shenghu Jiang¹, Tianxiang Li², Peng Hu¹, Jiazhen Lin¹, Fengwei Yu³, Junjie Yan³

Beihang University¹, Beijing Institute of Technology², SenseTime³

01 Oct 2019

TL;DR: This paper proposes Differentiable Soft Quantization (DSQ) to bridge the gap between full-precision and low-bit networks; DSQ evolves automatically during training to gradually approximate standard quantization.

Abstract: Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often face an unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. DSQ can automatically evolve during training to gradually approximate the standard quantization. Owing to its differentiable property, DSQ can help pursue accurate gradients in backward propagation, and reduce the quantization loss in the forward process with an appropriate clipping range. Extensive experiments over several popular network structures show that training low-bit neural networks with DSQ can consistently outperform state-of-the-art quantization methods. Besides, our first efficient implementation for deploying 2 to 4-bit DSQ on devices with ARM architecture achieves up to 1.7× speed up, compared with the open-source 8-bit high-performance inference framework NCNN [31].
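To make the idea of a soft quantizer that "gradually approximates the standard quantization" concrete, here is a minimal sketch. It assumes a tanh-based soft staircase inside a clipping range; the function names, the sharpness parameter `k`, and the exact formula are illustrative assumptions and are not taken from this listing, so the paper itself should be consulted for the authors' formulation.

```python
import math


def hard_quantize(x, lower, upper, bits):
    """Standard uniform quantization: clip to [lower, upper], snap to the nearest level."""
    intervals = 2 ** bits - 1
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    return lower + delta * math.floor((x - lower) / delta + 0.5)


def soft_quantize(x, lower, upper, bits, k):
    """Assumed soft staircase: a scaled tanh inside each quantization interval.

    Small k gives a nearly linear (identity-like) curve; large k approaches
    the hard staircase while remaining differentiable for finite k.
    """
    intervals = 2 ** bits - 1
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    # Index of the interval containing x (the top edge belongs to the last interval).
    i = min(int((x - lower) / delta), intervals - 1)
    mid = lower + (i + 0.5) * delta
    # Scale so the curve passes exactly through both interval endpoints.
    scale = 1.0 / math.tanh(0.5 * k * delta)
    phi = scale * math.tanh(k * (x - mid))  # value in [-1, 1]
    return lower + delta * (i + 0.5 * (phi + 1.0))
```

Under these assumptions, increasing `k` over the course of training would move the curve from near-identity toward the hard quantizer, which is one way to read the "evolve during training" statement in the abstract.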

363 citations

Book Chapter

POI: Multiple Object Tracking with High Performance Detection and Appearance Feature

Fengwei Yu¹,², Wenbo Li²,³, Quanquan Li², Yu Liu², Xiaohua Shi¹, Junjie Yan²

Beihang University¹, SenseTime², University at Albany, SUNY³

08 Oct 2016

TL;DR: This paper explores high-performance detection and deep-learning-based appearance features, and shows that they lead to significantly better MOT results in both online and offline settings.

Abstract: Detection and learning-based appearance features play the central role in data-association-based multiple object tracking (MOT), but most recent MOT works ignore them and focus only on hand-crafted features and association algorithms. In this paper, we explore high-performance detection and deep-learning-based appearance features, and show that they lead to significantly better MOT results in both online and offline settings. We make our detection and appearance features publicly available (https://drive.google.com/open?id=0B5ACiy41McAHMjczS2p0dFg3emM). In the following, we first summarize the detection and appearance features, and then introduce our tracker named Person of Interest (POI), which has both online and offline versions (we use POI to denote our online tracker and KDNT to denote our offline tracker in submission).

299 citations

Posted Content

POI: Multiple Object Tracking with High Performance Detection and Appearance Feature

Fengwei Yu¹,², Wenbo Li²,³, Quanquan Li², Yu Liu², Xiaohua Shi¹, Junjie Yan²

Beihang University¹, SenseTime², University at Albany, SUNY³

19 Oct 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper explores high-performance detection and deep-learning-based appearance features, and shows that they lead to significantly better MOT results in both online and offline settings.

Abstract: Detection and learning-based appearance features play the central role in data-association-based multiple object tracking (MOT), but most recent MOT works ignore them and focus only on hand-crafted features and association algorithms. In this paper, we explore high-performance detection and deep-learning-based appearance features, and show that they lead to significantly better MOT results in both online and offline settings. We make our detection and appearance features publicly available. In the following, we first summarize the detection and appearance features, and then introduce our tracker named Person of Interest (POI), which has both online and offline versions.

274 citations

Proceedings Article

Forward and Backward Information Retention for Accurate Binary Neural Networks

Haotong Qin¹, Ruihao Gong¹, Xianglong Liu¹, Mingzhu Shen¹, Ziran Wei², Fengwei Yu³, Jingkuan Song⁴

Beihang University¹, Beijing University of Posts and Telecommunications², SenseTime³, University of Electronic Science and Technology of China⁴

14 Jun 2020

TL;DR: The proposed Information Retention Network (IR-Net) is the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization.

Abstract: Weight and activation binarization is an effective approach to deep neural network compression and can accelerate the inference by leveraging bitwise operations. Although many binarization methods have improved the accuracy of the model by minimizing the quantization error in forward propagation, there remains a noticeable performance gap between the binarized model and the full-precision one. Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks. To address these issues, we propose an Information Retention Network (IR-Net) to retain the information that consists in the forward activations and backward gradients. IR-Net mainly relies on two technical contributions: (1) Libra Parameter Binarization (Libra-PB): simultaneously minimizing both quantization error and information loss of parameters by balanced and standardized weights in forward propagation; (2) Error Decay Estimator (EDE): minimizing the information loss of gradients by gradually approximating the sign function in backward propagation, jointly considering the updating ability and accurate gradients. We are the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization. Comprehensive experiments with various network structures on CIFAR-10 and ImageNet datasets manifest that the proposed IR-Net can consistently outperform state-of-the-art quantization methods.
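The two ingredients named in the abstract can be illustrated with a small sketch. It assumes that "balanced and standardized weights" means zero-mean, unit-variance weights before taking the sign, and that "gradually approximating the sign function" means a tanh surrogate whose sharpness grows with training progress. The function names, the schedule constants `t_min` and `t_max`, and the exact surrogate form are illustrative assumptions, and any additional scaling the paper may apply is omitted.

```python
import math
import statistics


def balance_and_standardize(weights):
    """Shift weights to zero mean and rescale to unit standard deviation.

    Assumes the weights are not all identical (non-zero standard deviation).
    """
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    return [(w - mean) / std for w in weights]


def binarize(weights):
    """Forward pass: sign of the balanced, standardized weights."""
    return [1.0 if w >= 0 else -1.0 for w in balance_and_standardize(weights)]


def sign_surrogate_grad(x, progress, t_min=0.1, t_max=10.0):
    """Backward pass: derivative of k * tanh(t * x), an assumed surrogate for sign(x).

    progress runs from 0.0 (start of training) to 1.0 (end); t grows
    geometrically from t_min to t_max, so the surrogate sharpens over time.
    """
    t = t_min * (t_max / t_min) ** progress
    k = max(1.0 / t, 1.0)
    return k * t * (1.0 - math.tanh(t * x) ** 2)
```

In this sketch, the weights `[0.2, 0.4, 0.9, 1.1]` would all binarize to +1 without balancing; after balancing, half map to -1 and half to +1, so the binary weights carry more information. Early in training the surrogate gradient is close to 1 over a wide range, which keeps weights updatable, and late in training it concentrates around zero, which tracks the sign function more closely.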

253 citations

Posted Content

Incorporating Convolution Designs into Visual Transformers

Kun Yuan¹, Shaopeng Guo², Ziwei Liu, Aojun Zhou¹, Fengwei Yu³, Wei Wu⁴

Chinese Academy of Sciences¹, SenseTime², Beihang University³, Nanyang Technological University⁴

22 Mar 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: CeiT combines the advantages of CNNs in extracting low-level features and strengthening locality with the advantages of Transformers in establishing long-range dependencies, and converges with fewer training iterations, which can reduce the training cost significantly.

Abstract: Motivated by the success of Transformers in natural language processing (NLP) tasks, there have emerged some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However, pure Transformer architectures often require a large amount of training data or extra supervision to obtain comparable performance with convolutional neural networks (CNNs). To overcome these limitations, we analyze the potential drawbacks when directly borrowing Transformer architectures from NLP. Then we propose a new Convolution-enhanced image Transformer (CeiT) which combines the advantages of CNNs in extracting low-level features and strengthening locality with the advantages of Transformers in establishing long-range dependencies. Three modifications are made to the original Transformer: 1) instead of the straightforward tokenization from raw input images, we design an Image-to-Tokens (I2T) module that extracts patches from generated low-level features; 2) the feed-forward network in each encoder block is replaced with a Locally-enhanced Feed-Forward (LeFF) layer that promotes the correlation among neighboring tokens in the spatial dimension; 3) a Layer-wise Class token Attention (LCA) is attached at the top of the Transformer that utilizes the multi-level representations. Experimental results on ImageNet and seven downstream tasks show the effectiveness and generalization ability of CeiT compared with previous Transformers and state-of-the-art CNNs, without requiring a large amount of training data and extra CNN teachers. Besides, CeiT models also demonstrate better convergence with 3× fewer training iterations, which can reduce the training cost significantly. (Code and models will be released upon acceptance.)

153 citations

Cited by

The PASCAL Visual Object Classes Challenge

Jianguo Zhang

01 Jan 2006

3,012 citations

Proceedings Article

Simple online and realtime tracking with a deep association metric

Nicolai Wojke¹, Alex Bewley², Dietrich Paulus¹

University of Koblenz and Landau¹, Queensland University of Technology²

21 Mar 2017

TL;DR: This paper integrates appearance information to improve the performance of SORT and reduces the number of identity switches, achieving overall competitive performance at high frame rates.

Abstract: Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In the spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
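A nearest neighbor query in appearance space, as mentioned in the abstract, can be sketched as follows. This is a simplified illustration under stated assumptions: each track keeps a small gallery of past appearance vectors, distance is cosine distance, and a detection is assigned greedily to the closest track under a threshold. The names `nearest_track`, `galleries`, and `max_distance`, and the threshold value, are hypothetical; a full tracker would also use motion information and a global assignment step, which are left out here.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_track(detection, galleries, max_distance=0.2):
    """Return the id of the track whose gallery is closest to the detection.

    galleries maps track id -> list of stored appearance vectors.
    Returns None when no track is closer than max_distance.
    """
    best_id, best_dist = None, max_distance
    for track_id, gallery in galleries.items():
        # Distance to a track is the smallest distance to any stored vector.
        dist = min(cosine_distance(detection, g) for g in gallery)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id


galleries = {1: [[1.0, 0.0], [0.9, 0.1]], 2: [[0.0, 1.0]]}
match = nearest_track([0.95, 0.05], galleries)    # close to track 1
no_match = nearest_track([1.0, 1.0], galleries)   # too far from both tracks
```

Keeping several past vectors per track is what lets a query still succeed after the object's appearance has changed or after it reappears from an occlusion.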

1,808 citations

Posted Content

Simple Online and Realtime Tracking with a Deep Association Metric

Nicolai Wojke¹, Alex Bewley², Dietrich Paulus¹

University of Koblenz and Landau¹, Queensland University of Technology²

21 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors integrate appearance information to improve the performance of Simple Online and Real-time Tracking (SORT) by tracking objects through longer periods of occlusions, effectively reducing the number of identity switches.

Abstract: Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In the spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.

987 citations

Book Chapter

Tracking Objects as Points

Xingyi Zhou¹, Vladlen Koltun², Philipp Krähenbühl¹

University of Texas at Austin¹, Intel²

23 Aug 2020

TL;DR: CenterTrack applies a detection model to a pair of images and the detections from the prior frame. Given this minimal input, it localizes objects and predicts their associations with the previous frame.

Abstract: Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That’s it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.

657 citations

Book

Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art

Joel Janai¹, Fatma Güney², Aseem Behl¹, Andreas Geiger¹

Max Planck Society¹, Koç University²

03 Jul 2020

TL;DR: This survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving.

Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This monograph attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

579 citations
