Author

Enhua Wu

Bio: Enhua Wu is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in the topics of Rendering (computer graphics) and Polygon mesh. The author has an h-index of 24 and has co-authored 266 publications receiving 10,340 citations. Previous affiliations of Enhua Wu include the University of Macau and Academia Sinica.


Papers
Proceedings ArticleDOI
07 Jun 2015
TL;DR: The method proposed in this paper tackles the failure of multi-frame super-resolution (MFSR) under ubiquitous motion blur by optimally searching for the least blurred pixels, and it produces sharp, higher-resolution results from challenging low-resolution, noisy, and blurred input sequences.
Abstract: Ubiquitous motion blur easily causes multi-frame super-resolution (MFSR) to fail. The method proposed in this paper tackles this issue by optimally searching for the least blurred pixels in MFSR. An EM framework is proposed to guide residual blur estimation and high-resolution image reconstruction. To suppress noise, we employ a family of sparse penalties as natural image priors, along with an effective solver. Theoretical analysis is performed on how and when our method works, and the relationship between motion-blur estimation errors and the quality of the input images is discussed. Our method produces sharp, higher-resolution results given challenging low-resolution, noisy, and blurred input sequences.
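A minimal NumPy sketch of the EM-style alternation the abstract describes: the E-step weights each observed pixel by a crude sharpness proxy so that the least blurred observations dominate, and the M-step updates the high-resolution estimate, with a soft-threshold step standing in for the paper's sparse-penalty solver. The function name, the gradient-energy sharpness measure, and all constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def em_mfsr(lr_frames, upscale=2, n_iters=10, lam=0.01):
    """Schematic EM-style loop for blur-aware multi-frame super-resolution.

    lr_frames: list of low-resolution 2D arrays (registered frames).
    """
    h, w = lr_frames[0].shape
    # Initialize the HR estimate by nearest-neighbor upsampling of the mean frame.
    hr = np.kron(np.mean(lr_frames, axis=0), np.ones((upscale, upscale)))
    for _ in range(n_iters):
        # E-step: weight pixels by local gradient energy, a crude proxy for
        # "least blurred" (the paper estimates residual blur properly).
        weights = []
        for f in lr_frames:
            gy, gx = np.gradient(f)
            sharp = gx**2 + gy**2
            weights.append(sharp / (sharp.max() + 1e-8))
        data = sum(wi * f for wi, f in zip(weights, lr_frames)) / (sum(weights) + 1e-8)
        # M-step: push the HR estimate toward the weighted observation,
        # soft-thresholding the update as a stand-in for the sparse prior.
        hr_lr = hr.reshape(h, upscale, w, upscale).mean(axis=(1, 3))
        step = np.kron(data - hr_lr, np.ones((upscale, upscale)))
        step = np.sign(step) * np.maximum(np.abs(step) - lam, 0.0)
        hr = hr + step
    return hr
```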

101 citations

Posted Content
TL;DR: Transformer iN Transformer (TNT), as discussed by the authors, is a new kind of neural architecture that encodes the input data into powerful features via the attention mechanism: visual transformers first divide the input image into several local patches and then calculate both their representations and their relationships.
Abstract: Transformer is a new kind of neural architecture which encodes the input data as powerful features via the attention mechanism. Basically, visual transformers first divide the input images into several local patches and then calculate both representations and their relationship. Since natural images are of high complexity with abundant detail and color information, the granularity of this patch dividing is not fine enough for excavating features of objects at different scales and locations. In this paper, we point out that the attention inside these local patches is also essential for building visual transformers with high performance, and we explore a new architecture, namely, Transformer iN Transformer (TNT). Specifically, we regard the local patches (e.g., 16×16) as "visual sentences" and further divide them into smaller patches (e.g., 4×4) as "visual words". The attention of each word is calculated with the other words in the given visual sentence at negligible computational cost. Features of both words and sentences are aggregated to enhance the representation ability. Experiments on several benchmarks demonstrate the effectiveness of the proposed TNT architecture, e.g., we achieve an 81.5% top-1 accuracy on ImageNet, which is about 1.7% higher than that of the state-of-the-art visual transformer with similar computational cost. The PyTorch code is available at this https URL, and the MindSpore code is at this https URL.
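A hedged PyTorch sketch of one TNT block as the abstract describes it: inner attention among the 4×4 "visual words" of each 16×16 "visual sentence", word features folded back into the sentence token, then outer attention among sentences. Dimensions and layer choices are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TNTBlock(nn.Module):
    """One Transformer-iN-Transformer block: word-level (inner) attention
    inside each visual sentence, then sentence-level (outer) attention."""
    def __init__(self, dim_word=24, dim_sent=384, n_words=16, heads=6):
        super().__init__()
        self.inner = nn.TransformerEncoderLayer(dim_word, nhead=4,
                                                dim_feedforward=4 * dim_word,
                                                batch_first=True)
        # Project the concatenated word features into the sentence embedding.
        self.word2sent = nn.Linear(n_words * dim_word, dim_sent)
        self.outer = nn.TransformerEncoderLayer(dim_sent, nhead=heads,
                                                dim_feedforward=4 * dim_sent,
                                                batch_first=True)

    def forward(self, words, sents):
        # words: (batch * n_sentences, n_words, dim_word)
        # sents: (batch, n_sentences, dim_sent)
        b, s, _ = sents.shape
        words = self.inner(words)                 # attention among visual words
        fused = self.word2sent(words.flatten(1))  # aggregate words per sentence
        sents = sents + fused.view(b, s, -1)      # inject into sentence tokens
        return words, self.outer(sents)           # attention among sentences
```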

101 citations

Proceedings ArticleDOI
01 Aug 2001
TL;DR: A method for efficient synthesis of photorealistic free-form knitwear is presented that accommodates varying levels of detail and capitalizes on hardware-assisted transparency blending; a technique for generating soft shadows from yarn is also introduced.
Abstract: We present a method for efficient synthesis of photorealistic free-form knitwear. Our approach is motivated by the observation that a single cross-section of yarn can serve as the basic primitive for modeling entire articles of knitwear. This primitive, called the lumislice, describes radiance from a yarn cross-section based on fine-level interactions — such as occlusion, shadowing, and multiple scattering — among yarn fibers. By representing yarn as a sequence of identical but rotated cross-sections, the lumislice can effectively propagate local microstructure over arbitrary stitch patterns and knitwear shapes. This framework accommodates varying levels of detail and capitalizes on hardware-assisted transparency blending. To further enhance realism, a technique for generating soft shadows from yarn is also introduced.
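The core representation, yarn as a sequence of identical but rotated cross-sections, can be illustrated with a short NumPy sketch that places lumislice transforms along a yarn path. The frame construction and twist parameter below are hypothetical scaffolding; rendering each slice as an alpha-blended, shadowed quad is left to the host renderer.

```python
import numpy as np

def yarn_slices(path_pts, twist_per_step=0.3):
    """Place identical, progressively rotated cross-sections along a yarn
    centerline. path_pts: (N, 3) array of points; returns one 4x4
    slice-to-world transform per segment."""
    slices = []
    angle = 0.0
    for i in range(len(path_pts) - 1):
        t = path_pts[i + 1] - path_pts[i]
        t = t / np.linalg.norm(t)                  # tangent along the yarn
        up = np.array([0.0, 0.0, 1.0])
        if abs(t @ up) > 0.99:                     # avoid a degenerate frame
            up = np.array([0.0, 1.0, 0.0])
        x = np.cross(up, t); x /= np.linalg.norm(x)
        y = np.cross(t, x)
        # Rotate the cross-section about the yarn axis to model fiber twist.
        c, s = np.cos(angle), np.sin(angle)
        xr, yr = c * x + s * y, -s * x + c * y
        m = np.eye(4)
        m[:3, 0], m[:3, 1], m[:3, 2], m[:3, 3] = xr, yr, t, path_pts[i]
        slices.append(m)
        angle += twist_per_step
    return slices
```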

95 citations

Proceedings ArticleDOI
01 Aug 2009
TL;DR: An efficient algorithm for multi-layer depth peeling via bucket sort of fragments on the GPU, which makes it possible to capture up to 32 layers simultaneously with correct depth ordering in a single geometry pass and is free of read-modify-write (RMW) hazards.
Abstract: In this paper we present an efficient algorithm for multi-layer depth peeling via bucket sort of fragments on the GPU, which makes it possible to capture up to 32 layers simultaneously with correct depth ordering in a single geometry pass. We exploit multiple render targets (MRT) as storage and construct a bucket array of size 32 per pixel. Each bucket is capable of holding only one fragment and can be concurrently updated using the MAX/MIN blending operation. During rasterization, the depth range of each pixel location is divided uniformly into consecutive subintervals, and a linear bucket sort is performed so that fragments within each subinterval are routed into the corresponding bucket. In a following fullscreen shader pass, the bucket array can be accessed sequentially to obtain the sorted fragments for further applications. Collisions happen when more than one fragment is routed to the same bucket, which can be alleviated by a multi-pass approach. We also develop a two-pass approach to further reduce collisions, namely adaptive bucket depth peeling. In the first geometry pass, the depth range is redivided into non-uniform subintervals according to the depth distribution to ensure that there is only one fragment within each subinterval. In the following bucket-sorting pass, only one fragment is routed into each bucket, and collisions are substantially reduced. Our algorithm achieves up to a 32x speedup over classical depth peeling, especially for large scenes with high depth complexity, and the experimental results are visually faithful to the ground truth. It also requires no pre-sorting of geometry or post-sorting of fragments, and it is free of read-modify-write (RMW) hazards.
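A CPU illustration, in Python, of the single-pass bucket sort for one pixel: the depth range is divided into 32 uniform subintervals, each fragment is routed to its bucket, and MIN blending resolves collisions by keeping the nearer fragment. On the GPU the buckets live in the channels of the MRT render targets; this sketch only mimics that logic and is not the authors' shader code.

```python
N_BUCKETS = 32

def bucket_sort_fragments(depths, z_min, z_max):
    """Route one pixel's fragment depths into 32 uniform depth buckets.

    Returns (buckets, n_collisions); a plain list stands in for the
    per-pixel bucket array held in MRT channels on the GPU.
    """
    buckets = [None] * N_BUCKETS
    collisions = 0
    span = max(z_max - z_min, 1e-8)
    for z in depths:
        i = min(int((z - z_min) / span * N_BUCKETS), N_BUCKETS - 1)
        if buckets[i] is None:
            buckets[i] = z
        else:
            collisions += 1                   # two fragments in one subinterval
            buckets[i] = min(buckets[i], z)   # MIN blending keeps the nearer one
    return buckets, collisions
```

The adaptive variant described above would replace the uniform subinterval computation with boundaries derived from the depth distribution gathered in a first geometry pass, so that each subinterval contains at most one fragment.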

67 citations

Journal ArticleDOI
TL;DR: This work solves the fluid dynamics problem entirely on the GPU by packing the scalar and vector variables into the four channels of texels and computing texture coordinate offsets, according to the boundary-condition type of each node, to look up the variables that determine its value.
Abstract: Taking advantage of the parallelism and programmability of the GPU, we solve the fluid dynamics problem completely on the GPU. Different from previous methods, the whole computation in our method is accelerated by packing the scalar and vector variables into the four channels of texels. To adapt to arbitrary boundary conditions, we group the grid nodes into different types according to their positions relative to obstacles and search for the node that determines the value of the current node. We then compute texture coordinate offsets according to the boundary-condition type of each node to fetch the corresponding variables, achieving interaction of flows with obstacles placed freely by users. The test results prove the efficiency of our method and exhibit the potential of the GPU for general-purpose computation.
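A small NumPy sketch of the two ideas in the abstract: packing the solver's fields into the four channels of one texture, and looking up a per-node texture-coordinate offset by boundary type. The channel assignment and the boundary taxonomy are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def pack_fields(u, v, p, div):
    """Pack scalar and vector fields into the four channels of one 'texture'
    (an H x W x 4 array), mirroring the RGBA texel packing described above."""
    return np.stack([u, v, p, div], axis=-1)  # R=u, G=v, B=pressure, A=divergence

def boundary_offset(node_type):
    """Texture-coordinate offset per node type: interior nodes read
    themselves; boundary nodes read the neighbour that determines their
    value (e.g., a wall node copies from the adjacent fluid node)."""
    offsets = {
        "interior":    (0, 0),
        "wall_left":   (1, 0),    # value taken from the fluid node to the right
        "wall_right":  (-1, 0),
        "wall_bottom": (0, 1),
        "wall_top":    (0, -1),
    }
    return offsets[node_type]
```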

66 citations


Cited by
Journal ArticleDOI
18 Jun 2018
TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ~25 percent. Models and code are available at https://github.com/hujie-frank/SENet.
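The abstract translates almost directly into code. Below is a compact PyTorch sketch of an SE block: global average pooling as the squeeze, a bottleneck MLP with a sigmoid as the excitation, then channel-wise rescaling. The reduction ratio of 16 follows the paper's default; the class itself is a sketch, not the reference implementation.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: squeeze spatial information into a channel
    descriptor, excite it through a small bottleneck MLP, and rescale the
    input feature map channel-wise."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                          # per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.fc(self.pool(x).view(b, c))       # excitation weights
        return x * s.view(b, c, 1, 1)              # recalibrate channels
```

Because the block only rescales its input, it can be dropped after any convolutional stage of an existing architecture, which is how the stacked SENet variants described above are formed.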

14,807 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are covered in this book, along with neural networks, kernel methods, graphical models, approximate inference, sampling methods, and a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Posted Content
TL;DR: The proposed Convolutional Block Attention Module (CBAM) is a simple yet effective attention module for feed-forward convolutional neural networks that can be integrated into any CNN architecture seamlessly with negligible overhead and is end-to-end trainable along with the base CNN.
Abstract: We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
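A condensed PyTorch sketch of the two sequential attention maps CBAM infers: channel attention from average- and max-pooled descriptors through a shared MLP, then spatial attention from channel-wise average and max maps through a convolution. The hyperparameters follow the paper's common choices (reduction 16, 7×7 kernel), but this is a sketch rather than the reference implementation.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Sequential channel-then-spatial attention over a feature map."""
    def __init__(self, channels, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(sp))
```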

5,757 citations

Posted Content
TL;DR: This work uses new features, WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combines some of them to achieve state-of-the-art results: 43.5% AP on the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100.
Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100. Source code is at this https URL
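Of the listed components, the CIoU loss is compact enough to sketch: it augments IoU with a normalized center-distance term and an aspect-ratio consistency term. The PyTorch function below follows the published CIoU formulation; the box layout and epsilon handling are illustrative assumptions.

```python
import math
import torch

def ciou_loss(box1, box2, eps=1e-7):
    """Complete-IoU loss for axis-aligned boxes given as (x1, y1, x2, y2).

    CIoU = IoU - rho^2 / c^2 - alpha * v, where rho is the distance between
    box centers, c the enclosing-box diagonal, and v an aspect-ratio term.
    The loss is 1 - CIoU.
    """
    ix1 = torch.max(box1[..., 0], box2[..., 0])
    iy1 = torch.max(box1[..., 1], box2[..., 1])
    ix2 = torch.min(box1[..., 2], box2[..., 2])
    iy2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = box1[..., 2] - box1[..., 0], box1[..., 3] - box1[..., 1]
    w2, h2 = box2[..., 2] - box2[..., 0], box2[..., 3] - box2[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Squared center distance over squared enclosing-box diagonal.
    cw = torch.max(box1[..., 2], box2[..., 2]) - torch.min(box1[..., 0], box2[..., 0])
    ch = torch.max(box1[..., 3], box2[..., 3]) - torch.min(box1[..., 1], box2[..., 1])
    rho2 = ((box1[..., 0] + box1[..., 2] - box2[..., 0] - box2[..., 2]) ** 2 +
            (box1[..., 1] + box1[..., 3] - box2[..., 1] - box2[..., 3]) ** 2) / 4
    diag2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / diag2 - alpha * v)
```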

5,709 citations

Posted Content
TL;DR: Squeeze-and-Excitation (SE) blocks, as described in this paper, adaptively recalibrate channel-wise feature responses by explicitly modelling interdependencies between channels and can be stacked together to form SENet architectures.
Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative improvement of ~25%. Models and code are available at this https URL.

5,411 citations