Author

Kai Li

Other affiliations: Wuhan University, NEC

Bio: Kai Li is an academic researcher from Northeastern University. The author has contributed to research in topics: Line segment & Feature (computer vision). The author has an h-index of 17 and has co-authored 44 publications that have received 3,362 citations. Previous affiliations of Kai Li include Wuhan University & NEC.

Papers

Posted Content•

Image Super-Resolution Using Very Deep Residual Channel Attention Networks

Yulun Zhang¹, Kunpeng Li¹, Kai Li¹, Lichen Wang¹, Bineng Zhong¹, Yun Fu¹•Institutions (1)

Northeastern University¹

08 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections, and proposes a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels.

Abstract: Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods.
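The channel attention idea in this abstract can be illustrated with a minimal sketch. The abstract only states that channel-wise features are rescaled by considering interdependencies among channels; the pool, bottleneck, and sigmoid gate used below, and the names `channel_attention`, `w_down`, and `w_up`, are assumptions made for illustration and are not taken from the paper.

```python
import math

def global_avg_pool(x):
    """Reduce each channel (a 2D list of floats) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def channel_attention(x, w_down, w_up):
    """Rescale each channel of x by a gate computed from all channel means.

    x:      feature map as [C][H][W] nested lists
    w_down: hypothetical (C/r x C) weights that squeeze the channel descriptor
    w_up:   hypothetical (C x C/r) weights that expand it back to C gates
    """
    z = global_avg_pool(x)                             # one statistic per channel
    hidden = [max(0.0, h) for h in matvec(w_down, z)]  # ReLU bottleneck mixes channels
    gates = [1.0 / (1.0 + math.exp(-u)) for u in matvec(w_up, hidden)]  # sigmoid
    out = [[[g * v for v in row] for row in ch] for ch, g in zip(x, gates)]
    return out, gates

# Toy input: two channels of size 2x2 with means 1.0 and 2.0.
x = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [4.0, 2.0]],
]
w_down = [[1.0, -1.0]]   # assumed weights, C=2 -> 1
w_up = [[1.0], [-1.0]]   # assumed weights, 1 -> C=2
out, gates = channel_attention(x, w_down, w_up)
print(gates)  # [0.5, 0.5]
```

With these toy weights the bottleneck output is zero, so both gates equal 0.5 and every channel is halved. Learned weights would produce a different gate per channel.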

2,025 citations

Book Chapter•DOI•

Image Super-Resolution Using Very Deep Residual Channel Attention Networks

Yulun Zhang¹, Kunpeng Li¹, Kai Li¹, Lichen Wang¹, Bineng Zhong¹, Yun Fu¹•Institutions (1)

Northeastern University¹

08 Sep 2018

TL;DR: This paper proposes very deep residual channel attention networks (RCAN), built on a residual in residual (RIR) structure that consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections.

Abstract: Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods.

1,991 citations

Proceedings Article•DOI•

Visual Semantic Reasoning for Image-Text Matching

Kunpeng Li¹, Yulun Zhang¹, Kai Li¹, Yuanyuan Li¹, Yun Fu¹•Institutions (1)

Northeastern University¹

01 Oct 2019

TL;DR: This work proposes a simple and interpretable reasoning model that generates a visual representation capturing the key objects and semantic concepts of a scene; it outperforms the previous best method for image retrieval and caption retrieval on the MS-COCO and Flickr30K datasets.

Abstract: Image-text matching has been a hot research topic bridging the vision and language areas. It remains challenging because the current representation of image usually lacks global semantic concepts as in its corresponding text caption. To address this issue, we propose a simple and interpretable reasoning model to generate visual representation that captures key objects and semantic concepts of a scene. Specifically, we first build up connections between image regions and perform reasoning with Graph Convolutional Networks to generate features with semantic relationships. Then, we propose to use the gate and memory mechanism to perform global semantic reasoning on these relationship-enhanced features, select the discriminative information and gradually generate the representation for the whole scene. Experiments validate that our method achieves a new state-of-the-art for the image-text matching on MS-COCO and Flickr30K datasets. It outperforms the current best method by 6.8% relatively for image retrieval and 4.8% relatively for caption retrieval on MS-COCO (Recall@1 using 1K test set). On Flickr30K, our model improves image retrieval by 12.6% relatively and caption retrieval by 5.8% relatively (Recall@1).

393 citations

Posted Content•

Residual Non-local Attention Networks for Image Restoration

Yulun Zhang¹, Kunpeng Li¹, Kai Li¹, Bineng Zhong², Yun Fu¹•Institutions (2)

Northeastern University¹, Huaqiao University²

24 Mar 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes residual local and non-local attention learning to train a very deep network; the method generalizes to various image restoration applications, such as image denoising, demosaicing, compression artifacts reduction, and super-resolution.

Abstract: In this paper, we propose a residual non-local attention network for high-quality image restoration. Without considering the uneven distribution of information in the corrupted images, previous methods are restricted by local convolutional operation and equal treatment of spatial- and channel-wise features. To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts. Specifically, we design trunk branch and (non-)local mask branch in each (non-)local attention block. The trunk branch is used to extract hierarchical features. Local and non-local mask branches aim to adaptively rescale these hierarchical features with mixed attentions. The local mask branch concentrates on more local structures with convolutional operations, while non-local attention considers more about long-range dependencies in the whole feature map. Furthermore, we propose residual local and non-local attention learning to train the very deep network, which further enhances the representation ability of the network. Our proposed method can be generalized for various image restoration applications, such as image denoising, demosaicing, compression artifacts reduction, and super-resolution. Experiments demonstrate that our method obtains comparable or better results compared with recently leading methods quantitatively and visually.
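The long-range dependency idea behind non-local attention can be sketched as follows. This is a simplified illustration, not the paper's block: it assumes identity embeddings (no learned projections), dot-product similarity with a softmax over all positions, and a feature map already flattened into a list of position vectors. The function name `non_local` is hypothetical.

```python
import math

def non_local(features):
    """Aggregate every position from all positions, weighted by similarity.

    features: list of N position vectors (the H*W positions of a feature
    map, flattened). Each output vector is a softmax-weighted average of
    all input vectors, so distant positions can influence each other.
    """
    out = []
    for query in features:
        # Dot-product similarity between this position and every position.
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        peak = max(scores)                        # subtract max for stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([
            sum((e / total) * key[d] for e, key in zip(exps, features))
            for d in range(len(query))
        ])
    return out

# Two positions: an all-zero vector attends uniformly, so it receives
# the average of both positions.
feats = [[0.0, 0.0], [2.0, 4.0]]
mixed = non_local(feats)
print(mixed[0])  # [1.0, 2.0]
```

A local convolution would only see a small neighbourhood of each position; here every output depends on the whole input, which is the property the abstract attributes to the non-local mask branch.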

257 citations

Proceedings Article•

Residual Non-local Attention Networks for Image Restoration

Yulun Zhang¹, Kunpeng Li¹, Kai Li¹, Bineng Zhong², Yun Fu¹•Institutions (2)

Northeastern University¹, Huaqiao University²

01 Jan 2019

TL;DR: Zhang et al. propose a residual non-local attention network for high-quality image restoration, with a trunk branch and a (non-)local mask branch in each attention block.

230 citations

Cited by

Journal Article•

Learning report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Book Chapter•DOI•

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Xintao Wang¹, Ke Yu¹, Shixiang Wu², Jinjin Gu¹, Yihao Liu², Chao Dong², Yu Qiao², Chen Change Loy³•Institutions (3)

The Chinese University of Hong Kong¹, Chinese Academy of Sciences², Nanyang Technological University³

08 Sep 2018

TL;DR: ESRGAN improves the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery, and won first place in the PIRM2018-SR Challenge (region 3).

Abstract: The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN – network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge (region 3) with the best perceptual index. The code is available at https://github.com/xinntao/ESRGAN.
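The "relative realness" idea can be sketched with the relativistic average formulation, in which each sample's raw discriminator output is compared against the mean output of the opposite class. The abstract does not spell out the loss, so the averaging form and the function names below are assumptions for illustration; the inputs are raw (pre-sigmoid) discriminator outputs.

```python
import math

def softplus(u):
    """Numerically stable log(1 + exp(u))."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

def mean(values):
    return sum(values) / len(values)

def relativistic_d_loss(real_logits, fake_logits):
    """Discriminator loss: real samples should score above the average
    fake sample, and fake samples below the average real sample."""
    mean_real, mean_fake = mean(real_logits), mean(fake_logits)
    # -log(sigmoid(r - mean_fake)) equals softplus(-(r - mean_fake))
    loss_real = mean([softplus(-(r - mean_fake)) for r in real_logits])
    # -log(1 - sigmoid(f - mean_real)) equals softplus(f - mean_real)
    loss_fake = mean([softplus(f - mean_real) for f in fake_logits])
    return loss_real + loss_fake

def relativistic_g_loss(real_logits, fake_logits):
    """Generator loss: the same comparison with the two roles swapped."""
    return relativistic_d_loss(fake_logits, real_logits)

# When the discriminator cannot tell the classes apart, each term is log 2.
undecided = relativistic_d_loss([0.0, 0.0], [0.0, 0.0])
separated = relativistic_d_loss([3.0, 3.0], [-3.0, -3.0])
```

Unlike a standard GAN loss, the generator term here depends on the real samples as well as the generated ones, because each score is measured relative to the other class.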

2,298 citations

Proceedings Article•DOI•

Selective Kernel Networks

Xiang Li¹, Wenhai Wang², Xiaolin Hu³, Jian Yang¹•Institutions (3)

Nanjing University of Science and Technology¹, Tsinghua University², Nanjing University³

01 Jun 2019

TL;DR: SKNet introduces a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information, so that neurons can capture target objects at different scales.

Abstract: In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, which has been rarely considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked to a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
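The fusion step of an SK unit, as described in this abstract, can be sketched as below. The sketch takes the branch outputs as given (the convolutions with different kernel sizes are omitted) and shows only the softmax selection across branches. The function name `selective_kernel_fuse` and the weight matrices `w_reduce` and `w_branch` are hypothetical, and the pool-then-bottleneck layout is an assumption rather than a detail confirmed by the abstract.

```python
import math

def channel_mean(ch):
    """Mean of one channel given as a 2D list."""
    return sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))

def selective_kernel_fuse(branches, w_reduce, w_branch):
    """Fuse M branch outputs with per-channel softmax attention.

    branches: list of M feature maps, each [C][H][W]
    w_reduce: hypothetical (d x C) weights producing a compact descriptor
    w_branch: M hypothetical (C x d) matrices, one set of logits per branch
    """
    num_branches, num_channels = len(branches), len(branches[0])
    # Pooling the element-wise sum equals summing the pooled branches.
    s = [sum(channel_mean(b[c]) for b in branches) for c in range(num_channels)]
    z = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w_reduce]
    logits = [[sum(w * v for w, v in zip(row, z)) for row in wb] for wb in w_branch]
    weights = []  # weights[c][m]: share of branch m in channel c
    for c in range(num_channels):
        column = [logits[m][c] for m in range(num_branches)]
        peak = max(column)
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        weights.append([e / total for e in exps])
    fused = [
        [[sum(weights[c][m] * branches[m][c][i][j] for m in range(num_branches))
          for j in range(len(branches[0][c][i]))]
         for i in range(len(branches[0][c]))]
        for c in range(num_channels)
    ]
    return fused, weights

# Two branches (e.g. a small and a large kernel), C=2, H=1, W=2.
b_small = [[[1.0, 3.0]], [[2.0, 2.0]]]
b_large = [[[3.0, 5.0]], [[0.0, 0.0]]]
w_reduce = [[0.5, 0.5]]
zero_logits = [[[0.0], [0.0]], [[0.0], [0.0]]]
fused, weights = selective_kernel_fuse([b_small, b_large], w_reduce, zero_logits)
print(fused)  # [[[2.0, 4.0]], [[1.0, 1.0]]]
```

With all logits at zero the branches are averaged. Non-zero branch weights shift each channel toward one kernel size, which is how the effective receptive field varies with the input.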

1,401 citations

Proceedings Article•DOI•

Second-Order Attention Network for Single Image Super-Resolution

Tao Dai¹, Jianrui Cai², Yongbing Zhang¹, Shu-Tao Xia¹, Lei Zhang²•Institutions (2)

Tsinghua University¹, Hong Kong Polytechnic University²

15 Jun 2019

TL;DR: Experimental results demonstrate the superiority of the SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

Abstract: Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance. However, most of the existing CNN-based SISR methods mainly focus on wider or deeper architecture design, neglecting to explore the feature correlations of intermediate layers, hence hindering the representational power of CNNs. To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. Specifically, a novel trainable second-order channel attention (SOCA) module is developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, we present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. Experimental results demonstrate the superiority of our SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

1,219 citations

Posted Content•

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang

01 Sep 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work thoroughly studies three key components of SRGAN (network architecture, adversarial loss and perceptual loss) and improves each of them to derive an Enhanced SRGAN (ESRGAN), which achieves consistently better visual quality with more realistic and natural textures than SRGAN.

Abstract: The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at this https URL .

915 citations
