Home
/
Authors
/
Hanwang Zhang

Author

Hanwang Zhang

Other affiliations: Singapore Management University, Columbia University, National University of Singapore

Bio: Hanwang Zhang is an academic researcher from Nanyang Technological University. The author has contributed to research in topics: Computer science & Closed captioning. The author has an hindex of 50, co-authored 182 publications receiving 12890 citations. Previous affiliations of Hanwang Zhang include Singapore Management University & Columbia University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
1992

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Neural Collaborative Filtering

[...]

Xiangnan He¹, Lizi Liao¹, Hanwang Zhang², Liqiang Nie³, Xia Hu⁴, Tat-Seng Chua¹ - Show less +2 more•Institutions (4)

National University of Singapore¹, Columbia University², Shandong University³, Texas A&M University⁴

03 Apr 2017

TL;DR: This work strives to develop techniques based on neural networks to tackle the key problem in recommendation --- collaborative filtering --- on the basis of implicit feedback, and presents a general framework named NCF, short for Neural network-based Collaborative Filtering.

...read moreread less

Abstract: In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation --- collaborative filtering --- on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of musics. When it comes to model the key factor in collaborative filtering --- the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance.

...read moreread less

4,419 citations

Proceedings Article•DOI•

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

[...]

Long Chen¹, Hanwang Zhang², Jun Xiao¹, Liqiang Nie³, Jian Shao¹, Wei Liu⁴, Tat-Seng Chua⁵ - Show less +3 more•Institutions (5)

Zhejiang University¹, Columbia University², Shandong University³, Tencent⁴, National University of Singapore⁵

21 Jul 2017

TL;DR: This paper introduces a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN that significantly outperforms state-of-the-art visual attention-based image captioning methods.

...read moreread less

Abstract: Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism — a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is. We evaluate the proposed SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K, Flickr30K, and MSCOCO. It is consistently observed that SCA-CNN significantly outperforms state-of-the-art visual attention-based image captioning methods.

...read moreread less

1,527 citations

Proceedings Article•DOI•

Fast Matrix Factorization for Online Recommendation with Implicit Feedback

[...]

Xiangnan He¹, Hanwang Zhang¹, Min-Yen Kan¹, Tat-Seng Chua¹•Institutions (1)

National University of Singapore¹

07 Jul 2016

TL;DR: A new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique is designed, for efficiently optimizing a Matrix Factorization (MF) model with variably-weighted missing data and exploiting this efficiency to then seamlessly devise an incremental update strategy that instantly refreshes a MF model given new feedback.

...read moreread less

Abstract: This paper contributes improvements on both the effectiveness and efficiency of Matrix Factorization (MF) methods for implicit feedback. We highlight two critical issues of existing works. First, due to the large space of unobserved feedback, most existing works resort to assign a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption is invalid in real-world settings. Second, most methods are also designed in an offline setting and fail to keep up with the dynamic nature of online data. We address the above two issues in learning MF models from implicit feedback. We first propose to weight the missing data based on item popularity, which is more effective and flexible than the uniform-weight assumption. However, such a non-uniform weighting poses efficiency challenge in learning the model. To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing a MF model with variably-weighted missing data. We exploit this efficiency to then seamlessly devise an incremental update strategy that instantly refreshes a MF model given new feedback. Through comprehensive experiments on two public datasets in both offline and online protocols, we show that our implemented, open-source (https://github.com/hexiangnan/sigir16-eals) eALS consistently outperforms state-of-the-art implicit MF methods.

...read moreread less

916 citations

Proceedings Article•DOI•

Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention

[...]

Jingyuan Chen¹, Hanwang Zhang², Xiangnan He¹, Liqiang Nie³, Wei Liu⁴, Tat-Seng Chua¹ - Show less +2 more•Institutions (4)

National University of Singapore¹, Columbia University², Shandong University³, Tencent⁴

07 Aug 2017

TL;DR: A novel attention mechanism in CF is introduced to address the challenging item- and component-level implicit feedback in multimedia recommendation, dubbed Attentive Collaborative Filtering (ACF), which significantly outperforms state-of-the-art CF methods.

...read moreread less

Abstract: Multimedia content is dominating today's Web information. The nature of multimedia user-item interactions is 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads, etc.), which can be collected at a larger scale with a much lower cost than explicit feedback (e.g., product ratings). However, the majority of existing collaborative filtering (CF) systems are not well-designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists item- and component-level implicitness which blurs the underlying users' preferences. The item-level implicitness means that users' preferences on items (e.g. photos, videos, songs, etc.) are unknown, while the component-level implicitness means that inside each item users' preferences on different components (e.g. regions in an image, frames of a video, etc.) are unknown. For example, a 'view'' on a video does not provide any specific information about how the user likes the video (i.e.item-level) and which parts of the video the user is interested in (i.e.component-level). In this paper, we introduce a novel attention mechanism in CF to address the challenging item- and component-level implicit feedback in multimedia recommendation, dubbed Attentive Collaborative Filtering (ACF). Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, starting from any content feature extraction network (e.g. CNN for images/videos), which learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD. Through extensive experiments on two real-world multimedia Web services: Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.

...read moreread less

735 citations

Posted Content•

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

[...]

Long Chen¹, Hanwang Zhang², Jun Xiao¹, Liqiang Nie³, Jian Shao¹, Wei Liu⁴, Tat-Seng Chua⁵ - Show less +3 more•Institutions (5)

Zhejiang University¹, Columbia University², Shandong University³, Tencent⁴, National University of Singapore⁵

17 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: SCA-CNN as mentioned in this paper incorporates spatial and channel-wise attentions in a CNN to dynamically modulate the sentence generation context in multi-layer feature maps, encoding where attentive spatial locations at multiple layers and what (i.e., attentive channels) the visual attention is.

...read moreread less

Abstract: Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism --- a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is. We evaluate the proposed SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K, Flickr30K, and MSCOCO. It is consistently observed that SCA-CNN significantly outperforms state-of-the-art visual attention-based image captioning methods.

...read moreread less

721 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Squeeze-and-Excitation Networks

[...]

Jie Hu¹, Li Shen², Samuel Albanie², Gang Sun¹, Enhua Wu¹ - Show less +1 more•Institutions (2)

Chinese Academy of Sciences¹, University of Oxford²

18 Jun 2018

TL;DR: This work proposes a novel architectural unit, which is term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.

...read moreread less

Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ${\sim }$ ∼ 25 percent. Models and code are available at https://github.com/hujie-frank/SENet .

...read moreread less

14,807 citations

Posted Content•

CBAM: Convolutional Block Attention Module

[...]

Sanghyun Woo¹, Jongchan Park, Joon-Young Lee², In So Kweon¹•Institutions (2)

KAIST¹, Adobe Systems²

17 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The proposed Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks, can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs.

...read moreread less

Abstract: We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS~COCO detection, and VOC~2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.

...read moreread less

5,757 citations

Posted Content•

Squeeze-and-Excitation Networks

[...]

Jie Hu¹, Li Shen², Samuel Albanie², Gang Sun¹, Enhua Wu¹ - Show less +1 more•Institutions (2)

Chinese Academy of Sciences¹, University of Oxford²

05 Sep 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: Squeeze-and-excitation (SE) as mentioned in this paper adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels, which can be stacked together to form SENet architectures.

...read moreread less

Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative improvement of ~25%. Models and code are available at this https URL.

...read moreread less

5,411 citations

Book Chapter•DOI•

CBAM: Convolutional Block Attention Module

[...]

Sanghyun Woo¹, Jongchan Park, Joon-Young Lee², In So Kweon¹•Institutions (2)

KAIST¹, Adobe Systems²

08 Sep 2018

TL;DR: Convolutional Block Attention Module (CBAM) as discussed by the authors is a simple yet effective attention module for feed-forward convolutional neural networks, given an intermediate feature map, the module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement.

...read moreread less

Abstract: We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.

...read moreread less

5,335 citations

Proceedings Article•DOI•

Neural Collaborative Filtering

[...]

Xiangnan He¹, Lizi Liao¹, Hanwang Zhang², Liqiang Nie³, Xia Hu⁴, Tat-Seng Chua¹ - Show less +2 more•Institutions (4)

National University of Singapore¹, Columbia University², Shandong University³, Texas A&M University⁴

03 Apr 2017

...read moreread less

4,419 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse