Author

Yao Zhao

Other affiliations: Texas A&M University, University of Electronic Science and Technology of China, Nanyang Technological University

Bio: Yao Zhao is an academic researcher from Beijing Jiaotong University. The author has contributed to research in topics: Computer science & Feature (computer vision). The author has an h-index of 35 and has co-authored 524 publications receiving 6660 citations. Previous affiliations of Yao Zhao include Texas A&M University & University of Electronic Science and Technology of China.

Papers published on a yearly basis (chart omitted; years listed: 1994, 1996, 1998, 2000, and 2005 through 2023)

Papers

Journal Article

HCP: A Flexible CNN Framework for Multi-Label Image Classification

Yunchao Wei¹, Wei Xia², Min Lin², Junshi Huang², Bingbing Ni³, Jian Dong², Yao Zhao¹, Shuicheng Yan²

Beijing Jiaotong University¹, National University of Singapore², Shanghai Jiao Tong University³

01 Sep 2016-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Proposes Hypotheses-CNN-Pooling (HCP), a flexible CNN framework that takes an arbitrary number of object segment hypotheses as inputs; experiments on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate its superiority over other state-of-the-art methods.

Abstract: Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how CNN best copes with multi-label images still remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, then a shared CNN is connected with each hypothesis, and finally the CNN output results from different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on Pascal VOC 2007 and VOC 2012 multi-label image datasets well demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-arts. In particular, the mAP reaches 90.5% by HCP only and 93.2% after the fusion with our complementary result in [12] based on hand-crafted features on the VOC 2012 dataset.
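The aggregation step described in the abstract (a shared CNN scores each hypothesis, then the outputs are combined with max pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hcp_max_pool`, the hard-coded score vectors standing in for CNN outputs, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def hcp_max_pool(hypothesis_scores):
    """Element-wise max over per-hypothesis class scores.

    hypothesis_scores: list of score vectors, one per object hypothesis,
    each of length num_classes (stand-ins for shared-CNN outputs).
    """
    if not hypothesis_scores:
        raise ValueError("need at least one hypothesis")
    num_classes = len(hypothesis_scores[0])
    if any(len(s) != num_classes for s in hypothesis_scores):
        raise ValueError("all score vectors must have the same length")
    # For each class, keep the strongest response across all hypotheses.
    return [max(s[c] for s in hypothesis_scores) for c in range(num_classes)]


# Hypothetical scores for three hypotheses over three classes.
scores = [
    [0.9, 0.1, 0.2],  # hypothesis 1 responds to class 0
    [0.2, 0.8, 0.1],  # hypothesis 2 responds to class 1
    [0.3, 0.2, 0.1],  # a noisy hypothesis with no strong response
]
pooled = hcp_max_pool(scores)

# Assumed threshold for turning pooled scores into a multi-label prediction.
predicted_labels = [c for c, s in enumerate(pooled) if s >= 0.5]
```

Because each class keeps only its strongest response, a hypothesis with uniformly low scores does not change the pooled result, which is one way to read the abstract's claim of robustness to noisy or redundant hypotheses.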

722 citations

Journal Article

STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation

Yunchao Wei¹, Xiaodan Liang, Yunpeng Chen², Xiaohui Shen³, Ming-Ming Cheng⁴, Jiashi Feng², Yao Zhao¹, Shuicheng Yan²

Beijing Jiaotong University¹, National University of Singapore², Adobe Systems³, Nankai University⁴

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Proposes a simple to complex (STC) framework in which only image-level annotations are used to learn DCNNs for semantic segmentation; experiments demonstrate its superiority over other state-of-the-art frameworks.

Abstract: Recently, significant improvement has been made on semantic object segmentation due to the development of deep convolutional neural networks (DCNNs). Training such a DCNN usually relies on a large number of images with pixel-level segmentation masks, and annotating these images is very costly in terms of both finance and human effort. In this paper, we propose a simple to complex (STC) framework in which only image-level annotations are utilized to learn DCNNs for semantic segmentation. Specifically, we first train an initial segmentation network called Initial-DCNN with the saliency maps of simple images (i.e., those with a single category of major object(s) and clean background). These saliency maps can be automatically obtained by existing bottom-up salient object detection techniques, where no supervision information is needed. Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations. Finally, more pixel-level segmentation masks of complex images (two or more categories of objects with cluttered background), which are inferred by using Enhanced-DCNN and image-level annotations, are utilized as the supervision information to learn the Powerful-DCNN for semantic segmentation. Our method utilizes 40K simple images from Flickr.com and 10K complex images from PASCAL VOC for step-wisely boosting the segmentation network. Extensive experimental results on PASCAL VOC 2012 segmentation benchmark well demonstrate the superiority of the proposed STC framework compared with other state-of-the-arts.

526 citations

Journal Article

Pairwise Prediction-Error Expansion for Efficient Reversible Data Hiding

Bo Ou¹, Xiaolong Li², Yao Zhao¹, Rongrong Ni¹, Yun-Qing Shi³

Beijing Jiaotong University¹, Peking University², New Jersey Institute of Technology³

01 Dec 2013-IEEE Transactions on Image Processing

TL;DR: This paper proposes to consider every two adjacent prediction-errors jointly to generate a sequence of prediction-error pairs; based on this sequence and the resulting 2D prediction-error histogram, a more efficient embedding strategy, namely pairwise PEE, can be designed to achieve improved performance.

Abstract: In prediction-error expansion (PEE) based reversible data hiding, better exploiting image redundancy usually leads to a superior performance. However, the correlations among prediction-errors are not considered and utilized in current PEE based methods. Specifically, in PEE, the prediction-errors are modified individually in data embedding. In this paper, to better exploit these correlations, instead of utilizing prediction-errors individually, we propose to consider every two adjacent prediction-errors jointly to generate a sequence consisting of prediction-error pairs. Then, based on the sequence and the resulting 2D prediction-error histogram, a more efficient embedding strategy, namely, pairwise PEE, can be designed to achieve an improved performance. The superiority of our method is verified through extensive experiments.
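The pairing step in the abstract can be sketched as follows. This is only an illustration of forming prediction-error pairs and counting them in a 2D histogram; it does not implement the embedding itself. The function names, the sample error values, the use of non-overlapping pairs, and the choice to drop an unpaired trailing error are assumptions, not details taken from the paper.

```python
from collections import Counter


def pair_prediction_errors(errors):
    """Group every two adjacent prediction-errors into non-overlapping pairs.

    An unpaired trailing error is dropped (an assumption of this sketch).
    """
    usable = len(errors) - len(errors) % 2
    return [(errors[i], errors[i + 1]) for i in range(0, usable, 2)]


def histogram_2d(pairs):
    """Count how often each (e1, e2) pair occurs: the 2D prediction-error histogram."""
    return Counter(pairs)


# Hypothetical prediction-errors, e.g. pixel value minus predicted value.
errors = [0, 0, 1, -1, 0, 0, 2, 0, 1]
pairs = pair_prediction_errors(errors)
hist = histogram_2d(pairs)
```

Counting pairs rather than single errors exposes the correlation between adjacent prediction-errors, which the abstract says conventional PEE leaves unused.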

422 citations

Journal Article

Cross-Modal Retrieval With CNN Visual Features: A New Baseline

Yunchao Wei¹, Yao Zhao¹, Canyi Lu², Shikui Wei¹, Luoqi Liu², Zhenfeng Zhu¹, Shuicheng Yan²

Beijing Jiaotong University¹, National University of Singapore²

01 Feb 2017-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: Off-the-shelf CNN visual features are extracted from the CNN model, which is pretrained on ImageNet with more than one million images from 1000 object categories, as a generic image representation to tackle cross-modal retrieval.

Abstract: Recently, convolutional neural network (CNN) visual features have demonstrated their powerful ability as a universal representation for various recognition tasks. In this paper, cross-modal retrieval with CNN visual features is implemented with several classic methods. Specifically, off-the-shelf CNN visual features are extracted from the CNN model, which is pretrained on ImageNet with more than one million images from 1000 object categories, as a generic image representation to tackle cross-modal retrieval. To further enhance the representational ability of CNN visual features, based on the pretrained CNN model on ImageNet, a fine-tuning step is performed by using the open source Caffe CNN library for each target data set. Besides, we propose a deep semantic matching method to address the cross-modal retrieval problem with respect to samples which are annotated with one or multiple labels. Extensive experiments on five popular publicly available data sets well demonstrate the superiority of CNN visual features for cross-modal retrieval.

329 citations

Journal Article

EA-LSTM: Evolutionary attention-based LSTM for time series prediction

Youru Li¹, Zhenfeng Zhu¹, Deqiang Kong², Hua Han³, Hua Han⁴, Yao Zhao¹

Beijing Jiaotong University¹, Microsoft², Center for Excellence in Education³, Chinese Academy of Sciences⁴

01 Oct 2019-Knowledge Based Systems

TL;DR: An evolutionary attention-based LSTM, trained with competitive random search, is proposed for multivariate time series prediction; it retains the LSTM's ability to capture long-term dependencies while paying different degrees of attention to sub-window features within multiple time steps.

Abstract: Time series prediction with deep learning methods, especially Long Short-term Memory Neural Network (LSTM), has scored significant achievements in recent years. Despite the fact that LSTM can help to capture long-term dependencies, its ability to pay different degrees of attention to sub-window features within multiple time-steps is insufficient. To address this issue, an evolutionary attention-based LSTM training with competitive random search is proposed for multivariate time series prediction. By transferring shared parameters, an evolutionary attention learning approach is introduced to LSTM. Thus, like that for biological evolution, the pattern for importance-based attention sampling can be confirmed during temporal relationship mining. To refrain from being trapped into partial optimization like traditional gradient-based methods, an evolutionary computation inspired competitive random search method is proposed, which can well configure the parameters in the attention layer. Experimental results have illustrated that the proposed model can achieve competitive prediction performance compared with other baseline methods.

236 citations

Cited by

Journal Article

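Early Results of Bypass Surgery Using Only Bilateral Internal Thoracic Arteries With Y-Anastomosis in Patients With Multivessel Coronary Artery Disease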

성기익, 이영탁, 박계현, 전태국, 박표원, 한일용, 장윤희

01 Mar 2003-The Korean Journal of Thoracic and Cardiovascular Surgery

28,685 citations

[Constellation of New Books x] Us/Art, and the 'Museum of Sorrow'

이화영

01 Jan 2015

12,972 citations

Proceedings Article

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts¹, Mohamed Omran², Sebastian Ramos³, Timo Rehfeld¹, Markus Enzweiler³, Rodrigo Benenson², Uwe Franke³, Stefan Roth¹, Bernt Schiele²

Technische Universität Darmstadt¹, Max Planck Society², Daimler AG³

01 Jun 2016

TL;DR: This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

Abstract: Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes is comprised of a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high quality pixel-level annotations, 20 000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.

7,547 citations

Journal Article

Study Report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Proceedings Article

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Ken Chatfield¹, Karen Simonyan¹, Andrea Vedaldi¹, Andrew Zisserman¹

University of Oxford¹

14 May 2014

TL;DR: It is shown that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost, and it is identified that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance.

Abstract: The latest generation of Convolutional Neural Networks (CNN) have achieved impressive results in challenging benchmarks on image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques, exploring different deep architectures and comparing them on a common ground, identifying and disclosing important implementation details. We identify several useful properties of CNN-based representations, including the fact that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance. We also identify aspects of deep and shallow methods that can be successfully shared. In particular, we show that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost. Source code and models to reproduce the experiments in the paper is made publicly available.

3,533 citations
