Author

Yi Yu

Bio: Yi Yu is an academic researcher from the National Institute of Informatics. The author has contributed to research in topics: Lyrics & Feature learning. The author has an h-index of 24 and has co-authored 128 publications receiving 1,931 citations. Previous affiliations of Yi Yu include Nara Women's University & the National University of Singapore.


Papers
Journal ArticleDOI
TL;DR: Both similarity and dissimilarity cues are used in a ranking optimization framework for person reidentification, and a ranking aggregation algorithm is proposed to enhance the detection of similarity and dissimilarity, based on the assumption that the true match should be similar to the probe under different baseline methods.
Abstract: Person reidentification is a key technique for matching different persons observed in nonoverlapping camera views. Many researchers treat it as a special object-retrieval problem, where ranking optimization plays an important role. Existing ranking optimization methods mainly utilize the similarity relationship between the probe and gallery images to optimize the original ranking list, but seldom consider the important dissimilarity relationship. In this paper, we propose to use both similarity and dissimilarity cues in a ranking optimization framework for person reidentification. Its core idea is that the true match should not only be similar to those strongly similar galleries of the probe, but also be dissimilar to those strongly dissimilar galleries of the probe. Furthermore, motivated by the philosophy of multiview verification, a ranking aggregation algorithm is proposed to enhance the detection of similarity and dissimilarity, based on the following assumption: the true match should be similar to the probe in different baseline methods. In other words, if a gallery image is strongly similar to the probe in one method while simultaneously strongly dissimilar to the probe in another method, it is probably a wrong match for the probe. Extensive experiments conducted on public benchmark datasets, and comparisons with different baseline methods, demonstrate the clear superiority of the proposed ranking optimization method.
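The core idea lends itself to a compact sketch. Below is a minimal NumPy illustration, not the authors' exact formulation: a gallery's distance to the probe is adjusted downward when it sits close to the probe's strongly similar galleries and upward when it sits close to the strongly dissimilar ones. The function name, the mean-distance cues, and the weight alpha are assumptions made for illustration.

```python
import numpy as np

def optimize_ranking(d_probe, d_gallery, k_sim=5, k_dis=5, alpha=0.5):
    """Refine probe-to-gallery distances with similarity AND dissimilarity cues.

    d_probe   : (n,)   original distances from the probe to each gallery image
    d_gallery : (n, n) pairwise distances among the gallery images
    """
    order = np.argsort(d_probe)
    strong_sim = order[:k_sim]    # galleries strongly similar to the probe
    strong_dis = order[-k_dis:]   # galleries strongly dissimilar to it

    # A true match should be close to the strongly similar galleries ...
    sim_cue = d_gallery[:, strong_sim].mean(axis=1)
    # ... and far from the strongly dissimilar ones.
    dis_cue = d_gallery[:, strong_dis].mean(axis=1)

    return d_probe + alpha * (sim_cue - dis_cue)

# Toy usage with random features standing in for person descriptors.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(20, 8))
probe = rng.normal(size=8)
d_probe = np.linalg.norm(gallery - probe, axis=1)
d_gallery = np.linalg.norm(gallery[:, None] - gallery[None, :], axis=2)
print(np.argsort(optimize_ranking(d_probe, d_gallery))[:5])  # refined top 5
```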

183 citations

Journal ArticleDOI
TL;DR: This paper proposes a data-driven distance metric (DDDM) method, re-exploiting the training data to adjust the metric for each query-gallery pair, with a significant improvement over three baseline metric learning methods.
Abstract: Person re-identification, which aims to identify images of the same person from various cameras configured in different places, has attracted much attention in the multimedia retrieval community. In this problem, choosing a proper distance metric is a crucial aspect, and many classic methods utilize a single, uniformly learned metric. However, their performance is limited because they ignore the zero-shot and fine-grained characteristics present in real person re-identification applications. In this paper, we investigate two consistencies across two cameras, namely cross-view support consistency and cross-view projection consistency. The philosophy behind them is that, in spite of visual changes between two images of the same person under two camera views, the support sets in their respective views are highly consistent, and after being projected to the same view, their context sets are also highly consistent. Based on the above phenomena, we propose a data-driven distance metric (DDDM) method that re-exploits the training data to adjust the metric for each query-gallery pair. Experiments conducted on three public datasets have validated the effectiveness of the proposed method, with a significant improvement over three baseline metric learning methods. In particular, on the public VIPeR dataset, the proposed method achieves an accuracy rate of 42.09% at rank-1, which outperforms the state-of-the-art methods by 4.29%.
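To make "cross-view support consistency" concrete, here is a loose NumPy sketch of that single intuition, not the published DDDM algorithm: a query-gallery pair is penalized when its two support sets, drawn from row-aligned training data of the two camera views, disagree. The function names and the inflation factor beta are illustrative assumptions.

```python
import numpy as np

def support_set(x, train_feats, k=10):
    """Indices of x's k nearest training samples: its 'support set'."""
    d = np.linalg.norm(train_feats - x, axis=1)
    return set(np.argsort(d)[:k])

def pair_adaptive_distance(q, g, train_a, train_b, k=10, beta=0.5):
    """Inflate the base distance of a query-gallery pair whose cross-view
    support sets disagree. train_a and train_b hold the same training
    identities seen from camera A and camera B, row-aligned by identity."""
    consistency = len(support_set(q, train_a, k) &
                      support_set(g, train_b, k)) / k
    base = float(np.linalg.norm(q - g))
    return base * (1.0 + beta * (1.0 - consistency))

# Toy usage: view B is a perturbed copy of view A, so a matched pair
# shares most of its support set and keeps a small adjusted distance.
rng = np.random.default_rng(1)
train_a = rng.normal(size=(50, 16))
train_b = train_a + 0.1 * rng.normal(size=(50, 16))
print(pair_adaptive_distance(train_a[0], train_b[0], train_a, train_b))
```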

162 citations

Journal ArticleDOI
TL;DR: This paper proposes a new weakly supervised image segmentation model focused on learning the semantic associations between superpixel sets (called graphlets here), and presents a hierarchical Bayesian network to capture the semantic associations between post-embedding graphlets.
Abstract: Weakly supervised image segmentation is an important yet challenging task in image processing and pattern recognition fields. It is defined as follows: in the training stage, semantic labels are given only at the image level, without regard to their specific object/scene locations within the image. Given a test image, the goal is to predict the semantics of every pixel/superpixel. In this paper, we propose a new weakly supervised image segmentation model, focusing on learning the semantic associations between superpixel sets (graphlets in this paper). In particular, we first extract graphlets from each image, where a graphlet is a small-sized graph that measures the potential of multiple spatially neighboring superpixels (i.e., the probability of these superpixels sharing a common semantic label, such as the sky or the sea). To compare different-sized graphlets and to incorporate image-level labels, a manifold embedding algorithm is designed to transform all graphlets into equal-length feature vectors. Finally, we present a hierarchical Bayesian network to capture the semantic associations between post-embedding graphlets, based on which the semantics of each superpixel is inferred. Experimental results demonstrate that: 1) our approach performs competitively compared with the state-of-the-art approaches on three public data sets and 2) considerable performance enhancement is achieved when using our approach on segmentation-based photo cropping and image categorization.
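As a hedged sketch of just the first step, the snippet below enumerates graphlets, connected sets of spatially neighboring superpixels, from a region-adjacency graph. The networkx-based brute-force enumeration is an assumption for illustration; the paper's manifold embedding and Bayesian network stages are not shown.

```python
import networkx as nx
from itertools import combinations

def extract_graphlets(rag, max_size=3):
    """Enumerate graphlets: connected subsets of spatially neighboring
    superpixels, up to max_size nodes. Brute force for clarity; a
    practical system would sample rather than enumerate every subset."""
    graphlets = []
    for size in range(1, max_size + 1):
        for nodes in combinations(rag.nodes, size):
            if nx.is_connected(rag.subgraph(nodes)):
                graphlets.append(frozenset(nodes))
    return graphlets

# Toy region-adjacency graph: four superpixels in a 2x2 grid.
rag = nx.Graph([(0, 1), (0, 2), (1, 3), (2, 3)])
print(sorted(sorted(g) for g in extract_graphlets(rag)))
```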

146 citations

Journal ArticleDOI
TL;DR: This is the first study to use deep architectures for learning the temporal correlation between audio and lyrics; it involves a two-branch deep neural network for the audio and text (lyrics) modalities, with two significant contributions made in the audio branch.
Abstract: Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where the temporal structures of different data modalities, such as audio and lyrics, should be taken into account. Motivated by the inherently temporal structure of music, we set out to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where intermodal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pretrained Doc2Vec model followed by fully connected layers is used to represent lyrics. Two significant contributions are made in the audio branch: (i) we propose an end-to-end network to learn the cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and a joint representation is learned with temporal structures taken into account; (ii) for feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structure of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
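The intermodal CCA objective at the heart of such an architecture can be written down in a few lines. The NumPy sketch below computes the total canonical correlation between two batches of paired branch outputs, the quantity a deep CCA network maximizes (its negative serves as the training loss); the regularization constant and function name are illustrative assumptions, and a real implementation would express this in a differentiable framework.

```python
import numpy as np

def cca_objective(H1, H2, eps=1e-4):
    """Total canonical correlation between two views, e.g. the audio and
    lyrics branch outputs for n paired samples.

    H1, H2 : (n, d) matrices of paired network outputs.
    """
    assert H1.shape == H2.shape  # equal branch output dims assumed here
    n, d = H1.shape
    H1c, H2c = H1 - H1.mean(axis=0), H2 - H2.mean(axis=0)
    S12 = H1c.T @ H2c / (n - 1)
    S11 = H1c.T @ H1c / (n - 1) + eps * np.eye(d)   # regularized
    S22 = H2c.T @ H2c / (n - 1) + eps * np.eye(d)   # covariances

    def inv_sqrt(S):  # S^(-1/2) via the eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()  # trace norm of T

# Two correlated "modalities" score higher than independent ones.
rng = np.random.default_rng(2)
audio = rng.normal(size=(64, 10))
lyrics = 0.8 * audio + 0.2 * rng.normal(size=(64, 10))
print(cca_objective(audio, lyrics))
print(cca_objective(audio, rng.normal(size=(64, 10))))
```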

84 citations

Journal ArticleDOI
TL;DR: In this paper, a novel deep learning model, category-based deep canonical correlation analysis, is proposed for fine-grained venue discovery from heterogeneous social multimodal data, where data in different modalities are projected into the same space via deep networks.
Abstract: In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as venue photographs generated by users. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis. Given a photograph as input, this model performs: 1) exact venue search (find the venue where the photograph was taken) and 2) group venue search (find relevant venues that have the same category as the photograph), using the cross-modal correlation between the input photograph and the textual descriptions of venues. In this model, data in different modalities are projected into the same space via deep networks. Pairwise correlation (between different-modality data from the same venue) for exact venue search and category-based correlation (between different-modality data from different venues with the same category) for group venue search are jointly optimized. Because a photograph cannot fully reflect the rich text description of a venue, the number of photographs per venue in the training phase is increased to capture more aspects of a venue. We build a new venue-aware multimodal data set by integrating Wikipedia featured articles and Foursquare venue photographs. Experimental results on this data set confirm the feasibility of the proposed method. Moreover, the evaluation over another publicly available data set confirms that the proposed method outperforms the state of the art for cross-modal retrieval between image and text.
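Once photo and venue-text embeddings live in the shared space, both search modes reduce to nearest-neighbor lookups. The sketch below is a simplified assumption of the retrieval stage only: the paper learns the category relationship during training, whereas here the category is merely predicted by a top-k vote, and all names and parameters are illustrative.

```python
import numpy as np

def venue_search(photo_vec, venue_vecs, venue_cats, k=5):
    """Exact and group venue search in a learned shared space.

    photo_vec  : (d,)   embedding of the query photograph
    venue_vecs : (m, d) text embeddings of the candidate venues
    venue_cats : (m,)   category label of each venue
    """
    sims = venue_vecs @ photo_vec / (
        np.linalg.norm(venue_vecs, axis=1) * np.linalg.norm(photo_vec))
    ranked = np.argsort(-sims)                 # cosine-similarity ranking
    exact = ranked[0]                          # 1) exact venue search
    cats, counts = np.unique(venue_cats[ranked[:k]], return_counts=True)
    group = np.where(venue_cats == cats[counts.argmax()])[0]
    return exact, group                        # 2) group venue search

# Toy usage with random embeddings and three venue categories.
rng = np.random.default_rng(3)
venues = rng.normal(size=(30, 12))
cats = rng.integers(0, 3, size=30)
photo = venues[7] + 0.05 * rng.normal(size=12)  # a photo of venue 7
print(venue_search(photo, venues, cats))
```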

81 citations


Cited by

Proceedings Article
01 Jan 1994
TL;DR: The main focus in MUCKE is on cleaning large-scale Web image corpora and on proposing image representations that are closer to the human interpretation of images.
Abstract: MUCKE aims to mine a large volume of images, to structure them conceptually, and to use this conceptual structuring to improve large-scale image retrieval. The last decade witnessed important progress concerning low-level image representations. However, a number of problems need to be solved in order to unleash the full potential of image mining in applications. The central problem with low-level representations is the mismatch between them and the human interpretation of image content. This problem shows up, for instance, in the inability of existing descriptors to capture spatial relationships between the concepts they represent, or in their inability to convey an explanation of why two images are similar in a content-based image retrieval framework. We start by assessing existing local descriptors for image classification and by proposing the use of co-occurrence matrices to better capture spatial relationships in images. The main focus in MUCKE is on cleaning large-scale Web image corpora and on proposing image representations that are closer to the human interpretation of images. Consequently, we introduce methods that tackle these two problems and compare the results to state-of-the-art methods. Note: some aspects of this deliverable are withheld at this time as they are pending review. Please contact the authors for a preview.

2,134 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper proposes a k-reciprocal encoding method to re-rank re-ID results; the hypothesis is that if a gallery image is similar to the probe in the k-reciprocal nearest neighbors, it is more likely to be a true match.
Abstract: When considering person re-identification (re-ID) as a retrieval process, re-ranking is a critical step for improving its accuracy. Yet in the re-ID community, limited effort has been devoted to re-ranking, especially to fully automatic, unsupervised solutions. In this paper, we propose a k-reciprocal encoding method to re-rank the re-ID results. Our hypothesis is that if a gallery image is similar to the probe in the k-reciprocal nearest neighbors, it is more likely to be a true match. Specifically, given an image, a k-reciprocal feature is calculated by encoding its k-reciprocal nearest neighbors into a single vector, which is used for re-ranking under the Jaccard distance. The final distance is computed as a combination of the original distance and the Jaccard distance. Our re-ranking method does not require any human interaction or any labeled data, so it is applicable to large-scale datasets. Experiments on the large-scale Market-1501, CUHK03, MARS, and PRW datasets confirm the effectiveness of our method.
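A simplified sketch of this re-ranking, following the definitions above (k-reciprocal sets, Jaccard distance, weighted combination with the original distance), is given below. It deliberately omits the paper's neighbor-set expansion and local query expansion refinements, and the parameter values are illustrative.

```python
import numpy as np

def k_reciprocal_rerank(dist, k=5, lam=0.3):
    """Simplified k-reciprocal re-ranking.

    dist : (n, n) original pairwise distances over probe + gallery,
           with a zero diagonal so each image is its own 1st neighbor.
    """
    n = dist.shape[0]
    knn = np.argsort(dist, axis=1)[:, :k + 1]   # each row contains itself
    # i and j are k-reciprocal neighbors iff each is in the other's k-NN.
    recip = [set(j for j in knn[i] if i in knn[j]) for i in range(n)]

    jaccard = np.empty_like(dist)
    for i in range(n):
        for j in range(n):
            jaccard[i, j] = 1.0 - (len(recip[i] & recip[j]) /
                                   len(recip[i] | recip[j]))
    return lam * dist + (1.0 - lam) * jaccard   # final re-ranking distance

# Toy usage: re-rank the gallery for image 0.
rng = np.random.default_rng(4)
X = rng.normal(size=(12, 6))
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
print(np.argsort(k_reciprocal_rerank(D)[0])[:5])
```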

1,306 citations

Posted Content
TL;DR: The history of person re-identification and its relationship with image classification and instance retrieval are introduced, and two new re-ID tasks that are much closer to real-world applications are described and discussed.
Abstract: Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance. It aims at spotting a person of interest in other cameras. In the early days, hand-crafted algorithms and small-scale evaluation were predominantly reported. Recent years have witnessed the emergence of large-scale datasets and deep learning systems which make use of large data volumes. Considering different tasks, we classify most current re-ID methods into two classes, i.e., image-based and video-based; in both tasks, hand-crafted and deep learning systems will be reviewed. Moreover, two new re-ID tasks which are much closer to real-world applications are described and discussed, i.e., end-to-end re-ID and fast re-ID in very large galleries. This paper: 1) introduces the history of person re-ID and its relationship with image classification and instance retrieval; 2) surveys a broad selection of the hand-crafted systems and the large-scale methods in both image- and video-based re-ID; 3) describes critical future directions in end-to-end re-ID and fast retrieval in large galleries; and 4) finally briefs some important yet under-developed issues.

984 citations