Author

Xin-Jing Wang

Other affiliations: Tsinghua University
Bio: Xin-Jing Wang is an academic researcher at Microsoft. The author has contributed to research on image retrieval and automatic image annotation, has an h-index of 20, and has co-authored 55 publications receiving 1,815 citations. Previous affiliations of Xin-Jing Wang include Tsinghua University.


Papers
Proceedings Article • DOI
Xin-Jing Wang, Lei Zhang, Feng Jing, Wei-Ying Ma
17 Jun 2006
TL;DR: This paper presents AnnoSearch, a novel way to annotate images using search and data mining technologies, which enables annotation with an unlimited vocabulary, something existing approaches cannot do.
Abstract: Although it has been studied for several years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we present AnnoSearch, a novel way to annotate images using search and data mining technologies. Leveraging Web-scale images, we solve this problem in two steps: 1) searching for semantically and visually similar images on the Web, and 2) mining annotations from them. First, at least one accurate keyword is required to enable text-based search for a set of semantically similar images. Then content-based search is performed on this set to retrieve visually similar images. Finally, annotations are mined from the descriptions (titles, URLs and surrounding texts) of these images. It is worth highlighting that, to ensure efficiency, high-dimensional visual features are mapped to hash codes, which significantly speeds up the content-based search process. Our proposed approach enables annotation with an unlimited vocabulary, which is impossible for existing approaches. Experimental results on real Web images show the effectiveness and efficiency of the proposed algorithm.
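The hashing and mining steps lend themselves to a compact illustration. Below is a minimal sketch assuming random-projection hashing, Hamming-distance ranking, and simple frequency-based term mining; the 32-bit code length, function names, and scoring heuristic are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np
from collections import Counter

def hash_features(features, projections):
    """Map high-dimensional visual features to a binary hash code by
    taking the sign of random projections (an assumed hashing scheme)."""
    return (features @ projections > 0).astype(np.uint8)

def hamming_distance(code_a, code_b):
    return int(np.count_nonzero(code_a != code_b))

def annotate(query_feature, candidates, projections, top_k=50, num_terms=5):
    """Rank keyword-filtered candidate images by Hamming distance in hash
    space, then mine the most frequent terms from their descriptions."""
    query_code = hash_features(query_feature, projections)
    ranked = sorted(
        candidates,
        key=lambda c: hamming_distance(query_code, hash_features(c["feature"], projections)),
    )
    terms = Counter()
    for cand in ranked[:top_k]:
        terms.update(cand["description"].lower().split())
    return [term for term, _ in terms.most_common(num_terms)]

# Usage: 'candidates' stands in for the images returned by the text-based
# search step, each carrying a visual feature vector and surrounding text.
rng = np.random.default_rng(0)
projections = rng.standard_normal((128, 32))   # 128-D feature -> 32-bit code
candidates = [{"feature": rng.standard_normal(128),
               "description": "sunset over the beach"} for _ in range(100)]
print(annotate(rng.standard_normal(128), candidates, projections))
```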

334 citations

Journal Article • DOI
TL;DR: This paper proposes a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results, and enables annotating with unlimited vocabulary and is highly scalable and robust to outliers.
Abstract: Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes, and the other is to implement the approach as a distributed system in which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than one second. Since no training data set is required, our approach enables annotating with an unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation; we provide experimental results to prove this.
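Steps 2 and 3 of this pipeline can be sketched compactly. The snippet below assumes a tf-idf-style salience score for the mining step and a simple support threshold for annotation rejection; both heuristics are illustrative, not the paper's actual scoring or rejection criteria.

```python
import math
from collections import Counter

def mine_salient_terms(result_texts, corpus_doc_freq, corpus_size, top_k=10):
    """Step 2 (sketch): score terms from the search-result descriptions by
    their frequency, down-weighted by how common they are in the collection."""
    tf = Counter()
    for text in result_texts:
        tf.update(text.lower().split())
    scores = {
        term: count * math.log(corpus_size / (1 + corpus_doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def reject_noisy_terms(terms, result_texts, min_support=0.2):
    """Step 3 (sketch): drop candidate annotations that appear in too few
    of the search-result descriptions."""
    n = max(len(result_texts), 1)
    def support(term):
        return sum(term in text.lower() for text in result_texts) / n
    return [t for t in terms if support(t) >= min_support]
```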

213 citations

Proceedings Article • DOI
Xin-Jing Wang, Xudong Tu, Dan Feng, Lei Zhang
19 Jul 2009
TL;DR: This work proposes an analogical reasoning-based approach that measures the analogy between new question-answer linkages and those of relevant prior knowledge containing only positive links; the candidate answer with the most analogous link is assumed to be the best answer.
Abstract: The method of finding high-quality answers has a significant impact on user satisfaction in community question answering systems. However, due to the lexical gap between questions and answers as well as the spam typically present in user-generated content, filtering and ranking answers is very challenging. Previous solutions mainly focus on generating redundant features or finding textual clues using machine learning techniques; none of them considers questions and their answers as relational data, instead modeling them as independent information. Moreover, they consider only the answers of the current question and ignore previous knowledge that could help bridge the lexical and semantic gap. We assume that answers are connected to their questions by various types of latent links, i.e., positive links indicating high-quality answers and negative links indicating incorrect answers or user-generated spam, and we propose an analogical reasoning-based approach that measures the analogy between new question-answer linkages and those of relevant knowledge containing only positive links; the candidate answer with the most analogous link is assumed to be the best answer. We conducted experiments on 29.8 million Yahoo! Answers question-answer threads and show the effectiveness of our approach.
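The link-analogy idea can be illustrated with a small sketch: each question-answer pair is reduced to a "linking feature" vector, and a candidate answer is ranked by how similar its link is to the positive links mined from similar past questions. The two linking features and the cosine-averaging score below are illustrative assumptions, not the paper's actual feature set or analogy measure.

```python
import numpy as np

def link_features(question, answer):
    """Toy linking features between a question and an answer:
    word-overlap ratio and log length ratio (assumed features)."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    overlap = len(q & a) / max(len(q), 1)
    length_ratio = np.log((1 + len(a)) / (1 + len(q)))
    return np.array([overlap, length_ratio])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def rank_answers(question, candidate_answers, positive_links):
    """Score each candidate by the average similarity of its link vector to
    positive link vectors from similar, previously answered questions."""
    scored = []
    for answer in candidate_answers:
        link = link_features(question, answer)
        score = float(np.mean([cosine(link, pos) for pos in positive_links]))
        scored.append((score, answer))
    return sorted(scored, reverse=True)

# 'positive_links' would hold link_features() of the best answers of
# questions retrieved as similar to the new question.
```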

113 citations

Patent
Xin-Jing Wang, Lei Zhang, Wei-Ying Ma
23 Jan 2009
TL;DR: In this article, a plurality of first questions and corresponding first answers are identified at a community question-answer (CQA) site as a plurality of first question-answer (q-a) pairs.
Abstract: In some implementations, a plurality of first questions and corresponding first answers are identified at a community question-answer (CQA) site as a plurality of first question-answer (q-a) pairs. A query thread comprised of a second question and a plurality of candidate second answers is selected for making a determination of answer quality. A set of the first questions that are similar to the second question are identified from the plurality of first questions. First linking features between the identified set of first questions and their corresponding first answers are used for determining an analogy with second linking features between the second question and candidate answers for ranking the candidate answers.

111 citations

Proceedings Article • DOI
10 Oct 2004
TL;DR: An iterative similarity propagation approach is proposed to explore the inter-relationships between Web images and their textual annotations for image retrieval; experiments show that the proposed approach can significantly improve Web image retrieval performance.
Abstract: In this paper, we propose an iterative similarity propagation approach to explore the inter-relationships between Web images and their textual annotations for image retrieval. By considering Web images as one type of objects, their surrounding texts as another type, and constructing the links structure between them via webpage analysis, we can iteratively reinforce the similarities between images. The basic idea is that if two objects of the same type are both related to one object of another type, these two objects are similar; likewise, if two objects of the same type are related to two different, but similar objects of another type, then to some extent, these two objects are also similar. The goal of our method is to fully exploit the mutual reinforcement between images and their textual annotations. Our experiments based on 10,628 images crawled from the Web show that our proposed approach can significantly improve Web image retrieval performance.
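The mutual-reinforcement idea can be written down as a short iteration over the image-text link structure. The update rule and damping factor below are illustrative assumptions in the spirit of the approach, not the paper's exact formulation.

```python
import numpy as np

def propagate_similarity(links, init_img_sim, init_txt_sim, alpha=0.8, iters=10):
    """links[i, j] = 1 if image i is linked to text block j (from webpage
    analysis). Image similarities are reinforced through similar texts and
    text similarities through similar images, iteratively."""
    # Row-normalize the link matrix in both directions.
    img_to_txt = links / np.maximum(links.sum(axis=1, keepdims=True), 1)
    txt_to_img = links.T / np.maximum(links.T.sum(axis=1, keepdims=True), 1)

    img_sim, txt_sim = init_img_sim.copy(), init_txt_sim.copy()
    for _ in range(iters):
        new_img = alpha * img_to_txt @ txt_sim @ img_to_txt.T + (1 - alpha) * init_img_sim
        new_txt = alpha * txt_to_img @ img_sim @ txt_to_img.T + (1 - alpha) * init_txt_sim
        img_sim, txt_sim = new_img, new_txt
    return img_sim, txt_sim
```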

102 citations


Cited by
Journal Article • DOI
TL;DR: Almost 300 key theoretical and empirical contributions from the current decade related to image retrieval and automatic image annotation are surveyed, the spawning of related subfields is discussed, and the adaptation of existing image retrieval techniques to build systems useful in the real world is examined.
Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations

01 Jan 2006

3,012 citations

Proceedings Article • DOI
08 Jul 2009
TL;DR: The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval; four research issues on web image annotation and retrieval are also identified.
Abstract: This paper introduces a web image dataset created by NUS's Lab for Media Search. The dataset includes: (1) 269,648 images and the associated tags from Flickr, with a total of 5,018 unique tags; (2) six types of low-level features extracted from these images, including 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments extracted over 5x5 fixed grid partitions, and 500-D bag of words based on SIFT descriptions; and (3) ground-truth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of Web image collections and identify four research issues on web image annotation and retrieval. We also provide the baseline results for web image annotation by learning from the tags using the traditional k-NN algorithm. The benchmark results indicate that it is possible to learn effective models from sufficiently large image dataset to facilitate general image retrieval.
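The k-NN baseline mentioned above is straightforward to sketch: an unlabeled image inherits the most frequent tags among its k nearest neighbors in low-level feature space. The distance metric, k, and number of returned tags below are illustrative choices, not the exact benchmark settings.

```python
import numpy as np
from collections import Counter

def knn_annotate(query_feat, train_feats, train_tags, k=5, num_tags=3):
    """train_feats: (N, D) matrix of low-level features (e.g., one of the
    six feature types in the dataset); train_tags: list of N tag lists."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    neighbors = np.argsort(dists)[:k]
    votes = Counter(tag for i in neighbors for tag in train_tags[i])
    return [tag for tag, _ in votes.most_common(num_tags)]
```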

2,648 citations

Proceedings Article • DOI
21 Jul 2017
TL;DR: A recurrent attention convolutional neural network (RA-CNN) is proposed that recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutually reinforced way.
Abstract: Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose a novel recurrent attention convolutional neural network (RA-CNN) which recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutually reinforced way. The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN). The APN starts from the full image and iteratively generates region attention from coarse to fine by taking the previous prediction as a reference, while the finer-scale network takes as input an amplified attended region from the previous scale in a recurrent way. The proposed RA-CNN is optimized by an intra-scale classification loss and an inter-scale ranking loss, to mutually learn accurate region attention and fine-grained representation. RA-CNN does not need bounding box/part annotations and can be trained end-to-end. We conduct comprehensive experiments and show that RA-CNN achieves the best performance in three fine-grained tasks, with relative accuracy gains of 3.3%, 3.7%, and 3.8% on CUB Birds, Stanford Dogs, and Stanford Cars, respectively.
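The combination of an intra-scale classification loss and an inter-scale ranking loss can be sketched as follows; the margin value, equal weighting, and plain-numpy formulation are illustrative assumptions rather than the exact training objective.

```python
import numpy as np

def cross_entropy(probs, true_class):
    """Intra-scale classification loss for one scale's softmax output."""
    return -np.log(probs[true_class] + 1e-12)

def ranking_loss(p_true_coarse, p_true_fine, margin=0.05):
    """Inter-scale ranking loss (sketch): penalize the finer scale if it is not
    at least 'margin' more confident on the true class than the coarser scale."""
    return max(0.0, p_true_coarse - p_true_fine + margin)

def ra_cnn_loss(scale_probs, true_class, margin=0.05):
    """scale_probs: list of per-scale softmax outputs, ordered coarse to fine."""
    cls_loss = sum(cross_entropy(p, true_class) for p in scale_probs)
    rank_loss = sum(
        ranking_loss(scale_probs[s][true_class], scale_probs[s + 1][true_class], margin)
        for s in range(len(scale_probs) - 1)
    )
    return cls_loss + rank_loss
```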

1,035 citations

Proceedings Article • DOI
23 Oct 2006
TL;DR: The proposed spatiotemporal video attention framework has been applied to over 20 test video sequences, and attended regions are detected that highlight interesting objects and motions in the sequences with a very high user satisfaction rate.
Abstract: The human vision system actively seeks interesting regions in images to reduce the search effort in tasks such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract our first sight than their surrounding neighbors. In this paper, we propose a spatiotemporal video attention detection technique for detecting the attended regions that correspond to both interesting objects and actions in video sequences. Both spatial and temporal saliency maps are constructed and further fused in a dynamic fashion to produce the overall spatiotemporal attention model. In the temporal attention model, motion contrast is computed based on the planar motions (homography) between images, which are estimated by applying RANSAC on point correspondences in the scene. To compensate for the non-uniform spatial distribution of interest points, spanning areas of motion segments are incorporated in the motion contrast computation. In the spatial attention model, a fast method for computing pixel-level saliency maps has been developed using color histograms of images. A hierarchical spatial attention representation is established to reveal the interesting points in images as well as the interesting regions. Finally, a dynamic fusion technique is applied to combine the temporal and spatial saliency maps, where temporal attention is dominant over the spatial model when large motion contrast exists, and vice versa. The proposed spatiotemporal attention framework has been applied to over 20 test video sequences, and attended regions are detected that highlight interesting objects and motions present in the sequences with a very high user satisfaction rate.
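The dynamic fusion step can be illustrated with a small sketch in which the temporal saliency map dominates when motion contrast is strong and the spatial map dominates otherwise. The sigmoid weighting of a scalar motion-contrast summary is an illustrative assumption, not the paper's exact fusion rule.

```python
import numpy as np

def fuse_saliency(spatial_map, temporal_map, motion_contrast,
                  steepness=10.0, midpoint=0.5):
    """spatial_map, temporal_map: HxW saliency maps in [0, 1].
    motion_contrast: scalar summarizing how much motion contrast exists."""
    # Weight of the temporal map grows with motion contrast.
    w = 1.0 / (1.0 + np.exp(-steepness * (motion_contrast - midpoint)))
    return w * temporal_map + (1.0 - w) * spatial_map

# With strong motion (motion_contrast near 1) the fused map follows the
# temporal saliency; with little motion it falls back to spatial saliency.
```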

983 citations