Author

Jiwoon Jeon

Other affiliations: Google

Bio: Jiwoon Jeon is an academic researcher from the University of Massachusetts Amherst. The author has contributed to research in topics: Visual Word & Relevance (information retrieval). The author has an h-index of 10 and has co-authored 13 publications receiving 3,732 citations. Previous affiliations of Jiwoon Jeon include Google.

Papers

Proceedings Article•DOI•

Automatic image annotation and retrieval using cross-media relevance models

Jiwoon Jeon¹, Victor Lavrenko¹, R. Manmatha¹•Institutions (1)

University of Massachusetts Amherst¹

28 Jul 2003

TL;DR: The approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval by assuming that regions in an image can be described using a small vocabulary of blobs.

Abstract: Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor-intensive procedure, and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way. Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) as a model based on word-blob co-occurrence and twice as good as a state-of-the-art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.

1,275 citations

Proceedings Article•

A Model for Learning the Semantics of Pictures

Victor Lavrenko¹, R. Manmatha¹, Jiwoon Jeon¹•Institutions (1)

University of Massachusetts Amherst¹

09 Dec 2003

TL;DR: An approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries using a formalism that models the generation of annotated images.

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.

762 citations

Proceedings Article•DOI•

Jiwoon Jeon¹, W. Bruce Croft¹, Joon Ho Lee¹•Institutions (1)

University of Massachusetts Amherst¹

31 Oct 2005

TL;DR: Methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model are discussed and it is shown that with this model it is possible to find semantically similar questions with relatively little word overlap.

Abstract: There has recently been a significant increase in the number of community-based question and answer services on the Web where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question and answer service is to find questions in the archive that are semantically similar to a user's question. This enables high quality answers from the archive to be retrieved and removes the time lag associated with a community-based system. In this paper, we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model. We show that with this model it is possible to find semantically similar questions with relatively little word overlap.

499 citations

Proceedings Article•DOI•

Retrieval models for question and answer archives

Xiaobing Xue¹, Jiwoon Jeon², W. Bruce Croft¹•Institutions (2)

University of Massachusetts Amherst¹, Google²

20 Jul 2008

TL;DR: A retrieval model that combines a translation-based language model for the question part with a query likelihood approach for the answer part and incorporates word-to-word translation probabilities learned through exploiting different sources of information is proposed.

Abstract: Retrieval in a question and answer archive involves finding good answers for a user's question. In contrast to typical document retrieval, a retrieval model for this task can exploit question similarity as well as ranking the associated answers. In this paper, we propose a retrieval model that combines a translation-based language model for the question part with a query likelihood approach for the answer part. The proposed model incorporates word-to-word translation probabilities learned through exploiting different sources of information. Experiments show that the proposed translation based language model for the question part outperforms baseline methods significantly. By combining with the query likelihood language model for the answer part, substantial additional effectiveness improvements are obtained.

406 citations

Proceedings Article•DOI•

A framework to predict the quality of answers with non-textual features

Jiwoon Jeon¹, W. Bruce Croft¹, Joon Ho Lee, Soyeon Park²•Institutions (2)

University of Massachusetts Amherst¹, Duksung Women's University²

06 Aug 2006

TL;DR: This paper presents a framework to use non-textual features to predict the quality of documents and shows the quality measure can be successfully incorporated into the language modeling-based retrieval model.

Abstract: New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.

383 citations

Cited by

Computer vision : a modern approach = 计算机视觉 : 一种现代的方法

David Forsyth, Jean Ponce

01 Jan 2004

TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.

Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudo code. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations

Journal Article•DOI•

Image retrieval: Ideas, influences, and trends of the new age

Ritendra Datta¹, Dhiraj Joshi¹, Jia Li¹, James Z. Wang¹•Institutions (1)

Pennsylvania State University¹

08 May 2008-ACM Computing Surveys

TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, the spawning of related subfields is discussed, and significant challenges in adapting existing image retrieval techniques to build systems that can be useful in the real world are considered.

Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations

Posted Content•

Microsoft COCO Captions: Data Collection and Evaluation Server

Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick

01 Apr 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Microsoft COCO Caption dataset and evaluation server are described and several popular metrics, including BLEU, METEOR, ROUGE and CIDEr are used to score candidate captions.

Abstract: In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.

1,691 citations

Journal Article•DOI•

Variational Inference for Dirichlet Process Mixtures

David M. Blei, Michael I. Jordan

01 Mar 2006-Bayesian Analysis

TL;DR: A variational inference algorithm for DP mixtures is presented, along with experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and an application to a large-scale image analysis problem.

Abstract: Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.

1,471 citations

Proceedings Article•DOI•

Finding high-quality content in social media

Eugene Agichtein¹, Carlos Castillo², Debora Donato², Aristides Gionis², Gilad Mishne²•Institutions (2)

Emory University¹, Yahoo!²

11 Feb 2008

TL;DR: This paper introduces a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition, and shows that its system is able to separate high-quality items from the rest with an accuracy close to that of humans.

Abstract: The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content sites based on user contributions -- social media sites -- becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.

1,300 citations
