Proceedings Article
A new approach to cross-modal multimedia retrieval
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert R. G. Lanckriet, Roger Levy, Nuno Vasconcelos
pp. 251-260
TL;DR
It is shown that accounting for cross-modal correlations and for semantic abstraction both improve retrieval accuracy, and that the cross-modal model outperforms state-of-the-art image retrieval systems on a unimodal retrieval task.
Abstract:
The problem of jointly modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned with latent Dirichlet allocation, and images are represented as bags of visual (SIFT) features. Two hypotheses are investigated: that 1) there is a benefit to explicitly modeling correlations between the two components, and 2) this modeling is more effective in feature spaces with higher levels of abstraction. Correlations between the two components are learned with canonical correlation analysis. Abstraction is achieved by representing text and images at a more general, semantic level. The two hypotheses are studied in the context of cross-modal document retrieval, which includes retrieving the text that most closely matches a query image, or retrieving the images that most closely match a query text. It is shown that accounting for cross-modal correlations and for semantic abstraction both improve retrieval accuracy. The cross-modal model is also shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
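The retrieval pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: the codebook, the classifier weights, and the LDA topic proportions are hypothetical toy values rather than learned ones; the mapping to the "semantic level" is assumed to be multiclass logistic regression (a softmax over shared classes); and L1 distance between probability vectors is an assumed matching criterion. The canonical correlation analysis step is omitted. The sketch shows only how both modalities can be brought into one semantic space and ranked against each other.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def bag_of_visual_words(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Quantize local descriptors (e.g. SIFT) to their nearest codeword and
    return the normalized histogram of codeword counts."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        # Nearest codeword under squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def semantic_vector(features: Vector, weights: List[Vector], biases: Vector) -> List[float]:
    """Assumed semantic mapping: multiclass logistic regression (softmax) that
    turns one modality's features into probabilities over shared classes."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def rank_by_l1(query: Vector, database: List[Vector]) -> List[int]:
    """Return database indices ordered from closest to farthest (L1 distance,
    an assumed matching criterion)."""
    dists = [sum(abs(q - v) for q, v in zip(query, item)) for item in database]
    return sorted(range(len(database)), key=lambda i: dists[i])


# --- Toy example: image query, text database (all values hypothetical) ---
codebook = [[0.0, 0.0], [1.0, 1.0]]  # 2 hypothetical "visual words"
image_hist = bag_of_visual_words([[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]], codebook)

# Hypothetical per-modality classifiers over 2 semantic classes.
image_weights = [[2.0, -2.0], [-2.0, 2.0]]
text_weights = [[3.0, -3.0], [-3.0, 3.0]]
no_bias = [0.0, 0.0]

# Hypothetical LDA topic proportions for two text documents.
text_topics = [[0.9, 0.1], [0.2, 0.8]]

image_sem = semantic_vector(image_hist, image_weights, no_bias)
text_sems = [semantic_vector(t, text_weights, no_bias) for t in text_topics]

ranking = rank_by_l1(image_sem, text_sems)
print(ranking)  # [1, 0]: the second text shares the image's dominant class
```

Because both modalities end up as probability vectors over the same semantic classes, a text query can be ranked against a database of images in the same way by swapping the roles of query and database.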
Citations
Journal Article
Multimodal Machine Learning: A Survey and Taxonomy
TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Journal Article
Framing image description as a ranking task: data, models and evaluation metrics
TL;DR: This paper proposes to frame sentence-based image annotation as the task of ranking a given pool of captions and emphasizes the importance of training on multiple captions per image and of capturing syntactic (word order-based) and semantic features of these captions.
Journal Article
Visual Domain Adaptation: A survey of recent advances
TL;DR: A survey of domain adaptation methods for visual recognition discusses the merits and drawbacks of existing domain adaptation approaches and identifies promising avenues for research in this rapidly evolving field.
Proceedings Article
Generalized Multiview Analysis: A discriminative latent space
TL;DR: GMA solves a joint, relaxed QCQP over different feature spaces to obtain a single (non)linear subspace; it is a supervised extension of Canonical Correlation Analysis (CCA) and is useful for cross-view classification and retrieval.
Proceedings Article
Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics (Extended Abstract)
TL;DR: This work proposes to frame sentence-based image annotation as the task of ranking a given pool of captions, and introduces a new benchmark collection, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events.
References
Journal Article
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Book
Applied Logistic Regression
David W. Hosmer, Stanley Lemeshow
TL;DR: Hosmer and Lemeshow provide an accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets.
Journal Article
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Journal Article
Applied Logistic Regression
TL;DR: Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).