Multiple Bernoulli relevance models for image and video annotation
Citations
Deep Sets
A new approach to cross-modal multimedia retrieval
Supervised Learning of Semantic Classes for Image Annotation and Retrieval
TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation
Large Scale Online Learning of Image Similarity Through Ranking
References
Rapid object detection using a boosted cascade of simple features
Normalized cuts and image segmentation
Object class recognition by unsupervised scale-invariant learning
Example-based learning for view-based human face detection
Frequently Asked Questions (11)
Q2. What future work is mentioned in the paper "Multiple Bernoulli relevance models for image and video annotation"?
Future work will include a more extensive retrieval task with this model, which allows for longer text strings.
Q3. What is the main contribution of the current model over the CRM?
A major contribution of the current model over the CRM is its use of the multiple-Bernoulli distribution for modeling image annotations.
Q4. What is the way to model annotation words?
Existing annotation models [5, 3, 7, 8] have, by analogy with the text-retrieval world, used the multinomial distribution to model annotation words.
Q5. What is the definition of a probabilistic generative model?
It is a probabilistic generative model that uses a Bernoulli process to generate annotation words and a kernel density estimate to generate image features.
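The two components of that generative model can be sketched as follows. This is a minimal illustration, not the paper's exact estimators: the kernel bandwidth and the smoothing form of the Bernoulli word estimate are assumptions.

```python
import numpy as np

def kernel_density(test_regions, train_regions, bandwidth=1.0):
    """Non-parametric feature model: the density of each test region's
    feature vector is the average of Gaussian kernels centered on one
    training image's region features (a sketch of MBRM's kernel
    density estimate for image features)."""
    d = test_regions.shape[1]
    diff = test_regions[:, None, :] - train_regions[None, :, :]
    sq = (diff ** 2).sum(-1) / (2.0 * bandwidth ** 2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq).mean(axis=1) / norm  # one density per test region

def bernoulli_word_prob(word, annotation, word_doc_counts, n_train, mu=1.0):
    """Smoothed Bernoulli estimate of P(word present | training image):
    near 1 if the image is annotated with the word, shrunk toward the
    word's overall document frequency (this smoothing form is an
    assumption, not the paper's exact formula)."""
    prior = word_doc_counts.get(word, 0) / n_train
    present = 1.0 if word in annotation else 0.0
    return (mu * prior + present) / (mu + 1.0)
```

With this factorization, an image scores a candidate annotation by multiplying the Bernoulli word probabilities with the kernel densities of its region features.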
Q6. What is the reason why the maximization in equation (2) can be done so efficiently?
One can show that the maximization in equation (2) can be done very efficiently because of the factored nature of the Bernoulli component.
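The practical consequence of that factorization can be shown in a few lines: because the multiple-Bernoulli likelihood factors over words, maximizing it over fixed-size annotations reduces to independently picking the words with the highest marginal probabilities (the word probabilities below are made up for illustration).

```python
def best_annotation(word_probs, n):
    """Because the Bernoulli component factors over words, the best
    length-n annotation is simply the n words with the highest
    marginal presence probabilities -- no joint search is needed."""
    return sorted(word_probs, key=word_probs.get, reverse=True)[:n]

probs = {"tiger": 0.9, "grass": 0.7, "sky": 0.4, "car": 0.1}
best_annotation(probs, 2)  # ['tiger', 'grass']
```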
Q7. How many features are there in the MBRM?
There are 30 features: 18 color features (including region color average, standard deviation and skewness) and 12 texture features (Gabor energy computed over 3 scales and 4 orientations).
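A rough sketch of such a region descriptor, assuming per-channel moments for color and a simple Gabor filter bank for texture (the paper reports 18 color features, so its color set is richer than the 9 moments computed here; filter parameters below are assumptions):

```python
import numpy as np
from scipy.signal import fftconvolve

def color_stats(region):
    """Mean, standard deviation and skewness per RGB channel
    (9 values; MBRM's 18 color features presumably add further
    statistics, e.g. from a second color space)."""
    x = region.reshape(-1, 3).astype(float)
    mu, sd = x.mean(0), x.std(0) + 1e-9
    skew = (((x - mu) / sd) ** 3).mean(0)
    return np.concatenate([mu, sd, skew])

def gabor_energies(gray, scales=(2.0, 4.0, 8.0), n_orients=4):
    """Gabor energy over 3 scales x 4 orientations = 12 texture
    features, matching the count in the paper."""
    feats = []
    for s in scales:
        r = int(3 * s)
        ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
        for k in range(n_orients):
            th = k * np.pi / n_orients
            xr = xs * np.cos(th) + ys * np.sin(th)
            g = np.exp(-(xs ** 2 + ys ** 2) / (2 * s ** 2)) \
                * np.cos(2 * np.pi * xr / (2 * s))
            resp = fftconvolve(gray, g, mode="same")
            feats.append((resp ** 2).mean())  # mean energy of the response
    return np.array(feats)
```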
Q8. How many ways would the probability mass be split in the first case?
The probability mass would be split three ways (0.33 each) in the first case, while in the second image “face” would have a probability of 1.
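The contrast between the two word models can be made concrete. Under a multinomial, the words of an annotation compete for a fixed probability mass; under the multiple-Bernoulli model, each word's presence is modeled independently (a minimal sketch with hard 0/1 estimates, before smoothing):

```python
def multinomial_probs(annotation):
    """Multinomial word model: the probability mass is split evenly
    among the words annotating the image."""
    return {w: 1.0 / len(annotation) for w in annotation}

def bernoulli_probs(annotation, vocab):
    """Multiple-Bernoulli word model: each vocabulary word is
    independently present (1) or absent (0); words do not compete
    for probability mass."""
    return {w: 1.0 if w in annotation else 0.0 for w in vocab}

multinomial_probs(["face", "male", "indoor"])["face"]  # 0.33...
multinomial_probs(["face"])["face"]                    # 1.0
bernoulli_probs({"face", "male", "indoor"}, ["face"])["face"]  # 1.0
```

This reproduces Q8's example: "face" gets only a third of the mass in the three-word image under the multinomial, but probability 1 in both images under the Bernoulli model.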
Q9. How many rectangles are selected for the Corel set?
The number of rectangles is empirically selected (using the training and validation sets) and is 24 for the Corel set and 35 for the video dataset.
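Partitioning an image into such a fixed rectangular grid is straightforward; the sketch below assumes a rows-by-columns split (the exact grid shape behind the 24 and 35 region counts is not specified here).

```python
def grid_regions(image, n_rows, n_cols):
    """Partition an image into a fixed grid of rectangular regions,
    as MBRM does instead of relying on segmentation (e.g. 4 x 6 = 24
    regions for Corel -- the rows x cols split is an assumption)."""
    h, w = image.shape[:2]
    regions = []
    for i in range(n_rows):
        for j in range(n_cols):
            r0, r1 = i * h // n_rows, (i + 1) * h // n_rows
            c0, c1 = j * w // n_cols, (j + 1) * w // n_cols
            regions.append(image[r0:r1, c0:c1])
    return regions
```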
Q10. What is the way to improve annotation performance?
While the CRM does not make any assumptions about correspondence of annotation words to image regions, the overall annotation performance is strongly affected by the quality of segmentation.
Q11. What is the way to find images from a database?
The traditional “low-tech” solution to this problem practiced by librarians is to annotate each image manually with keywords or captions and then search on those captions or keywords using a conventional text search engine.