Q2. What are the future works in "A performance evaluation of gradient field hog descriptor for sketch based image retrieval" ?
Future directions of this work will explore more sophisticated combination schemes, for example kernel canonical correlation analysis (KCCA) [68], which has been used to good effect for combining photorealistic and textual constraints outside the domain of SBIR. It may also be possible to draw upon work on grouping regions for structure-invariant matching [69] to select an appropriate set of scales for edge detection and further improve retrieval accuracy. However, the authors believe such enhancements are not necessary to demonstrate the robustness and performance of GF-HOG for SBIR, and its potential for use in sketch-based retrieval applications such as sketch-text search and photo montage.
Q3. How do the authors fit the sketched shape to the image?
The authors apply Random Sample Consensus (RANSAC) to fit the sketched shape to the image via a constrained affine transformation with four degrees of freedom (uniform scale, rotation and translation).
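Since the constrained affine model has only four degrees of freedom, a minimal RANSAC sample is two point correspondences. Below is a minimal sketch of this idea (an illustration, not the authors' implementation), using complex arithmetic: writing points as complex numbers, the transform is z' = a·z + b, where a encodes uniform scale and rotation and b the translation.

```python
import numpy as np

def fit_similarity(src, dst):
    """Estimate a 4-DoF transform (uniform scale, rotation, translation)
    from two point correspondences via complex arithmetic: z' = a*z + b."""
    z  = src[:, 0] + 1j * src[:, 1]
    zp = dst[:, 0] + 1j * dst[:, 1]
    a = (zp[1] - zp[0]) / (z[1] - z[0])
    b = zp[0] - a * z[0]
    return a, b

def ransac_similarity(src, dst, n_iter=500, thresh=3.0, seed=0):
    """RANSAC loop: repeatedly fit on a minimal sample of 2 correspondences
    and keep the model with the most inliers."""
    rng = np.random.default_rng(seed)
    z  = src[:, 0] + 1j * src[:, 1]
    zp = dst[:, 0] + 1j * dst[:, 1]
    best, best_inliers = None, 0
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=2, replace=False)
        if z[idx[0]] == z[idx[1]]:
            continue  # degenerate sample: coincident points
        a, b = fit_similarity(src[idx], dst[idx])
        residuals = np.abs(a * z + b - zp)
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers
```

The consensus count gives robustness: correspondences that disagree with the dominant 4-DoF transform are simply outvoted rather than corrupting a least-squares fit.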
Q4. How does the retrieval time scale with the database size?
In their experiments using Cityblock-distance-based linear search, the retrieval time increases approximately linearly with the database size.
Q5. What is the common method of determining the similarity of a pair of keywords?
Given a vocabulary V = {w1, ..., wK} of K keywords present within all image tags, the similarity of a pair of keyword tags is commonly defined using tag co-occurrence.
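As an illustration of the idea (not code from the paper), conditional co-occurrence probabilities p(w_i | w_j) can be estimated from per-image tag lists as count(i, j) / count(j):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_probability(tag_lists):
    """Build conditional co-occurrence probabilities p(w_i | w_j) from
    per-image tag lists: count(w_i and w_j together) / count(w_j)."""
    single = Counter()
    pair = Counter()
    for tags in tag_lists:
        tags = set(tags)  # ignore duplicate tags within one image
        single.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    def p(wi, wj):
        return pair[(wi, wj)] / single[wj] if single[wj] else 0.0
    return p
```

For example, with tag lists [["cat", "grass"], ["cat", "sofa"], ["dog", "grass"]], "grass" co-occurs with "cat" in one of the two images tagged "cat", so p("grass" | "cat") = 0.5.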
Q6. What is the common way to index images?
Digital image repositories are commonly indexed using manually annotated keyword tags that indicate the presence of salient objects or concepts.
Q7. What is the simplest way to encode the relative location and spatial orientation of images?
In order to encode the relative location and spatial orientation of sketches or Canny edges of images, the authors represent image structure using a dense gradient field interpolated from the sparse set of edge pixels.
Q8. What is the way to extract features from the abstract images?
Chalechale et al. [34] employ angular-spatial distribution of pixels in the abstract images to extract features using the Fourier transform.
Q9. How many local descriptors have been used in SBIR?
Whilst a variety of local descriptors such as SIFT, SSIM, and HOG have been successfully used in image retrieval and classification tasks [57], it is still unclear how these local descriptors perform in SBIR.
Q10. How many dimensions were used in each experiment?
In all experiments, the photos and the sketch canvas were pre-scaled so that their largest dimension (i.e. width or height) was 200 pixels.
Q11. How does the cityblock distance based linear search improve the retrieval time?
In this paper, the authors explore using a Cityblock-distance-based kd-tree to improve the retrieval time: the Cityblock distance achieves performance comparable to the best results obtained with the Histogram Intersection distance (shown in Fig. 7), and its linear geometry makes it easy to adapt to the kd-tree indexing technique.
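As a sketch of this indexing idea (the array sizes here are toy values, not the paper's descriptor dimensions), SciPy's `cKDTree` supports Minkowski p=1 queries, i.e. the Cityblock (L1) metric, so nearest neighbours can be found without a linear scan:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.random((1000, 32))   # toy database of descriptor vectors
tree = cKDTree(db)            # build the index once, query many times
query = rng.random(32)

# p=1 selects the Cityblock (L1 / Manhattan) metric for the search.
dist, idx = tree.query(query, k=5, p=1)
```

For low and moderate dimensionality the tree prunes most of the database per query; in very high dimensions kd-tree performance degrades towards linear search, which is one reason a compact descriptor matters.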
Q12. How do the authors compute the similarity between two sets of tags?
The authors compute the similarity between two sets of tags C^1 = {C^1_1, C^1_2, ..., C^1_N} and C^2 = {C^2_1, C^2_2, ..., C^2_M}, corresponding to images I1 and I2, as:

sim(C^1, C^2) = ( Σ_{m=1..M} max_n { p(C^1_n | C^2_m) } ) / N + ( Σ_{n=1..N} max_m { p(C^1_n | C^2_m) } ) / M   (7)

where p(C^1_n | C^2_m) is the co-occurrence probability of the two tags, computed via the shortest-path techniques of subsection 5.1.
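Assuming a conditional probability function p is supplied (in the paper it comes from the shortest-path co-occurrence technique of subsection 5.1), Eq. (7) can be sketched directly:

```python
def tag_set_similarity(C1, C2, p):
    """Symmetrised tag-set similarity of Eq. (7): for each tag in one set,
    take its best conditional probability against the other set, then
    normalise each directional sum (by N = |C1| and M = |C2| respectively)."""
    s1 = sum(max(p(c1, c2) for c1 in C1) for c2 in C2) / len(C1)
    s2 = sum(max(p(c1, c2) for c2 in C2) for c1 in C1) / len(C2)
    return s1 + s2
```

The max-then-sum structure rewards each tag's single best match in the other set, so two tag sets need not be identical, only semantically close tag-by-tag, to score highly.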
Q13. What are the common distance measures used in text retrieval?
The authors also experiment with eight commonly used distance measures from norms to metrics frequently used in text (“Bag of Words”) retrieval.
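The excerpt does not enumerate all eight measures, but a few distances commonly applied to Bag-of-Words histograms (shown here purely as an illustration) include Cityblock, Euclidean, Histogram Intersection, and chi-squared:

```python
import numpy as np

def cityblock(h1, h2):
    return np.abs(h1 - h2).sum()

def euclidean(h1, h2):
    return np.sqrt(((h1 - h2) ** 2).sum())

def hist_intersection(h1, h2):
    # Similarity in [0, 1] for L1-normalised histograms; 1 - value is a distance.
    return np.minimum(h1, h2).sum()

def chi_squared(h1, h2):
    s = h1 + h2
    nz = s > 0  # skip empty bins to avoid division by zero
    return 0.5 * (((h1 - h2) ** 2)[nz] / s[nz]).sum()
```

Which measure performs best depends on how the histograms are normalised, which is why the paper compares several empirically.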
Q14. What was the first algorithm to be used for matching images?
The early nineties delivered several SBIR algorithms capable of matching photographs with queries comprising blobs of color, or predefined texture.
Q15. What is the affine deviation of the sketch from the typical configuration of the target objects?
As expected, the greater the affine deviation of the sketch from the typical configuration of the target objects in each category, the greater the performance (MAP) degradation under rotation and scaling.