Q2. What are the future works in "A performance evaluation of gradient field hog descriptor for sketch based image retrieval" ?
Future directions of this work will explore more sophisticated combination schemes, for example kernel canonical correlation analysis (KCCA) [68], which has been used to good effect for combining photorealistic and textual constraints outside the domain of SBIR. It may also be possible to draw upon work on grouping regions for structure-invariant matching [69] to select an appropriate set of scales for edge detection and further improve retrieval accuracy. However, the authors believe such enhancements are not necessary to demonstrate the robustness and performance of GF-HOG for SBIR, and its potential for use in sketch-based retrieval applications such as sketch-text search and photo montage.
Q3. How do the authors fit the sketched shape to the image?
The authors apply Random Sample Consensus (RANSAC) to fit the sketched shape to the image via a constrained affine transformation with four degrees of freedom (uniform scale, rotation and translation).
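Since the constrained affine model has only four degrees of freedom, a minimal RANSAC sample is two point correspondences. Below is a minimal sketch of this idea (an illustration, not the authors' implementation), using complex arithmetic: writing points as complex numbers, the transform is z' = a·z + b, where a encodes uniform scale and rotation and b the translation.

```python
import numpy as np

def fit_similarity(src, dst):
    """Estimate a 4-DoF transform (uniform scale, rotation, translation)
    from two point correspondences via complex arithmetic: z' = a*z + b."""
    z  = src[:, 0] + 1j * src[:, 1]
    zp = dst[:, 0] + 1j * dst[:, 1]
    a = (zp[1] - zp[0]) / (z[1] - z[0])
    b = zp[0] - a * z[0]
    return a, b

def ransac_similarity(src, dst, n_iter=500, thresh=3.0, seed=0):
    """RANSAC loop: repeatedly fit on a minimal sample of 2 correspondences
    and keep the model with the most inliers."""
    rng = np.random.default_rng(seed)
    z  = src[:, 0] + 1j * src[:, 1]
    zp = dst[:, 0] + 1j * dst[:, 1]
    best, best_inliers = None, 0
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=2, replace=False)
        if z[idx[0]] == z[idx[1]]:
            continue  # degenerate sample: coincident points
        a, b = fit_similarity(src[idx], dst[idx])
        residuals = np.abs(a * z + b - zp)
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers
```

The consensus count gives robustness: correspondences that disagree with the dominant 4-DoF transform are simply outvoted rather than corrupting a least-squares fit.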
Q4. How does the retrieval time scale with the database size?
In their experiments using Cityblock-distance-based linear search, the retrieval time increases approximately linearly with the database size.
Q5. What is the common method of determining the similarity of a pair of keywords?
Given a vocabulary V = {w1, ..., wK} of K keywords present within all image tags, the similarity of a pair of keyword tags is commonly defined using tag co-occurrence.
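As an illustration of the idea (not code from the paper), conditional co-occurrence probabilities p(w_i | w_j) can be estimated from per-image tag lists as count(i, j) / count(j):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_probability(tag_lists):
    """Build conditional co-occurrence probabilities p(w_i | w_j) from
    per-image tag lists: count(w_i and w_j together) / count(w_j)."""
    single = Counter()
    pair = Counter()
    for tags in tag_lists:
        tags = set(tags)  # ignore duplicate tags within one image
        single.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    def p(wi, wj):
        return pair[(wi, wj)] / single[wj] if single[wj] else 0.0
    return p
```

For example, with tag lists [["cat", "grass"], ["cat", "sofa"], ["dog", "grass"]], "grass" co-occurs with "cat" in one of the two images tagged "cat", so p("grass" | "cat") = 0.5.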
Q6. What is the common way to index images?
Digital image repositories are commonly indexed using manually annotated keyword tags that indicate the presence of salient objects or concepts.
Q7. What is the simplest way to encode the relative location and spatial orientation of images?
In order to encode the relative location and spatial orientation of sketches or Canny edges of images, the authors represent image structure using a dense gradient field interpolated from the sparse set of edge pixels.
Q8. What is the way to extract features from the abstract images?
Chalechale et al. [34] employ angular-spatial distribution of pixels in the abstract images to extract features using the Fourier transform.
Q9. How many local descriptors have been used in SBIR?
Whilst a variety of local descriptors such as SIFT, SSIM, and HOG have been successfully used in image retrieval and classification tasks [57], it is still unclear how these local descriptors perform in SBIR.
Q10. How many dimensions were used in each experiment?
In all experiments, the photos and the sketch canvas were pre-scaled so that their largest dimension (i.e. width or height) was 200 pixels.
Q11. How does the cityblock distance based linear search improve the retrieval time?
In this paper, the authors explore using a Cityblock-distance-based kd-tree to improve the retrieval time: the Cityblock distance achieves performance comparable to the best results obtained with the Histogram Intersection distance (shown in Fig. 7), and its linear geometry makes it easy to adapt to the kd-tree indexing technique.
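As a sketch of this indexing idea (the array sizes here are toy values, not the paper's descriptor dimensions), SciPy's `cKDTree` supports Minkowski p=1 queries, i.e. the Cityblock (L1) metric, so nearest neighbours can be found without a linear scan:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.random((1000, 32))   # toy database of descriptor vectors
tree = cKDTree(db)            # build the index once, query many times
query = rng.random(32)

# p=1 selects the Cityblock (L1 / Manhattan) metric for the search.
dist, idx = tree.query(query, k=5, p=1)
```

For low and moderate dimensionality the tree prunes most of the database per query; in very high dimensions kd-tree performance degrades towards linear search, which is one reason a compact descriptor matters.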
Q12. How do the authors compute the similarity between two sets of tags?
The authors compute the similarity between two sets of tags C^1 = {C^1_1, C^1_2, ..., C^1_N} and C^2 = {C^2_1, C^2_2, ..., C^2_M}, corresponding to images I1 and I2, as:

sim(C^1, C^2) = ( Σ_{m=1..M} max_n { p(C^1_n | C^2_m) } ) / N + ( Σ_{n=1..N} max_m { p(C^1_n | C^2_m) } ) / M   (7)

where p(C^1_n | C^2_m) is the co-occurrence probability of the two tags, computed via the shortest-path techniques of subsection 5.1.
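Assuming a conditional probability function p is supplied (in the paper it comes from the shortest-path co-occurrence technique of subsection 5.1), Eq. (7) can be sketched directly:

```python
def tag_set_similarity(C1, C2, p):
    """Symmetrised tag-set similarity of Eq. (7): for each tag in one set,
    take its best conditional probability against the other set, then
    normalise each directional sum (by N = |C1| and M = |C2| respectively)."""
    s1 = sum(max(p(c1, c2) for c1 in C1) for c2 in C2) / len(C1)
    s2 = sum(max(p(c1, c2) for c2 in C2) for c1 in C1) / len(C2)
    return s1 + s2
```

The max-then-sum structure rewards each tag's single best match in the other set, so two tag sets need not be identical, only semantically close tag-by-tag, to score highly.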
Q13. What are the common distance measures used in text retrieval?
The authors also experiment with eight commonly used distance measures from norms to metrics frequently used in text (“Bag of Words”) retrieval.
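The excerpt does not enumerate all eight measures, but a few distances commonly applied to Bag-of-Words histograms (shown here purely as an illustration) include Cityblock, Euclidean, Histogram Intersection, and chi-squared:

```python
import numpy as np

def cityblock(h1, h2):
    return np.abs(h1 - h2).sum()

def euclidean(h1, h2):
    return np.sqrt(((h1 - h2) ** 2).sum())

def hist_intersection(h1, h2):
    # Similarity in [0, 1] for L1-normalised histograms; 1 - value is a distance.
    return np.minimum(h1, h2).sum()

def chi_squared(h1, h2):
    s = h1 + h2
    nz = s > 0  # skip empty bins to avoid division by zero
    return 0.5 * (((h1 - h2) ** 2)[nz] / s[nz]).sum()
```

Which measure performs best depends on how the histograms are normalised, which is why the paper compares several empirically.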
Q14. What was the first algorithm to be used for matching images?
The early nineties delivered several SBIR algorithms capable of matching photographs with queries comprising blobs of color, or predefined texture.
Q15. What is the affine deviation of the sketch from the typical configuration of the target objects?
As expected, the greater the affine deviation of the sketch from the typical configuration of the target objects in each category, the greater the performance (MAP) degradation under rotation and scaling.