Figure 1. Difference between the two models. In PPM, the conditional probability of a phrase $y_i$ given an image $J$ depends on whether that phrase is present in the ground-truth phrases of $J$ (i.e., $Y_J$). When the phrase is absent, the corresponding $\delta_{y_i,J}$ (equation 4) is zero, regardless of how semantically similar $y_i$ is to the other phrases in $Y_J$. SPPM addresses this limitation by finding the phrase in $Y_J$ that is semantically most similar to $y_i$ and using their similarity score instead of zero. In the above example, $Y_J$ = {“bus”, “road”, “street”}. Given a phrase $y_i$ = “highway”, PPM gives $\delta_{y_i,J} = 0$, whereas SPPM gives $\delta'_{y_i,J} = 0.8582$ (equation 9) by considering the similarity of “highway” with “road” (i.e., $V_{\mathrm{sim}}$(“highway”, “road”) = 0.8582). (2013)
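The contrast in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: equations 4 and 9 are not reproduced here, so the sketch assumes that $\delta$ is a 0/1 indicator of membership in $Y_J$ and that $\delta'$ falls back to the maximum similarity over $Y_J$ when the phrase is absent. The function names, the lookup table standing in for $V_{\mathrm{sim}}$, and the 0.0 default for unlisted pairs are all hypothetical; only the “highway”/“road” score comes from the caption.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for V_sim: only the ("highway", "road") score is
# taken from the caption. Pairs not listed default to 0.0, which is a
# placeholder and not a value from the source.
_SIM_TABLE = {frozenset({"highway", "road"}): 0.8582}


def v_sim(a: str, b: str) -> float:
    """Symmetric toy similarity lookup (assumption, not a real model)."""
    return _SIM_TABLE.get(frozenset({a, b}), 0.0)


def delta_ppm(phrase: str, ground_truth: Iterable[str]) -> float:
    """Assumed PPM behaviour: 1 if the phrase is in Y_J, otherwise 0."""
    return 1.0 if phrase in set(ground_truth) else 0.0


def delta_sppm(
    phrase: str,
    ground_truth: Iterable[str],
    sim: Callable[[str, str], float] = v_sim,
) -> float:
    """Assumed SPPM behaviour: 1 if the phrase is in Y_J, otherwise the
    similarity to the most similar ground-truth phrase."""
    truth = set(ground_truth)
    if phrase in truth:
        return 1.0
    # max over an empty Y_J falls back to 0.0
    return max((sim(phrase, y) for y in truth), default=0.0)


Y_J = {"bus", "road", "street"}
ppm_score = delta_ppm("highway", Y_J)
sppm_score = delta_sppm("highway", Y_J)
```

With the caption's example, `ppm_score` is zero because “highway” is not in $Y_J$, while `sppm_score` picks up the similarity to “road”.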