From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason
Citations
96 citations
Cites methods from "From Strings to Things: Knowledge-E..."
...The ST-VQA [5] and TextVQA [35] datasets were introduced in parallel in 2019 and were quickly followed by more research [36, 11, 39]....
[...]
46 citations
Cites background from "From Strings to Things: Knowledge-E..."
...Recently, there have also been related works that exploit retrieved knowledge facts through memory networks [14, 20] or graph neural networks [21]....
[...]
References
3,765 citations
3,513 citations
3,215 citations
"From Strings to Things: Knowledge-E..." refers background or methods in this paper
...Since category names in our dataset are not exactly the same as in Places, we could not perform a quantitative analysis on the visual content evaluation of places....
[...]
...To this end, we rely on Places [55] for scene recognition and a fine-tuned VGG-16 model for representing visual contents from movie posters and book covers....
[...]
...Word proposals [16]: Subway, open; Visual content proposals [55]: fast food restaurant, shop front...
[...]
...We use Places [55] and a fine-tuned VGG-16 model for recognizing these visual contents for categories (i) and (ii), respectively....
[...]
2,809 citations
"From Strings to Things: Knowledge-E..." refers background in this paper
..., Wikidata [44], IMDb [1], a book catalogue [13]....
[...]
...Our newly introduced dataset is much larger in scale as compared to the three aforementioned works [8, 32, 42] and more importantly, backed up by web-scale knowledge facts harvested from various sources, e.g., Wikidata [44], IMDb [1], a book catalogue [13]....
[...]
...To construct these three knowledge bases, we crawl open-source world knowledge bases, e.g., Wikidata [3], IMDb [1] and a book catalogue provided by [18], around the anchor entities. Each knowledge fact is a triplet connecting two entities with a relation....
[...]
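The excerpt above describes knowledge facts as triplets connecting two entities with a relation, harvested around anchor entities. A minimal sketch of that representation, with invented example entities and relation names (not entries from the paper's actual knowledge bases):

```python
from collections import defaultdict

# Toy triplet knowledge facts: (head entity, relation, tail entity).
# Entity and relation names here are illustrative, not from the paper's KB.
facts = [
    ("Subway", "is_a", "fast food restaurant"),
    ("Subway", "serves", "sandwich"),
    ("Subway", "country_of_origin", "United States"),
]

# Index facts by head entity, mimicking retrieval around an anchor entity.
kb = defaultdict(list)
for head, rel, tail in facts:
    kb[head].append((rel, tail))

print(kb["Subway"])  # all facts anchored at "Subway"
```

Indexing by the anchor entity makes it cheap to pull in all facts relevant to a word or visual-content proposal when building the reasoning graph.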
...Further, with access to rich open-source knowledge graphs such as Wikidata [44], we could ask a series of natural questions, such as, Can I get a Sandwich here?, Is this a French brand?, and so on, which cannot be asked in traditional VQA [5] or in knowledge-enabled VQA models [47, 48]....
[...]
2,518 citations
"From Strings to Things: Knowledge-E..." refers background or methods in this paper
...This motivated us to utilize the capability of graph representation learning in the form of gated graph neural networks (GGNN) [27]....
[...]
...A natural choice for this is ‘gated graph neural network’ (GGNN) [27] which is emerging as a powerful tool to perform reasoning over graphs....
[...]
...symbolic QA [27] to more complex visual reasoning [30]....
[...]
..., visual contents, recognized words, question and knowledge facts, and perform reasoning on a multi-relational graph using a novel gated graph neural network [27] formulation....
[...]
...We choose gated graph neural network (GGNN) [27] for this task....
[...]
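The excerpts above repeatedly reference the gated graph neural network (GGNN) of Li et al. [27] as the reasoning mechanism. A minimal sketch of one GGNN propagation step, assuming a simple unweighted adjacency and GRU-style gating; the sizes and random weights are toy values, not the paper's configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, A, Wz, Wr, Wh, Uz, Ur, Uh):
    """One GGNN propagation step.

    h: (n, d) node states; A: (n, n) adjacency matrix.
    Messages are aggregated along edges, then each node state is
    updated with a GRU-style gate (update gate z, reset gate r).
    """
    a = A @ h                          # aggregate neighbor messages
    z = sigmoid(a @ Wz + h @ Uz)       # update gate
    r = sigmoid(a @ Wr + h @ Ur)       # reset gate
    h_tilde = np.tanh(a @ Wh + (r * h) @ Uh)
    return (1 - z) * h + z * h_tilde   # gated state update

# Toy 4-node chain graph with 8-dimensional node states.
rng = np.random.default_rng(0)
n, d = 4, 8
h = rng.standard_normal((n, d))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
params = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]
h_next = ggnn_step(h, A, *params)
print(h_next.shape)
```

Running several such steps lets information from knowledge facts, recognized words, and visual contents propagate across the multi-relational graph before answer prediction.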