Oliver Groth
Researcher at University of Oxford
Publications - 21
Citations - 6520
Oliver Groth is an academic researcher from the University of Oxford. The author has contributed to research in topics: Computer science & Question answering. The author has an h-index of 8 and has co-authored 19 publications receiving 4626 citations. Previous affiliations of Oliver Groth include Dresden University of Technology.
Papers
Journal ArticleDOI
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei
TL;DR: The Visual Genome dataset contains over 108K images, where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects.
Posted Content
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li
TL;DR: The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.
Proceedings ArticleDOI
Visual7W: Grounded Question Answering in Images
TL;DR: In this article, an LSTM model with spatial attention was proposed to tackle the 7W QA task, which enables a new type of QA with visual answers, in addition to textual answers used in previous work.
Book ChapterDOI
ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
TL;DR: This paper provides ShapeStacks, a simulation-based dataset featuring 20,000 stack configurations composed of a variety of elementary geometric primitives, richly annotated regarding semantics and structural stability. It also trains visual classifiers for binary stability prediction on the data and scrutinises their learned physical intuition.
Proceedings ArticleDOI
Analyzing modular CNN architectures for joint depth prediction and semantic segmentation
TL;DR: It is found that a beneficial balance between the cross-modality influences can be achieved by network architecture, and it is conjectured that this relationship can be utilized to understand different network design choices.