Stephen Gould
Researcher at Australian National University
Publications - 201
Citations - 17011
Stephen Gould is an academic researcher at the Australian National University. He has contributed to research in topics including computer science and image segmentation. He has an h-index of 45 and has co-authored 187 publications receiving 12,651 citations. Previous affiliations of Stephen Gould include Amazon.com and Stanford University.
Papers
Proceedings Article
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
TL;DR: In this paper, a combined bottom-up and top-down attention mechanism is proposed that enables attention to be calculated at the level of objects and other salient image regions, achieving state-of-the-art results on the MSCOCO test server.
Posted Content
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.
TL;DR: A combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions is proposed, demonstrating the broad applicability of this approach to VQA.
Book Chapter
SPICE: Semantic Propositional Image Caption Evaluation
TL;DR: This paper proposes a new automated caption evaluation metric, SPICE, defined over scene graphs, which captures human judgments over model-generated captions better than other automatic metrics (e.g., a system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR).
Posted Content
SPICE: Semantic Propositional Image Caption Evaluation
TL;DR: It is hypothesized that semantic propositional content is an important component of human caption evaluation, and a new automated caption evaluation metric defined over scene graphs, called SPICE, is proposed, which can answer questions such as: which caption generator best understands colors?
Proceedings Article
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel +8 more
TL;DR: The Room-to-Room (R2R) dataset presented in this paper provides a large-scale reinforcement learning environment, based on real imagery, for visually-grounded natural language navigation in real buildings.