
Marcus Rohrbach

Researcher at Facebook

Publications -  132
Citations -  33867

Marcus Rohrbach is an academic researcher at Facebook. The author has contributed to research on question answering and closed captioning, has an h-index of 67, and has co-authored 132 publications receiving 26167 citations. Previous affiliations of Marcus Rohrbach include Technische Universität Darmstadt and the Max Planck Society.

Papers
Proceedings Article

Long-term recurrent convolutional networks for visual recognition and description

TL;DR: A novel recurrent convolutional architecture is proposed that is suitable for large-scale visual learning and is end-to-end trainable; such models are shown to have distinct advantages over state-of-the-art recognition or generation models that are separately defined and/or optimized.
Posted Content

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

TL;DR: A novel recurrent convolutional architecture is proposed that is suitable for large-scale visual learning and is end-to-end trainable; such models are shown to have distinct advantages over state-of-the-art recognition or generation models that are separately defined and/or optimized.
Proceedings Article

Sequence to Sequence -- Video to Text

TL;DR: In this article, an end-to-end sequence-to-sequence model is proposed for generating video captions; it learns both the temporal structure of the sequence of frames and the sequence model of the generated sentences, i.e., a language model.
Proceedings Article

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

TL;DR: This work extensively evaluates Multimodal Compact Bilinear pooling (MCB) on the visual question answering and grounding tasks and consistently shows the benefit of MCB over ablations without MCB.
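For readers unfamiliar with the term, the sketch below illustrates the compact bilinear pooling idea that MCB builds on: each input vector is projected with a Count Sketch, and the two sketches are circularly convolved, which equals the Count Sketch of their full outer product. This is a minimal, pure-Python sketch under stated assumptions, not the paper's implementation: the example vectors, the sketch dimension `d`, and all helper names are hypothetical, and the convolution is computed directly in O(d²) time, whereas MCB computes it as an element-wise product in the FFT domain.

```python
import random
from typing import List, Sequence, Tuple

SketchParams = Tuple[List[int], List[int]]  # (hash h: [n] -> [d], sign s: [n] -> {+1, -1})


def make_sketch_params(n: int, d: int, seed: int) -> SketchParams:
    """Draw a random hash and random signs for an n-dimensional input (hypothetical helper)."""
    rng = random.Random(seed)
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    return h, s


def count_sketch(v: Sequence[float], params: SketchParams, d: int) -> List[float]:
    """Project v to d dimensions: y[h[i]] += s[i] * v[i]."""
    h, s = params
    y = [0.0] * d
    for i, vi in enumerate(v):
        y[h[i]] += s[i] * vi
    return y


def circular_convolution(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """c[k] = sum_j a[j] * b[(k - j) mod d]; MCB computes this step via FFT instead."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]


def compact_bilinear(x: Sequence[float], q: Sequence[float],
                     params_x: SketchParams, params_q: SketchParams, d: int) -> List[float]:
    """Approximate the outer product x ⊗ q with a d-dimensional vector."""
    return circular_convolution(count_sketch(x, params_x, d), count_sketch(q, params_q, d))
```

Calling `compact_bilinear(x, q, make_sketch_params(len(x), d, 0), make_sketch_params(len(q), d, 1), d)` returns a length-`d` vector equal to the Count Sketch of `x ⊗ q` under the combined hash `(h_x(i) + h_q(j)) mod d` with sign `s_x(i) * s_q(j)`, without ever materializing the outer product.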
Proceedings Article

Neural Module Networks

TL;DR: The authors decompose questions into their linguistic substructures and use these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.). The resulting compound networks are jointly trained.
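The composition idea in this summary can be illustrated with a toy example: a question's structure is written as a nested layout, and an interpreter assembles reusable modules according to that layout. This is a hypothetical, non-learned sketch: the module names (`attend`, `combine_and`, `measure_exists`), the tuple layout format, and the "image" made of region attribute sets are illustrative assumptions, and the hand-coded modules stand in for the jointly trained neural components described in the paper.

```python
from typing import List, Set, Tuple

Image = List[Set[str]]   # hypothetical toy image: one attribute set per region
Attention = List[float]  # one weight per region


def attend(image: Image, concept: str) -> Attention:
    """Stand-in for a learned attention module: 1.0 where a region shows `concept`."""
    return [1.0 if concept in region else 0.0 for region in image]


def combine_and(a: Attention, b: Attention) -> Attention:
    """Intersect two attention maps element-wise."""
    return [min(x, y) for x, y in zip(a, b)]


def measure_exists(a: Attention) -> str:
    """Map an attention map to a yes/no answer."""
    return "yes" if max(a, default=0.0) > 0.5 else "no"


def execute(layout: Tuple, image: Image):
    """Recursively instantiate and run the network described by a nested layout."""
    op, *args = layout
    if op == "attend":
        return attend(image, args[0])
    if op == "and":
        return combine_and(execute(args[0], image), execute(args[1], image))
    if op == "exists":
        return measure_exists(execute(args[0], image))
    raise ValueError(f"unknown module: {op}")
```

For example, "Is there a white dog?" could map to the layout `("exists", ("and", ("attend", "dog"), ("attend", "white")))`. The same `attend` module is reused for both concepts, which mirrors the idea of reusable components that are composed differently for each question.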