Marcus Rohrbach
Researcher at Facebook
Publications - 132
Citations - 33867
Marcus Rohrbach is an academic researcher from Facebook. The author has contributed to research in topics: Question answering & Closed captioning. The author has an h-index of 67 and has co-authored 132 publications receiving 26167 citations. Previous affiliations of Marcus Rohrbach include Technische Universität Darmstadt & Max Planck Society.
Papers
Proceedings ArticleDOI
Long-term recurrent convolutional networks for visual recognition and description
Jeff Donahue,Lisa Anne Hendricks,Sergio Guadarrama,Marcus Rohrbach,Subhashini Venugopalan,Trevor Darrell,Kate Saenko +6 more
TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
Posted Content
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue,Lisa Anne Hendricks,Marcus Rohrbach,Subhashini Venugopalan,Sergio Guadarrama,Kate Saenko,Trevor Darrell +6 more
TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
Proceedings ArticleDOI
Sequence to Sequence -- Video to Text
Subhashini Venugopalan,Marcus Rohrbach,Jeff Donahue,Raymond J. Mooney,Trevor Darrell,Kate Saenko +5 more
TL;DR: This work proposes an end-to-end sequence-to-sequence model for generating video captions, which learns both the temporal structure of the sequence of frames and the sequence model of the generated sentences, i.e. a language model.
Proceedings ArticleDOI
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
TL;DR: This work extensively evaluates Multimodal Compact Bilinear pooling (MCB) on the visual question answering and grounding tasks and consistently shows the benefit of MCB over ablations without MCB.
Proceedings ArticleDOI
Neural Module Networks
TL;DR: The authors decompose questions into their linguistic substructures and use these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.). The resulting compound networks are jointly trained.