Institution

Salesforce.com

About: Salesforce.com is known for research contributions in the topics User interface and Object (computer science). The organization has 2418 authors who have published 2775 publications receiving 63956 citations.

Topics: User interface, Object (computer science), Metadata, Cloud computing, Event (computing)

Papers

Posted Content

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke¹, Sam Gross², Francisco Massa², Adam Lerer², James Bradbury³, Gregory Chanan², Trevor Killeen⁴, Zeming Lin², Natalia Gimelshein⁵, Luca Antiga⁶, Alban Desmaison⁷, Andreas Kopf⁸, Edward Z. Yang², Zachary DeVito⁹, Martin Raison², Alykhan Tejani¹⁰, Sasank Chilamkurthy, Benoit Steiner², Lu Fang², Junjie Bai², Soumith Chintala² • Institutions (10)

University of Warsaw¹, Facebook², Salesforce.com³, University of Washington⁴, Nvidia⁵, Mario Negri Institute for Pharmacological Research⁶, University of Oxford⁷, ETH Zurich⁸, Stanford University⁹, Twitter¹⁰

03 Dec 2019-arXiv: Learning

TL;DR: PyTorch is a machine learning library that provides an imperative and Pythonic programming style, makes debugging easy, and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.

Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.

12,767 citations

Proceedings Article

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke¹, Sam Gross², Francisco Massa², Adam Lerer², James Bradbury³, Gregory Chanan², Trevor Killeen⁴, Zeming Lin², Natalia Gimelshein⁵, Luca Antiga⁶, Alban Desmaison⁷, Andreas Kopf⁸, Edward Z. Yang², Zachary DeVito⁹, Martin Raison², Alykhan Tejani¹⁰, Sasank Chilamkurthy, Benoit Steiner², Lu Fang¹¹, Junjie Bai², Soumith Chintala² • Institutions (11)

01 Jan 2019

TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks.

10,045 citations

Posted Content

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Jiasen Lu¹, Dhruv Batra², Devi Parikh², Stefan Lee²•Institutions (2)

Salesforce.com¹, Georgia Institute of Technology²

06 Aug 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: ViLBERT (short for Vision-and-Language BERT) is a model for learning task-agnostic joint representations of image content and natural language. It extends the popular BERT architecture to a multi-modal two-stream model, processing visual and textual inputs in separate streams that interact through co-attentional transformer layers.

Abstract: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers. We pretrain our model through two proxy tasks on the large, automatically collected Conceptual Captions dataset and then transfer it to multiple established vision-and-language tasks -- visual question answering, visual commonsense reasoning, referring expressions, and caption-based image retrieval -- by making only minor additions to the base architecture. We observe significant improvements across tasks compared to existing task-specific models -- achieving state-of-the-art on all four tasks. Our work represents a shift away from learning groundings between vision and language only as part of task training and towards treating visual grounding as a pretrainable and transferable capability.

1,241 citations

Proceedings Article

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning

Jiasen Lu¹, Caiming Xiong², Devi Parikh³, Richard Socher²•Institutions (3)

Virginia Tech¹, Salesforce.com², Georgia Institute of Technology³

21 Jul 2017

TL;DR: This paper proposes a novel adaptive attention model with a visual sentinel that sets the new state-of-the-art by a significant margin on image captioning.

Abstract: Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably just from the language model, e.g., "sign" after "behind a red stop" or "phone" following "talking on a cell". In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel. The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state-of-the-art by a significant margin.

1,093 citations

Proceedings Article

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Jiasen Lu¹, Dhruv Batra², Devi Parikh², Stefan Lee²•Institutions (2)

Salesforce.com¹, Georgia Institute of Technology²

06 Aug 2019

TL;DR: ViLBERT extends the BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers.

Abstract: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers. We pretrain our model through two proxy tasks on the large, automatically collected Conceptual Captions dataset and then transfer it to multiple established vision-and-language tasks -- visual question answering, visual commonsense reasoning, referring expressions, and caption-based image retrieval -- by making only minor additions to the base architecture. We observe significant improvements across tasks compared to existing task-specific models -- achieving state-of-the-art on all four tasks. Our work represents a shift away from learning groundings between vision and language only as part of task training and towards treating visual grounding as a pretrainable and transferable capability.

1,069 citations

Authors

Showing all 2418 results

Name	H-index	Papers	Citations
Philip S. Yu	148	1914	107374
Michael R. Lyu	89	696	33257
Silvio Savarese	89	386	35975
Jiashi Feng	77	426	21521
Richard Socher	77	274	97703
Haibin Ling	72	383	20858
Dragomir R. Radev	69	288	20131
Irwin King	67	476	19056
Steven C. H. Hoi	66	375	15935
Xiaodan Liang	61	318	14121
Caiming Xiong	60	336	18037
Min-Yen Kan	52	253	10207
Justin Yifu Lin	48	302	13491
Hannaneh Hajishirzi	42	181	7802
Larry S. Davis	40	105	6960