Institution

Facebook

Company•Tel Aviv, Israel

About: Facebook is a company based in Tel Aviv, Israel. It is known for research contributions in the topics Computer science and Artificial neural network. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as facebook.com and FB.

Topics: Computer science, Artificial neural network, Language model, Context (language use), Reinforcement learning

Papers published on a yearly basis

Papers

Proceedings Article•

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan¹, Edouard Grave¹, Armand Joulin¹•Institutions (1)

Facebook¹

30 Apr 2020

TL;DR: This work explores LayerDrop, a form of structured dropout that has a regularization effect during training and allows for efficient pruning at inference time, and shows that it is possible to select sub-networks of any depth from one large network without having to finetune them and with limited impact on performance.

Abstract: Overparametrized transformer networks have obtained state of the art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to finetune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality than when training from scratch or using distillation.
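A minimal sketch of the idea described in the abstract above, assuming that structured dropout here means skipping whole layers at random during training and that pruning keeps a subset of layers at inference. The function names, the toy layers, and the "keep every other layer" rule are hypothetical illustrations, not the authors' implementation.

```python
import random


def forward_with_layerdrop(x, layers, drop_prob, training, rng=random):
    """Apply layers in order; during training, skip each whole layer
    with probability drop_prob (assumed form of structured dropout)."""
    for layer in layers:
        if training and rng.random() < drop_prob:
            continue  # the skipped layer acts as the identity
        x = layer(x)
    return x


def prune_every_other(layers):
    """Hypothetical pruning rule: keep the layers at even indices."""
    return layers[::2]


# Toy "layers" that each add 1, so the output counts the layers applied.
toy_layers = [lambda v: v + 1 for _ in range(4)]

full_depth = forward_with_layerdrop(0, toy_layers, drop_prob=0.5, training=False)
half_depth = forward_with_layerdrop(
    0, prune_every_other(toy_layers), drop_prob=0.5, training=False
)
```

At inference no layer is dropped at random, so the depth of the network is set only by which layers are kept.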

219 citations

Proceedings Article•

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri¹, Andrea Madotto¹, Janice Lan², Jane Hung³, Eric Frank, Piero Molino⁴, Jason Yosinski⁴, Rosanne Liu⁴•Institutions (4)

Hong Kong University of Science and Technology¹, Facebook², Broad Institute³, Uber⁴

25 Sep 2019

TL;DR: The Plug and Play Language Model (PPLM) combines a pre-trained transformer-based language model with one or more simple attribute classifiers that guide text generation without any further training of the transformer.

Abstract: Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data and entailing the significant cost of retraining. We propose a simple alternative: the Plug and Play Language Model (PPLM) for controllable language generation, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM. In the canonical scenario we present, the attribute models are simple classifiers consisting of a user-specified bag of words or a single learned layer with 100,000 times fewer parameters than the LM. Sampling entails a forward and backward pass in which gradients from the attribute model push the LM's hidden activations and thus guide the generation. Model samples demonstrate control over a range of topics and sentiment styles, and extensive automated and human annotated evaluations show attribute alignment and fluency. PPLMs are flexible in that any combination of differentiable attribute models may be used to steer text generation, which will allow for diverse and creative applications beyond the examples given in this paper.
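The gradient-based steering mentioned in the abstract above can be illustrated with a toy sketch. It assumes a single-layer logistic attribute model over a hidden vector and a plain gradient-ascent update; the names `attribute_prob` and `steer_hidden`, the step size, and the vectors are hypothetical, and the full method involves more than this one step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attribute_prob(hidden, weights):
    """Toy attribute model: p(attribute | hidden) = sigmoid(w . h)."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden)))


def steer_hidden(hidden, weights, step_size=1.0, num_steps=1):
    """Push the hidden vector along the gradient of log p(attribute | hidden)."""
    h = list(hidden)
    for _ in range(num_steps):
        p = attribute_prob(h, weights)
        # Gradient of log sigmoid(w . h) with respect to h is (1 - p) * w.
        grad = [(1.0 - p) * w for w in weights]
        h = [hi + step_size * g for hi, g in zip(h, grad)]
    return h


hidden = [0.0, 0.0]
weights = [1.0, -1.0]
steered = steer_hidden(hidden, weights)
```

The language model's own parameters are never updated in this sketch; only the hidden activations move, which matches the "without any further training of the LM" claim.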

218 citations

Journal Article•DOI•

Bringing portraits to life

Hadar Averbuch-Elor¹, Daniel Cohen-Or¹, Johannes Kopf², Michael F. Cohen²•Institutions (2)

Tel Aviv University¹, Facebook²

20 Nov 2017-ACM Transactions on Graphics

TL;DR: A technique that automatically animates a still portrait, making it possible for the subject in the photo to come to life and express various emotions; it gives rise to reactive profiles, where people in still images can automatically interact with their viewers.

Abstract: We present a technique to automatically animate a still portrait, making it possible for the subject in the photo to come to life and express various emotions. We use a driving video (of a different subject) and develop means to transfer the expressiveness of the subject in the driving video to the target portrait. In contrast to previous work that requires an input video of the target face to reenact a facial performance, our technique uses only a single target image. We animate the target image through 2D warps that imitate the facial transformations in the driving video. As warps alone do not carry the full expressiveness of the face, we add fine-scale dynamic details which are commonly associated with facial expressions such as creases and wrinkles. Furthermore, we hallucinate regions that are hidden in the input target face, most notably in the inner mouth. Our technique gives rise to reactive profiles, where people in still images can automatically interact with their viewers. We demonstrate our technique operating on numerous still portraits from the internet.

218 citations

Book Chapter•DOI•

Learning Visual Features from Large Weakly Supervised Data

Armand Joulin¹, Laurens van der Maaten¹, Allan Jabri¹, Nicolas Vasilache¹•Institutions (1)

Facebook¹

08 Oct 2016

TL;DR: In this paper, the authors explore the potential of leveraging massive, weakly-labeled image collections for learning good visual features, and train convolutional networks on a dataset of 100 million Flickr photos and comments.

Abstract: Convolutional networks trained on large supervised datasets produce visual features which form the basis for the state-of-the-art in many computer-vision problems. Further improvements of these visual features will likely require even larger manually labeled data sets, which severely limits the pace at which progress can be made. In this paper, we explore the potential of leveraging massive, weakly-labeled image collections for learning good visual features. We train convolutional networks on a dataset of 100 million Flickr photos and comments, and show that these networks produce features that perform well in a range of vision problems. We also show that the networks appropriately capture word similarity and learn correspondences between different languages.

218 citations

Proceedings Article•

Reducing Overfitting in Deep Networks by Decorrelating Representations

Michael Cogswell¹, Faruk Ahmed², Ross Girshick³, Larry Zitnick⁴, Dhruv Batra¹•Institutions (4)

Virginia Tech¹, Université de Montréal², Facebook³, Microsoft⁴

01 Jan 2016

TL;DR: DeCov is a regularizer that encourages diverse or non-redundant representations in deep neural networks by minimizing the cross-covariance of hidden activations, which leads to significantly reduced overfitting and better generalization.

Abstract: One major challenge in training Deep Neural Networks is preventing overfitting. Many techniques such as data augmentation and novel regularizers such as Dropout have been proposed to prevent overfitting without requiring a massive amount of training data. In this work, we propose a new regularizer called DeCov which leads to significantly reduced overfitting (as indicated by the difference between train and val performance), and better generalization. Our regularizer encourages diverse or non-redundant representations in Deep Neural Networks by minimizing the cross-covariance of hidden activations. This simple intuition has been explored in a number of past works but surprisingly has never been applied as a regularizer in supervised learning. Experiments across a range of datasets and network architectures show that this loss always reduces overfitting while almost always maintaining or increasing generalization performance and often improving performance over Dropout.
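One way to read "minimizing the cross-covariance of hidden activations" is as a penalty on the off-diagonal entries of the batch covariance matrix. The sketch below assumes the penalty is half the sum of squared off-diagonal covariances; this exact form, the function name `decov_penalty`, and the toy batches are assumptions for illustration and are not stated in the abstract.

```python
def decov_penalty(batch):
    """Half the sum of squared off-diagonal covariances between hidden units.

    batch: list of activation vectors, one per example (equal lengths).
    """
    n = len(batch)
    d = len(batch[0])
    means = [sum(row[i] for row in batch) / n for i in range(d)]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue  # variances on the diagonal are not penalized
            cov = sum(
                (row[i] - means[i]) * (row[j] - means[j]) for row in batch
            ) / n
            penalty += cov * cov
    return 0.5 * penalty


# Two hidden units that always agree are redundant and get penalized.
redundant = decov_penalty([[0.0, 0.0], [2.0, 2.0]])
# Two units that vary independently across the batch incur no penalty.
decorrelated = decov_penalty([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]])
```

In training, a term like this would be added to the supervised loss with a weighting coefficient, so that the network is nudged toward non-redundant hidden units.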

218 citations

Authors

Showing all 7875 results

Name	H-index	Papers	Citations
Yoshua Bengio	202	1033	420313
Xiang Zhang	154	1733	117576
Jitendra Malik	151	493	165087
Trevor Darrell	148	678	181113
Christopher D. Manning	138	499	147595
Robert W. Heath	128	1049	73171
Pieter Abbeel	126	589	70911
Yann LeCun	121	369	171211
Li Fei-Fei	120	420	145574
Jon Kleinberg	117	444	87865
Sergey Levine	115	652	59769
Richard Szeliski	113	359	72019
Sanjeev Kumar	113	1325	54386
Bruce Neal	108	561	87213
Larry S. Davis	107	693	49714

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

98% related

Microsoft

86.9K papers, 4.1M citations

96% related

Adobe Systems

8K papers, 214.7K citations

94% related

Carnegie Mellon University

104.3K papers, 5.9M citations

38.6K papers, 1.3M citations

90% related

Performance

Metrics

Papers: 10,939

Citations: 851,954

No. of papers from the Institution in previous years
Year	Papers
2024	1
2022	37
2021	1,738
2020	2,017
2019	1,607
2018	1,229