Institution

Facebook

Company · Tel Aviv, Israel
About: Facebook is a company based in Tel Aviv, Israel. It is known for its research contributions in the topics of Artificial neural network & Language model. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as: facebook.com & FB.


Papers
Posted Content
TL;DR: Self-supervised learning gives Vision Transformer (ViT) features properties that stand out compared to convolutional networks (convnets): beyond the fact that adapting self-supervised methods to this architecture works particularly well, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets.
Abstract: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 on ImageNet in linear evaluation with ViT-Base.

557 citations
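
The self-distillation recipe described in the abstract (a student network trained to match the centered, sharpened output of a momentum-updated teacher on different views of the same image) can be sketched in a few lines. The snippet below is only a minimal sketch of that idea; the tiny MLP backbone, the noise-based "crops", and the hyperparameters are placeholder assumptions, not the authors' ViT/multi-crop setup.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of self-distillation with no labels (DINO-style).
# The MLP backbone and hyperparameters are placeholders, not the paper's ViT setup.
student = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.GELU(), torch.nn.Linear(256, 64))
teacher = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.GELU(), torch.nn.Linear(256, 64))
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(student.parameters(), lr=1e-2)
center = torch.zeros(64)                      # running center of teacher outputs
temp_s, temp_t, momentum = 0.1, 0.04, 0.996   # temperatures and EMA rate

def dino_loss(student_out, teacher_out):
    # Cross-entropy between the student's output and the teacher's
    # centered, sharpened output (the teacher side is never backpropagated).
    t = F.softmax((teacher_out - center) / temp_t, dim=-1).detach()
    s = F.log_softmax(student_out / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

for step in range(100):
    x = torch.randn(32, 128)                  # stand-in for a batch of images
    view1 = x + 0.1 * torch.randn_like(x)     # stand-in for two augmented crops
    view2 = x + 0.1 * torch.randn_like(x)
    loss = dino_loss(student(view1), teacher(view2)) + dino_loss(student(view2), teacher(view1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        # Momentum encoder: the teacher is an exponential moving average of the student.
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(momentum).add_((1.0 - momentum) * ps)
        center = 0.9 * center + 0.1 * teacher(view1).mean(dim=0)
```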

Proceedings ArticleDOI
Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, Sebastiano Vigna
22 Jun 2012
TL;DR: The first world-scale social-network graph-distance computations, run on the entire Facebook network of active users, find an average distance of 4.74, corresponding to 3.74 intermediaries or "degrees of separation", prompting the title of this paper.
Abstract: Frigyes Karinthy, in his 1929 short story "Láncszemek" (in English, "Chains"), suggested that any two persons are distanced by at most six friendship links. Stanley Milgram in his famous experiments challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. Milgram found that the average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen. We report the results of the first world-scale social-network graph-distance computations, using the entire Facebook network of active users (≈ 721 million users, ≈ 69 billion friendship links). The average distance we observe is 4.74, corresponding to 3.74 intermediaries or "degrees of separation", prompting the title of this paper. More generally, we study the distance distribution of Facebook and of some interesting geographic subgraphs, looking also at their evolution over time. The networks we are able to explore are almost two orders of magnitude larger than those analysed in the previous literature. We report detailed statistical metadata showing that our measurements (which rely on probabilistic algorithms) are very accurate.

552 citations
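
The quantity reported above, the average pairwise distance in the friendship graph, can be illustrated with an exact per-source breadth-first search on a toy graph. This is only a sketch of the definition: at the scale of the full Facebook graph the paper relies on probabilistic counters (HyperANF) rather than exact traversals, and the small adjacency list below is invented for illustration.

```python
from collections import deque

# Toy illustration: average distance between distinct reachable pairs
# in an undirected friendship graph (adjacency list is made up).
friends = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
}

def bfs_distances(source):
    # Standard breadth-first search returning hop counts from `source`.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in friends[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

pair_dists = [d for s in friends for t, d in bfs_distances(s).items() if t != s]
average = sum(pair_dists) / len(pair_dists)
print(f"average distance {average:.2f}, i.e. {average - 1:.2f} intermediaries")
```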

Proceedings Article
Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann N. Dauphin, Nicolas Usunier
06 Aug 2017
TL;DR: Parseval networks, as discussed by the authors, are a form of deep neural network in which the Lipschitz constant of linear, convolutional, and aggregation layers is constrained to be smaller than 1.
Abstract: We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is to maintain weight matrices of linear and convolutional layers to be (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN), while being more robust than their vanilla counterpart against adversarial examples. Incidentally, Parseval networks also tend to train faster and make a better usage of the full capacity of the networks.

552 citations
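
A minimal sketch of the constraint the abstract describes: after each gradient step, the weight matrix can be pulled back toward a Parseval tight frame with the retraction W <- (1 + beta) * W - beta * W W^T W, so that W W^T stays close to the identity and the layer's spectral norm, hence its Lipschitz constant, stays close to 1. The value of beta and the standalone loop below are illustrative assumptions, not the paper's exact training procedure (which interleaves this step with SGD on the task loss).

```python
import torch

# Parseval tightness step (sketch): the retraction
#     W <- (1 + beta) * W - beta * W @ W.T @ W
# pulls a weight matrix toward a Parseval tight frame (W @ W.T ~ identity).
# beta and the standalone loop are illustrative choices only.
def parseval_retraction(W, beta=0.1):
    return (1 + beta) * W - beta * (W @ W.t() @ W)

W = 0.1 * torch.randn(32, 64)   # stand-in for a linear layer's weight matrix
for _ in range(200):
    # ...an SGD step on the task loss would normally go here...
    W = parseval_retraction(W)

print(torch.allclose(W @ W.t(), torch.eye(32), atol=1e-3))  # True: rows ~ orthonormal
```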

Posted Content
TL;DR: This paper proposes a set of proxy tasks that evaluate reading comprehension via question answering, measuring skills such as chaining facts, simple induction, deduction and many more; the tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human.
Abstract: One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent. To measure progress towards that goal, we argue for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering. Our tasks measure understanding in several ways: whether a system is able to answer questions via chaining facts, simple induction, deduction and many more. The tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human. We believe many existing learning systems can currently not solve them, and hence our aim is to classify these tasks into skill sets, so that researchers can identify (and then rectify) the failings of their systems. We also extend and improve the recently introduced Memory Networks model, and show it is able to solve some, but not all, of the tasks.

545 citations
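
As an illustration of the fact-chaining skill mentioned above, a task instance pairs a short story with a question whose answer requires combining two supporting facts. The example below is only indicative of the format; the benchmark generates its own stories.

```python
# Indicative fact-chaining example in the spirit of the tasks described above
# (the benchmark generates its own stories; this one is made up).
story = [
    "John picked up the apple.",
    "John went to the kitchen.",
    "John dropped the apple.",
]
question = "Where is the apple?"
answer = "kitchen"   # requires chaining two facts: apple -> John -> kitchen
```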

Posted Content
TL;DR: This work proposes a novel sequence-level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.
Abstract: Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We address this issue by proposing a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE. On three different tasks, our approach outperforms several strong baselines for greedy generation. The method is also competitive when these baselines employ beam search, while being several times faster.

539 citations
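
A minimal sketch of the sequence-level idea in the abstract: sample a whole output sequence from the model, score it with a sequence metric, and weight the sampled sequence's log-likelihood by that reward (a REINFORCE-style estimator). The toy GRU "model" and the unigram-overlap reward standing in for BLEU/ROUGE below are assumptions for illustration, not the paper's algorithm in detail.

```python
import torch
import torch.nn.functional as F

# Sequence-level training sketch: REINFORCE on a sampled sequence,
# rewarded by a stand-in for the test-time metric.
vocab, hidden, max_len = 50, 32, 5
cell = torch.nn.GRUCell(vocab, hidden)
readout = torch.nn.Linear(hidden, vocab)
opt = torch.optim.Adam(list(cell.parameters()) + list(readout.parameters()), lr=1e-3)

def reward_fn(pred, ref):
    # Stand-in for BLEU/ROUGE: fraction of predicted tokens found in the reference.
    return sum(1.0 for t in pred if t in ref) / len(pred)

reference = [3, 7, 7, 1, 4]
for step in range(200):
    h = torch.zeros(1, hidden)
    inp = torch.zeros(1, vocab)
    log_probs, sampled = [], []
    for _ in range(max_len):
        h = cell(inp, h)
        dist = torch.distributions.Categorical(logits=readout(h))
        token = dist.sample()                          # sample the next token
        log_probs.append(dist.log_prob(token))
        sampled.append(token.item())
        inp = F.one_hot(token, vocab).float()          # feed the sample back in
    reward = reward_fn(sampled, reference)
    loss = -reward * torch.stack(log_probs).sum()      # REINFORCE: raise p(high-reward sequences)
    opt.zero_grad()
    loss.backward()
    opt.step()
```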


Authors

Showing all 7875 results

Name | H-index | Papers | Citations
Yoshua Bengio | 202 | 1033 | 420313
Xiang Zhang | 154 | 1733 | 117576
Jitendra Malik | 151 | 493 | 165087
Trevor Darrell | 148 | 678 | 181113
Christopher D. Manning | 138 | 499 | 147595
Robert W. Heath | 128 | 1049 | 73171
Pieter Abbeel | 126 | 589 | 70911
Yann LeCun | 121 | 369 | 171211
Li Fei-Fei | 120 | 420 | 145574
Jon Kleinberg | 117 | 444 | 87865
Sergey Levine | 115 | 652 | 59769
Richard Szeliski | 113 | 359 | 72019
Sanjeev Kumar | 113 | 1325 | 54386
Bruce Neal | 108 | 561 | 87213
Larry S. Davis | 107 | 693 | 49714
Network Information
Related Institutions (5)
Google: 39.8K papers, 2.1M citations (98% related)
Microsoft: 86.9K papers, 4.1M citations (96% related)
Adobe Systems: 8K papers, 214.7K citations (94% related)
Carnegie Mellon University: 104.3K papers, 5.9M citations (91% related)

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2024 | 1
2022 | 37
2021 | 1,738
2020 | 2,017
2019 | 1,607
2018 | 1,229