Institution

Facebook

Company•Tel Aviv, Israel•

About: Facebook is a company organization based out in Tel Aviv, Israel. It is known for research contribution in the topics: Artificial neural network & Language model. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as: facebook.com & FB.

...read moreread less

Topics: Artificial neural network, Language model, Reinforcement learning, Machine translation, Social network ...read more

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Proceedings Article•

Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry

[...]

Maximilian Nickel¹, Douwe Kiela¹•Institutions (1)

Facebook¹

03 Jul 2018

TL;DR: In this article, the Lorentz embedding model is used to discover hierarchical relationships from large-scale unstructured similarity scores, which can reveal important aspects of a company's organizational structure as well as reveal historical relationships between language families.

...read moreread less

Abstract: We are concerned with the discovery of hierarchical relationships from large-scale unstructured similarity scores. For this purpose, we study different models of hyperbolic space and find that learning embeddings in the Lorentz model is substantially more efficient than in the Poincare-ball model. We show that the proposed approach allows us to learn high-quality embeddings of large taxonomies which yield improvements over Poincare embeddings, especially in low dimensions. Lastly, we apply our model to discover hierarchies in two real-world datasets: we show that an embedding in hyperbolic space can reveal important aspects of a company's organizational structure as well as reveal historical relationships between language families.

...read moreread less

145 citations

Proceedings Article•

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

[...]

Hattie Zhou, Janice Lan¹, Rosanne Liu², Jason Yosinski²•Institutions (2)

Facebook¹, Uber ²

17 May 2019

TL;DR: The Lottery Ticket Hypothesis as mentioned in this paper showed that a simple approach to create sparse networks (keep the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights.

...read moreread less

Abstract: The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keep the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds the performance of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied significantly without impacting the overall results. Ablating these factors leads to new insights for why LT networks perform as well as they do. We show why setting weights to zero is important, how signs are all you need to make the re-initialized network train, and why masking behaves like training. Finally, we discover the existence of Supermasks, or masks that can be applied to an untrained, randomly initialized network to produce a model with performance far better than chance (86% on MNIST, 41% on CIFAR-10).

...read moreread less

145 citations

Proceedings Article•

An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis

[...]

Yuandong Tian¹•Institutions (1)

Facebook¹

17 Jul 2017

TL;DR: In this paper, the authors explore theoretical properties of training a two-layered ReLU network with centered spherical Gaussian input, and show that its population gradient has an analytical formula, leading to interesting theoretical analysis of critical points and convergence behaviors.

...read moreread less

Abstract: In this paper, we explore theoretical properties of training a two-layered ReLU network $g(\mathbf{x}; \mathbf{w}) = \sum_{j=1}^K \sigma(\mathbf{w}_j^T\mathbf{x})$ with centered $d$-dimensional spherical Gaussian input $\mathbf{x}$ ($\sigma$=ReLU). We train our network with gradient descent on $\mathbf{w}$ to mimic the output of a teacher network with the same architecture and fixed parameters $\mathbf{w}^*$. We show that its population gradient has an analytical formula, leading to interesting theoretical analysis of critical points and convergence behaviors. First, we prove that critical points outside the hyperplane spanned by the teacher parameters ("out-of-plane") are not isolated and form manifolds, and characterize in-plane critical-point-free regions for two ReLU case. On the other hand, convergence to $\mathbf{w}^*$ for one ReLU node is guaranteed with at least $(1-\epsilon)/2$ probability, if weights are initialized randomly with standard deviation upper-bounded by $O(\epsilon/\sqrt{d})$, consistent with empirical practice. For network with many ReLU nodes, we prove that an infinitesimal perturbation of weight initialization results in convergence towards $\mathbf{w}^*$ (or its permutation), a phenomenon known as spontaneous symmetric-breaking (SSB) in physics. We assume no independence of ReLU activations. Simulation verifies our findings.

...read moreread less

145 citations

Patent•

Dynamic identification of other users to an online user

[...]

Barry Appelman¹, Terry Christian Buonviri¹, Andrew Ivar Erickson¹, Thomas Jarmolowski¹, Robert Eugene Weltman¹ - Show less +1 more•Institutions (1)

Facebook¹

18 Nov 2003

TL;DR: In this paper, a user is informed dynamically of other users based on the stored trait information, such as, for example, an age or other demographic identifier, or information indicative of an expertise, interest, preference, user type and/or other quality of the user or of the other individual.

...read moreread less

Abstract: Informing a user of a large scale network dynamically of other network users includes determining dynamically an online context of the user. Other users presently within the online context of the user are identified and trait information is stored that is related essentially only to the user or to the other users in a users store associated with the online context. The user is informed dynamically of the other users based on the stored trait information, such as, for example, an age or other demographic identifier, or information indicative of an expertise, interest, preference, user type and/or other quality of the user or of the other individual.

...read moreread less

145 citations

Proceedings Article•

Abductive Commonsense Reasoning

[...]

Chandra Bhagavatula¹, Ronan Le Bras¹, Chaitanya Malaviya¹, Keisuke Sakaguchi¹, Ari Holtzman¹, Hannah Rashkin¹, Doug Downey¹, Wen-tau Yih², Yejin Choi¹ - Show less +5 more•Institutions (2)

Allen Institute for Artificial Intelligence¹, Facebook²

30 Apr 2020

TL;DR: For example, the authors investigate the feasibility of abductive reasoning in natural language inference and generation and show that the best model achieves 68.9% accuracy, well below human performance of 91.4%.

...read moreread less

Abstract: Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks – (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform—despite their strong performance on the related but more narrowly defined task of entailment NLI—pointing to interesting avenues for future research.

...read moreread less

145 citations

Collapse

Authors

Showing all 7875 results

Name	H-index	Papers	Citations
Yoshua Bengio	202	1033	420313
Xiang Zhang	154	1733	117576
Jitendra Malik	151	493	165087
Trevor Darrell	148	678	181113
Christopher D. Manning	138	499	147595
Robert W. Heath	128	1049	73171
Pieter Abbeel	126	589	70911
Yann LeCun	121	369	171211
Li Fei-Fei	120	420	145574
Jon Kleinberg	117	444	87865
Sergey Levine	115	652	59769
Richard Szeliski	113	359	72019
Sanjeev Kumar	113	1325	54386
Bruce Neal	108	561	87213
Larry S. Davis	107	693	49714

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

98% related

Microsoft

86.9K papers, 4.1M citations

96% related

Adobe Systems

8K papers, 214.7K citations

94% related

Carnegie Mellon University

104.3K papers, 5.9M citations

38.6K papers, 1.3M citations

90% related

Performance

Metrics

10,939

Papers

851,954

Citations

No. of papers from the Institution in previous years
Year	Papers
2024	1
2022	37
2021	1,738
2020	2,017
2019	1,607
2018	1,229