Institution
Company • Tel Aviv, Israel
About: Facebook is a company based in Tel Aviv, Israel. It is known for its research contributions in the topics of Artificial neural network and Language model. The organization has 7,856 authors who have published 10,906 publications receiving 570,123 citations. The organization is also known as facebook.com and FB.
Topics: Artificial neural network, Language model, Reinforcement learning, Machine translation, Social network
Papers published on a yearly basis
Papers
TL;DR: It is shown that certain attention heads correspond well to linguistic notions of syntax and coreference, and an attention-based probing classifier is proposed and used to demonstrate that substantial syntactic information is captured in BERT’s attention.
Abstract: Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention.
701 citations
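The head-level inspection described in this abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released analysis code: it assumes the Hugging Face `transformers` library, an arbitrary example sentence, and arbitrary layer/head indices, and simply prints where each token's attention peaks for one head.

```python
# Minimal sketch: extract per-head attention maps from a pre-trained BERT
# using the Hugging Face transformers library (not the paper's own code).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "The keys to the cabinet are on the table."  # arbitrary example sentence
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
attentions = outputs.attentions
layer, head = 7, 9  # arbitrary indices, not the specific heads analyzed in the paper

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
head_map = attentions[layer][0, head]           # (seq_len, seq_len)
for i, tok in enumerate(tokens):
    j = int(head_map[i].argmax())               # token this position attends to most
    print(f"{tok:>10} -> {tokens[j]}")
```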
TL;DR: This paper uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity; the resulting model captures information about biological properties in its representations.
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model contains information about biological properties in its representations. The representations are learned from sequence data alone. The learned representation space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections. Representation learning produces features that generalize across a range of applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and improving state-of-the-art features for long-range contact prediction.
700 citations
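As a rough illustration of the training objective this line of work scales up, the sketch below runs masked-language-model training over amino-acid tokens with a tiny PyTorch Transformer encoder. The model size, example sequence, and masking rate are placeholder assumptions with no relation to the authors' 250-million-sequence setup; the point is only how contextual per-residue representations fall out of the masked-prediction objective.

```python
# Minimal sketch of masked language modeling over amino-acid tokens.
# Tiny illustrative model only; not the authors' architecture or scale.
import torch
import torch.nn as nn

torch.manual_seed(0)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = 21                      # index reserved for the mask token
vocab_size = 22

class TinyProteinLM(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))   # contextual per-residue representations
        return self.lm_head(h), h

def encode(seq):
    return torch.tensor([AMINO_ACIDS.index(a) for a in seq])

model = TinyProteinLM()
tokens = encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").unsqueeze(0)  # arbitrary sequence

# Mask ~15% of residues and train the model to recover them from context.
mask = torch.rand(tokens.shape) < 0.15
mask[0, 0] = True                              # ensure at least one masked position
corrupted = tokens.masked_fill(mask, MASK)

logits, reps = model(corrupted)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # predict masked residues
loss.backward()
print(loss.item(), reps.shape)                 # reps: (1, seq_len, d_model)
```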
TL;DR: This work proposes to view attribute-based image classification as a label-embedding problem in which each class is embedded in the space of attribute vectors, and introduces a function that measures the compatibility between an image and a label embedding.
Abstract: Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function that measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. Label embedding enjoys a built-in ability to leverage alternative sources of information instead of or in addition to attributes, such as class hierarchies or textual descriptions. Moreover, label embedding encompasses the whole range of learning settings, from zero-shot learning to regular learning with a large number of labeled examples.
699 citations
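A compact way to see the compatibility idea is as a bilinear score F(x, y) = theta(x)^T W phi(y) between an image feature theta(x) and a class attribute vector phi(y), trained so the correct class outscores the others. The sketch below is a minimal illustration with random data and made-up dimensions, not the paper's exact objective or optimizer.

```python
# Minimal sketch of a bilinear compatibility function with a ranking (hinge) loss.
# Dimensions and data are placeholders for illustration only.
import torch

torch.manual_seed(0)
n_classes, n_attrs, img_dim = 10, 85, 512

W = torch.randn(img_dim, n_attrs, requires_grad=True)  # learned compatibility matrix
phi = torch.rand(n_classes, n_attrs)                    # per-class attribute vectors (given)

def compatibility(theta_x, W, phi):
    # Score one image feature against every class embedding.
    return theta_x @ W @ phi.T                          # shape: (n_classes,)

theta_x = torch.randn(img_dim)                          # image feature, e.g. a CNN output
y_true = 3                                              # index of the correct class

scores = compatibility(theta_x, W, phi)
# Ranking loss: the correct class should outscore every other class by a margin.
margin = 1.0
violations = torch.clamp(margin + scores - scores[y_true], min=0.0)
keep = torch.ones(n_classes)
keep[y_true] = 0.0                                      # do not penalize the true class
loss = (violations * keep).sum()
loss.backward()
print(loss.item(), W.grad.shape)
```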
01 Jun 2017
TL;DR: In this paper, Gradient Episodic Memory (GEM) is proposed for continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks.
Abstract: One major obstacle towards AI is the poor ability of models to solve new problems quickly without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM), that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state of the art.
696 citations
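The mechanism behind GEM can be illustrated by its gradient constraint: a proposed update should not increase the loss on examples held in episodic memory, which amounts to requiring a non-negative inner product between the current gradient and the memory gradients. The sketch below uses a single reference gradient and a simple projection (closer to the later A-GEM simplification than to GEM's full quadratic program over all past tasks), with random vectors standing in for real gradients.

```python
# Minimal sketch of gradient projection against an episodic-memory gradient.
# Simplified single-constraint version; GEM itself solves a small QP over all past tasks.
import torch

def project_gradient(g, g_mem):
    """Project g so it does not conflict with the memory gradient g_mem."""
    dot = torch.dot(g, g_mem)
    if dot < 0:                                   # update would increase past-task loss
        g = g - (dot / torch.dot(g_mem, g_mem)) * g_mem
    return g

torch.manual_seed(0)
g_current = torch.randn(1000)   # flattened gradient on the current task's batch (placeholder)
g_memory = torch.randn(1000)    # flattened gradient on a batch drawn from episodic memory

g_safe = project_gradient(g_current, g_memory)
print(torch.dot(g_safe, g_memory).item() >= -1e-5)   # constraint now satisfied
```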
10 Apr 2020
TL;DR: In this paper, dense representations for passage retrieval are learned from a small number of questions and passages by a simple dual-encoder framework, which greatly outperforms a strong Lucene-BM25 system.
Abstract: Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever greatly outperforms a strong Lucene-BM25 system, by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish a new state of the art on multiple open-domain QA benchmarks.
695 citations
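The dual-encoder setup in this abstract can be sketched directly: two BERT encoders map questions and passages to [CLS] vectors, dot products give relevance scores, and in-batch negatives supply the contrastive training signal. The snippet below assumes the Hugging Face `transformers` library and toy question/passage pairs; it illustrates the architecture, not the released DPR checkpoints or training recipe.

```python
# Minimal sketch of a dual-encoder retriever with in-batch negatives.
# Toy data and generic BERT weights; not the released DPR models.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
q_encoder = BertModel.from_pretrained("bert-base-uncased")
p_encoder = BertModel.from_pretrained("bert-base-uncased")

questions = ["who wrote the origin of species", "when was the eiffel tower built"]
passages = ["On the Origin of Species was written by Charles Darwin.",
            "The Eiffel Tower was completed in 1889."]

q_inputs = tokenizer(questions, padding=True, return_tensors="pt")
p_inputs = tokenizer(passages, padding=True, return_tensors="pt")

q_vecs = q_encoder(**q_inputs).last_hidden_state[:, 0]   # [CLS] embedding per question
p_vecs = p_encoder(**p_inputs).last_hidden_state[:, 0]   # [CLS] embedding per passage

# Similarity matrix: each question scored against every passage in the batch.
scores = q_vecs @ p_vecs.T                                # (num_questions, num_passages)

# In-batch negatives: passage i is the positive for question i, the rest are negatives.
labels = torch.arange(len(questions))
loss = torch.nn.functional.cross_entropy(scores, labels)
print(scores.shape, loss.item())
```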
Authors
Showing all 7875 results
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Xiang Zhang | 154 | 1733 | 117576 |
Jitendra Malik | 151 | 493 | 165087 |
Trevor Darrell | 148 | 678 | 181113 |
Christopher D. Manning | 138 | 499 | 147595 |
Robert W. Heath | 128 | 1049 | 73171 |
Pieter Abbeel | 126 | 589 | 70911 |
Yann LeCun | 121 | 369 | 171211 |
Li Fei-Fei | 120 | 420 | 145574 |
Jon Kleinberg | 117 | 444 | 87865 |
Sergey Levine | 115 | 652 | 59769 |
Richard Szeliski | 113 | 359 | 72019 |
Sanjeev Kumar | 113 | 1325 | 54386 |
Bruce Neal | 108 | 561 | 87213 |
Larry S. Davis | 107 | 693 | 49714 |