Institution
Company • Tel Aviv, Israel
About: Facebook is a company based in Tel Aviv, Israel. It is known for its research contributions in Computer science and Artificial neural networks. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as facebook.com and FB.
Topics: Computer science, Artificial neural network, Language model, Context (language use), Reinforcement learning
Papers published on a yearly basis
Papers
06 Jun 2021. TL;DR: This paper proposes the Hidden-Unit BERT (HuBERT) model, which uses a cheap k-means clustering step to provide aligned target labels for pre-training a BERT model.
Abstract: Compared to vision and language applications, self-supervised pre-training approaches for ASR are challenged by three unique problems: (1) There are multiple sound units in each input utterance, (2) With audio-only pre-training, there is no lexicon of sound units, and (3) Sound units have variable lengths with no explicit segmentation. In this paper, we propose the Hidden-Unit BERT (HuBERT) model which utilizes a cheap k-means clustering step to provide aligned target labels for pre-training of a BERT model. A key ingredient of our approach is applying the predictive loss over the masked regions only. This allows the pre-training stage to benefit from the consistency of the unsupervised teacher rather than its intrinsic quality. Starting with a simple k-means teacher with 100 clusters, and using two iterations of clustering, the HuBERT model matches the state-of-the-art wav2vec 2.0 performance on the ultra low-resource Libri-light 10h, 1h, 10min supervised subsets.
106 citations
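The HuBERT abstract above hinges on two steps: discrete pseudo-labels from a cheap k-means teacher, and a predictive loss computed over masked frames only. A minimal NumPy sketch of those two steps, using toy random features and a nearest-centroid stand-in for a converged k-means (all data, sizes, and the 8-cluster count here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "acoustic features": 200 frames of 13-dim MFCC-like vectors (hypothetical data).
frames = rng.normal(size=(200, 13))

# Step 1: a cheap k-means teacher. Here we just assign each frame to its
# nearest centroid; a real system would run k-means to convergence
# (the paper starts from 100 clusters).
centroids = rng.normal(size=(8, 13))
dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
pseudo_labels = dists.argmin(axis=1)   # one discrete unit per frame

# Step 2: mask a subset of frames; the predictive loss is computed
# over the masked positions only.
mask = rng.random(len(frames)) < 0.5

# Stand-in "model" predictions: logits over the 8 cluster units.
logits = rng.normal(size=(len(frames), 8))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Cross-entropy restricted to masked frames.
masked_loss = -log_probs[mask, pseudo_labels[mask]].mean()
print(float(masked_loss))
```

In the real model the logits come from a transformer over the masked audio; restricting the loss to masked positions is what lets a noisy teacher still supply a useful training signal.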
22 Nov 2005. TL;DR: In this article, a computer-implemented method for ranking files from an Internet search is provided, which comprises assigning a score to each file based on at least one of the following factors: recency, editorial popularity, clickthru popularity, favorites metadata, or collaborative filtering.
Abstract: A computer-implemented method is provided for ranking files from an Internet search. In one embodiment, the method comprises assigning a score to each file based on at least one of the following factors: recency, editorial popularity, clickthru popularity, favorites metadata, or favorites collaborative filtering. The files may be organized based on the assigned scores to provide users with more accurate search results.
106 citations
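The patent abstract lists ranking factors but not how they are combined. A hypothetical weighted-sum combination (the weights, field names, and example files below are all invented for illustration) could look like:

```python
from dataclasses import dataclass

# Hypothetical weights over the patent's factors; the abstract does not
# specify how the factors are aggregated into one score.
WEIGHTS = {"recency": 0.3, "editorial": 0.2, "clickthru": 0.3, "favorites": 0.2}

@dataclass
class FileRecord:
    name: str
    recency: float     # e.g. 1.0 = newest, 0.0 = oldest
    editorial: float   # editorial popularity, normalized to [0, 1]
    clickthru: float   # clickthru popularity, normalized to [0, 1]
    favorites: float   # favorites signal, normalized to [0, 1]

def score(f: FileRecord) -> float:
    """Weighted sum of the normalized factors."""
    return sum(WEIGHTS[k] * getattr(f, k) for k in WEIGHTS)

files = [
    FileRecord("a.html", recency=0.9, editorial=0.1, clickthru=0.4, favorites=0.2),
    FileRecord("b.html", recency=0.2, editorial=0.8, clickthru=0.9, favorites=0.7),
]
ranked = sorted(files, key=score, reverse=True)
print([f.name for f in ranked])   # → ['b.html', 'a.html']
```

A newer file can still lose to an older one whose popularity signals dominate, which matches the abstract's goal of organizing files by combined score rather than any single factor.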
01 Oct 2019. TL;DR: The nocaps benchmark, as discussed by the authors, is a large-scale benchmark for novel object captioning, consisting of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets.
Abstract: Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed ‘nocaps’, for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets. The associated training data consists of COCO image-caption pairs, plus Open Images image-level labels and object bounding boxes. Since Open Images contains many more classes than COCO, nearly 400 object classes seen in test images have no or very few associated training captions (hence, nocaps). We extend existing novel object captioning models to establish strong baselines for this benchmark and provide analysis to guide future work.
105 citations
24 May 2018. TL;DR: A joint multilingual sentence embedding is learned and used to filter noisy parallel data, to mine for parallel data in large news collections, and to identify parallel sentences in comparable corpora on the BUCC shared task.
Abstract: We learn a joint multilingual sentence embedding and use the distance between sentences in different languages to filter noisy parallel data and to mine for parallel data in large news collections. We are able to improve a competitive baseline on the WMT’14 English to German task by 0.3 BLEU by filtering out 25% of the training data. The same approach is used to mine additional bitexts for the WMT’14 system and to obtain competitive results on the BUCC shared task to identify parallel sentences in comparable corpora. The approach is generic, it can be applied to many language pairs and it is independent of the architecture of the machine translation system.
105 citations
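The filtering idea in the abstract above, keeping sentence pairs whose embeddings are close in a shared multilingual space, can be sketched with toy vectors. The embeddings, the 0.8 threshold, and the deliberately misaligned pair are all illustrative; a real system would use a trained multilingual encoder and a tuned (often margin-based) score:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sentence embeddings in a shared space: 5 source sentences and
# their candidate "translations" (hypothetical vectors, not real encodings).
src = rng.normal(size=(5, 4))
tgt = 2.0 * src        # scaled copies: cosine similarity exactly 1.0
tgt[3] = -src[3]       # one deliberately misaligned pair (cosine -1.0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Keep only pairs whose similarity clears a threshold (value is illustrative).
keep = [i for i in range(len(src)) if cosine(src[i], tgt[i]) > 0.8]
print(keep)   # → [0, 1, 2, 4]
```

Because the similarity is language-pair-agnostic, the same filter can score English-German pairs for WMT cleaning and mine comparable corpora for BUCC without changes.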
23 Apr 2018. TL;DR: This work designs MyNVM, a key-value store that leverages an NVM block device to reduce DRAM usage and total cost of ownership while providing latency and queries-per-second comparable to MyRocks on a server with a much larger amount of DRAM.
Abstract: Popular SSD-based key-value stores consume a large amount of DRAM in order to provide high-performance database operations. However, DRAM can be expensive for data center providers, especially given recent global supply shortages that have resulted in increasing DRAM costs. In this work, we design a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage, and to reduce the total cost of ownership, while providing comparable latency and queries-per-second (QPS) as MyRocks on a server with a much larger amount of DRAM. Replacing DRAM with NVM introduces several challenges. In particular, NVM has limited read bandwidth, and it wears out quickly under a high write bandwidth. We design novel solutions to these challenges, including using small block sizes with a partitioned index, aligning blocks post-compression to reduce read bandwidth, utilizing dictionary compression, implementing an admission control policy for which objects get cached in NVM to control its durability, as well as replacing interrupts with a hybrid polling mechanism. We implemented MyNVM and measured its performance in Facebook's production environment. Our implementation reduces the size of the DRAM cache from 96 GB to 16 GB, and incurs a negligible impact on latency and queries-per-second compared to MyRocks. Finally, to the best of our knowledge, this is the first study on the usage of NVM devices in a commercial data center environment.
105 citations
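One of the challenges the MyNVM abstract mentions is an admission control policy for the NVM cache, since NVM wears out under heavy writes. A simplistic doorkeeper-style sketch of that idea (not the paper's actual policy; the names and the second-access rule are illustrative) admits an object only on its second access, so one-hit objects never consume write endurance:

```python
# Doorkeeper-style admission control sketch: first access only records the
# key; the second access admits the value to the wear-limited NVM cache.
seen: set[str] = set()
nvm_cache: dict[str, bytes] = {}

def access(key: str, value: bytes) -> bytes:
    if key in nvm_cache:      # NVM hit: served without a new write
        return nvm_cache[key]
    if key in seen:           # second access: worth an NVM write, admit it
        nvm_cache[key] = value
    else:                     # first access: remember the key, skip the write
        seen.add(key)
    return value

for k in ["a", "b", "a", "c", "a"]:
    access(k, k.encode())
print(sorted(nvm_cache))   # → ['a']  (only "a" was accessed twice)
```

The same gatekeeping intent, spending scarce NVM write bandwidth only on objects likely to be re-read, is what the paper's admission policy serves, alongside its partitioned index and dictionary compression.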
Authors
Showing all 7875 results
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Xiang Zhang | 154 | 1733 | 117576 |
Jitendra Malik | 151 | 493 | 165087 |
Trevor Darrell | 148 | 678 | 181113 |
Christopher D. Manning | 138 | 499 | 147595 |
Robert W. Heath | 128 | 1049 | 73171 |
Pieter Abbeel | 126 | 589 | 70911 |
Yann LeCun | 121 | 369 | 171211 |
Li Fei-Fei | 120 | 420 | 145574 |
Jon Kleinberg | 117 | 444 | 87865 |
Sergey Levine | 115 | 652 | 59769 |
Richard Szeliski | 113 | 359 | 72019 |
Sanjeev Kumar | 113 | 1325 | 54386 |
Bruce Neal | 108 | 561 | 87213 |
Larry S. Davis | 107 | 693 | 49714 |