Institution

Facebook

Company•Tel Aviv, Israel•

About: Facebook is a company organization based out in Tel Aviv, Israel. It is known for research contribution in the topics: Artificial neural network & Language model. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as: facebook.com & FB.

...read moreread less

Topics: Artificial neural network, Language model, Reinforcement learning, Machine translation, Social network ...read more

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Proceedings Article•

RUBi: Reducing Unimodal Biases for Visual Question Answering

[...]

Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh¹, Devi Parikh² - Show less +2 more•Institutions (2)

Facebook¹, Georgia Institute of Technology²

08 Dec 2019

TL;DR: RUBi, a new learning strategy to reduce biases in any VQA model, is proposed, which reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image.

...read moreread less

Abstract: Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This leads to dynamically adjusting the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than what was seen during training.

...read moreread less

226 citations

Posted Content•

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

[...]

Humam Alwassel¹, Dhruv Mahajan², Bruno Korbar², Lorenzo Torresani², Bernard Ghanem¹, Du Tran² - Show less +2 more•Institutions (2)

King Abdullah University of Science and Technology¹, Facebook²

28 Nov 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: XDC as discussed by the authors leverages unsupervised clustering in one modality (e.g., audio) as a supervisory signal for the other modality, which helps XDC utilize the semantic correlation and the differences between the two modalities.

...read moreread less

Abstract: Visual and audio modalities are highly correlated, yet they contain different information. Their strong correlation makes it possible to predict the semantics of one from the other with good accuracy. Their intrinsic differences make cross-modal prediction a potentially more rewarding pretext task for self-supervised learning of video and audio representations compared to within-modality learning. Based on this intuition, we propose Cross-Modal Deep Clustering (XDC), a novel self-supervised method that leverages unsupervised clustering in one modality (e.g., audio) as a supervisory signal for the other modality (e.g., video). This cross-modal supervision helps XDC utilize the semantic correlation and the differences between the two modalities. Our experiments show that XDC outperforms single-modality clustering and other multi-modal variants. XDC achieves state-of-the-art accuracy among self-supervised methods on multiple video and audio benchmarks. Most importantly, our video model pretrained on large-scale unlabeled data significantly outperforms the same model pretrained with full-supervision on ImageNet and Kinetics for action recognition on HMDB51 and UCF101. To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.

...read moreread less

225 citations

Proceedings Article•DOI•

An analysis of Facebook photo caching

[...]

Qi Huang¹, Kenneth P. Birman¹, Robbert van Renesse¹, Wyatt Lloyd², Sanjeev Kumar³, Harry C. Li³ - Show less +2 more•Institutions (3)

Cornell University¹, Princeton University², Facebook³

03 Nov 2013

TL;DR: This paper instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses.

...read moreread less

Abstract: This paper examines the workload of Facebook's photo-serving stack and the effectiveness of the many layers of caching it employs Facebook's image-management infrastructure is complex and geographically distributed It includes browser caches on end-user systems, Edge Caches at ~20 PoPs, an Origin Cache, and for some kinds of images, additional caching via Akamai The underlying image storage layer is widely distributed, and includes multiple data centersWe instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos This permits us to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses Our results (1) quantify the overall traffic percentages served by different layers: 655% browser cache, 200% Edge Cache, 46% Origin Cache, and 99% Backend storage, (2) reveal that a significant portion of photo requests are routed to remote PoPs and data centers as a consequence both of load-balancing and peering policy, (3) demonstrate the potential performance benefits of coordinating Edge Caches and adopting S4LRU eviction algorithms at both Edge and Origin layers, and (4) show that the popularity of photos is highly dependent on content age and conditionally dependent on the social-networking metrics we considered

...read moreread less

225 citations

Book Chapter•DOI•

SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation

[...]

Chenfeng Xu¹, Bichen Wu², Zining Wang¹, Wei Zhan¹, Peter Vajda², Kurt Keutzer¹, Masayoshi Tomizuka¹ - Show less +3 more•Institutions (2)

University of California, Berkeley¹, Facebook²

23 Aug 2020

TL;DR: Li et al. as mentioned in this paper proposed Spatially-Adaptive Convolution (SAC) to adopt different filters for different locations according to the input image, which can be implemented as a series of element-wise multiplications, im2col, and standard convolution.

...read moreread less

Abstract: LiDAR point-cloud segmentation is an important problem for many applications. For large-scale point cloud segmentation, the de facto method is to project a 3D point cloud to get a 2D LiDAR image and use convolutions to process it. Despite the similarity between regular RGB and LiDAR images, we are the first to discover that the feature distribution of LiDAR images changes drastically at different image locations. Using standard convolutions to process such LiDAR images is problematic, as convolution filters pick up local features that are only active in specific regions in the image. As a result, the capacity of the network is under-utilized and the segmentation performance decreases. To fix this, we propose Spatially-Adaptive Convolution (SAC) to adopt different filters for different locations according to the input image. SAC can be computed efficiently since it can be implemented as a series of element-wise multiplications, im2col, and standard convolution. It is a general framework such that several previous methods can be seen as special cases of SAC. Using SAC, we build SqueezeSegV3 for LiDAR point-cloud segmentation and outperform all previous published methods by at least 2.0% mIoU on the SemanticKITTI benchmark. Code and pretrained model are available at https://github.com/chenfengxu714/SqueezeSegV3.

...read moreread less

224 citations

Proceedings Article•DOI•

Transformer-Based Acoustic Modeling for Hybrid Speech Recognition

[...]

Yongqiang Wang¹, Abdelrahman Mohamed¹, Due Le¹, Chunxi Liu¹, Alex Xiao¹, Jay Mahadeokar¹, Hongzhao Huang¹, Andros Tjandra², Xiaohui Zhang¹, Frank Zhang¹, Christian Fuegen¹, Geoffrey Zweig¹, Michael L. Seltzer¹ - Show less +9 more•Institutions (2)

Facebook¹, Nara Institute of Science and Technology²

04 May 2020

TL;DR: This article proposed and evaluated transformer-based acoustic models (AMs) for hybrid speech recognition, including various positional embedding methods and an iterated loss to enable training deep transformers.

...read moreread less

Abstract: We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including various positional embedding methods and an iterated loss to enable training deep transformers. We also present a preliminary study of using limited right context in transformer models, which makes it possible for streaming applications. We demonstrate that on the widely used Librispeech benchmark, our transformer-based AM outperforms the best published hybrid result by 19% to 26% relative when the standard n-gram language model (LM) is used. Combined with neural network LM for rescoring, our proposed approach achieves state-of-the-art results on Librispeech. Our findings are also confirmed on a much larger internal dataset.

...read moreread less

224 citations

Collapse

Authors

Showing all 7875 results

Name	H-index	Papers	Citations
Yoshua Bengio	202	1033	420313
Xiang Zhang	154	1733	117576
Jitendra Malik	151	493	165087
Trevor Darrell	148	678	181113
Christopher D. Manning	138	499	147595
Robert W. Heath	128	1049	73171
Pieter Abbeel	126	589	70911
Yann LeCun	121	369	171211
Li Fei-Fei	120	420	145574
Jon Kleinberg	117	444	87865
Sergey Levine	115	652	59769
Richard Szeliski	113	359	72019
Sanjeev Kumar	113	1325	54386
Bruce Neal	108	561	87213
Larry S. Davis	107	693	49714

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

98% related

Microsoft

86.9K papers, 4.1M citations

96% related

Adobe Systems

8K papers, 214.7K citations

94% related

Carnegie Mellon University

104.3K papers, 5.9M citations

38.6K papers, 1.3M citations

90% related

Performance

Metrics

10,939

Papers

851,954

Citations

No. of papers from the Institution in previous years
Year	Papers
2024	1
2022	37
2021	1,738
2020	2,017
2019	1,607
2018	1,229