Institution
Company•Tel Aviv, Israel•
About: Facebook is a company organization based out in Tel Aviv, Israel. It is known for research contribution in the topics: Artificial neural network & Language model. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as: facebook.com & FB.
Topics: Artificial neural network, Language model, Reinforcement learning, Machine translation, Social network
Papers published on a yearly basis
Papers
More filters
•
08 Dec 2019TL;DR: RUBi, a new learning strategy to reduce biases in any VQA model, is proposed, which reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image.
Abstract: Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This leads to dynamically adjusting the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than what was seen during training.
226 citations
•
TL;DR: XDC as discussed by the authors leverages unsupervised clustering in one modality (e.g., audio) as a supervisory signal for the other modality, which helps XDC utilize the semantic correlation and the differences between the two modalities.
Abstract: Visual and audio modalities are highly correlated, yet they contain different information. Their strong correlation makes it possible to predict the semantics of one from the other with good accuracy. Their intrinsic differences make cross-modal prediction a potentially more rewarding pretext task for self-supervised learning of video and audio representations compared to within-modality learning. Based on this intuition, we propose Cross-Modal Deep Clustering (XDC), a novel self-supervised method that leverages unsupervised clustering in one modality (e.g., audio) as a supervisory signal for the other modality (e.g., video). This cross-modal supervision helps XDC utilize the semantic correlation and the differences between the two modalities. Our experiments show that XDC outperforms single-modality clustering and other multi-modal variants. XDC achieves state-of-the-art accuracy among self-supervised methods on multiple video and audio benchmarks. Most importantly, our video model pretrained on large-scale unlabeled data significantly outperforms the same model pretrained with full-supervision on ImageNet and Kinetics for action recognition on HMDB51 and UCF101. To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
225 citations
••
03 Nov 2013TL;DR: This paper instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses.
Abstract: This paper examines the workload of Facebook's photo-serving stack and the effectiveness of the many layers of caching it employs Facebook's image-management infrastructure is complex and geographically distributed It includes browser caches on end-user systems, Edge Caches at ~20 PoPs, an Origin Cache, and for some kinds of images, additional caching via Akamai The underlying image storage layer is widely distributed, and includes multiple data centersWe instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos This permits us to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses Our results (1) quantify the overall traffic percentages served by different layers: 655% browser cache, 200% Edge Cache, 46% Origin Cache, and 99% Backend storage, (2) reveal that a significant portion of photo requests are routed to remote PoPs and data centers as a consequence both of load-balancing and peering policy, (3) demonstrate the potential performance benefits of coordinating Edge Caches and adopting S4LRU eviction algorithms at both Edge and Origin layers, and (4) show that the popularity of photos is highly dependent on content age and conditionally dependent on the social-networking metrics we considered
225 citations
••
23 Aug 2020TL;DR: Li et al. as mentioned in this paper proposed Spatially-Adaptive Convolution (SAC) to adopt different filters for different locations according to the input image, which can be implemented as a series of element-wise multiplications, im2col, and standard convolution.
Abstract: LiDAR point-cloud segmentation is an important problem for many applications. For large-scale point cloud segmentation, the de facto method is to project a 3D point cloud to get a 2D LiDAR image and use convolutions to process it. Despite the similarity between regular RGB and LiDAR images, we are the first to discover that the feature distribution of LiDAR images changes drastically at different image locations. Using standard convolutions to process such LiDAR images is problematic, as convolution filters pick up local features that are only active in specific regions in the image. As a result, the capacity of the network is under-utilized and the segmentation performance decreases. To fix this, we propose Spatially-Adaptive Convolution (SAC) to adopt different filters for different locations according to the input image. SAC can be computed efficiently since it can be implemented as a series of element-wise multiplications, im2col, and standard convolution. It is a general framework such that several previous methods can be seen as special cases of SAC. Using SAC, we build SqueezeSegV3 for LiDAR point-cloud segmentation and outperform all previous published methods by at least 2.0% mIoU on the SemanticKITTI benchmark. Code and pretrained model are available at https://github.com/chenfengxu714/SqueezeSegV3.
224 citations
••
04 May 2020TL;DR: This article proposed and evaluated transformer-based acoustic models (AMs) for hybrid speech recognition, including various positional embedding methods and an iterated loss to enable training deep transformers.
Abstract: We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including various positional embedding methods and an iterated loss to enable training deep transformers. We also present a preliminary study of using limited right context in transformer models, which makes it possible for streaming applications. We demonstrate that on the widely used Librispeech benchmark, our transformer-based AM outperforms the best published hybrid result by 19% to 26% relative when the standard n-gram language model (LM) is used. Combined with neural network LM for rescoring, our proposed approach achieves state-of-the-art results on Librispeech. Our findings are also confirmed on a much larger internal dataset.
224 citations
Authors
Showing all 7875 results
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Xiang Zhang | 154 | 1733 | 117576 |
Jitendra Malik | 151 | 493 | 165087 |
Trevor Darrell | 148 | 678 | 181113 |
Christopher D. Manning | 138 | 499 | 147595 |
Robert W. Heath | 128 | 1049 | 73171 |
Pieter Abbeel | 126 | 589 | 70911 |
Yann LeCun | 121 | 369 | 171211 |
Li Fei-Fei | 120 | 420 | 145574 |
Jon Kleinberg | 117 | 444 | 87865 |
Sergey Levine | 115 | 652 | 59769 |
Richard Szeliski | 113 | 359 | 72019 |
Sanjeev Kumar | 113 | 1325 | 54386 |
Bruce Neal | 108 | 561 | 87213 |
Larry S. Davis | 107 | 693 | 49714 |