Institution
Company • Tel Aviv, Israel
About: Facebook is a company based in Tel Aviv, Israel. It is known for research contributions in the topics Artificial neural network and Language model. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as facebook.com and FB.
Topics: Artificial neural network, Language model, Reinforcement learning, Machine translation, Social network
Papers
TL;DR: A unified deformation model is presented for the markerless capture of human movement at multiple scales, including facial expressions, body motion, and hand gestures, which enables the full expression of part movements by a single seamless model.
Abstract: We present a unified deformation model for the markerless capture of multiple scales of human movement, including facial expressions, body motion, and hand gestures. An initial model is generated by locally stitching together models of the individual parts of the human body, which we refer to as the "Frankenstein" model. This model enables the full expression of part movements, including face and hands by a single seamless model. Using a large-scale capture of people wearing everyday clothes, we optimize the Frankenstein model to create "Adam". Adam is a calibrated model that shares the same skeleton hierarchy as the initial model but can express hair and clothing geometry, making it directly usable for fitting people as they normally appear in everyday life. Finally, we demonstrate the use of these models for total motion tracking, simultaneously capturing the large-scale body movements and the subtle face and hand motion of a social group of people.
314 citations
28 Mar 2019
TL;DR: It is demonstrated that the tensor view leads to large gains over baselines that ignore this structure, and leads to results comparable to Mask R-CNN, suggesting that TensorMask can serve as a foundation for novel advances in dense mask prediction and a more complete understanding of the task.
Abstract: Sliding-window object detectors that generate bounding-box object predictions over a dense, regular grid have advanced rapidly and proven popular. In contrast, modern instance segmentation approaches are dominated by methods that first detect object bounding boxes, and then crop and segment these regions, as popularized by Mask R-CNN. In this work, we investigate the paradigm of dense sliding-window instance segmentation, which is surprisingly under-explored. Our core observation is that this task is fundamentally different than other dense prediction tasks such as semantic segmentation or bounding-box object detection, as the output at every spatial location is itself a geometric structure with its own spatial dimensions. To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors. We demonstrate that the tensor view leads to large gains over baselines that ignore this structure, and leads to results comparable to Mask R-CNN. These promising results suggest that TensorMask can serve as a foundation for novel advances in dense mask prediction and a more complete understanding of the task. Code will be made available.
314 citations
15 Jun 2019
TL;DR: In this article, a long-term feature bank is proposed to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds, enabling existing video models to relate the present to the past, and put events in context.
Abstract: To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We propose a long-term feature bank—supportive information extracted over the entire span of a video—to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds. Our experiments demonstrate that augmenting 3D convolutional networks with a long-term feature bank yields state-of-the-art results on three challenging video datasets: AVA, EPIC-Kitchens, and Charades. Code is available online.
313 citations
01 Nov 2019
TL;DR: An automatic pipeline is described that extracts massive high-quality monolingual datasets from Common Crawl for a variety of languages, following the data processing introduced in fastText, which deduplicates documents and identifies their language.
Abstract: Pre-training text representations have led to significant improvements in many areas of natural language processing. The quality of these models benefits greatly from the size of the pretraining corpora as long as its quality is preserved. In this paper, we describe an automatic pipeline to extract massive high-quality monolingual datasets from Common Crawl for a variety of languages. Our pipeline follows the data processing introduced in fastText (Mikolov et al., 2017; Grave et al., 2018), that deduplicates documents and identifies their language. We augment this pipeline with a filtering step to select documents that are close to high quality corpora like Wikipedia.
313 citations
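The deduplication step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the `normalize` and `deduplicate` helpers, the choice of SHA-1, and the whole-document granularity are assumptions made for illustration, and language identification and quality filtering are left out.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace (illustrative choice)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(documents: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each document, comparing normalized hashes."""
    seen = set()
    unique = []
    for doc in documents:
        # Hash the normalized form so trivial formatting differences collapse.
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


docs = ["Hello  world", "hello world", "Another page"]
unique_docs = deduplicate(docs)
# unique_docs == ["Hello  world", "Another page"]
```

Storing fixed-size digests rather than full texts keeps memory use bounded per document, which matters when the input is a web-scale crawl.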
TL;DR: Captum, a unified open-source model interpretability library for PyTorch designed for easy understanding and use, is introduced together with Captum Insights, an interactive visualization tool built on top of the library that allows sample-based model debugging and visualization using feature importance metrics.
Abstract: In this paper we introduce a novel, unified, open-source model interpretability library for PyTorch [12]. The library contains generic implementations of a number of gradient and perturbation-based attribution algorithms, also known as feature, neuron and layer importance algorithms, as well as a set of evaluation metrics for these algorithms. It can be used for both classification and non-classification models including graph-structured models built on Neural Networks (NN). In this paper we give a high-level overview of supported attribution algorithms and show how to perform memory-efficient and scalable computations. We emphasize that the three main characteristics of the library are multimodality, extensibility and ease of use. Multimodality supports different input modalities such as image, text, audio or video. Extensibility allows adding new algorithms and features. The library is also designed for easy understanding and use. Besides, we also introduce an interactive visualization tool called Captum Insights that is built on top of the Captum library and allows sample-based model debugging and visualization using feature importance metrics.
312 citations
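To make the idea of a perturbation-based attribution algorithm from the abstract above concrete, here is a minimal pure-Python sketch of feature ablation. It does not use the Captum API: the `feature_ablation` function, the toy `linear_model`, and the zero baseline are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def feature_ablation(
    model: Callable[[Sequence[float]], float],
    inputs: Sequence[float],
    baseline: float = 0.0,
) -> List[float]:
    """Attribute the output to each feature by replacing it with a baseline.

    The attribution for feature i is the drop in model output observed
    when only feature i is set to the baseline value.
    """
    original_output = model(inputs)
    attributions = []
    for i in range(len(inputs)):
        perturbed = list(inputs)
        perturbed[i] = baseline  # ablate a single feature
        attributions.append(original_output - model(perturbed))
    return attributions


def linear_model(x: Sequence[float]) -> float:
    """Toy model (hypothetical): a fixed weighted sum of three features."""
    weights = [2.0, -1.0, 0.5]
    return sum(w * v for w, v in zip(weights, x))


attrs = feature_ablation(linear_model, [1.0, 3.0, 4.0])
# attrs == [2.0, -3.0, 2.0]
```

For a linear model with a zero baseline, each attribution equals the weight times the input value, which makes the toy result easy to check by hand.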
Authors
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Xiang Zhang | 154 | 1733 | 117576 |
Jitendra Malik | 151 | 493 | 165087 |
Trevor Darrell | 148 | 678 | 181113 |
Christopher D. Manning | 138 | 499 | 147595 |
Robert W. Heath | 128 | 1049 | 73171 |
Pieter Abbeel | 126 | 589 | 70911 |
Yann LeCun | 121 | 369 | 171211 |
Li Fei-Fei | 120 | 420 | 145574 |
Jon Kleinberg | 117 | 444 | 87865 |
Sergey Levine | 115 | 652 | 59769 |
Richard Szeliski | 113 | 359 | 72019 |
Sanjeev Kumar | 113 | 1325 | 54386 |
Bruce Neal | 108 | 561 | 87213 |
Larry S. Davis | 107 | 693 | 49714 |