Institution
Company • Tel Aviv, Israel
About: Facebook is a company based in Tel Aviv, Israel. It is known for its research contributions in the topics of Computer science and Artificial neural network. The organization has 7856 authors who have published 10906 publications receiving 570123 citations. The organization is also known as facebook.com and FB.
Topics: Computer science, Artificial neural network, Language model, Context (language use), Reinforcement learning
Papers published on a yearly basis
Papers
03 Jul 2018
TL;DR: In this paper, the authors consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation, and prove that with Gaussian input, there is a spurious local minimizer.
Abstract: We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned. When the labels are the outputs from a teacher network of the same architecture with fixed weights $(\mathbf{w}^*, \mathbf{a}^*)$, we prove that with Gaussian input $\mathbf{Z}$, there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, gradient descent with weight normalization from randomly initialized weights can still be proven to recover the true parameters with constant probability, which can be boosted to probability $1$ with multiple restarts. We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.
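The teacher–student setup above reduces to a very small forward computation. Below is a minimal NumPy sketch of the non-overlapping one-hidden-layer architecture $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, with Gaussian input patches labelled by a fixed teacher $(\mathbf{w}^*, \mathbf{a}^*)$; the shapes and names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def forward(Z, w, a):
    """Non-overlapping one-hidden-layer conv net: f(Z, w, a) = sum_j a_j * relu(w^T Z_j).

    Z: (k, d) array whose rows Z_j are the non-overlapping patches of one input,
    w: (d,) shared convolutional filter, a: (k,) output weights.
    """
    pre_act = Z @ w                       # inner products w^T Z_j, shape (k,)
    hidden = np.maximum(pre_act, 0.0)     # ReLU activation
    return hidden @ a                     # weighted sum over patches

# Teacher network with fixed ground-truth weights (w*, a*); the student would be
# trained on Gaussian inputs labelled by this teacher.
rng = np.random.default_rng(0)
k, d = 8, 16
w_star, a_star = rng.normal(size=d), rng.normal(size=k)
Z = rng.normal(size=(k, d))               # Gaussian input patches
label = forward(Z, w_star, a_star)
```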
116 citations
TL;DR: In this paper, a recurrent fully convolutional deep neural network (CNN) is proposed to capture multiple short (thus noisy) frames in a burst and intelligently integrate their content, avoiding the downsides of long exposures (sensor saturation and motion blur).
Abstract: Noise is an inherent issue of low-light image capture, which is worsened on mobile devices due to their narrow apertures and small sensors. One strategy for mitigating noise in low-light situations is to increase the shutter time, allowing each photosite to integrate more light and decrease noise variance. However, there are two downsides of long exposures: (a) bright regions can exceed the sensor range, and (b) camera and scene motion will cause blur. Another way of gathering more light is to capture multiple short (thus noisy) frames in a burst and intelligently integrate the content, thus avoiding the above downsides. In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). We build our novel, multi-frame architecture to be a simple addition to any single frame denoising model. The resulting architecture denoises all frames in a sequence of arbitrary length. We show that it achieves state of the art denoising results on our burst dataset, improving on the best published multi-frame techniques, such as VBM4D and FlexISP. Finally, we explore other applications of multi-frame image enhancement and show that our CNN architecture generalizes well to image super-resolution.
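As a rough illustration of the recurrent multi-frame idea (a hidden feature map carried across burst frames on top of a single-frame denoiser), here is a minimal PyTorch sketch; the module layout and sizes are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentBurstDenoiser(nn.Module):
    """Wraps a single-frame denoiser so it can integrate an arbitrary-length burst.

    A hidden feature map is carried from frame to frame, letting each noisy
    frame contribute to the running estimate (illustrative sketch only).
    """

    def __init__(self, channels=3, features=32):
        super().__init__()
        self.single_frame = nn.Sequential(   # stand-in for any single-frame denoiser
            nn.Conv2d(channels + features, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
        )
        self.to_image = nn.Conv2d(features, channels, 3, padding=1)
        self.features = features

    def forward(self, burst):                # burst: (T, B, C, H, W)
        T, B, _, H, W = burst.shape
        state = burst.new_zeros(B, self.features, H, W)
        outputs = []
        for t in range(T):                   # recurrently fold in each noisy frame
            state = self.single_frame(torch.cat([burst[t], state], dim=1))
            outputs.append(self.to_image(state))
        return torch.stack(outputs)          # one denoised estimate per frame
```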
116 citations
TL;DR: A new method is introduced to identify overly ambiguous or outright mislabeled samples and mitigate their impact when training neural networks; at its heart is the Area Under the Margin (AUM) statistic.
Abstract: Not all data in a typical training set help with generalization; some samples can be overly ambiguous or outright mislabeled. This paper introduces a new method to identify such samples and mitigate their impact when training neural networks. At the heart of our algorithm is the Area Under the Margin (AUM) statistic, which exploits differences in the training dynamics of clean and mislabeled samples. A simple procedure - adding an extra class populated with purposefully mislabeled threshold samples - learns an AUM upper bound that isolates mislabeled data. This approach consistently improves upon prior work on synthetic and real-world datasets. On the WebVision50 classification task our method removes 17% of training data, yielding a 1.6% (absolute) improvement in test error. On CIFAR100 removing 13% of the data leads to a 1.2% drop in error.
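The AUM statistic itself is simple to compute from logits logged during training: it is the per-sample margin (assigned-class logit minus the largest other logit), averaged over epochs. A minimal NumPy sketch follows, assuming the logits for each epoch have been recorded; function names are illustrative.

```python
import numpy as np

def margin(logits, labels):
    """Per-sample margin: assigned-class logit minus the largest other-class logit."""
    n = logits.shape[0]
    assigned = logits[np.arange(n), labels]
    masked = logits.copy()
    masked[np.arange(n), labels] = -np.inf
    return assigned - masked.max(axis=1)

def area_under_margin(logits_per_epoch, labels):
    """AUM: the margin of each sample averaged over training epochs.

    Clean samples tend to keep a large positive margin, while mislabeled
    samples accumulate a low (often negative) AUM.
    """
    margins = np.stack([margin(lg, labels) for lg in logits_per_epoch])
    return margins.mean(axis=0)
```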
116 citations
25 Mar 2019
TL;DR: This study presents GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices, and significantly increases throughput by up to 3x compared to the baseline, without violating SLAs for a wide range of real-world AI and ML applications.
Abstract: The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to look at managing datacenters hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for a need to rethink resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios. In this study, we present GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices. GrandSLAm estimates the completion time of requests propagating through the individual microservice stages within an application. It then leverages this estimate to drive a runtime system that dynamically batches and reorders requests at each microservice in a manner where individual jobs meet their respective target latency while achieving high throughput. GrandSLAm significantly increases throughput by up to 3x compared to our baseline, without violating SLAs for a wide range of real-world AI and ML applications.
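A much-simplified sketch of the batching-and-reordering idea appears below, assuming each request carries a slack value derived from its latency target and that the stage has a per-request service-time estimate; this illustrates the scheduling principle only and is not GrandSLAm's implementation.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Request:
    slack: float                  # time budget left before this request's SLA is violated
    req_id: int = field(compare=False)

def schedule_batch(pending, max_batch, stage_time_per_request):
    """Pick the next batch for one microservice stage.

    Requests closest to violating their latency target go first; the batch
    grows only while the estimated batch service time still fits within the
    tightest slack among the selected requests.
    """
    heapq.heapify(pending)                     # least slack at the top
    batch = []
    while pending and len(batch) < max_batch:
        est_batch_time = (len(batch) + 1) * stage_time_per_request
        tightest = batch[0].slack if batch else pending[0].slack
        if batch and est_batch_time > tightest:
            break                              # growing further would break an SLA
        batch.append(heapq.heappop(pending))
    return batch
```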
116 citations
TL;DR: The authors apply knowledge distillation approaches, which have proven successful for reducing the size of neural models in other domains, to neural machine translation (NMT), and show that they can substantially shrink the model with little loss in performance.
Abstract: Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neural models in other domains to the problem of NMT. We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model). Our best student model runs 10 times faster than its state-of-the-art teacher with little loss in performance. It is also significantly better than a baseline model trained without knowledge distillation: by 4.2/1.7 BLEU with greedy decoding/beam search. Applying weight pruning on top of knowledge distillation results in a student model that has 13 times fewer parameters than the original teacher model, with a decrease of 0.4 BLEU.
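For reference, word-level knowledge distillation trains the student to match the teacher's per-token distribution rather than the one-hot reference (sequence-level distillation instead trains on the teacher's beam-search output). A minimal PyTorch sketch of the word-level loss follows; the names and the temperature parameter are illustrative assumptions, not the paper's code.

```python
import torch.nn.functional as F

def word_level_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Word-level knowledge distillation: cross-entropy between the teacher's
    softened per-token distribution and the student's prediction.

    student_logits, teacher_logits: (batch, seq_len, vocab) for the same target prefix.
    """
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(t_probs * s_log_probs).sum(dim=-1).mean()
```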
116 citations
Authors
Showing all 7875 results
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Xiang Zhang | 154 | 1733 | 117576 |
Jitendra Malik | 151 | 493 | 165087 |
Trevor Darrell | 148 | 678 | 181113 |
Christopher D. Manning | 138 | 499 | 147595 |
Robert W. Heath | 128 | 1049 | 73171 |
Pieter Abbeel | 126 | 589 | 70911 |
Yann LeCun | 121 | 369 | 171211 |
Li Fei-Fei | 120 | 420 | 145574 |
Jon Kleinberg | 117 | 444 | 87865 |
Sergey Levine | 115 | 652 | 59769 |
Richard Szeliski | 113 | 359 | 72019 |
Sanjeev Kumar | 113 | 1325 | 54386 |
Bruce Neal | 108 | 561 | 87213 |
Larry S. Davis | 107 | 693 | 49714 |