
Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations
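As a minimal, illustrative sketch of expressing and differentiating a computation with TensorFlow: the example below uses the current eager tf.GradientTape API for brevity rather than the graph/session interface described in the paper, and the values and names are made up.

```python
# Minimal sketch: a computation expressed with TensorFlow and differentiated
# with reverse-mode autodiff. Uses the eager TF 2.x API; the paper describes
# the original graph/session interface. Values are illustrative only.
import tensorflow as tf

w = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
x = tf.constant([[1.0], [1.0]])

with tf.GradientTape() as tape:
    y = tf.matmul(w, x)           # the computation is recorded as it runs
    loss = tf.reduce_sum(y * y)   # scalar objective

grad = tape.gradient(loss, w)     # gradient of the loss with respect to w
print(loss.numpy(), grad.numpy())
```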


Proceedings ArticleDOI
François Chollet
21 Jul 2017
TL;DR: This work proposes a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions, and shows that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms it on a larger image classification dataset.
Abstract: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.

10,422 citations
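A minimal sketch of the depthwise separable convolution that the Xception entry above builds on: a depthwise convolution (one filter per input channel) followed by a 1x1 pointwise convolution. Written in PyTorch with assumed channel sizes; it is not the full Xception architecture.

```python
# Minimal sketch (assumed shapes): a depthwise separable convolution --
# a depthwise convolution followed by a 1x1 pointwise convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)                   # batch, channels, height, width
print(DepthwiseSeparableConv(32, 64)(x).shape)   # torch.Size([1, 64, 56, 56])
```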


Journal ArticleDOI
TL;DR: The Global Burden of Diseases, Injuries, and Risk Factors Study 2016 (GBD 2016) provides a comprehensive assessment of prevalence, incidence, and years lived with disability (YLDs) for 328 causes in 195 countries and territories from 1990 to 2016.

10,401 citations


Book
28 Apr 2021
TL;DR: In this book, the author presents econometric methods for panel data, covering one-way and two-way error component regression models, hypothesis testing, dynamic and unbalanced panels, limited dependent variables, and nonstationary panels.
Abstract (table of contents): Preface.
1. Introduction. 1.1 Panel Data: Some Examples. 1.2 Why Should We Use Panel Data? Their Benefits and Limitations. Note.
2. The One-way Error Component Regression Model. 2.1 Introduction. 2.2 The Fixed Effects Model. 2.3 The Random Effects Model. 2.4 Maximum Likelihood Estimation. 2.5 Prediction. 2.6 Examples. 2.7 Selected Applications. 2.8 Computational Note. Notes. Problems.
3. The Two-way Error Component Regression Model. 3.1 Introduction. 3.2 The Fixed Effects Model. 3.3 The Random Effects Model. 3.4 Maximum Likelihood Estimation. 3.5 Prediction. 3.6 Examples. 3.7 Selected Applications. Notes. Problems.
4. Test of Hypotheses with Panel Data. 4.1 Tests for Poolability of the Data. 4.2 Tests for Individual and Time Effects. 4.3 Hausman's Specification Test. 4.4 Further Reading. Notes. Problems.
5. Heteroskedasticity and Serial Correlation in the Error Component Model. 5.1 Heteroskedasticity. 5.2 Serial Correlation. Notes. Problems.
6. Seemingly Unrelated Regressions with Error Components. 6.1 The One-way Model. 6.2 The Two-way Model. 6.3 Applications and Extensions. Problems.
7. Simultaneous Equations with Error Components. 7.1 Single Equation Estimation. 7.2 Empirical Example: Crime in North Carolina. 7.3 System Estimation. 7.4 The Hausman and Taylor Estimator. 7.5 Empirical Example: Earnings Equation Using PSID Data. 7.6 Extensions. Notes. Problems.
8. Dynamic Panel Data Models. 8.1 Introduction. 8.2 The Arellano and Bond Estimator. 8.3 The Arellano and Bover Estimator. 8.4 The Ahn and Schmidt Moment Conditions. 8.5 The Blundell and Bond System GMM Estimator. 8.6 The Keane and Runkle Estimator. 8.7 Further Developments. 8.8 Empirical Example: Dynamic Demand for Cigarettes. 8.9 Further Reading. Notes. Problems.
9. Unbalanced Panel Data Models. 9.1 Introduction. 9.2 The Unbalanced One-way Error Component Model. 9.3 Empirical Example: Hedonic Housing. 9.4 The Unbalanced Two-way Error Component Model. 9.5 Testing for Individual and Time Effects Using Unbalanced Panel Data. 9.6 The Unbalanced Nested Error Component Model. Notes. Problems.
10. Special Topics. 10.1 Measurement Error and Panel Data. 10.2 Rotating Panels. 10.3 Pseudo-panels. 10.4 Alternative Methods of Pooling Time Series of Cross-section Data. 10.5 Spatial Panels. 10.6 Short-run vs Long-run Estimates in Pooled Models. 10.7 Heterogeneous Panels. Notes. Problems.
11. Limited Dependent Variables and Panel Data. 11.1 Fixed and Random Logit and Probit Models. 11.2 Simulation Estimation of Limited Dependent Variable Models with Panel Data. 11.3 Dynamic Panel Data Limited Dependent Variable Models. 11.4 Selection Bias in Panel Data. 11.5 Censored and Truncated Panel Data Models. 11.6 Empirical Applications. 11.7 Empirical Example: Nurses' Labor Supply. 11.8 Further Reading. Notes. Problems.
12. Nonstationary Panels. 12.1 Introduction. 12.2 Panel Unit Roots Tests Assuming Cross-sectional Independence. 12.3 Panel Unit Roots Tests Allowing for Cross-sectional Dependence. 12.4 Spurious Regression in Panel Data. 12.5 Panel Cointegration Tests. 12.6 Estimation and Inference in Panel Cointegration Models. 12.7 Empirical Example: Purchasing Power Parity. 12.8 Further Reading. Notes. Problems.
References. Index.

10,363 citations
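As a concrete illustration of the one-way error component (fixed effects) model listed in the contents above, here is a small sketch of the within estimator on simulated data. The data and variable names are invented for illustration and are not from the book.

```python
# Illustrative sketch: within (fixed effects) estimator for
# y_it = beta * x_it + mu_i + nu_it, using simulated data.
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 100, 5, 2.0
ids = np.repeat(np.arange(N), T)
mu = rng.normal(size=N)[ids]                  # individual effects mu_i
x = rng.normal(size=N * T) + 0.5 * mu         # regressor correlated with mu_i
y = beta * x + mu + rng.normal(size=N * T)

def demean(v):
    """Within transformation: subtract each individual's time mean."""
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

x_w, y_w = demean(x), demean(y)
beta_hat = (x_w @ y_w) / (x_w @ x_w)          # OLS on the demeaned data
print(round(beta_hat, 3))                     # close to the true value 2.0
```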


Journal ArticleDOI
TL;DR: A two-dose regimen of BNT162b2 conferred 95% protection against Covid-19 in persons 16 years of age or older and safety over a median of 2 months was similar to that of other viral vaccines.
Abstract: Background Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and the resulting coronavirus disease 2019 (Covid-19) have afflicted tens of millions of people in a world...

10,274 citations


Journal ArticleDOI
TL;DR: The data extend the ultrastructural characterization of the KD mouse model and support recent theories of a dying-back mechanism for neuronal degeneration that is independent of demyelination.
Abstract: Krabbe disease (KD) is a neurodegenerative disorder caused by the lack of β-galactosylceramidase enzymatic activity and by widespread accumulation of the cytotoxic galactosyl-sphingosine in neuronal, myelinating and endothelial cells. Despite the wide use of Twitcher mice as an experimental model for KD, the ultrastructural characterization of this model is partial and mainly addresses peripheral nerves. More detail is needed to elucidate the basis of the motor defects, which are the first to appear during KD onset. Here we use transmission electron microscopy (TEM) to focus on the alterations produced by KD in the lower motor system at postnatal day 15 (P15), a nearly asymptomatic stage, and in the juvenile P30 mouse. We find mild effects on motor neuron soma, severe ones on sciatic nerves, and very severe effects on nerve terminals and neuromuscular junctions at P30, with peripheral damage already detectable at P15. Finally, we find that the gastrocnemius muscle undergoes atrophy and structural changes that are independent of denervation at P15. Our data extend the ultrastructural characterization of the KD mouse model and support recent theories of a dying-back mechanism for neuronal degeneration that is independent of demyelination.

10,233 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.
Abstract: Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

10,189 citations
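A hedged sketch of a pyramid pooling module in the spirit of the PSPNet description above: pool the feature map into several bin sizes, reduce channels with 1x1 convolutions, upsample, and concatenate with the input. Bin sizes and channel counts are assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a pyramid pooling module: multi-scale average pooling,
# channel reduction, upsampling, and concatenation with the input features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode='bilinear',
                                align_corners=False) for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)   # global context + local features

x = torch.randn(1, 2048, 60, 60)
print(PyramidPooling(2048)(x).shape)            # torch.Size([1, 4096, 60, 60])
```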


Proceedings Article
28 May 2020
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

10,132 citations
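To make the few-shot setting described above concrete: the model receives a handful of demonstrations purely as text in its prompt and performs the task with no gradient updates. The prompt-building helper below is illustrative only, and the generate() call is a placeholder, not a real API.

```python
# Illustrative sketch: "few-shot" means conditioning on demonstrations in the
# prompt, with no fine-tuning. The model call at the bottom is hypothetical.
def build_few_shot_prompt(instruction, demonstrations, query):
    lines = [instruction, ""]
    for example_input, example_output in demonstrations:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Unscramble the letters to form an English word.",
    [("tac", "cat"), ("odg", "dog")],
    "dirb",
)
print(prompt)
# completion = language_model.generate(prompt)   # hypothetical call
```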


Posted Content
TL;DR: DeepLab as discussed by the authors proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views.
Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

10,120 citations
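A minimal sketch of atrous (dilated) convolution and a simplified ASPP head as described above: parallel 3x3 convolutions with several dilation rates enlarge the field of view without adding parameters, and their outputs are concatenated. Channel sizes and rates here are assumptions.

```python
# Minimal sketch (assumed sizes): atrous convolution and a simplified ASPP head.
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3,
                      padding=r, dilation=r, bias=False)   # enlarged field of view,
            for r in rates)                                 # same number of weights

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 512, 64, 64)
print(SimpleASPP(512, 256)(x).shape)   # torch.Size([1, 1024, 64, 64])
```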


Proceedings Article
01 Jan 2019
TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks.

10,045 citations
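A minimal sketch of the imperative, define-by-run style described above: the model and training loop are ordinary Python, and gradients come from calling backward() on a scalar loss. Sizes and the toy data are illustrative.

```python
# Minimal sketch: an imperative PyTorch training loop on toy data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 4), torch.randn(32, 1)
for step in range(5):                      # regular Python control flow
    loss = ((model(x) - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()                        # reverse-mode automatic differentiation
    optimizer.step()
    print(step, loss.item())
```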


Journal ArticleDOI
TL;DR: The process of conducting a thematic analysis is illustrated through the presentation of an auditable decision trail, guiding interpreting and representing textual data and exploring issues of rigor and trustworthiness.
Abstract: As qualitative research becomes increasingly recognized and valued, it is imperative that it is conducted in a rigorous and methodical manner to yield meaningful and useful results. To be accepted ...

Journal ArticleDOI
TL;DR: In this paper, the authors present results based on full-mission Planck observations of temperature and polarization anisotropies of the CMB, which are consistent with the six-parameter inflationary LCDM cosmology.
Abstract: We present results based on full-mission Planck observations of temperature and polarization anisotropies of the CMB. These data are consistent with the six-parameter inflationary LCDM cosmology. From the Planck temperature and lensing data, for this cosmology we find a Hubble constant, H0= (67.8 +/- 0.9) km/s/Mpc, a matter density parameter Omega_m = 0.308 +/- 0.012 and a scalar spectral index with n_s = 0.968 +/- 0.006. (We quote 68% errors on measured parameters and 95% limits on other parameters.) Combined with Planck temperature and lensing data, Planck LFI polarization measurements lead to a reionization optical depth of tau = 0.066 +/- 0.016. Combining Planck with other astrophysical data we find N_ eff = 3.15 +/- 0.23 for the effective number of relativistic degrees of freedom and the sum of neutrino masses is constrained to < 0.23 eV. Spatial curvature is found to be |Omega_K| < 0.005. For LCDM we find a limit on the tensor-to-scalar ratio of r <0.11 consistent with the B-mode constraints from an analysis of BICEP2, Keck Array, and Planck (BKP) data. Adding the BKP data leads to a tighter constraint of r < 0.09. We find no evidence for isocurvature perturbations or cosmic defects. The equation of state of dark energy is constrained to w = -1.006 +/- 0.045. Standard big bang nucleosynthesis predictions for the Planck LCDM cosmology are in excellent agreement with observations. We investigate annihilating dark matter and deviations from standard recombination, finding no evidence for new physics. The Planck results for base LCDM are in agreement with BAO data and with the JLA SNe sample. However the amplitude of the fluctuations is found to be higher than inferred from rich cluster counts and weak gravitational lensing. Apart from these tensions, the base LCDM cosmology provides an excellent description of the Planck CMB observations and many other astrophysical data sets.

Journal ArticleDOI
TL;DR: In the United States, the cancer death rate has dropped continuously from its peak in 1991 through 2018, for a total decline of 31%, because of reductions in smoking and improvements in early detection and treatment as mentioned in this paper.
Abstract: Each year, the American Cancer Society estimates the numbers of new cancer cases and deaths in the United States and compiles the most recent data on population-based cancer occurrence. Incidence data (through 2017) were collected by the Surveillance, Epidemiology, and End Results Program; the National Program of Cancer Registries; and the North American Association of Central Cancer Registries. Mortality data (through 2018) were collected by the National Center for Health Statistics. In 2021, 1,898,160 new cancer cases and 608,570 cancer deaths are projected to occur in the United States. After increasing for most of the 20th century, the cancer death rate has fallen continuously from its peak in 1991 through 2018, for a total decline of 31%, because of reductions in smoking and improvements in early detection and treatment. This translates to 3.2 million fewer cancer deaths than would have occurred if peak rates had persisted. Long-term declines in mortality for the 4 leading cancers have halted for prostate cancer and slowed for breast and colorectal cancers, but accelerated for lung cancer, which accounted for almost one-half of the total mortality decline from 2014 to 2018. The pace of the annual decline in lung cancer mortality doubled from 3.1% during 2009 through 2013 to 5.5% during 2014 through 2018 in men, from 1.8% to 4.4% in women, and from 2.4% to 5% overall. This trend coincides with steady declines in incidence (2.2%-2.3%) but rapid gains in survival specifically for nonsmall cell lung cancer (NSCLC). For example, NSCLC 2-year relative survival increased from 34% for persons diagnosed during 2009 through 2010 to 42% during 2015 through 2016, including absolute increases of 5% to 6% for every stage of diagnosis; survival for small cell lung cancer remained at 14% to 15%. Improved treatment accelerated progress against lung cancer and drove a record drop in overall cancer mortality, despite slowing momentum for other common cancers.

Journal ArticleDOI
B. P. Abbott, Richard J. Abbott, T. D. Abbott, Matthew Abernathy, and 1,008 more authors (96 institutions)
TL;DR: This is the first direct detection of gravitational waves and the first observation of a binary black hole merger, and these observations demonstrate the existence of binary stellar-mass black hole systems.
Abstract: On September 14, 2015 at 09:50:45 UTC the two detectors of the Laser Interferometer Gravitational-Wave Observatory simultaneously observed a transient gravitational-wave signal. The signal sweeps upwards in frequency from 35 to 250 Hz with a peak gravitational-wave strain of $1.0 \times 10^{-21}$. It matches the waveform predicted by general relativity for the inspiral and merger of a pair of black holes and the ringdown of the resulting single black hole. The signal was observed with a matched-filter signal-to-noise ratio of 24 and a false alarm rate estimated to be less than 1 event per 203 000 years, equivalent to a significance greater than $5.1\sigma$. The source lies at a luminosity distance of $410^{+160}_{-180}$ Mpc corresponding to a redshift $z = 0.09^{+0.03}_{-0.04}$. In the source frame, the initial black hole masses are $36^{+5}_{-4} M_\odot$ and $29^{+4}_{-4} M_\odot$, and the final black hole mass is $62^{+4}_{-4} M_\odot$, with $3.0^{+0.5}_{-0.5} M_\odot c^2$ radiated in gravitational waves. All uncertainties define 90% credible intervals. These observations demonstrate the existence of binary stellar-mass black hole systems. This is the first direct detection of gravitational waves and the first observation of a binary black hole merger.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
Abstract: Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
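A hedged sketch of the core PointNet idea described above: a shared per-point MLP followed by a symmetric max-pooling aggregation, so the global feature is invariant to the ordering of the input points. Layer sizes are assumptions, not the published architecture.

```python
# Hedged sketch: shared per-point MLP + symmetric max pooling, which makes the
# global feature independent of point ordering.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(        # applied independently to each point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU())
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, points):                 # points: (batch, n_points, 3)
        features = self.point_mlp(points)      # (batch, n_points, 256)
        global_feature = features.max(dim=1).values  # symmetric: order-independent
        return self.classifier(global_feature)

pts = torch.randn(2, 1024, 3)
net = TinyPointNet()
# Permuting the input points leaves the output unchanged.
print(torch.allclose(net(pts), net(pts[:, torch.randperm(1024)])))  # True
```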

Proceedings ArticleDOI
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
18 Jun 2018
TL;DR: MobileNetV2 as mentioned in this paper is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers, and the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.
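A hedged sketch of an inverted residual block in the spirit of the MobileNetV2 description above: a 1x1 expansion, a lightweight depthwise 3x3 convolution, and a linear 1x1 projection back to a thin bottleneck, with the shortcut connecting the bottlenecks. Channel sizes and the expansion factor are assumptions, and batch normalization is omitted for brevity.

```python
# Hedged sketch: an inverted residual block (expand -> depthwise -> project),
# with the shortcut between the thin bottleneck layers.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, ch, expansion=6):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, ch, 1, bias=False),   # linear projection: no
        )                                           # non-linearity in the narrow layer

    def forward(self, x):
        return x + self.block(x)                    # shortcut between bottlenecks

x = torch.randn(1, 24, 56, 56)
print(InvertedResidual(24)(x).shape)                # torch.Size([1, 24, 56, 56])
```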

Journal ArticleDOI
02 Jan 2015-BMJ
TL;DR: The PRISMA-P checklist as mentioned in this paper provides 17 items considered to be essential and minimum components of a systematic review or meta-analysis protocol, as well as a model example from an existing published protocol.
Abstract: Protocols of systematic reviews and meta-analyses allow for planning and documentation of review methods, act as a guard against arbitrary decision making during review conduct, enable readers to assess for the presence of selective reporting against completed reviews, and, when made publicly available, reduce duplication of efforts and potentially prompt collaboration. Evidence documenting the existence of selective reporting and excessive duplication of reviews on the same or similar topics is accumulating and many calls have been made in support of the documentation and public availability of review protocols. Several efforts have emerged in recent years to rectify these problems, including development of an international register for prospective reviews (PROSPERO) and launch of the first open access journal dedicated to the exclusive publication of systematic review products, including protocols (BioMed Central's Systematic Reviews). Furthering these efforts and building on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines, an international group of experts has created a guideline to improve the transparency, accuracy, completeness, and frequency of documented systematic review and meta-analysis protocols--PRISMA-P (for protocols) 2015. The PRISMA-P checklist contains 17 items considered to be essential and minimum components of a systematic review or meta-analysis protocol.This PRISMA-P 2015 Explanation and Elaboration paper provides readers with a full understanding of and evidence about the necessity of each item as well as a model example from an existing published protocol. This paper should be read together with the PRISMA-P 2015 statement. Systematic review authors and assessors are strongly encouraged to make use of PRISMA-P when drafting and appraising review protocols.

Journal ArticleDOI
28 Aug 2019-BMJ
TL;DR: The Cochrane risk-of-bias tool has been updated to respond to developments in understanding how bias arises in randomised trials, and to address user feedback on and limitations of the original tool.
Abstract: Assessment of risk of bias is regarded as an essential component of a systematic review on the effects of an intervention. The most commonly used tool for randomised trials is the Cochrane risk-of-bias tool. We updated the tool to respond to developments in understanding how bias arises in randomised trials, and to address user feedback on and limitations of the original tool.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: YOLO9000 as discussed by the authors is a state-of-the-art real-time object detection system that can detect over 9000 object categories in real time using a novel multi-scale training method, offering an easy tradeoff between speed and accuracy.
Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. Using a novel, multi-scale training method the same YOLOv2 model can run at varying sizes, offering an easy tradeoff between speed and accuracy. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. YOLO9000 predicts detections for more than 9000 different object categories, all in real-time.
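To illustrate the multi-scale training idea mentioned above: because the detector is fully convolutional, the input resolution can be changed every few batches by resizing images to a random multiple of the network stride (32). The dataloader and model in the commented loop are placeholders, not the authors' code.

```python
# Illustrative sketch of multi-scale training: periodically resize the batch to
# a random resolution that is a multiple of the network stride.
import random
import torch.nn.functional as F

def resize_batch(images, sizes=range(320, 609, 32)):
    """Resize a (batch, 3, H, W) tensor to a randomly chosen square resolution."""
    target = random.choice(list(sizes))           # e.g. 320, 352, ..., 608
    return F.interpolate(images, size=(target, target),
                         mode='bilinear', align_corners=False)

# for step, (images, targets) in enumerate(dataloader):   # hypothetical loop
#     if step % 10 == 0:
#         images = resize_batch(images)
#     loss = model(images, targets)
```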

Posted Content
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
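A minimal sketch of a clipped surrogate objective in the spirit of the PPO description above, which allows several epochs of minibatch updates on the same sampled data; tensor names and the clipping value are illustrative.

```python
# Minimal sketch: clipped surrogate policy-gradient loss in the spirit of PPO.
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs - old_log_probs)           # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()           # maximize the surrogate

# Toy example: the same minibatch can be reused for several epochs of updates.
logp = torch.randn(64, requires_grad=True)
logp_old = logp.detach() + 0.1 * torch.randn(64)
adv = torch.randn(64)
loss = ppo_clip_loss(logp, logp_old, adv)
loss.backward()
print(loss.item())
```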

Posted Content
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
TL;DR: A new mobile architecture, MobileNetV2, is described that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes and allows decoupling of the input/output domains from the expressiveness of the transformation.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on Imagenet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as the number of parameters.

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski, Eric Vallabh Minikel, Kaitlin E. Samocha, Eric Banks, Timothy Fennell, Anne H. O'Donnell-Luria, James S. Ware, Andrew J. Hill, Beryl B. Cummings, Taru Tukiainen, Daniel G. MacArthur, and colleagues (Exome Aggregation Consortium)
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

Journal ArticleDOI
TL;DR: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year, to survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks.

Posted Content
TL;DR: YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced and a method to jointly train on object detection and classification is proposed, both novel and drawn from prior work.
Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.

Book ChapterDOI
Leslie Lamport
TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
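A minimal sketch of the logical clock rules described above: increment the local counter on each event, attach it to outgoing messages, and on receipt advance the counter past the message timestamp so the ordering respects "happened before". The class and method names are mine, not the paper's notation.

```python
# Minimal sketch of Lamport logical clocks.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):                       # timestamp attached to the outgoing message
        self.time += 1
        return self.time

    def receive(self, message_time):      # C := max(C, T_message) + 1
        self.time = max(self.time, message_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()                              # a's clock becomes 1
print(b.receive(t))                       # b jumps to 2, preserving "happened before"
```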

Journal ArticleDOI
TL;DR: The second Gaia data release, Gaia DR2 as mentioned in this paper, is a major advance with respect to Gaia DR1 in terms of completeness, performance, and richness of the data products.
Abstract: Context. We present the second Gaia data release, Gaia DR2, consisting of astrometry, photometry, radial velocities, and information on astrophysical parameters and variability, for sources brighter than magnitude 21. In addition epoch astrometry and photometry are provided for a modest sample of minor planets in the solar system. Aims: A summary of the contents of Gaia DR2 is presented, accompanied by a discussion on the differences with respect to Gaia DR1 and an overview of the main limitations which are still present in the survey. Recommendations are made on the responsible use of Gaia DR2 results. Methods: The raw data collected with the Gaia instruments during the first 22 months of the mission have been processed by the Gaia Data Processing and Analysis Consortium (DPAC) and turned into this second data release, which represents a major advance with respect to Gaia DR1 in terms of completeness, performance, and richness of the data products. Results: Gaia DR2 contains celestial positions and the apparent brightness in G for approximately 1.7 billion sources. For 1.3 billion of those sources, parallaxes and proper motions are in addition available. The sample of sources for which variability information is provided is expanded to 0.5 million stars. This data release contains four new elements: broad-band colour information in the form of the apparent brightness in the GBP (330-680 nm) and GRP (630-1050 nm) bands is available for 1.4 billion sources; median radial velocities for some 7 million sources are presented; for between 77 and 161 million sources estimates are provided of the stellar effective temperature, extinction, reddening, and radius and luminosity; and for a pre-selected list of 14 000 minor planets in the solar system epoch astrometry and photometry are presented. Finally, Gaia DR2 also represents a new materialisation of the celestial reference frame in the optical, the Gaia-CRF2, which is the first optical reference frame based solely on extragalactic sources. There are notable changes in the photometric system and the catalogue source list with respect to Gaia DR1, and we stress the need to consider the two data releases as independent. Conclusions: Gaia DR2 represents a major achievement for the Gaia mission, delivering on the long standing promise to provide parallaxes and proper motions for over 1 billion stars, and representing a first step in the availability of complementary radial velocity and source astrophysical information for a sample of stars in the Gaia survey which covers a very substantial fraction of the volume of our galaxy.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
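As a hedged sketch of how the embedding space described above is used: once faces are mapped to compact embeddings, verification reduces to thresholding the distance between two embedding vectors. The embedding dimensionality, normalization, and threshold here are illustrative assumptions, and the embedding network itself is not shown.

```python
# Hedged sketch: face verification as a distance threshold between embeddings.
import torch
import torch.nn.functional as F

def same_person(emb_a, emb_b, threshold=1.1):
    emb_a = F.normalize(emb_a, dim=-1)         # unit-length 128-D embeddings
    emb_b = F.normalize(emb_b, dim=-1)
    distance = (emb_a - emb_b).pow(2).sum(-1)  # squared Euclidean distance
    return distance < threshold

emb_a, emb_b = torch.randn(128), torch.randn(128)   # placeholders for network outputs
print(same_person(emb_a, emb_b))
```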

Journal ArticleDOI
TL;DR: Hierarchical and self-consistent orthology annotations are introduced for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution in the STRING database.
Abstract: The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Abstract: Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our nonlocal models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.
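A hedged sketch of a non-local block in the spirit of the description above: the response at each position is a weighted sum of the features at all positions, with weights given by a softmax over pairwise dot-product similarities. Embedding sizes are assumptions.

```python
# Hedged sketch: a non-local block with dot-product similarity over all positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    def __init__(self, ch, inner_ch=None):
        super().__init__()
        inner_ch = inner_ch or ch // 2
        self.theta = nn.Conv2d(ch, inner_ch, 1)
        self.phi = nn.Conv2d(ch, inner_ch, 1)
        self.g = nn.Conv2d(ch, inner_ch, 1)
        self.out = nn.Conv2d(inner_ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)    # (b, hw, inner)
        k = self.phi(x).flatten(2)                      # (b, inner, hw)
        v = self.g(x).flatten(2).transpose(1, 2)        # (b, hw, inner)
        attn = F.softmax(q @ k, dim=-1)                 # similarity of all position pairs
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection

x = torch.randn(1, 64, 14, 14)
print(NonLocalBlock(64)(x).shape)                       # torch.Size([1, 64, 14, 14])
```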

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
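A minimal sketch of the "global" attention variant described above: score every source position against the current decoder state, softmax the scores, and take the weighted sum of source states as the context vector. Shapes are assumptions, and the scoring function here is a plain dot product.

```python
# Minimal sketch: global attention over all source positions.
import torch
import torch.nn.functional as F

def global_attention(decoder_state, encoder_states):
    # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)
    weights = F.softmax(scores, dim=-1)                  # attends to all source words
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights

dec = torch.randn(2, 512)
enc = torch.randn(2, 7, 512)
context, weights = global_attention(dec, enc)
print(context.shape, weights.shape)   # torch.Size([2, 512]) torch.Size([2, 7])
```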