
Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations
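As a minimal, illustrative sketch of expressing and differentiating a computation with TensorFlow: the example below uses the current eager tf.GradientTape API for brevity rather than the graph/session interface described in the paper, and the values and names are made up.

```python
# Minimal sketch: a computation expressed with TensorFlow and differentiated
# with reverse-mode autodiff. Uses the eager TF 2.x API; the paper describes
# the original graph/session interface. Values are illustrative only.
import tensorflow as tf

w = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
x = tf.constant([[1.0], [1.0]])

with tf.GradientTape() as tape:
    y = tf.matmul(w, x)           # the computation is recorded as it runs
    loss = tf.reduce_sum(y * y)   # scalar objective

grad = tape.gradient(loss, w)     # gradient of the loss with respect to w
print(loss.numpy(), grad.numpy())
```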


Proceedings ArticleDOI
François Chollet
21 Jul 2017
TL;DR: This work proposes a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions, and shows that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms it on a larger image classification dataset.
Abstract: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.

10,422 citations
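A minimal sketch of the depthwise separable convolution that the Xception entry above builds on: a depthwise convolution (one filter per input channel) followed by a 1x1 pointwise convolution. Written in PyTorch with assumed channel sizes; it is not the full Xception architecture.

```python
# Minimal sketch (assumed shapes): a depthwise separable convolution --
# a depthwise convolution followed by a 1x1 pointwise convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)                   # batch, channels, height, width
print(DepthwiseSeparableConv(32, 64)(x).shape)   # torch.Size([1, 64, 56, 56])
```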


Journal ArticleDOI
TL;DR: The Global Burden of Diseases, Injuries, and Risk Factors Study 2016 (GBD 2016) provides a comprehensive assessment of prevalence, incidence, and years lived with disability (YLDs) for 328 causes in 195 countries and territories from 1990 to 2016.

10,401 citations


Book
28 Apr 2021
TL;DR: In this book, the author presents econometric methods for panel data, covering one-way and two-way error component regression models, hypothesis testing, dynamic and unbalanced panels, limited dependent variables, and nonstationary panels.
Abstract (table of contents): Preface.
1. Introduction. 1.1 Panel Data: Some Examples. 1.2 Why Should We Use Panel Data? Their Benefits and Limitations. Note.
2. The One-way Error Component Regression Model. 2.1 Introduction. 2.2 The Fixed Effects Model. 2.3 The Random Effects Model. 2.4 Maximum Likelihood Estimation. 2.5 Prediction. 2.6 Examples. 2.7 Selected Applications. 2.8 Computational Note. Notes. Problems.
3. The Two-way Error Component Regression Model. 3.1 Introduction. 3.2 The Fixed Effects Model. 3.3 The Random Effects Model. 3.4 Maximum Likelihood Estimation. 3.5 Prediction. 3.6 Examples. 3.7 Selected Applications. Notes. Problems.
4. Test of Hypotheses with Panel Data. 4.1 Tests for Poolability of the Data. 4.2 Tests for Individual and Time Effects. 4.3 Hausman's Specification Test. 4.4 Further Reading. Notes. Problems.
5. Heteroskedasticity and Serial Correlation in the Error Component Model. 5.1 Heteroskedasticity. 5.2 Serial Correlation. Notes. Problems.
6. Seemingly Unrelated Regressions with Error Components. 6.1 The One-way Model. 6.2 The Two-way Model. 6.3 Applications and Extensions. Problems.
7. Simultaneous Equations with Error Components. 7.1 Single Equation Estimation. 7.2 Empirical Example: Crime in North Carolina. 7.3 System Estimation. 7.4 The Hausman and Taylor Estimator. 7.5 Empirical Example: Earnings Equation Using PSID Data. 7.6 Extensions. Notes. Problems.
8. Dynamic Panel Data Models. 8.1 Introduction. 8.2 The Arellano and Bond Estimator. 8.3 The Arellano and Bover Estimator. 8.4 The Ahn and Schmidt Moment Conditions. 8.5 The Blundell and Bond System GMM Estimator. 8.6 The Keane and Runkle Estimator. 8.7 Further Developments. 8.8 Empirical Example: Dynamic Demand for Cigarettes. 8.9 Further Reading. Notes. Problems.
9. Unbalanced Panel Data Models. 9.1 Introduction. 9.2 The Unbalanced One-way Error Component Model. 9.3 Empirical Example: Hedonic Housing. 9.4 The Unbalanced Two-way Error Component Model. 9.5 Testing for Individual and Time Effects Using Unbalanced Panel Data. 9.6 The Unbalanced Nested Error Component Model. Notes. Problems.
10. Special Topics. 10.1 Measurement Error and Panel Data. 10.2 Rotating Panels. 10.3 Pseudo-panels. 10.4 Alternative Methods of Pooling Time Series of Cross-section Data. 10.5 Spatial Panels. 10.6 Short-run vs Long-run Estimates in Pooled Models. 10.7 Heterogeneous Panels. Notes. Problems.
11. Limited Dependent Variables and Panel Data. 11.1 Fixed and Random Logit and Probit Models. 11.2 Simulation Estimation of Limited Dependent Variable Models with Panel Data. 11.3 Dynamic Panel Data Limited Dependent Variable Models. 11.4 Selection Bias in Panel Data. 11.5 Censored and Truncated Panel Data Models. 11.6 Empirical Applications. 11.7 Empirical Example: Nurses' Labor Supply. 11.8 Further Reading. Notes. Problems.
12. Nonstationary Panels. 12.1 Introduction. 12.2 Panel Unit Roots Tests Assuming Cross-sectional Independence. 12.3 Panel Unit Roots Tests Allowing for Cross-sectional Dependence. 12.4 Spurious Regression in Panel Data. 12.5 Panel Cointegration Tests. 12.6 Estimation and Inference in Panel Cointegration Models. 12.7 Empirical Example: Purchasing Power Parity. 12.8 Further Reading. Notes. Problems.
References. Index.

10,363 citations
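As a concrete illustration of the one-way error component (fixed effects) model listed in the contents above, here is a small sketch of the within estimator on simulated data. The data and variable names are invented for illustration and are not from the book.

```python
# Illustrative sketch: within (fixed effects) estimator for
# y_it = beta * x_it + mu_i + nu_it, using simulated data.
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 100, 5, 2.0
ids = np.repeat(np.arange(N), T)
mu = rng.normal(size=N)[ids]                  # individual effects mu_i
x = rng.normal(size=N * T) + 0.5 * mu         # regressor correlated with mu_i
y = beta * x + mu + rng.normal(size=N * T)

def demean(v):
    """Within transformation: subtract each individual's time mean."""
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

x_w, y_w = demean(x), demean(y)
beta_hat = (x_w @ y_w) / (x_w @ x_w)          # OLS on the demeaned data
print(round(beta_hat, 3))                     # close to the true value 2.0
```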


Journal ArticleDOI
TL;DR: A two-dose regimen of BNT162b2 conferred 95% protection against Covid-19 in persons 16 years of age or older and safety over a median of 2 months was similar to that of other viral vaccines.
Abstract: Background Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and the resulting coronavirus disease 2019 (Covid-19) have afflicted tens of millions of people in a world...

10,274 citations


Journal ArticleDOI
TL;DR: The data extend the ultrastructural characterization of the KD mouse model and support recent theories of a dying-back mechanism for neuronal degeneration that is independent of demyelination.
Abstract: Krabbe disease (KD) is a neurodegenerative disorder caused by the lack of β-galactosylceramidase enzymatic activity and by widespread accumulation of the cytotoxic galactosyl-sphingosine in neuronal, myelinating and endothelial cells. Despite the wide use of Twitcher mice as an experimental model for KD, the ultrastructural characterization of this model is partial and mainly addresses peripheral nerves. More detail is needed to elucidate the basis of the motor defects, which are the first to appear during KD onset. Here we use transmission electron microscopy (TEM) to focus on the alterations produced by KD in the lower motor system at postnatal day 15 (P15), a nearly asymptomatic stage, and in the juvenile P30 mouse. We find mild effects on motor neuron soma, severe ones on sciatic nerves, and very severe effects on nerve terminals and neuromuscular junctions at P30, with peripheral damage already detectable at P15. Finally, we find that the gastrocnemius muscle undergoes atrophy and structural changes that are independent of denervation at P15. Our data extend the ultrastructural characterization of the KD mouse model and support recent theories of a dying-back mechanism for neuronal degeneration that is independent of demyelination.

10,233 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.
Abstract: Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

10,189 citations
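A hedged sketch of a pyramid pooling module in the spirit of the PSPNet description above: pool the feature map into several bin sizes, reduce channels with 1x1 convolutions, upsample, and concatenate with the input. Bin sizes and channel counts are assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a pyramid pooling module: multi-scale average pooling,
# channel reduction, upsampling, and concatenation with the input features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode='bilinear',
                                align_corners=False) for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)   # global context + local features

x = torch.randn(1, 2048, 60, 60)
print(PyramidPooling(2048)(x).shape)            # torch.Size([1, 4096, 60, 60])
```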


Proceedings Article
28 May 2020
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

10,132 citations
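To make the few-shot setting described above concrete: the model receives a handful of demonstrations purely as text in its prompt and performs the task with no gradient updates. The prompt-building helper below is illustrative only, and the generate() call is a placeholder, not a real API.

```python
# Illustrative sketch: "few-shot" means conditioning on demonstrations in the
# prompt, with no fine-tuning. The model call at the bottom is hypothetical.
def build_few_shot_prompt(instruction, demonstrations, query):
    lines = [instruction, ""]
    for example_input, example_output in demonstrations:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Unscramble the letters to form an English word.",
    [("tac", "cat"), ("odg", "dog")],
    "dirb",
)
print(prompt)
# completion = language_model.generate(prompt)   # hypothetical call
```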


Posted Content
TL;DR: DeepLab as discussed by the authors proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views.
Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

10,120 citations
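A minimal sketch of atrous (dilated) convolution and a simplified ASPP head as described above: parallel 3x3 convolutions with several dilation rates enlarge the field of view without adding parameters, and their outputs are concatenated. Channel sizes and rates here are assumptions.

```python
# Minimal sketch (assumed sizes): atrous convolution and a simplified ASPP head.
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3,
                      padding=r, dilation=r, bias=False)   # enlarged field of view,
            for r in rates)                                 # same number of weights

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 512, 64, 64)
print(SimpleASPP(512, 256)(x).shape)   # torch.Size([1, 1024, 64, 64])
```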


Proceedings Article
01 Jan 2019
TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks.

10,045 citations
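A minimal sketch of the imperative, define-by-run style described above: the model and training loop are ordinary Python, and gradients come from calling backward() on a scalar loss. Sizes and the toy data are illustrative.

```python
# Minimal sketch: an imperative PyTorch training loop on toy data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 4), torch.randn(32, 1)
for step in range(5):                      # regular Python control flow
    loss = ((model(x) - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()                        # reverse-mode automatic differentiation
    optimizer.step()
    print(step, loss.item())
```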


Journal ArticleDOI
TL;DR: The process of conducting a thematic analysis is illustrated through the presentation of an auditable decision trail, guiding interpreting and representing textual data and exploring issues of rigor and trustworthiness.
Abstract: As qualitative research becomes increasingly recognized and valued, it is imperative that it is conducted in a rigorous and methodical manner to yield meaningful and useful results. To be accepted ...

Journal ArticleDOI
TL;DR: In this paper, the authors present results based on full-mission Planck observations of temperature and polarization anisotropies of the CMB, which are consistent with the six-parameter inflationary LCDM cosmology.
Abstract: We present results based on full-mission Planck observations of temperature and polarization anisotropies of the CMB. These data are consistent with the six-parameter inflationary LCDM cosmology. From the Planck temperature and lensing data, for this cosmology we find a Hubble constant, H0= (67.8 +/- 0.9) km/s/Mpc, a matter density parameter Omega_m = 0.308 +/- 0.012 and a scalar spectral index with n_s = 0.968 +/- 0.006. (We quote 68% errors on measured parameters and 95% limits on other parameters.) Combined with Planck temperature and lensing data, Planck LFI polarization measurements lead to a reionization optical depth of tau = 0.066 +/- 0.016. Combining Planck with other astrophysical data we find N_ eff = 3.15 +/- 0.23 for the effective number of relativistic degrees of freedom and the sum of neutrino masses is constrained to < 0.23 eV. Spatial curvature is found to be |Omega_K| < 0.005. For LCDM we find a limit on the tensor-to-scalar ratio of r <0.11 consistent with the B-mode constraints from an analysis of BICEP2, Keck Array, and Planck (BKP) data. Adding the BKP data leads to a tighter constraint of r < 0.09. We find no evidence for isocurvature perturbations or cosmic defects. The equation of state of dark energy is constrained to w = -1.006 +/- 0.045. Standard big bang nucleosynthesis predictions for the Planck LCDM cosmology are in excellent agreement with observations. We investigate annihilating dark matter and deviations from standard recombination, finding no evidence for new physics. The Planck results for base LCDM are in agreement with BAO data and with the JLA SNe sample. However the amplitude of the fluctuations is found to be higher than inferred from rich cluster counts and weak gravitational lensing. Apart from these tensions, the base LCDM cosmology provides an excellent description of the Planck CMB observations and many other astrophysical data sets.

Journal ArticleDOI
TL;DR: In the United States, the cancer death rate has dropped continuously from its peak in 1991 through 2018, for a total decline of 31%, because of reductions in smoking and improvements in early detection and treatment as mentioned in this paper.
Abstract: Each year, the American Cancer Society estimates the numbers of new cancer cases and deaths in the United States and compiles the most recent data on population-based cancer occurrence. Incidence data (through 2017) were collected by the Surveillance, Epidemiology, and End Results Program; the National Program of Cancer Registries; and the North American Association of Central Cancer Registries. Mortality data (through 2018) were collected by the National Center for Health Statistics. In 2021, 1,898,160 new cancer cases and 608,570 cancer deaths are projected to occur in the United States. After increasing for most of the 20th century, the cancer death rate has fallen continuously from its peak in 1991 through 2018, for a total decline of 31%, because of reductions in smoking and improvements in early detection and treatment. This translates to 3.2 million fewer cancer deaths than would have occurred if peak rates had persisted. Long-term declines in mortality for the 4 leading cancers have halted for prostate cancer and slowed for breast and colorectal cancers, but accelerated for lung cancer, which accounted for almost one-half of the total mortality decline from 2014 to 2018. The pace of the annual decline in lung cancer mortality doubled from 3.1% during 2009 through 2013 to 5.5% during 2014 through 2018 in men, from 1.8% to 4.4% in women, and from 2.4% to 5% overall. This trend coincides with steady declines in incidence (2.2%-2.3%) but rapid gains in survival specifically for nonsmall cell lung cancer (NSCLC). For example, NSCLC 2-year relative survival increased from 34% for persons diagnosed during 2009 through 2010 to 42% during 2015 through 2016, including absolute increases of 5% to 6% for every stage of diagnosis; survival for small cell lung cancer remained at 14% to 15%. Improved treatment accelerated progress against lung cancer and drove a record drop in overall cancer mortality, despite slowing momentum for other common cancers.

Journal ArticleDOI
B. P. Abbott, Richard J. Abbott, T. D. Abbott, Matthew Abernathy, and 1,008 more authors (96 institutions)
TL;DR: This is the first direct detection of gravitational waves and the first observation of a binary black hole merger, and these observations demonstrate the existence of binary stellar-mass black hole systems.
Abstract: On September 14, 2015 at 09:50:45 UTC the two detectors of the Laser Interferometer Gravitational-Wave Observatory simultaneously observed a transient gravitational-wave signal. The signal sweeps upwards in frequency from 35 to 250 Hz with a peak gravitational-wave strain of $1.0 \times 10^{-21}$. It matches the waveform predicted by general relativity for the inspiral and merger of a pair of black holes and the ringdown of the resulting single black hole. The signal was observed with a matched-filter signal-to-noise ratio of 24 and a false alarm rate estimated to be less than 1 event per 203 000 years, equivalent to a significance greater than $5.1\sigma$. The source lies at a luminosity distance of $410^{+160}_{-180}$ Mpc corresponding to a redshift $z = 0.09^{+0.03}_{-0.04}$. In the source frame, the initial black hole masses are $36^{+5}_{-4} M_\odot$ and $29^{+4}_{-4} M_\odot$, and the final black hole mass is $62^{+4}_{-4} M_\odot$, with $3.0^{+0.5}_{-0.5} M_\odot c^2$ radiated in gravitational waves. All uncertainties define 90% credible intervals. These observations demonstrate the existence of binary stellar-mass black hole systems. This is the first direct detection of gravitational waves and the first observation of a binary black hole merger.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
Abstract: Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
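A hedged sketch of the core PointNet idea described above: a shared per-point MLP followed by a symmetric max-pooling aggregation, so the global feature is invariant to the ordering of the input points. Layer sizes are assumptions, not the published architecture.

```python
# Hedged sketch: shared per-point MLP + symmetric max pooling, which makes the
# global feature independent of point ordering.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(        # applied independently to each point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU())
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, points):                 # points: (batch, n_points, 3)
        features = self.point_mlp(points)      # (batch, n_points, 256)
        global_feature = features.max(dim=1).values  # symmetric: order-independent
        return self.classifier(global_feature)

pts = torch.randn(2, 1024, 3)
net = TinyPointNet()
# Permuting the input points leaves the output unchanged.
print(torch.allclose(net(pts), net(pts[:, torch.randperm(1024)])))  # True
```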

Proceedings ArticleDOI
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
18 Jun 2018
TL;DR: MobileNetV2 as mentioned in this paper is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers, and the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.
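A hedged sketch of an inverted residual block in the spirit of the MobileNetV2 description above: a 1x1 expansion, a lightweight depthwise 3x3 convolution, and a linear 1x1 projection back to a thin bottleneck, with the shortcut connecting the bottlenecks. Channel sizes and the expansion factor are assumptions, and batch normalization is omitted for brevity.

```python
# Hedged sketch: an inverted residual block (expand -> depthwise -> project),
# with the shortcut between the thin bottleneck layers.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, ch, expansion=6):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, ch, 1, bias=False),   # linear projection: no
        )                                           # non-linearity in the narrow layer

    def forward(self, x):
        return x + self.block(x)                    # shortcut between bottlenecks

x = torch.randn(1, 24, 56, 56)
print(InvertedResidual(24)(x).shape)                # torch.Size([1, 24, 56, 56])
```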

Journal ArticleDOI
02 Jan 2015-BMJ
TL;DR: The PRISMA-P checklist as mentioned in this paper provides 17 items considered to be essential and minimum components of a systematic review or meta-analysis protocol, as well as a model example from an existing published protocol.
Abstract: Protocols of systematic reviews and meta-analyses allow for planning and documentation of review methods, act as a guard against arbitrary decision making during review conduct, enable readers to assess for the presence of selective reporting against completed reviews, and, when made publicly available, reduce duplication of efforts and potentially prompt collaboration. Evidence documenting the existence of selective reporting and excessive duplication of reviews on the same or similar topics is accumulating and many calls have been made in support of the documentation and public availability of review protocols. Several efforts have emerged in recent years to rectify these problems, including development of an international register for prospective reviews (PROSPERO) and launch of the first open access journal dedicated to the exclusive publication of systematic review products, including protocols (BioMed Central's Systematic Reviews). Furthering these efforts and building on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines, an international group of experts has created a guideline to improve the transparency, accuracy, completeness, and frequency of documented systematic review and meta-analysis protocols--PRISMA-P (for protocols) 2015. The PRISMA-P checklist contains 17 items considered to be essential and minimum components of a systematic review or meta-analysis protocol.This PRISMA-P 2015 Explanation and Elaboration paper provides readers with a full understanding of and evidence about the necessity of each item as well as a model example from an existing published protocol. This paper should be read together with the PRISMA-P 2015 statement. Systematic review authors and assessors are strongly encouraged to make use of PRISMA-P when drafting and appraising review protocols.

Journal ArticleDOI
28 Aug 2019-BMJ
TL;DR: The Cochrane risk-of-bias tool has been updated to respond to developments in understanding how bias arises in randomised trials, and to address user feedback on and limitations of the original tool.
Abstract: Assessment of risk of bias is regarded as an essential component of a systematic review on the effects of an intervention. The most commonly used tool for randomised trials is the Cochrane risk-of-bias tool. We updated the tool to respond to developments in understanding how bias arises in randomised trials, and to address user feedback on and limitations of the original tool.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: YOLO9000 as discussed by the authors is a state-of-the-art real-time object detection system that can detect over 9000 object categories in real time using a novel multi-scale training method, offering an easy tradeoff between speed and accuracy.
Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. Using a novel, multi-scale training method the same YOLOv2 model can run at varying sizes, offering an easy tradeoff between speed and accuracy. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. YOLO9000 predicts detections for more than 9000 different object categories, all in real-time.
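To illustrate the multi-scale training idea mentioned above: because the detector is fully convolutional, the input resolution can be changed every few batches by resizing images to a random multiple of the network stride (32). The dataloader and model in the commented loop are placeholders, not the authors' code.

```python
# Illustrative sketch of multi-scale training: periodically resize the batch to
# a random resolution that is a multiple of the network stride.
import random
import torch.nn.functional as F

def resize_batch(images, sizes=range(320, 609, 32)):
    """Resize a (batch, 3, H, W) tensor to a randomly chosen square resolution."""
    target = random.choice(list(sizes))           # e.g. 320, 352, ..., 608
    return F.interpolate(images, size=(target, target),
                         mode='bilinear', align_corners=False)

# for step, (images, targets) in enumerate(dataloader):   # hypothetical loop
#     if step % 10 == 0:
#         images = resize_batch(images)
#     loss = model(images, targets)
```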

Posted Content
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
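A minimal sketch of a clipped surrogate objective in the spirit of the PPO description above, which allows several epochs of minibatch updates on the same sampled data; tensor names and the clipping value are illustrative.

```python
# Minimal sketch: clipped surrogate policy-gradient loss in the spirit of PPO.
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs - old_log_probs)           # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()           # maximize the surrogate

# Toy example: the same minibatch can be reused for several epochs of updates.
logp = torch.randn(64, requires_grad=True)
logp_old = logp.detach() + 0.1 * torch.randn(64)
adv = torch.randn(64)
loss = ppo_clip_loss(logp, logp_old, adv)
loss.backward()
print(loss.item())
```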

Posted Content
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
TL;DR: A new mobile architecture, MobileNetV2, is described that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes and allows decoupling of the input/output domains from the expressiveness of the transformation.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on Imagenet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as the number of parameters.

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski, Eric Vallabh Minikel, Kaitlin E. Samocha, Eric Banks, Timothy Fennell, Anne H. O'Donnell-Luria, James S. Ware, Andrew J. Hill, Beryl B. Cummings, Taru Tukiainen, Daniel G. MacArthur, and colleagues (Exome Aggregation Consortium)
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

Journal ArticleDOI
TL;DR: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year, to survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks.

Posted Content
TL;DR: YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced and a method to jointly train on object detection and classification is proposed, both novel and drawn from prior work.
Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.

Book ChapterDOI
Leslie Lamport
TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
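A minimal sketch of the logical clock rules described above: increment the local counter on each event, attach it to outgoing messages, and on receipt advance the counter past the message timestamp so the ordering respects "happened before". The class and method names are mine, not the paper's notation.

```python
# Minimal sketch of Lamport logical clocks.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):                       # timestamp attached to the outgoing message
        self.time += 1
        return self.time

    def receive(self, message_time):      # C := max(C, T_message) + 1
        self.time = max(self.time, message_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()                              # a's clock becomes 1
print(b.receive(t))                       # b jumps to 2, preserving "happened before"
```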

Journal ArticleDOI
TL;DR: The second Gaia data release, Gaia DR2 as mentioned in this paper, is a major advance with respect to Gaia DR1 in terms of completeness, performance, and richness of the data products.
Abstract: Context. We present the second Gaia data release, Gaia DR2, consisting of astrometry, photometry, radial velocities, and information on astrophysical parameters and variability, for sources brighter than magnitude 21. In addition epoch astrometry and photometry are provided for a modest sample of minor planets in the solar system. Aims: A summary of the contents of Gaia DR2 is presented, accompanied by a discussion on the differences with respect to Gaia DR1 and an overview of the main limitations which are still present in the survey. Recommendations are made on the responsible use of Gaia DR2 results. Methods: The raw data collected with the Gaia instruments during the first 22 months of the mission have been processed by the Gaia Data Processing and Analysis Consortium (DPAC) and turned into this second data release, which represents a major advance with respect to Gaia DR1 in terms of completeness, performance, and richness of the data products. Results: Gaia DR2 contains celestial positions and the apparent brightness in G for approximately 1.7 billion sources. For 1.3 billion of those sources, parallaxes and proper motions are in addition available. The sample of sources for which variability information is provided is expanded to 0.5 million stars. This data release contains four new elements: broad-band colour information in the form of the apparent brightness in the GBP (330-680 nm) and GRP (630-1050 nm) bands is available for 1.4 billion sources; median radial velocities for some 7 million sources are presented; for between 77 and 161 million sources estimates are provided of the stellar effective temperature, extinction, reddening, and radius and luminosity; and for a pre-selected list of 14 000 minor planets in the solar system epoch astrometry and photometry are presented. Finally, Gaia DR2 also represents a new materialisation of the celestial reference frame in the optical, the Gaia-CRF2, which is the first optical reference frame based solely on extragalactic sources. There are notable changes in the photometric system and the catalogue source list with respect to Gaia DR1, and we stress the need to consider the two data releases as independent. Conclusions: Gaia DR2 represents a major achievement for the Gaia mission, delivering on the long standing promise to provide parallaxes and proper motions for over 1 billion stars, and representing a first step in the availability of complementary radial velocity and source astrophysical information for a sample of stars in the Gaia survey which covers a very substantial fraction of the volume of our galaxy.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
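As a hedged sketch of how the embedding space described above is used: once faces are mapped to compact embeddings, verification reduces to thresholding the distance between two embedding vectors. The embedding dimensionality, normalization, and threshold here are illustrative assumptions, and the embedding network itself is not shown.

```python
# Hedged sketch: face verification as a distance threshold between embeddings.
import torch
import torch.nn.functional as F

def same_person(emb_a, emb_b, threshold=1.1):
    emb_a = F.normalize(emb_a, dim=-1)         # unit-length 128-D embeddings
    emb_b = F.normalize(emb_b, dim=-1)
    distance = (emb_a - emb_b).pow(2).sum(-1)  # squared Euclidean distance
    return distance < threshold

emb_a, emb_b = torch.randn(128), torch.randn(128)   # placeholders for network outputs
print(same_person(emb_a, emb_b))
```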

Journal ArticleDOI
TL;DR: Hierarchical and self-consistent orthology annotations are introduced for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution in the STRING database.
Abstract: The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Abstract: Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our nonlocal models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.
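A hedged sketch of a non-local block in the spirit of the description above: the response at each position is a weighted sum of the features at all positions, with weights given by a softmax over pairwise dot-product similarities. Embedding sizes are assumptions.

```python
# Hedged sketch: a non-local block with dot-product similarity over all positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    def __init__(self, ch, inner_ch=None):
        super().__init__()
        inner_ch = inner_ch or ch // 2
        self.theta = nn.Conv2d(ch, inner_ch, 1)
        self.phi = nn.Conv2d(ch, inner_ch, 1)
        self.g = nn.Conv2d(ch, inner_ch, 1)
        self.out = nn.Conv2d(inner_ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)    # (b, hw, inner)
        k = self.phi(x).flatten(2)                      # (b, inner, hw)
        v = self.g(x).flatten(2).transpose(1, 2)        # (b, hw, inner)
        attn = F.softmax(q @ k, dim=-1)                 # similarity of all position pairs
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection

x = torch.randn(1, 64, 14, 14)
print(NonLocalBlock(64)(x).shape)                       # torch.Size([1, 64, 14, 14])
```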

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
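A minimal sketch of the "global" attention variant described above: score every source position against the current decoder state, softmax the scores, and take the weighted sum of source states as the context vector. Shapes are assumptions, and the scoring function here is a plain dot product.

```python
# Minimal sketch: global attention over all source positions.
import torch
import torch.nn.functional as F

def global_attention(decoder_state, encoder_states):
    # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)
    weights = F.softmax(scores, dim=-1)                  # attends to all source words
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights

dec = torch.randn(2, 512)
enc = torch.randn(2, 7, 512)
context, weights = global_attention(dec, enc)
print(context.shape, weights.shape)   # torch.Size([2, 512]) torch.Size([2, 7])
```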