scispace - formally typeset
Search or ask a question
Author

Brendan O'Donoghue

Other affiliations: Stanford University
Bio: Brendan O'Donoghue is an academic researcher from Google. The author has contributed to research in topics: Reinforcement learning & Convex optimization. The author has an hindex of 22, co-authored 48 publications receiving 4408 citations. Previous affiliations of Brendan O'Donoghue include Stanford University.

Papers
More filters
Journal ArticleDOI
TL;DR: A novel deep learning architecture performs device-independent tissue segmentation of clinical 3D retinal images followed by separate diagnostic classification that meets or exceeds human expert clinical diagnoses of retinal disease.
Abstract: The volume and complexity of diagnostic imaging is increasing at a pace faster than the availability of human expertise to interpret it. Artificial intelligence has shown great promise in classifying two-dimensional photographs of some common diseases and typically relies on databases of millions of annotated images. Until now, the challenge of reaching the performance of expert clinicians in a real-world clinical pathway with three-dimensional diagnostic scans has remained unsolved. Here, we apply a novel deep learning architecture to a clinically heterogeneous set of three-dimensional optical coherence tomography scans from patients referred to a major eye hospital. We demonstrate performance in making a referral recommendation that reaches or exceeds that of experts on a range of sight-threatening retinal diseases after training on only 14,884 scans. Moreover, we demonstrate that the tissue segmentations produced by our architecture act as a device-independent representation; referral accuracy is maintained when using tissue segmentations from a different type of device. Our work removes previous barriers to wider clinical use without prohibitive training data requirements across multiple pathologies in a real-world setting.

1,665 citations

Journal ArticleDOI
TL;DR: This paper considers accelerated variants of two common alternating direction methods: the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA), of the form first proposed by Nesterov for gradient descent methods.
Abstract: Alternating direction methods are a common tool for general mathematical programming and optimization. These methods have become particularly important in the field of variational image processing, which frequently requires the minimization of nondifferentiable objectives. This paper considers accelerated (i.e., fast) variants of two common alternating direction methods: the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA). The proposed acceleration is of the form first proposed by Nesterov for gradient descent methods. In the case that the objective function is strongly convex, global convergence bounds are provided for both classical and accelerated variants of the methods. Numerical examples are presented to demonstrate the superior performance of the fast methods for a wide variety of problems.

772 citations

Journal ArticleDOI
TL;DR: In this paper, a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes is proposed. But it is not known whether the adaptive restart interval is proportional to the square root of the local condition number of the objective function.
Abstract: In this paper we introduce a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes exhibit two modes of behavior depending on how much momentum is applied at each iteration. In what we refer to as the `high momentum' regime the iterates generated by an accelerated gradient scheme exhibit a periodic behavior, where the period is proportional to the square root of the local condition number of the objective function. Separately, it is known that the optimal restart interval is proportional to this same quantity. This suggests a restart technique whereby we reset the momentum whenever we observe periodic behavior. We provide a heuristic analysis that suggests that in many cases adaptively restarting allows us to recover the optimal rate of convergence with no prior knowledge of function parameters.

664 citations

Journal ArticleDOI
TL;DR: In this article, the alternating directions method of multipliers is used to solve the homogeneous self-dual embedding, an equivalent feasibility problem involving finding a nonzero point in the intersection of a subspace and a cone.
Abstract: We introduce a first-order method for solving very large convex cone programs. The method uses an operator splitting method, the alternating directions method of multipliers, to solve the homogeneous self-dual embedding, an equivalent feasibility problem involving finding a nonzero point in the intersection of a subspace and a cone. This approach has several favorable properties. Compared to interior-point methods, first-order methods scale to very large problems, at the cost of requiring more time to reach very high accuracy. Compared to other first-order methods for cone programs, our approach finds both primal and dual solutions when available or a certificate of infeasibility or unboundedness otherwise, is parameter free, and the per-iteration cost of the method is the same as applying a splitting method to the primal or dual alone. We discuss efficient implementation of the method in detail, including direct and indirect methods for computing projection onto the subspace, scaling the original problem data, and stopping criteria. We describe an open-source implementation, which handles the usual (symmetric) nonnegative, second-order, and semidefinite cones as well as the (non-self-dual) exponential and power cones and their duals. We report numerical results that show speedups over interior-point cone solvers for large problems, and scaling to very large general cone programs.

597 citations

Posted Content
TL;DR: In this paper, the authors use adversarial risk as an objective, although it cannot easily be computed exactly, and frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risks.
Abstract: This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness The existence of adversarial examples in trained neural networks reflects the fact that expected risk alone does not capture the model's performance against worst-case inputs We motivate the use of adversarial risk as an objective, although it cannot easily be computed exactly We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk This suggests that models may be obscured to adversaries, by optimizing this surrogate rather than the true adversarial risk We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero Our hope is that our formulations and results will help researchers to develop more powerful defenses

207 citations


Cited by
More filters
Book
27 Nov 2013
TL;DR: The many different interpretations of proximal operators and algorithms are discussed, their connections to many other topics in optimization and applied mathematics are described, some popular algorithms are surveyed, and a large number of examples of proxiesimal operators that commonly arise in practice are provided.
Abstract: This monograph is about a class of optimization algorithms called proximal algorithms. Much like Newton's method is a standard tool for solving unconstrained smooth optimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but are especially well-suited to problems of substantial recent interest involving large or high-dimensional datasets. Proximal methods sit at a higher level of abstraction than classical algorithms like Newton's method: the base operation is evaluating the proximal operator of a function, which itself involves solving a small convex optimization problem. These subproblems, which generalize the problem of projecting a point onto a convex set, often admit closed-form solutions or can be solved very quickly with standard or simple specialized methods. Here, we discuss the many different interpretations of proximal operators and algorithms, describe their connections to many other topics in optimization and applied mathematics, survey some popular algorithms, and provide a large number of examples of proximal operators that commonly arise in practice.

3,627 citations

Posted Content
TL;DR: In this paper, a simple warm restart technique for stochastic gradient descent was proposed to improve its anytime performance when training deep neural networks, which achieved state-of-the-art results on both the CIFAR-10 and CifAR-100 datasets.
Abstract: Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results at 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at this https URL

3,497 citations

Posted Content
TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.
Abstract: Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.

3,141 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of the state-of-the-art MEC research with a focus on joint radio-and-computational resource management is provided in this paper, where a set of issues, challenges, and future research directions for MEC are discussed.
Abstract: Driven by the visions of Internet of Things and 5G communications, recent years have seen a paradigm shift in mobile computing, from the centralized mobile cloud computing toward mobile edge computing (MEC). The main feature of MEC is to push mobile computing, network control and storage to the network edges (e.g., base stations and access points) so as to enable computation-intensive and latency-critical applications at the resource-limited mobile devices. MEC promises dramatic reduction in latency and mobile energy consumption, tackling the key challenges for materializing 5G vision. The promised gains of MEC have motivated extensive efforts in both academia and industry on developing the technology. A main thrust of MEC research is to seamlessly merge the two disciplines of wireless communications and mobile computing, resulting in a wide-range of new designs ranging from techniques for computation offloading to network architectures. This paper provides a comprehensive survey of the state-of-the-art MEC research with a focus on joint radio-and-computational resource management. We also discuss a set of issues, challenges, and future research directions for MEC research, including MEC system deployment, cache-enabled MEC, mobility management for MEC, green MEC, as well as privacy-aware MEC. Advancements in these directions will facilitate the transformation of MEC from theory to practice. Finally, we introduce recent standardization efforts on MEC as well as some typical MEC application scenarios.

2,992 citations