
Showing papers by "Amos Storkey" published in 2015


Proceedings Article
06 Jul 2015
TL;DR: In this paper, the authors teach deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players, achieving state-of-the-art move-prediction accuracy.
Abstract: Mastering the game of Go has remained a longstanding challenge to the field of AI. Modern computer Go programs rely on processing millions of possible future positions to play well, but intuitively a stronger and more 'humanlike' way to play the game would be to rely on pattern recognition rather than brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to 'hard code' symmetries that are expected to exist in the target function, and demonstrate in an ablation study that they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing the previous state of the art on this task by significant margins. Additionally, while previous move prediction systems have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well-known Go program GNU Go and win some games against the state-of-the-art Go program Fuego while using a fraction of the play time.
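The weight-tying idea lends itself to a concrete illustration. Below is a minimal, hypothetical PyTorch sketch (not the authors' code; layer sizes and initialization are assumptions) that hard-codes the eight symmetries of the Go board by averaging each convolution kernel over the dihedral group D4, so a learned pattern detector responds identically to rotated and reflected versions of the same shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class D4SymmetricConv2d(nn.Module):
    """Convolution whose kernels are tied across the 8 symmetries of the
    square (4 rotations x optional reflection), one way to 'hard code'
    board symmetries into the target function."""
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.padding = padding

    def forward(self, x):
        w = self.weight
        # Average each kernel over the dihedral group D4: the 4 rotations
        # of the kernel plus the 4 rotations of its mirror image.
        views = [torch.rot90(w, k, dims=(2, 3)) for k in range(4)]
        flipped = torch.flip(w, dims=(3,))
        views += [torch.rot90(flipped, k, dims=(2, 3)) for k in range(4)]
        w_sym = torch.stack(views).mean(dim=0)
        return F.conv2d(x, w_sym, self.bias, padding=self.padding)

# A 19x19 board encoded as (hypothetically) 8 input feature planes:
layer = D4SymmetricConv2d(in_ch=8, out_ch=16, kernel_size=5, padding=2)
out = layer(torch.randn(1, 8, 19, 19))   # -> shape (1, 16, 19, 19)
```

Averaging the kernel keeps the layer a drop-in replacement for nn.Conv2d; the paper's exact tying scheme may differ in detail.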

149 citations


Posted Content
TL;DR: In this paper, the authors formulate representation learning as a minimax problem in which an adversary tries to predict a sensitive variable from the learned representation; minimizing the adversary's performance ensures that the representation carries little or no information about that variable.
Abstract: In practice, there are often explicit constraints on what representations or decisions are acceptable in an application of machine learning. For example, it may be a legal requirement that a decision must not favour a particular group. Alternatively, it may be that the representation of the data must not contain identifying information. We address these two related issues by learning flexible representations that minimize the capability of an adversarial critic. This adversary tries to predict the relevant sensitive variable from the representation, so minimizing the performance of the adversary ensures there is little or no information in the representation about the sensitive variable. We demonstrate this adversarial approach on two problems: making decisions free from discrimination and removing private information from images. We formulate the adversarial model as a minimax problem, and optimize that minimax objective using a stochastic gradient alternating min-max optimizer. We demonstrate the ability to provide discrimination-free representations for standard test problems, and compare with previous state-of-the-art methods for fairness, showing statistically significant improvement across most cases. The flexibility of this method is shown via a novel problem: removing annotations from images, from unaligned training examples of annotated and unannotated images, and with no a priori knowledge of the form of annotation provided to the model.
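The alternating min-max optimization can be sketched in a few lines. The following is an illustrative toy example (architectures, dimensions, and data are all assumptions, not the paper's setup): an adversary is trained to recover the sensitive variable from the representation, while the encoder and task predictor are trained to both solve the task and defeat the adversary.

```python
import torch
import torch.nn as nn

# Toy dimensions: inputs in R^20, one binary task label y, one binary
# sensitive variable s that the representation should not reveal.
encoder   = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 8))
predictor = nn.Linear(8, 1)     # main task head
adversary = nn.Linear(8, 1)     # tries to recover s from the representation
bce = nn.BCEWithLogitsLoss()

opt_model = torch.optim.SGD(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-2)
opt_adv = torch.optim.SGD(adversary.parameters(), lr=1e-2)

for step in range(1000):
    x = torch.randn(64, 20)                    # synthetic batch
    s = torch.randint(0, 2, (64, 1)).float()   # sensitive variable
    y = torch.randint(0, 2, (64, 1)).float()   # task label

    # Max step: improve the adversary's prediction of s from a frozen z.
    z = encoder(x).detach()
    opt_adv.zero_grad()
    bce(adversary(z), s).backward()
    opt_adv.step()

    # Min step: solve the task while making the adversary's job hard
    # (the minus sign pushes the representation away from encoding s).
    opt_model.zero_grad()
    z = encoder(x)
    loss = bce(predictor(z), y) - bce(adversary(z), s)
    loss.backward()
    opt_model.step()
```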

98 citations


Journal ArticleDOI
TL;DR: The supervised hierarchical Dirichlet process (sHDP), a nonparametric generative model for the joint distribution of a group of observations and a response variable directly associated with that whole group, is proposed and compared with another leading method for regression on grouped data.
Abstract: We propose the supervised hierarchical Dirichlet process (sHDP), a nonparametric generative model for the joint distribution of a group of observations and a response variable directly associated with that whole group. We compare the sHDP with another leading method for regression on grouped data, the supervised latent Dirichlet allocation (sLDA) model. We evaluate our method on two real-world classification problems and two real-world regression problems. Bayesian nonparametric regression models based on the Dirichlet process, such as Dirichlet process-generalised linear models (DP-GLM), have previously been explored; these models allow flexibility in modelling nonlinear relationships. However, until now, hierarchical Dirichlet process (HDP) mixtures have not seen significant use in supervised problems with grouped data, since a straightforward application of the HDP to grouped data results in learnt clusters that are not predictive of the responses. The sHDP solves this problem by allowing clusters to be learnt jointly from the group structure and from the label assigned to each group.
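For readers unfamiliar with the building block here, the following toy numpy sketch shows the truncated stick-breaking construction behind DP and HDP mixtures: global cluster weights are broken off a unit stick, and each group then draws its own weights centred on the global ones. This illustrates the shared-clusters structure only; it is not the sHDP model or its inference, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, truncation=50):
    """Truncated stick-breaking weights of a DP: beta_k ~ Beta(1, alpha),
    pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

# Global cluster weights shared by all groups (top level of the HDP)...
global_pi = stick_breaking(alpha=2.0)
# ...and per-group weights drawn around them; a finite-truncation HDP
# group measure is approximately Dirichlet(alpha0 * global_pi).
group_pi = [rng.dirichlet(5.0 * global_pi) for _ in range(3)]

print(np.round(global_pi[:5], 3))
print(np.round(group_pi[0][:5], 3))
```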

37 citations



Journal ArticleDOI
TL;DR: This study investigates white matter structural connectivity changes associated with amyotrophic lateral sclerosis using network analysis, and compares the results with those obtained using standard voxel-based methods, specifically Tract-based Spatial Statistics (TBSS).
Abstract: Background: This study investigated white matter structural connectivity changes associated with amyotrophic lateral sclerosis (ALS) using network analysis, and compared the results with those obtained using standard voxel-based methods, specifically Tract-based Spatial Statistics (TBSS). Methods: MRI data were acquired from 30 patients with ALS and 30 age-matched healthy controls. For each subject, 85 grey matter regions (network nodes) were identified from high resolution structural MRI, and network connections were formed from the white matter tracts generated by diffusion MRI and probabilistic tractography. Whole-brain networks were constructed using strong constraints on anatomical plausibility and a weighting reflecting tract-averaged fractional anisotropy (FA). Results: Analysis using Network-based Statistics (NBS), without a priori selected regions, identified an impaired motor-frontal-subcortical subnetwork (10 nodes and 12 bidirectional connections), consistent with upper motor neuron pathology, in the ALS group compared with the controls (P = 0.020). Reduced FA in three of the impaired network connections, which involved fibers of the corticospinal tract, correlated with rate of disease progression (P ≤ 0.024). A novel network-tract comparison revealed that the connections involved in the affected network had a strong correspondence (mean overlap of 86.2%) with white matter tracts identified as having reduced FA compared with the control group using TBSS. Conclusion: These findings suggest that white matter degeneration in ALS is strongly linked to the motor cortex, and that impaired structural networks identified using NBS have a strong correspondence to affected white matter tracts identified using more conventional voxel-based methods. J. Magn. Reson. Imaging 2015;41:1342–1352. © 2014 Wiley Periodicals, Inc.
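To make the NBS procedure concrete, here is a simplified, self-contained sketch on synthetic data (node counts, thresholds, and effect sizes are assumptions, and this is not the study's pipeline): edgewise group differences are tested, suprathreshold edges are kept, and the size of the largest connected component is compared against a permutation null.

```python
import numpy as np
import networkx as nx
from scipy import stats

rng = np.random.default_rng(0)
n_nodes, n_sub, t_thresh, n_perm = 20, 30, 2.0, 500

def toy_group(mean_fa):
    """Synthetic symmetric FA-weighted connectivity matrices, one per subject."""
    m = rng.normal(mean_fa, 0.05, size=(n_sub, n_nodes, n_nodes))
    return (m + m.transpose(0, 2, 1)) / 2

controls, patients = toy_group(0.40), toy_group(0.38)   # patients: lower FA

def max_component_edges(a, b):
    """NBS statistic: edgewise t-test, threshold, then the edge count of
    the largest connected suprathreshold component."""
    t, _ = stats.ttest_ind(a, b, axis=0)
    g = nx.from_numpy_array(np.triu(t > t_thresh, k=1).astype(int))
    g.remove_nodes_from(list(nx.isolates(g)))
    if g.number_of_edges() == 0:
        return 0
    return max(g.subgraph(c).number_of_edges()
               for c in nx.connected_components(g))

observed = max_component_edges(controls, patients)

# Permutation null: shuffle group labels, record the max component size.
pooled = np.concatenate([controls, patients])
null = []
for _ in range(n_perm):
    idx = rng.permutation(len(pooled))
    null.append(max_component_edges(pooled[idx[:n_sub]], pooled[idx[n_sub:]]))
p_value = np.mean([s >= observed for s in null])
print(f"largest component: {observed} edges, p = {p_value:.3f}")
```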

33 citations


Proceedings Article
07 Dec 2015
TL;DR: This article proposes a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution and achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
Abstract: Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
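For context, a minimal version of the thermostat family involved can be written down directly. The sketch below implements the standard stochastic-gradient Nosé–Hoover thermostat (SGNHT) on a toy Gaussian target with artificially noisy gradients; the thermostat variable xi adapts to dissipate the injected noise. The covariance-controlled correction that is this paper's contribution is omitted, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: posterior N(0, I) in d dimensions, so grad U(theta) = theta.
d, h, A, noise_std = 2, 1e-2, 1.0, 0.5

def stoch_grad_U(theta):
    # Simulate subsampling error with additive Gaussian noise.
    return theta + noise_std * rng.normal(size=d)

theta, p, xi = np.zeros(d), rng.normal(size=d), A
samples = []
for step in range(100_000):
    p = (p - h * xi * p - h * stoch_grad_U(theta)
         + np.sqrt(2 * A * h) * rng.normal(size=d))
    theta = theta + h * p
    xi = xi + h * (p @ p / d - 1.0)   # thermostat: hold kinetic energy at kT = 1
    if step > 20_000 and step % 10 == 0:
        samples.append(theta.copy())

samples = np.array(samples)
print(samples.mean(axis=0), samples.var(axis=0))   # expect roughly 0 and 1
```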

33 citations


Posted Content
TL;DR: In this paper, a covariance-controlled adaptive Langevin thermostat is proposed to dissipate parameter-dependent noise while maintaining a desired target distribution, which achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
Abstract: Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.

19 citations


Book ChapterDOI
07 Sep 2015
TL;DR: In this paper, the authors consider a generic convex-concave saddle point problem with a separable structure and propose adaptive stepsizes for stochastic block coordinate descent, achieving a sharper linear convergence rate than existing methods.
Abstract: We consider a generic convex-concave saddle point problem with a separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsizes into this framework. We theoretically show that our proposal of adaptive stepsizes potentially achieves a sharper linear convergence rate compared with the existing methods. Additionally, since we can select a "mini-batch" of block coordinates to update, our method is also amenable to parallel processing for large-scale data. We apply the proposed method to regularized empirical risk minimization and show that it performs comparably or, more often, better than state-of-the-art methods on both synthetic and real-world data sets.
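As a concrete (and deliberately simplified) illustration of primal-dual block coordinate updates, the sketch below solves ridge regression through its saddle point formulation, sampling a random block of dual coordinates each iteration. The stepsizes here are fixed rather than adaptive, and the problem, sizes, and constants are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ridge regression as a separable saddle point problem:
#   min_x max_y  y.(Ax - b) - 1/2 ||y||^2 + lam/2 ||x||^2
# (maximizing over y recovers  min_x 1/2 ||Ax - b||^2 + lam/2 ||x||^2).
n, d, lam, block = 200, 50, 0.1, 20
A = rng.normal(size=(n, d)) / np.sqrt(d)
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

x, y = np.zeros(d), np.zeros(n)
m = A.T @ y                   # running value of A^T y, updated incrementally
sigma = tau = 0.3             # fixed stepsizes (the paper adapts these)
for it in range(20_000):
    S = rng.choice(n, size=block, replace=False)    # random dual block
    y_old = y[S].copy()
    # Dual block step: gradient ascent then prox of 1/2 ||y_S||^2.
    y[S] = (y[S] + sigma * (A[S] @ x - b[S])) / (1.0 + sigma)
    m += A[S].T @ (y[S] - y_old)                    # keep A^T y current
    # Primal step: gradient descent then prox of lam/2 ||x||^2.
    x = (x - tau * m) / (1.0 + tau * lam)

x_direct = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
print(np.linalg.norm(x - x_direct))   # should be near zero at convergence
```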

16 citations


Posted Content
TL;DR: This work considers a generic convex-concave saddle point problem with a separable structure, incorporates stochastic block coordinate descent with adaptive stepsizes into the primal-dual framework, and theoretically shows that the proposed adaptive stepsizes potentially achieve a sharper linear convergence rate than existing methods.
Abstract: We consider a generic convex-concave saddle point problem with a separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsizes into this framework. We theoretically show that our proposal of adaptive stepsizes potentially achieves a sharper linear convergence rate compared with the existing methods. Additionally, since we can select a "mini-batch" of block coordinates to update, our method is also amenable to parallel processing for large-scale data. We apply the proposed method to regularized empirical risk minimization and show that it performs comparably or, more often, better than state-of-the-art methods on both synthetic and real-world data sets.

12 citations


Book ChapterDOI
07 Sep 2015
TL;DR: A spectrum of compositional methods, Renyi divergence aggregators, that interpolate between log opinion pools and linear opinion pools is introduced, showing that these compositional methods are maximum entropy distributions for aggregating information from agents subject to individual biases, with the Renyi divergence parameter dependent on the bias.
Abstract: Trading in information markets, such as machine learning markets, has been shown to be an effective approach for aggregating the beliefs of different agents. In a machine learning context, aggregation commonly uses forms of linear opinion pools or logarithmic (log) opinion pools. It is interesting to relate information market aggregation to the machine learning setting. In this paper we introduce a spectrum of compositional methods, Renyi divergence aggregators, that interpolate between log opinion pools and linear opinion pools. We show that these compositional methods are maximum entropy distributions for aggregating information from agents subject to individual biases, with the Renyi divergence parameter dependent on the bias. In the limit of no bias, this reduces to the log opinion pool. We demonstrate this relationship practically on both simulated and real datasets. We then return to information markets and show that Renyi divergence aggregators are directly implemented by machine learning markets with isoelastic utilities, and so can result from autonomous self-interested decision making by individuals contributing different predictors. The risk averseness of the isoelastic utility directly relates to the Renyi divergence parameter, and hence encodes how much an agent believes (s)he may be subject to an individual bias that could affect the trading outcome: if an agent believes (s)he might be acting on significantly biased information, a more risk-averse isoelastic utility is warranted.
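One way to see the interpolation concretely is through a weighted power mean of the agents' distributions, sketched below in numpy. Here beta = 1 recovers the linear opinion pool and beta -> 0 recovers the log opinion pool (weighted geometric mean); note this parameterization is an illustrative assumption, and the paper indexes its aggregators by a Renyi divergence parameter rather than by beta directly.

```python
import numpy as np

def power_mean_pool(dists, weights, beta):
    """Aggregate discrete distributions p_i with weights w_i via
        p(x) proportional to (sum_i w_i * p_i(x)**beta)**(1/beta).
    beta = 1: linear opinion pool; beta -> 0: log opinion pool."""
    dists, weights = np.asarray(dists, float), np.asarray(weights, float)
    if abs(beta) < 1e-8:
        # beta -> 0 limit: weighted geometric mean, computed in log space.
        logp = weights @ np.log(dists)
        p = np.exp(logp - logp.max())
    else:
        p = (weights @ dists**beta) ** (1.0 / beta)
    return p / p.sum()

p1 = np.array([0.7, 0.2, 0.1])   # agent 1's belief
p2 = np.array([0.2, 0.5, 0.3])   # agent 2's belief
w = np.array([0.5, 0.5])
for beta in (1.0, 0.5, 0.0):
    print(beta, np.round(power_mean_pool([p1, p2], w, beta), 3))
```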

6 citations


Posted Content
TL;DR: In this article, a stochastic block coordinate descent method using adaptive primal-dual updates is proposed to solve saddle point problems with a separable structure and non-strongly convex functions.
Abstract: We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method combines the efficiency and flexibility of block coordinate descent methods with the simplicity of primal-dual methods, while exploiting the structure of the separable convex-concave saddle point problem. It is capable of solving a wide range of machine learning applications, including robust principal component analysis, Lasso, and feature selection by group Lasso. Theoretically and empirically, we demonstrate significantly better performance than state-of-the-art methods in all these applications.