Showing papers on "Function (mathematics) published in 2017"

PDF

Open Access

Journal Article•DOI•

On the linear convergence of the alternating direction method of multipliers

[...]

Mingyi Hong¹, Zhi-Quan Luo²•Institutions (2)

Iowa State University¹, The Chinese University of Hong Kong²

01 Mar 2017-Mathematical Programming

TL;DR: This paper establishes the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small.

...read moreread less

Abstract: We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small. Such an error bound condition is satisfied for example when the feasible set is a compact polyhedron and the objective function consists of a smooth strictly convex function composed with a linear mapping, and a nonsmooth $$\ell _1$$l1 regularizer. This result implies the linear convergence of the ADMM for contemporary applications such as LASSO without assuming strong convexity of the objective function.

...read moreread less

705 citations

Journal Article•DOI•

LEVEL: A computer program for solving the radial Schrödinger equation for bound and quasibound levels

[...]

Robert J. Le Roy¹•Institutions (1)

University of Waterloo¹

01 Jan 2017-Journal of Quantitative Spectroscopy & Radiative Transfer

TL;DR: Level as mentioned in this paper can automatically locate the bound and/or quasibounded levels of any smooth single- or double-minimum potential, and calculate inertial rotation and centrifugal distortion constants and various expectation values for those levels.

...read moreread less

Abstract: This paper describes program LEVEL, which can solve the radial or one-dimensional Schrodinger equation and automatically locate either all of, or a selected number of, the bound and/or quasibound levels of any smooth single- or double-minimum potential, and calculate inertial rotation and centrifugal distortion constants and various expectation values for those levels. It can also calculate Franck–Condon factors and other off-diagonal matrix elements, either between levels of a single potential or between levels of two different potentials. The potential energy function may be defined by any one of a number of analytic functions, or by a set of input potential function values which the code will interpolate over and extrapolate beyond to span the desired range.

...read moreread less

674 citations

Journal Article•DOI•

Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations

[...]

Weinan E¹, Weinan E², Jiequn Han², Arnulf Jentzen³•Institutions (3)

Peking University¹, Princeton University², ETH Zurich³

15 Jun 2017

TL;DR: In this article, a new algorithm for solving parabolic partial differential equations and backward stochastic differential equations (BSDEs) in high dimension, which is based on an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of BSDE.

...read moreread less

Abstract: We study a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, which is based on an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the studied algorithm for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen–Cahn equation, the Hamilton–Jacobi–Bellman equation, and a nonlinear pricing model for financial derivatives.

...read moreread less

408 citations

Posted Content•

BAS: Beetle Antennae Search Algorithm for Optimization Problems

[...]

Xiangyuan Jiang¹, Shuai Li²•Institutions (2)

China University of Petroleum¹, Hong Kong Polytechnic University²

30 Oct 2017-arXiv: Neural and Evolutionary Computing

TL;DR: The proposed beetle antennae search algorithm (BAS) imitates the function of antennae and the random walking mechanism of beetles in nature, and then two main steps of detecting and searching are implemented.

...read moreread less

Abstract: Meta-heuristic algorithms have become very popular because of powerful performance on the optimization problem. A new algorithm called beetle antennae search algorithm (BAS) is proposed in the paper inspired by the searching behavior of longhorn beetles. The BAS algorithm imitates the function of antennae and the random walking mechanism of beetles in nature, and then two main steps of detecting and searching are implemented. Finally, the algorithm is benchmarked on 2 well-known test functions, in which the numerical results validate the efficacy of the proposed BAS algorithm.

...read moreread less

276 citations

Journal Article•DOI•

Machine learning quantum phases of matter beyond the fermion sign problem

[...]

Peter Broecker¹, Juan Carrasquilla², Roger G. Melko², Roger G. Melko³, Simon Trebst¹ - Show less +1 more•Institutions (3)

University of Cologne¹, Perimeter Institute for Theoretical Physics², University of Waterloo³

18 Aug 2017-Scientific Reports

TL;DR: In this article, a convolutional neural network (CNN) was used to identify and locate quantum phase transitions in quantum many-fermion systems using auxiliary-field quantum Monte Carlo simulations.

...read moreread less

Abstract: State-of-the-art machine learning techniques promise to become a powerful tool in statistical mechanics via their capacity to distinguish different phases of matter in an automated way. Here we demonstrate that convolutional neural networks (CNN) can be optimized for quantum many-fermion systems such that they correctly identify and locate quantum phase transitions in such systems. Using auxiliary-field quantum Monte Carlo (QMC) simulations to sample the many-fermion system, we show that the Green’s function holds sufficient information to allow for the distinction of different fermionic phases via a CNN. We demonstrate that this QMC + machine learning approach works even for systems exhibiting a severe fermion sign problem where conventional approaches to extract information from the Green’s function, e.g. in the form of equal-time correlation functions, fail.

...read moreread less

272 citations

Proceedings Article•

On Kernelized Multi-armed Bandits.

[...]

Sayak Ray Chowdhury¹, Aditya Gopalan¹•Institutions (1)

Indian Institute of Science¹

17 Jul 2017

TL;DR: In this article, the authors considered the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.

...read moreread less

Abstract: We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization-Improved GP-UCB (IGP-UCB) and GP-Thomson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector- valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out that highlight the favorable gains of the proposed strategies in many cases.

...read moreread less

248 citations

Journal Article•DOI•

DG2: A Faster and More Accurate Differential Grouping for Large-Scale Black-Box Optimization

[...]

Mohammad Nabi Omidvar¹, Ming Yang², Yi Mei³, Xiaodong Li⁴, Xin Yao¹ - Show less +1 more•Institutions (4)

University of Birmingham¹, China University of Geosciences (Wuhan)², Victoria University of Wellington³, RMIT University⁴

25 Apr 2017-IEEE Transactions on Evolutionary Computation

TL;DR: The proposed improved variant of the differential grouping (DG) algorithm, DG2, finds a reliable threshold value by estimating the magnitude of roundoff errors and automatic calculation of its threshold parameter, which makes it parameter-free.

...read moreread less

Abstract: Identification of variable interaction is essential for an efficient implementation of a divide-and-conquer algorithm for large-scale black-box optimization. In this paper, we propose an improved variant of the differential grouping (DG) algorithm, which has a better efficiency and grouping accuracy. The proposed algorithm, DG2, finds a reliable threshold value by estimating the magnitude of roundoff errors. With respect to efficiency, DG2 reuses the sample points that are generated for detecting interactions and saves up to half of the computational resources on fully separable functions. We mathematically show that the new sampling technique achieves the lower bound with respect to the number of function evaluations. Unlike its predecessor, DG2 checks all possible pairs of variables for interactions and has the capacity to identify overlapping components of an objective function. On the accuracy aspect, DG2 outperforms the state-of-the-art decomposition methods on the latest large-scale continuous optimization benchmark suites. DG2 also performs reliably in the presence of imbalance among contribution of components in an objective function. Another major advantage of DG2 is the automatic calculation of its threshold parameter ( $\epsilon $ ), which makes it parameter-free. Finally, the experimental results show that when DG2 is used within a cooperative co-evolutionary framework, it can generate competitive results as compared to several state-of-the-art algorithms.

...read moreread less

243 citations

Posted Content•

Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities

[...]

Guo-Jun Qi

23 Jan 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a loss-sensitive GAN (LS-GAN) is proposed to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses.

...read moreread less

Abstract: In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. Compared with the other GAN models, we will conduct experiments to show both LS-GAN and GLS-GAN exhibit competitive ability in generating new images in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks.

...read moreread less

218 citations

Journal Article•DOI•

Distributed Continuous-Time Algorithm for Constrained Convex Optimizations via Nonsmooth Analysis Approach

[...]

Xianlin Zeng¹, Peng Yi², Yiguang Hong¹•Institutions (2)

Chinese Academy of Sciences¹, University of Toronto²

01 Oct 2017-IEEE Transactions on Automatic Control

TL;DR: In this article, a distributed continuous-time projected algorithm for convex cost functions with local constraints is proposed, in which each agent knows its local cost function and local constraint set, and proves that all the agents of the algorithm can find the same optimal solution.

...read moreread less

Abstract: This technical note studies the distributed optimization problem of a sum of nonsmooth convex cost functions with local constraints. At first, we propose a novel distributed continuous-time projected algorithm, in which each agent knows its local cost function and local constraint set, for the constrained optimization problem. Then we prove that all the agents of the algorithm can find the same optimal solution, and meanwhile, keep the states bounded while seeking the optimal solutions. We conduct a complete convergence analysis by employing nonsmooth Lyapunov functions for the stability analysis of differential inclusions. Finally, we provide a numerical example for illustration.

...read moreread less

198 citations

Journal Article•

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

[...]

Ohad Shamir

01 Jan 2017-Journal of Machine Learning Research

TL;DR: In this paper, a bandit convex optimization with two-point feedback and zero-order stochastic convex optimisation with two function evaluations per round is considered. And the algorithm is based on a small but surprisingly powerful modification of the gradient estimator.

...read moreread less

Abstract: We consider the closely related problems of bandit convex optimization with two-point feedback, and zero-order stochastic convex optimization with two function evaluations per round. We provide a simple algorithm and analysis which is optimal for convex Lipschitz functions. This improves on \cite{dujww13}, which only provides an optimal result for smooth functions; Moreover, the algorithm and analysis are simpler, and readily extend to non-Euclidean problems. The algorithm is based on a small but surprisingly powerful modification of the gradient estimator.

...read moreread less

185 citations

Posted Content•

On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning

[...]

Bolin Gao, Lacra Pavel

03 Apr 2017-arXiv: Optimization and Control

TL;DR: This paper shows that the softmax function is the monotone gradient map of the log-sum-exp function and exploits the inverse temperature parameter to derive the Lipschitz and co-coercivity properties of thesoftmax function.

...read moreread less

Abstract: In this paper, we utilize results from convex analysis and monotone operator theory to derive additional properties of the softmax function that have not yet been covered in the existing literature. In particular, we show that the softmax function is the monotone gradient map of the log-sum-exp function. By exploiting this connection, we show that the inverse temperature parameter determines the Lipschitz and co-coercivity properties of the softmax function. We then demonstrate the usefulness of these properties through an application in game-theoretic reinforcement learning.

...read moreread less

Journal Article•DOI•

Distributed Continuous-Time Convex Optimization With Time-Varying Cost Functions

[...]

Salar Rahili¹, Wei Ren¹•Institutions (1)

University of California, Riverside¹

01 Apr 2017-IEEE Transactions on Automatic Control

TL;DR: A time-varying distributed convex optimization problem is studied for continuous-time multi-agent systems and it is shown that the center of the agents tracks the optimal trajectory, the connectivity of the Agents is maintained, and interagent collision is avoided.

...read moreread less

Abstract: In this paper, a time-varying distributed convex optimization problem is studied for continuous-time multi-agent systems. The objective is to minimize the sum of local time-varying cost functions, each of which is known to only an individual agent, through local interaction. Here, the optimal point is time varying and creates an optimal trajectory. Control algorithms are designed for the cases of single-integrator and double-integrator dynamics. In both cases, a centralized approach is first introduced to solve the optimization problem. Then, this problem is solved in a distributed manner and a discontinuous algorithm based on the signum function is proposed in each case. In the case of single-integrator (respectively, double-integrator) dynamics, each agent relies only on its own position and the relative positions (respectively, positions and velocities) between itself and its neighbors. A gain adaption scheme is introduced in both algorithms to eliminate certain global information requirement. To relax the restricted assumption imposed on feasible cost functions, an estimator based algorithm using the signum function is proposed, where each agent uses dynamic average tracking as a tool to estimate the centralized control input. As a tradeoff, the estimator-based algorithm necessitates communication between neighbors. Then, in the case of double-integrator dynamics, the proposed algorithms are further extended. Two continuous algorithms based on, respectively, a time-varying and a fixed boundary layer are proposed as continuous approximations of the signum function. To account for interagent collision for physical agents, a distributed convex optimization problem with swarm tracking behavior is introduced for both single-integrator and double-integrator dynamics. It is shown that the center of the agents tracks the optimal trajectory, the connectivity of the agents is maintained, and interagent collision is avoided.

...read moreread less

Journal Article•

Information-geometric optimization algorithms: a unifying picture via invariance principles

[...]

Yann Ollivier¹, Ludovic Arnold, Anne Auger², Nikolaus Hansen²•Institutions (2)

Université Paris-Saclay¹, École Polytechnique²

01 Jan 2017-Journal of Machine Learning Research

TL;DR: A canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space X into a continuous-time black-box optimization method on X, the information-geometric optimization (IGO) method, which achieves maximal invariance properties.

...read moreread less

Abstract: We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space X into a continuous-time black-box optimization method on X, the information-geometric optimization (IGO) method. Invariance as a major design principle keeps the number of arbitrary choices to a minimum. The resulting IGO flow is the flow of an ordinary differential equation conducting the natural gradient ascent of an adaptive, time-dependent transformation of the objective function. It makes no particular assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. It naturally recovers versions of known algorithms and offers a systematic way to derive new ones. In continuous search spaces, IGO algorithms take a form related to natural evolution strategies (NES). The cross-entropy method is recovered in a particular case with a large time step, and can be extended into a smoothed, parametrization-independent maximum likelihood update (IGO-ML). When applied to the family of Gaussian distributions on Rd, the IGO framework recovers a version of the well-known CMA-ES algorithm and of xNES. For the family of Bernoulli distributions on {0, 1}d, we recover the seminal PBIL algorithm and cGA. For the distributions of restricted Boltzmann machines, we naturally obtain a novel algorithm for discrete optimization on {0, 1}d. All these algorithms are natural instances of, and unified under, the single information-geometric optimization framework. The IGO method achieves, thanks to its intrinsic formulation, maximal invariance properties: invariance under reparametrization of the search space X, under a change of parameters of the probability distribution, and under increasing transformation of the function to be optimized. The latter is achieved through an adaptive, quantile-based formulation of the objective. Theoretical considerations strongly suggest that IGO algorithms are essentially characterized by a minimal change of the distribution over time. Therefore they have minimal loss in diversity through the course of optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. As a simple consequence, IGO seems to provide, from information theory, an elegant way to simultaneously explore several valleys of a fitness landscape in a single run.

...read moreread less

Proceedings Article•DOI•

The capacity of cache aided private information retrieval

[...]

Ravi Tandon¹•Institutions (1)

University of Arizona¹

21 Jun 2017

TL;DR: In this article, the capacity of cache-enabled private information retrieval (PIR) was characterized as a function of the storage parameter S and the information-theoretically optimal download cost was shown to be (1 − S/K) where S ∊ (0, K).

...read moreread less

Abstract: The problem of cache enabled private information retrieval (PIR) is considered in which a user wishes to privately retrieve one out of K messages, each of size L bits from N distributed databases. The user has a local cache of storage SL bits which can be used to store any function of the K messages. The main contribution of this work is the exact characterization of the capacity of cache enabled PIR as a function of the storage parameter S. In particular, for a given cache storage parameter S, the information-theoretically optimal download cost D∗(S)/L (or the inverse of capacity) is shown to be equal to (1 − S/K)(1 + 1/N + … + 1/NK−1). Special cases of this result correspond to the settings when S = 0, for which the optimal download cost was shown by Sun and Jafar to be (1 + 1/N + … + 1/NK−1), and the case when S = K, i.e., cache size is large enough to store all messages locally, for which the optimal download cost is 0. The intermediate points S ∊ (0, K) can be readily achieved through a simple memory-sharing based PIR scheme. The key technical contribution of this work is the converse, i.e., a lower bound on the download cost as a function of storage S which shows that memory sharing is information-theoretically optimal.

...read moreread less

Posted Content•

Joint Distribution Optimal Transportation for Domain Adaptation

[...]

Nicolas Courty¹, Rémi Flamary, Amaury Habrard, Alain Rakotomamonjy•Institutions (1)

University of Paris-Sud¹

24 May 2017-arXiv: Machine Learning

TL;DR: This paper proposes a solution of the unsupervised domain adaptation problem with optimal transport, that allows to recover an estimated target $\mathcal{P}^f_t=(X,f(X))$ by optimizing simultaneously the optimal coupling and $f$.

...read moreread less

Abstract: This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function $f$ in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domain $\mathcal{P}_s$ and $\mathcal{P}_t$ We propose a solution of this problem with optimal transport, that allows to recover an estimated target $\mathcal{P}^f_t=(X,f(X))$ by optimizing simultaneously the optimal coupling and $f$ We show that our method corresponds to the minimization of a bound on the target error, and provide an efficient algorithmic solution, for which convergence is proved The versatility of our approach, both in terms of class of hypothesis or loss functions is demonstrated with real world classification and regression problems, for which we reach or surpass state-of-the-art results

...read moreread less

Journal Article•DOI•

Non-Convex Distributed Optimization

[...]

Tatiana Tatarenko¹, Behrouz Touri²•Institutions (2)

Technische Universität Darmstadt¹, University of Colorado Boulder²

05 Jan 2017-IEEE Transactions on Automatic Control

TL;DR: In this article, the authors study distributed non-convex optimization on a time-varying multi-agent network, where each node has access to its own smooth local cost function, and the collective goal is to minimize the sum of these functions.

...read moreread less

Abstract: We study distributed non-convex optimization on a time-varying multi-agent network. Each node has access to its own smooth local cost function, and the collective goal is to minimize the sum of these functions. The perturbed push-sum algorithm was previously used for convex distributed optimization. We generalize the result obtained for the convex case to the case of non-convex functions. Under some additional technical assumptions on the gradients we prove the convergence of the distributed push-sum algorithm to some critical point of the objective function. By utilizing perturbations on the update process, we show the almost sure convergence of the perturbed dynamics to a local minimum of the global objective function, if the objective function has no saddle points. Our analysis shows that this perturbed procedure converges at a rate of $O(1/t)$ .

...read moreread less

Journal Article•DOI•

Discrete-Time Deterministic $Q$ -Learning: A Novel Convergence Analysis

[...]

Qinglai Wei¹, Frank L. Lewis², Qiuye Sun³, Pengfei Yan¹, Ruizhuo Song⁴ - Show less +1 more•Institutions (4)

Chinese Academy of Sciences¹, University of Texas at Arlington², Northeastern University (China)³, University of Science and Technology Beijing⁴

01 May 2017-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: A novel discrete-time deterministic deterministic inline-formula-learning algorithm is developed and the convergence criterion for the discounted case is established, and the iterative control law of the developed algorithm is simplified.

...read moreread less

Abstract: In this paper, a novel discrete-time deterministic $ Q$ -learning algorithm is developed. In each iteration of the developed $ Q$ -learning algorithm, the iterative $ Q$ function is updated for all the state and control spaces, instead of updating for a single state and a single control in traditional $ Q$ -learning algorithm. A new convergence criterion is established to guarantee that the iterative $ Q$ function converges to the optimum, where the convergence criterion of the learning rates for traditional $ Q$ -learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative $ Q$ function are analyzed to obtain the convergence criterion, instead of analyzing the iterative $ Q$ function itself. For convenience of analysis, the convergence properties for undiscounted case of the deterministic $ Q$ -learning algorithm are first developed. Then, considering the discounted factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative $ Q$ function and compute the iterative control law, respectively, for facilitating the implementation of the deterministic $ Q$ -learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.

...read moreread less

Journal Article•DOI•

Saddle-point dynamics: conditions for asymptotic stability of saddle points

[...]

Ashish Cherukuri¹, Bahman Gharesifard², Jorge E. Cortes¹•Institutions (2)

University of California, San Diego¹, Queen's University²

22 Feb 2017-Siam Journal on Control and Optimization

TL;DR: In this article, the authors consider continuous differentiable functions with min-max saddle points and study the asymptotic convergence properties of the associated saddle-point dynamics (gradient descent in the first variable and gradient ascent in the second one).

...read moreread less

Abstract: This paper considers continuously differentiable functions of two vector variables that have (possibly a continuum of) min-max saddle points. We study the asymptotic convergence properties of the associated saddle-point dynamics (gradient descent in the first variable and gradient ascent in the second one). We identify a suite of complementary conditions under which the set of saddle points is asymptotically stable under the saddle-point dynamics. Our first set of results is based on the convexity-concavity of the function defining the saddle-point dynamics to establish the convergence guarantees. For functions that do not enjoy this feature, our second set of results relies on properties of the linearization of the dynamics, the function along the proximal normals to the saddle set, and the linearity of the function in one variable. We also provide global versions of the asymptotic convergence results. Various examples illustrate our discussion.

...read moreread less

Proceedings Article•DOI•

Joint Distribution Optimal Transportation for Domain Adaptation

[...]

Nicolas Courty¹, Rémi Flamary, Amaury Habrard, Alain Rakotomamonjy•Institutions (1)

University of Paris-Sud¹

24 May 2017

TL;DR: In this article, a non-linear transformation between the joint feature/label space distributions of the two domains P s and P t can be estimated with optimal transport, which corresponds to the minimization of a bound on the target error.

...read moreread less

Abstract: This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domain P s and P t that can be estimated with optimal transport. We propose a solution of this problem that allows to recover an estimated target P f t = (X, f (X)) by optimizing simultaneously the optimal coupling and f. We show that our method corresponds to the minimization of a bound on the target error, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of class of hypothesis or loss functions is demonstrated with real world classification and regression problems, for which we reach or surpass state-of-the-art results.

...read moreread less

Posted Content•

SGD Learns the Conjugate Kernel Class of the Network

[...]

Amit Daniely¹•Institutions (1)

Google¹

27 Feb 2017-arXiv: Learning

TL;DR: For log-depth networks, this article showed that SGD can learn constant degree polynomials in polynomial time, with bounded coefficients in the conjugate kernel space.

...read moreread less

Abstract: We show that the standard stochastic gradient decent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth more that two. As corollaries, it follows that for neural networks of any depth between $2$ and $\log(n)$, SGD is guaranteed to learn, in polynomial time, constant degree polynomials with polynomially bounded coefficients. Likewise, it follows that SGD on large enough networks can learn any continuous function (not in polynomial time), complementing classical expressivity results.

...read moreread less

Proceedings Article•

Geometry of neural network loss surfaces via random matrix theory

[...]

Jeffrey Pennington¹, Yasaman Bahri¹•Institutions (1)

Google¹

06 Aug 2017

TL;DR: An analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy are introduced.

...read moreread less

Abstract: Understanding the geometry of neural network loss surfaces is important for the development of improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and another key parameter, ϕ, which measures the ratio of parameters to data points. Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function 1/2(1 - ϕ)2.

...read moreread less

Journal Article•DOI•

3D computer-generated holography by non-convex optimization

[...]

Jingzhao Zhang¹, Nicolas C. Pégard¹, Jingshan Zhong¹, Hillel Adesnik¹, Laura Waller¹ - Show less +1 more•Institutions (1)

University of California, Berkeley¹

20 Oct 2017

TL;DR: A new non-convex optimization algorithm is proposed that computes holograms by minimizing a custom cost function that is tailored to particular applications or leverages additional information like sample shape and nonlinearity.

...read moreread less

Abstract: 3D computer-generated holography uses a digital phase mask to shape the wavefront of a laser beam into a user-specified 3D intensity pattern. Algorithms take the target 3D intensity as input and compute the hologram that generates it. However, arbitrary patterns are generally infeasible, so solutions are approximate and often sub-optimal. Here, we propose a new non-convex optimization algorithm that computes holograms by minimizing a custom cost function that is tailored to particular applications (e.g., lithography, neural photostimulation) or leverages additional information like sample shape and nonlinearity. Our method is robust and accurate, and it out-performs existing algorithms.

...read moreread less

Journal Article•DOI•

Optimal Approximation for Submodular and Supermodular Optimization with Bounded Curvature

[...]

Maxim Sviridenko, Jan Vondrák, Justin Ward

16 May 2017-Mathematics of Operations Research

TL;DR: In this paper, a (1 − c/e)-approximation algorithm was proposed for the problem of maximizing a monotone increasing submodular function subject to a single matroid constraint.

...read moreread less

Abstract: We design new approximation algorithms for the problems of optimizing submodular and supermodular functions subject to a single matroid constraint. Specifically, we consider the case in which we wish to maximize a monotone increasing submodular function or minimize a monotone decreasing supermodular function with a bounded total curvature c. Intuitively, the parameter c represents how nonlinear a function f is: when c = 0, f is linear, while for c = 1, f may be an arbitrary monotone increasing submodular function. For the case of submodular maximization with total curvature c, we obtain a (1 − c/e)-approximation—the first improvement over the greedy algorithm of of Conforti and Cornuejols from 1984, which holds for a cardinality constraint, as well as a recent analogous result for an arbitrary matroid constraint. Our approach is based on modifications of the continuous greedy algorithm and nonoblivious local search, and allows us to approximately maximize the sum of a nonnegative, monotone increasing subm...

...read moreread less

Proceedings Article•

Sobolev Training for Neural Networks

[...]

Wojciech Marian Czarnecki¹, Simon Osindero², Max Jaderberg¹, Grzegorz Swirszcz¹, Razvan Pascanu¹ - Show less +1 more•Institutions (2)

Google¹, Yahoo!²

15 Jun 2017

TL;DR: Sobolev Training for neural networks is introduced, which is a method for incorporating target derivatives in addition the to target values while training, and results in models with higher accuracy and stronger generalisation on three distinct domains.

...read moreread less

Abstract: At the heart of deep learning we aim to use neural networks as function approximators - training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output pairs from the ground truth, however it is becoming more common to have access to derivatives of the target output with respect to the input -- for example when the ground truth function is itself a neural network such as in network compression or distillation. Generally these target derivatives are not computed, or are ignored. This paper introduces Sobolev Training for neural networks, which is a method for incorporating these target derivatives in addition the to target values while training. By optimising neural networks to not only approximate the function’s outputs but also the function’s derivatives we encode additional information about the target function within the parameters of the neural network. Thereby we can improve the quality of our predictors, as well as the data-efficiency and generalization capabilities of our learned function approximation. We provide theoretical justifications for such an approach as well as examples of empirical evidence on three distinct domains: regression on classical optimisation datasets, distilling policies of an agent playing Atari, and on large-scale applications of synthetic gradients. In all three domains the use of Sobolev Training, employing target derivatives in addition to target values, results in models with higher accuracy and stronger generalisation.

...read moreread less

Journal Article•DOI•

A Novel Improved Accuracy Function for Interval Valued Pythagorean Fuzzy Sets and Its Applications in the Decision-Making Process

[...]

Harish Garg¹•Institutions (1)

Thapar University¹

01 Dec 2017-International Journal of Intelligent Systems

TL;DR: An improved accuracy function for the ranking order of interval‐valued Pythagorean fuzzy sets (IVPFSs) is presented and multicriteria decision‐making method has been proposed for finding the desirable alternative(s).

...read moreread less

Abstract: The objective of this work is to present an improved accuracy function for the ranking order of interval-valued Pythagorean fuzzy sets (IVPFSs). Shortcomings of the existing score and accuracy functions in interval-valued Pythagorean environment have been overcome by the proposed accuracy function. In the proposed function, degree of hesitation between the element of IVPFS has been taken into account during the analysis. Based on it, multicriteria decision-making method has been proposed for finding the desirable alternative(s). Finally, an illustrative example for solving the decision-making problem has been presented to demonstrate application of the proposed approach.

...read moreread less

Proceedings Article•DOI•

Mimicking Word Embeddings using Subword RNNs

[...]

Yuval Pinter¹, Robert Guthrie, Jacob Eisenstein¹•Institutions (1)

Georgia Institute of Technology¹

01 Sep 2017

TL;DR: This article proposed MIMICK, an approach to generate OOV word embeddings compositionally by learning a function from spellings to distributional embedding, which does not require re-training on the original word embedding corpus.

...read moreread less

Abstract: Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low resource settings.

...read moreread less

Journal Article•DOI•

A unified approach to error bounds for structured convex optimization problems

[...]

Zirui Zhou¹, Anthony Man-Cho So¹•Institutions (1)

The Chinese University of Hong Kong¹

25 Jan 2017-Mathematical Programming

TL;DR: In this paper, the authors present a new framework for establishing error bounds for a class of structured convex optimization problems, in which the objective function is the sum of a smooth convex function and a general closed proper function.

...read moreread less

Abstract: Error bounds, which refer to inequalities that bound the distance of vectors in a test set to a given set by a residual function, have proven to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems. In this paper, we present a new framework for establishing error bounds for a class of structured convex optimization problems, in which the objective function is the sum of a smooth convex function and a general closed proper convex function. Such a class encapsulates not only fairly general constrained minimization problems but also various regularized loss minimization formulations in machine learning, signal processing, and statistics. Using our framework, we show that a number of existing error bound results can be recovered in a unified and transparent manner. To further demonstrate the power of our framework, we apply it to a class of nuclear-norm regularized loss minimization problems and establish a new error bound for this class under a strict complementarity-type regularity condition. We then complement this result by constructing an example to show that the said error bound could fail to hold without the regularity condition. We believe that our approach will find further applications in the study of error bounds for structured convex optimization problems.

...read moreread less

Journal Article•

Distributed Learning with Regularized Least Squares

[...]

Shaobo Lin, Xin Guo, Ding-Xuan Zhou

01 Jan 2017-Journal of Machine Learning Research

TL;DR: In this article, the authors study distributed learning with the least squares regularization scheme in a reproducing kernel Hilbert space (RKHS) and show that the global output function of this distributed learning is a good approximation to the algorithm processing the whole data in one single machine.

...read moreread less

Abstract: We study distributed learning with the least squares regularization scheme in a reproducing kernel Hilbert space (RKHS). By a divide-and-conquer approach, the algorithm partitions a data set into disjoint data subsets, applies the least squares regularization scheme to each data subset to produce an output function, and then takes an average of the individual output functions as a final global estimator or predictor. We show with error bounds in expectation in both the $L^2$-metric and RKHS-metric that the global output function of this distributed learning is a good approximation to the algorithm processing the whole data in one single machine. Our error bounds are sharp and stated in a general setting without any eigenfunction assumption. The analysis is achieved by a novel second order decomposition of operator differences in our integral operator approach. Even for the classical least squares regularization scheme in the RKHS associated with a general kernel, we give the best learning rate in the literature.

...read moreread less

Proceedings Article•

Multi-Information Source Optimization

[...]

Matthias Poloczek¹, Jialei Wang¹, Peter I. Frazier¹•Institutions (1)

Cornell University¹

01 Jan 2017

TL;DR: This work presents a novel algorithm that provides a rigorous mathematical treatment of the uncertainties arising from model discrepancies and noisy observations, and conducts an experimental evaluation that demonstrates that the method consistently outperforms other state-of-the-art techniques.

...read moreread less

Abstract: We consider Bayesian methods for multi-information source optimization (MISO), in which we seek to optimize an expensive-to-evaluate black-box objective function while also accessing cheaper but biased and noisy approximations ("information sources"). We present a novel algorithm that outperforms the state of the art for this problem by using a Gaussian process covariance kernel better suited to MISO than those used by previous approaches, and an acquisition function based on a one-step optimality analysis supported by efficient parallelization. We also provide a novel technique to guarantee the asymptotic quality of the solution provided by this algorithm. Experimental evaluations demonstrate that this algorithm consistently finds designs of higher value at less cost than previous approaches.

...read moreread less

Journal Article•DOI•

A non-planar two-loop three-point function beyond multiple polylogarithms

[...]

Andreas von Manteuffel¹, Andreas von Manteuffel², Lorenzo Tancredi³•Institutions (3)

Michigan State University¹, University of Mainz², Karlsruhe Institute of Technology³

01 Jun 2017-Journal of High Energy Physics

TL;DR: In this article, the analytic calculation of a non-planar three-point function which contributes to the two-loop amplitudes for gluon fusion through a massive top-quark loop is considered.

...read moreread less

Abstract: We consider the analytic calculation of a two-loop non-planar three-point function which contributes to the two-loop amplitudes for $$ t\overline{t} $$ production and γγ production in gluon fusion through a massive top-quark loop. All subtopology integrals can be written in terms of multiple polylogarithms over an irrational alphabet and we employ a new method for the integration of the differential equations which does not rely on the rationalization of the latter. The top topology integrals, instead, in spite of the absence of a massive three-particle cut, cannot be evaluated in terms of multiple polylogarithms and require the introduction of integrals over complete elliptic integrals and polylogarithms. We provide one-fold integral representations for the solutions and continue them analytically to all relevant regions of the phase space in terms of real functions, extracting all imaginary parts explicitly. The numerical evaluation of our expressions becomes straightforward in this way.

...read moreread less

Collapse