
Showing papers on "Function (mathematics) published in 2018"


Journal ArticleDOI
TL;DR: This study proposes two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). It also suggests that the more traditional approach of on-policy learning with eligibility traces and softmax action selection, instead of experience replay, can be competitive with DQN without the need for a separate target network.
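As a point of reference, here is a minimal NumPy sketch of the two activations as defined above (SiLU is x·sigmoid(x) and dSiLU is its derivative); the function names are ours, not the paper's:

```python
import numpy as np

def silu(x):
    """Sigmoid-weighted linear unit: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def dsilu(x):
    """Derivative of SiLU, used as an activation in its own right:
    sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 + x * (1.0 - s))
```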

696 citations


Proceedings ArticleDOI
15 Feb 2018
TL;DR: This paper explores the structure of neural loss functions and the effect of loss landscapes on generalization using a range of visualization methods, and examines how network architecture affects the loss landscape and how training parameters affect the shape of minimizers.
Abstract: Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effect on the underlying loss landscape, is not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature, and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
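A hedged PyTorch sketch of the "filter normalization" idea described above: draw a random direction with the same shape as the trained weights and rescale each filter of the direction to match the norm of the corresponding filter of the weights (treatment of biases and normalization layers is simplified here and may differ from the released code):

```python
import torch

def filter_normalized_direction(model):
    """Random direction in parameter space, rescaled filter-by-filter so that
    each filter of the direction has the same norm as the corresponding
    filter of the trained model (a simplified reading of filter normalization)."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:                    # conv / linear weights: per-filter rescale
            for d_filt, p_filt in zip(d, p):
                d_filt.mul_(p_filt.norm() / (d_filt.norm() + 1e-10))
        else:                              # 1-D params (biases, BN): match overall norm
            d.mul_(p.norm() / (d.norm() + 1e-10))
        direction.append(d)
    return direction
```

The loss surface is then visualized by evaluating the loss at the trained weights perturbed along one or two such normalized directions.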

554 citations


Journal ArticleDOI
TL;DR: In this article, a new fractional derivative with respect to another function is introduced, the so-called ψ-Hilfer fractional derivative, which can be used to obtain results on uniformly convergent sequences of functions and uniformly continuous functions, with examples including the one-parameter Mittag-Leffler function.

485 citations


Journal ArticleDOI
Guannan Qu1, Na Li1
TL;DR: It is shown that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information even if the objective function is strongly convex and smooth, and a novel gradient estimation scheme is proposed that uses history information to achieve fast and accurate estimation of the average gradient.
Abstract: There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate $O(\frac{\log t}{\sqrt{t}})$ as the centralized (sub)gradient methods (CGD), when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, under the framework of DGD, it is unclear how to harness the smoothness to obtain a faster convergence rate comparable to CGD's convergence rate. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, can effectively harness the function smoothness and converge to the optimum with a rate of $O(\frac{1}{t})$ . If the objective function is further strongly convex, our algorithm has a linear convergence rate. Both rates match the convergence rate of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve fast and accurate estimation of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information even if the objective function is strongly convex and smooth.
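A hedged NumPy sketch of a gradient-tracking scheme of the kind described above, in which each agent maintains an estimate of the average gradient and corrects it with the difference of its consecutive local gradients (one common form; the step size, mixing matrix, and iteration count here are illustrative):

```python
import numpy as np

def gradient_tracking(W, local_grads, x0, eta=0.01, iters=1000):
    """W           : (N, N) doubly stochastic mixing matrix of the network
       local_grads : list of N callables, local_grads[i](x) = gradient of f_i at x
       x0          : (N, d) initial local iterates."""
    x = x0.copy()
    g = np.stack([local_grads[i](x[i]) for i in range(len(local_grads))])
    s = g.copy()                      # s_0: each agent starts from its own local gradient
    for _ in range(iters):
        x_new = W @ x - eta * s       # consensus step plus descent along the tracked gradient
        g_new = np.stack([local_grads[i](x_new[i]) for i in range(len(local_grads))])
        s = W @ s + g_new - g         # update the average-gradient estimate using history
        x, g = x_new, g_new
    return x
```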

440 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, a weakly supervised temporal action localization algorithm is proposed, which learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations.
Abstract: We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function is comprised of two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
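A hedged PyTorch sketch of the two-term loss described above (video-level classification on attention-pooled features plus a sparsity penalty on the attention weights); shapes and the weighting coefficient are illustrative, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def weak_localization_loss(segment_logits, attention, video_labels, beta=0.1):
    """segment_logits : (T, C) per-segment class scores
       attention      : (T,)  per-segment attention weights in [0, 1]
       video_labels   : (C,)  multi-hot video-level labels."""
    # attention-weighted temporal pooling -> video-level class scores
    pooled = (attention.unsqueeze(1) * segment_logits).sum(dim=0) / (attention.sum() + 1e-8)
    cls_loss = F.binary_cross_entropy_with_logits(pooled, video_labels.float())
    sparsity = attention.abs().mean()     # encourages selecting a sparse set of key segments
    return cls_loss + beta * sparsity
```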

353 citations


Journal ArticleDOI
TL;DR: In this paper, an analysis of evolution equations generated by three fractional derivatives, namely the Riemann-Liouville, Caputo-Fabrizio, and Atangana-Baleanu derivatives, is presented.
Abstract: We present an analysis of evolution equations generated by three fractional derivatives, namely the Riemann–Liouville, Caputo–Fabrizio, and Atangana–Baleanu fractional derivatives. For each evolution equation, we present the exact solution in the time variable and study the semigroup principle. The Riemann–Liouville fractional operator satisfies the semigroup principle, but the associated evolution equation does not. The Caputo–Fabrizio fractional derivative does not satisfy the semigroup principle, yet, surprisingly, its exact solution satisfies the semigroup principle very well. For small time, the Atangana–Baleanu derivative behaves like a stretched exponential and does not satisfy the semigroup property as an operator; for large time it coincides with the Riemann–Liouville fractional derivative and thus satisfies the semigroup principle as an operator, while the solution of the associated evolution equation, as in the Riemann–Liouville case, does not. Using the connection between semigroup theory and Markovian processes, we find that the Atangana–Baleanu fractional derivative exhibits both Markovian and non-Markovian behavior. We conclude that fractional differential operators need not satisfy the semigroup properties, as they portray memory effects, which are not always Markovian. We present the exact solutions of some evolution equations using the Laplace transform. In addition, we present the numerical solution of a nonlinear equation and show that the model with the Atangana–Baleanu fractional derivative exhibits a random walk for small time. We also observe that the Mittag-Leffler function is a better filter than the exponential and power-law functions, which makes the Atangana–Baleanu fractional derivative a powerful mathematical tool for modeling complex real-world problems.
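For reference, the three derivatives discussed above are commonly defined, for $0<\alpha<1$ and with normalization functions $M(\alpha)$ and $B(\alpha)$ as in the original papers (exact normalizations vary slightly across the literature), by

$$
{}^{RL}D_t^{\alpha}f(t)=\frac{1}{\Gamma(1-\alpha)}\frac{d}{dt}\int_0^t (t-\tau)^{-\alpha}f(\tau)\,d\tau,
$$
$$
{}^{CF}D_t^{\alpha}f(t)=\frac{M(\alpha)}{1-\alpha}\int_0^t f'(\tau)\exp\!\Big(-\frac{\alpha(t-\tau)}{1-\alpha}\Big)\,d\tau,
$$
$$
{}^{ABC}D_t^{\alpha}f(t)=\frac{B(\alpha)}{1-\alpha}\int_0^t f'(\tau)\,E_{\alpha}\!\Big(-\frac{\alpha(t-\tau)^{\alpha}}{1-\alpha}\Big)\,d\tau,
$$

where $E_{\alpha}$ is the one-parameter Mittag-Leffler function that appears as the non-singular kernel of the Atangana–Baleanu derivative.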

289 citations


Journal ArticleDOI
TL;DR: This brief addresses the trajectory tracking control problem of a fully actuated surface vessel subject to asymmetrically constrained input and output; with the proposed control, the constraints are never violated during operation and all system states are bounded.
Abstract: This brief addresses the trajectory tracking control problem of a fully actuated surface vessel subjected to asymmetrically constrained input and output. The controller design process is based on the backstepping technique. An asymmetric time-varying barrier Lyapunov function is proposed to address the output constraint. To overcome the difficulty of nondifferentiable input saturation, a smooth hyperbolic tangent function is employed to approximate the asymmetric saturation function. A Nussbaum function is introduced to compensate for the saturation approximation and ensure the system stability. The command filters and auxiliary systems are integrated with the control law to avoid the complicated calculation of the derivative of the virtual control in backstepping. In addition, the bounds of uncertainties and disturbances are estimated and compensated with an adaptive algorithm. With the proposed control, the constraints will never be violated during operation, and all system states are bounded. Simulation results and comparisons with a standard method illustrate the effectiveness and advantages of the proposed controller.

266 citations


Journal ArticleDOI
TL;DR: This paper presents a new algorithm, termed truncated amplitude flow (TAF), to recover an unknown vector from a system of quadratic equations, and proves that as soon as the number of equations is on the order of the number of unknowns, TAF recovers the solution exactly.
Abstract: This paper presents a new algorithm, termed truncated amplitude flow (TAF), to recover an unknown vector $x$ from a system of quadratic equations of the form $y_i=|\langle a_i, x\rangle|^2$, where the $a_i$'s are given random measurement vectors. This problem is known to be NP-hard in general. We prove that as soon as the number of equations is on the order of the number of unknowns, TAF recovers the solution exactly (up to a global unimodular constant) with high probability and complexity growing linearly with both the number of unknowns and the number of equations. Our TAF approach adapts the amplitude-based empirical loss function and proceeds in two stages. In the first stage, we introduce an orthogonality-promoting initialization that can be obtained with a few power iterations. Stage two refines the initial estimate by successive updates of scalable truncated generalized gradient iterations, which are able to handle the rather challenging nonconvex and nonsmooth amplitude-based objective function. In particular, when the vectors $x$ and $a_i$ are real valued, our gradient truncation rule provably eliminates erroneously estimated signs with high probability to markedly improve upon its untruncated version. Numerical tests using synthetic data and real images demonstrate that our initialization returns more accurate and robust estimates relative to spectral initializations. Furthermore, even under the same initialization, the proposed amplitude-based refinement outperforms existing Wirtinger flow variants, corroborating the superior performance of TAF over state-of-the-art algorithms.
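A hedged NumPy sketch of the refinement stage: a generalized gradient step on the amplitude-based loss $\frac{1}{2m}\sum_i(|a_i^{\top}z|-\psi_i)^2$ with $\psi_i=\sqrt{y_i}$, in which equations whose current fit is far off are dropped from the gradient. The threshold rule, step size, and iteration count below are illustrative choices, not the paper's tuned parameters:

```python
import numpy as np

def taf_refinement(A, psi, z0, gamma=0.7, mu=0.6, iters=500):
    """A   : (m, n) real measurement matrix whose rows are the a_i
       psi : (m,) measured amplitudes sqrt(y_i)
       z0  : (n,) initial estimate, e.g. from an orthogonality-promoting init."""
    m, _ = A.shape
    z = z0.copy()
    for _ in range(iters):
        Az = A @ z
        keep = np.abs(Az) >= psi / (1.0 + gamma)   # truncation: discard badly fit equations
        resid = Az[keep] - psi[keep] * np.sign(Az[keep])
        z = z - mu * (A[keep].T @ resid) / m       # generalized gradient step
    return z
```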

266 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that fractional operators obeying the index law cannot model real-world problems taking place in two states; more precisely, because they are scaling invariant, they cannot describe phenomena taking place beyond their boundaries. The results show that mathematical models based on these differential operators are not able to describe the inverse memory, meaning the full history of a physical problem cannot be described accurately using derivatives with index-law properties.
Abstract: Recently, fractional differential operators with non-index-law properties have been recognized as bringing new tools to accurately model real-world problems, particularly those with non-Markovian processes. This paper has two aims: first, to demonstrate the inadequacy and failure of index-law fractional calculus, and second, to show the application of fractional differential operators with no index-law properties to statistical and dynamical systems. To achieve this, we present the historical construction of the concept of fractional differential operators from Leibniz to date. Using a matrix based on the fractional differential operators, we prove that fractional operators obeying the index law cannot model real-world problems taking place in two states; more precisely, they cannot describe phenomena taking place beyond their boundaries, as they are scaling invariant. Our results show that mathematical models based on these differential operators are not able to describe the inverse memory, meaning the full history of a physical problem cannot be described accurately using derivatives with index-law properties. On the other hand, we prove that differential operators with no index-law properties are scaling variant, thus can describe situations taking place in different states and are able to localize the frontiers between two states. We present the renewal-process properties included in differential equations built from the Atangana–Baleanu fractional derivative and the associated counting process, which is connected to its inter-arrival time distribution, the Mittag–Leffler distribution, which is the kernel of these derivatives. We present the connection of each derivative to a statistical family; for instance, the Riemann–Liouville–Caputo derivatives are connected to the Pareto statistic, which has no well-defined average when alpha is less than 1, corresponding to the interval where fractional operators are mostly defined. We establish new properties and theorems for the Atangana–Baleanu derivative of an analytic function; in particular, we prove that it is a convolution of the Mittag–Leffler function with the Riemann–Liouville–Caputo derivative. To assess the accuracy of the non-index-law derivatives in modeling real chaotic problems, four examples are considered, including the nine-term 3-D novel chaotic system, the King Cobra chaotic system, the Ikeda delay system, and the chaotic chameleon system. The numerical simulations show very interesting and novel attractors. The King Cobra system with the Atangana–Baleanu derivative presents a very novel attractor, where at early times we observe a random walk and at later times the real shape of the cobra. The Ikeda model with the Atangana–Baleanu derivative presents a different attractor for each value of the fractional order; in particular, we obtain square and circular explosions. The results obtained in this paper show that the future of modeling real-world problems relies on fractional differential operators with the non-index-law property. Our numerical results suggest that refusing to model physical problems with fractional differential operators with non-singular kernels, and instead imposing the index law in fractional calculus, amounts to living with closed eyes without ever taking the risk to open them.

261 citations


Proceedings Article
21 Feb 2018
TL;DR: In this article, instancewise feature selection is introduced as a methodology for model interpretation, based on learning a function to extract a subset of features that are most informative for each given example.
Abstract: We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show the effectiveness of our method on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.

257 citations


Book ChapterDOI
08 Sep 2018
TL;DR: The contextual loss proposed in this paper is based on both context and semantics: it compares regions with similar semantic meaning while considering the context of the entire image, so that, for example, when transferring the style of one face to another it translates eyes-to-eyes and mouth-to-mouth.
Abstract: Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations. However, for many tasks, aligned training pairs of images will not be available. We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. Our loss is based on both context and semantics – it compares regions with similar semantic meaning, while considering the context of the entire image. Hence, for example, when transferring the style of one face to another, it will translate eyes-to-eyes and mouth-to-mouth. Our code can be found at https://www.github.com/roimehrez/contextualLoss.
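A hedged NumPy sketch of a contextual-similarity computation of the general form described above (cosine distances between feature sets, converted into normalized affinities, followed by a best-match average); the centering, bandwidth, and direction of the max follow one common reading and may differ from the released code at the URL above:

```python
import numpy as np

def contextual_similarity(X, Y, h=0.5, eps=1e-5):
    """X: (N, d) features of the generated image, Y: (M, d) features of the target."""
    mu = Y.mean(axis=0, keepdims=True)                   # center by the target's mean feature
    Xn = (X - mu) / (np.linalg.norm(X - mu, axis=1, keepdims=True) + eps)
    Yn = (Y - mu) / (np.linalg.norm(Y - mu, axis=1, keepdims=True) + eps)
    d = 1.0 - Xn @ Yn.T                                  # pairwise cosine distances, (N, M)
    d_rel = d / (d.min(axis=1, keepdims=True) + eps)     # distances relative to the best match
    w = np.exp((1.0 - d_rel) / h)                        # affinities with bandwidth h
    cx = w / w.sum(axis=1, keepdims=True)                # normalized contextual similarity
    return cx.max(axis=0).mean()                         # average best match over target features

def contextual_loss(X, Y):
    return -np.log(contextual_similarity(X, Y) + 1e-12)
```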

Proceedings Article
15 Feb 2018
TL;DR: In this article, the authors investigate the tension between complexity and generalization through an extensive empirical exploration of two natural metrics of complexity related to sensitivity to input perturbations, and demonstrate how the input-output Jacobian norm can be predictive of generalization at the level of individual test points.
Abstract: In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models. In this work, we investigate this tension between complexity and generalization through an extensive empirical exploration of two natural metrics of complexity related to sensitivity to input perturbations. Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. We find that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that it correlates well with generalization. We further establish that factors associated with poor generalization, such as full-batch training or using random labels, correspond to lower robustness, while factors associated with good generalization, such as data augmentation and ReLU non-linearities, give rise to more robust functions. Finally, we demonstrate how the input-output Jacobian norm can be predictive of generalization at the level of individual test points.
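A hedged PyTorch sketch of one of the sensitivity metrics discussed above, the Frobenius norm of the input-output Jacobian at a single test point (computed here with plain autograd; it assumes the model maps a flat input vector to a vector of outputs, and a stochastic estimator would be preferable for large output dimensions):

```python
import torch

def jacobian_frobenius_norm(model, x):
    """Frobenius norm of d model(x) / d x at one input x of shape (d,)."""
    x = x.clone().requires_grad_(True)
    y = model(x)
    rows = []
    for k in range(y.shape[-1]):
        grad_k, = torch.autograd.grad(y[k], x, retain_graph=True)
        rows.append(grad_k)
    return torch.stack(rows).norm()    # stack rows into the Jacobian, take its norm
```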

Posted Content
TL;DR: A Law of Large Numbers and a Central Limit Theorem for the empirical distribution are established, which together show that the approximation error of the network universally scales as $O(n^{-1})$; the scale and nature of the noise introduced by stochastic gradient descent are also quantified.
Abstract: Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, potentially of great use in computational and applied mathematics. That said, there are few rigorous results about the representation error and trainability of neural networks, as well as how they scale with the network size. Here we characterize both the error and scaling by reinterpreting the standard optimization algorithm used in machine learning applications, stochastic gradient descent, as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number $n$ of parameters is large, the empirical distribution of the particles descends on a convex landscape towards a minimizer at a rate independent of $n$. We establish a Law of Large Numbers and a Central Limit Theorem for the empirical distribution, which together show that the approximation error of the network universally scales as $o(n^{-1})$. Remarkably, these properties do not depend on the dimensionality of the domain of the function that we seek to represent. Our analysis also quantifies the scale and nature of the noise introduced by stochastic gradient descent and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural network to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in as high a dimension as $d=25$.
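In this particle-system view, the network output is typically written with a $1/n$ scaling so that its parameters play the role of interacting particles; in notation that is illustrative rather than the paper's own,

$$
f_n(\boldsymbol{x}) \;=\; \frac{1}{n}\sum_{i=1}^{n} c_i\,\varphi(\boldsymbol{x},\boldsymbol{z}_i),
\qquad
\mu_n \;=\; \frac{1}{n}\sum_{i=1}^{n}\delta_{(c_i,\boldsymbol{z}_i)},
$$

and the Law of Large Numbers and Central Limit Theorem mentioned above are statements about the empirical distribution $\mu_n$ of the parameters as $n$ grows.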

Proceedings Article
02 Mar 2018
TL;DR: Surprisingly, the paths between minima of recent neural network architectures on CIFAR10 and CIFAR100 are essentially flat, which implies that neural networks have enough capacity for structural changes, or that these changes are small between minima.
Abstract: Training neural networks involves finding minima of a high-dimensional non-convex loss function. Knowledge of the structure of this energy landscape is sparse. Relaxing from linear interpolations, we construct continuous paths between minima of recent neural network architectures on CIFAR10 and CIFAR100. Surprisingly, the paths are essentially flat in both the training and test landscapes. This implies that neural networks have enough capacity for structural changes, or that these changes are small between minima. Also, each minimum has at least one vanishing Hessian eigenvalue in addition to those resulting from trivial invariance.
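A hedged PyTorch sketch of the linear-interpolation baseline that the paper relaxes from: evaluate the loss along the straight line between two sets of trained parameters (it assumes all state-dict entries are floating-point tensors; the paper goes further and optimizes curved paths between the minima):

```python
import copy
import torch

def loss_along_line(model, state_a, state_b, loss_fn, data_loader, steps=21):
    """Average loss at evenly spaced points on the segment between two minima."""
    probe = copy.deepcopy(model)
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        probe.load_state_dict({k: (1 - t) * state_a[k] + t * state_b[k] for k in state_a})
        probe.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for inputs, targets in data_loader:
                total += loss_fn(probe(inputs), targets).item() * inputs.size(0)
                count += inputs.size(0)
        losses.append(total / count)
    return losses
```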

Journal ArticleDOI
TL;DR: A new form of K-filters with time-varying low-gain is introduced in this paper to compensate for unmeasurable/unknown states of stochastic feedforward systems with unknown control coefficients and unknown output function.

Journal ArticleDOI
TL;DR: Comparisons with other published methods demonstrate that the proposed GCPSO method produces very good results in the extraction of the PV model parameters; it finds highly accurate solutions while demanding a reduced computational cost.

Journal ArticleDOI
TL;DR: The aim of this paper is to design a locally optimal time-varying estimator to simultaneously estimate both the system states and the fault signals such that, at each sampling instant, the covariance of the estimation error has an upper bound that is minimized by properly designing the estimator gain.

Journal ArticleDOI
TL;DR: In this paper, a sparsely-connected depth-4 neural network is constructed to approximate a function f on a d-dimensional manifold, and its error in approximating f is bounded. The size of the network depends on the dimension and curvature of the manifold.

Journal ArticleDOI
TL;DR: Several quadratic transformation inequalities for the Gaussian hypergeometric function are presented, and analogs of the duplication inequalities for the generalized Grötzsch ring function are found.
Abstract: In the article, we present several quadratic transformation inequalities for the Gaussian hypergeometric function and find analogs of the duplication inequalities for the generalized Grötzsch ring function.
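For context, the Gaussian hypergeometric function referred to above is the classical series

$$
{}_2F_1(a,b;c;x)\;=\;\sum_{n=0}^{\infty}\frac{(a)_n\,(b)_n}{(c)_n}\,\frac{x^n}{n!},\qquad |x|<1,
$$

where $(a)_n=a(a+1)\cdots(a+n-1)$ denotes the Pochhammer symbol; the generalized Grötzsch ring function studied in the paper is commonly built from it via generalized elliptic integrals.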

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The authors disentangle the identity and attributes of faces, and then recombine the identity vector and the attribute vector to synthesize a new face of the subject with the extracted attribute.
Abstract: We propose a framework based on Generative Adversarial Networks to disentangle the identity and attributes of faces, such that we can conveniently recombine different identities and attributes for identity preserving face synthesis in open domains. Previous identity preserving face synthesis processes are largely confined to synthesizing faces with known identities that are already in the training dataset. To synthesize a face with identity outside the training dataset, our framework requires one input image of that subject to produce an identity vector, and any other input face image to extract an attribute vector capturing, e.g., pose, emotion, illumination, and even the background. We then recombine the identity vector and the attribute vector to synthesize a new face of the subject with the extracted attribute. Our proposed framework does not need to annotate the attributes of faces in any way. It is trained with an asymmetric loss function to better preserve the identity and stabilize the training process. It can also effectively leverage large amounts of unlabeled training face images to further improve the fidelity of the synthesized faces for subjects that are not presented in the labeled training face dataset. Our experiments demonstrate the efficacy of the proposed framework. We also present its usage in a much broader set of applications including face frontalization, face attribute morphing, and face adversarial example detection.

Journal ArticleDOI
TL;DR: In this article, the standard and nonlocal nonlinear Schrödinger (NLS) equations obtained from the coupled NLS system of equations (the Ablowitz-Kaup-Newell-Segur (AKNS) system) are studied using the Hirota bilinear method.
Abstract: We study standard and nonlocal nonlinear Schrödinger (NLS) equations obtained from the coupled NLS system of equations (Ablowitz-Kaup-Newell-Segur (AKNS) equations) by using standard and nonlocal reductions, respectively. By using the Hirota bilinear method, we first find soliton solutions of the coupled NLS system of equations; then, using the reduction formulas, we find the soliton solutions of the standard and nonlocal NLS equations. We give examples for particular values of the parameters and plot the function $|q(t,x)|^2$ for the standard and nonlocal NLS equations.
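For orientation, in one common convention (sign and scaling conventions vary across papers, including this one), the standard NLS equation and the nonlocal NLS equation obtained from the AKNS system by a nonlocal reduction read

$$
i q_t + q_{xx} + 2\sigma\,|q|^2 q = 0,
\qquad
i q_t(t,x) + q_{xx}(t,x) + 2\sigma\, q^2(t,x)\,\bar{q}(t,-x) = 0,
\qquad \sigma = \pm 1 .
$$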

Posted Content
TL;DR: Conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), are established and the scaling of its error with the size of the network is quantified.
Abstract: Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, but rigorous results about the approximation error of neural networks after training are few. Here we establish conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number $n$ of units is large, the empirical distribution of the particles descends on a convex landscape towards the global minimum at a rate independent of $n$, with a resulting approximation error that universally scales as $O(n^{-1})$. These properties are established in the form of a Law of Large Numbers and a Central Limit Theorem for the empirical distribution. Our analysis also quantifies the scale and nature of the noise introduced by SGD and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural networks to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in as high a dimension as $d=25$.

Journal ArticleDOI
TL;DR: Stochastic homogenization theory allows us to better understand the convergence of the algorithm, and a stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
Abstract: Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton–Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
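For reference, the viscous Hamilton–Jacobi PDE referred to above is usually written (with $\beta$ an inverse-temperature parameter and $f$ the original loss) as

$$
u_t \;=\; -\tfrac{1}{2}\,|\nabla u|^2 \;+\; \tfrac{\beta^{-1}}{2}\,\Delta u,
\qquad u(x,0)=f(x),
$$

and the relaxed loss used by Entropy-SGD corresponds to the solution $u(x,T)$ at a small positive time $T$.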

Journal ArticleDOI
TL;DR: In this paper, consensus equilibrium (CE) is introduced, which generalizes regularized inversion to include a much wider variety of both forward (or data fidelity) components and prior (or regularity) components without the need for either to be expressed using a cost function.
Abstract: Regularized inversion methods for image reconstruction are used widely due to their tractability and their ability to combine complex physical sensor models with useful regularity criteria. Such methods motivated the recently developed Plug-and-Play prior method, which provides a framework to use advanced denoising algorithms as regularizers in inversion. However, the need to formulate regularized inversion as the solution to an optimization problem limits the expressiveness of possible regularity conditions and physical sensor models. In this paper, we introduce the idea of consensus equilibrium (CE), which generalizes regularized inversion to include a much wider variety of both forward (or data fidelity) components and prior (or regularity) components without the need for either to be expressed using a cost function. CE is based on the solution of a set of equilibrium equations that balance data fit and regularity. In this framework, the problem of MAP estimation in regularized inversion is replaced by...

Journal Article
TL;DR: The soft set is generalized to the hypersoft set by transforming the function F into a multi-attribute function, and the hybrids of Crisp, Fuzzy, Intuitionistic Fuzzy, Neutrosophic, and Plithogenic Hypersoft Set are introduced.
Abstract: In this paper, we generalize the soft set to the hypersoft set by transforming the function F into a multi-attribute function. Then we introduce the hybrids of Crisp, Fuzzy, Intuitionistic Fuzzy, Neutrosophic, and Plithogenic Hypersoft Set.

Journal ArticleDOI
TL;DR: Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions on the initial value function and the learning rate function, the iterative value function converges monotonically to the optimum.
Abstract: In this paper, convergence properties are established for the newly developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The present local iterative ADP algorithm permits an arbitrary positive semidefinite function to initialize the algorithm. Employing a state-dependent learning rate function, for the first time, the iterative value function and iterative control law can be updated in a subset of the state space instead of the whole state space, which effectively relaxes the computational burden. A new analysis method for the convergence property is developed to prove that the iterative value functions will converge to the optimum under some mild constraints. Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions of the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum. Finally, three simulation examples and comparisons are given to illustrate the performance of the developed algorithm.
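A hedged NumPy sketch of a value-iteration update with a state-dependent learning rate over a finite state/action grid; this is a generic illustration of the idea described above (tabular, illustrative data layout), not the paper's algorithm:

```python
import numpy as np

def local_value_iteration(U, F, alpha, V0, n_iters=100):
    """U[x, u]  : stage cost, shape (S, A)
       F[x, u]  : next-state index, shape (S, A), integer
       alpha[x] : learning rate in [0, 1] per state; alpha[x] = 0 leaves state x
                  untouched in a sweep, giving the "local" update over a subset
       V0       : initial (positive semidefinite) value function, shape (S,)."""
    V = V0.copy()
    for _ in range(n_iters):
        Q = U + V[F]                              # Q[x, u] = U(x, u) + V(F(x, u))
        V = (1.0 - alpha) * V + alpha * Q.min(axis=1)
    return V
```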

Journal ArticleDOI
TL;DR: A trust-region model-based algorithm for solving unconstrained stochastic optimization problems is presented, which utilizes random models of an objective function f(x) obtained from stochastic observations of the function or its gradient.
Abstract: In this paper, we propose and analyze a trust-region model-based algorithm for solving unconstrained stochastic optimization problems. Our framework utilizes random models of an objective function f(x), obtained from stochastic observations of the function or its gradient. Our method also utilizes estimates of function values to gauge progress that is being made. The convergence analysis relies on requirements that these models and these estimates are sufficiently accurate with high enough, but fixed, probability. Beyond these conditions, no assumptions are made on how these models and estimates are generated. Under these general conditions we show an almost sure global convergence of the method to a first order stationary point. In the second part of the paper, we present examples of generating sufficiently accurate random models under biased or unbiased noise assumptions. Lastly, we present some computational results showing the benefits of the proposed method compared to existing approaches that are based on sample averaging or stochastic gradients.

Proceedings Article
01 Feb 2018
TL;DR: A new paradigm for discovering disentangled representations of class structure is proposed, together with a novel loss function based on the $F$ statistic, which describes the separation of two or more distributions.
Abstract: Deep-embedding methods aim to discover representations of a domain that make explicit the domain's class structure and thereby support few-shot learning. Disentangling methods aim to make explicit compositional or factorial structure. We combine these two active but independent lines of research and propose a new paradigm suitable for both goals. We propose and evaluate a novel loss function based on the $F$ statistic, which describes the separation of two or more distributions. By ensuring that distinct classes are well separated on a subset of embedding dimensions, we obtain embeddings that are useful for few-shot learning. By not requiring separation on all dimensions, we encourage the discovery of disentangled representations. Our embedding method matches or beats state-of-the-art, as evaluated by performance on recall@$k$ and few-shot learning tasks. Our method also obtains performance superior to a variety of alternatives on disentangling, as evaluated by two key properties of a disentangled representation: modularity and explicitness. The goal of our work is to obtain more interpretable, manipulable, and generalizable deep representations of concepts and categories.
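For reference, a hedged NumPy sketch of the classical one-way ANOVA $F$ statistic computed per embedding dimension (between-class variance over within-class variance); the paper builds its loss on this statistic, but the exact way it is turned into a differentiable training loss is not reproduced here:

```python
import numpy as np

def f_statistic(embeddings, labels):
    """embeddings : (N, d) array, labels : (N,) integer class labels.
       Returns the F statistic for each of the d embedding dimensions."""
    classes = np.unique(labels)
    N, K = embeddings.shape[0], len(classes)
    grand_mean = embeddings.mean(axis=0)
    between = np.zeros(embeddings.shape[1])
    within = np.zeros(embeddings.shape[1])
    for c in classes:
        Xc = embeddings[labels == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (K - 1)) / (within / (N - K) + 1e-12)
```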

Posted Content
TL;DR: Findings suggest ways to run the QAOA that reduce or eliminate the use of the outer loop optimization and may allow us to find good solutions with fewer calls to the quantum computer.
Abstract: The Quantum Approximate Optimization Algorithm, QAOA, uses a shallow depth quantum circuit to produce a parameter dependent state. For a given combinatorial optimization problem instance, the quantum expectation of the associated cost function is the parameter dependent objective function of the QAOA. We demonstrate that if the parameters are fixed and the instance comes from a reasonable distribution then the objective function value is concentrated in the sense that typical instances have (nearly) the same value of the objective function. This applies not just for optimal parameters as the whole landscape is instance independent. We can prove this is true for low depth quantum circuits for instances of MaxCut on large 3-regular graphs. Our results generalize beyond this example. We support the arguments with numerical examples that show remarkable concentration. For higher depth circuits the numerics also show concentration and we argue for this using the Law of Large Numbers. We also observe by simulation that if we find parameters which result in good performance at say 10 bits these same parameters result in good performance at say 24 bits. These findings suggest ways to run the QAOA that reduce or eliminate the use of the outer loop optimization and may allow us to find good solutions with fewer calls to the quantum computer.
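For reference, in the standard notation the QAOA prepares the parameter-dependent state and objective

$$
|\boldsymbol{\gamma},\boldsymbol{\beta}\rangle
= e^{-i\beta_p B}\,e^{-i\gamma_p C}\cdots e^{-i\beta_1 B}\,e^{-i\gamma_1 C}\,|+\rangle^{\otimes n},
\qquad
F_p(\boldsymbol{\gamma},\boldsymbol{\beta}) = \langle \boldsymbol{\gamma},\boldsymbol{\beta}|\,C\,|\boldsymbol{\gamma},\boldsymbol{\beta}\rangle,
$$

where $C$ is the diagonal cost operator of the combinatorial problem and $B=\sum_j X_j$ is the mixing operator; the concentration result above says that $F_p$ at fixed $(\boldsymbol{\gamma},\boldsymbol{\beta})$ is nearly the same across typical instances.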

Journal ArticleDOI
TL;DR: It is proved that the result of each of the four basic operations on fuzzy numbers introduced based on the proposed approach leads to a fuzzy number, and the condition for the existence of the granular derivative of a fuzzy function is provided by a theorem.
Abstract: In this paper, using the concept of horizontal membership functions, a new definition of fuzzy derivative called granular derivative is proposed based on granular difference. Moreover, a new definition of fuzzy integral called granular integral is defined, and its relation with the granular derivative is given. A new definition of a metric—granular metric—on the space of type-1 fuzzy numbers, and a concept of continuous fuzzy functions are also presented. Restrictions associated to previous approaches—Hukuhara differentiability, strongly generalized Hukuhara differentiability, generalized Hukuhara differentiability, generalized differentiability, Zadeh's extension principle, and fuzzy differential inclusions—dealing with fuzzy differential equations (FDEs) are expressed. It is shown that the proposed approach does not have the drawbacks of the previous approaches. It is also demonstrated how this approach enables researchers to solve FDEs more conveniently than ever before. Moreover, we showed that this approach does not necessitate that the diameter of the fuzzy function be monotonic. It is also proved that the result of each of the four basic operations on fuzzy numbers introduced based on the proposed approach leads to a fuzzy number. Moreover, the condition for the existence of the granular derivative of a fuzzy function is provided by a theorem. Additionally, by two examples, it is shown that the existence of the granular derivative of a fuzzy function does not imply the existence of the generalized Hukuhara differentiability of the fuzzy function, and vice versa. The terms doubling property and unnatural behavior in modeling phenomenon are also introduced. Furthermore, using some examples, the paper proceeds to elaborate on the efficiency and effectiveness of the proposed approach. Moreover, as an application of the proposed approach, the response of Boeing 747 to impulsive elevator input is obtained in the presence of uncertain initial conditions and parameters.