scispace - formally typeset

Showing papers on "Piecewise linear function" published in 2019


Journal Article
TL;DR: New upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function are proved. Combined with previous results, the VC-dimension shows no dependence on depth for piecewise-constant activations, linear dependence for piecewise-linear ones, and at most quadratic dependence for general piecewise-polynomial ones.
Abstract: We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function. These bounds are tight for almost the entire range of parameters. Letting $W$ be the number of weights and $L$ be the number of layers, we prove that the VC-dimension is $O(W L \log(W))$, and provide examples with VC-dimension $\Omega( W L \log(W/L) )$. This improves both the previously known upper bounds and lower bounds. In terms of the number $U$ of non-linear units, we prove a tight bound $\Theta(W U)$ on the VC-dimension. All of these bounds generalize to arbitrary piecewise linear activation functions, and also hold for the pseudodimensions of these function classes. Combined with previous results, this gives an intriguing range of dependencies of the VC-dimension on depth for networks with different non-linearities: there is no dependence for piecewise-constant, linear dependence for piecewise-linear, and no more than quadratic dependence for general piecewise-polynomial.

209 citations
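A quick way to see what the symbols in these bounds measure is to count them for a concrete architecture. The sketch below does this for a fully connected ReLU network and evaluates the growth terms $WL\log W$, $WL\log(W/L)$ and $WU$ with the hidden constants dropped, so the numbers indicate scaling only. The counting conventions (biases counted as weights, a linear output unit) and the function names are assumptions for illustration, not definitions taken from the paper.

```python
import math

def relu_net_sizes(widths):
    """Return (W, L, U) for a fully connected ReLU network.

    widths = [d_in, h_1, ..., h_k, d_out]. Assumed conventions (illustrative only):
    biases count as weights, L is the number of weight layers, and U counts the
    hidden ReLU units (the output layer is treated as linear).
    """
    W = sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(widths[:-1], widths[1:]))
    L = len(widths) - 1
    U = sum(widths[1:-1])
    return W, L, U

def vc_growth_terms(widths):
    """Growth terms of the bounds with the unspecified constants dropped."""
    W, L, U = relu_net_sizes(widths)
    return {
        "upper": W * L * math.log2(W),        # O(W L log W)
        "lower": W * L * math.log2(W / L),    # Omega(W L log(W / L))
        "units": W * U,                       # Theta(W U)
    }

if __name__ == "__main__":
    print(relu_net_sizes([10, 32, 32, 1]))  # (1441, 3, 64)
    for name, value in vc_growth_terms([10, 32, 32, 1]).items():
        print(f"{name}: {value:.3g}")
```

Only ratios between architectures are meaningful here, because the constants hidden in the $O$, $\Omega$ and $\Theta$ notation are not known from the abstract.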


Journal Article
TL;DR: The proposed ADPED algorithm adapts to both day-ahead and intra-day operation under uncertainty, and it makes full use of the historical prediction-error distribution to reduce the influence of inaccurate forecasts on system operation.
Abstract: This paper proposes an approximate dynamic programming (ADP)-based approach for the economic dispatch (ED) of a microgrid with distributed generation. The time-varying renewable generation, electricity price, and power demand are treated as stochastic variables. An ADP-based ED (ADPED) algorithm is proposed to operate the microgrid optimally under these uncertainties. To deal with the uncertainties, the Monte Carlo method is used to sample training scenarios that give ADPED empirical knowledge. The method employs a piecewise linear function (PLF) approximation with an improved slope-updating strategy. With sufficient information extracted from these scenarios and embedded in the PLF, the proposed ADPED algorithm can be used not only in day-ahead scheduling but also in intra-day optimization. The algorithm makes full use of the historical prediction-error distribution to reduce the influence of inaccurate forecasts on system operation. Numerical simulations demonstrate the effectiveness of the proposed approach: the near-optimal decisions obtained by ADPED are very close to the global optimum, and the algorithm adapts to both day-ahead and intra-day operation under uncertainty.

198 citations
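As an illustration of what a piecewise linear value function with slope updating can look like, the sketch below stores a concave PLF of a single resource level (for example, stored energy; this interpretation is an assumption) as a list of segment slopes, smooths one slope toward a sampled marginal value, and then restores concavity with a simple leveling projection. The abstract does not describe the paper's improved slope-updating strategy, so this is a generic stochastic-approximation update, and all names are hypothetical.

```python
def plf_value(slopes, seg_width, level):
    """Evaluate a piecewise linear value function V(level) with V(0) = 0.

    slopes[k] is the marginal value on the segment [k * seg_width, (k + 1) * seg_width).
    """
    value, remaining = 0.0, level
    for v in slopes:
        step = min(seg_width, remaining)
        if step <= 0:
            break
        value += v * step
        remaining -= step
    return value

def update_slopes(slopes, k, sampled_slope, step_size):
    """Smooth slope k toward a sampled marginal value, then restore concavity.

    Concavity means non-increasing slopes; a simple leveling projection enforces it.
    """
    new = list(slopes)
    new[k] = (1.0 - step_size) * new[k] + step_size * sampled_slope
    for j in range(k):                    # slopes left of k may not be smaller
        new[j] = max(new[j], new[k])
    for j in range(k + 1, len(new)):      # slopes right of k may not be larger
        new[j] = min(new[j], new[k])
    return new

if __name__ == "__main__":
    slopes = update_slopes([5.0, 4.0, 3.0, 2.0], k=2, sampled_slope=6.0, step_size=0.5)
    print(slopes)                                        # [5.0, 4.5, 4.5, 2.0]
    print(plf_value(slopes, seg_width=1.0, level=2.5))   # 11.75
```

Keeping the slopes non-increasing keeps the value function concave, which is what lets a dispatch step treat it as a piecewise linear term in a convex (or linear) optimization.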


Posted Content
TL;DR: In this article, it is shown that, at initialization, the average number of regions of a piecewise linear network along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound, and that the average distance to the nearest region boundary scales like the inverse of the number of neurons.
Abstract: It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions. In the case of networks that compute piecewise linear functions, such as those with ReLU activation, the number of distinct linear regions is a natural measure of expressivity. It is possible to construct networks with merely a single region, or for which the number of linear regions grows exponentially with depth; it is not clear where within this range most networks fall in practice, either before or after training. In this paper, we provide a mathematical framework to count the number of linear regions of a piecewise linear network and measure the volume of the boundaries between these regions. In particular, we prove that for networks at initialization, the average number of regions along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound. We also find that the average distance to the nearest region boundary at initialization scales like the inverse of the number of neurons. Our theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches our empirical observations. We conclude that the practical expressivity of neural networks is likely far below that of the theoretical maximum, and that this gap can be quantified.

142 citations
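The region count along a one-dimensional subspace can be estimated empirically by sampling points on a line segment and counting how often the network's activation pattern changes. The sketch below does this for a randomly initialized fully connected ReLU network; the He-style initialization, the bias scale and the sampling density are assumptions, and regions narrower than the grid spacing can be missed, so the result is a lower estimate rather than the paper's exact counting framework.

```python
import numpy as np

def count_regions_on_segment(weights, biases, x0, x1, num=10_001):
    """Estimate how many activation regions of a ReLU network the segment x0 -> x1 crosses.

    weights/biases describe the hidden layers only (the linear output layer does not
    create new regions). Each fixed activation pattern defines a convex region, so we
    count pattern changes between consecutive sample points on the segment.
    """
    t = np.linspace(0.0, 1.0, num)[:, None]
    h = (1.0 - t) * np.asarray(x0, dtype=float) + t * np.asarray(x1, dtype=float)
    patterns = []
    for W, b in zip(weights, biases):
        pre = h @ W.T + b
        patterns.append(pre > 0)        # on/off state of every neuron in this layer
        h = np.maximum(pre, 0.0)
    pattern = np.concatenate(patterns, axis=1)
    changes = np.any(pattern[1:] != pattern[:-1], axis=1)
    return int(changes.sum()) + 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    widths = [2, 32, 32]                # input dimension 2, two hidden layers of width 32
    Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(k, m)) for m, k in zip(widths[:-1], widths[1:])]
    bs = [rng.normal(0.0, 0.1, size=k) for k in widths[1:]]
    print(count_regions_on_segment(Ws, bs, [-1.0, -1.0], [1.0, 1.0]))
```

Repeating this while varying the widths gives a rough empirical check of the linear-in-neurons growth described above.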


Journal Article
TL;DR: A new approach forming a dynamic linear battery model is proposed in this paper, which enables the application of the linear Kalman filter for SOC estimation and also avoids the usage of online parameter identification methods.
Abstract: The performance of model-based state-of-charge (SOC) estimation methods relies on an accurate battery model. Nonlinear models have therefore been proposed to describe the external characteristics of lithium-ion batteries accurately, but they require nonlinear estimation algorithms and online parameter identification methods to guarantee the accuracy of model-based SOC estimation. This paper proposes a new approach that forms a dynamic linear battery model, which enables the linear Kalman filter to be applied to SOC estimation and avoids online parameter identification. Using a moving-window technique, partial least squares regression automatically establishes a series of piecewise linear battery models. A one-element state-space equation is then obtained, and the SOC is estimated with the linear Kalman filter. Experiments on a LiFePO4 battery demonstrate the effectiveness of the proposed method compared with the extended Kalman filter with a two-RC (resistance-capacitance) equivalent circuit model and the adaptive unscented Kalman filter with least squares support vector machines.

133 citations
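To show why a linear battery model makes the ordinary Kalman filter applicable, the sketch below runs a scalar (one-element) Kalman filter with coulomb counting as the state equation and a local linear voltage model $v = a\,\mathrm{SOC} + b\,i + c$ as the measurement equation. In the paper, the local models come from moving-window partial least squares regression; here $a$, $b$, $c$, the noise variances and the capacity are hypothetical constants, and the sign convention (discharge current positive) is an assumption.

```python
def soc_kalman(currents, voltages, dt, capacity_as, a, b, c,
               soc0=0.5, p0=0.1, q=1e-7, r=1e-4):
    """Scalar linear Kalman filter for state of charge (SOC).

    State:       soc[k+1] = soc[k] - dt * i[k] / capacity_as   (coulomb counting)
    Measurement: v[k]     = a * soc[k] + b * i[k] + c          (local linear model)
    q and r are the process and measurement noise variances (hypothetical values).
    """
    soc, p = soc0, p0
    estimates = []
    for i_k, v_k in zip(currents, voltages):
        # Measurement update using the linear voltage model.
        gain = p * a / (a * a * p + r)
        soc += gain * (v_k - (a * soc + b * i_k + c))
        p *= 1.0 - gain * a
        estimates.append(soc)
        # Time update: propagate SOC with coulomb counting.
        soc -= dt * i_k / capacity_as
        p += q
    return estimates

if __name__ == "__main__":
    # Synthetic, noise-free example with hypothetical parameters: 2 Ah cell, 1 A discharge.
    n, dt, cap = 500, 1.0, 7200.0
    a, b, c = 0.7, -0.05, 3.0
    true_soc = [0.9 - k * dt * 1.0 / cap for k in range(n)]
    volts = [a * s + b * 1.0 + c for s in true_soc]
    est = soc_kalman([1.0] * n, volts, dt, cap, a, b, c, soc0=0.5)
    print(f"final error: {abs(est[-1] - true_soc[-1]):.2e}")
```

Because both equations are linear in the SOC, no Jacobians or sigma points are needed; in a piecewise linear scheme, only $a$, $b$ and $c$ would change from one window to the next.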


Journal Article
TL;DR: In this paper, the authors studied the number and stability of limit cycles for planar piecewise linear (PWL) systems of node-saddle type with two linear regions.

127 citations


Journal Article
TL;DR: In this paper, the authors study limit cycles of a general planar piecewise linear differential system of saddle-focus type, using a Liénard-like canonical form with five parameters and dividing the parameter space into several regions.

125 citations


Journal Article
TL;DR: In this paper, the authors propose a generic and flexible methodology for non-parametric function estimation, in which they first estimate the number and locations of any features that may be present in the function and then estimate the function parametrically between each pair of neighbouring detected features.
Abstract: We propose a new, generic and flexible methodology for non-parametric function estimation, in which we first estimate the number and locations of any features that may be present in the function and then estimate the function parametrically between each pair of neighbouring detected features. Examples of features handled by our methodology include change points in the piecewise constant signal model, kinks in the piecewise linear signal model and other similar irregularities, which we also refer to as generalized change points. Our methodology works with only minor modifications across a range of generalized change point scenarios, and we achieve such a high degree of generality by proposing and using a new multiple generalized change point detection device, termed narrowest-over-threshold (NOT) detection. The key ingredient of the NOT method is its focus on the smallest local sections of the data on which the existence of a feature is suspected. For selected scenarios, we show the consistency and near optimality of the NOT algorithm in detecting the number and locations of generalized change points. The NOT estimators are easy to implement and rapid to compute. Importantly, users can easily extend the NOT approach to tailor it to their own needs. Our methodology is implemented in the R package not.

83 citations
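The core NOT idea for the piecewise-constant case can be sketched in a few lines: draw random intervals, compute the maximum CUSUM contrast on each, keep those above a threshold, take the narrowest, record its argmax as a change point, and recurse on both sides. The sketch below follows that outline under stated assumptions: the threshold is supplied by the user (the paper's threshold and model-selection rules are not reproduced), the full current interval is added as a fallback candidate, and the function names are hypothetical. For real use, the authors' R package not is the reference implementation.

```python
import numpy as np

def cusum_max(x, s, e):
    """Largest absolute CUSUM contrast on x[s:e] and the change point attaining it.

    A split j means x[s:s+j] | x[s+j:e]; the returned change point is s + j,
    the index of the first sample after the split.
    """
    seg = x[s:e]
    n = len(seg)
    if n < 2:
        return None, 0.0
    j = np.arange(1, n)
    csum = np.cumsum(seg)
    left_mean = csum[:-1] / j
    right_mean = (csum[-1] - csum[:-1]) / (n - j)
    stat = np.abs(np.sqrt(j * (n - j) / n) * (left_mean - right_mean))
    best = int(np.argmax(stat))
    return s + int(j[best]), float(stat[best])

def not_change_points(x, threshold, num_intervals=500, seed=0):
    """Narrowest-over-threshold detection of mean shifts (piecewise-constant model)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)
    ends = np.sort(rng.integers(0, n + 1, size=(num_intervals, 2)), axis=1)
    intervals = [(int(a), int(b)) for a, b in ends if b - a >= 2]
    found = []

    def search(s, e):
        # Random intervals inside [s, e), plus [s, e) itself as a fallback candidate.
        candidates = [(a, b) for a, b in intervals if s <= a and b <= e] + [(s, e)]
        best = None  # (width, change point) of the narrowest interval over the threshold
        for a, b in candidates:
            cp, stat = cusum_max(x, a, b)
            if stat > threshold and (best is None or b - a < best[0]):
                best = (b - a, cp)
        if best is not None:
            found.append(best[1])
            search(s, best[1])
            search(best[1], e)

    search(0, n)
    return sorted(found)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    signal = np.r_[np.zeros(60), 3.0 * np.ones(40), np.zeros(50)]
    print(not_change_points(signal + 0.5 * rng.normal(size=signal.size), threshold=4.0))
```

Choosing the narrowest qualifying interval is what keeps each interval likely to contain a single change point, so its CUSUM argmax localizes that change well.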


Journal Article
TL;DR: In this article, a complete error analysis is presented for fully discrete solutions of the subdiffusion equation with a time-dependent diffusion coefficient, obtained by the Galerkin finite element method with conforming piecewise linear finite elements in space and backward Euler convolution quadrature in time.
Abstract: In this work, a complete error analysis is presented for fully discrete solutions of the subdiffusion equation with a time-dependent diffusion coefficient, obtained by the Galerkin finite element method with conforming piecewise linear finite elements in space and backward Euler convolution quadrature in time. The regularity of the solutions of the subdiffusion model is proved for both nonsmooth initial data and incompatible source term. Optimal-order convergence of the numerical solutions is established using the proven solution regularity and a novel perturbation argument via freezing the diffusion coefficient at a fixed time. The analysis is supported by numerical experiments.

68 citations
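Backward Euler convolution quadrature replaces the Caputo derivative of order $\alpha$ by a discrete convolution whose weights are the Taylor coefficients of $(1-\zeta)^\alpha$, that is, $\partial_t^\alpha u(t_n) \approx \tau^{-\alpha}\sum_{j=0}^{n} b_j\,(u^{n-j}-u^0)$ with $b_0 = 1$ and $b_j = b_{j-1}(j-1-\alpha)/j$. The sketch below implements only this time discretization for a scalar function (no finite element space discretization and no time-dependent diffusion coefficient) and checks it against the exact Caputo derivative of $u(t)=t$, which is $t^{1-\alpha}/\Gamma(2-\alpha)$. Function names are illustrative.

```python
import math

def be_cq_weights(alpha, n):
    """Weights b_0..b_n of backward Euler convolution quadrature: coefficients of (1 - z)**alpha."""
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (j - 1 - alpha) / j)
    return w

def caputo_be_cq(u_values, tau, alpha):
    """Approximate the Caputo derivative of order alpha at t_n = n * tau, n = 0..N.

    u_values[n] = u(t_n). The Caputo derivative is the Riemann-Liouville derivative
    of u - u(0), which is what the convolution below approximates.
    """
    n_max = len(u_values) - 1
    w = be_cq_weights(alpha, n_max)
    v = [u - u_values[0] for u in u_values]
    scale = tau ** (-alpha)
    return [scale * sum(w[j] * v[n - j] for j in range(n + 1)) for n in range(n_max + 1)]

if __name__ == "__main__":
    alpha, N = 0.5, 1000
    tau = 1.0 / N
    approx = caputo_be_cq([k * tau for k in range(N + 1)], tau, alpha)
    exact = 1.0 / math.gamma(2 - alpha)   # Caputo derivative of u(t) = t at t = 1
    print(f"error at t = 1: {abs(approx[-1] - exact):.1e}")
```

The direct sum costs $O(N^2)$ operations, which is fine for a check like this; production codes often use fast convolution techniques instead.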


Proceedings Article
24 May 2019
TL;DR: Every sublevel set of the loss function of a class of deep over-parameterized neural nets with piecewise linear activation functions is connected and unbounded, implying that the loss has no bad local valleys and all of its global minima are connected within a unique and potentially very large global valley.
Abstract: This paper shows that every sublevel set of the loss function of a class of deep over-parameterized neural nets with piecewise linear activation functions is connected and unbounded. This implies that the loss has no bad local valleys and all of its global minima are connected within a unique and potentially very large global valley.

60 citations


Proceedings Article
24 May 2019
TL;DR: The theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches empirical observations; the practical expressivity of neural networks is therefore likely far below the theoretical maximum, and this gap can be quantified.

52 citations


Journal Article
TL;DR: Under some reasonable assumptions on the switching threshold and activation functions, the number of equilibria and their stability or instability are characterized using the state-space decomposition method, the contraction mapping theorem, and strictly diagonally dominant matrix theory.
Abstract: This paper is concerned with the multistability of switched neural networks with piecewise linear activation functions under state-dependent switching. Under some reasonable assumptions on the switching threshold and activation functions, by using the state-space decomposition method, the contraction mapping theorem, and strictly diagonally dominant matrix theory, we characterize the number of equilibria and analyze their stability or instability. More interestingly, we find that the switching threshold plays an important role for stable equilibria in the unsaturation regions of the activation functions, and that the number of stable equilibria of an $n$-neuron switched neural network with state-dependent parameters increases from $2^{n}$ in the conventional network to $3^{n}$. Furthermore, for two-neuron switched neural networks, the precise attraction basin of each stable equilibrium point can be determined, and its boundary is composed of the stable manifolds of unstable equilibrium points and the switching lines. Two simulation examples are discussed in detail to substantiate the effectiveness of the theoretical analysis.
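The state-space decomposition idea can be illustrated on a conventional (non-switched) network $\dot x = -x + A f(x) + u$ with the saturated activation $f(s) = (|s+1|-|s-1|)/2$: each coordinate lies in one of three pieces, $f$ is affine on each of the $3^n$ resulting regions, and an equilibrium candidate in a region solves a linear system that is then checked for membership. This model, the function names and the skipping of degenerate regions are assumptions for illustration; the sketch does not model the state-dependent switching studied in the paper. In this conventional model, equilibria whose coordinates all lie in saturated pieces have Jacobian $-I$ and are therefore locally exponentially stable, which is consistent with the $2^n$ stable equilibria the abstract cites for conventional networks when self-excitation is strong.

```python
import itertools
import numpy as np

def in_piece(x, sigma):
    """Check that x lies in the piece labelled sigma (-1: x < -1, 0: |x| <= 1, 1: x > 1)."""
    for xi, s in zip(x, sigma):
        if (s == -1 and not xi < -1) or (s == 1 and not xi > 1) or (s == 0 and not -1 <= xi <= 1):
            return False
    return True

def equilibria(A, u, tol=1e-12):
    """Equilibria of dx/dt = -x + A f(x) + u with f(s) = (|s + 1| - |s - 1|) / 2."""
    A = np.asarray(A, dtype=float)
    u = np.asarray(u, dtype=float)
    n = len(u)
    result = []
    for sigma in itertools.product((-1, 0, 1), repeat=n):
        sig = np.array(sigma)
        D = np.diag((sig == 0).astype(float))  # f(x_i) = x_i on the middle piece
        c = np.where(sig == 0, 0.0, sig)        # f(x_i) = -1 or +1 on saturated pieces
        M = np.eye(n) - A @ D                   # 0 = -x + A (D x + c) + u
        if abs(np.linalg.det(M)) < tol:
            continue                            # degenerate region: not handled here
        x = np.linalg.solve(M, A @ c + u)
        if in_piece(x, sigma):
            result.append((sigma, x))
    return result

if __name__ == "__main__":
    eqs = equilibria(3.0 * np.eye(2), np.zeros(2))
    stable = [x for sigma, x in eqs if 0 not in sigma]   # all coordinates saturated
    print(len(eqs), len(stable))                          # 9 4
```

Enumerating all $3^n$ regions is exponential in $n$, so this brute-force check is only practical for small networks like the two-neuron examples discussed in the abstract.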

Journal Article
TL;DR: In this article, the authors compare the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), cubic splines, and piecewise linear functions for approximating the aerodynamic coefficients of a generic small-arms projectile.

Journal Article
TL;DR: An implicit difference scheme with truncation error of order $2-\alpha$ ($0<\alpha<1$) in time and order 2 in space is considered for the one-dimensional time-fractional Burgers equation, and a novel iterative algorithm is proposed and implemented to solve the resulting nonlinear systems.
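A standard way to obtain a time discretization with truncation error of order $2-\alpha$ is the L1 formula, $\partial_t^\alpha u(t_N) \approx \frac{\tau^{-\alpha}}{\Gamma(2-\alpha)}\sum_{j=0}^{N-1} a_j\,(u^{N-j}-u^{N-j-1})$ with $a_j = (j+1)^{1-\alpha}-j^{1-\alpha}$. The TL;DR does not say which formula the paper uses, so treat the sketch below as a generic illustration; it covers only the time-fractional derivative of a scalar function, not the spatial discretization of the Burgers nonlinearity or the iterative solver, and the function name is hypothetical.

```python
import math

def caputo_l1(u_values, tau, alpha):
    """L1 approximation of the Caputo derivative of order alpha in (0, 1) at t_N = N * tau.

    u_values[n] = u(n * tau) for n = 0..N. For smooth u the truncation error is
    O(tau ** (2 - alpha)).
    """
    N = len(u_values) - 1
    total = 0.0
    for j in range(N):
        a_j = (j + 1) ** (1 - alpha) - j ** (1 - alpha)   # L1 weights
        total += a_j * (u_values[N - j] - u_values[N - j - 1])
    return total * tau ** (-alpha) / math.gamma(2 - alpha)

if __name__ == "__main__":
    alpha, N = 0.5, 1000
    tau = 1.0 / N
    t = [k * tau for k in range(N + 1)]
    exact = 2.0 / math.gamma(3 - alpha)   # Caputo derivative of u(t) = t**2 at t = 1
    print(abs(caputo_l1([s * s for s in t], tau, alpha) - exact))
```

The formula is exact for linear $u$ because it integrates the piecewise linear interpolant of $u$ exactly; the error for smoother $u$ comes from the interpolation error.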

Posted Content
TL;DR: GeoCert finds the largest norm ball with a fixed center that fits inside a non-convex polytope, within which the output class of a given neural network with ReLU nonlinearities remains unchanged.
Abstract: We propose a novel method for computing exact pointwise robustness of deep neural networks for all convex $\ell_p$ norms. Our algorithm, GeoCert, finds the largest $\ell_p$ ball centered at an input point $x_0$, within which the output class of a given neural network with ReLU nonlinearities remains unchanged. We relate the problem of computing pointwise robustness of these networks to that of computing the maximum norm ball with a fixed center that can be contained in a non-convex polytope. This is a challenging problem in general; however, we show that there exists an efficient algorithm to compute it for polyhedral complexes. Further, we show that piecewise linear neural networks partition the input space into a polyhedral complex. Our algorithm can almost immediately output a nontrivial lower bound on the pointwise robustness, which is iteratively improved until it ultimately becomes tight. We empirically show that our approach generates distance lower bounds that are tighter than those of prior work under moderate time constraints.
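One primitive behind this kind of certification is the radius of the largest $\ell_p$ ball with a fixed center inside a single convex polytope $\{x : Ax \le b\}$: it is the minimum over facets of $(b_i - a_i^\top x_0)/\lVert a_i\rVert_q$, where $\lVert\cdot\rVert_q$ is the dual norm ($1/p + 1/q = 1$). The sketch below computes only that quantity; GeoCert's search over the facets of a non-convex union of polytopes (the polyhedral complex induced by the ReLU network) is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def dual_norm(a, p):
    """Norm dual to l_p, i.e. the l_q norm with 1/p + 1/q = 1."""
    a = np.abs(np.asarray(a, dtype=float))
    if p == 1:
        return float(a.max())
    if np.isinf(p):
        return float(a.sum())
    q = p / (p - 1.0)
    return float((a ** q).sum() ** (1.0 / q))

def max_ball_radius(A, b, x0, p=2):
    """Radius of the largest l_p ball centered at x0 inside {x : A x <= b}.

    The l_p distance from x0 to the hyperplane a_i . x = b_i equals
    (b_i - a_i . x0) / ||a_i||_q. Rows of A are assumed to be nonzero.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    slack = b - A @ np.asarray(x0, dtype=float)
    if np.any(slack < 0):
        return 0.0                          # x0 violates a constraint: no ball fits
    return float(min(s / dual_norm(row, p) for s, row in zip(slack, A)))

if __name__ == "__main__":
    box = ([[1, 0], [-1, 0], [0, 1], [0, -1]], [1, 1, 1, 1])   # the square |x1|, |x2| <= 1
    print(max_ball_radius(*box, x0=[0.5, 0.0], p=2))          # 0.5
```

For a single activation region of a ReLU network, the rows of $A$ would be the (sign-adjusted) pre-activation hyperplanes; the non-convex case arises because the class can stay unchanged across many such regions.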


Journal Article
TL;DR: Numerical studies illustrate that the SGMP can eliminate cell-crossing noise with little extra computational effort, and that the extra-cost ratio decreases as the number of grid cells or particles increases.

Journal Article
TL;DR: In this paper, a Cartesian componentwise description of the covariant derivative of tangential tensor fields of any degree on Riemannian manifolds is derived, which allows any vector- or tensor-valued surface PDE to be reformulated in a form that can be solved with established tools for scalar-valued surface PDEs.

Journal Article
TL;DR: The proposed PL-SGDLR can be an effective tool for maintenance agencies during periodic survey of buildings and enhance the capability of logistic regression in dealing with spall detection as a complex pattern classification problem.
Abstract: Recognizing spalling on the surface of concrete walls is crucial in building condition surveys. Early detection of this defect can help maintenance agencies develop cost-effective rehabilitation methods. This study develops a method for automatic detection of spalled areas. The proposed approach combines image texture computation for feature extraction with a piecewise linear stochastic gradient descent logistic regression (PL-SGDLR) for pattern recognition. Image textures obtained from statistical properties of the color channels, the gray-level co-occurrence matrix, and gray-level run lengths are used as features to characterize the surface condition of concrete walls. Based on these features, PL-SGDLR categorizes image samples into two classes: “nonspall” (negative class) and “spall” (positive class). Notably, PL-SGDLR extends standard logistic regression by replacing the linear decision surface with a piecewise linear one. This improvement enhances the capability of logistic regression to handle spall detection as a complex pattern classification problem. Experiments with 1240 collected image samples show that PL-SGDLR achieves good detection accuracy (classification accuracy rate = 90.24%). To ease model implementation, the PL-SGDLR program has been developed and compiled in MATLAB and Visual C# .NET. Thus, the proposed PL-SGDLR can be an effective tool for maintenance agencies during periodic surveys of buildings.
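The gray-level co-occurrence matrix mentioned among the features counts how often pairs of gray levels occur at a fixed pixel offset; texture statistics such as contrast, homogeneity and energy are then simple weighted sums over the normalized matrix. The sketch below computes one GLCM and these three statistics with NumPy; the offset, the number of gray levels and the choice of statistics are assumptions, since the abstract does not list the exact settings, and the color-channel and run-length features are not shown.

```python
import numpy as np

def glcm(image, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one pixel offset (d_row, d_col).

    image must contain integer gray levels in [0, levels).
    """
    img = np.asarray(image, dtype=int)
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((levels, levels))
    for r in range(max(0, -dr), rows - max(0, dr)):
        for c in range(max(0, -dc), cols - max(0, dc)):
            P[img[r, c], img[r + dr, c + dc]] += 1   # count the (reference, neighbor) pair
    return P / P.sum()

def glcm_features(P):
    """Contrast, homogeneity and energy of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
        "energy": float(np.sum(P ** 2)),
    }

if __name__ == "__main__":
    P = glcm([[0, 0, 1], [1, 1, 0]], levels=2)
    print(glcm_features(P))  # {'contrast': 0.5, 'homogeneity': 0.75, 'energy': 0.25}
```

In a pipeline like the one described, such statistics (typically over several offsets) would be concatenated into the feature vector passed to the classifier.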

Book Chapter
17 Jul 2019
TL;DR: This paper introduces an efficient histogram-based algorithm for building gradient boosting ensembles of piecewise linear trees and proves that the algorithm is invariant to linear transformations of individual features.
Abstract: One of the most popular machine learning algorithms is gradient boosting over decision trees. This algorithm achieves high quality out of the box combined with comparably low training and inference time. However, modern machine learning applications require algorithms that achieve better quality in less inference time, which has led to the exploration of gradient boosting algorithms over other forms of base learners. One such advanced base learner is the piecewise linear tree, which has linear functions as predictions in its leaves. This paper introduces an efficient histogram-based algorithm for building gradient boosting ensembles of such trees. The algorithm was compared with modern gradient boosting libraries on publicly available datasets and achieved better quality with a smaller ensemble size and lower inference time. It is proven that the algorithm is invariant to linear transformations of individual features.
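To make the notion of a piecewise linear tree concrete, the sketch below fits a depth-one tree with a linear model in each leaf, choosing the threshold among quantile (histogram) bin edges of one feature and scoring each candidate by the total squared error of the two leaf fits. This is a simplified stand-in, not the paper's algorithm: within gradient boosting the targets would be gradient statistics of the loss rather than raw labels, and the bin count and function names are assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept; returns (coefficients, SSE)."""
    Z = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return coef, float(resid @ resid)

def best_pl_split(X, y, feature, n_bins=16):
    """Choose a threshold for one feature among histogram (quantile) bin edges,
    fitting a linear model in each child; returns (threshold, total SSE)."""
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))[1:-1])
    min_leaf = X.shape[1] + 1              # enough points to fit slopes + intercept
    best_t, best_sse = None, np.inf
    for t in edges:
        left = x <= t
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue
        sse = fit_linear(X[left], y[left])[1] + fit_linear(X[~left], y[~left])[1]
        if sse < best_sse:
            best_t, best_sse = float(t), sse
    return best_t, best_sse

if __name__ == "__main__":
    X = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
    y = np.abs(X[:, 0])                    # a kink at 0: two linear pieces
    print(best_pl_split(X, y, feature=0))  # threshold near 0, SSE near 0
```

Restricting thresholds to a small set of bin edges is what makes histogram-based split search cheap; a constant-leaf tree would need many splits to approximate the same kink.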

Journal Article
TL;DR: The newly developed high-order strain NS-FEM is applied to static, free-vibration and forced-vibration analyses of solids, and the numerical results support the theorems.

Journal Article
TL;DR: The authors derive fast error rates for additive trend filtering estimates and describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as the authors know) is the first of its kind.
Abstract: We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their $k$th (discrete) derivative, for a chosen integer $k\geq0$. This results in $k$th degree piecewise polynomial components (e.g., $k=0$ gives piecewise constant components, $k=1$ gives piecewise linear, $k=2$ gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical and computational properties, thanks in large part to the localized nature of the (discrete) total variation regularizer that it uses. On the theory side, we derive fast error rates for additive trend filtering estimates, and show these rates are minimax optimal when the underlying function is additive and has component functions whose derivatives are of bounded variation. We also show that these rates are unattainable by additive smoothing splines (and by additive models built from linear smoothers, in general). On the computational side, we use backfitting to leverage fast univariate trend filtering solvers; we also describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as we can tell) is the first of its kind. Lastly, we present a number of experiments to examine the empirical performance of trend filtering.
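For evenly spaced inputs, the regularizer can be written as $\lVert D^{(k+1)}\theta\rVert_1$, where $D^{(k+1)}$ is the $(k+1)$th order discrete difference matrix; it vanishes on degree-$k$ polynomials and is small for piecewise polynomials with few knots. The sketch below builds $D^{(k+1)}$ and evaluates this penalty for one component; evenly spaced inputs are assumed, the function names are illustrative, and neither the full additive objective nor the backfitting solver is shown.

```python
import numpy as np

def diff_operator(n, order):
    """(n - order) x n discrete difference matrix D^(order) for evenly spaced inputs."""
    D = np.eye(n)
    for _ in range(order):
        D = D[1:] - D[:-1]                 # difference of consecutive rows
    return D

def trend_filtering_penalty(theta, k):
    """Total variation of the k-th discrete derivative: ||D^(k+1) theta||_1."""
    theta = np.asarray(theta, dtype=float)
    return float(np.abs(diff_operator(len(theta), k + 1) @ theta).sum())

if __name__ == "__main__":
    print(trend_filtering_penalty([0, 1, 2, 3, 4, 5], k=1))  # 0.0: a straight line
    print(trend_filtering_penalty([0, 1, 2, 3, 2, 1], k=1))  # 2.0: one kink of size 2
```

Because $D^{(k+1)}\theta$ is nonzero only where the $k$th difference changes, the $\ell_1$ penalty encourages few such changes, which is the localized behavior the abstract credits for the method's properties.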

Journal Article
TL;DR: A convex optimization based algorithm for queue profile estimation in a connected vehicle environment, which can also be used for trajectory reconstruction, delay evaluation, etc, is proposed, and it is demonstrated that by considering a piecewise linear BoQ curve, the estimation accuracy can be improved by up to 16%.
Abstract: This paper proposes a convex optimization based algorithm for queue profile estimation in a connected vehicle environment, which can also be used for trajectory reconstruction, delay evaluation, etc. This algorithm generalizes the widely adopted assumption of a linear back of queue (BoQ) curve to a piecewise linear BoQ curve to cover more practical scenarios. The piecewise linear BoQ curve is estimated via a convex optimization model, ensuring efficient computation. Moreover, this paper explicitly handles cases with low penetration rates and low sampling rates, as well as measurement noise. In addition, the proposed methodology is extended to an urban arterial, reusing the estimated departure information from upstream intersections to further improve estimation accuracy. Finally, two online implementation approaches are presented to perform real-time queue estimation. The proposed methodology is tested with two datasets: the Lankershim dataset from the NGSIM project and a simulated dataset of Wehntalerstrasse, Zurich, Switzerland. Results show that the error is less than 1.5 cars in undersaturated scenarios and 5.2 cars in oversaturated scenarios if the penetration rate is larger than 0.1 and the sampling rate is higher than 0.05 s⁻¹. It is demonstrated that by considering a piecewise linear BoQ curve, the estimation accuracy can be improved by up to 16%. Successfully incorporating flow can also reduce the estimation error by up to 16%. Results further show that the proposed methodology is robust to measurement errors. It is also shown that the proposed framework can be solved within a reasonable time (0.8 s), which is sufficient for most real-time applications.
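Estimating a piecewise linear curve by convex optimization can be illustrated with the simplest case: a continuous piecewise linear function with known breakpoints, written in a hinge basis $1, t, \max(t-b_m, 0)$ and fitted by least squares, which is a convex problem. The sketch below fits such a curve to queue-position observations over time; the fixed breakpoints, the absence of the paper's constraints and noise model, the synthetic data and the function name are all assumptions.

```python
import numpy as np

def fit_continuous_pwl(t, y, breakpoints):
    """Least-squares fit of a continuous piecewise linear curve with fixed breakpoints.

    Basis: 1, t and hinges max(t - b, 0), so the slope can change only at the
    breakpoints. Returns the coefficients and a function evaluating the fitted curve.
    """
    def basis(s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        cols = [np.ones_like(s), s] + [np.maximum(s - bp, 0.0) for bp in breakpoints]
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(basis(t), np.asarray(y, dtype=float), rcond=None)
    return coef, lambda s: basis(s) @ coef

if __name__ == "__main__":
    # Hypothetical BoQ observations: queue position vs time, slope change at t = 10.
    t = np.arange(0.0, 31.0)
    y = 2.0 + 0.5 * t + 1.0 * np.maximum(t - 10.0, 0.0)
    coef, curve = fit_continuous_pwl(t, y, breakpoints=[10.0])
    print(np.round(coef, 6), curve(20.0))
```

Each hinge coefficient is the change in slope at its breakpoint, so constraints such as sign restrictions on slope changes could be added as linear inequalities while keeping the problem convex.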

Proceedings Article
07 Jul 2019
TL;DR: In this paper, the authors propose a new learning problem that encourages deep networks to have stable derivatives over larger regions, focusing on networks with piecewise linear activation functions.
Abstract: Deep networks realize complex mappings that are often understood by their locally linear behavior at or around points of interest. For example, we use the derivative of the mapping with respect to its inputs for sensitivity analysis, or to explain (obtain coordinate relevance for) a prediction. One key challenge is that such derivatives are themselves inherently unstable. In this paper, we propose a new learning problem to encourage deep networks to have stable derivatives over larger regions. While the problem is challenging in general, we focus on networks with piecewise linear activation functions. Our algorithm consists of an inference step that identifies a region around a point where linear approximation is provably stable, and an optimization step to expand such regions. We propose a novel relaxation to scale the algorithm to realistic models. We illustrate our method with residual and recurrent networks on image and sequence datasets.

Journal Article
TL;DR: In this paper, a general class of continuous non-monotonic piecewise linear activation functions is introduced, and it is shown that, under some conditions, $n$-neuron competitive neural networks with such activations have exactly $5^n$ equilibrium points.
Abstract: This paper addresses the issue of multistability for competitive neural networks. First, a general class of continuous non-monotonic piecewise linear activation functions is introduced. Then, based on the fixed point theorem, the contraction mapping theorem and the eigenvalue properties of strictly diagonally dominant matrices, it is shown that under some conditions, such $n$-neuron competitive neural networks have exactly $5^n$ equilibrium points, among which $3^n$ are locally exponentially stable and the others are unstable. Moreover, it is revealed that neural networks with the non-monotonic piecewise linear activation functions introduced in this paper can have greater storage capacity than those with a Mexican-hat-type activation function or a nondecreasing saturated activation function. In addition, unlike most existing multistability results for neural networks with nondecreasing activation functions, the locations of the $3^n$ locally stable equilibrium points obtained in this paper are more flexible. Finally, a numerical example is provided to illustrate and validate the theoretical findings via comprehensive computer simulations.

Journal Article
TL;DR: In this article, a weak-equality algorithm is proposed to compute primitive moments needed in the updates of a Fokker-Planck collision operator and a reconstruction procedure that allows an efficient and accurate discretization of the diffusion term.
Abstract: We present a novel discontinuous Galerkin algorithm for the solution of a class of Fokker-Planck collision operators. These operators arise in many fields of physics, and our particular application is kinetic plasma simulation. In particular, we focus on an operator often known as the 'Lenard-Bernstein' or 'Dougherty' operator. Several novel algorithmic innovations are reported. The concept of weak-equality is introduced and used to define weak-operators to compute primitive moments needed in the updates. Weak-equality is also used to determine a reconstruction procedure that allows an efficient and accurate discretization of the diffusion term. We show that when integration by parts is applied twice to construct the discrete weak form, and finite velocity-space extents are accounted for, a scheme that conserves density, momentum and energy exactly is obtained. One novel feature is that the requirements of momentum and energy conservation lead to unique formulas for computing primitive moments. Careful definition of discretized moments also ensures that energy is conserved in the piecewise linear case, even though the $v^2$ term is not included in the basis set used in the discretization. A series of benchmark problems are presented and show that the scheme conserves momentum and energy to machine precision. Empirical evidence also indicates that entropy is non-decreasing. The collision terms are combined with the Vlasov equation to study collisional Landau damping and plasma heating via magnetic pumping. We conclude with an outline of future work, in particular with some indications of how the algorithms presented here can be extended to use the Rosenbluth potentials to compute the drag and diffusion coefficients.

Journal Article
15 Feb 2019-Symmetry
TL;DR: The authors developed algorithms for calculating the dynamic characteristics of discrete systems, i.e., areas of existence of steady-state motion, areas of stability, capture band, and parameters of transients, together with methods for analyzing these characteristics.
Abstract: This paper deals with methods for investigating the nonlinear dynamics of discrete chaotic systems (DCS), applied to third-order piecewise linear systems. The paper proposes an approach to the analysis of the systems under study and their improvement. Effective and mathematically sound methods for analyzing nonlinear motions in the models under consideration are proposed. These methods make it possible to obtain simple relations for determining the basic dynamic characteristics of the systems. Based on these methods, the authors developed algorithms for calculating the dynamic characteristics of discrete systems, i.e., areas of existence of steady-state motion, areas of stability, capture band, and parameters of transients. With the developed methods and algorithms, the dynamic modes of several models of discrete phase synchronization systems can be analyzed: pulsed and digital systems of different orders, dual-ring systems of various types (including combined ones), and systems with cyclic interruption of auto-tuning. Based on the results of this analysis, the efficiency of various devices for information processing, generation and stabilization could be increased by using these discrete synchronization systems. We are now developing original software for analyzing the dynamic characteristics of various classes of discrete phase synchronization systems, based on the developed methods and algorithms.

Journal Article
TL;DR: In this paper, a numerical method for stochastic time-fractional diffusion driven by additive fractionally integrated Gaussian noise is developed and analyzed, which involves two nonlocal terms in time.
Abstract: We develop and analyze a numerical method for stochastic time-fractional diffusion driven by additive fractionally integrated Gaussian noise. The model involves two nonlocal terms in time, i.e., a Caputo fractional derivative of order α ∈ (0, 1) and fractionally integrated Gaussian noise (with a Riemann-Liouville fractional integral of order γ ∈ [0, 1] in front). The numerical scheme approximates the model in space by the standard Galerkin method with continuous piecewise linear finite elements, in time by the classical Grünwald-Letnikov method (for both the Caputo fractional derivative and the Riemann-Liouville fractional integral), and the noise by the $L^2$-projection. Sharp strong and weak convergence rates are established, using suitable nonsmooth-data error estimates for the discrete solution operators of the deterministic inhomogeneous problem. One- and two-dimensional numerical results are presented to support the theoretical findings.

Journal Article
TL;DR: In this article, the phase portraits in the Poincaré disc of all piecewise linear continuous differential systems with two zones separated by a straight line and having a unique finite singular point, which is a node or a focus, are classified.

Journal Article
TL;DR: The integral version of the fractional Laplacian on a bounded domain is discretized by a Galerkin approximation based on piecewise linear functions on a quasi-uniform mesh, and it is shown that the inverse of the associated stiffness matrix can be approximated by blockwise low-rank matrices at an exponential rate in the block rank.
Abstract: The integral version of the fractional Laplacian on a bounded domain is discretized by a Galerkin approximation based on piecewise linear functions on a quasiuniform mesh. We show that the inverse of the associated stiffness matrix can be approximated by blockwise low-rank matrices at an exponential rate in the block rank.

Journal Article
TL;DR: This work studies how the maximum number of limit cycles of discontinuous piecewise linear differential systems with only two pieces changes as a function of the degree of the algebraic curve of discontinuity separating the two linear differential systems.
Abstract: We study how the maximum number of limit cycles of discontinuous piecewise linear differential systems with only two pieces changes as a function of the degree of the discontinuity of the algeb...