Author

Yeonjong Shin

Bio: Yeonjong Shin is an academic researcher from Brown University. The author has contributed to research on topics including artificial neural networks and gradient descent, has an h-index of 11, and has co-authored 24 publications receiving 432 citations. Previous affiliations of Yeonjong Shin include Ohio State University and University of Utah.

Papers
Journal ArticleDOI
TL;DR: In this article, the authors prove that a deep ReLU network will eventually die in probability as the depth goes to infinity, and propose a randomized asymmetric initialization procedure that provably prevents the dying ReLU.
Abstract: The dying ReLU refers to the problem when ReLU neurons become inactive and only output 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about its theoretical analysis. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.

208 citations
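
As an illustration of the general idea, here is a minimal sketch of an asymmetric initialization for a ReLU layer: for each neuron, one randomly chosen incoming parameter (a weight or the bias) is redrawn from a positive-only distribution so that every neuron has a chance of a positive pre-activation at initialization. The distributions and scalings below are placeholders, not the theoretically designed values from the paper.

```python
import numpy as np

def randomized_asymmetric_init(fan_in, fan_out, rng=None):
    """Hedged sketch of a randomized asymmetric initialization for one ReLU layer."""
    if rng is None:
        rng = np.random.default_rng()
    # Symmetric part: He-style Gaussian weights and zero biases.
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
    b = np.zeros(fan_out)
    params = np.concatenate([W, b[:, None]], axis=1)   # shape (fan_out, fan_in + 1)
    # Asymmetric part: for each neuron, redraw one randomly chosen parameter
    # (weight or bias) from a positive-only distribution.
    idx = rng.integers(0, fan_in + 1, size=fan_out)
    params[np.arange(fan_out), idx] = rng.beta(2.0, 1.0, size=fan_out)
    return params[:, :fan_in], params[:, -1]           # weights, biases

W1, b1 = randomized_asymmetric_init(fan_in=64, fan_out=128)
```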

Journal ArticleDOI
TL;DR: This is the first theoretical work that shows the consistency of PINNs, and it is shown that the sequence of minimizers strongly converges to the PDE solution in $C^0$.
Abstract: Physics informed neural networks (PINNs) are deep learning based techniques for solving partial differential equations (PDEs) encountered in computational science and engineering. Guided by data and physical laws, PINNs find a neural network that approximates the solution to a system of PDEs. Such a neural network is obtained by minimizing a loss function in which any prior knowledge of PDEs and data are encoded. Despite its remarkable empirical success in one, two or three dimensional problems, there is little theoretical justification for PINNs. As the number of data grows, PINNs generate a sequence of minimizers which correspond to a sequence of neural networks. We want to answer the question: Does the sequence of minimizers converge to the solution to the PDE? We consider two classes of PDEs: linear second-order elliptic and parabolic. By adapting the Schauder approach and the maximum principle, we show that the sequence of minimizers strongly converges to the PDE solution in $C^0$. Furthermore, we show that if each minimizer satisfies the initial/boundary conditions, the convergence mode becomes $H^1$. Computational examples are provided to illustrate our theoretical findings. To the best of our knowledge, this is the first theoretical work that shows the consistency of PINNs.

153 citations
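
For readers unfamiliar with the setup, the sketch below shows a typical PINN loss for the 1D Poisson problem -u''(x) = f(x) on (0, 1) with zero boundary values: a PDE-residual term at interior collocation points plus a boundary term. The network, right-hand side, and training loop are illustrative placeholders, not the configurations analyzed in the paper.

```python
import math
import torch

# Small fully-connected network u_theta(x); sizes are illustrative.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
f = lambda x: (math.pi ** 2) * torch.sin(math.pi * x)   # exact solution: sin(pi x)

def pinn_loss(x_interior, x_boundary):
    x = x_interior.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    residual = -d2u - f(x)          # PDE residual at the collocation points
    boundary = net(x_boundary)      # enforce u = 0 at x = 0 and x = 1
    return (residual ** 2).mean() + (boundary ** 2).mean()

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    optimizer.zero_grad()
    loss = pinn_loss(torch.rand(128, 1), torch.tensor([[0.0], [1.0]]))
    loss.backward()
    optimizer.step()
```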

Journal ArticleDOI
TL;DR: This paper rigorously proves that a deep ReLU network will eventually die in probability as the depth goes to infinity, and proposes a new initialization procedure, namely, a randomized asymmetric initialization, which can effectively prevent the dying ReLU.
Abstract: The dying ReLU refers to the problem when ReLU neurons become inactive and only output 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about its theoretical analysis. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.

100 citations

Posted Content
03 Apr 2020
TL;DR: This is the first theoretical work that shows the consistency of the PINNs methodology and shows that if each minimizer satisfies the initial/boundary conditions, the convergence mode can be improved to $H^1$.
Abstract: Physics informed neural networks (PINNs) are deep learning based techniques for solving partial differential equations (PDEs). Guided by data and physical laws, PINNs find a neural network that approximates the solution to a system of PDEs. Such a neural network is obtained by minimizing a loss function in which any prior knowledge of PDEs and data are encoded. Despite its remarkable empirical success, there is little theoretical justification for PINNs. In this paper, we establish a mathematical foundation of the PINNs methodology. As the number of data grows, PINNs generate a sequence of minimizers which correspond to a sequence of neural networks. We want to answer the question: Does the sequence of minimizers converge to the solution to the PDE? This question is also related to the generalization of PINNs. We consider two classes of PDEs: elliptic and parabolic. By adapting the Schauder approach, we show that the sequence of minimizers strongly converges to the PDE solution in $L^2$. Furthermore, we show that if each minimizer satisfies the initial/boundary conditions, the convergence mode can be improved to $H^1$. Computational examples are provided to illustrate our theoretical findings. To the best of our knowledge, this is the first theoretical work that shows the consistency of the PINNs methodology.

66 citations

Journal ArticleDOI
TL;DR: This paper presents a quasi-optimal sample set for ordinary least squares (OLS) regression, together with its efficient implementation via a greedy algorithm and several numerical examples demonstrating its efficacy.
Abstract: In this paper we present a quasi-optimal sample set for ordinary least squares (OLS) regression. The quasi-optimal set is designed in such a way that, for a given number of samples, it can deliver the regression result as close as possible to the result obtained by a (much) larger set of candidate samples. The quasi-optimal set is determined by maximizing a quantity measuring the mutual column orthogonality and the determinant of the model matrix. This procedure is nonadaptive, in the sense that it does not depend on the sample data. This is useful in practice, as it allows one to determine, prior to the potentially expensive data collection procedure, where to sample the underlying system. In addition to presenting the theoretical motivation of the quasi-optimal set, we also present its efficient implementation via a greedy algorithm, along with several numerical examples to demonstrate its efficacy. Since the quasi-optimal set allows one to obtain a near optimal regression result under any affordable nu...

51 citations
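
To make the greedy implementation concrete, the sketch below selects rows of a candidate design matrix one at a time so as to maximize a determinant-based criterion of the selected submatrix. The paper's actual criterion combines mutual column orthogonality with the determinant; the regularized log-determinant used here is only a stand-in.

```python
import numpy as np

def greedy_select(A, m, eps=1e-10):
    """Greedily pick m rows of A that maximize log det(A_S^T A_S + eps*I)."""
    n, p = A.shape
    selected = []
    for _ in range(m):
        best_gain, best_i = -np.inf, None
        for i in range(n):
            if i in selected:
                continue
            S = A[selected + [i]]
            gain = np.linalg.slogdet(S.T @ S + eps * np.eye(p))[1]
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
    return selected

A = np.random.randn(200, 5)        # 200 candidate samples, 5 basis functions
b = np.random.randn(200)           # synthetic observations
S = greedy_select(A, m=15)         # quasi-optimal subset of 15 samples
coef, *_ = np.linalg.lstsq(A[S], b[S], rcond=None)   # OLS on the selected subset
```

Note that the selection uses only the model matrix A, never the data b, which mirrors the nonadaptive property described in the abstract.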


Cited by
Journal ArticleDOI
01 Jun 2021
TL;DR: This review surveys prevailing trends in embedding physics into machine learning, presents current capabilities and limitations, and discusses diverse applications of physics-informed learning for both forward and inverse problems, including discovering hidden physics and tackling high-dimensional problems.
Abstract: Despite great progress in simulating multiphysics problems using the numerical discretization of partial differential equations (PDEs), one still cannot seamlessly incorporate noisy data into existing algorithms, mesh generation remains complex, and high-dimensional problems governed by parameterized PDEs cannot be tackled. Moreover, solving inverse problems with hidden physics is often prohibitively expensive and requires different formulations and elaborate computer codes. Machine learning has emerged as a promising alternative, but training deep neural networks requires big data, not always available for scientific problems. Instead, such networks can be trained from additional information obtained by enforcing the physical laws (for example, at random points in the continuous space-time domain). Such physics-informed learning integrates (noisy) data and mathematical models, and implements them through neural networks or other kernel-based regression networks. Moreover, it may be possible to design specialized network architectures that automatically satisfy some of the physical invariants for better accuracy, faster training and improved generalization. Here, we review some of the prevailing trends in embedding physics into machine learning, present some of the current capabilities and limitations and discuss diverse applications of physics-informed learning both for forward and inverse problems, including discovering hidden physics and tackling high-dimensional problems. The rapidly developing field of physics-informed learning integrates data and mathematical models seamlessly, enabling accurate inference of realistic and high-dimensional multiphysics problems. This Review discusses the methodology and provides diverse examples and an outlook for further developments.

1,114 citations

Journal ArticleDOI
TL;DR: This book develops the theory of best approximations, covering their existence and uniqueness, approximation operators, polynomial interpolation, minimax and least squares approximation, and spline methods.
Abstract: Preface 1. The approximation problem and existence of best approximations 2. The uniqueness of best approximations 3. Approximation operators and some approximating functions 4. Polynomial interpolation 5. Divided differences 6. The uniform convergence of polynomial approximations 7. The theory of minimax approximation 8. The exchange algorithm 9. The convergence of the exchange algorithm 10. Rational approximation by the exchange algorithm 11. Least squares approximation 12. Properties of orthogonal polynomials 13. Approximation of periodic functions 14. The theory of best L1 approximation 15. An example of L1 approximation and the discrete case 16. The order of convergence of polynomial approximations 17. The uniform boundedness theorem 18. Interpolation by piecewise polynomials 19. B-splines 20. Convergence properties of spline approximations 21. Knot positions and the calculation of spline approximations 22. The Peano kernel theorem 23. Natural and perfect splines 24. Optimal interpolation Appendices Index.

841 citations

Journal ArticleDOI
TL;DR: A new deep neural network called DeepONet can learn various mathematical operators with small generalization error, including explicit operators, such as integrals and fractional Laplacians, as well as implicit operators that represent deterministic and stochastic differential equations.
Abstract: It is widely known that neural networks (NNs) are universal approximators of continuous functions. However, a less known but powerful result is that a NN with a single hidden layer can accurately approximate any nonlinear continuous operator. This universal approximation theorem of operators is suggestive of the structure and potential of deep neural networks (DNNs) in learning continuous operators or complex systems from streams of scattered data. Here, we thus extend this theorem to DNNs. We design a new network with small generalization error, the deep operator network (DeepONet), which consists of a DNN for encoding the discrete input function space (branch net) and another DNN for encoding the domain of the output functions (trunk net). We demonstrate that DeepONet can learn various explicit operators, such as integrals and fractional Laplacians, as well as implicit operators that represent deterministic and stochastic differential equations. We study different formulations of the input function space and its effect on the generalization error for 16 different diverse applications. Neural networks are known as universal approximators of continuous functions, but they can also approximate any mathematical operator (mapping a function to another function), which is an important capability for complex systems such as robotics control. A new deep neural network called DeepONet can learn various mathematical operators with small generalization error.

675 citations

Journal ArticleDOI
TL;DR: This work proposes deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset, and demonstrates that DeepONet significantly reduces the generalization error compared to the fully-connected networks.
Abstract: While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.

324 citations
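
The branch/trunk structure described above can be written in a few lines. The sketch below is a minimal DeepONet forward pass in which the operator output at a location y is the inner product of a branch embedding of the input function (sampled at m fixed sensors) and a trunk embedding of y; layer widths and depths are illustrative, not the configurations studied in the paper.

```python
import torch

class DeepONet(torch.nn.Module):
    """Minimal DeepONet sketch: branch net for the sampled input function,
    trunk net for the output location, combined by an inner product."""
    def __init__(self, m, p=64):
        super().__init__()
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(m, 128), torch.nn.ReLU(), torch.nn.Linear(128, p))
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(1, 128), torch.nn.ReLU(), torch.nn.Linear(128, p))
        self.bias = torch.nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, y):
        # u_sensors: (batch, m) values of the input function at the m sensors
        # y:         (batch, 1) locations at which the output function is queried
        return (self.branch(u_sensors) * self.trunk(y)).sum(-1, keepdim=True) + self.bias

model = DeepONet(m=100)
G_u_at_y = model(torch.randn(8, 100), torch.rand(8, 1))   # shape (8, 1)
```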

Journal ArticleDOI
TL;DR: The proposed XPINN method generalizes the PINN and cPINN approaches in both applicability and domain decomposition, and lends itself efficiently to parallelized computation.
Abstract: We propose a generalized space-time domain decomposition framework for the physics-informed neural networks (PINNs) to solve nonlinear partial differential equations (PDEs) on arbitrary complex-geometry domains. The proposed framework, named eXtended PINNs (XPINNs), further pushes the boundaries of both PINNs as well as conservative PINNs (cPINNs), which is a recently proposed domain decomposition approach in the PINN framework tailored to conservation laws. Compared to PINN, the XPINN method has large representation and parallelization capacity due to the inherent property of deployment of multiple neural networks in the smaller subdomains. Unlike cPINN, XPINN can be extended to any type of PDEs. Moreover, the domain can be decomposed in any arbitrary way (in space and time), which is not possible in cPINN. Thus, XPINN offers both space and time parallelization, thereby reducing the training cost more effectively. In each subdomain, a separate neural network is employed with optimally selected hyperparameters, e.g., depth/width of the network, number and location of residual points, activation function, optimization method, etc. A deep network can be employed in a subdomain with complex solution, whereas a shallow neural network can be used in a subdomain with relatively simple and smooth solutions. We demonstrate the versatility of XPINN by solving both forward and inverse PDE problems, ranging from one-dimensional to three-dimensional problems, from time-dependent to time-independent problems, and from continuous to discontinuous problems, which clearly shows that the XPINN method is promising in many practical problems. The proposed XPINN method is the generalization of PINN and cPINN approaches, both in terms of applicability as well as domain decomposition approach, which efficiently lends itself to parallelized computation. The XPINN code will be available on https://github.com/AmeyaJagtap/XPINNs.

308 citations
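
As a toy illustration of the decomposition, the sketch below splits a 1D Poisson problem into two subdomains, assigns each its own network, and adds interface terms that penalize the mismatch of the two predicted solutions and of their PDE residuals at the interface. The PDE, penalty weights, and network sizes are placeholders, not the paper's setup.

```python
import math
import torch

def mlp():
    return torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

net1, net2 = mlp(), mlp()                               # one network per subdomain
f = lambda x: (math.pi ** 2) * torch.sin(math.pi * x)   # -u'' = f, with u = sin(pi x)

def residual(net, x):
    x = x.clone().requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return -d2u - f(x)

def xpinn_loss():
    x1 = 0.5 * torch.rand(64, 1)                # collocation points in [0, 0.5]
    x2 = 0.5 + 0.5 * torch.rand(64, 1)          # collocation points in [0.5, 1]
    xi = torch.full((1, 1), 0.5)                # interface point
    loss_pde = (residual(net1, x1) ** 2).mean() + (residual(net2, x2) ** 2).mean()
    loss_bc = (net1(torch.zeros(1, 1)) ** 2 + net2(torch.ones(1, 1)) ** 2).mean()
    loss_iface = ((net1(xi) - net2(xi)) ** 2                              # solution continuity
                  + (residual(net1, xi) - residual(net2, xi)) ** 2).mean()  # residual continuity
    return loss_pde + loss_bc + loss_iface

optimizer = torch.optim.Adam(list(net1.parameters()) + list(net2.parameters()), lr=1e-3)
for _ in range(2000):
    optimizer.zero_grad()
    xpinn_loss().backward()
    optimizer.step()
```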