# Vincent Dubourg

Bio: Vincent Dubourg is an academic researcher from the Institut Français de Mécanique Avancée (IFMA). The author has contributed to research in topics: Kriging & Reliability (statistics). The author has an h-index of 8 and has co-authored 16 publications receiving 63,374 citations.

##### Papers

•

Affiliations: Kobe University; Bauhaus University, Weimar; Google; University of Washington; University of Massachusetts Amherst; Total S.A.

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.

Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

33,540 citations
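
The abstract's emphasis on ease of use and API consistency is easiest to see in code. Below is a minimal sketch of the library's uniform fit/predict convention; it is not an example from the paper, and the dataset and estimator are arbitrary choices.

```python
# Minimal sketch of scikit-learn's uniform estimator API (fit/predict/score).
# The dataset and estimator are arbitrary choices for illustration.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # any estimator exposes the same API
clf.fit(X_train, y_train)                # learn from the training data
print(clf.predict(X_test[:5]))           # predict on unseen samples
print(clf.score(X_test, y_test))         # mean accuracy on the test set
```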

•

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.

Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

28,898 citations

••

TL;DR: In this paper, the authors propose to use a Kriging surrogate for the performance function as a means to build a quasi-optimal importance sampling density, which can be applied to analytical and finite element reliability problems and proves efficient up to 100 basic random variables.

Abstract: Structural reliability methods aim at computing the probability of failure of systems with respect to some prescribed performance functions. In modern engineering such functions usually resort to running an expensive-to-evaluate computational model (e.g. a finite element model). In this respect simulation methods, which may require 10^3 to 10^6 runs, cannot be used directly. Surrogate models such as quadratic response surfaces, polynomial chaos expansions or Kriging (which are built from a limited number of runs of the original model) are then introduced as a substitute for the original model to cope with the computational cost. In practice it is almost impossible to quantify the error made by this substitution though. In this paper we propose to use a Kriging surrogate for the performance function as a means to build a quasi-optimal importance sampling density. The probability of failure is eventually obtained as the product of an augmented probability computed by substituting the metamodel for the original performance function and a correction term which ensures that there is no bias in the estimation even if the metamodel is not fully accurate. The approach is applied to analytical and finite element reliability problems and proves efficient up to 100 basic random variables.

329 citations
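
The decomposition described in the abstract can be written compactly. The notation below is assumed for illustration and is not copied from the paper: f is the joint density of the basic random variables, g the performance function (failure corresponds to g(x) ≤ 0), and Ĝ the Kriging surrogate of g.

```latex
% Metamodel-based importance sampling, in assumed notation:
%   f : joint PDF of the basic random variables
%   g : performance function, failure domain = {x : g(x) <= 0}
%   \hat{G} : Kriging surrogate of g
\[
  P_f \,=\, \int \mathbf{1}_{\{g(x)\le 0\}}\, f(x)\,\mathrm{d}x
      \,=\, P_{f\varepsilon}\,\alpha_{\mathrm{corr}},
\]
\[
  P_{f\varepsilon} \,=\, \int \mathbb{P}\!\left[\hat{G}(x)\le 0\right] f(x)\,\mathrm{d}x,
  \qquad
  \alpha_{\mathrm{corr}} \,=\, \mathbb{E}_{\pi}\!\left[
      \frac{\mathbf{1}_{\{g(X)\le 0\}}}{\mathbb{P}\!\left[\hat{G}(X)\le 0\right]}
  \right],
  \qquad
  \pi(x) \,=\, \frac{\mathbb{P}\!\left[\hat{G}(x)\le 0\right] f(x)}{P_{f\varepsilon}}.
\]
```

The augmented probability $P_{f\varepsilon}$ only calls the surrogate, while the correction term $\alpha_{\mathrm{corr}}$, estimated with a limited number of runs of the true model under the quasi-optimal density $\pi$, removes the bias when the metamodel is not fully accurate.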

••

TL;DR: The aim of the present paper is to develop a strategy for solving reliability-based design optimization (RBDO) problems that remains applicable when the performance models are expensive to evaluate.

Abstract: The aim of the present paper is to develop a strategy for solving reliability-based design optimization (RBDO) problems that remains applicable when the performance models are expensive to evaluate. Starting with the premise that simulation-based approaches are not affordable for such problems, and that the most-probable-failure-point-based approaches do not make it possible to quantify the error on the estimate of the failure probability, an approach based on both metamodels and advanced simulation techniques is explored. The kriging metamodeling technique is chosen to surrogate the performance functions because it allows one to genuinely quantify the surrogate error. The surrogate error on the limit-state surfaces is propagated to the failure probability estimates in order to provide an empirical error measure. This error is then sequentially reduced by means of a population-based adaptive refinement technique until the kriging surrogates are accurate enough for reliability analysis. This original refinement strategy makes it possible to add several observations to the design of experiments at the same time. Reliability and reliability sensitivity analyses are performed by means of the subset simulation technique for the sake of numerical efficiency. The adaptive surrogate-based strategy for reliability estimation is finally embedded in a classical gradient-based optimization algorithm in order to solve the RBDO problem. The kriging surrogates are built in a so-called augmented reliability space, which makes them reusable from one nested RBDO iteration to the next. The strategy is compared to other approaches available in the literature on three academic examples in the field of structural mechanics.

303 citations
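
The abstract's refinement loop can be illustrated with a short sketch. The code below is not the paper's algorithm (the paper uses subset simulation and adds several points at a time through population-based refinement); it shows the simpler single-point adaptive-Kriging idea with crude Monte Carlo, and the toy performance function, names, and stopping threshold are all assumptions.

```python
# Simplified sketch of an adaptive Kriging loop for failure-probability
# estimation. Illustrative only: the paper refines with several points at once
# and estimates P_f by subset simulation rather than crude Monte Carlo.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def g(x):
    # Toy performance function (an assumption); failure domain = {g(x) <= 0}
    return 3.0 - np.sum(x**2, axis=1)

rng = np.random.default_rng(0)
X_doe = rng.standard_normal((12, 2))   # initial design of experiments
pop = rng.standard_normal((10_000, 2)) # Monte Carlo population

for _ in range(30):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_doe, g(X_doe))
    mu, sigma = gp.predict(pop, return_std=True)
    # Probability that the surrogate misclassifies the sign of g at each point
    p_wrong = norm.cdf(-np.abs(mu) / np.maximum(sigma, 1e-12))
    if p_wrong.max() < 0.01:           # surrogate accurate enough: stop
        break
    # Enrich the DoE where the sign of g is most uncertain
    X_doe = np.vstack([X_doe, pop[np.argmax(p_wrong)]])

pf_hat = np.mean(mu <= 0)              # surrogate-based estimate of P_f
print(f"P_f ~= {pf_hat:.4f} with {len(X_doe)} runs of g")
```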

•

05 Dec 2011

TL;DR: This manuscript proposes a surrogate-based strategy for reliability-based design optimization, a probabilistic design approach that accounts for the uncertainty attached to the system of interest in order to provide optimal and safe solutions, in which the limit-state function is progressively replaced by a Kriging meta-model.

Abstract: This thesis is a contribution to the resolution of the reliability-based design optimization problem. This probabilistic design approach is aimed at considering the uncertainty attached to the system of interest in order to provide optimal and safe solutions. The safety level is quantified in the form of a probability of failure. The optimization problem then consists in ensuring that this failure probability remains less than a threshold specified by the stakeholders. The resolution of this problem requires a large number of calls to the limit-state design function underlying the reliability analysis. Hence it becomes cumbersome when the limit-state function involves an expensive-to-evaluate numerical model (e.g. a finite element model). In this context, this manuscript proposes a surrogate-based strategy where the limit-state function is progressively replaced by a Kriging meta-model. Special attention is given to quantifying, reducing and eventually eliminating the error introduced by the use of this meta-model instead of the original model. The proposed methodology is applied to the design of geometrically imperfect shells prone to buckling.

131 citations
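
For reference, the RBDO problem the thesis addresses can be stated compactly; the notation below is assumed for illustration, not taken from the manuscript.

```latex
% Reliability-based design optimization, in assumed notation:
%   d : design variables,  c : cost function,
%   X : random variables (distribution possibly parameterized by d),
%   g : limit-state function, failure = {g <= 0}
\[
  \min_{d}\; c(d)
  \quad\text{s.t.}\quad
  P_f(d) \,=\, \mathbb{P}\!\left[\, g\big(X(d)\big) \le 0 \,\right] \,\le\, P_f^{\max},
\]
```

where $P_f^{\max}$ is the failure-probability threshold specified by the stakeholders.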

##### Cited by

•

TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.

Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

12,326 citations

••

13 Aug 2016

TL;DR: XGBoost proposes a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.

Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

10,428 citations
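
As a concrete illustration of the system described above, here is a minimal sketch using the xgboost Python package; the synthetic data and hyperparameters are arbitrary assumptions, not values from the paper.

```python
# Minimal illustrative use of the xgboost Python package (not code from the
# paper). Dataset and hyperparameters are arbitrary assumptions.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary target

dtrain = xgb.DMatrix(X[:800], label=y[:800])   # xgboost's optimized container
dtest = xgb.DMatrix(X[800:], label=y[800:])

params = {
    "objective": "binary:logistic",
    "max_depth": 4,
    "eta": 0.1,             # learning rate
    "tree_method": "hist",  # histogram-based approximate tree learning
}
booster = xgb.train(params, dtrain, num_boost_round=100,
                    evals=[(dtest, "test")], verbose_eval=False)
preds = booster.predict(dtest)                 # predicted probabilities
print("test accuracy:", np.mean((preds > 0.5) == y[800:]))
```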

••

13 Aug 2016

TL;DR: The authors propose LIME, a technique that explains the predictions of any classifier by learning an interpretable model locally around the prediction, together with a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the latter task as a submodular optimization problem.

Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.

6,284 citations
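
A minimal sketch of how LIME is typically invoked through the lime Python package follows; the dataset and model are arbitrary assumptions, not the paper's experiments.

```python
# Minimal illustrative use of the lime package (not code from the paper).
# Dataset and model are arbitrary assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)
# Explain one prediction with a local interpretable (sparse linear) model
exp = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=4)
print(exp.as_list())  # (feature condition, weight) pairs for the local model
```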

••

Affiliations: University of Jyväskylä; University of California, Los Angeles; California Polytechnic State University; Los Alamos National Laboratory; National Research University Higher School of Economics; University of California, Berkeley; University of Birmingham; Australian Nuclear Science and Technology Organisation; University of Washington; University of Massachusetts Amherst; University of West Bohemia; University of Texas at Austin; Brigham Young University; Universidade Federal de Minas Gerais; Google

TL;DR: SciPy is an open-source scientific computing library for the Python programming language that has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year.

Abstract: SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

6,244 citations
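
A minimal sketch of the kind of scientific algorithms SciPy exposes (numerical integration and optimization here; the specific functions chosen are arbitrary, not from the paper):

```python
# Minimal illustrative use of SciPy: numerical integration and optimization
# from two of its submodules. The examples are arbitrary choices.
import numpy as np
from scipy import integrate, optimize

# Integrate exp(-x^2) over the real line; the exact value is sqrt(pi)
val, err = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(val, np.sqrt(np.pi))

# Minimize the Rosenbrock function with a quasi-Newton method
res = optimize.minimize(optimize.rosen, x0=np.zeros(5), method="BFGS")
print(res.x)  # should be close to the all-ones vector
```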

•

TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.

Abstract: Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.

4,132 citations
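
The inductive idea (embeddings computed as a function of sampled neighborhood features rather than looked up per node) can be sketched in a few lines. This toy NumPy sketch assumes a single mean-aggregator layer; the shapes, names, graph, and random weights are illustrative, not the paper's implementation.

```python
# Toy sketch of GraphSAGE-style mean aggregation (illustrative only; the
# graph, shapes, and single-layer setup are assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d_in, d_out, sample_size = 6, 4, 3, 2

H = rng.standard_normal((n_nodes, d_in))    # input node features
neighbors = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5],
             3: [0], 4: [1], 5: [2]}
W = rng.standard_normal((2 * d_in, d_out))  # would be learned in practice

def embed(v):
    # Sample a fixed-size neighborhood (with replacement for simplicity)
    sampled = rng.choice(neighbors[v], size=sample_size, replace=True)
    h_neigh = H[sampled].mean(axis=0)        # mean aggregator
    h = np.concatenate([H[v], h_neigh]) @ W  # combine self + neighborhood
    h = np.maximum(h, 0.0)                   # ReLU nonlinearity
    return h / (np.linalg.norm(h) + 1e-12)   # normalize the embedding

print(embed(0))  # works for unseen nodes too, since the embedding is a
                 # function of features, not a per-node lookup table
```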