Showing papers on "Nonparametric statistics" published in 2015


Journal ArticleDOI
TL;DR: This article gives an overview of control function (CF) methods for solving the problem of endogenous explanatory variables (EEVs) in linear and nonlinear models; the recent focus on estimating average partial effects, along with theoretical results on nonparametric identification, suggests some simple, flexible parametric CF strategies.
Abstract: This paper provides an overview of control function (CF) methods for solving the problem of endogenous explanatory variables (EEVs) in linear and nonlinear models. CF methods often can be justified in situations where “plug-in” approaches are known to produce inconsistent estimators of parameters and partial effects. Usually, CF approaches require fewer assumptions than maximum likelihood, and CF methods are computationally simpler. The recent focus on estimating average partial effects, along with theoretical results on nonparametric identification, suggests some simple, flexible parametric CF strategies. The CF approach for handling discrete EEVs in nonlinear models is more controversial but approximate solutions are available.

819 citations
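
As a concrete illustration of the two-stage control function idea described above, here is a minimal Python sketch on simulated data: the first-stage residual is added as an extra regressor to absorb the endogeneity. The data-generating process and variable names are purely illustrative, not taken from the paper.

```python
# Control-function (two-stage) sketch for a linear model with one
# endogenous regressor x and instrument z.  Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                     # instrument
u = rng.normal(size=n)                     # first-stage error
e = 0.8 * u + rng.normal(size=n)           # structural error, correlated with u
x = 0.5 * z + u                            # endogenous regressor
y = 1.0 + 2.0 * x + e                      # outcome; true coefficient on x is 2

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Naive OLS of y on x is inconsistent because x is correlated with e.
b_naive = ols(np.column_stack([ones, x]), y)

# Stage 1: regress x on z and keep the residual v_hat (the control function).
g = ols(np.column_stack([ones, z]), x)
v_hat = x - np.column_stack([ones, z]) @ g

# Stage 2: include v_hat as an extra regressor to "control" for endogeneity.
b_cf = ols(np.column_stack([ones, x, v_hat]), y)

print("naive OLS slope:", b_naive[1])      # biased upward
print("control-function slope:", b_cf[1])  # close to the true value 2
```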


Book
22 Dec 2015
TL;DR: This book introduces computational statistics with MATLAB, covering probability and sampling concepts, random variable generation, exploratory data analysis, density estimation, supervised and unsupervised learning, parametric and nonparametric models, Markov chain Monte Carlo methods, and spatial statistics, with MATLAB code and exercises in each chapter.
Abstract: Prefaces. Introduction: What Is Computational Statistics? An Overview of the Book. Probability Concepts: Introduction, Probability, Conditional Probability and Independence, Expectation, Common Distributions. Sampling Concepts: Introduction, Sampling Terminology and Concepts, Sampling Distributions, Parameter Estimation, Empirical Distribution Function. Generating Random Variables: Introduction, General Techniques for Generating Random Variables, Generating Continuous Random Variables, Generating Discrete Random Variables. Exploratory Data Analysis: Introduction, Exploring Univariate Data, Exploring Bivariate and Trivariate Data, Exploring Multidimensional Data. Finding Structure: Introduction, Projecting Data, Principal Component Analysis, Projection Pursuit EDA, Independent Component Analysis, Grand Tour, Nonlinear Dimensionality Reduction. Monte Carlo Methods for Inferential Statistics: Introduction, Classical Inferential Statistics, Monte Carlo Methods for Inferential Statistics, Bootstrap Methods. Data Partitioning: Introduction, Cross-Validation, Jackknife, Better Bootstrap Confidence Intervals, Jackknife-after-Bootstrap. Probability Density Estimation: Introduction, Histograms, Kernel Density Estimation, Finite Mixtures, Generating Random Variables. Supervised Learning: Introduction, Bayes' Decision Theory, Evaluating the Classifier, Classification Trees, Combining Classifiers. Unsupervised Learning: Introduction, Measures of Distance, Hierarchical Clustering, K-Means Clustering, Model-Based Clustering, Assessing Cluster Results. Parametric Models: Introduction, Spline Regression Models, Logistic Regression, Generalized Linear Models. Nonparametric Models: Introduction, Some Smoothing Methods, Kernel Methods, Smoothing Splines, Nonparametric Regression-Other Details, Regression Trees, Additive Models. Markov Chain Monte Carlo Methods: Introduction, Background, Metropolis-Hastings Algorithms, The Gibbs Sampler, Convergence Monitoring. Spatial Statistics: Introduction, Visualizing Spatial Point Processes, Exploring First-Order and Second-Order Properties, Modeling Spatial Point Processes, Simulating Spatial Point Processes. Appendix A: Introduction to MATLAB (What Is MATLAB?, Getting Help in MATLAB, File and Workspace Management, Punctuation in MATLAB, Arithmetic Operators, Data Constructs in MATLAB, Script Files and Functions, Control Flow, Simple Plotting, Contact Information). Appendix B: Projection Pursuit Indexes (Indexes, MATLAB Source Code). Appendix C: MATLAB Statistics Toolbox. Appendix D: Computational Statistics Toolbox. Appendix E: Exploratory Data Analysis Toolboxes (Introduction, EDA Toolbox, EDA GUI Toolbox). Appendix F: Data Sets. Appendix G: Notation. References. Index. MATLAB Code, Further Reading, and Exercises appear at the end of each chapter.

766 citations


01 Jan 2015
TL;DR: In this work, the curse of dimensionality and dimension reduction are discussed in the context of multivariate data representation and the geometrical properties of multidimensional data, including histograms and kernel density estimators.
Abstract: Representation and Geometry of Multivariate Data. Nonparametric Estimation Criteria. Histograms: Theory and Practice. Frequency Polygons. Averaged Shifted Histograms. Kernel Density Estimators. The Curse of Dimensionality and Dimension Reduction. Nonparametric Regression and Additive Models. Special Topics. Appendices. Indexes.

731 citations


Journal ArticleDOI
TL;DR: As this paper notes, Dunn's test is the appropriate nonparametric pairwise multiple-comparison procedure when a Kruskal–Wallis test is rejected, and it is now implemented for Stata in the dunntest command.
Abstract: Dunn's test is the appropriate nonparametric pairwise multiple-comparison procedure when a Kruskal–Wallis test is rejected, and it is now implemented for Stata in the dunntest command. dunntest pro...

433 citations
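
For readers outside Stata, the following is a minimal Python sketch of the pairwise Dunn z statistics computed after a Kruskal-Wallis rejection; the data and the simple Bonferroni adjustment are illustrative assumptions, not a reproduction of the dunntest command's defaults.

```python
# Sketch of Dunn's pairwise z tests following a Kruskal-Wallis rejection.
# Written directly in Python (not the Stata dunntest command); p-values use
# a simple Bonferroni adjustment.
from itertools import combinations
import numpy as np
from scipy import stats

groups = {
    "A": np.array([2.1, 3.4, 1.9, 2.8, 3.1]),
    "B": np.array([4.0, 3.9, 4.4, 3.6, 4.1]),
    "C": np.array([2.9, 3.2, 3.0, 2.7, 3.3]),
}

H, p_kw = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H = {H:.3f}, p = {p_kw:.4f}")

# Joint ranks across all groups (average ranks for ties).
labels = np.concatenate([[k] * len(v) for k, v in groups.items()])
values = np.concatenate(list(groups.values()))
ranks = stats.rankdata(values)
N = len(values)

# Tie correction term used in Dunn's variance formula.
_, tie_counts = np.unique(values, return_counts=True)
tie_term = (tie_counts**3 - tie_counts).sum() / (12 * (N - 1))

n_pairs = len(groups) * (len(groups) - 1) // 2
for a, b in combinations(groups, 2):
    ra = ranks[labels == a].mean()
    rb = ranks[labels == b].mean()
    na, nb = len(groups[a]), len(groups[b])
    se = np.sqrt((N * (N + 1) / 12 - tie_term) * (1 / na + 1 / nb))
    z = (ra - rb) / se
    p = 2 * stats.norm.sf(abs(z))                  # two-sided
    p_bonf = min(1.0, p * n_pairs)                 # Bonferroni adjustment
    print(f"{a} vs {b}: z = {z:+.3f}, adjusted p = {p_bonf:.4f}")
```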


Journal ArticleDOI
TL;DR: The Standardized Drought Analysis Toolbox (SDAT) as mentioned in this paper is based on a nonparametric framework that can be applied to different climatic variables including precipitation, soil moisture and relative humidity, without having to assume representative parametric distributions.

274 citations
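
A minimal sketch of the nonparametric standardization idea behind a toolbox like SDAT: empirical probabilities of the climate series are mapped through the inverse standard normal. The Gringorten plotting position and the simulated precipitation series are assumptions made for illustration; the MATLAB toolbox itself is not reproduced.

```python
# Nonparametric standardized index sketch: rank-based empirical probabilities
# mapped through the inverse normal, with no parametric distribution assumed.
import numpy as np
from scipy import stats

def standardized_index(x):
    """Standardized index for a 1-D climate series (e.g., accumulated
    precipitation), based only on empirical probabilities."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = stats.rankdata(x)                 # average ranks for ties
    p_emp = (ranks - 0.44) / (n + 0.12)       # Gringorten plotting position (assumed)
    return stats.norm.ppf(p_emp)              # map to the standard normal scale

rng = np.random.default_rng(1)
precip = rng.gamma(shape=2.0, scale=30.0, size=240)   # 20 years of monthly totals
sdi = standardized_index(precip)
print("index mean ~ 0, sd ~ 1:", sdi.mean().round(3), sdi.std().round(3))
print("driest month:", sdi.argmin(), "index =", sdi.min().round(2))
```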


Journal ArticleDOI
TL;DR: The ecp package is designed to perform multiple change point analysis while making as few assumptions as possible, and is suitable for both univariate and multivariate observations.
Abstract: There are many different ways in which change point analysis can be performed, from purely parametric methods to those that are distribution free. The ecp package is designed to perform multiple change point analysis while making as few assumptions as possible. While many other change point methods are applicable only for univariate data, this R package is suitable for both univariate and multivariate observations. Hierarchical estimation can be based upon either a divisive or agglomerative algorithm. Divisive estimation sequentially identifies change points via a bisection algorithm. The agglomerative algorithm estimates change point locations by determining an optimal segmentation. Both approaches are able to detect any type of distributional change within the data. This provides an advantage over many existing change point algorithms which are only able to detect changes within the marginal distributions.

259 citations
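
The sketch below illustrates one divisive (bisection) step of energy-distance-based change point detection in the spirit of ecp, with a permutation p-value; ecp's exact statistic weighting, alpha parameter, and multi-change-point recursion are not reproduced, and the simulated data are illustrative.

```python
# One divisive (bisection) step of energy-distance change point detection,
# judged by permutation.  Illustrative sketch only.
import numpy as np
from scipy.spatial.distance import cdist

def energy_stat(X, Y):
    """Scaled energy distance between two multivariate samples."""
    m, n = len(X), len(Y)
    return (m * n / (m + n)) * (2 * cdist(X, Y).mean()
                                - cdist(X, X).mean() - cdist(Y, Y).mean())

def best_split(Z, min_size=10):
    """Best split point and its statistic over all admissible locations."""
    return max(((t, energy_stat(Z[:t], Z[t:]))
                for t in range(min_size, len(Z) - min_size)),
               key=lambda s: s[1])

def permutation_pvalue(Z, n_perm=49, min_size=10, seed=0):
    rng = np.random.default_rng(seed)
    t_hat, q_obs = best_split(Z, min_size)
    exceed = sum(best_split(rng.permutation(Z), min_size)[1] >= q_obs
                 for _ in range(n_perm))
    return t_hat, q_obs, (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(2)
Z = np.vstack([rng.normal(0.0, 1.0, size=(40, 2)),    # segment 1
               rng.normal(1.5, 1.0, size=(40, 2))])   # segment 2: mean shift at t = 40
t_hat, q_obs, p = permutation_pvalue(Z)
print(f"estimated change point: {t_hat}, statistic: {q_obs:.2f}, p ~ {p:.3f}")
```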


Journal ArticleDOI
TL;DR: A new R package nparcomp is introduced which provides an easy and user-friendly access to rank-based methods for the analysis of unbalanced one-way layouts and provides procedures performing multiple comparisons and computing simultaneous confidence intervals for the estimated effects which can be easily visualized.
Abstract: One-way layouts, i.e., a single factor with several levels and multiple observations at each level, frequently arise in various fields. Usually not only a global hypothesis is of interest but also multiple comparisons between the different treatment levels. In most practical situations, the distribution of observed data is unknown and there may exist a number of atypical measurements and outliers. Hence, use of parametric and semiparametric procedures that impose restrictive distributional assumptions on observed samples becomes questionable. This, in turn, emphasizes the demand on statistical procedures that enable us to accurately and reliably analyze one-way layouts with minimal conditions on available data. Nonparametric methods offer such a possibility and thus become of particular practical importance. In this article, we introduce a new R package nparcomp which provides an easy and user-friendly access to rank-based methods for the analysis of unbalanced one-way layouts. It provides procedures performing multiple comparisons and computing simultaneous confidence intervals for the estimated effects which can be easily visualized. The special case of two samples, the nonparametric Behrens-Fisher problem, is included. We illustrate the implemented procedures by examples from biology and medicine.

241 citations
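
A minimal sketch of the two-sample relative effect that underlies the nonparametric Behrens-Fisher problem handled by nparcomp, with the package's rank-based variance formula replaced by a simple bootstrap percentile interval; the samples and settings are illustrative.

```python
# Two-sample relative effect p = P(X < Y) + 0.5 * P(X = Y), with a bootstrap
# percentile confidence interval as a stand-in for rank-based inference.
import numpy as np

def relative_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two samples."""
    diff = np.subtract.outer(y, x)           # y_j - x_i for all pairs
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    est = np.array([relative_effect(rng.choice(x, len(x)),
                                    rng.choice(y, len(y)))
                    for _ in range(n_boot)])
    return np.quantile(est, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=30)            # treatment level A
y = rng.normal(0.5, 2.0, size=45)            # treatment level B, unequal variance
print("relative effect:", round(relative_effect(x, y), 3))
print("95% bootstrap CI:", bootstrap_ci(x, y).round(3))
```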


Journal ArticleDOI
TL;DR: This work adopts a U-statistics-based C estimator that is asymptotically normal and develops a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators, which is illustrated with an example from the Framingham Heart Study.
Abstract: The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, which could be either two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study.

238 citations
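
A minimal sketch of Harrell's C index for right-censored data, counting concordant pairs among usable pairs; the paper's U-statistic variance and covariance estimators, and the z-score comparison of two C indices, are not reproduced.

```python
# Harrell's C index: the fraction of usable pairs whose predicted risks are
# concordant with the observed survival ordering.
import numpy as np

def harrell_c(time, event, risk):
    """time: follow-up times; event: 1 = event, 0 = censored;
    risk: predicted risk score (higher = earlier event expected)."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant = permissible = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is usable if subject i has the event and is observed
            # to fail before subject j's follow-up time.
            if event[i] == 1 and time[i] < time[j]:
                permissible += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / permissible

time = [5, 8, 12, 3, 9, 15, 7]
event = [1, 0, 1, 1, 0, 1, 1]
risk = [0.9, 0.3, 0.4, 0.8, 0.2, 0.1, 0.6]
print("Harrell's C:", round(harrell_c(time, event, risk), 3))
```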


Journal ArticleDOI
TL;DR: The results suggest that analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories.

225 citations


Posted Content
TL;DR: This paper studies the effects of bias correction on confidence interval coverage in the context of kernel density and local polynomial regression estimation, and shows that bias correction can be preferred to undersmoothing for minimizing coverage error and increasing robustness to tuning parameter choice.
Abstract: Nonparametric methods play a central role in modern empirical work. While they provide inference procedures that are more robust to parametric misspecification bias, they may be quite sensitive to tuning parameter choices. We study the effects of bias correction on confidence interval coverage in the context of kernel density and local polynomial regression estimation, and prove that bias correction can be preferred to undersmoothing for minimizing coverage error and increasing robustness to tuning parameter choice. This is achieved using a novel, yet simple, Studentization, which leads to a new way of constructing kernel-based bias-corrected confidence intervals. In addition, for practical cases, we derive coverage error optimal bandwidths and discuss easy-to-implement bandwidth selectors. For interior points, we show that the MSE-optimal bandwidth for the original point estimator (before bias correction) delivers the fastest coverage error decay rate after bias correction when second-order (equivalent) kernels are employed, but is otherwise suboptimal because it is too "large". Finally, for odd-degree local polynomial regression, we show that, as with point estimation, coverage error adapts to boundary points automatically when appropriate Studentization is used; however, the MSE-optimal bandwidth for the original point estimator is suboptimal. All the results are established using valid Edgeworth expansions and illustrated with simulated data. Our findings have important consequences for empirical work as they indicate that bias-corrected confidence intervals, coupled with appropriate standard errors, have smaller coverage error and are less sensitive to tuning parameter choices in practically relevant cases where additional smoothness is available.

202 citations
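
The sketch below shows a pointwise bias-corrected kernel density confidence interval in the spirit of the paper, using a curvature estimate at a pilot bandwidth; the robust Studentization that the authors derive is not implemented (a conventional standard error is used), so treat the interval purely as an illustration.

```python
# Pointwise kernel density CI with explicit bias correction.  The robust
# standard errors of Calonico, Cattaneo and Farrell are NOT implemented;
# the conventional standard error below is a simplification.
import numpy as np
from scipy import stats

def kde_ci_bias_corrected(data, x, h, b, alpha=0.05):
    """Gaussian-kernel density estimate at x with bandwidth h, bias-corrected
    using a curvature estimate at pilot bandwidth b."""
    n = len(data)
    u = (x - data) / h
    f_hat = stats.norm.pdf(u).mean() / h                 # standard KDE

    # Estimated leading bias 0.5 * h^2 * f''(x); f'' via the second
    # derivative of the Gaussian kernel at pilot bandwidth b.
    v = (x - data) / b
    f2_hat = ((v**2 - 1) * stats.norm.pdf(v)).mean() / b**3
    f_bc = f_hat - 0.5 * h**2 * f2_hat

    # Conventional standard error of the uncorrected estimator;
    # R(K) = 1 / (2 * sqrt(pi)) for the Gaussian kernel.
    se = np.sqrt(f_hat / (2 * np.sqrt(np.pi)) / (n * h))
    z = stats.norm.ppf(1 - alpha / 2)
    return f_bc, (f_bc - z * se, f_bc + z * se)

rng = np.random.default_rng(4)
data = rng.normal(size=500)
est, ci = kde_ci_bias_corrected(data, x=0.0, h=0.35, b=0.7)
print("true f(0) =", round(stats.norm.pdf(0.0), 4),
      "estimate =", round(est, 4), "CI =", np.round(ci, 4))
```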


Journal ArticleDOI
TL;DR: The R package cpm is described; it provides a fast implementation of a range of parametric and nonparametric change point models in both batch (Phase I) and sequential (Phase II) settings, where the sequences may contain either a single or multiple change points.
Abstract: The change point model framework introduced in Hawkins, Qiu, and Kang (2003) and Hawkins and Zamba (2005a) provides an effective and computationally efficient method for detecting multiple mean or variance change points in sequences of Gaussian random variables, when no prior information is available regarding the parameters of the distribution in the various segments. It has since been extended in various ways by Hawkins and Deng (2010), Ross, Tasoulis, and Adams (2011), Ross and Adams (2012) to allow for fully nonparametric change detection in non-Gaussian sequences, when no knowledge is available regarding even the distributional form of the sequence. Another extension comes from Ross and Adams (2011) and Ross (2014) which allows change detection in streams of Bernoulli and Exponential random variables respectively, again when the values of the parameters are unknown. This paper describes the R package cpm, which provides a fast implementation of all the above change point models in both batch (Phase I) and sequential (Phase II) settings, where the sequences may contain either a single or multiple change points.
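
As a rough illustration of the kind of nonparametric detection cpm implements, here is a batch (Phase I) scan based on a standardized Mann-Whitney statistic; the package's calibrated control limits, tie handling, and sequential (Phase II) machinery are not reproduced.

```python
# Batch (Phase I) nonparametric change detection sketch: scan all split
# points with a standardized Mann-Whitney statistic and flag the maximum.
import numpy as np
from scipy import stats

def mann_whitney_scan(x, min_size=10):
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = stats.rankdata(x)
    best_t, best_z = None, 0.0
    for t in range(min_size, n - min_size):
        n1, n2 = t, n - t
        w = ranks[:t].sum()                         # rank sum of the first segment
        mu = n1 * (n + 1) / 2                       # null mean of the rank sum
        sigma = np.sqrt(n1 * n2 * (n + 1) / 12)     # null sd (ignoring ties)
        z = abs(w - mu) / sigma
        if z > best_z:
            best_t, best_z = t, z
    return best_t, best_z

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(0.8, 1, 100)])
t_hat, z_max = mann_whitney_scan(x)
print(f"estimated change point: {t_hat}, max |Z| = {z_max:.2f}")
```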

Book
19 Jan 2015
TL;DR: Basic to advanced nonparametric methods of econometrics are discussed in depth, in terms that someone with only one year of graduate econometrics can understand.
Abstract: 1. Introduction 2. Univariate density estimation 3. Multivariate density estimation 4. Inference about the density 5. Regression 6. Testing in regression 7. Smoothing discrete variables 8. Regression with discrete covariates 9. Semiparametric methods 10. Instrumental variables 11. Panel data 12. Constrained estimation and inference Bibliography Index.


Book
18 Jun 2015
TL;DR: This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis and includes an extensive discussion of computational methods and details on their implementation.
Abstract: This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book's structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in online software pages.

Journal ArticleDOI
TL;DR: This work develops an efficient, data-driven technique for estimating the parameters of equilibrium models from observed equilibria, and it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators).
Abstract: Equilibrium modeling is common in a variety of fields such as game theory and transportation science. The inputs for these models, however, are often difficult to estimate, while their outputs, i.e., the equilibria they are meant to describe, are often directly observable. By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the parameters of these models from observed equilibria. We use this technique to estimate the utility functions of players in a game from their observed actions and to estimate the congestion function on a road network from traffic count data. A distinguishing feature of our approach is that it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators). In computational experiments involving Nash and Wardrop equilibria in a nonparametric setting, we find that a) we effectively estimate the unknown demand or congestion function, respectively, and b) our proposed regularization technique substantially improves the out-of-sample performance of our estimators.

Journal ArticleDOI
TL;DR: In this paper, the nonparametric approach underlying Variance Based SEM is compared with the parametric approach of Covariance Based SEM by applying both methods to a data set that violates the fitness and normality requirements.
Abstract: Lately, there has been some attention from social science researchers on Variance Based SEM (VB-SEM) versus Covariance Based SEM (CB-SEM) regarding fitness indexes, sample size requirements, and the normality assumption. Not many of them are aware that VB-SEM is developed based on the nonparametric approach, in contrast to the parametric approach of CB-SEM. In fact, the fitness of a model should not be taken lightly since it reflects the behavior of the data in relation to the proposed model for the study. Furthermore, the adequacy of the sample size and the normality of the data are among the main assumptions of a parametric test itself. This study intended to clarify the ambiguities among the social science community by employing a data set which does not meet the fitness requirements and normality assumptions to execute both CB-SEM and VB-SEM. The findings reveal that the result of CB-SEM with bootstrapping is almost similar to that of VB-SEM (bootstrapping as usual). Therefore, the failure to meet the fitness and normality requirements should not be the reason for employing Non-Parametric SEM.


Journal ArticleDOI
TL;DR: The results suggest that use of SL to estimate the PS can improve covariate balance and reduce bias in a meaningful manner in cases of serious model misspecification for treatment assignment.
Abstract: The consistency of propensity score (PS) estimators relies on correct specification of the PS model. The PS is frequently estimated using main-effects logistic regression. However, the underlying model assumptions may not hold. Machine learning methods provide an alternative nonparametric approach to PS estimation. In this simulation study, we evaluated the benefit of using Super Learner (SL) for PS estimation. We created 1,000 simulated data sets (n = 500) under 4 different scenarios characterized by various degrees of deviance from the usual main-term logistic regression model for the true PS. We estimated the average treatment effect using PS matching and inverse probability of treatment weighting. The estimators' performance was evaluated in terms of PS prediction accuracy, covariate balance achieved, bias, standard error, coverage, and mean squared error. All methods exhibited adequate overall balancing properties, but in the case of model misspecification, SL performed better for highly unbalanced variables. The SL-based estimators were associated with the smallest bias in cases of severe model misspecification. Our results suggest that use of SL to estimate the PS can improve covariate balance and reduce bias in a meaningful manner in cases of serious model misspecification for treatment assignment.
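
A minimal sketch of the stacking idea behind Super Learner for propensity score estimation, using scikit-learn's StackingClassifier as a stand-in for the SuperLearner R package, followed by inverse probability of treatment weighting; the candidate learners and the simulated data are illustrative assumptions.

```python
# Stacked ("Super Learner"-style) propensity score followed by inverse
# probability of treatment weighting.  StackingClassifier is a stand-in for
# the SuperLearner R package; the simulation design is illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 4))
# Treatment assignment deliberately nonlinear in the covariates.
logit = 0.8 * X[:, 0] + 0.6 * X[:, 1] ** 2 - 0.5 * X[:, 2] * X[:, 3]
A = rng.binomial(1, 1 / (1 + np.exp(-logit)))
Y = 1.0 * A + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)   # true effect = 1

stack = StackingClassifier(
    estimators=[("logit", LogisticRegression(max_iter=1000)),
                ("gbm", GradientBoostingClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
ps = stack.fit(X, A).predict_proba(X)[:, 1]

# Inverse probability of treatment weights and the weighted mean difference.
w = A / ps + (1 - A) / (1 - ps)
ate = (np.average(Y[A == 1], weights=w[A == 1])
       - np.average(Y[A == 0], weights=w[A == 0]))
print("IPTW estimate of the average treatment effect:", round(ate, 3))
```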

Book
25 Sep 2015
TL;DR: Nonparametric theory for analyzing data on manifolds, methods for working with specific spaces, and extensive applications to practical research problems are covered in this book, together with current related research, graduate-level teaching topics, and considerations related to computational statistics.
Abstract: A New Way of Analyzing Object Data from a Nonparametric Viewpoint Nonparametric Statistics on Manifolds and Their Applications to Object Data Analysis provides one of the first thorough treatments of the theory and methodology for analyzing data on manifolds. It also presents in-depth applications to practical problems arising in a variety of fields, including statistics, medical imaging, computer vision, pattern recognition, and bioinformatics. The book begins with a survey of illustrative examples of object data before moving to a review of concepts from mathematical statistics, differential geometry, and topology. The authors next describe theory and methods for working on various manifolds, giving a historical perspective of concepts from mathematics and statistics. They then present problems from a wide variety of areas, including diffusion tensor imaging, similarity shape analysis, directional data analysis, and projective shape analysis for machine vision. The book concludes with a discussion of current related research and graduate-level teaching topics as well as considerations related to computational statistics. Researchers in diverse fields must combine statistical methodology with concepts from projective geometry, differential geometry, and topology to analyze data objects arising from non-Euclidean object spaces. An expert-driven guide to this approach, this book covers the general nonparametric theory for analyzing data on manifolds, methods for working with specific spaces, and extensive applications to practical research problems. These problems show how object data analysis opens a formidable door to the realm of big data analysis.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric test for the population mean vector of non-normal high-dimensional multivariate data was proposed, and the authors proved that the limiting null distribution of the proposed test is normal under mild conditions.
Abstract: This work is concerned with testing the population mean vector of nonnormal high-dimensional multivariate data. Several tests for high-dimensional mean vector, based on modifying the classical Hotelling T2 test, have been proposed in the literature. Despite their usefulness, they tend to have unsatisfactory power performance for heavy-tailed multivariate data, which frequently arise in genomics and quantitative finance. This article proposes a novel high-dimensional nonparametric test for the population mean vector for a general class of multivariate distributions. With the aid of new tools in modern probability theory, we proved that the limiting null distribution of the proposed test is normal under mild conditions when p is substantially larger than n. We further study the local power of the proposed test and compare its relative efficiency with a modified Hotelling T2 test for high-dimensional data. An interesting finding is that the newly proposed test can have even more substantial power gain with l...

Journal ArticleDOI
TL;DR: In this paper, a nonparametric graph-based approach is proposed to detect change points in a data sequence, which can be applied to any data set as long as an informative similarity measure on the sample space can be defined.
Abstract: We consider the testing and estimation of change-points—locations where the distribution abruptly changes—in a data sequence. A new approach, based on scan statistics utilizing graphs representing the similarity between observations, is proposed. The graph-based approach is nonparametric, and can be applied to any data set as long as an informative similarity measure on the sample space can be defined. Accurate analytic approximations to the significance of graph-based scan statistics for both the single change-point and the changed interval alternatives are provided. Simulations reveal that the new approach has better power than existing approaches when the dimension of the data is moderate to high. The new approach is illustrated on two applications: The determination of authorship of a classic novel, and the detection of change in a network over time.
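
A minimal sketch of the graph-based scan: build a minimum spanning tree on the observations and, for each candidate split, count the edges joining the two sides; unusually few cross-edges indicate a distributional change. The paper's variance standardization and analytic p-value approximations are not reproduced.

```python
# Graph-based change point scan: edges of a minimum spanning tree that
# straddle a candidate split are counted and compared with their expectation
# under random labelling.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def graph_scan(Z, min_size=20):
    n = len(Z)
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    edges = np.column_stack([mst.row, mst.col])        # the n - 1 MST edges

    best_t, best_stat = None, -np.inf
    for t in range(min_size, n - min_size):
        cross = ((edges < t).sum(axis=1) == 1).sum()   # edges straddling the split
        expected = 2 * t * (n - t) / n                 # E[cross] over n - 1 edges
        stat = expected - cross                        # large = fewer cross-edges than chance
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(7)
Z = np.vstack([rng.normal(0.0, 1.0, size=(80, 5)),
               rng.normal(0.6, 1.0, size=(80, 5))])    # mean shift at t = 80
print(graph_scan(Z))
```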

Journal ArticleDOI
TL;DR: The proposed nonparametric car-following model exhibits traffic dynamics in a simple and parsimonious manner and is able to well replicate periodic traffic oscillations from the precursor stage to the decay stage.
Abstract: Car-following models are always of great interest to traffic engineers and researchers. In the age of mass data, this paper proposes a nonparametric car-following model driven by field data. Different from most of the existing car-following models, neither driver’s behaviour parameters nor fundamental diagrams are assumed in the data-driven model. The model is proposed based on the simple k-nearest neighbour, which outputs the average of the most similar cases, i.e., the most likely driving behaviour under the current circumstance. The inputs and outputs are selected, and the determination of the only parameter k is introduced. Three simulation scenarios are conducted to test the model. The first scenario is to simulate platoons following real leaders, where traffic waves with constant speed and the detailed trajectories are observed to be consistent with the empirical data. Driver’s rubbernecking behaviour and driving errors are simulated in the second and third scenarios, respectively. The time–space diagrams of the simulated trajectories are presented and explicitly analysed. It is demonstrated that the model is able to well replicate periodic traffic oscillations from the precursor stage to the decay stage. Without making any assumption, the fundamental diagrams for the simulated scenario coincide with the empirical fundamental diagrams. These all validate that the model can well reproduce the traffic characteristics contained by the field data. The nonparametric car-following model exhibits traffic dynamics in a simple and parsimonious manner.
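
A minimal sketch of the k-nearest-neighbour idea behind the data-driven car-following model: predict the follower's next speed as the average outcome of the k most similar observed situations. The features (gap, relative speed, speed) and the toy "field data" are assumptions for illustration, not the paper's inputs.

```python
# k-nearest-neighbour car-following rule: look up the k most similar observed
# situations and output their average follower speed at the next time step.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(8)

# Toy "field data": rows of (gap [m], relative speed [m/s], speed [m/s]),
# target = follower speed one second later, generated from a noisy rule.
n = 5000
gap = rng.uniform(5, 60, n)
dv = rng.normal(0, 2, n)                     # leader speed minus follower speed
v = rng.uniform(0, 25, n)
v_next = np.clip(v + 0.3 * dv + 0.02 * (gap - 2.0 * v) + rng.normal(0, 0.2, n),
                 0, None)

model = KNeighborsRegressor(n_neighbors=10)
model.fit(np.column_stack([gap, dv, v]), v_next)

# One simulation step for a follower at a 20 m gap, closing at 1 m/s,
# travelling at 15 m/s.
print("predicted speed next step:", model.predict([[20.0, -1.0, 15.0]])[0].round(2))
```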

Proceedings Article
07 Dec 2015
TL;DR: The estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. Estimators based either on data-splitting or a leave-one-out technique are shown to enjoy fast rates of convergence and other favorable theoretical properties.
Abstract: We propose and analyse estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.
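
A minimal sketch of the data-splitting plug-in idea analysed in the paper, specialised to differential entropy: estimate the density on one half of the sample and average the negative log-density over the other half. The higher-order influence-function corrections of the paper are not included.

```python
# Data-splitting plug-in estimator of differential entropy H(f) = -E[log f(X)]:
# fit a density on one half of the sample, evaluate on the other half.
import numpy as np
from scipy import stats

def split_entropy(x, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    half = len(x) // 2
    fit, eval_ = x[:half], x[half:]
    kde = stats.gaussian_kde(fit)                   # density estimate on split 1
    return -np.log(kde(eval_)).mean()               # plug-in functional on split 2

rng = np.random.default_rng(9)
sample = rng.normal(size=4000)
true_entropy = 0.5 * np.log(2 * np.pi * np.e)       # entropy of N(0, 1)
print("estimate:", round(split_entropy(sample), 3), "truth:", round(true_entropy, 3))
```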

Journal ArticleDOI
TL;DR: A market discovery algorithm is proposed that starts with a parsimonious set of customer types and enlarges it by automatically generating new types that increase the likelihood value; for a realistic data set, this approach improves the root mean square errors between predicted and observed purchases computed under independent demand model estimates.
Abstract: We propose an approach for estimating customer preferences for a set of substitutable products using only sales transactions and product availability data. The underlying demand framework combines a general, nonparametric discrete choice model with a Bernoulli process of arrivals over time. The choice model is defined by a discrete probability mass function (pmf) on a set of possible preference rankings of alternatives, and it is compatible with any random utility model. An arriving customer is assumed to purchase the available option that ranks highest in her preference list. The problem we address is how to jointly estimate the arrival rate and the pmf of the rank-based choice model under a maximum likelihood criterion. Since the potential number of customer types is factorial, we propose a market discovery algorithm that starts with a parsimonious set of types and enlarges it by automatically generating new types that increase the likelihood value. Numerical experiments confirm the potential of our proposal. For a realistic data set in the hospitality industry, our approach improves the root mean square errors between predicted and observed purchases computed under independent demand model estimates by 67% to 93%. This paper was accepted by Serguei Netessine, operations management.

Journal ArticleDOI
TL;DR: This work applies the proposed methodology to compare various prediction models using repeated measures of two psychometric tests to predict dementia in the elderly, accounting for the competing risk of death.
Abstract: Thanks to the growing interest in personalized medicine, joint modeling of longitudinal marker and time-to-event data has recently started to be used to derive dynamic individual risk predictions. Individual predictions are called dynamic because they are updated when information on the subject's health profile grows with time. We focus in this work on statistical methods for quantifying and comparing dynamic predictive accuracy of this kind of prognostic models, accounting for right censoring and possibly competing events. Dynamic area under the ROC curve (AUC) and Brier Score (BS) are used to quantify predictive accuracy. Nonparametric inverse probability of censoring weighting is used to estimate dynamic curves of AUC and BS as functions of the time at which predictions are made. Asymptotic results are established and both pointwise confidence intervals and simultaneous confidence bands are derived. Tests are also proposed to compare the dynamic prediction accuracy curves of two prognostic models. The finite sample behavior of the inference procedures is assessed via simulations. We apply the proposed methodology to compare various prediction models using repeated measures of two psychometric tests to predict dementia in the elderly, accounting for the competing risk of death. Models are estimated on the French Paquid cohort and predictive accuracies are evaluated and compared on the French Three-City cohort.

Proceedings Article
25 Jan 2015
TL;DR: The hardness of estimation of test statistics is differentiated from the hardness of testing whether these statistics are zero or not, and a notion of "fair" alternative hypotheses for these problems as dimension increases is discussed.
Abstract: This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to the above belief. Specifically, we differentiate the hardness of estimation of test statistics from the hardness of testing whether these statistics are zero or not, and explicitly discuss a notion of "fair" alternative hypotheses for these problems as dimension increases. We then demonstrate that the power of these tests actually drops polynomially with increasing dimension against fair alternatives. We end with some theoretical insights and shed light on the median heuristic for kernel bandwidth selection. Our work advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions.
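
For context, the sketch below implements a generic kernel two-sample test: the biased MMD statistic with a Gaussian kernel, bandwidth set by the median heuristic the paper analyses, and a permutation p-value; the paper's experimental setup is not reproduced.

```python
# Kernel two-sample test: biased MMD with a Gaussian kernel, bandwidth from
# the median heuristic, calibrated by permutation.  Generic illustration only.
import numpy as np
from scipy.spatial.distance import cdist

def mmd_biased(X, Y, bandwidth):
    k = lambda A, B: np.exp(-cdist(A, B, "sqeuclidean") / (2 * bandwidth**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def median_heuristic(X, Y):
    Z = np.vstack([X, Y])
    d = cdist(Z, Z)
    return np.median(d[np.triu_indices_from(d, k=1)])

def permutation_test(X, Y, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    bw = median_heuristic(X, Y)
    obs = mmd_biased(X, Y, bw)
    Z, m = np.vstack([X, Y]), len(X)
    exceed = 0
    for _ in range(n_perm):
        P = rng.permutation(Z)
        exceed += mmd_biased(P[:m], P[m:], bw) >= obs
    return obs, (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(10)
X = rng.normal(0.0, 1.0, size=(100, 10))
Y = rng.normal(0.3, 1.0, size=(100, 10))        # mean shift in every coordinate
print(permutation_test(X, Y))
```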

Journal ArticleDOI
TL;DR: In this paper, the authors consider nonparametric identification of nonseparable instrumental variables models with continuous endogenous variables and show that many kinds of continuous, discrete, and even binary instruments can be used to point-identify the levels of the outcome equation.
Abstract: I consider nonparametric identification of nonseparable instrumental variables models with continuous endogenous variables. If both the outcome and first stage equations are strictly increasing in a scalar unobservable, then many kinds of continuous, discrete, and even binary instruments can be used to point-identify the levels of the outcome equation. This contrasts sharply with related work by Imbens and Newey, 2009 that requires continuous instruments with large support. One implication is that assumptions about the dimension of heterogeneity can provide nonparametric point-identification of the distribution of treatment response for a continuous treatment in a randomized controlled experiment with partial compliance.

Journal ArticleDOI
TL;DR: A robust methodology is provided to simplify car-following models, that is, to reduce the number of parameters to calibrate without appreciably affecting the capability of reproducing reality; variance-based sensitivity analysis is proposed and formulated in a “factor fixing” setting.
Abstract: Automated calibration of microscopic traffic flow models is all but simple for a number of reasons, including the computational complexity of black-box optimization and the asymmetric importance of parameters in influencing model performances. The main objective of this paper is therefore to provide a robust methodology to simplify car-following models, that is, to reduce the number of parameters (to calibrate) without appreciably affecting the capability of reproducing reality. To this aim, variance-based sensitivity analysis is proposed and formulated in a “factor fixing” setting. Among the novel contributions are a robust design of the Monte Carlo framework that also includes, as an analysis factor, the main nonparametric input of car-following models, i.e., the leader's trajectory, and a set of criteria for “data assimilation” in car-following models. The methodology was applied to the intelligent driver model (IDM) and to all the trajectories in the “reconstructed” Next Generation SIMulation (NGSIM) I80-1 data set. The analysis unveiled that the leader's trajectory is considerably more important than the parameters in affecting the variability of model performances. Sensitivity analysis also returned the importance ranking of the IDM parameters. Based on this, a simplified model version with three (out of six) parameters is proposed. After calibrations, the full model and the simplified model show comparable performances, with an appreciably faster convergence of the simplified version.

Journal ArticleDOI
TL;DR: In this article, the authors consider a scale of priors of varying regularity and choose the regularity by an empirical Bayes method, and show that an adaptive Bayes credible set gives correct uncertainty quantification of "polished tail" parameters, in the sense of high probability of coverage of such parameters.
Abstract: We investigate the frequentist coverage of Bayesian credible sets in a nonparametric setting. We consider a scale of priors of varying regularity and choose the regularity by an empirical Bayes method. Next we consider a central set of prescribed posterior probability in the posterior distribution of the chosen regularity. We show that such an adaptive Bayes credible set gives correct uncertainty quantification of "polished tail" parameters, in the sense of high probability of coverage of such parameters. On the negative side, we show by theory and example that adaptation of the prior necessarily leads to gross and haphazard uncertainty quantification for some true parameters that are still within the hyperrectangle regularity scale.

Journal ArticleDOI
TL;DR: The possibility of automating the process of constructing summary statistics by training deep neural networks to predict the parameters from artificially generated data is explored: the resulting summary statistics are approximately posterior means of the parameters.
Abstract: Approximate Bayesian Computation (ABC) methods are used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Both the accuracy and computational efficiency of ABC depend on the choice of summary statistic, but outside of special cases where the optimal summary statistics are known, it is unclear which guiding principles can be used to construct effective summary statistics. In this paper we explore the possibility of automating the process of constructing summary statistics by training deep neural networks to predict the parameters from artificially generated data: the resulting summary statistics are approximately posterior means of the parameters. With minimal model-specific tuning, our method constructs summary statistics for the Ising model and the moving-average model, which match or exceed theoretically-motivated summary statistics in terms of the accuracies of the resulting posteriors.
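
A minimal sketch of the idea on simulated data: train a network to predict the parameter of an MA(1) model from raw series and use its prediction as the summary statistic in rejection ABC. scikit-learn's MLPRegressor stands in for the paper's deep networks, and all sizes and tolerances are illustrative.

```python
# Learn an ABC summary statistic by regressing the MA(1) parameter on
# simulated series, then run rejection ABC with that statistic.  Sizes are
# kept small, so a convergence warning from scikit-learn is expected here.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(11)
T = 100                                          # length of each series

def simulate_ma1(theta):
    """One MA(1) series of length T per parameter value."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    e = rng.normal(size=(len(theta), T + 1))
    return e[:, 1:] + theta[:, None] * e[:, :-1]

# 1) Training set: draw parameters from the prior U(-1, 1), simulate series,
#    and fit the network.  Its prediction is the learned summary statistic.
theta_train = rng.uniform(-1, 1, size=10000)
net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=50, random_state=0)
net.fit(simulate_ma1(theta_train), theta_train)

# 2) "Observed" data generated with true theta = 0.6, and its summary.
s_obs = net.predict(simulate_ma1(0.6))[0]

# 3) Rejection ABC: keep the 1% of prior draws whose simulated summaries
#    are closest to the observed summary.
theta_prop = rng.uniform(-1, 1, size=20000)
dist = np.abs(net.predict(simulate_ma1(theta_prop)) - s_obs)
keep = dist <= np.quantile(dist, 0.01)
print("ABC posterior mean for theta:", round(theta_prop[keep].mean(), 3))
```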