
Showing papers on "Nonparametric statistics published in 2013"


Posted Content
TL;DR: This paper abandons the normality assumption and instead uses nonparametric kernel density estimation for the class-conditional distributions of a naive Bayesian classifier; the observed reductions in error suggest that kernel estimation is a useful tool for learning Bayesian models.
Abstract: When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

3,071 citations
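
As a rough, hypothetical illustration of the comparison described in the entry above, the Python sketch below contrasts a naive Bayes classifier whose class-conditional densities are single Gaussians with one that uses kernel density estimates; the dataset, bandwidths, and helper names are illustrative choices, not the authors' setup.

```python
# Sketch: naive Bayes with Gaussian vs. kernel class-conditional densities.
# Assumes scikit-learn and scipy are available; dataset and bandwidth are illustrative.
import numpy as np
from scipy.stats import gaussian_kde, norm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_naive_bayes(X, y, use_kde=False):
    """Return per-class priors and per-feature density estimators."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        if use_kde:
            dens = [gaussian_kde(Xc[:, j]) for j in range(X.shape[1])]
        else:
            dens = [norm(Xc[:, j].mean(), Xc[:, j].std(ddof=1)) for j in range(X.shape[1])]
        model[c] = (len(Xc) / len(X), dens)
    return model

def predict(model, X, use_kde=False):
    preds = []
    for x in X:
        scores = {}
        for c, (prior, dens) in model.items():
            logp = np.log(prior)
            for j, d in enumerate(dens):
                val = d.evaluate(x[j])[0] if use_kde else d.pdf(x[j])
                logp += np.log(val + 1e-300)   # guard against log(0)
            scores[c] = logp
        preds.append(max(scores, key=scores.get))
    return np.array(preds)

for use_kde in (False, True):
    m = fit_naive_bayes(X_tr, y_tr, use_kde)
    acc = (predict(m, X_te, use_kde) == y_te).mean()
    print("KDE" if use_kde else "Gaussian", "naive Bayes accuracy:", round(acc, 3))
```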


Journal ArticleDOI
TL;DR: This work proposes nonparametric inverse probability of censoring weighting estimators of the AUC corresponding to these two definitions of the specificity, and studies their asymptotic properties.
Abstract: The area under the time-dependent ROC curve (AUC) may be used to quantify the ability of a marker to predict the onset of a clinical outcome in the future. For survival analysis with competing risks, two alternative definitions of the specificity may be proposed, depending on how subjects who undergo the competing events are handled. In this work, we propose nonparametric inverse probability of censoring weighting estimators of the AUC corresponding to these two definitions, and we study their asymptotic properties. We derive confidence intervals and test statistics for the equality of the AUCs obtained with two markers measured on the same subjects. A simulation study is performed to investigate the finite sample behaviour of the test and the confidence intervals. The method is applied to the French cohort PAQUID to compare the abilities of two psychometric tests to predict dementia onset in the elderly, accounting for the competing risk of death without dementia. The 'timeROC' R package is provided to make the methodology easily usable.

842 citations
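
The entry above describes inverse-probability-of-censoring-weighted (IPCW) estimators of the time-dependent AUC and points to the timeROC R package; the sketch below is a simplified Python reimplementation of the basic idea (cumulative cases, dynamic controls, no competing risks), with simulated data and naming conventions that are assumptions rather than the package's API.

```python
# Sketch of an IPCW estimate of the time-dependent AUC at a single horizon t,
# in the spirit of the estimators described above (no competing risks here).
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(t)."""
    order = np.argsort(time)
    t_sorted, e_sorted = time[order], event[order]
    n = len(time)
    at_risk = n - np.arange(n)
    # censoring "events" are observations with event == 0
    factors = 1.0 - (e_sorted == 0) / at_risk
    surv = np.cumprod(factors)
    def G(s):
        idx = np.searchsorted(t_sorted, s, side="right") - 1
        return 1.0 if idx < 0 else max(surv[idx], 1e-12)
    return np.vectorize(G)

def ipcw_auc(time, event, marker, t):
    G = km_censoring_survival(time, event)
    cases = (time <= t) & (event == 1)          # experienced the event by t
    controls = time > t                          # still event-free at t
    w_case = 1.0 / G(time[cases])
    w_ctrl = np.full(controls.sum(), 1.0 / G(t))
    num, den = 0.0, 0.0
    for mi, wi in zip(marker[cases], w_case):
        conc = (mi > marker[controls]) + 0.5 * (mi == marker[controls])
        num += wi * np.sum(w_ctrl * conc)
        den += wi * np.sum(w_ctrl)
    return num / den

rng = np.random.default_rng(0)
n = 500
marker = rng.normal(size=n)
latent = rng.exponential(scale=np.exp(-marker))   # higher marker -> earlier event
cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(latent, cens)
event = (latent <= cens).astype(int)
print("IPCW AUC at t=1:", round(ipcw_auc(time, event, marker, 1.0), 3))
```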


Journal ArticleDOI
TL;DR: A simple, non-parametric method with resampling to account for the different sequencing depths is introduced, and it is found that the method discovers more consistent patterns than competing methods.
Abstract: We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or 'sequencing depths'. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by 'outliers' in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.

431 citations
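
A minimal sketch of the resampling idea summarized above, assuming that down-sampling each library to a common depth and averaging a Wilcoxon rank-sum statistic over resamples captures the gist; the simulated counts and tuning constants are illustrative, not the authors' implementation.

```python
# Sketch: equalize sequencing depths by binomial down-sampling, then average a
# Wilcoxon rank-sum statistic per gene over resamples (simulated data only).
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(1)
n_genes, n0, n1 = 200, 6, 6
group = np.array([0] * n0 + [1] * n1)
depths = rng.integers(500_000, 2_000_000, size=n0 + n1)    # unequal depths
base = rng.gamma(2.0, 50.0, size=n_genes)
fold = np.ones(n_genes); fold[:20] = 3.0                    # first 20 genes differential
rate = base[:, None] * np.where(group == 1, fold[:, None], 1.0)
counts = rng.poisson(rate * depths / depths.mean())

def resampled_ranksum(counts, group, depths, n_resample=20, seed=0):
    rng = np.random.default_rng(seed)
    d_min = depths.min()
    n0 = (group == 0).sum()
    expected = n0 * (len(group) + 1) / 2.0                  # null mean of the rank sum
    stats = np.zeros((n_resample, counts.shape[0]))
    for r in range(n_resample):
        sub = rng.binomial(counts, d_min / depths)          # common sequencing depth
        for g in range(counts.shape[0]):
            ranks = rankdata(sub[g])
            stats[r, g] = ranks[group == 0].sum() - expected
    return stats.mean(axis=0)                               # averaged over resamples

score = resampled_ranksum(counts, group, depths)
print("top genes by |score|:", np.argsort(-np.abs(score))[:10])
```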


Journal ArticleDOI
TL;DR: A simple nonparametric modeling technique and an algorithm implementing it—an improved and generalized version of Bayesian Blocks [Scargle 1998]—that finds the optimal segmentation of the data in the observation interval are presented.
Abstract: This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it—an improved and generalized version of Bayesian Blocks [Scargle 1998]—that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multi-variate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by [Arias-Castro, Donoho and Huo 2003]. In the spirit of Reproducible Research [Donoho et al. (2008)] all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material.

417 citations
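
The abstract above describes an optimal-segmentation algorithm; the sketch below implements a standard O(N^2) dynamic-programming recursion for event (time-tagged) data with a simple constant prior penalty. It is a hedged illustration of how a Bayesian Blocks style segmentation can be computed, not a reproduction of the paper's code or its calibrated prior.

```python
# Sketch of the dynamic-programming segmentation for event data: each block's
# fitness is the maximum log-likelihood of a constant-rate Poisson model.
import numpy as np

def bayesian_blocks_events(t, ncp_prior=4.0):
    t = np.sort(np.asarray(t, float))
    n = len(t)
    # cell edges: midpoints between consecutive events, padded by the data range
    edges = np.concatenate([[t[0]], 0.5 * (t[1:] + t[:-1]), [t[-1]]])
    best = np.zeros(n)
    last = np.zeros(n, dtype=int)
    for k in range(n):
        # width and event count of every candidate block ending at cell k
        widths = edges[k + 1] - edges[:k + 1]
        counts = k + 1 - np.arange(k + 1)
        fitness = counts * (np.log(counts) - np.log(widths)) - ncp_prior
        total = fitness + np.concatenate([[0.0], best[:k]])
        last[k] = np.argmax(total)
        best[k] = total[last[k]]
    # backtrack to recover the block start cells
    starts = []
    k = n
    while k > 0:
        starts.append(last[k - 1])
        k = last[k - 1]
    return edges[np.array(starts[::-1] + [n])]

rng = np.random.default_rng(2)
events = np.concatenate([rng.uniform(0, 10, 100),      # background
                         rng.uniform(4, 5, 120)])       # a burst between t=4 and t=5
print("block edges:", np.round(bayesian_blocks_events(events), 2))
```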


Book
31 May 2013
TL;DR: This work reviews the statistical properties of nonparametric efficiency estimators, the link that has been established between frontier estimation and extreme value theory, and the approaches available for introducing environmental variables into production models, including two-stage approaches, in which estimated efficiencies are regressed on environmental variables, and conditional efficiency measures, together with the underlying assumptions required for either approach.
Abstract: Nonparametric estimators are widely used to estimate the productive efficiency of firms and other organizations, but often without any attempt to make statistical inference. Recent work has provided statistical properties of these estimators as well as methods for making statistical inference, and a link between frontier estimation and extreme value theory has been established. New estimators that avoid many of the problems inherent with traditional efficiency estimators have also been developed; these new estimators are robust with respect to outliers and avoid the well-known curse of dimensionality. Statistical properties, including asymptotic distributions, of the new estimators have been uncovered. Finally, several approaches exist for introducing environmental variables into production models; both two-stage approaches, in which estimated efficiencies are regressed on environmental variables, and conditional efficiency measures, as well as the underlying assumptions required for either approach, are examined.

359 citations


Book
23 Jul 2013
TL;DR: This book covers the statistical analysis of spatial point patterns, including tests of complete spatial randomness, estimation of second-order properties such as the K-function, parametric and nonparametric model fitting, goodness-of-fit assessment using nearest neighbor distributions, likelihood-based inference, and applications in spatial epidemiology and spatio-temporal modeling.
Abstract: Introduction Spatial point patterns Sampling Edge-effects Complete spatial randomness Objectives of statistical analysis The Dirichlet tessellation Monte Carlo tests Software Preliminary Testing Tests of complete spatial randomness Inter-event distances Nearest neighbor distances Point to nearest event distances Quadrat counts Scales of pattern Recommendations Methods for Sparsely Sampled Patterns General remarks Quadrat counts Distance measurements Tests of independence Recommendations Spatial Point Processes Processes and summary descriptions Second-order properties Higher order moments and nearest neighbor distributions The homogeneous Poisson process Independence and random labeling Estimation of second-order properties Displaced amacrine cells in the retina of a rabbit Estimation of nearest neighbor distributions Concluding remarks Nonparametric Methods Estimating weighted integrals of the second-order intensity Nonparametric estimation of a spatially varying intensity Analyzing replicated spatial point patterns Parametric or nonparametric methods? Models Contagious distributions Poisson cluster processes Inhomogeneous Poisson processes Cox processes Trans-Gaussian Cox processes Simple inhibition processes Markov point processes Other constructions Multivariate models Model-Fitting Using Summary Descriptions Parameter estimation using the K-function Goodness-of-fit assessment using nearest neighbor distributions Examples Parameter estimation via goodness-of-fit testing Model-Fitting Using Likelihood-Based Methods Likelihood inference for inhomogeneous Poisson processes Likelihood inference for Markov point processes Likelihood inference for Cox processes Additional reading Point Process Methods in Spatial Epidemiology Spatial clustering Spatial variation in risk Point source models Stratification and matching Disentangling heterogeneity and clustering Spatio-Temporal Point Processes Motivating examples A classification of spatio-temporal point patterns and processes Second-order properties Conditioning on the past Empirical and mechanistic models Exploratory Analysis Animation Marginal and conditional summaries Second-order properties Empirical Models and Methods Poisson processes Cox processes Log-Gaussian Cox processes Inference Gastro-intestinal illness in Hampshire, UK Concluding remarks: point processes and geostatistics Mechanistic Models and Methods Conditional intensity and likelihood Partial likelihood The 2001 foot-and-mouth epidemic in Cumbria, UK Nesting patterns of Arctic terns References

349 citations


Journal ArticleDOI
TL;DR: This article reviews procedures for detecting structural breaks in the unconditional and conditional mean, as well as in the variance and covariance/correlation structure, of time series exhibiting serial dependence.
Abstract: This paper gives an account of some of the recent work on structural breaks in time series models. In particular, we show how procedures based on the popular cumulative sum, CUSUM, statistics can be modified to work also for data exhibiting serial dependence. Both structural breaks in the unconditional and conditional mean as well as in the variance and covariance/correlation structure are covered. CUSUM procedures are nonparametric by design. If the data allows for parametric modeling, we demonstrate how likelihood approaches may be utilized to recover structural breaks. The estimation of multiple structural breaks is discussed. Furthermore, we cover how one can disentangle structural breaks (in the mean and/or the variance) on one hand and long memory or unit roots on the other. Several new lines of research are briefly mentioned.

343 citations
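
As a small illustration of the CUSUM idea discussed above, the sketch below computes a sup-CUSUM statistic for a single break in the mean with a Bartlett long-run variance estimate to accommodate serial dependence; the bandwidth rule and the critical value quoted in the comment are textbook defaults, not the authors' procedures.

```python
# Minimal sketch of a CUSUM test for one break in the mean of a (possibly
# serially dependent) series, with a Bartlett long-run variance estimate.
import numpy as np

def cusum_mean_break(x, bandwidth=None):
    x = np.asarray(x, float)
    n = len(x)
    resid = x - x.mean()
    # Bartlett-kernel long-run variance estimate
    q = bandwidth if bandwidth is not None else int(np.floor(4 * (n / 100) ** (2 / 9)))
    lrv = resid @ resid / n
    for lag in range(1, q + 1):
        w = 1 - lag / (q + 1)
        lrv += 2 * w * (resid[lag:] @ resid[:-lag]) / n
    stat = np.abs(np.cumsum(resid)) / np.sqrt(n * lrv)
    k_hat = int(np.argmax(stat)) + 1
    return stat.max(), k_hat

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(0.8, 1, 150)])
stat, k_hat = cusum_mean_break(x)
# the asymptotic 5% critical value for the sup of a Brownian bridge is roughly 1.36
print(f"sup-CUSUM = {stat:.2f}, estimated break at t = {k_hat}")
```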


Posted Content
TL;DR: In this paper, a nonparametric approach based on hierarchical clustering is proposed to estimate both the number of change points and the positions at which they occur in a set of multivariate observations of arbitrary dimension.
Abstract: Change point analysis has applications in a wide variety of fields. The general problem concerns the inference of a change in distribution for a set of time-ordered observations. Sequential detection is an online version in which new data is continually arriving and is analyzed adaptively. We are concerned with the related, but distinct, offline version, in which retrospective analysis of an entire sequence is performed. For a set of multivariate observations of arbitrary dimension, we consider nonparametric estimation of both the number of change points and the positions at which they occur. We do not make any assumptions regarding the nature of the change in distribution or any distribution assumptions beyond the existence of the alpha-th absolute moment, for some alpha in (0,2). Estimation is based on hierarchical clustering and we propose both divisive and agglomerative algorithms. The divisive method is shown to provide consistent estimates of both the number and location of change points under standard regularity assumptions. We compare the proposed approach with competing methods in a simulation study. Methods from cluster analysis are applied to assess performance and to allow simple comparisons of location estimates, even when the estimated number differs. We conclude with applications in genetics, finance and spatio-temporal analysis.

321 citations
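
The sketch below illustrates one divisive step of the hierarchical strategy described above: candidate split points are scored with an energy-distance statistic between the two segments. Significance testing, recursion, and the agglomerative variant are omitted, and the weighting and minimum segment size are illustrative assumptions rather than the authors' algorithm.

```python
# Sketch of a single divisive split chosen by an energy-distance statistic.
import numpy as np

def energy_divergence(a, b):
    """E-divergence between two multivariate samples (alpha = 1)."""
    def mean_dist(u, v):
        return np.mean(np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1))
    return 2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b)

def best_single_split(x, min_size=10):
    n = len(x)
    scores = np.full(n, -np.inf)
    for tau in range(min_size, n - min_size):
        a, b = x[:tau], x[tau:]
        w = (tau * (n - tau)) / n                 # sample-size weighting
        scores[tau] = w * energy_divergence(a, b)
    return int(np.argmax(scores)), scores.max()

rng = np.random.default_rng(4)
x = np.vstack([rng.normal(0, 1, size=(100, 3)),
               rng.normal(1.0, 1, size=(80, 3))])   # mean shift at t = 100
tau, score = best_single_split(x)
print(f"estimated change point: {tau} (score {score:.2f})")
```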


Journal ArticleDOI
TL;DR: Through rigorous analysis, it is shown that under this new ILC scheme, uniform convergence of the state tracking error is guaranteed, and an illustrative example is presented to demonstrate the efficacy of the proposed ILC scheme.

215 citations


Posted Content
TL;DR: A nonparametric framework for the analysis of networks, based on a natural limit object termed a graphon, is proposed, and consistency of graphon estimation under general conditions is proved.
Abstract: We propose a nonparametric framework for the analysis of networks, based on a natural limit object termed a graphon. We prove consistency of graphon estimation under general conditions, giving rates which include the important practical setting of sparse networks. Our results cover dense and sparse stochastic blockmodels with a growing number of classes, under model misspecification. We use profile likelihood methods, and connect our results to approximation theory, nonparametric function estimation, and the theory of graph limits.

Book
14 Oct 2013
TL;DR: This book presents statistical methods for process monitoring and quality improvement, from basic statistical concepts and Shewhart, CUSUM, and EWMA charts to change-point detection, multivariate charts, univariate and multivariate nonparametric control charts, and profile monitoring.
Abstract: Introduction Quality and the Early History of Quality Improvement Quality Management Statistical Process Control Organization of the Book Basic Statistical Concepts and Methods Introduction Population and Population Distribution Important Continuous Distributions Important Discrete Distributions Data and Data Description Tabular and Graphical Methods for Describing Data Parametric Statistical Inferences Nonparametric Statistical Inferences Univariate Shewhart Charts and Process Capability Introduction Shewhart Charts for Numerical Variables Shewhart Charts for Categorical Variables Process Capability Analysis Some Discussions Univariate CUSUM Charts Introduction Monitoring the Mean of a Normal Process Monitoring the Variance of a Normal Process CUSUM Charts for Distributions in Exponential Family Self-Starting and Adaptive CUSUM Charts Some Theory for Computing ARL Values Some Discussions Univariate EWMA Charts Introduction Monitoring the Mean of a Normal Process Monitoring the Variance of a Normal Process Self-Starting and Adaptive EWMA Charts Some Discussions Univariate Control Charts by Change-Point Detection Introduction Univariate Change-Point Detection Control Charts by Change-Point Detection Some Discussions Multivariate Statistical Process Control Introduction Multivariate Shewhart Charts Multivariate CUSUM Charts Multivariate EWMA Charts Multivariate Control Charts by Change-Point Detection Multivariate Control Charts by LASSO Some Discussions Univariate Nonparametric Process Control Introduction Rank-Based Nonparametric Control Charts Nonparametric SPC by Categorical Data Analysis Some Discussions Multivariate Nonparametric Process Control Introduction Rank-Based Multivariate Nonparametric Control Charts Multivariate Nonparametric SPC by Log-Linear Modeling Some Discussions Profile Monitoring Introduction Parametric Profile Monitoring Nonparametric Profile Monitoring Some Discussions Appendix A: R Functions for SPC Appendix B: Datasets Used in the Book Bibliography Index Exercises appear at the end of each chapter.

Journal ArticleDOI
25 Jul 2013-Test
TL;DR: The authors survey developments in goodness-of-fit testing for regression models over the last 20 years, from the first proposals based on tests for densities and distributions to the most recent advances for complex data and models.
Abstract: This survey collects the developments on goodness-of-fit testing for regression models during the last 20 years, from the very first origins with the proposals based on the idea of the tests for density and distribution, until the most recent advances for complex data and models. Far from being exhaustive, the contents of this paper are focused on two main classes of test statistics: smoothing-based (kernel-based) tests and tests based on empirical regression processes, although other tests based on maximum likelihood ideas are also considered. Starting from the simplest case of testing a parametric family for the regression curves, the contributions in this field also provide testing procedures in semiparametric, nonparametric, and functional models, dealing with more complex settings such as those involving dependent or incomplete data.

Journal ArticleDOI
TL;DR: A hierarchical estimation procedure for the parameters and an asymptotic analysis for the marginal distributions are introduced, and the effectiveness of the grouping procedure for structure selection is shown.

Journal ArticleDOI
TL;DR: In this paper, the authors present a fully nonparametric framework to estimate relative performance of production units when accounting for continuous and discrete background variables, and show how conditional efficiency scores can be estimated using a tailored mixed kernel function with a data-driven bandwidth selection.
Abstract: Efficiency estimates that do not account for the operational environment in which production units operate may be of only limited value. This article presents a fully nonparametric framework to estimate the relative performance of production units while accounting for continuous and discrete background variables. Using insights from recent developments in nonparametric econometrics, we show how conditional efficiency scores can be estimated using a tailored mixed kernel function with a data-driven bandwidth selection. The methodology is applied to the sample of Dutch pupils from the Organization for Economic Co-operation and Development's Programme for International Student Assessment (OECD PISA) data set. We estimate students' performance and the influence of their background characteristics. The results of our application show that several family- and student-specific characteristics have a statistically significant effect on educational efficiency, while school-level variables do not have impact...

Journal ArticleDOI
TL;DR: It is demonstrated that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits.
Abstract: Consumer credit scoring is often considered a classification task where clients receive either a good or a bad credit status. Default probabilities provide more detailed information about the creditworthiness of consumers, and they are usually estimated by logistic regression. Here, we present a general framework for estimating individual consumer credit risks by use of machine learning methods. Since a probability is an expected value, all nonparametric regression approaches which are consistent for the mean are consistent for the probability estimation problem. Among others, random forests (RF), k-nearest neighbors (kNN), and bagged k-nearest neighbors (bNN) belong to this class of consistent nonparametric regression approaches. We apply the machine learning methods and an optimized logistic regression to a large dataset of complete payment histories of short-term installment credits. We demonstrate probability estimation in Random Jungle, an RF package written in C++ with a generalized framework for fast tree growing, probability estimation, and classification. We also describe an algorithm for tuning the terminal node size for probability estimation. We demonstrate that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits.
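
A rough sketch of the comparison described above: default probabilities estimated with a regression random forest versus logistic regression, compared by AUC on held-out data. It uses scikit-learn on simulated data rather than the Random Jungle C++ package, and min_samples_leaf stands in for the terminal node size tuned in the paper.

```python
# Sketch: probability estimation with a regression forest vs. logistic regression.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n, p = 5000, 10
X = rng.normal(size=(n, p))
logit = 0.8 * X[:, 0] - 1.2 * X[:, 1] ** 2 + 0.5 * X[:, 2] * X[:, 3]   # nonlinear truth
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The default probability is the conditional mean of the 0/1 outcome, so a
# regression forest is a consistent nonparametric probability estimator.
rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=25, random_state=0)
rf.fit(X_tr, y_tr)
p_rf = np.clip(rf.predict(X_te), 0, 1)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_lr = lr.predict_proba(X_te)[:, 1]

print("AUC random forest:", round(roc_auc_score(y_te, p_rf), 3))
print("AUC logistic regression:", round(roc_auc_score(y_te, p_lr), 3))
```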

Journal ArticleDOI
TL;DR: This article considers the problem of constructing nonparametric tolerance/prediction sets by starting from the general conformal prediction approach, and uses a kernel density estimator as a measure of agreement between a sample point and the underlying distribution.
Abstract: This article introduces a new approach to prediction by bringing together two different nonparametric ideas: distribution-free inference and nonparametric smoothing. Specifically, we consider the problem of constructing nonparametric tolerance/prediction sets. We start from the general conformal prediction approach, and we use a kernel density estimator as a measure of agreement between a sample point and the underlying distribution. The resulting prediction set is shown to be closely related to plug-in density level sets with carefully chosen cutoff values. Under standard smoothness conditions, we get an asymptotic efficiency result that is near optimal for a wide range of function classes. But the coverage is guaranteed whether or not the smoothness conditions hold and regardless of the sample size. The performance of our method is investigated through simulation studies and illustrated in a real data example.
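
A minimal sketch of the approach described above, assuming a simple split-conformal construction: a kernel density estimate supplies the conformity score, and the prediction set is the plug-in density level set cut at (roughly) the alpha-quantile of the calibration scores. The bandwidth and evaluation grid are illustrative choices.

```python
# Sketch: split-conformal prediction set built from a kernel density estimate.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
data = np.concatenate([rng.normal(-2, 0.7, 400), rng.normal(2, 1.0, 400)])
alpha = 0.1

# split into a fitting half and a calibration half
fit, cal = data[:400], data[400:]
kde = gaussian_kde(fit)

# conformity score = estimated density at each calibration point;
# the cutoff is (roughly) the alpha-quantile of these scores
scores = kde.evaluate(cal)
k = int(np.floor(alpha * (len(cal) + 1)))
cutoff = np.sort(scores)[max(k - 1, 0)]

# the prediction set is the plug-in density level set {x : f_hat(x) >= cutoff}
grid = np.linspace(data.min() - 2, data.max() + 2, 2000)
in_set = kde.evaluate(grid) >= cutoff
print("fraction of the grid inside the prediction set:", round(in_set.mean(), 3))
print("empirical coverage on calibration points:", round((scores >= cutoff).mean(), 3))
```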

Journal ArticleDOI
TL;DR: Threshold-free cluster enhancement (TFCE) has recently been proposed as a useful analysis tool for fMRI datasets; here the approach is adapted to EEG datasets and combined with permutation-based statistics to build an efficient statistical analysis.

Posted Content
TL;DR: In this article, a nonparametric independence screening (NIS) method is proposed to select variables by ranking a measure of the non-parametric marginal contributions of each covariate given the exposure variable.
Abstract: The varying-coefficient model is an important nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in ultra-high dimensional sparse varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance practical utility and the finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
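
The sketch below illustrates the marginal screening idea in the abstract above: for each covariate, fit a marginal varying-coefficient regression in the exposure variable and rank covariates by the reduction in residual sum of squares. The truncated-power basis, basis size, and simulated data are illustrative assumptions, not the paper's NIS tuning or its iterative variants.

```python
# Sketch: marginal nonparametric screening for a varying-coefficient model.
import numpy as np

def spline_basis(u, n_knots=5, degree=3):
    # simple truncated-power basis as a stand-in for B-splines
    knots = np.quantile(u, np.linspace(0, 1, n_knots)[1:-1])
    cols = [u ** k for k in range(degree + 1)]
    cols += [np.clip(u - t, 0, None) ** degree for t in knots]
    return np.column_stack(cols)

def nis_ranking(y, X, u):
    B = spline_basis(u)
    # null fit: intercept varying with the exposure u only
    beta0, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss0 = np.sum((y - B @ beta0) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Z = np.hstack([B, B * X[:, [j]]])                 # a(u) + b_j(u) * x_j
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        scores[j] = rss0 - np.sum((y - Z @ beta) ** 2)    # marginal contribution
    return np.argsort(-scores), scores

rng = np.random.default_rng(9)
n, p = 400, 200
u = rng.uniform(0, 1, n)                                  # exposure variable
X = rng.normal(size=(n, p))
y = np.sin(2 * np.pi * u) * X[:, 0] + 2 * u * X[:, 1] + rng.normal(size=n)
order, _ = nis_ranking(y, X, u)
print("top-ranked covariates:", order[:5])                # expect 0 and 1 near the top
```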

Book
13 Mar 2013
TL;DR: This book develops the theory of order statistics, covering their distributions, moment relations, the asymptotic behavior of middle, intermediate, and extreme order statistics, estimation based on order statistics, record values, and generalized order statistics.
Abstract: Basic definitions.- Distributions of order statistics.- Sample quantiles and ranges.- Representations for order statistics.- Conditional distributions of order statistics.- Order statistics for discrete distributions.- Moments of order statistics: general relations.- Moments of uniform and exponential order statistics.- Moment relations for order statistics: normal distribution.- Asymptotic behavior of the middle and intermediate order statistics.- Asymptotic behavior of the extreme order statistics.- Some properties of estimators based on order statistics.- Minimum variance linear unbiased estimators.- Minimum variance linear unbiased estimators and predictors based on censored samples.- Estimation of parameters based on fixed number of sample quantiles.- Order statistics from extended samples.- Order statistics and record values.- Characterizations of distributions based on properties of order statistics.- Order statistics and record values based on F^alpha distributions.- Generalized order statistics.- Complements and problems.

Journal ArticleDOI
TL;DR: In this article, partial-identification results for average and quantile effects are given for discrete regressors, under static or dynamic conditions, in fully nonparametric and in semiparametric models, with time effects.
Abstract: Nonseparable panel models are important in a variety of economic settings, including discrete choice. This paper gives identification and estimation results for nonseparable models under time-homogeneity conditions that are like "time is randomly assigned" or "time is an instrument." Partial-identification results for average and quantile effects are given for discrete regressors, under static or dynamic conditions, in fully nonparametric and in semiparametric models, with time effects. It is shown that the usual, linear, fixed-effects estimator is not a consistent estimator of the identified average effect, and a consistent estimator is given. A simple estimator of identified quantile treatment effects is given, providing a solution to the important problem of estimating quantile treatment effects from panel data. Bounds for overall effects in static and dynamic models are given. The dynamic bounds provide a partial-identification solution to the important problem of estimating the effect of state dependence in the presence of unobserved heterogeneity. The impact of T, the number of time periods, is shown by deriving shrinkage rates for the identified set as T grows. We also consider semiparametric, discrete-choice models and find that semiparametric panel bounds can be much tighter than nonparametric bounds. Computationally convenient methods for semiparametric models are presented. We propose a novel inference method that applies in panel data and other settings and show that it produces uniformly valid confidence regions in large samples. We give empirical illustrations.

Journal ArticleDOI
TL;DR: An R package is described that allows the computation of pointwise estimates of the HRs—and their corresponding confidence limits— of continuous predictors introduced nonlinearly, and provides functions for choosing automatically the degrees of freedom in multivariable Cox models.
Abstract: The Cox proportional hazards regression model has become the traditional choice for modeling survival data in medical studies. To introduce flexibility into the Cox model, several smoothing methods may be applied, and approaches based on splines are the most frequently considered in this context. To better understand the effects that each continuous covariate has on the outcome, results can be expressed in terms of splines-based hazard ratio (HR) curves, taking a specific covariate value as reference. Despite the potential advantages of using spline smoothing methods in survival analysis, there is currently no analytical method in the R software to choose the optimal degrees of freedom in multivariable Cox models (with two or more nonlinear covariate effects). This paper describes an R package, called smoothHR, that allows the computation of pointwise estimates of the HRs—and their corresponding confidence limits—of continuous predictors introduced nonlinearly. In addition the package provides functions for choosing automatically the degrees of freedom in multivariable Cox models. The package is available from the R homepage. We illustrate the use of the key functions of the smoothHR package using data from a study on breast cancer and data on acute coronary syndrome, from Galicia, Spain.

Journal ArticleDOI
TL;DR: In this article, the Bernstein-von Mises theorems for nonparametric Bayes priors in the Gaussian white noise model are proved and it is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete non-parametric problems.
Abstract: Bernstein–von Mises theorems for nonparametric Bayes priors in the Gaussian white noise model are proved. It is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete nonparametric problems. In particular, Bayesian credible sets are constructed that have asymptotically exact $1-\alpha$ frequentist coverage level and whose $L^{2}$-diameter shrinks at the minimax rate of convergence (within logarithmic factors) over Hölder balls. Other applications include general classes of linear and nonlinear functionals and credible bands for auto-convolutions. The assumptions cover nonconjugate product priors defined on general orthonormal bases of $L^{2}$ satisfying weak conditions.

Journal ArticleDOI
TL;DR: A novel sufficient dimension-reduction method using a squared-loss variant of mutual information as a dependency measure that is formulated as a minimum contrast estimator on parametric or nonparametric models and a natural gradient algorithm on the Grassmann manifold for sufficient subspace search.
Abstract: The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant of mutual information as a dependency measure. We apply a density-ratio estimator for approximating squared-loss mutual information that is formulated as a minimum contrast estimator on parametric or nonparametric models. Since cross-validation is available for choosing an appropriate model, our method does not require any prespecified structure on the underlying distributions. We elucidate the asymptotic bias of our estimator on parametric models and the asymptotic convergence rate on nonparametric models. The convergence analysis utilizes the uniform tail-bound of a U-process, and the convergence rate is characterized by the bracketing entropy of the model. We then develop a natural gradient algorithm on the Grassmann manifold for sufficient subspace search. The analytic formula of our estimator allows us to compute the gradient efficiently. Numerical experiments show that the proposed method compares favorably with existing dimension-reduction approaches on artificial and benchmark data sets.

01 Sep 2013
TL;DR: To compare the accuracy and fidelity of the drop plate method with the spread plate method using parametric and nonparametric statistical tests, successive dilutions of second subcultures of Lactobacillus casei and Salmonella Typhimurium were transferred to selective agar and the agreement between the two methods was evaluated.
Abstract: The drop plate technique is often preferred to the spread plate procedure because it requires less time, less media, less effort, less incubator space, and less labor. The objective of this research was to compare the accuracy and fidelity of the drop plate method with the spread plate method using parametric and nonparametric statistical tests. For bacterial enumeration by the drop and spread plate methods, successive dilutions of second subcultures of Lactobacillus casei and Salmonella Typhimurium were transferred to selective agar. The agreement between the two methods was evaluated using statistical tests. Comparison of mean values with a parametric unpaired t-test at the 95 percent confidence level did not reject the null hypothesis, meaning that equality of the means could not be ruled out. A nonparametric method, the Mann-Whitney test (the nonparametric equivalent of the t-test), was also used because the data distribution was only approximately Gaussian; it indicated that the medians obtained from the two methods did not differ. Spearman's rho correlation coefficients (r) between the two methods for the enumeration of S. Typhimurium and L. casei were 0.62 and 0.87, respectively, representing moderately strong and strong relationships between the two methods. Moreover, there was a significant and strong positive correlation (p < 0.001) between the spread and drop plate procedures. For these reasons, the spread plate method can be replaced by the drop plate method.
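
For concreteness, the sketch below runs the statistical comparisons named in the abstract (unpaired t-test, Mann-Whitney U test, Spearman's rank correlation) on made-up paired plate counts; the simulated numbers are purely illustrative and do not reproduce the study's data.

```python
# Sketch: comparing simulated drop-plate and spread-plate counts with the
# parametric and nonparametric tests mentioned in the abstract above.
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, spearmanr

rng = np.random.default_rng(7)
true_log10 = rng.uniform(4, 8, size=30)                     # true log10 CFU/mL
spread = true_log10 + rng.normal(0, 0.15, size=30)          # spread plate counts
drop = true_log10 + rng.normal(0, 0.18, size=30)            # drop plate counts

t_stat, t_p = ttest_ind(spread, drop)
u_stat, u_p = mannwhitneyu(spread, drop)
rho, rho_p = spearmanr(spread, drop)

print(f"unpaired t-test p = {t_p:.3f}")        # large p: means not distinguishable
print(f"Mann-Whitney U p = {u_p:.3f}")         # large p: medians not distinguishable
print(f"Spearman rho = {rho:.2f} (p = {rho_p:.3g})")
```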

Journal ArticleDOI
TL;DR: An overview of this literature is presented, covering some of the one- and two-chart schemes for jointly monitoring the mean and variance, including those appropriate when the parameters are known and when they are unknown, and, noting that normality is often an elusive assumption, available nonparametric schemes are also discussed.
Abstract: In the control chart literature, a number of one- and two-chart schemes have been developed to simultaneously monitor the mean and variance parameters of normally distributed processes. These "joint" monitoring schemes are useful for situations in which special causes can result in a change in both the mean and the variance, and they allow practitioners to avoid the inflated false alarm rate which results from simply using two independent control charts (one each for the mean and the variance) without adjusting for multiple testing. We present an overview of this literature covering some of the one- and two-chart schemes, including those that are appropriate in parameters known (standards known) and unknown (standards unknown) situations. We also discuss some of the joint monitoring schemes for multivariate processes, autocorrelated data, and individual observations. In addition, noting that normality is often an elusive assumption, we discuss some available nonparametric schemes for jointly monitoring locat...

Journal ArticleDOI
TL;DR: This work presents a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions, which automatically models complex dependencies while being computationally expedient.
Abstract: In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regressi...

Journal ArticleDOI
TL;DR: This article proposes a data-driven least-square cross-validation method to optimally select smoothing parameters for the nonparametric estimation of conditional cumulative distribution functions and conditional quantile functions.
Abstract: We propose a data-driven least-square cross-validation method to optimally select smoothing parameters for the nonparametric estimation of conditional cumulative distribution functions and conditional quantile functions. We allow for general multivariate covariates that can be continuous, categorical, or a mix of either. We provide asymptotic analysis, examine finite-sample properties via Monte Carlo simulation, and consider an application involving testing for first-order stochastic dominance of children’s health conditional on parental education and income. This article has supplementary materials online.

Journal ArticleDOI
TL;DR: The main objectives of this article are to investigate whether there are any meaningful differences between adjusted and unadjusted effect sizes and to compare the outcomes from parametric estimation of effect sizes.

ReportDOI
TL;DR: In this paper, the authors provide efficient estimators and honest bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments.
Abstract: In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. We can handle very many control variables, endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. Our framework covers the special case of exogenous receipt of treatment, either conditional on controls or unconditionally as in randomized control trials. In the latter case, our approach produces efficient estimators and honest bands for (functional) average treatment effects (ATE) and quantile treatment effects (QTE). To make informative inference possible, we assume that key reduced form predictive relationships are approximately sparse. This assumption allows the use of regularization and selection methods to estimate those relations, and we provide methods for post-regularization and post-selection inference that are uniformly valid (honest) across a wide range of models. We show that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) eligibility and participation on accumulated assets. The results on program evaluation are obtained as a consequence of more general results on honest inference in a general moment condition framework, where we work with possibly a continuum of moments. We provide results on honest inference for (function-valued) parameters within this general framework where modern machine learning methods are used to fit the nonparametric/high-dimensional components of the model. These include a number of supporting new results that are of major independent interest: namely, we (1) prove uniform validity of a multiplier bootstrap, (2) offer a uniformly valid functional delta method, and (3) provide results for sparsity-based estimation of regression functions for function-valued outcomes.
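
The sketch below illustrates the key ingredient highlighted above, an orthogonal (doubly robust) moment with cross-fitting, for the simplest case of an ATE with exogenous treatment; random forests stand in for the regularized learners, and none of this reproduces the paper's LATE/LQTE estimators or its sparsity-based inference.

```python
# Sketch: augmented inverse-probability-weighting (orthogonal) score for the ATE,
# with 5-fold cross-fitting of the nuisance functions on simulated data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(8)
n, p = 2000, 20
X = rng.normal(size=(n, p))
pscore = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, pscore)                       # treatment (exogenous given X here)
Y = 1.0 * D + X[:, 0] + 0.5 * X[:, 2] ** 2 + rng.normal(size=n)   # true ATE = 1

psi = np.zeros(n)
for train, test in KFold(5, shuffle=True, random_state=0).split(X):
    m = RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0)
    m.fit(X[train], D[train])
    e_hat = np.clip(m.predict_proba(X[test])[:, 1], 0.05, 0.95)   # propensity score
    g = {}
    for d in (0, 1):
        idx = train[D[train] == d]
        r = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
        r.fit(X[idx], Y[idx])
        g[d] = r.predict(X[test])                  # outcome regressions by arm
    # orthogonal (doubly robust) score evaluated on the held-out fold
    psi[test] = (g[1] - g[0]
                 + D[test] * (Y[test] - g[1]) / e_hat
                 - (1 - D[test]) * (Y[test] - g[0]) / (1 - e_hat))

ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate = {ate:.3f} +/- {1.96 * se:.3f}")
```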