
Showing papers by "Paul W. Wilson" published in 2016


Journal ArticleDOI
TL;DR: In this paper, the authors use central limit theorem results from their previous work to develop additional theoretical results permitting consistent tests of model structure and provide Monte Carlo evidence on the performance of the tests in terms of size and power.
Abstract: Data envelopment analysis (DEA) and free disposal hull (FDH) estimators are widely used to estimate efficiency of production. Practitioners use DEA estimators far more frequently than FDH estimators, implicitly assuming that production sets are convex. Moreover, use of the constant returns to scale (CRS) version of the DEA estimator requires an assumption of CRS. Although bootstrap methods have been developed for making inference about the efficiencies of individual units, until now no methods have existed for making consistent inference about differences in mean efficiency across groups of producers or for testing hypotheses about model structure such as returns to scale or convexity of the production set. We use central limit theorem results from our previous work to develop additional theoretical results permitting consistent tests of model structure and provide Monte Carlo evidence on the performance of the tests in terms of size and power. In addition, the variable returns to scale version of the DEA estima...

92 citations
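
The tests described in this abstract operate on DEA efficiency scores, each of which is the solution of a linear program. As a point of reference, here is a minimal sketch of the input-oriented DEA estimator under variable returns to scale (VRS), written against SciPy's linprog; the function name and the simulated data are illustrative assumptions, not the paper's code, and dropping the VRS constraint yields the CRS version mentioned above.

```python
# Hypothetical sketch of the input-oriented DEA estimator under VRS.
# The score for unit o solves:
#   min theta  s.t.  sum_i l_i x_i <= theta * x_o,
#                    sum_i l_i y_i >= y_o,  sum_i l_i = 1,  l >= 0.
import numpy as np
from scipy.optimize import linprog

def dea_vrs_input(X, Y, o):
    """Farrell input efficiency of unit o; X is n x p inputs, Y is n x q outputs."""
    n, p = X.shape
    q = Y.shape[1]
    c = np.concatenate(([1.0], np.zeros(n)))        # minimize theta
    A_in = np.hstack([-X[o].reshape(p, 1), X.T])    # inputs scaled down by theta
    A_out = np.hstack([np.zeros((q, 1)), -Y.T])     # outputs at least y_o
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(p), -Y[o]])
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)  # VRS: weights sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun  # efficiency score in (0, 1]

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(50, 2))  # 50 producers, two inputs
Y = rng.uniform(1, 10, size=(50, 1))  # one output
scores = [dea_vrs_input(X, Y, o) for o in range(50)]
```

The paper's tests compare means of such scores across subsamples or between competing model fits, which is why consistent CLTs for these means, rather than unit-level bootstrap intervals, are needed.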


Posted Content
TL;DR: This paper shows that standard central limit theorem (CLT) results do not hold for means of nonparametric conditional efficiency estimators, and provides new CLTs that do hold, permitting applied researchers to estimate confidence intervals for mean conditional efficiency or to compare mean efficiency across groups of producers along the lines of the test developed by Kneip et al. (2015b).
Abstract: This paper demonstrates that standard central limit theorem (CLT) results do not hold for means of nonparametric conditional efficiency estimators, and provides new CLTs that do hold, permitting applied researchers to estimate confidence intervals for mean conditional efficiency or to compare mean efficiency across groups of producers along the lines of the test developed by Kneip et al. (JBES, 2015b). The new CLTs are used to develop a test of the "separability" condition that is necessary for second-stage regressions of efficiency estimates on environmental variables. We show that if this condition is violated, not only are second-stage regressions meaningless, but also first-stage, unconditional efficiency estimates are without meaning. As such, the test developed here is of fundamental importance to applied researchers using nonparametric methods for efficiency estimation. Our simulation results indicate that our tests perform well both in terms of size and power. We present a real-world empirical example by updating the analysis performed by Aly et al. (R. E. Stat., 1990) on U.S. commercial banks; our tests easily reject the assumption required for two-stage estimation, calling into question results that appear in hundreds of papers that have been published in recent years.

9 citations
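
To see what the new CLTs license, the sketch below shows the schematic form of a comparison of mean efficiency across two groups of producers: a studentized difference in group means referred to a standard normal. This is a hypothetical illustration, not the paper's procedure; the paper shows such a statistic is valid only after the bias of the nonparametric efficiency estimates is handled, so the inputs here are assumed to be already bias-corrected.

```python
# Schematic two-sample comparison of mean efficiency. Assumes eff1 and
# eff2 hold bias-corrected efficiency estimates for two groups; without
# that correction the naive statistic is invalid, per the paper.
import numpy as np
from scipy.stats import norm

def compare_mean_efficiency(eff1, eff2):
    """Two-sided z-test for a difference in mean efficiency between groups."""
    n1, n2 = len(eff1), len(eff2)
    diff = eff1.mean() - eff2.mean()
    se = np.sqrt(eff1.var(ddof=1) / n1 + eff2.var(ddof=1) / n2)
    z = diff / se
    return z, 2.0 * norm.sf(abs(z))  # statistic and two-sided p-value

# Illustrative call on simulated scores for two groups of producers.
rng = np.random.default_rng(1)
z, p = compare_mean_efficiency(rng.beta(5, 2, 100), rng.beta(4, 2, 120))
```

The separability test developed in the paper follows a similar comparison logic, contrasting means of conditional and unconditional efficiency estimates.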


Posted Content
TL;DR: This paper introduces and empirically analyzes Clustered Latent Dirichlet Allocation (CLDA), a method for extracting dynamic latent topics from a collection of documents, based on data decomposition in which the data is partitioned into segments, followed by topic modeling on the individual segments.
Abstract: Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems, enabling the sense-making of highly dynamic and diverse streams of text data. Traditional methods such as Dynamic Topic Modeling (DTM) do not lend themselves well to direct parallelization because of dependencies from one time step to another. In this paper, we introduce and empirically analyze Clustered Latent Dirichlet Allocation (CLDA), a method for extracting dynamic latent topics from a collection of documents. Our approach is based on data decomposition in which the data is partitioned into segments, followed by topic modeling on the individual segments. The resulting local models are then combined into a global solution using clustering. The decomposition and resulting parallelization lead to very fast runtime even on very large datasets. Our approach furthermore provides insight into how the composition of topics changes over time and can also be applied using other data partitioning strategies over any discrete features of the data, such as geographic features or classes of users. In this paper, CLDA is applied successfully to seventeen years of NIPS conference papers (2,484 documents and 3,280,697 words), seventeen years of computer science journal abstracts (533,560 documents and 32,551,540 words), and to forty years of the PubMed corpus (4,025,978 documents and 273,853,980 words).

9 citations
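
Because the abstract spells out the CLDA pipeline (partition the corpus, fit LDA locally on each segment, then cluster the local topics into a global model), a compact sketch may help. The scikit-learn components, the even document split, and the use of k-means centroids as global topics below are illustrative assumptions, not the paper's implementation, which partitions by time and runs the local models in parallel.

```python
# Hypothetical CLDA-style pipeline: local LDA per segment, then k-means
# over the local topic-word distributions to form global topics.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

def clda(docs, n_segments=4, n_local_topics=10, n_global_topics=10):
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(docs)                  # shared vocabulary
    segments = np.array_split(np.arange(counts.shape[0]), n_segments)
    local_topics = []
    for idx in segments:                              # independent, hence parallelizable
        lda = LatentDirichletAllocation(n_components=n_local_topics, random_state=0)
        lda.fit(counts[idx])
        # Normalize each topic to a distribution over the vocabulary.
        local_topics.append(lda.components_ / lda.components_.sum(axis=1, keepdims=True))
    stacked = np.vstack(local_topics)                 # (segments * topics) x vocab
    # Cluster the local topics; centroids serve as the global topics.
    km = KMeans(n_clusters=n_global_topics, n_init=10, random_state=0).fit(stacked)
    return km.cluster_centers_, vec.get_feature_names_out()
```

Clustering topic-word distributions rather than re-fitting a single global model is what removes the time-step dependency that blocks direct parallelization of DTM.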