Journal•

# arXiv: Methodology

About: arXiv: Methodology is an academic journal. The journal publishes majorly in the area(s): Estimator & Bayesian probability. Over the lifetime, 13056 publication(s) have been published receiving 98753 citation(s).

Topics: Estimator, Bayesian probability, Inference, Covariate, Population

##### Papers published on a yearly basis

##### Papers

More filters

•

TL;DR: The mixed membership stochastic block model as discussed by the authors extends block models for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation.

Abstract: Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.

1,546 citations

••

TL;DR: The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

1,085 citations

••

TL;DR: In this paper, a general procedure is studied to perturb a multivariate density satisfying a weak form of multivariate symmetry, and to generate a whole set of non-symmetric densities.

Abstract: A fairly general procedure is studied to perturbate a multivariate density satisfying a weak form of multivariate symmetry, and to generate a whole set of non-symmetric densities. The approach is general enough to encompass a number of recent proposals in the literature, variously related to the skew normal distribution. The special case of skew elliptical densities is examined in detail, establishing connections with existing similar work. The final part of the paper specializes further to a form of multivariate skew $t$ density. Likelihood inference for this distribution is examined, and it is illustrated with numerical examples.

1,051 citations

••

TL;DR: Azzalini and Dalla Valle as mentioned in this paper have recently discussed the multivariate skew-normal distribution which extends the class of normal distributions by the addition of a shape parameter.

Abstract: Azzalini & Dalla Valle (1996) have recently discussed the multivariate skew-normal distribution which extends the class of normal distributions by the addition of a shape parameter. The first part of the present paper examines further probabilistic properties of the distribution, with special emphasis on aspects of statistical relevance. Inferential and other statistical issues are discussed in the following part, with applications to some multivariate statistics problems, illustrated by numerical examples. Finally, a further extension is described which introduces a skewing factor of an elliptical density.

983 citations

•

TL;DR: This is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference and is found to be substantially more powerful than classical methods based on nearest-neighbor matching.

Abstract: Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.

816 citations