Journal•ISSN: 1133-0686
Test
Springer Science+Business Media
About: Test is an academic journal published by Springer Science+Business Media. The journal publishes mainly in the areas of estimators and nonparametric statistics. Its ISSN is 1133-0686. Over its lifetime, the journal has published 1409 papers, which have received 22757 citations. The journal is also known as: Test (Berlin. Springer. Print).
Papers
••
TL;DR: The present article reviews the most recent theoretical and methodological developments for random forests, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures.
Abstract: The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
1,279 citations
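The abstract above describes the core idea as combining several randomized decision trees and averaging their predictions. The following is a minimal sketch of that idea only, not Breiman's algorithm as reviewed in the paper: each "tree" is a one-split regression stump fitted on a bootstrap resample and a single randomly chosen feature. The function names (`fit_stump`, `fit_forest`, `predict_forest`) and the parameters `n_trees` and `seed` are hypothetical and chosen for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Fit a one-split regression tree (stump) on a single feature."""
    values = sorted(set(row[feature] for row in X))
    best = None
    # Try every midpoint between consecutive distinct feature values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The feature is constant in this resample: predict the overall mean.
        overall = mean(y)
        return (feature, float("inf"), overall, overall)
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(X, y, n_trees=50, seed=0):
    """Fit n_trees stumps, each on a bootstrap resample and a random feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(p)                  # randomize the split variable
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by averaging."""
    return mean(predict_stump(stump, row) for stump in forest)
```

A real random forest grows deep trees and draws a fresh random subset of candidate features at every split; the sketch keeps only the resampling and averaging steps so that those two mechanisms are easy to see.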
••
TL;DR: The proliferation of panel data studies is explained in terms of data availability, the more heightened capacity for modeling the complexity of human behavior than a single cross-section or time series data can possibly allow, and challenging methodology.
Abstract: We explain the proliferation of panel data studies in terms of (i) data availability, (ii) the more heightened capacity for modeling the complexity of human behavior than a single cross-section or time series data can possibly allow, and (iii) challenging methodology. Advantages and issues of panel data modeling are also discussed.
691 citations
••
Purdue University, University of Granada, Simón Bolívar University, University of Valencia, University of Murcia, Autonomous University of Madrid, Technical University of Madrid, University of Nottingham, University of Basel, University of Rouen, University College London, Sapienza University of Rome, University of Cincinnati
TL;DR: An overview of the subject of robust Bayesian analysis is provided, one that is accessible to statisticians outside the field, and recent developments in the area are reviewed.
Abstract: Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis.
587 citations
••
TL;DR: This comparison will consider the implementations of global Moran’s I, Getis–Ord G and Geary’s C, local $I_i$ and $G_i$, available in a range of software including Crimestat, GeoDa, ArcGIS, PySAL and R contributed packages.
Abstract: Functions to calculate measures of spatial association, especially measures of spatial autocorrelation, have been made available in many software applications. Measures may be global, applying to the whole data set under consideration, or local, applying to each observation in the data set. Methods of statistical inference may also be provided, but these will, like the measures themselves, depend on the support of the observations, chosen assumptions, and the way in which spatial association is represented; spatial weights are often used as a representational technique. In addition, assumptions may be made about the underlying mean model, and about error distributions. Different software implementations may choose to expose these choices to the analyst, but the sets of choices available may vary between these implementations, as may default settings. This comparison will consider the implementations of global Moran’s I, Getis–Ord G and Geary’s C, local $I_i$ and $G_i$, available in a range of software including Crimestat, GeoDa, ArcGIS, PySAL and R contributed packages.
537 citations
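For readers unfamiliar with the global measure named in the abstract above, global Moran's I is usually written as I = (n / S0) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where w_ij are the spatial weights and S0 is their sum. The sketch below computes that textbook statistic with a plain list-of-lists weights matrix. It makes no claim to match the defaults of any of the packages compared in the paper (for example, it applies no row standardization), and the function name `morans_i` is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for observations `values` and an n x n weights matrix.

    Assumes weights[i][j] >= 0 with a zero diagonal; no standardization
    of the weights is applied here.
    """
    n = len(values)
    x_bar = sum(values) / n
    dev = [v - x_bar for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)      # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return (n / s0) * cross / total


# Four locations on a line, each a neighbour of the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], chain)          # similar neighbours: positive
alternating = morans_i([1, -1, 1, -1], chain)  # dissimilar neighbours: negative
```

The representation of the weights is exactly the kind of choice the abstract says differs between implementations, so results from this sketch should be compared with a given package only after checking how that package builds and standardizes its weights.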
••
TL;DR: In this paper, the authors discuss different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall and Young (1993) and (b) the false discovery rate developed by Benjamini and Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002a), which has a Bayesian motivation.
Abstract: The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. Westfall and Young (1993) propose resampling-based p-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall and Young (1993) and (b) the false discovery rate developed by Benjamini and Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002a), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control family-wise error rate. Adjusted p-values for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.
489 citations
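To make the contrast between the error criteria in the abstract above concrete, the sketch below computes two standard kinds of adjusted p-values from a list of raw p-values: Bonferroni adjustment, which controls the family-wise error rate, and the Benjamini and Hochberg (1995) step-up adjustment, which controls the false discovery rate. It is not the resampling-based minP procedure introduced in the paper, and the function names are hypothetical.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values (family-wise error rate control)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
fwer_adjusted = bonferroni_adjust(raw)
fdr_adjusted = bh_adjust(raw)
```

A hypothesis is rejected at level α when its adjusted p-value is at most α. The Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones, which reflects the weaker error criterion being controlled.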