Topic

Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications have been published on this topic, receiving 242,291 citations.


Papers
Journal ArticleDOI
01 Feb 2004
TL;DR: The paper concludes that combining different expressions of the resampling approach is an effective solution to the tuning problem; the proposed combination scheme, evaluated on imbalanced subsets of the Reuters-21578 text collection, is shown to be quite effective for these problems.
Abstract: Resampling methods are commonly used for dealing with the class-imbalance problem. Their advantage over other methods is that they are external and thus, easily transportable. Although such approaches can be very simple to implement, tuning them most effectively is not an easy task. In particular, it is unclear whether oversampling is more effective than undersampling and which oversampling or undersampling rate should be used. This paper presents an experimental study of these questions and concludes that combining different expressions of the resampling approach is an effective solution to the tuning problem. The proposed combination scheme is evaluated on imbalanced subsets of the Reuters-21578 text collection and is shown to be quite effective for these problems.
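
The tuning question the abstract raises, how much to oversample the minority class versus undersample the majority class, can be made concrete with a small sketch. The following Python function is an illustrative assumption rather than the paper's combination scheme; the rates and names are invented for the example.

import numpy as np

def resample_binary(X, y, oversample_rate=2.0, undersample_rate=0.5, seed=0):
    # Rebalance a binary data set by duplicating minority-class rows
    # (random oversampling) and dropping majority-class rows
    # (random undersampling). Rates here are arbitrary placeholders.
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    # Oversample: draw minority indices with replacement.
    min_idx = rng.choice(min_idx, size=int(len(min_idx) * oversample_rate), replace=True)
    # Undersample: keep a random subset of majority indices.
    maj_idx = rng.choice(maj_idx, size=int(len(maj_idx) * undersample_rate), replace=False)
    keep = np.concatenate([min_idx, maj_idx])
    rng.shuffle(keep)
    return X[keep], y[keep]

In practice the two rates would be chosen jointly, which is exactly the tuning problem the paper addresses by combining several resampling settings instead of picking one.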

904 citations

Book
01 Jan 2008
TL;DR: In this book, the authors present statistical and data-analysis methods for paleontological data, ranging from basic univariate tests and resampling through multivariate analysis, morphometrics, and phylogenetics to paleoecology, time series analysis, and quantitative biostratigraphy.
Abstract (table of contents):
Preface. Acknowledgments.
1 Introduction: 1.1 The nature of paleontological data. 1.2 Advantages and pitfalls of paleontological data analysis. 1.3 Software.
2 Basic statistical methods: 2.1 Introduction. 2.2 Statistical distributions. 2.3 Shapiro-Wilk test for normal distribution. 2.4 F test for equality of variances. 2.5 Student's t test and Welch test for equality of means. 2.6 Mann-Whitney U test for equality of medians. 2.7 Kolmogorov-Smirnov test for equality of distributions. 2.8 Permutation and resampling. 2.9 One-way ANOVA. 2.10 Kruskal-Wallis test. 2.11 Linear correlation. 2.12 Non-parametric tests for correlation. 2.13 Linear regression. 2.14 Reduced major axis regression. 2.15 Nonlinear curve fitting. 2.16 Chi-square test.
3 Introduction to multivariate data analysis: 3.1 Approaches to multivariate data analysis. 3.2 Multivariate distributions. 3.3 Parametric multivariate tests. 3.4 Non-parametric multivariate tests. 3.5 Hierarchical cluster analysis. 3.6 K-means cluster analysis.
4 Morphometrics: 4.1 Introduction. 4.2 The allometric equation. 4.3 Principal components analysis (PCA). 4.4 Multivariate allometry. 4.5 Discriminant analysis for two groups. 4.6 Canonical variate analysis (CVA). 4.7 MANOVA. 4.8 Fourier shape analysis. 4.9 Elliptic Fourier analysis. 4.10 Eigenshape analysis. 4.11 Landmarks and size measures. 4.12 Procrustean fitting. 4.13 PCA of landmark data. 4.14 Thin-plate spline deformations. 4.15 Principal and partial warps. 4.16 Relative warps. 4.17 Regression of partial warp scores. 4.18 Disparity measures. 4.19 Point distribution statistics. 4.20 Directional statistics. Case study: The ontogeny of a Silurian trilobite.
5 Phylogenetic analysis: 5.1 Introduction. 5.2 Characters. 5.3 Parsimony analysis. 5.4 Character state reconstruction. 5.5 Evaluation of characters and tree topologies. 5.6 Consensus trees. 5.7 Consistency index. 5.8 Retention index. 5.9 Bootstrapping. 5.10 Bremer support. 5.11 Stratigraphical congruency indices. 5.12 Phylogenetic analysis with Maximum Likelihood. Case study: The systematics of heterosporous ferns.
6 Paleobiogeography and paleoecology: 6.1 Introduction. 6.2 Diversity indices. 6.3 Taxonomic distinctness. 6.4 Comparison of diversity indices. 6.5 Abundance models. 6.6 Rarefaction. 6.7 Diversity curves. 6.8 Size-frequency and survivorship curves. 6.9 Association similarity indices for presence/absence data. 6.10 Association similarity indices for abundance data. 6.11 ANOSIM and NPMANOVA. 6.12 Correspondence analysis. 6.13 Principal Coordinates analysis (PCO). 6.14 Non-metric Multidimensional Scaling (NMDS). 6.15 Seriation. Case study: Ashgill brachiopod paleocommunities from East China.
7 Time series analysis: 7.1 Introduction. 7.2 Spectral analysis. 7.3 Autocorrelation. 7.4 Cross-correlation. 7.5 Wavelet analysis. 7.6 Smoothing and filtering. 7.7 Runs test. Case study: Sepkoski's generic diversity curve for the Phanerozoic.
8 Quantitative biostratigraphy: 8.1 Introduction. 8.2 Parametric confidence intervals on stratigraphic ranges. 8.3 Non-parametric confidence intervals on stratigraphic ranges. 8.4 Graphic correlation. 8.5 Constrained optimisation. 8.6 Ranking and scaling. 8.7 Unitary Associations. 8.8 Biostratigraphy by ordination. 8.9 What is the best method for quantitative biostratigraphy?
Appendix A: Plotting techniques. Appendix B: Mathematical concepts and notation. References. Index.
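
Several of the chapters above (2.8 Permutation and resampling, 5.9 Bootstrapping, 6.6 Rarefaction) rely on the same basic resampling idea. As a minimal sketch, assuming a simple two-sample comparison of means rather than any specific example from the book, a permutation test can be written as:

import numpy as np

def permutation_test(a, b, n_perm=9999, seed=0):
    # Two-sided p-value for the difference in means between samples a and b,
    # obtained by randomly relabelling the pooled observations.
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one correction gives a valid p-value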

867 citations

Journal Article
TL;DR: In this article, a sampling-resampling perspective on Bayesian inference is presented that has pedagogic appeal and suggests easily implemented, sampling-based calculation strategies.
Abstract: Even to the initiated, statistical calculations based on Bayes's Theorem can be daunting because of the numerical integrations required in all but the simplest applications. Moreover, from a teaching perspective, introductions to Bayesian statistics—if they are given at all—are circumscribed by these apparent calculational difficulties. Here we offer a straightforward sampling-resampling perspective on Bayesian inference, which has both pedagogic appeal and suggests easily implemented calculation strategies.
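
The sampling-resampling idea can be illustrated with a short sketch: draw candidate parameter values from the prior, weight each draw by its likelihood, and resample in proportion to the weights to obtain approximate posterior draws. The beta-binomial setup below is an assumed toy example, not one taken from the article.

import numpy as np

rng = np.random.default_rng(1)

# Assumed observed data: 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10

# 1. Sample candidate values of theta from a uniform prior on [0, 1].
prior_draws = rng.uniform(0.0, 1.0, size=100_000)

# 2. Weight each draw by its (unnormalised) binomial likelihood.
weights = prior_draws**successes * (1.0 - prior_draws)**(trials - successes)
weights /= weights.sum()

# 3. Resample with replacement in proportion to the weights.
posterior_draws = rng.choice(prior_draws, size=10_000, replace=True, p=weights)

print("posterior mean ~", posterior_draws.mean())   # close to the exact Beta(8, 4) mean of 2/3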

861 citations

Journal ArticleDOI
TL;DR: The paper's principal topic is the estimation of the variability of fitted parameters and derived quantities, such as thresholds and slopes; it introduces bias-corrected and accelerated confidence intervals that improve on the parametric and percentile-based bootstrap confidence intervals previously used.
Abstract: The psychometric function relates an observer's performance to an independent variable, usually a physical quantity of an experimental stimulus. Even if a model is successfully fit to the data and its goodness of fit is acceptable, experimenters require an estimate of the variability of the parameters to assess whether differences across conditions are significant. Accurate estimates of variability are difficult to obtain, however, given the typically small size of psychophysical data sets: Traditional statistical techniques are only asymptotically correct and can be shown to be unreliable in some common situations. Here and in our companion paper (Wichmann & Hill, 2001), we suggest alternative statistical techniques based on Monte Carlo resampling methods. The present paper's principal topic is the estimation of the variability of fitted parameters and derived quantities, such as thresholds and slopes. First, we outline the basic bootstrap procedure and argue in favor of the parametric, as opposed to the nonparametric, bootstrap. Second, we describe how the bootstrap bridging assumption, on which the validity of the procedure depends, can be tested. Third, we show how one's choice of sampling scheme (the placement of sample points on the stimulus axis) strongly affects the reliability of bootstrap confidence intervals, and we make recommendations on how to sample the psychometric function efficiently. Fourth, we show that, under certain circumstances, the (arbitrary) choice of the distribution function can exert an unwanted influence on the size of the bootstrap confidence intervals obtained, and we make recommendations on how to avoid this influence. Finally, we introduce improved confidence intervals (bias corrected and accelerated) that improve on the parametric and percentile-based bootstrap confidence intervals previously used. Software implementing our methods is available.
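
As a minimal sketch of the parametric bootstrap the abstract advocates, the snippet below fits a logistic psychometric function by maximum likelihood, simulates new data sets from the fitted model, refits each one, and reads percentile confidence intervals off the refitted parameters. The logistic form, the stimulus levels, and the response counts are assumptions for illustration, not the authors' data or software, and the bias-corrected and accelerated refinement they introduce is omitted.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid

# Assumed stimulus levels, trials per level, and correct responses.
x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
n = np.array([40, 40, 40, 40, 40])
k = np.array([18, 22, 29, 36, 39])

def neg_log_lik(params, k_obs):
    alpha, beta = params  # threshold and slope of the logistic
    p = np.clip(expit(beta * (x - alpha)), 1e-9, 1 - 1e-9)
    return -np.sum(k_obs * np.log(p) + (n - k_obs) * np.log(1 - p))

def fit(k_obs):
    return minimize(neg_log_lik, x0=[2.0, 1.0], args=(k_obs,), method="Nelder-Mead").x

alpha_hat, beta_hat = fit(k)

# Parametric bootstrap: simulate binomial data from the fitted model and refit.
rng = np.random.default_rng(0)
p_hat = expit(beta_hat * (x - alpha_hat))
boot = np.array([fit(rng.binomial(n, p_hat)) for _ in range(999)])

# Percentile confidence intervals for threshold (alpha) and slope (beta).
print("alpha 95% CI:", np.percentile(boot[:, 0], [2.5, 97.5]))
print("beta  95% CI:", np.percentile(boot[:, 1], [2.5, 97.5]))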

838 citations


Network Information
Related Topics (5)
Estimator - 97.3K papers, 2.6M citations - 89% related
Inference - 36.8K papers, 1.3M citations - 87% related
Sampling (statistics) - 65.3K papers, 1.2M citations - 86% related
Regression analysis - 31K papers, 1.7M citations - 86% related
Markov chain - 51.9K papers, 1.3M citations - 83% related
Performance Metrics
Number of papers in the topic in previous years:
Year - Papers
2025 - 1
2024 - 2
2023 - 377
2022 - 759
2021 - 275
2020 - 279