Topic

Resampling

About: Resampling is a research topic. Over the lifetime, 5428 publications have been published within this topic receiving 242291 citations.


Papers
Journal ArticleDOI
TL;DR: This paper proposes a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples, for both continuous and discrete traits.
Abstract: Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEE) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

51 citations
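The resampling idea behind a variance-component kernel test can be illustrated with a minimal sketch: compute the score-type statistic Q = r'Kr from residuals and a genotype-derived kernel, then approximate its null distribution by resampling residuals. This is a crude permutation stand-in, not the paper's GEE machinery; it ignores family correlation structure, and all data below are made up.

```python
import random

def kernel_stat(resid, K):
    """Variance-component score statistic Q = r' K r."""
    n = len(resid)
    return sum(resid[i] * K[i][j] * resid[j]
               for i in range(n) for j in range(n))

def resampling_pvalue(y, K, n_resamples=999, seed=0):
    """Resampling p-value for Q: permute mean-centred residuals to
    approximate the null distribution (family structure omitted)."""
    mu = sum(y) / len(y)
    resid = [yi - mu for yi in y]
    q_obs = kernel_stat(resid, K)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        perm = resid[:]
        rng.shuffle(perm)
        if kernel_stat(perm, K) >= q_obs:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Toy SNP set: 6 individuals x 3 variants; linear kernel K = G G'.
G = [[0, 1, 0], [1, 1, 0], [2, 0, 1], [0, 0, 0], [1, 2, 1], [2, 1, 2]]
K = [[sum(gi[k] * gj[k] for k in range(3)) for gj in G] for gi in G]
y = [0.1, 0.4, 1.2, -0.3, 0.9, 1.8]
p = resampling_pvalue(y, K)
```

The add-one correction in the p-value keeps it strictly positive, which is the standard guard against reporting p = 0 from a finite number of resamples.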

Journal ArticleDOI
TL;DR: In this article, the authors construct prediction intervals for autoregressive conditional heteroskedasticity (ARCH) models using the bootstrap and compare their prediction intervals to traditional asymptotic prediction intervals.

51 citations
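The bootstrap construction that paper compares against asymptotic intervals can be sketched for an ARCH(1) model: standardize in-sample innovations by their fitted conditional volatilities, then resample those standardized residuals to simulate the next-period return. In this simplified sketch the parameters omega and alpha are taken as given (a full treatment would re-estimate them on each bootstrap sample), and the returns are invented for illustration.

```python
import random

def arch1_bootstrap_interval(returns, omega, alpha,
                             n_boot=2000, level=0.95, seed=1):
    """Bootstrap prediction interval for the next observation of an
    ARCH(1) model: e_t = sigma_t * z_t, sigma_t^2 = omega + alpha * e_{t-1}^2."""
    # Standardized residuals z_t from the fitted conditional variances.
    z = []
    for t in range(1, len(returns)):
        s2 = omega + alpha * returns[t - 1] ** 2
        z.append(returns[t] / s2 ** 0.5)
    rng = random.Random(seed)
    # One-step-ahead conditional variance, then resample z to get draws.
    s2_next = omega + alpha * returns[-1] ** 2
    draws = sorted(rng.choice(z) * s2_next ** 0.5 for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return draws[lo_idx], draws[hi_idx]

# Toy usage with made-up returns and parameter values.
rets = [0.2, -0.5, 0.1, 0.8, -0.3, 0.4, -0.6, 0.2, 0.1, -0.4]
lo, hi = arch1_bootstrap_interval(rets, omega=0.05, alpha=0.3)
```

Because the interval endpoints are percentiles of the resampled draws rather than Gaussian quantiles, no normality assumption on the innovations is needed.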

Journal ArticleDOI
TL;DR: It is shown that model-based residual bootstrapping q-ball generates results that closely match the output of the conventional bootstrap, avoiding existing limitations associated with data calibration and model selection.
Abstract: Bootstrapping of repeated diffusion-weighted image datasets enables nonparametric quantification of the uncertainty in the inferred fiber orientation. The wild bootstrap and the residual bootstrap are model-based residual resampling methods which use a single dataset. Previously, the wild bootstrap method has been presented as an alternative to conventional bootstrapping for diffusion tensor imaging. Here we present a study of an implementation of model-based residual bootstrapping using q -ball analysis and compare the outputs with conventional bootstrapping. We show that model-based residual bootstrap q-ball generates results that closely match the output of the conventional bootstrap. Both the residual and conventional bootstrap of multifiber methods can be used to estimate the probability of different numbers of fiber populations existing in different brain tissues. Also, we have shown that these methods can be used to provide input for probabilistic tractography, avoiding existing limitations associated with data calibration and model selection.

50 citations
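The model-based residual bootstrap used in that study resamples residuals around a fitted model to quantify uncertainty from a single dataset. The q-ball fitting itself is beyond a short sketch, so the same resampling scheme is illustrated here on ordinary least-squares regression (an analogy, not the paper's diffusion model; the data are invented).

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def residual_bootstrap(x, y, n_boot=1000, seed=2):
    """Model-based residual bootstrap: fit once, then repeatedly add
    resampled residuals to the fitted values and refit, yielding a
    distribution over the parameter of interest (here, the slope)."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in fitted]
        slopes.append(fit_line(x, y_star)[1])
    return slopes

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
slopes = residual_bootstrap(x, y)
```

The spread of `slopes` plays the same role as the spread of inferred fiber orientations in the paper: a nonparametric uncertainty estimate obtained without repeating the acquisition.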

Journal ArticleDOI
TL;DR: A datamining framework is described that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data when applied prospectively.
Abstract: Background. Tumor control probability (TCP) in radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data when applied prospectively. Material and methods. Several datamining approaches are discussed, including dose-volume metrics, equivalent uniform dose, a mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Results. Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction, with an rs of 0.68 on leave-one-out testing compared to logistic regression (rs = 0.4), Poisson-based TCP (rs = 0.33), and the cell kill equivalent uniform dose model (rs = 0.17). Conclusions. The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.

50 citations
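The leave-one-out testing used above to compare the SVM against logistic and Poisson models is itself a resampling scheme: each patient is predicted by a model trained on all the others. A minimal generic sketch, with a hypothetical 1-nearest-neighbour predictor standing in for the paper's models and made-up data:

```python
def leave_one_out(xs, ys, fit, predict):
    """Leave-one-out resampling: each case is predicted by a model
    trained on the remaining cases, giving an honest out-of-sample
    error estimate from a small dataset."""
    preds = []
    for i in range(len(ys)):
        x_tr = xs[:i] + xs[i + 1:]
        y_tr = ys[:i] + ys[i + 1:]
        model = fit(x_tr, y_tr)
        preds.append(predict(model, xs[i]))
    return preds

# Hypothetical 1-nearest-neighbour predictor as the model under test.
def fit_1nn(x_tr, y_tr):
    return list(zip(x_tr, y_tr))

def predict_1nn(model, x):
    return min(model, key=lambda p: abs(p[0] - x))[1]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.1, 2.9, 4.2, 5.0]
preds = leave_one_out(xs, ys, fit_1nn, predict_1nn)
```

Correlating `preds` with the held-out outcomes (e.g., via Spearman rank correlation, as in the paper) scores the model on cases it never saw, which is what keeps over-fitting in check with only 56 patients.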

Journal ArticleDOI
TL;DR: In this article, the authors used a bootstrap procedure to extend this approach and generate confidence intervals for diversity indexes, noting that present accuracy-assessment methods cannot quantify the uncertainty in landscape indexes that are sensitive to the size, shape, and spatial arrangement of patches.
Abstract: Many landscape indexes with ecological relevance have been proposed, including diversity indexes, dominance, fractal dimension, and patch size distribution. Classified land cover data in a geographic information system (GIS) are frequently used to calculate these indexes. However, a lack of methods for quantifying uncertainty in these measures makes it difficult to test hypothesized relations among landscape indexes and ecological processes. One source of uncertainty in landscape indexes is classification error in land cover data, which can be reported in the form of an error matrix. Some researchers have used error matrices to adjust extent estimates derived from classified land cover data. Because landscape diversity indexes depend only on landscape composition – the extent of each cover in a landscape – adjusted extent estimates may be used to calculate diversity indexes. We used a bootstrap procedure to extend this approach and generate confidence intervals for diversity indexes. Bootstrapping is a technique that allows one to estimate sample variability by resampling from the empirical probability distribution defined by a single sample. Using the empirical distribution defined by an error matrix, we generated a bootstrap sample of error matrixes. The sample of error matrixes was used to generate a sample of adjusted diversity indexes from which estimated confidence intervals for the diversity indexes were calculated. We also note that present methods for accuracy assessment are not sufficient for quantifying the uncertainty in landscape indexes that are sensitive to the size, shape, and spatial arrangement of patches. More information about the spatial structure of error is needed to calculate uncertainty for these indexes. Alternative approaches should be considered, including combining traditional accuracy assessments with other probability data generated during the classification procedure.

50 citations
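The percentile-bootstrap construction behind those confidence intervals can be sketched for the Shannon diversity index. This simplification resamples raw cover-class counts rather than error matrixes as the paper does, and the pixel counts are invented:

```python
import math
import random

def shannon(props):
    """Shannon diversity index H' = -sum(p * ln p) over cover proportions."""
    return -sum(p * math.log(p) for p in props if p > 0)

def shannon_bootstrap_ci(counts, n_boot=2000, level=0.90, seed=3):
    """Percentile bootstrap CI for H': resample pixels from the
    empirical cover-class distribution, recompute H' each time,
    and take the central percentiles."""
    n = sum(counts)
    # Empirical distribution: one pool entry per observed pixel.
    pool = [c for c, cnt in enumerate(counts) for _ in range(cnt)]
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(pool) for _ in range(n)]
        freqs = [sample.count(c) / n for c in range(len(counts))]
        stats.append(shannon(freqs))
    stats.sort()
    return (stats[int((1 - level) / 2 * n_boot)],
            stats[int((1 + level) / 2 * n_boot) - 1])

# Toy landscape: pixel counts for three cover classes.
lo, hi = shannon_bootstrap_ci([60, 30, 10])
```

As in the paper, the interval reflects only compositional uncertainty; indexes sensitive to patch size, shape, or arrangement would need spatial error information that a simple count resample cannot supply.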


Network Information
Related Topics (5)
Estimator
97.3K papers, 2.6M citations
89% related
Inference
36.8K papers, 1.3M citations
87% related
Sampling (statistics)
65.3K papers, 1.2M citations
86% related
Regression analysis
31K papers, 1.7M citations
86% related
Markov chain
51.9K papers, 1.3M citations
83% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2025    1
2024    2
2023    377
2022    759
2021    275
2020    279