
Showing papers on "Resampling published in 2014"


Journal ArticleDOI
TL;DR: In this paper, an ultrafast bootstrap approximation approach (UFBoot) is proposed to compute the support of phylogenetic groups in maximum likelihood (ML) based trees, which combines the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees.
Abstract: Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.

723 citations
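As a rough illustration of the resampling estimated log-likelihood (RELL) idea that UFBoot builds on, the sketch below resamples per-site log-likelihoods with multinomial weights instead of re-optimizing trees on resampled alignments. The function name, array shapes and toy data are assumptions for illustration, not part of the IQ-TREE implementation.

```python
import numpy as np

def rell_bootstrap_support(site_loglik, n_boot=1000, rng=None):
    """Approximate bootstrap support via the RELL idea: resample per-site
    log-likelihoods instead of re-optimizing trees on resampled alignments.

    site_loglik: array of shape (n_trees, n_sites) with per-site log-likelihoods
                 of each candidate tree on the original alignment.
    Returns the fraction of bootstrap replicates in which each tree scores best.
    """
    rng = np.random.default_rng(rng)
    n_trees, n_sites = site_loglik.shape
    wins = np.zeros(n_trees)
    for _ in range(n_boot):
        # multinomial site weights mimic resampling alignment columns
        w = rng.multinomial(n_sites, np.full(n_sites, 1.0 / n_sites))
        scores = site_loglik @ w          # weighted log-likelihood per tree
        wins[np.argmax(scores)] += 1
    return wins / n_boot

# toy usage: 3 candidate trees, 50 sites of fake per-site log-likelihoods
ll = -np.abs(np.random.default_rng(0).normal(size=(3, 50)))
print(rell_bootstrap_support(ll, n_boot=200, rng=1))
```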


Journal ArticleDOI
TL;DR: The ‘bag of little bootstraps’ (BLB) is introduced, a new procedure that incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators.
Abstract: The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets—which are increasingly prevalent—the calculation of bootstrap-based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to specification of tuning parameters (such as the number of subsampled data points), and they often require knowledge of the estimator's convergence rate, in contrast with the bootstrap. As an alternative, we introduce the ‘bag of little bootstraps’ (BLB), a new procedure that incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. The BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate the BLB's favourable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing the BLB with the bootstrap, the m out of n bootstrap and subsampling. In addition, we present results from a large-scale distributed implementation of the BLB demonstrating its computational superiority on massive data, a method for adaptively selecting the BLB's tuning parameters, an empirical study applying the BLB to several real data sets and an extension of the BLB to time series data.

318 citations
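The following is a minimal sketch of the bag-of-little-bootstraps idea for a simple statistic: each small subset of size roughly n^0.6 receives multinomial weights that sum to the full sample size n, and the per-subset standard errors are averaged. The function names, the subset-size exponent and the weighted-mean example are illustrative assumptions, not the authors' code.

```python
import numpy as np

def blb_stderr(data, estimator, n_subsets=10, gamma=0.6, r=100, rng=None):
    """Bag of little bootstraps (rough sketch): for each small subset of size
    b = n**gamma, draw multinomial weights summing to n, compute the weighted
    estimator on the subset, and average the per-subset standard errors."""
    rng = np.random.default_rng(rng)
    n = len(data)
    b = int(n ** gamma)                       # subset size: b = n**gamma
    subset_ses = []
    for _ in range(n_subsets):
        subset = data[rng.choice(n, size=b, replace=False)]
        reps = []
        for _ in range(r):
            # weights: a size-n resample represented compactly on b points
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            reps.append(estimator(subset, w))
        subset_ses.append(np.std(reps, ddof=1))
    return float(np.mean(subset_ses))

# example: standard error of a weighted mean
weighted_mean = lambda x, w: np.average(x, weights=w)
x = np.random.default_rng(0).normal(size=20000)
print(blb_stderr(x, weighted_mean, rng=1))
```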


Journal ArticleDOI
TL;DR: For non-normal data, alternative techniques, especially the permutation test and the RIN (rank-based inverse normal) transformation, offer better control of type I error and good power.

242 citations
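For readers unfamiliar with the RIN transformation mentioned above, a common form maps ranks to normal quantiles using a Blom-type offset; the sketch below assumes the offset c = 3/8, one of several conventions in use.

```python
import numpy as np
from scipy.stats import norm, rankdata

def rin_transform(x, c=3.0 / 8):
    """Rank-based inverse normal (RIN) transformation using the common
    Blom offset c = 3/8; other offsets are also used in practice."""
    r = rankdata(x)                              # average ranks for ties
    return norm.ppf((r - c) / (len(x) + 1 - 2 * c))

print(rin_transform(np.array([3.1, 0.2, 7.5, 1.1, 2.2])))
```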


Journal ArticleDOI
TL;DR: Two different bootstrap methods are proposed for use with propensity-score matching without replacement, and their performance is examined in a series of Monte Carlo simulations.
Abstract: Propensity-score matching is frequently used to estimate the effect of treatments, exposures, and interventions when using observational data. An important issue when using propensity-score matching is how to estimate the standard error of the estimated treatment effect. Accurate variance estimation permits construction of confidence intervals that have the advertised coverage rates and tests of statistical significance that have the correct type I error rates. There is disagreement in the literature as to how standard errors should be estimated. The bootstrap is a commonly used resampling method that permits estimation of the sampling variability of estimated parameters. Bootstrap methods are rarely used in conjunction with propensity-score matching. We propose two different bootstrap methods for use when using propensity-score matching without replacement and examine their performance with a series of Monte Carlo simulations. The first method involved drawing bootstrap samples from the matched pairs in the propensity-score-matched sample. The second method involved drawing bootstrap samples from the original sample, estimating the propensity score separately in each bootstrap sample, and creating a matched sample within each of these bootstrap samples. The former approach was found to result in estimates of the standard error that were closer to the empirical standard deviation of the sampling distribution of estimated effects.

197 citations
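A simplified sketch of the second scheme described above (resample the original data, re-estimate the propensity score, re-match, re-estimate the effect) is given below. For brevity it uses greedy 1-nearest-neighbour matching with replacement via scikit-learn, whereas the paper studies matching without replacement, so treat it only as an outline of the resampling structure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def complex_bootstrap_effect(X, treat, y, n_boot=200, rng=None):
    """Resample the original data, re-estimate the propensity score, re-match,
    and re-estimate the treatment effect in each bootstrap sample.
    NOTE: for brevity this uses greedy 1-NN matching *with* replacement,
    unlike the matching-without-replacement studied in the paper."""
    rng = np.random.default_rng(rng)
    n = len(y)
    effects = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                      # bootstrap the rows
        Xb, tb, yb = X[idx], treat[idx], y[idx]
        treated, controls = np.where(tb == 1)[0], np.where(tb == 0)[0]
        if len(treated) == 0 or len(controls) == 0:
            continue
        ps = LogisticRegression(max_iter=1000).fit(Xb, tb).predict_proba(Xb)[:, 1]
        nn = NearestNeighbors(n_neighbors=1).fit(ps[controls].reshape(-1, 1))
        _, j = nn.kneighbors(ps[treated].reshape(-1, 1))
        effects.append(np.mean(yb[treated] - yb[controls[j.ravel()]]))
    return float(np.mean(effects)), float(np.std(effects, ddof=1))
```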


Journal ArticleDOI
TL;DR: The results obtained in the tests are presented and discussed; in particular, the performance achieved by the four classifiers with the proposed novel resampling approach is compared to that obtained with a widely applied and well-known resampling technique.

135 citations


Proceedings Article
21 Jun 2014
TL;DR: This work devises a procedure for detecting concept drifts in data-streams that relies on analyzing the empirical loss of learning algorithms, obtaining statistics from the loss distribution by reusing the data multiple times via resampling.
Abstract: Detecting changes in data-streams is an important part of enhancing learning quality in dynamic environments. We devise a procedure for detecting concept drifts in data-streams that relies on analyzing the empirical loss of learning algorithms. Our method is based on obtaining statistics from the loss distribution by reusing the data multiple times via resampling. We present theoretical guarantees for the proposed procedure based on the stability of the underlying learning algorithms. Experimental results show that the method has high recall and precision, and performs well in the presence of noise.

99 citations
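One simple instance of resampling a loss stream is a permutation check that compares a reference window against a recent window of per-example losses; the sketch below is a generic illustration of that idea, not the stability-based procedure proposed in the paper.

```python
import numpy as np

def drift_pvalue(ref_losses, new_losses, n_perm=1000, rng=None):
    """Permutation-style check for whether the mean loss on a new window
    exceeds that on a reference window more than chance would allow."""
    rng = np.random.default_rng(rng)
    observed = np.mean(new_losses) - np.mean(ref_losses)
    pooled = np.concatenate([ref_losses, new_losses])
    k = len(new_losses)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # recompute the statistic on a random re-split of the pooled losses
        if np.mean(pooled[:k]) - np.mean(pooled[k:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
print(drift_pvalue(rng.normal(0.2, 0.1, 200), rng.normal(0.35, 0.1, 200), rng=1))
```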


Journal ArticleDOI
TL;DR: The results show the inadequacy of resampling series generated by white noise and red noise, which are nevertheless the methods currently used in the wide majority of wavelet applications, and highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique.
Abstract: Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise, which are nevertheless the methods currently used in the wide majority of wavelet applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used, such as the hidden Markov model algorithm and the ‘beta-surrogate’ method.

95 citations
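To make the discussion of null models concrete, the sketch below generates an AR(1) ('red noise') surrogate matched to a series' lag-1 autocorrelation and variance. This is one of the simple resampling schemes whose limitations the study documents, shown here only to illustrate what such a null model looks like.

```python
import numpy as np

def red_noise_surrogate(x, rng=None):
    """Generate an AR(1) ('red noise') surrogate matching the lag-1
    autocorrelation and variance of x."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    phi = np.corrcoef(xc[:-1], xc[1:])[0, 1]        # lag-1 autocorrelation
    sigma = np.sqrt(np.var(xc) * (1 - phi ** 2))    # innovation std
    s = np.empty_like(xc)
    s[0] = rng.normal(0, np.std(xc))
    for t in range(1, len(xc)):
        s[t] = phi * s[t - 1] + rng.normal(0, sigma)
    return s + x.mean()

series = np.sin(np.linspace(0, 20, 200)) + np.random.default_rng(0).normal(0, 0.3, 200)
print(red_noise_surrogate(series, rng=1)[:5])
```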


Journal ArticleDOI
TL;DR: Performance of prognostic models constructed using the lasso technique can be optimistic as well, although results of the internal validation are sensitive to how bootstrap resampling is performed.
Abstract: Background: In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data. Method: The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI. Results: The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. Conclusion: Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.

92 citations
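For a complete-data setting (ignoring the multiple-imputation variants that are the paper's focus), a Harrell-style bootstrap optimism correction for a lasso model can be sketched as follows; the use of LassoCV and R² as the performance measure are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score

def bootstrap_optimism_corrected_r2(X, y, n_boot=100, rng=None):
    """Bootstrap optimism correction (generic sketch): refit the lasso on each
    bootstrap sample, compare apparent vs. original-data performance, and
    subtract the average optimism from the full-data apparent performance."""
    rng = np.random.default_rng(rng)
    n = len(y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        model = LassoCV(cv=5).fit(X[idx], y[idx])
        apparent = r2_score(y[idx], model.predict(X[idx]))   # on bootstrap data
        test = r2_score(y, model.predict(X))                 # on original data
        optimism.append(apparent - test)
    apparent_full = r2_score(y, LassoCV(cv=5).fit(X, y).predict(X))
    return apparent_full - float(np.mean(optimism))
```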


Journal ArticleDOI
TL;DR: This article proposes a multivariate two-sample test that can be conveniently used in the high dimension low sample size setup, investigates the performance of this test on several high-dimensional simulated and real data sets, and demonstrates its superiority over several other existing two-sample tests.

87 citations


Journal ArticleDOI
TL;DR: A novel pseudo-noise resampling (PR) based unitary root-MUSIC algorithm for direction-of-arrival (DOA) estimation is derived, and a distance detection strategy is proposed that exploits the information contained in the estimated root estimator bank to help determine the final DOA estimates when all the DOA estimators fail to pass the reliability test.
Abstract: A novel pseudo-noise resampling (PR) based unitary root-MUSIC algorithm for direction-of-arrival (DOA) estimation is derived in this letter. Our solution is able to eliminate the abnormal DOA estimator called outlier and obtain an approximate outlier-free performance in the unitary root-MUSIC algorithm. In particular, we utilize a hypothesis test to detect the outlier. Meanwhile, a PR process is applied to form a DOA estimator bank and a corresponding root estimator bank. We propose a distance detection strategy which exploits the information contained in the estimated root estimator to help determine the final DOA estimates when all the DOA estimators fail to pass the reliability test. Furthermore, the proposed method is realized in terms of real-valued computations, leading to an efficient implementation. Simulations show that the improved MUSIC scheme can significantly improve the DOA resolution at low signal-to-noise ratios and small samples.

81 citations


Journal ArticleDOI
Yun Qian1, Yun Qian2, Yanchun Liang2, Mu Li2, Guoxiang Feng2, Xiaohu Shi2 
TL;DR: A resampling ensemble algorithm is developed for classification problems on imbalanced datasets; the results show that algorithm performance is highly related to the ratio of the minority class number to the attribute number.

Journal ArticleDOI
TL;DR: In this article, a distribution-free bootstrap method was used to estimate building thermal performance. The results indicate that the probabilistic sensitivity analysis incorporating the bootstrap approach provides valuable insights into the variations in sensitivity indicators, which are not available from typical deterministic sensitivity analysis.

Journal ArticleDOI
TL;DR: In this article, a case study of annual maximum daily precipitation over the mountainous Mesochora catchment in Greece is presented, showing that the bias-corrected and accelerated method is best overall for the extreme percentiles, and the fixed-t method also has good average coverage probabilities.
Abstract: The generalized extreme value (GEV) distribution is often fitted to environmental time series of extreme values such as annual maxima of daily precipitation. We study two methodological issues here. First, we compare criteria for selecting the best model among 16 GEV models that allow nonstationary scale and location parameters. Simulation results showed that both the corrected Akaike information criterion and Bayesian information criterion (BIC) always detected nonstationarity, but the BIC selected the correct model more often except in very small samples. Second, we examined confidence intervals (CIs) for model parameters and other quantities such as the return levels that are usually required for hydrological and climatological time series. Four bootstrap CIs—normal, percentile, basic and bias-corrected and accelerated—constructed by random-t resampling, fixed-t resampling and the parametric bootstrap methods were compared. CIs for parameters of the stationary model do not present major differences. CIs for the more extreme quantiles tend to become very wide for all bootstrap methods. For nonstationary GEV models with linear time dependence of location or log-linear time dependence of scale, CI coverage probabilities are reasonably accurate for the parameters. For the extreme percentiles, the bias-corrected and accelerated method is best overall, and the fixed-t method also has good average coverage probabilities. A case study is presented of annual maximum daily precipitation over the mountainous Mesochora catchment in Greece. Analysis of historical data and data generated under two climate scenarios (control run and climate change) supported a stationary GEV model reducing to the Gumbel distribution. Copyright © 2013 John Wiley & Sons, Ltd.
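A stripped-down version of one of the schemes compared above, a parametric-bootstrap percentile interval for a stationary GEV return level, might look like the following; the 100-year return period, replicate count and SciPy parameterization are assumptions for illustration.

```python
import numpy as np
from scipy.stats import genextreme

def gev_return_level_ci(x, T=100, n_boot=1000, alpha=0.05, rng=None):
    """Parametric-bootstrap percentile CI for the T-year return level of a
    stationary GEV fitted to annual maxima (simplified sketch)."""
    rng = np.random.default_rng(rng)
    c, loc, scale = genextreme.fit(x)
    p = 1 - 1.0 / T                                  # non-exceedance probability
    n = len(x)
    levels = []
    for _ in range(n_boot):
        xb = genextreme.rvs(c, loc=loc, scale=scale, size=n, random_state=rng)
        cb, lb, sb = genextreme.fit(xb)              # refit on simulated maxima
        levels.append(genextreme.ppf(p, cb, loc=lb, scale=sb))
    point = genextreme.ppf(p, c, loc=loc, scale=scale)
    lo, hi = np.percentile(levels, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

annual_maxima = genextreme.rvs(-0.1, loc=50, scale=10, size=60, random_state=0)
print(gev_return_level_ci(annual_maxima, rng=1))
```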

Journal ArticleDOI
TL;DR: An improved square root unscented fast simultaneous localization and mapping (FastSLAM) algorithm is proposed that propagates and updates the square root of the state covariance directly in Cholesky decomposition form, and can maintain the diversity of particles and consequently avoid inconsistency for longer time periods.
Abstract: An improved square root unscented fast simultaneous localization and mapping (FastSLAM) algorithm is proposed in this paper. The proposed method propagates and updates the square root of the state covariance directly in Cholesky decomposition form. Since the choice of the proposal distribution and that of the resampling method are the most critical issues to ensure the performance of the algorithm, its optimization is considered by improving the sampling and resampling steps. For this purpose, particle swarm optimization (PSO) is used to optimize the proposal distribution. PSO causes the particle set to tend to the high probability region of the posterior before the weights are updated; thereby, the impoverishment of particles can be overcome. Moreover, a new resampling algorithm is presented to improve the resampling step. The new resampling algorithm overcomes the defects of conventional resampling and solves the degeneracy and sample impoverishment problems simultaneously. Compared to unscented FastSLAM (UFastSLAM), the proposed algorithm can maintain the diversity of particles and consequently avoid inconsistency for longer time periods; furthermore, it improves the estimation accuracy. These advantages are verified by simulations and experimental tests for benchmark environments.

Journal ArticleDOI
TL;DR: This paper proposes to estimate the class ratio in the test dataset by matching probability distributions of training and test input data and demonstrates the utility of the proposed approach through experiments.

Journal ArticleDOI
TL;DR: A test based on a recently studied variant of the sequential empirical copula process is introduced that better detects distributional changes in multivariate time series, and a multiplier resampling scheme is proposed that takes the serial dependence into account.

Journal ArticleDOI
TL;DR: Applying resampling methods in a class-imbalanced dataset improved the classification power of CART, random forests, and generalized boosted trees.
Abstract: In the medical field, many outcome variables are dichotomized, and the two possible values of a dichotomized variable are referred to as classes. A dichotomized dataset is class-imbalanced if it consists mostly of one class, and performance of common classification models on this type of dataset tends to be suboptimal. To tackle such a problem, resampling methods, including oversampling and undersampling, can be used. This paper aims at illustrating the effect of resampling methods using the National Health and Nutrition Examination Survey (NHANES) wave 2009–2010 dataset. A total of 4677 participants aged ≥20 without self-reported diabetes and with valid blood test results were analyzed. The Classification and Regression Tree (CART) procedure was used to build a classification model on undiagnosed diabetes. A participant was classified as having undiagnosed diabetes if he or she demonstrated evidence of diabetes according to WHO diabetes criteria. Exposure variables included demographics and socio-economic status. CART models were fitted using a randomly selected 70% of the data (training dataset), and area under the receiver operating characteristic curve (AUC) was computed using the remaining 30% of the sample for evaluation (testing dataset). CART models were fitted using the training dataset, the oversampled training dataset, the weighted training dataset, and the undersampled training dataset. In addition, resampling case-to-control ratios of 1:1, 1:2, and 1:4 were examined. The effects of resampling methods on the performance of other extensions of CART (random forests and generalized boosted trees) were also examined. CARTs fitted on the oversampled (AUC = 0.70) and undersampled training data (AUC = 0.74) yielded a better classification power than that on the training data (AUC = 0.65). Resampling could also improve the classification power of random forests and generalized boosted trees. To conclude, applying resampling methods in a class-imbalanced dataset improved the classification power of CART, random forests, and generalized boosted trees.
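The simplest of the resampling schemes examined above, random oversampling of the minority class to a 1:1 case-to-control ratio, can be sketched as follows; it is a generic illustration rather than the exact procedure used in the study.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Randomly oversample the minority class up to a 1:1 case-to-control
    ratio by duplicating minority rows with replacement."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    min_idx = np.where(y == minority)[0]
    n_extra = counts.max() - counts.min()
    extra = rng.choice(min_idx, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.1).astype(int)          # roughly 10% minority class
Xb, yb = random_oversample(X, y, rng=1)
print(np.bincount(yb))                           # classes now balanced
```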

Journal ArticleDOI
TL;DR: In this article, the effects of sampling intensity on statistical power and the selection of a resampling interval were analyzed for stream chemistry data at Biscuit Brook, New York, to evaluate statistical confidence in the detection of change over time.

Journal ArticleDOI
TL;DR: It was found that the bootstrap methods provided better estimates of uncertainty than the asymptotic method (Asym), as implemented in MONOLIX, for parameters in NLMEM with high nonlinearity and balanced designs.
Abstract: Bootstrap methods are used in many disciplines to estimate the uncertainty of parameters, including multi-level or linear mixed-effects models. Residual-based bootstrap methods, which resample both random effects and residuals, are an alternative approach to the case bootstrap, which resamples the individuals. Most PKPD applications use the case bootstrap, for which software is available. In this study, we evaluated the performance of three bootstrap methods (case bootstrap, nonparametric residual bootstrap and parametric bootstrap) by a simulation study and compared them to that of an asymptotic method (Asym) in estimating uncertainty of parameters in nonlinear mixed-effects models (NLMEM) with heteroscedastic error. This simulation was conducted using the PK model for aflibercept, an anti-angiogenic drug, as an example. As expected, we found that the bootstrap methods provided better estimates of uncertainty than the Asym, as implemented in MONOLIX, for parameters in NLMEM with high nonlinearity and balanced designs. Overall, the parametric bootstrap performed better than the case bootstrap, as the true model and variance distribution were used. However, the case bootstrap is faster and simpler as it makes no assumptions on the model and preserves both between-subject and residual variability in one resampling step. The performance of the nonparametric residual bootstrap was found to be limited when applied to NLMEM due to its failure to reflate the variance before resampling in unbalanced designs, where the Asym and the parametric bootstrap performed well and better than the case bootstrap even with stratification.
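A minimal sketch of the case (subject-level) bootstrap discussed above is shown below for longitudinal data stored in a pandas data frame; the column name 'ID' and the relabelling convention are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def case_bootstrap(df, id_col="ID", rng=None):
    """Case (subject-level) bootstrap: resample whole individuals with
    replacement, keeping each subject's observations together."""
    rng = np.random.default_rng(rng)
    ids = df[id_col].unique()
    sampled = rng.choice(ids, size=len(ids), replace=True)
    pieces = []
    for new_id, old_id in enumerate(sampled):
        block = df[df[id_col] == old_id].copy()
        block[id_col] = new_id               # relabel so duplicates stay distinct
        pieces.append(block)
    return pd.concat(pieces, ignore_index=True)

toy = pd.DataFrame({"ID": [1, 1, 2, 2, 3, 3], "time": [0, 1] * 3,
                    "conc": [5.0, 3.2, 6.1, 4.0, 5.5, 3.8]})
print(case_bootstrap(toy, rng=0))
```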

Journal ArticleDOI
01 Nov 2014-Geoderma
TL;DR: There are complex interactions between sampling design, regression approaches and validation approaches, which can greatly influence the final soil property maps and their accuracy estimates, as well as specific ‘sampling-for-validation' approaches.

Journal ArticleDOI
TL;DR: In this article, the performance of the tetrad protocol was compared to that of the triangle test under conditions that could possibly lower its sensitivity, consequently resulting in the loss of its theoretical power advantage.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric permutation test was proposed to assess the presence of trends in the residuals of multivariate calibration models, applied to the residuals of models generated by principal component regression (PCR), partial least squares (PLS) regression, and support vector regression (SVR).

Journal ArticleDOI
TL;DR: A new approach is proposed in which hybrid PSO algorithms incorporate noise mitigation mechanisms from the other two approaches, and the quality of their results is better than that of the state of the art with a few exceptions.
Abstract: Particle swarm optimization (PSO) is a metaheuristic designed to find good solutions to optimization problems. However, when optimization problems are subject to noise, the quality of the resulting solutions significantly deteriorates, hence prompting the need to incorporate noise mitigation mechanisms into PSO. Based on the allocation of function evaluations, two opposite approaches are generally taken. On the one hand, resampling-based PSO algorithms incorporate resampling methods to better estimate the objective function values of the solutions at the cost of performing fewer iterations. On the other hand, single-evaluation PSO algorithms perform more iterations at the cost of dealing with very inaccurately estimated objective function values. In this paper, we propose a new approach in which hybrid PSO algorithms incorporate noise mitigation mechanisms from the other two approaches, and the quality of their results is better than that of the state of the art with a few exceptions. The performance of the algorithms is analyzed by means of a set of population statistics that measure different characteristics of the swarms throughout the search process. Amongst the hybrid PSO algorithms, we find a promising algorithm whose simplicity, flexibility and quality of results questions the importance of incorporating complex resampling methods into state-of-the-art PSO algorithms.
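The generic resampling idea for noisy objectives, re-evaluating each candidate several times and averaging, can be sketched as follows; the function names and the noisy sphere example are illustrative and do not correspond to any specific algorithm evaluated in the paper.

```python
import numpy as np

def resampled_fitness(objective, positions, n_eval=10, rng=None):
    """Resampling step for noisy optimization: re-evaluate each candidate
    several times and use the mean as a lower-variance fitness estimate."""
    rng = np.random.default_rng(rng)
    return np.array([
        np.mean([objective(p, rng) for _ in range(n_eval)]) for p in positions
    ])

# toy noisy sphere function: true objective plus additive Gaussian noise
noisy_sphere = lambda x, rng: float(np.sum(x ** 2) + rng.normal(0, 0.5))
swarm = np.random.default_rng(0).uniform(-1, 1, size=(5, 3))
print(resampled_fitness(noisy_sphere, swarm, rng=1))
```

The trade-off discussed in the abstract is visible here: each extra re-evaluation reduces the variance of the fitness estimate but consumes function evaluations that a single-evaluation PSO would spend on additional iterations.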

Proceedings ArticleDOI
01 Jan 2014
TL;DR: Spatio-temporal optimisation of the multi-view resampling is introduced to extract a coherent multi-layer texture map video and results in a compact representation with minimal loss of information allowing high-quality free-viewpoint rendering.
Abstract: Multi-view video acquisition is widely used for reconstruction and free-viewpoint rendering of dynamic scenes by directly resampling from the captured images. This paper addresses the problem of optimally resampling and representing multi-view video to obtain a compact representation without loss of the view-dependent dynamic surface appearance. Spatio-temporal optimisation of the multi-view resampling is introduced to extract a coherent multi-layer texture map video. This resampling is combined with a surface-based optical flow alignment between views to correct for errors in geometric reconstruction and camera calibration which result in blurring and ghosting artefacts. The multi-view alignment and optimised resampling results in a compact representation with minimal loss of information allowing high-quality free-viewpoint rendering. Evaluation is performed on multi-view datasets for dynamic sequences of cloth, faces and people. The representation achieves >90% compression without significant loss of visual quality.

Journal ArticleDOI
TL;DR: A method is proposed to test the correlation of two random fields when they are both spatially autocorrelated. It uses Monte Carlo methods and focuses on permuting, and then smoothing and scaling, one of the variables to destroy the correlation with the other while maintaining the initial autocorrelation.
Abstract: We propose a method to test the correlation of two random fields when they are both spatially autocorrelated. In this scenario, the assumption of independence for the pair of observations in the standard test does not hold, and as a result we reject in many cases where there is no effect (the precision of the null distribution is overestimated). Our method recovers the null distribution taking into account the autocorrelation. It uses Monte-Carlo methods, and focuses on permuting, and then smoothing and scaling one of the variables to destroy the correlation with the other, while maintaining at the same time the initial autocorrelation. With this simulation model, any test based on the independence of two (or more) random fields can be constructed. This research was motivated by a project in biodiversity and conservation in the Biology Department at Stanford University.
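A rough sketch of the permute-then-smooth-and-rescale idea on a gridded field is given below; the Gaussian smoothing and the fixed bandwidth are simplifying assumptions, not the authors' calibrated procedure for matching the original autocorrelation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def autocorr_preserving_surrogate(field, sigma=2.0, rng=None):
    """Permute the field to break its correlation with another variable,
    smooth to restore spatial autocorrelation, and rescale to the original
    mean and variance. The bandwidth sigma is a free choice here."""
    rng = np.random.default_rng(rng)
    flat = field.ravel().copy()
    rng.shuffle(flat)                                   # destroy cross-correlation
    surrogate = gaussian_filter(flat.reshape(field.shape), sigma=sigma)
    surrogate = (surrogate - surrogate.mean()) / surrogate.std()
    return surrogate * field.std() + field.mean()       # restore scale

grid = gaussian_filter(np.random.default_rng(0).normal(size=(50, 50)), sigma=3)
print(autocorr_preserving_surrogate(grid, rng=1).shape)
```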

Journal ArticleDOI
TL;DR: This paper addresses a gap in the literature by extending MWW and other nonparametric statistics to provide causal inference for nonrandomized study data, integrating the potential outcome paradigm with functional response models (FRM).
Abstract: The nonparametric Mann-Whitney-Wilcoxon (MWW) rank sum test is widely used to test treatment effect by comparing the outcome distributions between two groups, especially when there are outliers in the data. However, such statistics generally yield invalid conclusions when applied to nonrandomized studies, particularly those in epidemiologic research. Although one may control for selection bias by using available approaches of covariates adjustment such as matching, regression analysis, propensity score matching, and marginal structural models, such analyses yield results that are not only subjective based on how the outliers are handled but also often difficult to interpret. A popular alternative is a conditional permutation test based on randomization inference [Rosenbaum PR. Covariance adjustment in randomized experiments and observational studies. Statistical Science 2002; 17(3):286-327]. Because it requires strong and implausible assumptions that may not be met in most applications, this approach has limited applications in practice. In this paper, we address this gap in the literature by extending MWW and other nonparametric statistics to provide causal inference for nonrandomized study data by integrating the potential outcome paradigm with the functional response models (FRM). FRM is uniquely positioned to model dynamic relationships between subjects, rather than attributes of a single subject as in most regression models, such as the MWW test within our context. The proposed approach is illustrated with data from both real and simulated studies.

Journal ArticleDOI
TL;DR: The general methodology for sequential inference in nonlinear stochastic state-space models to simultaneously estimate dynamic states and fixed parameters is presented and the negative impact of using multinomial resampling is shown.
Abstract: We present general methodology for sequential inference in nonlinear stochastic state-space models to simultaneously estimate dynamic states and fixed parameters. We show that basic particle filters may fail due to degeneracy in fixed parameter estimation and suggest the use of a kernel density approximation to the filtered distribution of the fixed parameters to allow the fixed parameters to regenerate. In addition, we show that “seemingly” uninformative uniform priors on fixed parameters can affect posterior inferences and suggest the use of priors bounded only by the support of the parameter. We show the negative impact of using multinomial resampling and suggest the use of either stratified or residual resampling within the particle filter. As a motivating example, we use a model for tracking and prediction of a disease outbreak via a syndromic surveillance system. Finally, we use this improved particle filtering methodology to relax prior assumptions on model parameters yet still provide reasonable estimates for model parameters and disease states.
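For reference, stratified resampling of particle indices, one of the lower-variance alternatives to multinomial resampling recommended above, can be implemented in a few lines; the toy weight vector is only for illustration.

```python
import numpy as np

def stratified_resample(weights, rng=None):
    """Stratified resampling of particle indices: draw one uniform point per
    equal-probability stratum, which typically has lower variance than
    multinomial resampling."""
    rng = np.random.default_rng(rng)
    n = len(weights)
    positions = (rng.random(n) + np.arange(n)) / n       # one draw per stratum
    return np.searchsorted(np.cumsum(weights), positions)

w = np.array([0.1, 0.2, 0.3, 0.4])                       # normalized weights
print(stratified_resample(w, rng=0))                     # resampled particle indices
```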

Posted Content
TL;DR: In this article, the authors propose to construct a distribution of placebo estimates in regions without a policy kink, apply their procedure to three empirical RK applications (two administrative UI datasets with true policy kinks and the 1980 Census, which has no policy kinks), and find that statistical significance based on conventional p-values may be spurious.
Abstract: The Regression Kink (RK) design is an increasingly popular empirical method, with more than 20 studies circulated using RK in the last 5 years since the initial circulation of Card, Lee, Pei and Weber (2012). We document empirically that these estimates, which typically use local linear regression, are highly sensitive to curvature in the underlying relationship between the outcome and the assignment variable. As an alternative inference procedure, motivated by randomization inference, we propose that researchers construct a distribution of placebo estimates in regions without a policy kink. We apply our procedure to three empirical RK applications – two administrative UI datasets with true policy kinks and the 1980 Census, which has no policy kinks – and we find that statistical significance based on conventional p-values may be spurious. In contrast, our permutation test reinforces the asymptotic inference results of a recent Regression Discontinuity study and a Difference-in-Difference study. Finally, we propose estimating RK models with a modified cubic splines framework and test the performance of different estimators in a simulation exercise. Cubic specifications – in particular recently proposed robust estimators (Calonico, Cattaneo and Titiunik 2014) – yield short interval lengths with good coverage rates.
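A bare-bones version of placebo-based inference for a regression kink might look like the sketch below: estimate the slope change at many points where no policy kink exists and ask how extreme the estimate at the true kink is within that placebo distribution. The local-linear specification, bandwidth and function names are assumptions, not the authors' estimator.

```python
import numpy as np

def placebo_kink_pvalue(x, y, true_kink, placebo_points, bw=1.0):
    """Placebo-style inference for a regression kink: compare the estimated
    slope change at the true kink with slope changes estimated at points
    where no policy kink exists."""
    def kink_estimate(k):
        mask = np.abs(x - k) <= bw                       # local window around k
        xs, ys = x[mask] - k, y[mask]
        # local linear fit allowing a slope change above the candidate kink
        X = np.column_stack([np.ones(xs.size), xs, np.maximum(xs, 0)])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        return beta[2]                                   # estimated slope change
    est = kink_estimate(true_kink)
    placebos = np.array([kink_estimate(k) for k in placebo_points])
    return float(np.mean(np.abs(placebos) >= np.abs(est)))

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 2000)
y = 0.5 * x + 0.8 * np.maximum(x, 0) + rng.normal(0, 1, x.size)   # kink at 0
print(placebo_kink_pvalue(x, y, 0.0, np.linspace(-3.5, 3.5, 15)))
```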

Journal ArticleDOI
TL;DR: In this article, the authors compare three methods to reconstruct galaxy cluster density fields with weak lensing data, and conclude that sensitive priors can help to obtain high signal-to-noise ratio and unbiased reconstructions.
Abstract: In this paper, we compare three methods to reconstruct galaxy cluster density fields with weak lensing data. The first method called FLens integrates an inpainting concept to invert the shear field with possible gaps, and a multi-scale entropy denoising procedure to remove the noise contained in the final reconstruction, that arises mostly from the random intrinsic shape of the galaxies. The second and third methods are based on a model of the density field made of a multi-scale grid of radial basis functions. In one case, the model parameters are computed with a linear inversion involving a singular value decomposition (SVD). In the other case, the model parameters are estimated using a Bayesian Monte Carlo Markov Chain optimization implemented in the lensing software Lenstool. Methods are compared on simulated data with varying galaxy density fields. We pay particular attention to the errors estimated with resampling. We find the multi-scale grid model optimized with Monte Carlo Markov Chain to provide the best results, but at high computational cost, especially when considering resampling. The SVD method is much faster but yields noisy maps, although this can be mitigated with resampling. The FLens method is a good compromise with fast computation, high signal-to-noise ratio reconstruction, but lower resolution maps. All three methods are applied to the MACS J0717+3745 galaxy cluster field, and reveal the filamentary structure discovered in Jauzac et al. We conclude that sensitive priors can help to get high signal-to-noise ratio, and unbiased reconstructions.

Book
28 Jan 2014
TL;DR: In this book, the authors present a guide for selecting a statistical test from a set of statistical tests, including correlation tests of association and linear regression tests of prediction.
Abstract: PART I. Introduction and Background 1. R Basics 2. Research Methods 3. Probability 4. Sampling and Populations PART II. Statistical Theory and Inference 5. Central Limit Theorem 6. Sampling Distributions 7. Statistical Distributions PART III. Descriptive Methods 8. Graphing Data 9. Central Tendency and Dispersion PART IV. Statistical Methods 10. Hypothesis Testing 11. Chi-Square Test for Categorical Data 12. z Test for Differences in Proportions 13. t Test for Mean Differences (2 groups) 14. F Test for Mean Differences (3 or more groups) 15. Correlation Tests of Association 16. Linear Regression Tests of Prediction 17. Multiple Regression 18. Logistic Regression 19. Loglinear Regression PART V. Replication and Validation of Research Findings 20. Replication of Statistical Tests 21. Synthesis of Research Findings Glossary of Terms Glossary of Packages, Functions, and Commands Used in Book Statistical Tables Guide for Selecting a Statistical Test