
Showing papers on "Resampling" published in 2007


Book Chapter
01 Dec 2007
TL;DR: This paper proposes a nonparametric method that directly produces resampling weights without distribution estimation; it works by matching distributions between training and test sets in feature space, and experiments demonstrate that it works well in practice.
Abstract: We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
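For readers who want to see the idea in code, the sketch below estimates resampling weights by matching the (kernel) mean of the reweighted training sample to that of the test sample, in the spirit of the method described above. It is a simplified illustration under assumed settings (RBF kernel, a generic bounded optimizer, hypothetical data), not the authors' exact quadratic program or constraints.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def kernel_mean_matching(X_train, X_test, gamma=1.0, upper=10.0):
    """Estimate resampling weights beta for the training points so that the
    weighted training sample matches the test sample in feature space
    (a simplified kernel-mean-matching sketch)."""
    n_tr, n_te = len(X_train), len(X_test)
    K = rbf_kernel(X_train, X_train, gamma)                 # train-train kernel
    kappa = (n_tr / n_te) * rbf_kernel(X_train, X_test, gamma).sum(axis=1)

    # Squared distance between kernel means (up to a constant):
    # 0.5 * beta' K beta - kappa' beta, minimized over bounded, nonnegative beta.
    def obj(beta):
        return 0.5 * beta @ K @ beta - kappa @ beta

    def grad(beta):
        return K @ beta - kappa

    beta0 = np.ones(n_tr)
    res = minimize(obj, beta0, jac=grad, bounds=[(0.0, upper)] * n_tr)
    return res.x

# Hypothetical data: training sample biased toward low values, test sample not.
rng = np.random.default_rng(0)
X_tr = rng.normal(-0.5, 0.7, size=(100, 1))
X_te = rng.normal(0.0, 1.0, size=(200, 1))
weights = kernel_mean_matching(X_tr, X_te, gamma=0.5)
print(weights[:5])   # under-represented training points receive larger weights
```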

1,227 citations


Proceedings ArticleDOI
06 Nov 2007
TL;DR: It is discovered that there is little practical difference between the randomization, bootstrap, and t tests, whereas the Wilcoxon and sign tests detect significance poorly and their use should be discontinued for measuring the significance of a difference between means.
Abstract: Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's randomization (permutation) test as non-parametric significance tests for IR but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average precision. We discovered that there is little practical difference between the randomization, bootstrap, and t tests. Both the Wilcoxon and sign test have a poor ability to detect significance and have the potential to lead to false detections of significance. The Wilcoxon and sign tests are simplified variants of the randomization test and their use should be discontinued for measuring the significance of a difference between means.
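The Fisher randomization (permutation) test evaluated in this study can be sketched for paired retrieval scores as follows; the per-topic values and permutation count here are hypothetical, and the real evaluation used TREC runs and mean average precision.

```python
import numpy as np

def paired_randomization_test(scores_a, scores_b, n_perm=100_000, seed=0):
    """Two-sided Fisher randomization (permutation) test for a paired
    difference in means, e.g. per-topic average precision of two runs."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(diffs.mean())
    # Under the null, the sign of each per-topic difference is exchangeable,
    # so the null distribution is built by flipping signs at random.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    perm_means = np.abs((signs * diffs).mean(axis=1))
    return (np.sum(perm_means >= observed) + 1) / (n_perm + 1)

# Hypothetical per-topic average precision for two retrieval runs.
ap_run1 = np.array([0.31, 0.42, 0.55, 0.28, 0.61, 0.47])
ap_run2 = np.array([0.29, 0.40, 0.49, 0.30, 0.58, 0.45])
print(paired_randomization_test(ap_run1, ap_run2))
```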

728 citations


Journal ArticleDOI
TL;DR: In this article, a new permutation method called double semi-partialing (DSP) was proposed, which complements the family of existing approaches to multiple regression quadratic assignment procedures.
Abstract: Multiple regression quadratic assignment procedures (MRQAP) tests are permutation tests for multiple linear regression model coefficients for data organized in square matrices of relatedness among n objects. Such a data structure is typical in social network studies, where variables indicate some type of relation between a given set of actors. We present a new permutation method (called "double semi-partialing", or DSP) that complements the family of extant approaches to MRQAP tests. We assess the statistical bias (type I error rate) and statistical power of the set of five methods, including DSP, across a variety of conditions of network autocorrelation, of spuriousness (size of confounder effect), and of skewness in the data. These conditions are explored across three assumed data distributions: normal, gamma, and negative binomial. We find that the Freedman-Lane method and the DSP method are the most robust against a wide array of these conditions. We also find that all five methods perform better if the test statistic is pivotal. Finally, we find limitations of usefulness for MRQAP tests: All tests degrade under simultaneous conditions of extreme skewness and high spuriousness for gamma and negative binomial distributions.
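The DSP method itself is not reproduced here, but the basic QAP permutation idea it extends, permuting the rows and columns of one relational matrix with a common random ordering to build the null distribution, can be illustrated for a single predictor; the matrices and sizes below are hypothetical.

```python
import numpy as np

def qap_correlation_test(X, Y, n_perm=2000, seed=0):
    """Single-predictor QAP test: correlate the off-diagonal entries of two
    n x n relational matrices, then permute rows and columns of Y with a
    common random ordering to build the null distribution."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mask = ~np.eye(n, dtype=bool)                  # ignore self-ties
    observed = np.corrcoef(X[mask], Y[mask])[0, 1]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        Yp = Y[np.ix_(p, p)]                       # relabel actors, keep structure
        if abs(np.corrcoef(X[mask], Yp[mask])[0, 1]) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical relations among 20 actors.
rng = np.random.default_rng(1)
X = rng.random((20, 20)); X = (X + X.T) / 2
Y = 0.5 * X + rng.random((20, 20))
print(qap_correlation_test(X, Y))
```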

610 citations


Journal ArticleDOI
TL;DR: It is shown how resampling ideas from particle filters can be used to reduce the computational cost to linear in the number of observations, at the expense of introducing small errors, and two new, optimal resampling algorithms are proposed for this problem.
Abstract: We propose an on-line algorithm for exact filtering of multiple changepoint problems. This algorithm enables simulation from the true joint posterior distribution of the number and position of the changepoints for a class of changepoint models. The computational cost of this exact algorithm is quadratic in the number of observations. We further show how resampling ideas from particle filters can be used to reduce the computational cost to linear in the number of observations, at the expense of introducing small errors; and propose two new, optimum resampling algorithms for this problem. One, a version of rejection control, allows the particle filter to automatically choose the number of particles required at each time-step. The new resampling algorithms substantially out-perform standard resampling algorithms on examples we consider; and we demonstrate how the resulting particle filter is practicable for segmentation of human GC content.
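For context, the resampling step that standard particle filters perform (and that the paper replaces with optimal schemes such as rejection control) looks like the following systematic-resampling sketch; it is the textbook baseline, not one of the algorithms proposed in the paper.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling: return particle indices drawn (with low variance)
    in proportion to their normalized importance weights."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n           # one uniform, n strata
    cumulative = np.cumsum(np.asarray(weights, dtype=float))
    cumulative /= cumulative[-1]
    return np.searchsorted(cumulative, positions)

w = np.array([0.05, 0.05, 0.6, 0.2, 0.1])
print(systematic_resample(w, np.random.default_rng(0)))     # high-weight particles are duplicated
```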

336 citations


Proceedings ArticleDOI
29 Jul 2007
TL;DR: A sampling condition based on geometric local feature size that allows focusing computational resources in geometrically complex regions, while reducing the number of particles deep inside the fluid or near thick flat surfaces is introduced.
Abstract: We present novel adaptive sampling algorithms for particle-based fluid simulation. We introduce a sampling condition based on geometric local feature size that allows focusing computational resources in geometrically complex regions, while reducing the number of particles deep inside the fluid or near thick flat surfaces. Further performance gains are achieved by varying the sampling density according to visual importance. In addition, we propose a novel fluid surface definition based on approximate particle-to-surface distances that are carried along with the particles and updated appropriately. The resulting surface reconstruction method has several advantages over existing methods, including stability under particle resampling and suitability for representing smooth flat surfaces. We demonstrate how our adaptive sampling and distance-based surface reconstruction algorithms lead to significant improvements in time and memory as compared to single resolution particle simulations, without significantly affecting the fluid flow behavior.

319 citations


01 Jan 2007
TL;DR: In this paper, Monte Carlo resampling of data suggests that the two intervals carry a unique isotopic mean, and uncertainties are 0.14, 0.008, and 0.20‰ for δ34S, Δ33S, and Δ36S.
Abstract: … S values, which is comparable with uncertainties based on multiple standard measurements during each analytical session. For SF6 analyses, uncertainties are 0.14, 0.008, and 0.20‰ for δ34S, Δ33S, and Δ36S, respectively. Monte Carlo resampling of the data suggests that the two intervals carry a unique isotopic mean. In the lower half, we calculated means of δ34S …

144 citations


Patent
Gary J. Sullivan1
08 Jan 2007
TL;DR: In this article, the authors present techniques and tools for high accuracy position calculation for picture resizing in applications such as spatially-scalable video coding and decoding, which is performed according to a resampling scale factor.
Abstract: Techniques and tools for high accuracy position calculation for picture resizing in applications such as spatially-scalable video coding and decoding are described. In one aspect, resampling of a video picture is performed according to a resampling scale factor. The resampling comprises computation of a sample value at a position i, j in a resampled array. The computation includes computing a derived horizontal or vertical sub-sample position x or y in a manner that involves approximating a value in part by multiplying a 2^n value by an inverse (approximate or exact) of the upsampling scale factor. The approximating can be a rounding or some other kind of approximating, such as a ceiling or floor function that approximates to a nearby integer. The sample value is interpolated using a filter.
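A rough sense of the kind of computation being described, deriving sub-sample positions by multiplying by a 2^n-scaled approximate inverse of the upsampling scale factor, is given by the sketch below. The shift value, rounding choices, and function name are illustrative assumptions, not the patent's exact formulas.

```python
def subsample_positions(out_size, in_size, shift=16):
    """Compute (integer sample, fractional offset) pairs for each output
    position by multiplying by a 2**shift-scaled integer approximation of the
    inverse upsampling scale factor; rounding details are illustrative only."""
    inv_scale = (in_size << shift) // out_size     # approx. inverse scale, fixed point
    positions = []
    for i in range(out_size):
        pos = i * inv_scale                        # fixed-point sub-sample position
        integer_part = pos >> shift
        frac = (pos & ((1 << shift) - 1)) / float(1 << shift)
        positions.append((integer_part, frac))
    return positions

# Upsample a 5-sample row to 8 samples: each output maps to an input position.
print(subsample_positions(8, 5)[:4])
```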

142 citations


Journal ArticleDOI
TL;DR: In observational studies, a method of sensitivity analysis is developed for m-tests, m-intervals, and m-estimates: it shows the extent to which inferences would be altered by biases of various magnitudes due to nonrandom treatment assignment.
Abstract: Huber's m-estimates use an estimating equation in which observations are permitted a controlled level of influence. The family of m-estimates includes least squares and maximum likelihood, but typical applications give extreme observations limited weight. Maritz proposed methods of exact and approximate permutation inference for m-tests, confidence intervals, and estimators, which can be derived from random assignment of paired subjects to treatment or control. In contrast, in observational studies, where treatments are not randomly assigned, subjects matched for observed covariates may differ in terms of unobserved covariates, so differing outcomes may not be treatment effects. In observational studies, a method of sensitivity analysis is developed for m-tests, m-intervals, and m-estimates: it shows the extent to which inferences would be altered by biases of various magnitudes due to nonrandom treatment assignment. The method is developed for both matched pairs, with one treated subject matched to one control, and for matched sets, with one treated subject matched to one or more controls. The method is illustrated using two studies: (i) a paired study of damage to DNA from exposure to chromium and nickel and (ii) a study with one or two matched controls comparing side effects of two drug regimes to treat tuberculosis. The approach yields sensitivity analyses for: (i) m-tests with Huber's weight function and other robust weight functions, (ii) the permutational t-test which uses the observations directly, and (iii) various other procedures such as the sign test, Noether's test, and the permutation distribution of the efficient score test for a location family of distributions. Permutation inference with covariance adjustment is briefly discussed.
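As a concrete, much simpler stand-in for the m-test machinery, the snippet below computes a Rosenbaum-style worst-case p-value for the sign test on matched pairs under a hidden bias of magnitude gamma; the pair counts are hypothetical, and the weight-function-based m-tests of the paper are not implemented here.

```python
from scipy.stats import binom

def sign_test_sensitivity(n_pairs, n_positive, gamma=1.5):
    """Worst-case one-sided p-value for the sign test on matched pairs when an
    unobserved bias of magnitude gamma may influence treatment assignment: the
    chance a pair favours treatment is bounded above by gamma / (1 + gamma)."""
    p_upper = gamma / (1.0 + gamma)
    # Upper bound on P(at least n_positive pairs favour treatment).
    return binom.sf(n_positive - 1, n_pairs, p_upper)

# Hypothetical study: 38 of 50 matched pairs favour the treated subject.
for g in (1.0, 1.5, 2.0, 3.0):
    print(f"gamma = {g}: worst-case p = {sign_test_sensitivity(50, 38, g):.4f}")
```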

138 citations


Journal ArticleDOI
TL;DR: In this article, the authors combine the mutual information criterion with a forward feature selection strategy and propose the use of two resampling methods, K-fold cross-validation and the permutation test, to address both issues: these can be used to automatically set the parameter and to calculate a threshold for stopping the forward procedure.

132 citations


Journal ArticleDOI
TL;DR: It is found that flexible rules, such as artificial neural nets, classification and regression trees, or regression splines, can be assessed and compared with less flexible rules on the same data in which they were developed.
Abstract: Estimates of the prediction error play an important role in the development of statistical methods and models, and in their applications. We adapt the resampling tools of Efron and Tibshirani (1997, Journal of the American Statistical Association 92, 548-560) to survival analysis with right-censored event times. We find that flexible rules, like artificial neural nets, classification and regression trees, or regression splines, can be assessed and compared to less flexible rules on the same data in which they are developed. The methods are illustrated with data from a breast cancer trial.

126 citations


Journal ArticleDOI
TL;DR: The methodology is illustrated by estimating the prediction error of some recently proposed techniques for fitting a multivariate Cox regression model, applied to the data of a prognostic study in patients with diffuse large-B-cell lymphoma.
Abstract: Motivation: In the process of developing risk prediction models, various steps of model building and model selection are involved. If this process is not adequately controlled, overfitting may result in serious overoptimism leading to potentially erroneous conclusions. Methods: For right censored time-to-event data, we estimate the prediction error for assessing the performance of a risk prediction model (Gerds and Schumacher, 2006; Graf et al., 1999). Furthermore, resampling methods are used to detect overfitting and resulting overoptimism and to adjust the estimates of prediction error (Gerds and Schumacher, 2007). Results: We show how and to what extent the methodology can be used in situations characterized by a large number of potential predictor variables where overfitting may be expected to be overwhelming. This is illustrated by estimating the prediction error of some recently proposed techniques for fitting a multivariate Cox regression model applied to the data of a prognostic study in patients with diffuse large-B-cell lymphoma (DLBCL). Availability: Resampling-based estimation of prediction error curves is implemented in an R package called pec available from the authors. Contact: sec@imbi.uni-freiburg.de

Journal ArticleDOI
TL;DR: The proposed verification methodology is based on the continuous ranked probability score (CRPS), which provides an evaluation of the global skill of an EPS, and its reliability/resolution partition, proposed by Hersbach, is used to measure the two main attributes of a probabilistic system.
Abstract: A verification system has been developed for the ensemble prediction system (EPS) at the Canadian Meteorological Centre (CMC). This provides objective criteria for comparing two EPSs, necessary when deciding whether or not to implement a new or revised EPS. The proposed verification methodology is based on the continuous ranked probability score (CRPS), which provides an evaluation of the global skill of an EPS. Its reliability/resolution partition, proposed by Hersbach, is used to measure the two main attributes of a probabilistic system. Also, the characteristics of the reliability are obtained from the two first moments of the reduced centered random variable (RCRV), which define the bias and the dispersion of an EPS. Resampling bootstrap techniques have been applied to these scores. Confidence intervals are thus defined, expressing the uncertainty due to the finiteness of the number of realizations used to compute the scores. All verifications are performed against observations to provide mor...
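The bootstrap confidence intervals mentioned above can be illustrated generically: resample the per-case scores with replacement and take percentiles of the resampled means. The score values below are simulated placeholders, and the interval construction is a plain percentile bootstrap rather than CMC's specific procedure.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=10_000, alpha=0.10, seed=0):
    """Percentile bootstrap confidence interval for a mean verification score,
    expressing the uncertainty due to the finite number of realizations."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    boot_means = scores[idx].mean(axis=1)           # mean score of each resample
    return np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])

# Simulated placeholder for case-wise CRPS values over 300 forecast cases.
crps_cases = np.random.default_rng(1).gamma(2.0, 0.5, size=300)
print(crps_cases.mean(), bootstrap_ci(crps_cases))
```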

Proceedings ArticleDOI
29 Jul 2007
TL;DR: This work downsamples the source material into a time-lapse video and provides user controls for retaining, removing, and resampling events, and employs two techniques for selecting and combining source frames to form the output.
Abstract: We present methods for generating novel time-lapse videos that address the inherent sampling issues that arise with traditional photographic techniques. Starting with video-rate footage as input, our post-process downsamples the source material into a time-lapse video and provides user controls for retaining, removing, and resampling events. We employ two techniques for selecting and combining source frames to form the output. First, we present a non-uniform sampling method, based on dynamic programming, which optimizes the sampling of the input video to match the user's desired duration and visual objectives. We present multiple error metrics for this optimization, each resulting in different sampling characteristics. To complement the non-uniform sampling, we present the virtual shutter, a non-linear filtering technique that synthetically extends the exposure time of time-lapse frames.

Journal ArticleDOI
TL;DR: This paper investigates two classes of particle filtering techniques, distributed resampling with non-proportional allocation (DRNA) and local selection (LS), and analyzes the effect of DRNA and LS on the sample variance of the importance weights; the distortion, due to the resampling step, of the discrete probability measure given by the particle filter; and the variance of estimators after resampling.

Journal ArticleDOI
TL;DR: These test statistics were used to reexamine spatial distributions of sudden infant death syndrome in North Carolina and the pH values of streams in the Great Smoky Mountains, where low-value clustering and high-value clustering were shown to exist simultaneously.

Journal ArticleDOI
TL;DR: In this paper, the authors analyzed hyperspectral airborne imagery (CASI 2 with 46 contiguous VIS/NIR bands) that was acquired over a Lake Huron coastal wetland and performed a series of image classification experiments incorporating three independent band selection methodologies (derivative magnitude, fixed interval and derivative histogram), in order to explore the effects of spectral resampling on classification resiliency.

Journal ArticleDOI
TL;DR: A strategy is described for applying the bootstrap method to conduct either a bootstrap component or factor analysis, with program syntax for SPSS illustrated on the Holzinger–Swineford data set.
Abstract: The bootstrap method, which empirically estimates the sampling distribution for either inferential or descriptive statistical purposes, can be applied to the multivariate case. When conducting bootstrap component, or factor, analysis, resampling results must be located in a common factor space before summary statistics for each estimated parameter can be computed. The present article describes a strategy for applying the bootstrap method to conduct either a bootstrap component or a factor analysis with a program syntax for SPSS. The Holzinger–Swineford data set is employed to make the discussion more concrete.
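A Python analogue of the bootstrap component analysis idea (the article itself provides SPSS syntax) might look like the sketch below: resample rows, recompute loadings, and align each bootstrap solution to a reference before summarizing. The sign-based alignment is a crude stand-in for the full rotation to a common factor space, and the data are hypothetical.

```python
import numpy as np

def bootstrap_pca_loadings(X, n_components=2, n_boot=500, seed=0):
    """Bootstrap a principal component analysis: resample rows, recompute
    loadings from the correlation matrix, and align each bootstrap solution to
    the reference loadings before summarizing across resamples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    def loadings(data):
        # Loadings = eigenvectors of the correlation matrix times sqrt(eigenvalues).
        vals, vecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
        order = np.argsort(vals)[::-1][:n_components]
        return vecs[:, order] * np.sqrt(vals[order])

    ref = loadings(X)
    boot = np.empty((n_boot,) + ref.shape)
    for b in range(n_boot):
        sample = X[rng.integers(0, len(X), size=len(X))]
        L = loadings(sample)
        # Crude alignment to the reference space: flip component signs to agree.
        boot[b] = L * np.sign(np.sum(L * ref, axis=0))
    return ref, boot.mean(axis=0), boot.std(axis=0)    # estimate, bootstrap mean, SE

X = np.random.default_rng(2).normal(size=(150, 6))     # hypothetical test scores
ref_loadings, boot_mean, boot_se = bootstrap_pca_loadings(X)
print(np.round(boot_se, 3))                            # bootstrap SE of each loading
```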

Journal ArticleDOI
TL;DR: A series of Monte Carlo simulations is conducted to examine the statistical power of three methods that compare cluster-specific response rates between arms of the trial (the t-test, the Wilcoxon rank sum test, and the permutation test) and three methods that compare subject-level response rates, including a generalized estimating equations (GEE) method.
Abstract: Cluster randomization trials are randomized controlled trials (RCTs) in which intact clusters of subjects are randomized to either the intervention or to the control. Cluster randomization trials require different statistical methods of analysis than do conventional randomized controlled trials due to the potential presence of within-cluster homogeneity in responses. A variety of statistical methods have been proposed in the literature for the analysis of cluster randomization trials with binary outcomes. However, little is known about the relative statistical power of these methods to detect a statistically significant intervention effect. We conducted a series of Monte Carlo simulations to examine the statistical power of three methods that compare cluster-specific response rates between arms of the trial: the t-test, the Wilcoxon rank sum test, and the permutation test; and three methods that compare subject-level response rates: an adjusted chi-square test, a logistic-normal random effects model, and a generalized estimating equations (GEE) method. In our simulations we allowed the number of clusters, the number of subjects per cluster, the intraclass correlation coefficient and the magnitude of the intervention effect to vary. We demonstrated that the GEE approach tended to have the highest power for detecting a statistically significant intervention effect. However, in most of the 240 scenarios examined, the differences between the competing statistical methods were negligible. The largest mean difference in power between any two different statistical methods across the 240 scenarios was 0.02. The largest observed difference in power between two different statistical methods across the 240 scenarios and 15 pair-wise comparisons of methods was 0.14.
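A stripped-down version of such a power simulation, covering only two of the six methods (the cluster-level t test and the Wilcoxon rank sum test) with beta-binomial clusters, is sketched below; all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def simulate_power(n_clusters=20, cluster_size=50, icc=0.05, p_control=0.30,
                   p_treat=0.40, n_sims=500, alpha=0.05, seed=0):
    """Monte Carlo power of the cluster-level t test and Wilcoxon rank sum test
    for a cluster-randomized trial with a binary outcome, using beta-binomial
    clusters so that the within-cluster correlation equals icc."""
    rng = np.random.default_rng(seed)

    def cluster_rates(p):
        a, b = p * (1 - icc) / icc, (1 - p) * (1 - icc) / icc
        probs = rng.beta(a, b, size=n_clusters)            # cluster-level risks
        return rng.binomial(cluster_size, probs) / cluster_size

    hits = {"t test": 0, "Wilcoxon": 0}
    for _ in range(n_sims):
        r_control, r_treat = cluster_rates(p_control), cluster_rates(p_treat)
        if stats.ttest_ind(r_control, r_treat).pvalue < alpha:
            hits["t test"] += 1
        if stats.ranksums(r_control, r_treat).pvalue < alpha:
            hits["Wilcoxon"] += 1
    return {k: v / n_sims for k, v in hits.items()}

print(simulate_power())
```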

Journal ArticleDOI
TL;DR: This work describes and develops null and alternative resampling schemes for common scenarios, constructing bootstrap tests for the correlation coefficient, variance, and regression/ANOVA models, and critically assesses the performance of bootstrap tests, examining the size and power properties of the tests numerically using both real and simulated data.

Journal ArticleDOI
TL;DR: In this article, a nonparametric bootstrap hypothesis testing approach is proposed for the problem of testing a null hypothesis of a common mean direction, mean polar axis, or mean shape across several populations of real unit vectors (the directional case) or complex unit vectors (the two-dimensional shape case).
Abstract: We propose a novel bootstrap hypothesis testing approach for the problem of testing a null hypothesis of a common mean direction, mean polar axis, or mean shape across several populations of real unit vectors (the directional case) or complex unit vectors (the two-dimensional shape case). Multisample testing problems of this type arise frequently in directional statistics and shape analysis (as in other areas of statistics), but to date there has been relatively little discussion of nonparametric bootstrap approaches to this problem. The bootstrap approach described here is based on a statistic that can be expressed as the smallest eigenvalue of a certain positive definite matrix. We prove that this statistic has a limiting chi-squared distribution under the null hypothesis of equality of means across populations. Although we focus mainly on the version of the statistic in which neither isotropy within populations nor constant dispersion structure across populations is assumed, we explain how to modify th...

Journal ArticleDOI
TL;DR: This paper presents a new implementation of the first stage, i.e., the range resampling, using the chirp z-transform (CZT), which, combined with existing CZT-based azimuth processing, yields a PFA totally free of interpolation.
Abstract: Besides an inverse two-dimensional (2-D) Fourier transform, the polar format algorithm (PFA) for the spotlight synthetic aperture radar (SAR) image formation can be normally divided into two cascaded processing stages, which are called the range and azimuth resampling, respectively. This paper focuses on a new implementation of the first stage, i.e., the range resampling, using the chirp z-transform (CZT). The presented algorithm requires no interpolation. It works for the SAR system directly digitizing the echo signal, as well as that employing the dechirp-on-receive approach. The parameters of the CZT, including the frequency spacing and the start frequency, are derived to accommodate the PFA in both cases. Related filtering and compensation procedures are developed for the chirp z-transformed range signal with and without dechirp, respectively, in order to achieve a signal format entirely suitable for the azimuth resampling. Furthermore, incorporating the new algorithm with the existing CZT-based azimuth resampling and focusing algorithm, we can achieve a PFA totally free of interpolation. The presented approach has been validated by point target simulation, and the test is carried out with a very critical relative bandwidth of 30%.
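The reason the CZT removes the need for interpolation is that it evaluates a signal's spectrum on an arbitrary, freely spaced set of frequency points. The minimal example below shows only that generic mechanism (zooming a spectrum onto a custom grid with scipy.signal.czt); it is not the paper's SAR range-resampling algorithm, and the signal and band edges are made up.

```python
import numpy as np
from scipy.signal import czt   # SciPy >= 1.8

# Evaluate the spectrum of a tone on a custom 100-150 Hz grid, no interpolation.
fs = 1000.0
t = np.arange(256) / fs
x = np.cos(2 * np.pi * 123.4 * t)

m = 128                                      # number of output frequency samples
f_start, f_stop = 100.0, 150.0
step = (f_stop - f_start) / m
w = np.exp(-2j * np.pi * step / fs)          # spacing of the evaluation points
a = np.exp(2j * np.pi * f_start / fs)        # starting point on the unit circle
X = czt(x, m=m, w=w, a=a)                    # z-transform at z_k = a * w**(-k)

freqs = f_start + np.arange(m) * step
print(freqs[np.argmax(np.abs(X))])           # close to the 123.4 Hz tone
```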

Journal ArticleDOI
TL;DR: This paper empirically examines the Type I error rate of the conditional decision rule using four variance equality tests and compares this error rate to the unconditional use of either of the t tests as well as several resampling-based alternatives when sampling from 49 distributions varying in skewness and kurtosis.
Abstract: Many books on statistical methods advocate a 'conditional decision rule' when comparing two independent group means. This rule states that the decision as to whether to use a 'pooled variance' test that assumes equality of variance or a 'separate variance' Welch t test that does not should be based on the outcome of a variance equality test. In this paper, we empirically examine the Type I error rate of the conditional decision rule using four variance equality tests and compare this error rate to the unconditional use of either of the t tests (i.e. irrespective of the outcome of a variance homogeneity test) as well as several resampling-based alternatives when sampling from 49 distributions varying in skewness and kurtosis. Several unconditional tests including the separate variance test performed as well as or better than the conditional decision rule across situations. These results extend and generalize the findings of previous researchers who have argued that the conditional decision rule should be abandoned.
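The conditional decision rule under study is easy to state in code: test variance equality first, then pick the pooled or Welch t test. The sketch below uses Levene's test as the variance-equality test (one of several the paper examines) on simulated data; the paper's conclusion is that unconditional use of the separate-variance test performs as well or better.

```python
import numpy as np
from scipy import stats

def conditional_t_test(x, y, alpha_var=0.05):
    """The conditional decision rule: run a variance-equality test first
    (Levene's test here), then use the pooled t test if it does not reject and
    the separate-variance Welch t test otherwise."""
    _, p_var = stats.levene(x, y)
    equal_var = p_var > alpha_var
    return stats.ttest_ind(x, y, equal_var=equal_var), equal_var

# Simulated groups with unequal variances.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.3, 2.0, size=30)
result, used_pooled = conditional_t_test(x, y)
print(result.pvalue, "pooled" if used_pooled else "Welch")
```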

Journal ArticleDOI
TL;DR: A bootstrap method is proposed, based on randomly resampling the original data, which gives an estimate of the probability that the observed response is due to random variation in the data rather than a physiological response.

Journal ArticleDOI
TL;DR: A robust test procedure based on a resampling method called wild bootstrapping is developed; it is computationally simple and applicable to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI).
Abstract: Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects.
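The core wild-bootstrap resampling step, applied voxel-wise in the paper, can be sketched for a single ordinary regression: refit under the null, multiply the null residuals by Rademacher signs, and recompute the statistic. The sketch below uses the raw coefficient rather than a studentized statistic and generic simulated data, so it illustrates the resampling scheme rather than the paper's full heteroscedastic-model procedure.

```python
import numpy as np

def wild_bootstrap_pvalue(y, X, n_boot=2000, seed=0):
    """Wild-bootstrap test that the last regression coefficient is zero:
    refit under the null, multiply null residuals by Rademacher signs so any
    heteroscedasticity is preserved, and recompute the coefficient."""
    rng = np.random.default_rng(seed)
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    stat_obs = abs(beta_full[-1])
    X0 = X[:, :-1]                                        # null model: drop last column
    beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    fitted0, resid0 = X0 @ beta0, y - X0 @ beta0
    exceed = 0
    for _ in range(n_boot):
        v = rng.choice([-1.0, 1.0], size=len(y))          # Rademacher weights
        y_star = fitted0 + v * resid0
        beta_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
        if abs(beta_star[-1]) >= stat_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

# Heteroscedastic simulated data; the last covariate truly has zero effect.
rng = np.random.default_rng(4)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))
print(wild_bootstrap_pvalue(y, X))
```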

Journal ArticleDOI
TL;DR: This work derives the large sample properties for the WCR estimators under the Cox proportional hazards model, establishes consistency and asymptotic normality of the regression coefficient estimators, and the weak convergence property of the estimated baseline cumulative hazard function.
Abstract: We consider modeling correlated survival data when cluster sizes may be informative to the outcome of interest based on a within-cluster resampling (WCR) approach and a weighted score function (WSF) method. We derive the large sample properties for the WCR estimators under the Cox proportional hazards model. We establish consistency and asymptotic normality of the regression coefficient estimators, and the weak convergence property of the estimated baseline cumulative hazard function. The WSF method is to incorporate the inverse of cluster sizes as weights in the score function. We conduct simulation studies to assess and compare the finite-sample behaviors of the estimators and apply the proposed methods to a dental study as an illustration.
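Within-cluster resampling itself is simple to demonstrate outside the survival setting: repeatedly draw one observation per cluster, compute the estimate on each resampled dataset, and average. The sketch below does this for a sample mean with simulated informative cluster sizes; the Cox-model version analyzed in the paper is not implemented here.

```python
import numpy as np

def within_cluster_resampling(values, cluster_ids, estimator=np.mean,
                              n_resamples=1000, seed=0):
    """Within-cluster resampling: repeatedly draw one observation per cluster,
    apply the estimator to each resampled dataset, and average the results."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    clusters = [np.flatnonzero(cluster_ids == c) for c in np.unique(cluster_ids)]
    estimates = np.empty(n_resamples)
    for r in range(n_resamples):
        picks = [rng.choice(idx) for idx in clusters]     # one subject per cluster
        estimates[r] = estimator(values[picks])
    return estimates.mean(), estimates.std()

# Simulated data with informative cluster size: bigger clusters, bigger outcomes.
rng = np.random.default_rng(5)
sizes = rng.integers(1, 8, size=30)
cluster_ids = np.repeat(np.arange(30), sizes)
values = rng.normal(loc=0.2 * sizes[cluster_ids], scale=1.0)
print(within_cluster_resampling(values, cluster_ids))     # WCR estimate and its SD
```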

Journal ArticleDOI
TL;DR: A new procedure is proposed that yields exact Monte Carlo tests for any positive value of B, the number of simulations, and is likely to be most useful when simulation is expensive.
Abstract: Conventional procedures for Monte Carlo and bootstrap tests require that B, the number of simulations, satisfy a specific relationship with the level of the test. Otherwise, a test that would instead be exact will either overreject or underreject for finite B. We present expressions for the rejection frequencies associated with existing procedures and propose a new procedure that yields exact Monte Carlo tests for any positive value of B. This procedure, which can also be used for bootstrap tests, is likely to be most useful when simulation is expensive.
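The sizing issue the paper addresses can be checked with a couple of lines: with the usual rank-based Monte Carlo p-value, the exact size of a nominal level-alpha test is floor(alpha*(B+1))/(B+1), so the test is exact only when alpha*(B+1) is an integer and otherwise deviates from the nominal level, as the paper notes. The snippet below just evaluates that expression for a few values of B; it does not implement the paper's new procedure.

```python
import numpy as np

# Rank-based Monte Carlo p-value: (1 + #{simulated >= observed}) / (B + 1).
# Under the null it is uniform on {1/(B+1), ..., 1}, so the exact size of a
# level-alpha test is floor(alpha * (B + 1)) / (B + 1).
alpha = 0.05
for B in (19, 99, 100, 150, 999):
    size = np.floor(alpha * (B + 1)) / (B + 1)
    print(f"B = {B:4d}: actual size = {size:.4f}")   # B = 19, 99, 999 hit 0.05 exactly
```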

Journal ArticleDOI
TL;DR: Fast algorithms are presented for LOO-CV when using a high-breakdown method based on resampling, in the context of robust covariance estimation by means of the MCD estimator and robust principal component analysis.

Journal ArticleDOI
TL;DR: In this paper, a smoothing free estimation procedure with a set of martingale-based equations is proposed, and the estimator is shown to converge weakly to a Gaussian process.
Abstract: We propose a natural generalization of the Cox regression model, in which the regression coefficients have direct interpretations as temporal covariate effects on the survival function. Under the conditionally independent censoring mechanism, we develop a smoothing free estimation procedure with a set of martingale-based equations. Our estimator is shown to be uniformly consistent and to converge weakly to a Gaussian process. A simple resampling method is proposed for approximating the limiting distribution of the estimated coefficients. Second-stage inferences with time-varying coefficients are developed accordingly. Simulations and a real example illustrate the practical utility of the proposed method. Finally, we extend this proposal of temporal covariate effects to the general class of linear transformation models and also establish a connection with the additive hazards model.

Journal ArticleDOI
TL;DR: The results show that underlying biological phenomena and unknown relationships in the data can be detected by a simple visual interpretation of the model parameters, and it is found that measured phenotypic responses may model the expression data more accurately than if the design-parameters are used as input.
Abstract: The most popular methods for significance analysis on microarray data are well suited to find genes differentially expressed across predefined categories. However, identification of features that correlate with continuous dependent variables is more difficult using these methods, and long lists of significant genes returned are not easily probed for co-regulations and dependencies. Dimension reduction methods are much used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretation strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysis of expression data that combines significance testing with the interpretative advantages of the dimension reduction methods. This approach is applicable both for explorative analysis and for classification and regression problems. Three public data sets are analysed. One is used for classification, one contains spiked-in transcripts of known concentrations, and one represents a regression problem with several measured responses. Model-based significance analysis is performed using a modified version of Hotelling's T2-test, and a false discovery rate significance level is estimated by resampling. Our results show that underlying biological phenomena and unknown relationships in the data can be detected by a simple visual interpretation of the model parameters. It is also found that measured phenotypic responses may model the expression data more accurately than if the design-parameters are used as input. For the classification data, our method finds much the same genes as the standard methods, in addition to some extra which are shown to be biologically relevant. The list of spiked-in genes is also reproduced with high accuracy. The dimension reduction methods are versatile tools that may also be used for significance testing. Visual inspection of model components is useful for interpretation, and the methodology is the same whether the goal is classification, prediction of responses, feature selection or exploration of a data set. The presented framework is conceptually and algorithmically simple, and a Matlab toolbox (Mathworks Inc, USA) is supplemented.

Journal ArticleDOI
TL;DR: In this paper, the authors study an AMOC model with an abrupt change in the mean and dependent errors that form a linear process and obtain an approximation of the critical values for change-point tests through permutation methods.