Showing papers in "arXiv: Methodology in 2014"


Posted Content
TL;DR: This work proposes a new estimation method that incorporates the sample size, greatly improves on existing methods, and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data.
Abstract: In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported the study using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al. (2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method by incorporating the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under more general settings where the first and third quartiles are also available for the trials. Through simulation studies, we demonstrate that the proposed methods greatly improve the existing methods and enrich the literature. We conclude our work with a summary table that serves as a comprehensive guidance for performing meta-analysis in different situations.

1,812 citations
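
To make the role of the sample size concrete, the sketch below contrasts two range-based estimates for a trial that reports only the minimum a, the median m, the maximum b and the sample size n. The formulas are not stated in the abstract above. They are written from the commonly cited forms of the mean estimate (a + 2m + b)/4 and of a sample-size-aware standard deviation estimate (b - a) / (2 * Phi^{-1}((n - 0.375)/(n + 0.25))), so they are assumptions to be checked against the paper before use. All function names are hypothetical.

```python
from statistics import NormalDist


def estimate_mean(a, m, b):
    """Mean estimate from minimum a, median m and maximum b (assumed form)."""
    return (a + 2 * m + b) / 4


def estimate_sd_fixed_divisor(a, b):
    """Rule-of-thumb SD estimate: range / 4, which ignores the sample size."""
    return (b - a) / 4


def estimate_sd_with_n(a, b, n):
    """SD estimate that rescales the range by an n-dependent normal quantile.

    Assumed form: (b - a) / (2 * Phi^{-1}((n - 0.375) / (n + 0.25))).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    p = (n - 0.375) / (n + 0.25)
    return (b - a) / (2 * NormalDist().inv_cdf(p))


if __name__ == "__main__":
    for n in (15, 50, 400):
        print(n, estimate_sd_fixed_divisor(0, 10), estimate_sd_with_n(0, 10, n))
```

The expected range of a normal sample grows with n, so a fixed divisor overstates the standard deviation in large trials and understates it in small ones; the n-dependent divisor is one way to correct for that.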


Journal ArticleDOI
TL;DR: A parametric Bayesian model is developed by employing a likelihood function that is based on a mode uniform distribution and it is shown that irrespective of the original distribution of the data, the use of this special uniform distribution is a very natural and effective way for Bayesian mode regression.
Abstract: Like mean, quantile and variance, mode is also an important measure of central tendency and data summary. Many practical questions often focus on "Which element (gene or file or signal) occurs most often or is the most typical among all elements in a network?". In such cases mode regression provides a convenient summary of how the regressors affect the conditional mode and is totally different from other regression models based on conditional mean or conditional quantile or conditional variance. Some inference methods have been used for mode regression but none of them from the Bayesian perspective. This paper introduces Bayesian mode regression by exploring three different approaches. We start from a parametric Bayesian model by employing a likelihood function that is based on a mode uniform distribution. It is shown that irrespective of the original distribution of the data, the use of this special uniform distribution is a very natural and effective way for Bayesian mode regression. Posterior estimates based on this parametric likelihood, even under misspecification, are consistent and asymptotically normal. We then develop a nonparametric Bayesian model by using Dirichlet process (DP) mixtures of mode uniform distributions and finally we explore Bayesian empirical likelihood mode regression by taking empirical likelihood into a Bayesian framework. The paper also demonstrates that a variety of improper priors for the unknown model parameters yield a proper joint posterior. The proposed approach is illustrated using simulated datasets and a real data set.

690 citations


Posted Content
TL;DR: A new concept for constructing prior distributions is introduced; the resulting priors are invariant to reparameterisations, have a natural connection to Jeffreys' priors, seem to have excellent robustness properties, and can be used to define default prior distributions.
Abstract: In this paper, we introduce a new concept for constructing prior distributions. We exploit the natural nested structure inherent to many model components, which defines the model component to be a flexible extension of a base model. Proper priors are defined to penalise the complexity induced by deviating from the simpler base model and are formulated after the input of a user-defined scaling parameter for that model component, both in the univariate and the multivariate case. These priors are invariant to reparameterisations, have a natural connection to Jeffreys' priors, are designed to support Occam's razor and seem to have excellent robustness properties, all which are highly desirable and allow us to use this approach to define default prior distributions. Through examples and theoretical results, we demonstrate the appropriateness of this approach and how it can be applied in various situations.

579 citations


Journal ArticleDOI
TL;DR: The knockoff filter is introduced, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables, and empirical results show that the resulting method has far more power than existing selection rules when the proportion of null variables is high.
Abstract: In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, or the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap - their construction does not require any new data - and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high.

503 citations
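
The selection step of a knockoff-style procedure can be sketched as follows. The abstract above does not give the rule, so this is an assumption written from the usual statement of the method: each variable j gets a statistic W_j whose sign is symmetric under the null, and the data-dependent threshold is the smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q. Constructing the knockoff variables and the statistics themselves is not shown, and the function names are hypothetical.

```python
def knockoff_threshold(w, q, offset=1):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q.

    offset=1 corresponds to the more conservative variant of the rule;
    offset=0 drops the +1 in the numerator.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold qualifies, so nothing is selected


def knockoff_select(w, q, offset=1):
    """Indices of variables whose statistic reaches the threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    w = [5, 4, 3, -1, 2, 6, -0.5, 7, 8, 9]
    print(knockoff_threshold(w, q=0.2), knockoff_select(w, q=0.2))
```

Large negative statistics act as a stand-in count of false positives, which is why the threshold rises until few of them remain relative to the positives.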


Posted Content
TL;DR: Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid.
Abstract: Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field.

311 citations


Posted Content
TL;DR: This article develops an R software package, softImpute, that implements the two approaches for large matrix factorization and completion, together with a distributed version for very large matrices using the Spark cluster programming environment.
Abstract: The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation (Candes and Tao, 2009, Mazumder, Hastie and Tibshirani, 2010), and maximum-margin matrix factorization (Srebro, Rennie and Jaakkola, 2005). These two procedures are in some cases solving equivalent problems, but with quite different algorithms. In this article we bring the two approaches together, leading to an efficient algorithm for large matrix factorization and completion that outperforms both of these. We develop a software package "softImpute" in R for implementing our approaches, and a distributed version for very large matrices using the "Spark" cluster programming environment.

249 citations


Journal ArticleDOI
TL;DR: This paper showed that the usual estimator of I2 is biased when a meta-analysis has few studies and little heterogeneity, and that confidence intervals may be preferable to point estimates for I2.
Abstract: In meta-analysis, the fraction of variance that is due to heterogeneity is known as I2. We show that the usual estimator of I2 is biased. The bias is largest when a meta-analysis has few studies and little heterogeneity. For example, with 7 studies and the true value of I2 at 0, the average estimate of I2 is .124. Estimates of I2 should be interpreted cautiously when the meta-analysis is small and the null hypothesis of homogeneity (I2=0) has not been rejected. In small meta-analyses, confidence intervals may be preferable to point estimates for I2.

234 citations
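
The bias quoted in the abstract above can be reproduced with a short simulation. This is a sketch under two assumptions that are not spelled out in the abstract: the usual estimator is taken to be I2 = max(0, (Q - df) / Q) with df = k - 1 for k studies, and under homogeneity Q is taken to follow a chi-square distribution with df degrees of freedom. The function name is hypothetical.

```python
import random


def simulate_mean_i2(k, reps=200_000, seed=0):
    """Average of the truncated I2 estimator when the true I2 is 0."""
    rng = random.Random(seed)
    df = k - 1
    total = 0.0
    for _ in range(reps):
        # A gamma variate with shape df/2 and scale 2 is chi-square with df d.f.
        q = rng.gammavariate(df / 2, 2.0)
        total += max(0.0, (q - df) / q)
    return total / reps


if __name__ == "__main__":
    # With 7 studies this comes out near 0.124, the value quoted in the abstract.
    print(round(simulate_mean_i2(7), 3))
```

The bias arises because negative values of (Q - df) / Q are truncated to zero while positive values are kept, so the average cannot equal zero even when there is no heterogeneity.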


Posted Content
TL;DR: In this article, a second-order Langevin dynamics with a friction term is introduced to counter the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
Abstract: Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system; such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.

188 citations
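
The friction idea in the abstract above can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the authors' implementation: the target is a one-dimensional standard normal, so U(theta) = theta^2 / 2, the "minibatch" gradient is mimicked by adding Gaussian noise to the exact gradient, the mass is fixed at 1, and the gradient-noise estimate is set to zero. The update form (position step, then momentum step with a friction term and injected noise of variance 2 * friction * eps) and all parameter names are written from the usual description of this sampler.

```python
import math
import random


def sghmc_standard_normal(n_steps=100_000, eps=0.05, friction=1.0,
                          grad_noise_sd=1.0, seed=0):
    """Second-order Langevin updates with friction and a noisy gradient."""
    rng = random.Random(seed)
    theta, r = 0.0, 0.0
    injected_sd = math.sqrt(2.0 * friction * eps)
    samples = []
    for _ in range(n_steps):
        theta += eps * r  # position update (unit mass)
        noisy_grad = theta + rng.gauss(0.0, grad_noise_sd)  # grad U plus noise
        # Momentum update: gradient force, friction, and injected noise.
        r += -eps * noisy_grad - eps * friction * r + rng.gauss(0.0, injected_sd)
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = sghmc_standard_normal()
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(mean, var)  # should land close to 0 and 1 for this toy target
```

Without the friction term the noisy gradient keeps adding energy to the momentum, which is the failure mode the abstract describes; the friction term removes that energy at a matching rate.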


Journal ArticleDOI
TL;DR: In this paper, a general class of weighting strategies for balancing covariates is proposed, which unifies existing weighting methods, including commonly used weights such as inverse probability weights as special cases.
Abstract: Covariate balance is crucial for unconfounded descriptive or causal comparisons. However, lack of balance is common in observational studies. This article considers weighting strategies for balancing covariates. We define a general class of weights---the balancing weights---that balance the weighted distributions of the covariates between treatment groups. These weights incorporate the propensity score to weight each group to an analyst-selected target population. This class unifies existing weighting methods, including commonly used weights such as inverse-probability weights as special cases. General large-sample results on nonparametric estimation based on these weights are derived. We further propose a new weighting scheme, the overlap weights, in which each unit's weight is proportional to the probability of that unit being assigned to the opposite group. The overlap weights are bounded, and minimize the asymptotic variance of the weighted average treatment effect among the class of balancing weights. The overlap weights also possess a desirable small-sample exact balance property, based on which we propose a new method that achieves exact balance for means of any selected set of covariates. Two applications illustrate these methods and compare them with other approaches.

174 citations
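
The weighting schemes named in the abstract above can be written down directly once propensity scores are available. In this sketch the propensity scores are taken as given (estimating them is not shown), the weighted difference in means is a generic normalized estimator assumed for illustration, and all function names are hypothetical.

```python
def ipw_weights(propensity, treated):
    """Inverse-probability weights: 1/e for treated units, 1/(1 - e) for controls."""
    return [1 / e if z else 1 / (1 - e) for e, z in zip(propensity, treated)]


def overlap_weights(propensity, treated):
    """Overlap weights: the probability of assignment to the opposite group."""
    return [1 - e if z else e for e, z in zip(propensity, treated)]


def weighted_mean_difference(outcome, treated, weights):
    """Difference of normalized weighted outcome means, treated minus control."""
    num1 = sum(w * y for w, y, z in zip(weights, outcome, treated) if z)
    den1 = sum(w for w, z in zip(weights, treated) if z)
    num0 = sum(w * y for w, y, z in zip(weights, outcome, treated) if not z)
    den0 = sum(w for w, z in zip(weights, treated) if not z)
    return num1 / den1 - num0 / den0


if __name__ == "__main__":
    e = [0.9, 0.5, 0.01, 0.4]
    z = [1, 0, 0, 1]
    print(ipw_weights(e, z))      # the control unit with e = 0.01 gets weight near 1
    print(overlap_weights(e, z))  # every weight stays between 0 and 1
```

Because each overlap weight is itself a probability, it cannot blow up when a propensity score approaches 0 or 1, which is the boundedness property mentioned in the abstract.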


Journal ArticleDOI
TL;DR: A (selective) review of recent frequentist high-dimensional inference methods for constructing p-values and confidence intervals in linear and generalized linear models, which also introduces the R package hdi that easily allows the use of different methods and supports reproducibility.
Abstract: We present a (selective) review of recent frequentist high-dimensional inference methods for constructing $p$-values and confidence intervals in linear and generalized linear models. We include a broad, comparative empirical study which complements the viewpoint from statistical methodology and theory. Furthermore, we introduce and illustrate the R-package hdi which easily allows the use of different methods and supports reproducibility.

164 citations


Journal ArticleDOI
TL;DR: A flexible Bayesian nonparametric approach for modeling the population distribution of network-valued data through a mixture model that reduces dimensionality and efficiently incorporates network information within each mixture component by leveraging latent space representations is proposed.
Abstract: Replicated network data are increasingly available in many research fields. In connectomic applications, inter-connections among brain regions are collected for each patient under study, motivating statistical models which can flexibly characterize the probabilistic generative mechanism underlying these network-valued data. Available models for a single network are not designed specifically for inference on the entire probability mass function of a network-valued random variable and therefore lack flexibility in characterizing the distribution of relevant topological structures. We propose a flexible Bayesian nonparametric approach for modeling the population distribution of network-valued data. The joint distribution of the edges is defined via a mixture model which reduces dimensionality and efficiently incorporates network information within each mixture component by leveraging latent space representations. The formulation leads to an efficient Gibbs sampler and provides simple and coherent strategies for inference and goodness-of-fit assessments. We provide theoretical results on the flexibility of our model and illustrate improved performance --- compared to state-of-the-art models --- in simulations and application to human brain networks.

Posted Content
TL;DR: In this paper, the authors compare the power of MIC to that of standard Pearson correlation and distance correlation, and find that MIC has lower power than distance correlation in almost every case and is sometimes less powerful than Pearson correlation as well, the linear case being particularly worrisome.
Abstract: The proposal of Reshef et al. (2011) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in Statistics: tests which strive to have high power against all alternatives can have low power in many important situations. To investigate this, we ran simulations to compare the power of MIC to that of standard Pearson correlation and distance correlation (dcor). We simulated pairs of variables with different relationships (most of which were considered by Reshef et al.), but with varying levels of noise added. To determine proper cutoffs for testing the independence hypothesis, we simulated independent data with the appropriate marginals. As one can see from the Figure, MIC has lower power than dcor, in every case except the somewhat pathological high-frequency sine wave. MIC is sometimes less powerful than Pearson correlation as well, the linear case being particularly worrisome.
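
Two of the three statistics compared in the abstract above are simple enough to sketch in plain Python. The distance correlation below uses the standard sample definition based on double-centred pairwise distance matrices; MIC is not implemented here, and the power simulation itself (noise levels, cutoffs from independent data) is left out. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric: row means = column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (dcor) of two equal-length sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    x = [-2, -1, 0, 1, 2]
    y = [v * v for v in x]  # a noiseless parabola
    print(pearson(x, y), distance_correlation(x, y))
```

On the noiseless parabola the Pearson correlation is exactly zero while the distance correlation is clearly positive, which is the kind of non-linear dependence such comparisons are designed to probe.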

Journal ArticleDOI
TL;DR: A flexible semiparametric factor model is proposed, which decomposes the factor loading matrix into the component that can be explained by subject-specific covariates and the orthogonal residual component; the rates of convergence obtained for the smooth factor loading matrices are much faster than those of conventional factor analysis.
Abstract: This paper introduces a Projected Principal Component Analysis (Projected-PCA), which employs principal component analysis to the projected (smoothed) data matrix onto a given linear space spanned by covariates. When it applies to high-dimensional factor analysis, the projection removes noise components. We show that the unobserved latent factors can be more accurately estimated than the conventional PCA if the projection is genuine, or more precisely, when the factor loading matrices are related to the projected linear space. When the dimensionality is large, the factors can be estimated accurately even when the sample size is finite. We propose a flexible semiparametric factor model, which decomposes the factor loading matrix into the component that can be explained by subject-specific covariates and the orthogonal residual component. The covariates' effects on the factor loadings are further modeled by the additive model via sieve approximations. By using the newly proposed Projected-PCA, the rates of convergence of the smooth factor loading matrices are obtained, which are much faster than those of the conventional factor analysis. The convergence is achieved even when the sample size is finite and is particularly appealing in the high-dimension-low-sample-size situation. This leads us to developing nonparametric tests on whether observed covariates have explaining powers on the loadings and whether they fully explain the loadings. The proposed method is illustrated by both simulated data and the returns of the components of the S&P 500 index.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate empirical risk minimization based on a robust estimate proposed by Catoni and develop performance bounds based on chaining arguments tailored to Catoni's mean estimator.
Abstract: The purpose of this paper is to discuss empirical risk minimization when the losses are not necessarily bounded and may have a distribution with heavy tails. In such situations, usual empirical averages may fail to provide reliable estimates and empirical risk minimization may provide large excess risk. However, some robust mean estimators proposed in the literature may be used to replace empirical means. In this paper, we investigate empirical risk minimization based on a robust estimate proposed by Catoni. We develop performance bounds based on chaining arguments tailored to Catoni's mean estimator.
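
For readers unfamiliar with the robust estimate mentioned in the abstract above, the sketch below shows a Catoni-type mean estimator for a single sample; the empirical risk minimization and chaining analysis are not covered. It is written from the usual definition, in which the estimate theta solves sum_i psi(alpha * (x_i - theta)) = 0 with psi(x) = log(1 + x + x^2/2) for x >= 0 and psi(x) = -log(1 - x + x^2/2) otherwise. The tuning parameter alpha is left to the caller, and the bisection solver and function names are choices made here for illustration.

```python
import math


def catoni_psi(x):
    """Influence function that grows only logarithmically in |x|."""
    if x >= 0:
        return math.log(1 + x + x * x / 2)
    return -math.log(1 - x + x * x / 2)


def catoni_mean(data, alpha, tol=1e-10):
    """Root of theta -> sum(psi(alpha * (x - theta))), found by bisection."""
    def score(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in data)

    # score is non-increasing in theta, >= 0 at min(data) and <= 0 at max(data).
    lo, hi = min(data), max(data)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]
    print(sum(data) / len(data), catoni_mean(data, alpha=0.5))
```

Because psi flattens out for large arguments, the single value 1000 pulls the estimate far less than it pulls the ordinary average.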

Journal ArticleDOI
TL;DR: In the context of an analysis of longitudinal multivariate relational data, it is shown how the multilinear tensor regression model can represent patterns that often appear in relational and network data, such as reciprocity and transitivity.
Abstract: A fundamental aspect of relational data, such as from a social network, is the possibility of dependence among the relations. In particular, the relations between members of one pair of nodes may have an effect on the relations between members of another pair. This article develops a type of regression model to estimate such effects in the context of longitudinal and multivariate relational data, or other data that can be represented in the form of a tensor. The model is based on a general multilinear tensor regression model, a special case of which is a tensor autoregression model in which the tensor of relations at one time point are parsimoniously regressed on relations from previous time points. This is done via a separable, or Kronecker-structured, regression parameter along with a separable covariance model. In the context of an analysis of longitudinal multivariate relational data, it is shown how the multilinear tensor regression model can represent patterns that often appear in relational and network data, such as reciprocity and transitivity.

Posted Content
TL;DR: A nonlinear regression model is constructed for the ratios of consecutive frequency counts to predict the unobserved count and hence estimate the total diversity of a population, believed to be the first approach to depart from the classical mixed Poisson model in this problem.
Abstract: We wish to estimate the total number of classes in a population based on sample counts, especially in the presence of high latent diversity. Drawing on probability theory that characterizes distributions on the integers by ratios of consecutive probabilities, we construct a nonlinear regression model for the ratios of consecutive frequency counts. This allows us to predict the unobserved count and hence estimate the total diversity. We believe that this is the first approach to depart from the classical mixed Poisson model in this problem. Our method is geometrically intuitive and yields good fits to data with reasonable standard errors. It is especially well-suited to analyzing high diversity datasets derived from next-generation sequencing in microbial ecology. We demonstrate the method's performance in this context and via simulation, and we present a dataset for which our method outperforms all competitors.

Posted Content
TL;DR: In this article, a model-free approach for testing for the presence of unexplained treatment effect variation was proposed, and applied to the National Head Start Impact Study, a large-scale randomized evaluation of a Federal preschool program.
Abstract: Applied researchers are increasingly interested in whether and how treatment effects vary in randomized evaluations, especially variation not explained by observed covariates. We propose a model-free approach for testing for the presence of such unexplained variation. To use this randomization-based approach, we must address the fact that the average treatment effect, generally the object of interest in randomized experiments, actually acts as a nuisance parameter in this setting. We explore potential solutions and advocate for a method that guarantees valid tests in finite samples despite this nuisance. We also show how this method readily extends to testing for heterogeneity beyond a given model, which can be useful for assessing the sufficiency of a given scientific theory. We finally apply our method to the National Head Start Impact Study, a large-scale randomized evaluation of a Federal preschool program, finding that there is indeed significant unexplained treatment effect variation.

Journal ArticleDOI
TL;DR: A simple nonparametric method for modal regression, based on a kernel density estimate of the joint distribution of Y and X, is studied, and asymptotic error bounds for this method are derived, and techniques for constructing confidence sets and prediction sets are proposed.
Abstract: Modal regression estimates the local modes of the distribution of $Y$ given $X=x$, instead of the mean, as in the usual regression sense, and can hence reveal important structure missed by usual regression methods. We study a simple nonparametric method for modal regression, based on a kernel density estimate (KDE) of the joint distribution of $Y$ and $X$. We derive asymptotic error bounds for this method, and propose techniques for constructing confidence sets and prediction sets. The latter is used to select the smoothing bandwidth of the underlying KDE. The idea behind modal regression is connected to many others, such as mixture regression and density ridge estimation, and we discuss these ties as well.

Posted Content
TL;DR: This paper reviews recent contributions to the Bayesian model selection and shrinkage prior literature and proposes a posterior variable selection summary, which distills a full posterior distribution over regression coefficients into a sequence of sparse linear predictors.
Abstract: Selecting a subset of variables for linear models remains an active area of research. This paper reviews many of the recent contributions to the Bayesian model selection and shrinkage prior literature. A posterior variable selection summary is proposed, which distills a full posterior distribution over regression coefficients into a sequence of sparse linear predictors.

Posted Content
TL;DR: The theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds, and suggests that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
Abstract: Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood, in theory. And in practice, avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators for generalized linear models, Cox proportional hazards, M-estimators, in practice, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
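
The shrinkage described in the abstract above has a closed form in the simplest case. For squared-error loss in a linear model, the implicit update theta_n = theta_{n-1} + lr * (y - x.theta_n) * x can be solved for theta_n, which gives the standard update scaled by 1 / (1 + lr * ||x||^2). The sketch below covers only this least-squares case, worked out here as an illustration; the function names and the fixed learning rate are assumptions, not the authors' algorithm for general models.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def explicit_sgd_step(theta, x, y, lr):
    """Standard SGD step for squared-error loss on one observation (x, y)."""
    resid = y - _dot(x, theta)
    return [t + lr * resid * xi for t, xi in zip(theta, x)]


def implicit_sgd_step(theta, x, y, lr):
    """Implicit SGD step: the same direction, shrunk by 1 / (1 + lr * ||x||^2)."""
    resid = y - _dot(x, theta)
    shrink = 1.0 / (1.0 + lr * _dot(x, x))
    return [t + lr * shrink * resid * xi for t, xi in zip(theta, x)]


if __name__ == "__main__":
    # One repeated observation from y = 2 * x with a deliberately large step size.
    x, y, lr = [3.0], 6.0, 1.0
    explicit, implicit = [0.0], [0.0]
    for _ in range(20):
        explicit = explicit_sgd_step(explicit, x, y, lr)
        implicit = implicit_sgd_step(implicit, x, y, lr)
    print(explicit, implicit)  # the explicit iterate explodes, the implicit one settles at 2
```

With this step size each explicit update multiplies the error by -8, whereas each implicit update multiplies it by 0.1, which illustrates the stability claim without any extra tuning.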

Journal ArticleDOI
TL;DR: In this article, an ensemble model output statistics (EMOS) method for calibration of wind speed forecasts based on the log-normal (LN) distribution is presented, together with a regime-switching extension of the model which combines the previously studied truncated normal (TN) distribution with the LN.
Abstract: Ensembles of forecasts are obtained from multiple runs of numerical weather forecasting models with different initial conditions and typically employed to account for forecast uncertainties. However, biases and dispersion errors often occur in forecast ensembles: they are usually under-dispersive and uncalibrated and require statistical post-processing. We present an Ensemble Model Output Statistics (EMOS) method for calibration of wind speed forecasts based on the log-normal (LN) distribution, and we also show a regime-switching extension of the model which combines the previously studied truncated normal (TN) distribution with the LN. Both presented models are applied to wind speed forecasts of the eight-member University of Washington mesoscale ensemble, of the fifty-member ECMWF ensemble and of the eleven-member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service, and their predictive performances are compared to those of the TN and general extreme value (GEV) distribution based EMOS methods and to the TN-GEV mixture model. The results indicate improved calibration of probabilistic and accuracy of point forecasts in comparison to the raw ensemble and to climatological forecasts. Further, the TN-LN mixture model outperforms the traditional TN method and its predictive performance is able to keep up with the models utilizing the GEV distribution without assigning mass to negative values.

Posted Content
TL;DR: In this paper, the authors investigate the problem of overfitting a non-stationary Gaussian random field model to a dataset of annual precipitation in the conterminous US, where exploratory data analysis shows strong evidence of a nonstationary covariance structure.
Abstract: A stationary spatial model is an idealization and we expect that the true dependence structures of physical phenomena are spatially varying, but how should we handle this non-stationarity in practice? We study the challenges involved in applying a flexible non-stationary model to a dataset of annual precipitation in the conterminous US, where exploratory data analysis shows strong evidence of a non-stationary covariance structure. The aim of this paper is to investigate the modelling pipeline once non-stationarity has been detected in spatial data. We show that there is a real danger of over-fitting the model and that careful modelling is necessary in order to properly account for varying second-order structure. In fact, the example shows that sometimes non-stationary Gaussian random fields are not necessary to model non-stationary spatial data.

Posted Content
TL;DR: In this article, the authors consider estimation and inference in high-dimensional panel data models with additive unobserved individual-specific heterogeneity, under the requirement that the overall contribution of the time-varying variables, after eliminating the individual-specific heterogeneity, can be captured by a relatively small number of the available variables whose identities are unknown.
Abstract: We consider estimation and inference in panel data models with additive unobserved individual specific heterogeneity in a high dimensional setting. The setting allows the number of time varying regressors to be larger than the sample size. To make informative estimation and inference feasible, we require that the overall contribution of the time varying variables after eliminating the individual specific heterogeneity can be captured by a relatively small number of the available variables whose identities are unknown. This restriction allows the problem of estimation to proceed as a variable selection problem. Importantly, we treat the individual specific heterogeneity as fixed effects which allows this heterogeneity to be related to the observed time varying variables in an unspecified way and allows that this heterogeneity may be non-zero for all individuals. Within this framework, we provide procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variables models with fixed effects and many instruments. An input to developing the properties of our proposed procedures is the use of a variant of the Lasso estimator that allows for a grouped data structure where data across groups are independent and dependence within groups is unrestricted. We provide formal conditions within this structure under which the proposed Lasso variant selects a sparse model with good approximation properties. We present simulation results in support of the theoretical developments and illustrate the use of the methods in an application aimed at estimating the effect of gun prevalence on crime rates.

Posted Content
TL;DR: In this article, the authors propose to estimate the probability weight of a given model within a mixture model, and show that generic improper priors are acceptable, while not putting convergence in jeopardy.
Abstract: We consider a novel paradigm for Bayesian testing of hypotheses and Bayesian model comparison. Our alternative to the traditional construction of posterior probabilities that a given hypothesis is true or that the data originates from a specific model is to consider the models under comparison as components of a mixture model. We therefore replace the original testing problem with an estimation one that focuses on the probability weight of a given model within a mixture model. We analyse the sensitivity of the resulting posterior distribution of the weights to various prior modelling on the weights. We stress that a major appeal in using this novel perspective is that generic improper priors are acceptable, while not putting convergence in jeopardy. Among other features, this allows for a resolution of the Lindley-Jeffreys paradox. When using a reference Beta B(a,a) prior on the mixture weights, we note that the sensitivity of the posterior estimations of the weights to the choice of a vanishes as the sample size increases and advocate the default choice a=0.5, derived from Rousseau and Mengersen (2011). Another feature of this easily implemented alternative to the classical Bayesian solution is that the speeds of convergence of the posterior mean of the weight and of the corresponding posterior probability are quite similar.
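The mixture idea in this abstract can be illustrated with a deliberately simplified sketch. It assumes two fully specified unit-variance normal components (the abstract's setting also estimates the component parameters, which is not done here) and approximates the posterior mean of the mixture weight on a crude grid under a Beta(a,a) prior with a=0.5. The function names, the grid size, and the simulated data are illustrative assumptions and do not come from the paper.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution (assumed component form)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_mean_weight(data, mean1, mean2, a=0.5, grid_size=2000):
    """Grid approximation to E[alpha | data] for the mixture
    alpha * N(mean1, 1) + (1 - alpha) * N(mean2, 1) with a Beta(a, a) prior.
    Both component means are treated as known (a simplifying assumption)."""
    f1 = [normal_pdf(x, mean1) for x in data]
    f2 = [normal_pdf(x, mean2) for x in data]
    # Midpoint grid on (0, 1) avoids the prior's singularities at 0 and 1.
    alphas = [(k + 0.5) / grid_size for k in range(grid_size)]
    log_post = []
    for alpha in alphas:
        lp = (a - 1.0) * (math.log(alpha) + math.log(1.0 - alpha))
        for d1, d2 in zip(f1, f2):
            lp += math.log(alpha * d1 + (1.0 - alpha) * d2)
        log_post.append(lp)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(al * w for al, w in zip(alphas, weights)) / sum(weights)


rng = random.Random(0)
data_from_model1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
data_from_model2 = [rng.gauss(2.0, 1.0) for _ in range(200)]
weight_given_model1_data = posterior_mean_weight(data_from_model1, 0.0, 2.0)
weight_given_model2_data = posterior_mean_weight(data_from_model2, 0.0, 2.0)
```

In this toy setting the estimated weight of the first component moves towards 1 when the data are simulated from it and towards 0 when they are simulated from the second component, which is the sense in which the testing problem is replaced by an estimation problem.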

Posted Content
TL;DR: In this paper, the authors present a general solution for multi-target tracking with superpositional measurements, which are functions of the sum of the contributions of the targets present in the surveillance area.
Abstract: In this paper we present a general solution for multi-target tracking with superpositional measurements. Measurements that are functions of the sum of the contributions of the targets present in the surveillance area are called superpositional measurements. We base our modelling on Labeled Random Finite Set (RFS) in order to jointly estimate the number of targets and their trajectories. This modelling leads to a labeled version of Mahler's multi-target Bayes filter. However, a straightforward implementation of this tracker using Sequential Monte Carlo (SMC) methods is not feasible due to the difficulties of sampling in high dimensional spaces. We propose an efficient multi-target sampling strategy based on Superpositional Approximate CPHD (SA-CPHD) filter and the recently introduced Labeled Multi-Bernoulli (LMB) and Vo-Vo densities. The applicability of the proposed approach is verified through simulation in a challenging radar application with closely spaced targets and low signal-to-noise ratio.

Posted Content
TL;DR: In this article, the authors use the underlying geometry of Hamiltonian Monte Carlo to construct a universal optimization criterion for tuning the step size of the symplectic integrator crucial to any implementation of the algorithm, as well as diagnostics to monitor for any signs of invalidity.
Abstract: Hamiltonian Monte Carlo can provide powerful inference in complex statistical problems, but ultimately its performance is sensitive to various tuning parameters. In this paper we use the underlying geometry of Hamiltonian Monte Carlo to construct a universal optimization criterion for tuning the step size of the symplectic integrator crucial to any implementation of the algorithm as well as diagnostics to monitor for any signs of invalidity. An immediate outcome of this result is that the suggested target average acceptance probability of 0.651 can be relaxed to $0.6 \lesssim a \lesssim 0.9$ with larger values more robust in practice.
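To make the link between step size and average acceptance probability concrete, here is a minimal sketch of Hamiltonian Monte Carlo with a leapfrog integrator. It assumes a one-dimensional standard normal target, a unit mass, and a fixed number of leapfrog steps; it does not implement the paper's optimization criterion or diagnostics. The names `leapfrog` and `average_acceptance` are hypothetical.

```python
import math
import random


def leapfrog(q, p, step_size, n_steps, grad_u):
    """Leapfrog (symplectic) integration of Hamiltonian dynamics, unit mass."""
    p = p - 0.5 * step_size * grad_u(q)          # initial half step in momentum
    for i in range(n_steps):
        q = q + step_size * p                    # full step in position
        if i < n_steps - 1:
            p = p - step_size * grad_u(q)        # full step in momentum
    p = p - 0.5 * step_size * grad_u(q)          # final half step in momentum
    return q, p


def average_acceptance(step_size, n_steps=10, n_iter=2000, seed=0):
    """Average Metropolis acceptance probability of HMC on a N(0, 1) target."""
    rng = random.Random(seed)

    def potential(q):
        return 0.5 * q * q                       # -log density up to a constant

    def grad_u(q):
        return q

    q = 0.0
    total = 0.0
    for _ in range(n_iter):
        p = rng.gauss(0.0, 1.0)                  # resample the momentum
        h_start = potential(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, step_size, n_steps, grad_u)
        h_end = potential(q_new) + 0.5 * p_new * p_new
        accept_prob = math.exp(min(0.0, h_start - h_end))
        total += accept_prob
        if rng.random() < accept_prob:
            q = q_new
    return total / n_iter


acceptance_small_step = average_acceptance(0.1)
acceptance_large_step = average_acceptance(1.5)
```

A tuning loop would adjust `step_size` until the reported average falls inside a chosen target range, such as the interval quoted in the abstract. In this toy example the smaller step size yields the higher average acceptance, at the cost of shorter trajectories for the same number of gradient evaluations.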

Journal ArticleDOI
TL;DR: In this article, the authors consider density ridges, which are a higher-dimensional extension of modes and show that the distribution of the estimated ridge converges to a Gaussian process.
Abstract: The large sample theory of estimators for density modes is well understood. In this paper we consider density ridges, which are a higher-dimensional extension of modes. Modes correspond to zero-dimensional, local high-density regions in point clouds. Density ridges correspond to $s$-dimensional, local high-density regions in point clouds. We establish three main results. First we show that under appropriate regularity conditions, the local variation of the estimated ridge can be approximated by an empirical process. Second, we show that the distribution of the estimated ridge converges to a Gaussian process. Third, we establish that the bootstrap leads to valid confidence sets for density ridges.

Posted Content
TL;DR: The cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) is applied to the fitting of a conditional logistic regression model with lasso and elastic net penalties and it is found that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection.
Abstract: We apply the cyclic coordinate descent algorithm of Friedman, Hastie and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso ($\ell_1$) and elastic net penalties. The sequential strong rules of Tibshirani et al. (2012) are also used in the algorithm and it is shown that these offer a considerable speed-up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularisation paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by.
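The cyclic coordinate descent update at the heart of this approach can be sketched for the simpler case of a squared-error loss with a lasso penalty. This is a swapped-in illustration: it does not use the conditional logistic likelihood, the elastic net penalty, the strong rules, or warm starts mentioned in the abstract. The function names, the objective scaling $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$, and the toy data are assumptions.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    X is a list of rows, y a list of responses; no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                               # residual y - X beta, beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                          # skip all-zero columns
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]   # keep the residual in sync
                beta[j] = new_beta
    return beta


# Toy orthogonal design: the second coefficient is small enough to be zeroed.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.1, 2.0, 0.1]
beta_penalised = lasso_coordinate_descent(X_toy, y_toy, lam=0.1)
beta_unpenalised = lasso_coordinate_descent(X_toy, y_toy, lam=0.0)
```

On this toy design the penalised fit shrinks the first coefficient and sets the second exactly to zero, while setting `lam=0.0` recovers the least-squares solution.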

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the concept of spatial distribution for data in infinite dimensional Banach spaces and prove some Glivenko-Cantelli and Donsker-type results for the empirical spatial distribution process in infinite dimensional spaces.
Abstract: The spatial distribution has been widely used to develop various nonparametric procedures for finite dimensional multivariate data. In this paper, we investigate the concept of spatial distribution for data in infinite dimensional Banach spaces. Many technical difficulties are encountered in such spaces that are primarily due to the noncompactness of the closed unit ball. In this work, we prove some Glivenko-Cantelli and Donsker-type results for the empirical spatial distribution process in infinite dimensional spaces. The spatial quantiles in such spaces can be obtained by inverting the spatial distribution function. A Bahadur-type asymptotic linear representation and the associated weak convergence results for the sample spatial quantiles in infinite dimensional spaces are derived. A study of the asymptotic efficiency of the sample spatial median relative to the sample mean is carried out for some standard probability distributions in function spaces. The spatial distribution can be used to define the spatial depth in infinite dimensional Banach spaces, and we study the asymptotic properties of the empirical spatial depth in such spaces. We also demonstrate the spatial quantiles and the spatial depth using some real and simulated functional data.

Journal ArticleDOI
TL;DR: Empirical Bayes methods use the data from parallel experiments, for instance observations $X_k\sim\mathcal{N}(\Theta_k,1)$ for $k=1,2,\ldots,N$, to estimate the conditional distributions $\Theta_k|X_k$.
Abstract: Empirical Bayes methods use the data from parallel experiments, for instance, observations $X_k\sim\mathcal{N}(\Theta_k,1)$ for $k=1,2,\ldots,N$, to estimate the conditional distributions $\Theta_k|X_k$. There are two main estimation strategies: modeling on the $\theta$ space, called "$g$-modeling" here, and modeling on the $x$ space, called "$f$-modeling." The two approaches are described and compared. A series of computational formulas are developed to assess their frequentist accuracy. Several examples, both contrived and genuine, show the strengths and limitations of the two strategies.
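As a small illustration of modeling on the $x$ space, the sketch below applies Tweedie's formula, $E[\Theta\mid x] = x + \frac{d}{dx}\log f(x)$, which holds for $X\mid\Theta\sim\mathcal{N}(\Theta,1)$. It assumes a very restrictive marginal model, a zero-mean Gaussian $f$ whose variance is estimated from the data, and a simulated Gaussian prior. These modelling choices, the function names, and the simulation settings are assumptions made for the example and are not the computational formulas developed in the paper.

```python
import random


def tweedie_posterior_mean(x, marginal_var):
    """Tweedie's formula for a zero-mean Gaussian marginal f = N(0, marginal_var):
    d/dx log f(x) = -x / marginal_var, so E[theta | x] = x - x / marginal_var."""
    return x - x / marginal_var


def simulate(n=5000, prior_var=4.0, seed=1):
    """Draw theta_k ~ N(0, prior_var) and x_k ~ N(theta_k, 1) (assumed setup)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, prior_var ** 0.5) for _ in range(n)]
    xs = [rng.gauss(t, 1.0) for t in thetas]
    return thetas, xs


def mean_squared_error(estimates, truths):
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


thetas, xs = simulate()
# Fit the marginal on the x space: for a zero-mean Gaussian the maximum
# likelihood estimate of the variance is the mean of the squared observations.
marginal_var = sum(x * x for x in xs) / len(xs)
eb_estimates = [tweedie_posterior_mean(x, marginal_var) for x in xs]
mse_raw = mean_squared_error(xs, thetas)          # using x_k itself as the estimate
mse_eb = mean_squared_error(eb_estimates, thetas)
```

In this simulation the shrunken estimates have lower mean squared error than the raw observations, since the fitted marginal variance exceeds 1 and each $x_k$ is pulled towards zero. A $g$-modeling analogue would instead fit a model for the prior on the $\theta$ space and obtain the posterior through Bayes' rule.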