
Showing papers in "Journal of the American Statistical Association in 2020"


Journal ArticleDOI
TL;DR: Introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques that is fully boundary adaptive and automatic and does not require prebinning or any other transformation of the data.
Abstract: This article introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not...

235 citations


Journal ArticleDOI
TL;DR: Combining individual p-values to aggregate multiple small effects has long-standing interest in statistics, dating back to the classic Fisher's combination test, and is revisited by the authors in the setting of modern large-scale data sets.
Abstract: Combining individual p-values to aggregate multiple small effects has a long-standing interest in statistics, dating back to the classic Fisher’s combination test. In modern large-scale da...

196 citations
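The classical Fisher combination test that this line of work builds on can be sketched in a few lines. A minimal sketch (the function name is ours); it uses the closed form of the chi-squared survival function for even degrees of freedom, so only the standard library is needed:

```python
import math

def fisher_combination(pvals):
    """Fisher's method: under H0, T = -2 * sum(log p_i) ~ chi^2 with 2k df."""
    k = len(pvals)
    t = -2.0 * sum(math.log(p) for p in pvals)
    # The chi^2 survival function with an even df = 2k has the closed form
    # P(T > t) = exp(-t/2) * sum_{i=0}^{k-1} (t/2)^i / i!
    half = t / 2.0
    combined_p = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
    return t, combined_p

# Five individually modest p-values aggregate into strong joint evidence.
t, p = fisher_combination([0.04, 0.06, 0.05, 0.08, 0.03])
```

The example illustrates the aggregation effect the abstract refers to: none of the five p-values is individually striking, but the combined p-value falls well below 0.01.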


Journal ArticleDOI
TL;DR: Judea Pearl is a giant in the field of causal inference whose many contributions, including the discovery of the d-separation criterion, have been immeasurably valuable as discussed by the authors.
Abstract: Judea Pearl is a giant in the field of causal inference, whose many contributions, including the discovery of the d-separation criterion, have been immeasurably valuable. He, along with science wri...

146 citations


Journal ArticleDOI
TL;DR: A sharp phase transition is established for robust estimation of regression parameters in both low and high dimensions: when δ ≥ 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0 < δ < 1, and the transition is smooth and optimal.
Abstract: Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded (1 + δ)-th moment for any δ > 0. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when δ ≥ 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0 < δ < 1 and the transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive.

122 citations
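The core idea, a robustification parameter that grows with the sample size, can be sketched with a simple gradient-descent fit of the Huber loss. This is a minimal illustration only: the MAD-based scale estimate and the tuning rule for tau below are simple stand-ins, not the paper's data-driven calibration.

```python
import numpy as np

def adaptive_huber_regression(X, y, n_iter=1000, lr=0.5):
    """Sketch of adaptive Huber regression: tau adapts to the sample size
    and dimension, trading a little robustness for less bias as n grows.
    Scale estimate and tuning rule are illustrative stand-ins."""
    n, d = X.shape
    # crude robust scale via the median absolute deviation (assumption)
    sigma = 1.4826 * np.median(np.abs(y - np.median(y)))
    tau = sigma * np.sqrt(n / (d + np.log(n)))  # tau grows with n
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        psi = np.clip(r, -tau, tau)       # Huber score: linear, then clipped
        beta += lr * (X.T @ psi) / n      # gradient step on the Huber loss
    return beta

# heavy-tailed (Student-t) noise, where plain least squares degrades
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=3, size=500)
beta_hat = adaptive_huber_regression(X, y)
```

Clipping the residuals bounds each observation's influence, which is what yields the sub-Gaussian-type deviation bounds the abstract describes.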


Journal ArticleDOI
TL;DR: A new reinforcement learning method is proposed for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an outpatient setting and accommodates an indefinite time horizon and minute-by-minute decision making that are common in mobile health applications.
Abstract: The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best possible healthcare for each patient. Mobile technologi...

97 citations


Journal ArticleDOI
TL;DR: In this article, a lack of methodological results to design efficient Markov chain Monte Carlo (MCMC) algorithms for statistical models with discrete-valued high-dimensional parameters is discussed.
Abstract: There is a lack of methodological results to design efficient Markov chain Monte Carlo (MCMC) algorithms for statistical models with discrete-valued high-dimensional parameters. Motivated by this ...

93 citations


Journal ArticleDOI
TL;DR: Several key discrepancies are examined, centering on the differences between prediction and estimation and between prediction and attribution (significance testing); most of the discussion is carried out through small numerical examples.
Abstract: The scientific needs and computational limitations of the twentieth century fashioned classical statistical methodology. Both the needs and limitations have changed in the twenty-first, and so has ...

93 citations


Journal ArticleDOI
TL;DR: A new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs is presented, which can scale to more populations than previously possible for complex demographic histories including admixture.
Abstract: The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, mig...

79 citations


Journal ArticleDOI
TL;DR: This article provides uniform rates of convergence for the inferred community membership vector of each node in a network generated from the mixed membership stochastic blockmodel (MMSB) by establishing sharp row-wise eigenvector deviation bounds for MMSB, the first work to establish per-node rates for overlapping community detection in networks.
Abstract: We consider the problem of estimating community memberships of nodes in a network, where every node is associated with a vector determining its degree of membership in each community. Existing prov...

73 citations


Journal ArticleDOI
TL;DR: This work proposes a framework that allows flexible models for the observed data and a clean separation of the identified and unidentified parts of the sensitivity model, and provides heuristics for calibrating these parameters against observable quantities.
Abstract: A fundamental challenge in observational causal inference is that assumptions about unconfoundedness are not testable from data. Assessing sensitivity to such assumptions is therefore important in ...

70 citations


Journal ArticleDOI
TL;DR: In this article, methods and theory are introduced for functional or curve time series with long-range dependence, and the temporal sum of the curve process is shown to be asymptotically normally distributed.
Abstract: We introduce methods and theory for functional or curve time series with long-range dependence. The temporal sum of the curve process is shown to be asymptotically normally distributed, the conditi...

Journal ArticleDOI
TL;DR: In this article, sensitivity analyses are proposed to quantify the potential bias from unmeasured confounding in random-effects meta-analyses of observational studies.
Abstract: Random-effects meta-analyses of observational studies can produce biased estimates if the synthesized studies are subject to unmeasured confounding. We propose sensitivity analyses quantifying the ...

Journal ArticleDOI
TL;DR: The synthetic control (SC) method, a powerful tool for estimating average treatment effects (ATE), is increasingly popular in fields such as statistics, economics, political science, and mathematics as mentioned in this paper.
Abstract: The synthetic control (SC) method, a powerful tool for estimating average treatment effects (ATE), is increasingly popular in fields such as statistics, economics, political science, and ma...

Journal ArticleDOI
TL;DR: In this paper, the authors define a coefficient of correlation which is as simple as classical coefficients like Pearson’s correlation or Spearman’s correlation, and yet consistently estimates a simple and interpretable measure of the degree of dependence between the variables.
Abstract: Is it possible to define a coefficient of correlation which is (a) as simple as the classical coefficients like Pearson’s correlation or Spearman’s correlation, and yet (b) consistently es...
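The coefficient this paper proposes has a strikingly simple rank-based form. A minimal no-ties sketch (the function name is ours): sort the pairs by x, rank the y values, and measure how much consecutive ranks jump.

```python
def xi_correlation(x, y):
    """Chatterjee's rank correlation (no-ties version):
    xi = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1), where r_i is the rank of
    the y paired with the i-th smallest x. Near 0 under independence,
    approaches 1 when y is a (noiseless) function of x."""
    n = len(x)
    y_by_x = [yi for _, yi in sorted(zip(x, y))]       # reorder y by x
    rank_of = {v: k + 1 for k, v in enumerate(sorted(y))}
    r = [rank_of[v] for v in y_by_x]
    jumps = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)
```

Unlike Pearson's correlation, which is near zero for a symmetric parabola, this coefficient stays clearly positive for y ≈ x², because y is a function of x, which is exactly the kind of dependence the abstract asks the coefficient to detect.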

Journal ArticleDOI
TL;DR: This article derives optimal Poisson subsampling probabilities in the context of quasi-likelihood estimation under the A- and L-optimality criteria, and establishes the consistency and asymptotic normality of the resultant estimators.
Abstract: Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to...
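The subsampling-without-replacement scheme the abstract describes can be sketched generically: each row is kept independently with its own probability, and kept rows are reweighted by the inverse probability so the estimating equations stay unbiased. The probabilities below are an illustrative nonuniform choice, not the paper's A-/L-optimal ones, and a weighted least-squares fit stands in for the general quasi-likelihood estimator.

```python
import numpy as np

def poisson_subsample_wls(X, y, probs, rng):
    """Poisson subsampling: row i is kept independently with probability
    probs[i]; kept rows get weight 1/probs[i], which keeps the weighted
    normal equations unbiased for the full-data least-squares fit."""
    keep = rng.uniform(size=len(probs)) < probs
    Xs, ys, w = X[keep], y[keep], 1.0 / probs[keep]
    Xw = Xs * w[:, None]                       # apply weights to the design
    return np.linalg.solve(Xw.T @ Xs, Xw.T @ ys)

rng = np.random.default_rng(1)
n = 20000
X = rng.normal(size=(n, 2))
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + rng.normal(size=n)
# illustrative nonuniform probabilities: high-leverage rows kept more often
probs = np.minimum(1.0, 0.02 + 0.03 * np.linalg.norm(X, axis=1))
beta_hat = poisson_subsample_wls(X, y, probs, rng)
```

Only a few percent of the 20,000 rows survive the subsample, yet the inverse-probability weighting keeps the coefficient estimates close to the full-data target.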

Journal ArticleDOI
TL;DR: Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology as discussed by the authors, and its rapid progression and the relative time cost of obtaining mo...
Abstract: Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. Its rapid progression and the relative time cost of obtaining mo...

Journal ArticleDOI
TL;DR: In this paper, an unbiased estimator of smoothing expectations in state-space models is proposed, where smoothing refers to the task of estimating a latent stochastic process given noisy measurements related to the process.
Abstract: In state–space models, smoothing refers to the task of estimating a latent stochastic process given noisy measurements related to the process. We propose an unbiased estimator of smoothing expectat...

Journal ArticleDOI
TL;DR: This paper establishes a general framework for statistical inference with nonprobability survey samples when relevant auxiliary information is available from a probability survey sample.
Abstract: We establish a general framework for statistical inferences with nonprobability survey samples when relevant auxiliary information is available from a probability survey sample. We develop a rigoro...

Journal ArticleDOI
TL;DR: In this article, the authors provide theory on power and reproducibility in contemporary big data applications with general high-dimensional nonlinear models, both of which are key to enabling refined scientific discoveries.
Abstract: Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this article, we provide theore...

Journal ArticleDOI
TL;DR: It is shown that the IPW estimator can have different (Gaussian or non-Gaussian) asymptotic distributions, depending on how “close to zero” the probability weights are and on how large the trimming threshold is, and an inference procedure is proposed that remains valid with the use of a data-driven trimming threshold.
Abstract: Inverse probability weighting (IPW) is widely used in empirical work in economics and other disciplines. As Gaussian approximations perform poorly in the presence of “small denominators,” trimming ...
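The trimmed IPW estimator under discussion can be sketched as follows. For illustration the trimming threshold is a fixed constant and the propensity scores are known; the article's contribution concerns the data-driven choice of this threshold and valid inference after trimming.

```python
import numpy as np

def ipw_ate_trimmed(y, t, pscore, b_n=0.05):
    """Trimmed IPW estimate of E[Y(1)] - E[Y(0)]: observations whose
    propensity score lies within b_n of 0 or 1 (the 'small denominators'
    that break Gaussian approximations) are discarded before weighting."""
    keep = (pscore > b_n) & (pscore < 1 - b_n)
    y, t, e = y[keep], t[keep], pscore[keep]
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

rng = np.random.default_rng(2)
n = 20000
e = 0.02 + 0.96 * rng.uniform(size=n)        # known propensity scores
t = (rng.uniform(size=n) < e).astype(float)  # treatment assignment
y = t * 1.0 + rng.normal(size=n)             # constant treatment effect of 1
ate_hat = ipw_ate_trimmed(y, t, e, b_n=0.05)
```

With a constant treatment effect, trimming does not change the estimand, so the trimmed estimate stays near the true effect of 1 while the variance-inflating extreme weights are removed.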

Journal ArticleDOI
TL;DR: In this paper, a general framework for incorporating domain and prior knowledge in the matrix factor model through linear constraints is proposed, which is useful in achieving parsimonious parameterization, facilitating interpretation of the latent matrix factor, and identifying specific factors of interest.
Abstract: High-dimensional matrix-variate time series data are becoming widely available in many scientific fields, such as economics, biology, and meteorology. To achieve significant dimension reduction while preserving the intrinsic matrix structure and temporal dynamics in such data, Wang, Liu, and Chen proposed a matrix factor model, that is, shown to be able to provide effective analysis. In this article, we establish a general framework for incorporating domain and prior knowledge in the matrix factor model through linear constraints. The proposed framework is shown to be useful in achieving parsimonious parameterization, facilitating interpretation of the latent matrix factor, and identifying specific factors of interest. Fully utilizing the prior-knowledge-induced constraints results in more efficient and accurate modeling, inference, dimension reduction as well as a clear and better interpretation of the results. Constrained, multi-term, and partially constrained factor models for matrix-variate ti...

Journal ArticleDOI
TL;DR: In this paper, the authors consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confoundsers, and propose appropriate bootstrap procedures to implement using software routines for existing estimators.
Abstract: The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiencies yet preserve the consistencies of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: In this article, the authors propose separable effects to study the causal effect of a treatment on an e ect in time-to-event settings, where the presence of competing events complicates the definition of causal effects.
Abstract: In time-to-event settings, the presence of competing events complicates the definition of causal effects. Here we propose the new separable effects to study the causal effect of a treatment on an e...

Journal ArticleDOI
TL;DR: In this paper, a new approach for sequential monitoring of a general class of parameters of a d-dimensional time series, which can be estimated by approximately linear functionals of t.
Abstract: In this article, we propose a new approach for sequential monitoring of a general class of parameters of a d-dimensional time series, which can be estimated by approximately linear functionals of t...

Journal ArticleDOI
TL;DR: In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used as discussed by the authors, and most of the literature on this topic is based on what w...
Abstract: In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used. Most of the literature on this topic is based on what w...

Journal ArticleDOI
TL;DR: A fast and efficient screening procedure based on the spectral norm of each coefficient matrix is proposed to deal with the case when the number of covariates is extremely large, and a theoretical guarantee is established for the overall solution of the two-step screening and estimation procedure.
Abstract: The aim of this article is to develop a low-rank linear regression model to correlate a high-dimensional response matrix with a high-dimensional vector of covariates when coefficient matrices have ...

Journal ArticleDOI
TL;DR: The authors propose angle-based direct learning (AD-learning) to efficiently estimate optimal ITRs with multiple treatments; it has an interesting geometric interpretation of the effect of different treatments for each individual patient, which can help doctors and patients make better decisions.
Abstract: Estimating an optimal individualized treatment rule (ITR) based on patients’ information is an important problem in precision medicine. An optimal ITR is a decision function that optimizes patients...

Journal ArticleDOI
TL;DR: In this paper, the authors argue that understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery.
Abstract: Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery i...

Journal ArticleDOI
TL;DR: In this paper, the authors study how to determine the number of common factors in high-dimensional factor models, a setting in which the existing literature is mainly based on the eigenvalues of the covariance matrix.
Abstract: Determining the number of common factors is an important and practical topic in high-dimensional factor models. The existing literature is mainly based on the eigenvalues of the covariance matrix. ...

Journal ArticleDOI
TL;DR: The Ball Covariance is proposed as a generic measure of dependence between two random objects in two possibly different Banach spaces; it is nonparametric and model-free, which makes the proposed measure robust to model misspecification.
Abstract: Technological advances in science and engineering have led to the routine collection of large and complex data objects, where the dependence structure among those objects is often of great interest...