
Showing papers in "Journal of the American Statistical Association in 2016"


Journal ArticleDOI
TL;DR: In this article, a general framework is discussed for smoothing parameter estimation in models with regular likelihoods constructed in terms of unknown smooth functions of covariates; the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood.
Abstract: This article discusses a general framework for smoothing parameter estimation for models with regular likelihoods constructed in terms of unknown smooth functions of covariates. Gaussian random effects and parametric terms may also be present. By construction the method is numerically stable and convergent, and enables smoothing parameter uncertainty to be quantified. The latter enables us to fix a well known problem with AIC for such models, thereby improving the range of model selection tools available. The smooth functions are represented by reduced rank spline like smoothers, with associated quadratic penalties measuring function smoothness. Model estimation is by penalized likelihood maximization, where the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood. The methods cover, for example, generalized additive models for nonexponential family responses (e.g., beta, ordered categorical, scaled t distribution, negative binomial a...

782 citations
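
As a rough sketch of the setup described above (my notation, not the paper's): each smooth term is represented by a reduced-rank spline with an associated quadratic penalty, so the coefficients beta maximize the penalized log-likelihood

    \ell_p(\beta; \lambda) \;=\; \ell(\beta) \;-\; \tfrac{1}{2} \sum_j \lambda_j\, \beta^\top S_j \beta,

while the smoothing parameters \lambda are chosen to maximize a Laplace approximation to the marginal likelihood obtained by integrating \beta out, with the quadratic penalties playing the role of Gaussian-prior log-densities for \beta.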


Journal ArticleDOI
TL;DR: A class of highly scalable nearest-neighbor Gaussian process (NNGP) models is developed to provide fully model-based inference for large geostatistical datasets, and the NNGP is shown to be a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices.
Abstract: Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm are linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze fores...

543 citations
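
Schematically (my notation), the NNGP replaces the joint Gaussian process density for the spatial effects w = (w_1, ..., w_n) with a product of conditionals in which each location conditions only on a small set N(i) of at most m neighbors,

    p(w_1, \dots, w_n) \;\approx\; \prod_{i=1}^{n} p\big(w_i \mid w_{N(i)}\big), \qquad |N(i)| \le m \ll n,

which is what yields the sparse precision matrices and the per-iteration cost that is linear in the number of locations.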


Journal ArticleDOI
TL;DR: A hierarchical mixed membership model for analyzing the topical content of documents is posited, in which mixing weights are parameterized by observed covariates, enabling researchers to introduce elements of the experimental design that informed document collection into the model within a generally applicable framework.
Abstract: Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this article, we develop a model of text data that supports this type of substantive research. Our approach is to posit a hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates. In this model, topical prevalence and topical content are specified as a simple generalized linear model on an arbitrary number of document-level covariates, such as news source and time of release, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework. We demonstrate the proposed methodology by analyzi...

429 citations
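
A schematic of the covariate-dependent mixing weights described above (my summary notation, not necessarily the paper's exact parameterization): document d's topic proportions \theta_d are tied to its covariates x_d through a logistic-normal-style generalized linear model,

    \eta_d \sim \mathcal{N}(x_d^\top \Gamma, \Sigma), \qquad \theta_{dk} = \exp(\eta_{dk}) \Big/ \textstyle\sum_{k'} \exp(\eta_{dk'}),

so that topical prevalence can vary with design variables such as news source or time of release.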


Journal ArticleDOI
TL;DR: In this article, the authors propose new inference tools for forward stepwise regression, least angle regression, and the lasso, which can be expressed as polyhedral constraints on the observation vector y.
Abstract: We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general scheme to perform valid inference after any selection event that can be characterized as y falling into a polyhedral set. This framework allows us to derive conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or any step along the lasso regularization path, because, as it turns out, selection events for these procedures can be expressed as polyhedral constraints on y. The p-values associated with these tests are exactly uniform under the null distribution, in finite samples, yielding exact Type I error control. The tests can also be inverted to produce confidence intervals for appropriate underlying regression parameters. The R package selectiveInference, freely available on the CRAN repository, implements the new inference tools described in this article. Suppl...

292 citations
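
The key device, sketched in my notation: the event that a given model (and sign pattern) is selected by forward stepwise, least angle regression, or the lasso can be written as a polyhedron in the observation vector,

    \{\text{selection}\} \;=\; \{\, y : A y \le b \,\},

and, conditionally on this event (and on the part of y orthogonal to the contrast of interest \eta), the statistic \eta^\top y follows a Gaussian distribution truncated to a computable interval. Inverting that truncated-Gaussian pivot is what gives p-values that are exactly uniform under the null in finite samples.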


Journal ArticleDOI
TL;DR: A fast penalized ℓ1 estimation method, called sisVIVE, is introduced for estimating the causal effect without knowing which instruments are valid, with theoretical guarantees on its performance.
Abstract: Instrumental variables have been widely used for estimating the causal effect between exposure and outcome. Conventional estimation methods require complete knowledge about all the instruments’ validity; a valid instrument must not have a direct effect on the outcome and not be related to unmeasured confounders. Often, this is impractical as highlighted by Mendelian randomization studies where genetic markers are used as instruments and complete knowledge about instruments’ validity is equivalent to complete knowledge about the involved genes’ functions. In this article, we propose a method for estimation of causal effects when this complete knowledge is absent. It is shown that causal effects are identified and can be estimated as long as less than 50% of instruments are invalid, without knowing which of the instruments are invalid. We also introduce conditions for identification when the 50% threshold is violated. A fast penalized l1 estimation method, called sisVIVE, is introduced for estimating the ca...

261 citations
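
A heavily simplified sketch of the estimation idea (my notation; the paper's actual criterion works with a projected, two-stage version of this): allow each instrument a possible direct effect \alpha_j on the outcome, so \alpha_j \ne 0 flags instrument j as invalid, and penalize those direct effects with an \ell_1 norm,

    Y = D\beta + Z\alpha + \varepsilon, \qquad (\hat\alpha, \hat\beta) \approx \arg\min_{\alpha, \beta} \tfrac{1}{2}\,\| Y - D\beta - Z\alpha \|_2^2 + \lambda \|\alpha\|_1,

so the causal effect \beta remains estimable as long as fewer than 50% of the instruments have nonzero \alpha.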


Journal ArticleDOI
TL;DR: Generalized fiducial inference (GFI), as developed in this paper, generalizes Fisher's fiducial idea by transferring randomness from the data to the parameter space via the inverse of a data-generating equation, without the use of Bayes' theorem.
Abstract: R. A. Fisher, the father of modern statistics, proposed the idea of fiducial inference during the first half of the 20th century. While his proposal led to interesting methods for quantifying uncertainty, other prominent statisticians of the time did not accept Fisher’s approach as it became apparent that some of Fisher’s bold claims about the properties of fiducial distribution did not hold up for multi-parameter problems. Beginning around the year 2000, the authors and collaborators started to reinvestigate the idea of fiducial inference and discovered that Fisher’s approach, when properly generalized, would open doors to solve many important and difficult inference problems. They termed their generalization of Fisher’s idea as generalized fiducial inference (GFI). The main idea of GFI is to carefully transfer randomness from the data to the parameter space using an inverse of a data-generating equation without the use of Bayes’ theorem. The resulting generalized fiducial distribution (GFD) can ...

182 citations
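
The "inverse of a data-generating equation" idea can be sketched as follows (my notation): if the data arise as Y = G(U, \theta) with U having a completely known distribution, then the generalized fiducial distribution is, roughly, the distribution of

    \theta^\ast(U^\ast) \;=\; \text{a solution of } \; y_{\mathrm{obs}} = G(U^\ast, \theta), \qquad U^\ast \sim F_U \ \text{(an independent copy of } U\text{)},

conditioned on such a solution existing; no prior distribution or Bayes' theorem is involved.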


Journal ArticleDOI
TL;DR: In this paper, the authors propose stationary covariance functions for processes that evolve temporally over a sphere, as well as cross-covariance functions for multivariate random fields defined over a sphere.
Abstract: In this article, we propose stationary covariance functions for processes that evolve temporally over a sphere, as well as cross-covariance functions for multivariate random fields defined over a sphere. For such processes, the great circle distance is the natural metric that should be used to describe spatial dependence. Given the mathematical difficulties for the construction of covariance functions for processes defined over spheres cross time, approximations of the state of nature have been proposed in the literature by using the Euclidean (based on map projections) and the chordal distances. We present several methods of construction based on the great circle distance and provide closed-form expressions for both spatio-temporal and multivariate cases. A simulation study assesses the discrepancy between the great circle distance, chordal distance, and Euclidean distance based on a map projection both in terms of estimation and prediction in a space-time and a bivariate spatial setting, where t...

155 citations
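
For reference, the great circle distance between locations with latitude/longitude (\varphi_1, \lambda_1) and (\varphi_2, \lambda_2) on a sphere of radius R, which the article takes as the natural metric, is

    d_{GC} = R \arccos\big( \sin\varphi_1 \sin\varphi_2 + \cos\varphi_1 \cos\varphi_2 \cos(\lambda_1 - \lambda_2) \big),

whereas the chordal distance is 2R \sin\{ d_{GC} / (2R) \} and the Euclidean alternatives are computed after a map projection.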


Journal ArticleDOI
TL;DR: An unsupervised approach is proposed for linking records across arbitrarily many files while simultaneously detecting duplicate records within files; the linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches.
Abstract: We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously record linkage ap...

128 citations
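
To make the bipartite linkage structure concrete, here is a toy illustration (not the authors' algorithm, and all names are hypothetical): each record points to a latent individual, so within-file duplicates and cross-file links can be read off records that share a latent id. In the real model these assignments are unknown and sampled by the hybrid MCMC algorithm.

    # Toy sketch of the bipartite record-to-latent-individual structure.
    from collections import defaultdict

    # Hypothetical records from two files; 'entity' is the latent individual
    # each record is currently linked to.
    records = [
        {"file": "A", "rec_id": 0, "entity": 7},
        {"file": "A", "rec_id": 1, "entity": 7},   # duplicate within file A
        {"file": "B", "rec_id": 0, "entity": 7},   # cross-file match
        {"file": "B", "rec_id": 1, "entity": 3},
    ]

    # Group records by the latent individual they point to.
    by_entity = defaultdict(list)
    for r in records:
        by_entity[r["entity"]].append((r["file"], r["rec_id"]))

    for ent, recs in sorted(by_entity.items()):
        files = [f for f, _ in recs]
        dup_within = len(files) != len(set(files))   # same file appears twice
        cross_file = len(set(files)) > 1             # linked across files
        print(ent, recs, "duplicate within file:", dup_within, "cross-file link:", cross_file)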


Journal ArticleDOI
TL;DR: In this article, the authors propose a framework for using optical player tracking data to estimate, in real time, the expected number of points obtained by the end of a possession, moving beyond current analyses that rely on discretized summaries of the game reducing player interactions to tallies of points, assists, and similar events.
Abstract: Basketball games evolve continuously in space and time as players constantly interact with their teammates, the opposing team, and the ball. However, current analyses of basketball outcomes rely on discretized summaries of the game that reduce such interactions to tallies of points, assists, and similar events. In this article, we propose a framework for using optical player tracking data to estimate, in real time, the expected number of points obtained by the end of a possession. This quantity, called expected possession value (EPV), derives from a stochastic process model for the evolution of a basketball possession. We model this process at multiple levels of resolution, differentiating between continuous, infinitesimal movements of players, and discrete events such as shot attempts and turnovers. Transition kernels are estimated using hierarchical spatiotemporal models that share information across players while remaining computationally tractable on very large data sets. In addition to estima...

118 citations
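
In symbols (my notation): with X the number of points scored by the end of the possession and F_t the full tracking history of players and ball up to time t, the quantity being estimated in real time is the conditional expectation

    \mathrm{EPV}_t \;=\; \mathbb{E}\big[\, X \mid \mathcal{F}_t \,\big],

updated continuously as the possession evolves, with coarse events (shots, passes, turnovers) and fine-grained player motion modeled at different resolutions.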


Journal ArticleDOI
TL;DR: A weight choice criterion based on the Kullback–Leibler loss with a penalty term is proposed and it is proved that the corresponding model averaging estimator is asymptotically optimal under certain assumptions.
Abstract: Considering model averaging estimation in generalized linear models, we propose a weight choice criterion based on the Kullback–Leibler (KL) loss with a penalty term. This criterion is different in principle from that for continuous observations, but reduces to the Mallows criterion in that situation. We prove that the corresponding model averaging estimator is asymptotically optimal under certain assumptions. We further extend our concern to the generalized linear mixed-effects model framework and establish associated theory. Numerical experiments illustrate that the proposed method is promising.

114 citations
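
For context, the Mallows criterion that the proposed KL-based criterion reduces to for continuous responses is usually written (in notation that may differ from the paper's) as a penalized squared error over the weight simplex,

    C_n(w) \;=\; \Big\| y - \textstyle\sum_m w_m \hat\mu_m \Big\|^2 + 2\sigma^2 \textstyle\sum_m w_m k_m, \qquad w_m \ge 0, \ \ \textstyle\sum_m w_m = 1,

where \hat\mu_m and k_m are the fitted values and parameter count of candidate model m; the article replaces the squared-error term with a KL-type loss suited to generalized linear models.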


Journal ArticleDOI
TL;DR: A set of very general constraints linking internal and external models is identified, and these constraints are used to develop a framework for semiparametric maximum likelihood inference that allows the distribution of covariates to be estimated using either the internal sample or an external reference sample.
Abstract: Information from various public and private data sources of extremely large sample sizes is now increasingly available for research purposes. Statistical methods are needed for using information from such big data sources while analyzing data from individual studies that may collect more detailed information required for addressing specific hypotheses of interest. In this article, we consider the problem of building regression models based on individual-level data from an “internal” study while using summary-level information, such as information on parameters for reduced models, from an “external” big data source. We identify a set of very general constraints that link internal and external models. These constraints are used to develop a framework for semiparametric maximum likelihood inference that allows the distribution of covariates to be estimated using either the internal sample or an external reference sample. We develop extensions for handling complex stratified sampling designs, such as case-co...

Journal ArticleDOI
TL;DR: The ergodicity of the approximate Markov chain is proved, showing that it samples asymptotically from the exact posterior distribution of interest, and variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors are described.
Abstract: We construct a new framework for accelerating Markov chain Monte Carlo in posterior sampling problems where standard methods are limited by the computational cost of the likelihood, or of numerical models embedded therein. Our approach introduces local approximations of these models into the Metropolis–Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Previous efforts at integrating approximate models into inference typically sacrifice either the sampler’s exactness or efficiency; our work seeks to address these limitations by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain, showing that it samples asymptotically from the exact posterior distribution of interest. We describe variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors. Our theoretical results reinforce the key observation underlying this...

Journal ArticleDOI
TL;DR: It is argued that the self-exciting point process models adequately capture major temporal clustering features in the data and perform better than traditional stationary Poisson models.
Abstract: We propose various self-exciting point process models for the times when e-mails are sent between individuals in a social network. Using an expectation–maximization (EM)-type approach, we fit these models to an e-mail network dataset from West Point Military Academy and the Enron e-mail dataset. We argue that the self-exciting models adequately capture major temporal clustering features in the data and perform better than traditional stationary Poisson models. We also investigate how accounting for diurnal and weekly trends in e-mail activity improves the overall fit to the observed network data. A motivation and application for fitting these self-exciting models is to use parameter estimates to characterize important e-mail communication behaviors such as the baseline sending rates, average reply rates, and average response times. A primary goal is to use these features, estimated from the self-exciting models, to infer the underlying leadership status of users in the West Point and Enron network...
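
The self-exciting structure referred to above has, in its generic Hawkes-type form, a conditional intensity in which each past e-mail at time t_i temporarily raises the rate of future e-mails (my notation; the paper fits several variants, including ones with diurnal and weekly baseline trends):

    \lambda(t) \;=\; \mu(t) \;+\; \sum_{t_i < t} g(t - t_i),

where \mu(t) is the baseline sending rate and the triggering kernel g governs quantities such as reply rates and response times.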

Journal ArticleDOI
TL;DR: The approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity-inducing priors; these rotations are embedded within PXL-EM, a Bayesian variant of parameter-expanded EM for posterior mode detection.
Abstract: Rotational post hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations, and (c) better oriented sparse solutions. To avoid the prespecification of the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian buffet process (IBP) prior. The factor dimen...

Journal ArticleDOI
TL;DR: Conditional sure independence screening (CSIS), proposed in this paper, produces a rich family of alternative screening methods through different choices of the conditioning set and can help reduce the number of false positive and false negative selections when covariates are highly correlated.
Abstract: Independence screening is powerful for variable selection when the number of variables is massive. Commonly used independence screening methods are based on marginal correlations or their variants. When some prior knowledge on a certain important set of variables is available, a natural assessment of the relative importance of the other predictors is their conditional contributions to the response given the known set of variables. This results in conditional sure independence screening (CSIS). CSIS produces a rich family of alternative screening methods by different choices of the conditioning set and can help reduce the number of false positive and false negative selections when covariates are highly correlated. This article proposes and studies CSIS in generalized linear models. We give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency and the propertie...

Journal ArticleDOI
TL;DR: It is shown that words that are both frequent and exclusive to a theme are more effective at characterizing topical content, and a regularization scheme is proposed that leads to better estimates of these quantities.
Abstract: An ongoing challenge in the analysis of document collections is how to summarize content in terms of a set of inferred themes that can be interpreted substantively in terms of topics. The current practice of parameterizing the themes in terms of most frequent words limits interpretability by ignoring the differential use of words across topics. Here, we show that words that are both frequent and exclusive to a theme are more effective at characterizing topical content, and we propose a regularization scheme that leads to better estimates of these quantities. We consider a supervised setting where professional editors have annotated documents to topic categories, organized into a tree, in which leaf-nodes correspond to more specific topics. Each document is annotated to multiple categories, at different levels of the tree. We introduce a hierarchical Poisson convolution model to analyze these annotated documents. A parallelized Hamiltonian Monte Carlo sampler allows the inference to scale to millio...

Journal ArticleDOI
TL;DR: A new time series model, the slowly evolving locally stationary process (SEv-LSP), is designed to capture nonstationarity both within a trial and across trials and is used to demonstrate the evolving dynamics between the hippocampus and the nucleus accumbens during an associative learning experiment.
Abstract: We develop a new time series model to investigate the dynamic interactions between the nucleus accumbens and the hippocampus during an associative learning experiment. Preliminary analyses indicated that the spectral properties of the local field potentials at these two regions changed over the trials of the experiment. While many models already take into account nonstationarity within a single trial, the evolution of the dynamics across trials is often ignored. Our proposed model, the slowly evolving locally stationary process (SEv-LSP), is designed to capture nonstationarity both within a trial and across trials. We rigorously define the evolving evolutionary spectral density matrix, which we estimate using a two-stage procedure. In the first stage, we compute the within-trial time-localized periodogram matrix. In the second stage, we develop a data-driven approach that combines information from trial-specific local periodogram matrices. Through simulation studies, we show the utility of our pro...

Journal ArticleDOI
TL;DR: The authors provide a detailed overview of the literature on the I-optimal design of mixture experiments, identify several contradictions, and present continuous I-optimal designs for the second-order and the special cubic model.
Abstract: In mixture experiments, the factors under study are proportions of the ingredients of a mixture. The special nature of the factors necessitates specific types of regression models, and specific types of experimental designs. Although mixture experiments usually are intended to predict the response(s) for all possible formulations of the mixture and to identify optimal proportions for each of the ingredients, little research has been done concerning their I-optimal design. This is surprising given that I-optimal designs minimize the average variance of prediction and, therefore, seem more appropriate for mixture experiments than the commonly used D-optimal designs, which focus on a precise model estimation rather than precise predictions. In this article, we provide the first detailed overview of the literature on the I-optimal design of mixture experiments and identify several contradictions. For the second-order and the special cubic model, we present continuous I-optimal designs and contrast the...
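
For reference, with model expansion f(x) and design matrix X, the I-optimality criterion being minimized is the prediction variance averaged over the experimental region \chi (standard notation, not necessarily the article's),

    I(X) \;=\; \frac{1}{\mathrm{vol}(\chi)} \int_{\chi} f(x)^\top (X^\top X)^{-1} f(x)\, dx \;\propto\; \mathrm{tr}\big[ (X^\top X)^{-1} M \big], \qquad M = \int_{\chi} f(x) f(x)^\top dx,

in contrast to D-optimality, which maximizes \det(X^\top X) and hence targets precise coefficient estimation rather than precise prediction.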

Journal ArticleDOI
TL;DR: A new statistical tool known as InSilicoVA is developed to classify cause of death using information acquired through verbal autopsy; it shares uncertainty between cause of death assignments for specific individuals and the distribution of deaths by cause across the population.
Abstract: In regions without complete-coverage civil registration and vital statistics systems there is uncertainty about even the most basic demographic indicators. In such regions the majority of deaths occur outside hospitals and are not recorded. Worldwide, fewer than one-third of deaths are assigned a cause, with the least information available from the most impoverished nations. In populations like this, verbal autopsy (VA) is a commonly used tool to assess cause of death and estimate cause-specific mortality rates and the distribution of deaths by cause. VA uses an interview with caregivers of the decedent to elicit data describing the signs and symptoms leading up to the death. This paper develops a new statistical tool known as InSilicoVA to classify cause of death using information acquired through VA. InSilicoVA shares uncertainty between cause of death assignments for specific individuals and the distribution of deaths by cause across the population. Using side-by-side comparisons with both observed and simulated data, we demonstrate that InSilicoVA has distinct advantages compared to currently available methods.

Journal ArticleDOI
TL;DR: Extremal depth (ED) as discussed by the authors is a new notion for functional data, which is based on a measure of extreme "outlyingness" and is especially suited for obtaining central regions of functional data and function spaces.
Abstract: We propose a new notion called “extremal depth” (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme “outlyingness.” ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: (a) the central region achieves the nominal (desired) simultaneous coverage probability; (b) there is a correspondence between ED-based (simultaneous) central regions and appropriate pointwise central regions; and (c) the method is resistant to certain classes of functional outliers. The article examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems. Supplementary materials for this article are availab...

Journal ArticleDOI
TL;DR: A randomized trial design where candidate dose levels assigned to study subjects are randomly chosen from a continuous distribution within a safe range is advocated and an outcome weighted learning method based on a nonconvex loss function is proposed, which can be solved efficiently using a difference of convex functions algorithm.
Abstract: In dose-finding clinical trials, it is becoming increasingly important to account for individual level heterogeneity while searching for optimal doses to ensure an optimal individualized dose rule (IDR) maximizes the expected beneficial clinical outcome for each individual. In this paper, we advocate a randomized trial design where candidate dose levels assigned to study subjects are randomly chosen from a continuous distribution within a safe range. To estimate the optimal IDR using such data, we propose an outcome weighted learning method based on a nonconvex loss function, which can be solved efficiently using a difference of convex functions algorithm. The consistency and convergence rate for the estimated IDR are derived, and its small-sample performance is evaluated via simulation studies. We demonstrate that the proposed method outperforms competing approaches. Finally, we illustrate this method using data from a cohort study for Warfarin (an anti-thrombotic drug) dosing.

Journal ArticleDOI
TL;DR: In this paper, a new framework of structured matrix completion (SMC) is proposed to treat structured missingness by design, which aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed.
Abstract: Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian can...
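
The structured-missingness setting can be pictured in block form (my notation): reorder rows and columns so that

    A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},

with A_{11}, A_{12}, A_{21} observed and A_{22} missing by design. If A were exactly low rank with that rank carried by A_{11}, the missing block would satisfy A_{22} = A_{21} A_{11}^{+} A_{12} (Moore–Penrose inverse); the SMC method can be read, roughly, as a carefully regularized version of this identity for approximately low-rank matrices.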

Journal ArticleDOI
TL;DR: This article considers the reduction of variance that can be achieved by exploiting control variates in thermodynamic integration for model evidence estimation; the methodology applies whenever the gradient of both the log-likelihood and the log-prior with respect to the parameters can be efficiently evaluated.
Abstract: Approximation of the model evidence is well known to be challenging. One promising approach is based on thermodynamic integration, but a key concern is that the thermodynamic integral can suffer from high variability in many applications. This article considers the reduction of variance that can be achieved by exploiting control variates in this setting. Our methodology applies whenever the gradient of both the log-likelihood and the log-prior with respect to the parameters can be efficiently evaluated. Results obtained on regression models and popular benchmark datasets demonstrate a significant and sometimes dramatic reduction in estimator variance and provide insight into the wider applicability of control variates to evidence estimation. Supplementary materials for this article are available online.
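
The underlying path-sampling identity is standard: with power posteriors p_t(\theta \mid y) \propto p(y \mid \theta)^t\, p(\theta) indexed by an inverse temperature t \in [0, 1], the log evidence is

    \log p(y) \;=\; \int_0^1 \mathbb{E}_{\theta \sim p_t(\cdot \mid y)}\big[ \log p(y \mid \theta) \big]\, dt,

and the article's contribution is to reduce the Monte Carlo variance of the inner expectations using control variates built from gradients of the log-likelihood and log-prior.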

Journal ArticleDOI
TL;DR: An extension of LASSO, namely prior LASSO (pLASSO), is proposed to incorporate prior information into penalized generalized linear models and shows great robustness to misspecification of that prior information.
Abstract: LASSO is a popular statistical tool often used in conjunction with generalized linear models that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected and they may contain useful information about the importance of certain variables. This article proposes an extension of LASSO, namely, prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding in the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the least angle regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior informati...
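
Schematically (my notation, not the paper's exact formulation), the pLASSO criterion augments the usual \ell_1-penalized negative log-likelihood with a term measuring disagreement between the model and the prior information, with a tuning parameter \eta controlling how much that information is trusted:

    \hat\beta \;=\; \arg\min_\beta \; -\ell(\beta) \;+\; \lambda \|\beta\|_1 \;+\; \eta\, D(\beta;\ \text{prior information}),

so that \eta \to 0 recovers the ordinary LASSO, while larger \eta pulls the fit toward the prior.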

Journal ArticleDOI
TL;DR: A maximum marginal likelihood estimator is formulated, which can be computed with a quadratic cost using dynamic programming and is applicable to a wide range of models and offers appealing results in practice.
Abstract: This article studies the estimation of a stepwise signal. To determine the number and locations of change-points of the stepwise signal, we formulate a maximum marginal likelihood estimator, which can be computed with a quadratic cost using dynamic programming. We carry out an extensive investigation on the choice of the prior distribution and study the asymptotic properties of the maximum marginal likelihood estimator. We propose to treat each possible set of change-points equally and adopt an empirical Bayes approach to specify the prior distribution of segment parameters. A detailed simulation study is performed to compare the effectiveness of this method with other existing methods. We demonstrate our method on single-molecule enzyme reaction data and on DNA array comparative genomic hybridization (CGH) data. Our study shows that this method is applicable to a wide range of models and offers appealing results in practice. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: A novel pairwise sure independence screening method for linear discriminant analysis with an ultrahigh-dimensional predictor is proposed and it is proved that the proposed method is screening consistent.
Abstract: This article is concerned with the problem of feature screening for multiclass linear discriminant analysis under ultrahigh-dimensional setting. We allow the number of classes to be relatively large. As a result, the total number of relevant features is larger than usual. This makes the related classification problem much more challenging than the conventional one, where the number of classes is small (very often two). To solve the problem, we propose a novel pairwise sure independence screening method for linear discriminant analysis with an ultrahigh-dimensional predictor. The proposed procedure is directly applicable to the situation with many classes. We further prove that the proposed method is screening consistent. Simulation studies are conducted to assess the finite sample performance of the new procedure. We also demonstrate the proposed methodology via an empirical analysis of a real life example on handwritten Chinese character recognition.

Journal ArticleDOI
TL;DR: In this article, a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia is analyzed; to evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times.
Abstract: We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Baye...

Journal ArticleDOI
TL;DR: In this paper, a penalized principal component (PPC) estimation procedure with an adaptive group fused LASSO is proposed to detect multiple structural breaks in panel data models with unobservable interactive fixed effects; under some mild conditions, the proposed method can correctly determine the unknown number of breaks and consistently estimate the common break dates.
Abstract: In this article, we consider estimation of common structural breaks in panel data models with unobservable interactive fixed effects. We introduce a penalized principal component (PPC) estimation procedure with an adaptive group fused LASSO to detect the multiple structural breaks in the models. Under some mild conditions, we show that with probability approaching one the proposed method can correctly determine the unknown number of breaks and consistently estimate the common break dates. Furthermore, we estimate the regression coefficients through the post-LASSO method and establish the asymptotic distribution theory for the resulting estimators. The developed methodology and theory are applicable to the case of dynamic panel data models. Simulation results demonstrate that the proposed method works well in finite samples with low false detection probability when there is no structural break and high probability of correctly estimating the break numbers when the structural breaks exist. We finall...
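
The break-detection device can be sketched as follows (my notation): with period-specific coefficient vectors \beta_t, the adaptive group fused LASSO penalizes successive differences,

    \sum_{t=2}^{T} w_t\, \big\| \beta_t - \beta_{t-1} \big\|_2,

with data-driven weights w_t, so the estimated break dates are those at which \hat\beta_t \ne \hat\beta_{t-1}; the PPC step accounts for the unobserved interactive fixed effects, and the post-LASSO refit recovers the coefficients within each estimated regime.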

Journal ArticleDOI
TL;DR: In this article, a Markov chain Monte Carlo algorithm for posterior computation accommodating uncertainty in the predictors to be included is proposed, and the posterior distribution for the conditional probability achieves close to the parametric rate of contraction even in ultra high-dimensional settings.
Abstract: In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goals being to build a parsimonious model for classification while doing inferences on the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully structured Tucker factorization, we define a model that can characterize any conditional probability, while facilitating variable selection and modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation accommodating uncertainty in the predictors to be included. Under near low-rank assumptions, the posterior distribution for the conditional probability is shown to achieve close to the parametric rate of contraction even in ultra high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications. Supplementary materials for this artic...

Journal ArticleDOI
TL;DR: In this paper, a nonparametric Bayesian joint model for multivariate continuous and categorical variables is presented, with the intention of developing a flexible engine for multiple imputation of missing values.
Abstract: We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial distributions for categorical variables with Dirichlet process mixtures of multivariate normal distributions for continuous variables. We incorporate dependence between the continuous and categorical variables by (1) modeling the means of the normal distributions as component-specific functions of the categorical variables and (2) forming distinct mixture components for the categorical and continuous data with probabilities that are linked via a hierarchical model. This structure allows the model to capture complex dependencies between the categorical and continuous data with minimal tuning by the analyst. We apply the model to impute missing values due to item nonresponse in an evaluation of the redesign of the Survey of Income and Program Partic...