# Showing papers in "Journal of the American Statistical Association in 1995"

••

TL;DR: In this article, the authors proposed a smoothness-adaptive thresholding procedure, SureShrink, which assigns a threshold to each dyadic resolution level by minimizing the Stein unbiased estimate of risk (SURE) and is near minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet.

Abstract: We attempt to recover a function of unknown smoothness from noisy sampled data. We introduce a procedure, SureShrink, that suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: A threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein unbiased estimate of risk (Sure) for threshold estimates. The computational effort of the overall procedure is order N · log(N) as a function of the sample size N. SureShrink is smoothness adaptive: If the unknown function contains jumps, then the reconstruction (essentially) does also; if the unknown function has a smooth piece, then the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothness adaptive: It is near minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoot...

4,699 citations
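
The level-by-level SURE threshold at the heart of SureShrink can be sketched in a few lines. This is a minimal illustration for a single resolution level with unit noise variance, not the full procedure (no wavelet transform, noise-level estimation, or the hybrid scheme for sparse levels); `sure_threshold` and `soft_threshold` are hypothetical helper names.

```python
import numpy as np

def sure_threshold(x):
    """Pick a soft threshold by minimizing Stein's unbiased risk estimate.

    Assumes the coefficients x carry unit-variance Gaussian noise
    (rescale by a noise estimate first in practice).  Uses the closed
    form SURE(t) = n - 2#{|x_i| <= t} + sum_i min(x_i^2, t^2),
    evaluated at the sorted |x_k| where the minimum must occur.
    """
    n = x.size
    a = np.sort(np.abs(x)) ** 2          # sorted squared coefficients
    c = np.cumsum(a)
    k = np.arange(1, n + 1)
    risks = n - 2.0 * k + c + (n - k) * a
    return np.sqrt(a[np.argmin(risks)])

def soft_threshold(x, t):
    """Soft thresholding: shrink toward zero by t, kill anything smaller."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
```

Applied to a vector with a few large coefficients amid small noise, the selected threshold zeroes the small entries and shrinks the large ones by the threshold amount.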

••

TL;DR: In this article, the authors show that the use of instruments that explain little of the variation in the endogenous explanatory variables can lead to large inconsistencies in the IV estimates even if only a weak relationship exists between the instruments and the error in the structural equation.

Abstract: We draw attention to two problems associated with the use of instrumental variables (IV), the importance of which for empirical work has not been fully appreciated. First, the use of instruments that explain little of the variation in the endogenous explanatory variables can lead to large inconsistencies in the IV estimates even if only a weak relationship exists between the instruments and the error in the structural equation. Second, in finite samples, IV estimates are biased in the same direction as ordinary least squares (OLS) estimates. The magnitude of the bias of IV estimates approaches that of OLS estimates as the R 2 between the instruments and the endogenous explanatory variable approaches 0. To illustrate these problems, we reexamine the results of a recent paper by Angrist and Krueger, who used large samples from the U.S. Census to estimate wage equations in which quarter of birth is used as an instrument for educational attainment. We find evidence that, despite huge sample sizes, th...

4,219 citations
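
The weak-instrument phenomenon described above is easy to reproduce by simulation. The sketch below is an illustrative setup, not the Angrist-Krueger data: it compares median IV and OLS slope estimates under a strong and a weak first stage, and `iv_median` is a hypothetical helper. Medians are used because the just-identified IV estimator has no finite moments.

```python
import numpy as np

rng = np.random.default_rng(0)

def iv_median(pi, n=50, reps=2000):
    """Median IV and OLS slope estimates of beta = 1 when x = pi*z + u
    and the structural error e is correlated with u (endogeneity)."""
    iv, ols = [], []
    for _ in range(reps):
        z = rng.normal(size=n)
        u = rng.normal(size=n)
        e = 0.8 * u + 0.6 * rng.normal(size=n)   # corr(e, u) > 0
        x = pi * z + u
        y = x + e                                # true coefficient is 1
        ols.append((x @ y) / (x @ x))
        iv.append((z @ y) / (z @ x))             # just-identified IV
    return np.median(iv), np.median(ols)
```

With a strong first stage the IV median sits near the true coefficient while OLS is inconsistent; with a weak first stage the IV estimates drift toward the (biased) OLS value, exactly the finite-sample behavior the abstract warns about.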

••

TL;DR: In this paper, the authors introduce sample path properties of stable processes on the real line, such as boundedness, continuity, and oscillations, as well as measurability, integrability, and absolute continuity.

Abstract (table of contents): Stable random variables on the real line; Multivariate stable distributions; Stable stochastic integrals; Dependence structures of multivariate stable distributions; Non-linear regression; Complex stable stochastic integrals and harmonizable processes; Self-similar processes; Chentsov random fields; Introduction to sample path properties; Boundedness, continuity and oscillations; Measurability, integrability and absolute continuity; Boundedness and continuity via metric entropy; Integral representation; Historical notes and extensions.

2,611 citations

••

TL;DR: In this article, the authors describe and illustrate Bayesian inference in models for density estimation using mixtures of Dirichlet processes and show convergence results for a general class of normal mixture models.

Abstract: We describe and illustrate Bayesian inference in models for density estimation using mixtures of Dirichlet processes. These models provide natural settings for density estimation and are exemplified by special cases where data are modeled as a sample from mixtures of normal distributions. Efficient simulation methods are used to approximate various prior, posterior, and predictive distributions. This allows for direct inference on a variety of practical issues, including problems of local versus global smoothing, uncertainty about density estimates, assessment of modality, and inference on the number of components. Also, convergence results are established for a general class of normal mixture models.

2,473 citations
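
For context, a draw from the Dirichlet process prior underlying these mixture models can be sketched via Sethuraman's stick-breaking construction. This is a standard truncated representation of DP(α, G0), not the Pólya-urn Gibbs samplers the article develops; `stick_breaking` and `base_draw` are hypothetical names.

```python
import numpy as np

def stick_breaking(alpha, base_draw, n_atoms, rng):
    """Truncated stick-breaking draw from a Dirichlet process DP(alpha, G0).

    base_draw(rng, size) samples atoms from the base measure G0.  The
    weights w_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha)
    form a (near-)probability vector over the sampled atoms.
    """
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    atoms = base_draw(rng, n_atoms)
    return w, atoms
```

A normal base measure then yields a random discrete mixing distribution; convolving the atoms with a normal kernel gives one realization of the random density these models place a prior on.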

••

TL;DR: This work exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density, so that Bayes factors for model comparisons can be routinely computed as a by-product of the simulation.

Abstract: In the context of Bayes estimation via Gibbs sampling, with or without data augmentation, a simple approach is developed for computing the marginal density of the sample data (marginal likelihood) given parameter draws from the posterior distribution. Consequently, Bayes factors for model comparisons can be routinely computed as a by-product of the simulation. Hitherto, this calculation has proved extremely challenging. Our approach exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density. This simple identity holds for any parameter value. An estimate of the posterior density is shown to be available if all complete conditional densities used in the Gibbs sampler have closed-form expressions. To improve accuracy, the posterior density is estimated at a high density point, and the numerical standard error of the resulting estimate is derived. The ideas are applied to probit regression and finite mixture models.

1,954 citations
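
The basic marginal likelihood identity, m(y) = f(y|θ)π(θ)/π(θ|y) at any θ, can be checked directly in a conjugate model where every term is available in closed form. A minimal sketch for a normal mean with known variance (an assumed toy model, not the probit or mixture applications in the article):

```python
import numpy as np
from scipy import stats

# Conjugate model: y_i ~ N(theta, 1), prior theta ~ N(0, tau2).
y = np.array([0.3, 1.2, -0.4, 0.8])
n, tau2 = y.size, 4.0

# Posterior of theta is normal with known moments.
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = post_var * y.sum()

theta_star = post_mean  # evaluate the identity at a high-density point
log_lik = stats.norm.logpdf(y, theta_star, 1.0).sum()
log_prior = stats.norm.logpdf(theta_star, 0.0, np.sqrt(tau2))
log_post = stats.norm.logpdf(theta_star, post_mean, np.sqrt(post_var))

# Chib's basic marginal identity: log m(y) = log f + log prior - log posterior.
log_marg = log_lik + log_prior - log_post

# Analytic check: marginally, y ~ N(0, I + tau2 * 11').
cov = np.eye(n) + tau2 * np.ones((n, n))
log_marg_exact = stats.multivariate_normal.logpdf(y, np.zeros(n), cov)
```

In the article's setting the posterior ordinate is not available analytically and is instead estimated from the Gibbs sampler's complete conditionals; the identity itself is exact either way.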

••

TL;DR: In this article, the authors proposed a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the mean of a vector of correlated response variables on the vector of explanatory variables in the presence of missing response data.

Abstract: We propose a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the mean of a vector of correlated response variables on a vector of explanatory variables in the presence of missing response data. The proposed estimators do not require full specification of the likelihood. They can be viewed as an extension of generalized estimating equations estimators that allow for the data to be missing at random but not missing completely at random. These estimators can be used to correct for dependent censoring and nonrandom noncompliance in randomized clinical trials studying the effect of a treatment on the evolution over time of the mean of a response variable. The likelihood-based parametric G-computation algorithm estimator may also be used to attempt to correct for dependent censoring and nonrandom noncompliance. But because of possible model misspecification, the parametric G-computation algorithm estimator, in contrast with the proposed w...

1,510 citations

••

TL;DR: Methods that simultaneously model the data and the drop-out process within a unified model-based framework are discussed, and possible extensions outlined.

Abstract: Subjects often drop out of longitudinal studies prematurely, yielding unbalanced data with unequal numbers of measures for each subject. Modern software programs for handling unbalanced longitudinal data improve on methods that discard the incomplete cases by including all the data, but also yield biased inferences under plausible models for the drop-out process. This article discusses methods that simultaneously model the data and the drop-out process within a unified model-based framework. Models are classified into two broad classes—random-coefficient selection models and random-coefficient pattern-mixture models—depending on how the joint distribution of the data and drop-out mechanism is factored. Inference is likelihood-based, via maximum likelihood or Bayesian methods. A number of examples in the literature are placed in this framework, and possible extensions outlined. Data collection on the nature of the drop-out process is advocated to guide the choice of model. In cases where the drop-...

1,469 citations

••

TL;DR: It is shown that a deterministic relationship between the truncation lag and the sample size is dominated by data-dependent rules that take sample information into account; the results favor methods based on sequential tests over those based on information criteria.

Abstract: We analyze the choice of the truncation lag in the context of the Said-Dickey test for the presence of a unit root in a general autoregressive moving average model. It is shown that a deterministic relationship between the truncation lag and the sample size is dominated by data-dependent rules that take sample information into account. In particular, we study data-dependent rules that are not constrained to satisfy the lower bound condition imposed by Said-Dickey. Akaike's information criterion falls into this category. The analytical properties of the truncation lag selected according to a class of information criteria are compared to those based on sequential testing for the significance of coefficients on additional lags. The asymptotic properties of the unit root test under various methods for selecting the truncation lag are analyzed, and simulations are used to show their distinctive behavior in finite samples. Our results favor methods based on sequential tests over those based on informat...

1,427 citations

••

TL;DR: In this article, the authors apply the Schwarz criterion to find an approximate solution to Bayesian testing problems, at least when the hypotheses are nested when the prior on ψ is Normal.

Abstract: To compute a Bayes factor for testing H0: ψ = ψ0 in the presence of a nuisance parameter β, priors under the null and alternative hypotheses must be chosen. As in Bayesian estimation, an important problem has been to define automatic, or "reference," methods for determining priors based only on the structure of the model. In this article we apply the heuristic device of taking the amount of information in the prior on ψ equal to the amount of information in a single observation. Then, after transforming β to be "null orthogonal" to ψ, we take the marginal priors on β to be equal under the null and alternative hypotheses. Doing so, and taking the prior on ψ to be Normal, we find that the log of the Bayes factor may be approximated by the Schwarz criterion with an error of order O_p(n^{-1/2}), rather than the usual error of order O_p(1). This result suggests the Schwarz criterion should provide sensible approximate solutions to Bayesian testing problems, at least when the hypotheses are nested. When...

1,235 citations
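
The quality of the Schwarz approximation is easy to check numerically in the one-parameter normal case, where the exact Bayes factor under a unit-information N(0, 1) prior is available in closed form. A sketch under those assumptions (an illustrative example, not one from the article):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 400
y = rng.normal(0.1, 1.0, size=n)   # data near, but not at, the null

# Model: y_i ~ N(mu, 1); test H0: mu = 0 against H1 with the
# unit-information prior mu ~ N(0, 1) under H1.
log_m0 = stats.norm.logpdf(y, 0.0, 1.0).sum()
cov1 = np.eye(n) + np.ones((n, n))          # marginal covariance under H1
log_m1 = stats.multivariate_normal.logpdf(y, np.zeros(n), cov1)
log_bf_exact = log_m1 - log_m0

# Schwarz (BIC) approximation: log BF ~= l(mu_hat) - l(0) - (1/2) log n.
mu_hat = y.mean()
log_bf_bic = (stats.norm.logpdf(y, mu_hat, 1.0).sum()
              - log_m0 - 0.5 * np.log(n))
```

With the unit-information prior the two quantities agree closely, consistent with the O_p(n^{-1/2}) error claimed in the abstract; under other priors the Schwarz criterion would only capture the Bayes factor up to an O_p(1) term.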

••

TL;DR: In this paper, the authors use two-stage least squares (TSLS) to estimate the average causal effect of variable treatments such as drug dosage, hours of exam preparation, cigarette smoking, and years of schooling.

Abstract: Two-stage least squares (TSLS) is widely used in econometrics to estimate parameters in systems of linear simultaneous equations and to solve problems of omitted-variables bias in single-equation estimation. We show here that TSLS can also be used to estimate the average causal effect of variable treatments such as drug dosage, hours of exam preparation, cigarette smoking, and years of schooling. The average causal effect in which we are interested is a conditional expectation of the difference between the outcomes of the treated and what these outcomes would have been in the absence of treatment. Given mild regularity assumptions, the probability limit of TSLS is a weighted average of per-unit average causal effects along the length of an appropriately defined causal response function. The weighting function is illustrated in an empirical example based on the relationship between schooling and earnings.
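
A minimal just-identified TSLS sketch on simulated data (not the schooling-earnings example) shows the mechanics: the first stage projects the treatment on the instruments, and the second stage regresses the outcome on the fitted values. `tsls` is a hypothetical helper.

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares: regress X on instruments Z, then y on
    the fitted values.  X and Z each include a constant column."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]    # second stage

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                  # instrument
u = rng.normal(size=n)
x = 0.9 * z + u                         # endogenous: x shares u with e
e = 0.7 * u + rng.normal(size=n)
y = 2.0 + 1.5 * x + e                   # true causal effect is 1.5

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])
beta_iv = tsls(y, X, Z)                             # recovers ~[2.0, 1.5]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]     # slope biased upward
```

With a constant treatment effect the TSLS slope recovers the causal coefficient; under heterogeneous effects it estimates the weighted average of per-unit effects described in the abstract.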

••

TL;DR: In this article, the authors consider the efficiency bound for the estimation of the parameters of semiparametric models defined solely by restrictions on the means of a vector of correlated outcomes, Y, when the data on Y are missing at random.

Abstract: We consider the efficiency bound for the estimation of the parameters of semiparametric models defined solely by restrictions on the means of a vector of correlated outcomes, Y, when the data on Y are missing at random. We show that the semiparametric variance bound is the asymptotic variance of the optimal estimator in a class of inverse probability of censoring weighted estimators and that this bound is unchanged if the data are missing completely at random. For this case we study the asymptotic performance of the generalized estimating equations (GEE) estimators of mean parameters and show that the optimal GEE estimator is inefficient except for special cases. The optimal weighted estimator depends on unknown population quantities. But for monotone missing data, we propose an adaptive estimator whose asymptotic variance can achieve the bound.
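
The core weighting idea can be sketched in its simplest form: estimating a mean when responses are missing at random given an observed covariate, with a known response probability. This is an illustration of inverse probability weighting on simulated data, not the article's estimating-equations machinery.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)                    # always-observed covariate
y = 2.0 + 1.0 * x + rng.normal(size=n)    # true mean of y is 2.0

# Missing at random: the response probability depends on x only.
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
r = rng.uniform(size=n) < p               # r = True if y is observed

naive = y[r].mean()                       # complete-case mean: biased,
                                          # high-x subjects respond more
ipw = np.sum(r * y / p) / n               # Horvitz-Thompson IPW mean
```

Weighting each observed response by the inverse of its response probability restores an unbiased estimate; in practice the response probabilities would themselves be modeled, as in the article's censoring-weighted estimators.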

••

TL;DR: This work proposes MCMC methods distantly related to simulated annealing, which simulate realizations from a sequence of distributions, allowing the distribution being simulated to vary randomly over time.

Abstract: Markov chain Monte Carlo (MCMC; the Metropolis-Hastings algorithm) has been used for many statistical problems, including Bayesian inference, likelihood inference, and tests of significance. Though the method generally works well, doubts about convergence often remain. Here we propose MCMC methods distantly related to simulated annealing. Our samplers mix rapidly enough to be usable for problems in which other methods would require eons of computing time. They simulate realizations from a sequence of distributions, allowing the distribution being simulated to vary randomly over time. If the sequence of distributions is well chosen, then the sampler will mix well and produce accurate answers for all the distributions. Even when there is only one distribution of interest, these annealing-like samplers may be the only known way to get a rapidly mixing sampler. These methods are essential for attacking very hard problems, which arise in areas such as statistical genetics. We illustrate the methods wi...

••

TL;DR: In this paper, the authors apply the ideas of plug-in bandwidth selection to develop strategies for choosing the smoothing parameter of local linear least squares kernel estimators; the approach is applicable to odd-degree local polynomial fits and can be extended to other settings, such as derivative estimation and multiple nonparametric regression.

Abstract: Local least squares kernel regression provides an appealing solution to the nonparametric regression, or "scatterplot smoothing," problem, as demonstrated by Fan, for example. The practical implementation of any scatterplot smoother is greatly enhanced by the availability of a reliable rule for automatic selection of the smoothing parameter. In this article we apply the ideas of plug-in bandwidth selection to develop strategies for choosing the smoothing parameter of local linear least squares kernel estimators. Our results are applicable to odd-degree local polynomial fits and can be extended to other settings, such as derivative estimation and multiple nonparametric regression. An implementation in the important case of local linear fits with univariate predictors is shown to perform well in practice. A by-product of our work is the development of a class of nonparametric variance estimators, based on local least squares ideas, and plug-in rules for their implementation.
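
The estimator whose bandwidth is being selected can be sketched directly: a local linear fit solves a kernel-weighted least squares problem centered at the evaluation point. A minimal version with a Gaussian kernel follows (the plug-in bandwidth selector itself is more involved); `local_linear` is a hypothetical helper.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear kernel estimate of E[y | x = x0].

    Fits a weighted least squares line in (x - x0), with Gaussian
    kernel weights at bandwidth h; the intercept is the fitted value
    at x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]
```

Because the fit is exactly linear in a neighborhood of x0, the estimator reproduces straight-line data exactly for any bandwidth, and it inherits the good boundary behavior that motivates local polynomial methods.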

••

TL;DR: To evaluate the role of CD4 counts as a potential surrogate marker, it is necessary to better understand the relationship of clinical outcome to an individual's CD4 count history over time.

Abstract: A question that has received a great deal of attention in evaluating new treatments in acquired immune deficiency syndrome (AIDS) clinical trials is that of finding a good surrogate marker for clinical progression. The identification of such a marker may be useful in assessing the efficacy of new therapies in a shorter period. The number of CD4-lymphocyte counts has been proposed as such a potential marker for human immunodeficiency virus (HIV) trials because of its observed correlation with clinical outcome. But to evaluate the role of CD4 counts as a potential surrogate marker, we must better understand the relationship of clinical outcome to an individual's CD4 count history over time. The Cox proportional hazards regression model is used to study the relationship between CD4 counts as a time-dependent covariate and survival. Because the CD4 counts are measured only periodically and with substantial measurement error and biological variation, standard methods for estimating the parameters in the Cox mod...

••

TL;DR: The method derives from observing that in general, a Bayes factor can be written as the product of a quantity called the Savage-Dickey density ratio and a correction factor; both terms are easily estimated from posterior simulation.

Abstract: We present a simple method for computing Bayes factors. The method derives from observing that in general, a Bayes factor can be written as the product of a quantity called the Savage-Dickey density ratio and a correction factor; both terms are easily estimated from posterior simulation. In some cases it is possible to do these computations without ever evaluating the likelihood.
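
In a nested model with no nuisance parameters the correction factor is 1, and the Savage-Dickey ratio can be checked against the exact Bayes factor. A sketch for a normal mean (an assumed toy model, not an example from the article), with the posterior density at the null estimated from posterior draws by kernel density estimation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 20
y = rng.normal(0.1, 1.0, size=n)

# Model: y_i ~ N(psi, 1), prior psi ~ N(0, 1); test H0: psi = 0.
post_var = 1.0 / (n + 1.0)
post_mean = post_var * y.sum()

# Savage-Dickey: BF(H0 vs H1) = posterior density at 0 / prior density
# at 0 (no correction factor: there is no nuisance parameter here).
draws = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
post_at_0 = stats.gaussian_kde(draws)(0.0)[0]   # estimated from draws
bf01_mc = post_at_0 / stats.norm.pdf(0.0, 0.0, 1.0)

# Closed-form check using the known normal posterior.
bf01_exact = (stats.norm.pdf(0.0, post_mean, np.sqrt(post_var))
              / stats.norm.pdf(0.0, 0.0, 1.0))
```

Here the posterior draws are direct Monte Carlo samples; in the article's setting they would come from a posterior simulation such as a Gibbs sampler, and the likelihood never needs to be evaluated.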

••

TL;DR: In this paper, general methods for analyzing the convergence of discrete-time, general state-space Markov chains, such as those used in stochastic simulation algorithms including the Gibbs sampler, are provided.

Abstract: General methods are provided for analyzing the convergence of discrete-time, general state-space Markov chains, such as those used in stochastic simulation algorithms including the Gibbs sampler. The methods provide rigorous, a priori bounds on how long these simulations should be run to give satisfactory results. Results are applied to two models of the Gibbs sampler: a bivariate normal model, and a hierarchical Poisson model (with gamma conditionals). The methods use the notion of minorization conditions for Markov chains.

••

TL;DR: A rejuvenation procedure for improving the efficiency of sequential imputation is introduced and theoretically justified; the results show that the ideas of multiple imputation and flexible simulation techniques are as powerful in engineering as in survey sampling.

Abstract: The sequential imputation procedure is applied to adaptively and sequentially reconstruct discrete input signals that are blurred by an unknown linear moving average channel and contaminated by additive Gaussian noises, a problem known as blind deconvolution in digital communication. A rejuvenation procedure for improving the efficiency of sequential imputation is introduced and theoretically justified. The proposed method does not require the channel to be nonminimum phase and can be used in real time signal restoration. Two simulated systems are studied to illustrate the proposed method. Our result shows that the ideas of multiple imputations and flexible simulation techniques are as powerful in engineering as in survey sampling.

••

TL;DR: In this paper, the authors investigate the extension of the nonparametric regression technique of local polynomial fitting with a kernel weight to generalized linear models and quasi-likelihood contexts.

Abstract: We investigate the extension of the nonparametric regression technique of local polynomial fitting with a kernel weight to generalized linear models and quasi-likelihood contexts. In the ordinary regression case, local polynomial fitting has been seen to have several appealing features in terms of intuitive and mathematical simplicity. One noteworthy feature is the better performance near the boundaries compared to the traditional kernel regression estimators. These properties are shown to carry over to generalized linear model and quasi-likelihood settings. We also derive the asymptotic distributions of the proposed class of estimators that allow for straightforward interpretation and extensions of state-of-the-art bandwidth selection methods.

••

TL;DR: In this article, the authors propose a procedure that finds a collection of decision rules that best explain the behavior of experimental subjects, and apply their procedure to data on probabilistic updating by subjects in four different universities.

Abstract: Economists and psychologists have recently been developing new theories of decision making under uncertainty that can accommodate the observed violations of standard statistical decision theoretic axioms by experimental subjects. We propose a procedure that finds a collection of decision rules that best explain the behavior of experimental subjects. The procedure is a combination of maximum likelihood estimation of the rules together with an implicit classification of subjects to the various rules and a penalty for having too many rules. We apply our procedure to data on probabilistic updating by subjects in four different universities. We get remarkably robust results showing that the most important rules used by the subjects (in order of importance) are Bayes's rule, a representativeness rule (ignoring the prior), and, to a lesser extent, conservatism (overweighting the prior).

••

TL;DR: This article develops a full Bayesian foundation for a recently developed Gibbs sampling algorithm for multiple sequence alignment, presents extensions that relax two important restrictions, and provides a rank test for assessing the significance of an alignment.

Abstract: A wealth of data concerning life's basic molecules, proteins and nucleic acids, has emerged from the biotechnology revolution. The human genome project has accelerated the growth of these data. Multiple observations of homologous protein or nucleic acid sequences from different organisms are often available. But because mutations and sequence errors misalign these data, multiple sequence alignment has become an essential and valuable tool for understanding structures and functions of these molecules. A recently developed Gibbs sampling algorithm has been applied with substantial advantage in this setting. In this article we develop a full Bayesian foundation for this algorithm and present extensions that permit relaxation of two important restrictions. We also present a rank test for the assessment of the significance of multiple sequence alignment. As an example, we study the set of dinucleotide binding proteins and predict binding segments for dozens of its members.

••

TL;DR: A Monte Carlo EM algorithm that uses a Markov chain sampling technique in the calculation of the expectation in the E step of the EM algorithm is discussed, and it is shown that under suitable regularity conditions, an MCEM algorithm will get close to a maximizer of the likelihood of the observed data.

Abstract: The observations in parameter-driven models for time series of counts are generated from latent unobservable processes that characterize the correlation structure. These models result in very complex likelihoods, and even the EM algorithm, which is usually well suited for problems of this type, involves high-dimensional integration. In this article we discuss a Monte Carlo EM (MCEM) algorithm that uses a Markov chain sampling technique in the calculation of the expectation in the E step of the EM algorithm. We propose a stopping criterion for the algorithm and provide rules for selecting the appropriate Monte Carlo sample size. We show that under suitable regularity conditions, an MCEM algorithm will, with high probability, get close to a maximizer of the likelihood of the observed data. We also discuss the asymptotic efficiency of the procedure. We illustrate our Monte Carlo estimation method on a time series involving small counts: the polio incidence time series previously analyzed by Zeger.

••

TL;DR: In this article, the authors provide theoretical support for SIMEX for measurement error models by establishing a strong relationship between SIMEX estimation and jackknife estimation and using the Framingham Heart Study data to illustrate the variance estimation procedure.

Abstract: This article provides theoretical support for our simulation-based estimation procedure, SIMEX, for measurement error models. We do so by establishing a strong relationship between SIMEX estimation and jackknife estimation. A result of our investigation is the identification of a variance estimation method for SIMEX that parallels jackknife variance estimation. Data from the Framingham Heart Study are used to illustrate the variance estimation procedure in logistic regression measurement error models.
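
The simulation-extrapolation recipe itself can be sketched for a linear regression with additive measurement error of known variance: refit after adding extra error at several levels λ, then extrapolate the fits back to λ = -1. This is a minimal illustration with a quadratic extrapolant, which only partially corrects the attenuation here (a rational extrapolant would be exact for this model); it is not the article's jackknife variance machinery.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=n)                 # true covariate (unobserved)
sigma_u = 0.8                          # known measurement-error sd
w = x + rng.normal(0.0, sigma_u, n)    # observed, error-contaminated
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)   # true slope is 2.0

def slope(w, y):
    c = np.cov(w, y)
    return c[0, 1] / c[0, 0]

# SIMEX: add extra error at levels lambda, refit, extrapolate to -1.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = []
for lam in lambdas:
    sims = [slope(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, n), y)
            for _ in range(50)]
    est.append(np.mean(sims))

coef = np.polyfit(lambdas, est, 2)     # quadratic extrapolant
simex_slope = np.polyval(coef, -1.0)   # much closer to 2.0 than the
                                       # attenuated naive slope(w, y)
```

The naive slope is attenuated toward zero by the factor var(x)/(var(x) + sigma_u^2); SIMEX traces how the attenuation worsens as error is added and extrapolates back to the error-free case.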

••

TL;DR: In this paper, the authors proposed semiparametric procedures to make inferences for median regression models with possibly censored observations using simulated annealing algorithm, which can be implemented efficiently using a simulated annesaling algorithm.

Abstract: The median is a simple and meaningful measure for the center of a long-tailed survival distribution. To examine the covariate effects on survival, a natural alternative to the usual mean regression model is to regress the median of the failure time variable or a transformation thereof on the covariates. In this article we propose semiparametric procedures to make inferences for such median regression models with possibly censored observations. Our proposals can be implemented efficiently using a simulated annealing algorithm. Numerical studies are conducted to show the advantage of the new procedures over some recently developed methods for the accelerated failure time model, a special type of mean regression models in the survival analysis. The proposals discussed in the article are illustrated with a lung cancer data set.

••

Correspondence Analysis in the Social Sciences.

••

TL;DR: In this paper, the authors use Markov chain splitting, originally developed for the theoretical analysis of general state-space Markov chains, to introduce regeneration into Markov chain samplers, allowing regenerative methods to be used for analyzing the samplers' output.

Abstract: Markov chain sampling has recently received considerable attention, in particular in the context of Bayesian computation and maximum likelihood estimation. This article discusses the use of Markov chain splitting, originally developed for the theoretical analysis of general state-space Markov chains, to introduce regeneration into Markov chain samplers. This allows the use of regenerative methods for analyzing the output of these samplers and can provide a useful diagnostic of sampler performance. The approach is applied to several samplers, including certain Metropolis samplers that can be used on their own or in hybrid samplers, and is illustrated in several examples.

••

TL;DR: In this paper, the authors use the concept of data depth to introduce several new control charts for monitoring processes of multivariate quality measurements, which can be visualized and interpreted just as easily as the well-known univariate X̄, X, and CUSUM charts.

Abstract: This article uses the concept of data depth to introduce several new control charts for monitoring processes of multivariate quality measurements. For any dimension of the measurements, these charts are in the form of two-dimensional graphs that can be visualized and interpreted just as easily as the well-known univariate X̄, X, and CUSUM charts. Moreover, they have several significant advantages. First, they can detect simultaneously the location shift and scale increase of the process, unlike the existing methods, which can detect only the location shift. Second, their construction is completely nonparametric; in particular, it does not require the assumption of normality for the quality distribution, which is needed in standard approaches such as the χ² and Hotelling's T² charts. Thus these new charts generalize the principle of control charts to multivariate settings and apply to a much broader class of quality distributions.

••

TL;DR: In this paper, the Stahel-Donoho estimators (t, V) of multivariate location and scatter are defined as a weighted mean and a weighted covariance matrix with weights of the form w(r), where w is a weight function and r is a measure of "outlyingness", obtained by considering all univariate projections of the data.

Abstract: The Stahel-Donoho estimators (t, V) of multivariate location and scatter are defined as a weighted mean and a weighted covariance matrix with weights of the form w(r), where w is a weight function and r is a measure of "outlyingness," obtained by considering all univariate projections of the data. The estimators have a high breakdown point in all dimensions and are √n-consistent. The asymptotic bias of V for point mass contamination for suitable weight functions is compared with that of Rousseeuw's minimum volume ellipsoid (MVE) estimator. A simulation shows that for a suitable w, t and V exhibit high efficiency for both normal and Cauchy distributions and are better than their competitors for normal data with point-mass contamination. The performances of the estimators for detecting outliers are compared for both a real and a synthetic data set.