
Showing papers in "Annals of Statistics in 1999"


Journal ArticleDOI
TL;DR: The stochastic approximation EM (SAEM), which replaces the expectation step of the EM algorithm by one iteration of a stochastic approximation procedure, is introduced and it is proved that, under mild additional conditions, the attractive stationary points of the SAEM algorithm correspond to the local maxima of the function.
Abstract: The expectation-maximization (EM) algorithm is a powerful computational technique for locating maxima of functions. It is widely used in statistics for maximum likelihood or maximum a posteriori estimation in incomplete data models. In certain situations, however, this method is not applicable because the expectation step cannot be performed in closed form. To deal with these problems, a novel method is introduced, the stochastic approximation EM (SAEM), which replaces the expectation step of the EM algorithm by one iteration of a stochastic approximation procedure. The convergence of the SAEM algorithm is established under conditions that are applicable to many practical situations. Moreover, it is proved that, under mild additional conditions, the attractive stationary points of the SAEM algorithm correspond to the local maxima of the function. Numerical results are presented to support our findings.
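As a concrete illustration of the algorithmic idea (not the paper's own example), the sketch below runs SAEM on a toy two-component Gaussian mixture with known unit variances: the expectation step is replaced by simulating the latent labels and updating a running average of the complete-data sufficient statistics with step sizes gamma_k. The mixture setup, step-size schedule and iteration count are illustrative assumptions.

# Minimal SAEM sketch for a two-component Gaussian mixture with known unit variance.
# The E-step expectation is replaced by one stochastic-approximation update of the
# complete-data sufficient statistics, using simulated component labels.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from the mixture 0.4*N(-2,1) + 0.6*N(2,1)  (illustrative only)
n = 500
z_true = rng.random(n) < 0.6
y = np.where(z_true, rng.normal(2.0, 1.0, n), rng.normal(-2.0, 1.0, n))

# Parameters: mixing weight p and component means mu1, mu2 (variances fixed at 1)
p, mu1, mu2 = 0.5, -1.0, 1.0
# Running averages of the sufficient statistics: mean of z, z*y and (1-z)*y
s = np.zeros(3)

for k in range(1, 201):
    gamma = 1.0 / k                                  # step sizes: sum gamma = inf, sum gamma^2 < inf
    # Simulation step: draw labels from their current conditional distribution
    w1 = p * np.exp(-0.5 * (y - mu2) ** 2)           # weight for component 2 (z = 1)
    w0 = (1 - p) * np.exp(-0.5 * (y - mu1) ** 2)     # weight for component 1 (z = 0)
    z = rng.random(n) < w1 / (w0 + w1)
    # Stochastic approximation step on the complete-data sufficient statistics
    stats = np.array([z.mean(), (z * y).mean(), ((~z) * y).mean()])
    s = s + gamma * (stats - s)
    # Maximization step in closed form given the averaged statistics
    p = s[0]
    mu2 = s[1] / max(s[0], 1e-12)
    mu1 = s[2] / max(1 - s[0], 1e-12)

print(p, mu1, mu2)   # should be near 0.6, -2, 2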

795 citations


Journal ArticleDOI
TL;DR: An overcomplete collection of atoms called wedgelets, dyadically organized indicator functions with a variety of locations, scales, and orientations, is developed, which provides nearly optimal representations of objects in the Horizon model, as measured by minimax description length.
Abstract: We study a simple "Horizon Model" for the problem of recovering an image from noisy data; in this model the image has an edge with $\alpha$-Hölder regularity. Adopting the viewpoint of computational harmonic analysis, we develop an overcomplete collection of atoms called wedgelets, dyadically organized indicator functions with a variety of locations, scales, and orientations. The wedgelet representation provides nearly optimal representations of objects in the Horizon model, as measured by minimax description length. We show how to rapidly compute a wedgelet approximation to noisy data by finding a special edgelet-decorated recursive partition which minimizes a complexity-penalized sum of squares. This estimate, using sufficient sub-pixel resolution, achieves nearly the minimax mean-squared error in the Horizon Model. In fact, the method is adaptive in the sense that it achieves nearly the minimax risk for any value of the unknown degree of regularity of the Horizon, $1 \leq \alpha \leq 2$. Wedgelet analysis and de-noising may be used successfully outside the Horizon model. We study images modelled as indicators of star-shaped sets with smooth boundaries and show that complexity-penalized wedgelet partitioning achieves nearly the minimax risk in that setting also.

717 citations


Journal ArticleDOI
TL;DR: It is shown that such a one-step method cannot be optimal when different coefficient functions admit different degrees of smoothness, and this drawback can be repaired by using the proposed two-step estimation procedure.
Abstract: Varying coefficient models are a useful extension of classical linear models. They arise naturally when one wishes to examine how regression coefficients change over different groups characterized by certain covariates such as age. The appeal of these models is that the coefficient functions can easily be estimated via a simple local regression. This yields a simple one-step estimation procedure. We show that such a one-step method cannot be optimal when different coefficient functions admit different degrees of smoothness. This drawback can be repaired by using our proposed two-step estimation procedure. The asymptotic mean-squared error for the two-step procedure is obtained and is shown to achieve the optimal rate of convergence. A few simulation studies show that the gain by the two-step procedure can be quite substantial. The methodology is illustrated by an application to an environmental data set.
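A minimal sketch of the one-step idea described above, kernel-weighted least squares at each value of the covariate that drives the coefficients, is given below; the simulated data, Gaussian kernel and bandwidth are illustrative assumptions, and the paper's two-step refinement would re-smooth each coefficient function with a bandwidth chosen for that coefficient alone.

# One-step estimation of a varying coefficient model Y = a(T) + b(T)*X + error
# via kernel-weighted least squares at each target point t0 (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
n = 400
T = rng.uniform(0, 1, n)
X = rng.normal(size=n)
Y = np.sin(2 * np.pi * T) + (1 + T ** 2) * X + 0.3 * rng.normal(size=n)

def one_step_fit(t0, h):
    """Local least squares of Y on (1, X) with Gaussian kernel weights in T."""
    w = np.exp(-0.5 * ((T - t0) / h) ** 2)
    Z = np.column_stack([np.ones(n), X])
    WZ = Z * w[:, None]
    coef = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)   # [a_hat(t0), b_hat(t0)]
    return coef

grid = np.linspace(0.05, 0.95, 19)
fits = np.array([one_step_fit(t0, h=0.08) for t0 in grid])
# Roughly, a two-step procedure would now re-estimate, say, b(.) by smoothing the
# partial residuals Y - a_hat(T) against (T, X) with a bandwidth tuned for b alone.
print(np.round(fits[:5], 2))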

643 citations


Journal ArticleDOI
TL;DR: In this paper, a data depth can be used to measure the "depth" or "outlyingness" of a given multivariate sample with respect to its underlying distribution, which leads to a natural center-outward ordering of the sample points.
Abstract: A data depth can be used to measure the “depth” or “outlyingness” of a given multivariate sample with respect to its underlying distribution. This leads to a natural center-outward ordering of the sample points. Based on this ordering, quantitative and graphical methods are introduced for analyzing multivariate distributional characteristics such as location, scale, bias, skewness and kurtosis, as well as for comparing inference methods. All graphs are one-dimensional curves in the plane and can be easily visualized and interpreted. A “sunburst plot” is presented as a bivariate generalization of the box-plot. DD-(depth versus depth) plots are proposed and examined as graphical inference tools. Some new diagnostic tools for checking multivariate normality are introduced. One of them monitors the exact rate of growth of the maximum deviation from the mean, while the others examine the ratio of the overall dispersion to the dispersion of a certain central region. The affine invariance property of a data depth also leads to appropriate invariance properties for the proposed statistics and methods.
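A small sketch of the center-outward ordering and the DD-plot idea follows, using Mahalanobis depth as a stand-in for the data depths discussed above; the simulated samples and the choice of depth are assumptions made purely for illustration.

# DD-plot sketch: depth of each pooled observation with respect to two samples.
# Mahalanobis depth is used here only for illustration; other data depths
# (halfspace, simplicial, ...) could be plugged in the same way.
import numpy as np

def mahalanobis_depth(points, sample):
    """Depth of each row of `points` relative to `sample`: 1 / (1 + squared Mahalanobis distance)."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    d = points - mu
    md2 = np.einsum('ij,jk,ik->i', d, cov_inv, d)
    return 1.0 / (1.0 + md2)

rng = np.random.default_rng(2)
x = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=200)
y = rng.multivariate_normal([0.5, 0], [[1, 0.3], [0.3, 1]], size=200)

pooled = np.vstack([x, y])
depth_wrt_x = mahalanobis_depth(pooled, x)
depth_wrt_y = mahalanobis_depth(pooled, y)

# In a DD-plot one scatters depth_wrt_x against depth_wrt_y; under F = G the
# points concentrate around the 45-degree line, and systematic departures
# indicate location, scale or skewness differences.
print(np.corrcoef(depth_wrt_x, depth_wrt_y)[0, 1])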

630 citations


Journal ArticleDOI
TL;DR: Some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations are presented, which depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Abstract: We present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.

624 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider applications where it is appropriate to assume that the region $G$ has a smooth boundary or belongs to another nonparametric class of sets and show that these rules achieve optimal rates for estimation of the set and optimal rates of convergence for Bayes risks.
Abstract: Discriminant analysis for two data sets in $\mathbb{R}^d$ with probability densities $f$ and $g$ can be based on the estimation of the set $G = \{x: f(x) \geq g(x)\}$. We consider applications where it is appropriate to assume that the region $G$ has a smooth boundary or belongs to another nonparametric class of sets. In particular, this assumption makes sense if discrimination is used as a data analytic tool. Decision rules based on minimization of empirical risk over the whole class of sets and over sieves are considered. Their rates of convergence are obtained. We show that these rules achieve optimal rates for estimation of $G$ and optimal rates of convergence for Bayes risks. An interesting conclusion is that the optimal rates for Bayes risks can be very fast, in particular, faster than the “parametric” root-$n$ rate. These fast rates cannot be guaranteed for plug-in rules.
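For contrast with the empirical-risk-minimization rules studied in the paper, here is a minimal sketch of the plug-in rule $\hat{G} = \{x: \hat{f}(x) \geq \hat{g}(x)\}$ built from kernel density estimates; the simulated data and default bandwidths are assumptions.

# Plug-in discrimination rule G_hat = {x : f_hat(x) >= g_hat(x)} from kernel
# density estimates of the two class densities (illustrative sketch; the paper
# studies empirical-risk-minimization rules, for which faster rates can hold).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
x_sample = rng.normal(loc=-1.0, scale=1.0, size=300)   # draws from f
y_sample = rng.normal(loc=+1.0, scale=1.5, size=300)   # draws from g

f_hat = gaussian_kde(x_sample)
g_hat = gaussian_kde(y_sample)

def classify(points):
    """Assign label 0 (density f) where f_hat >= g_hat, else label 1."""
    return (f_hat(points) < g_hat(points)).astype(int)

test = rng.normal(loc=-1.0, scale=1.0, size=500)   # new points truly from f
error_rate = classify(test).mean()
print(error_rate)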

490 citations


Journal ArticleDOI
TL;DR: In this paper, the posterior probability of every Hellinger neighborhood of the true distribution tends to 1 almost surely, assuming that the prior does not put high mass near distributions with very rough densities.
Abstract: We give conditions that guarantee that the posterior probability of every Hellinger neighborhood of the true distribution tends to 1 almost surely. The conditions are (1) a requirement that the prior not put high mass near distributions with very rough densities and (2) a requirement that the prior put positive mass in Kullback-Leibler neighborhoods of the true distribution. The results are based on the idea of approximating the set of distributions with a finite-dimensional set of distributions with sufficiently small Hellinger bracketing metric entropy. We apply the results to some examples.

394 citations


Journal ArticleDOI
TL;DR: In this paper, a Dirichlet mixture of normal densities is used for a prior distribution on densities in the problem of Bayesian density estimation, and the important issue of consistency was left open.
Abstract: A Dirichlet mixture of normal densities is a useful choice for a prior distribution on densities in the problem of Bayesian density estimation. In recent years, an efficient Markov chain Monte Carlo method for the computation of the posterior distribution has been developed. The method has been applied to data arising from different fields of interest. The important issue of consistency, however, was left open. In this paper, we settle this issue in the affirmative.
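As a small illustration of the prior in question (not of the paper's consistency argument or its MCMC computations), the sketch below draws one random density from a truncated Dirichlet mixture-of-normals prior by stick-breaking; the truncation level, base measure, precision and kernel scale are assumptions.

# Draw one random density from a (truncated) Dirichlet mixture of normals prior
# via stick-breaking: weights w_k from Beta(1, alpha) sticks, atoms mu_k from a
# normal base measure, and a fixed kernel standard deviation sigma.
import numpy as np

rng = np.random.default_rng(4)
alpha, K, sigma = 1.0, 50, 0.5          # DP precision, truncation level, kernel sd

v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])   # stick-breaking weights
w = w / w.sum()                                             # renormalize after truncation
mu = rng.normal(0.0, 2.0, size=K)                           # atoms from the base measure

def random_density(x):
    """Evaluate the sampled mixture density at the points x."""
    x = np.asarray(x)[:, None]
    kern = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return kern @ w

grid = np.linspace(-5, 5, 11)
print(np.round(random_density(grid), 3))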

388 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the structural properties of stationary variable length Markov chains (VLMCs) on a finite space and proposed a new bootstrap scheme based on fitted VLMCs.
Abstract: We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length yielding a much bigger and structurally richer class of models than ordinary high-order Markov chains. From an algorithmic view, the VLMC model class has attracted interest in information theory and machine learning, but statistical properties have not yet been explored. Provided that good estimation is available, the additional structural richness of the model class enhances predictive power by finding a better trade-off between model bias and variance and allowing better structural description which can be of specific interest. The latter is exemplified with some DNA data. A version of the tree-structured context algorithm, proposed by Rissanen in an information theoretical set-up is shown to have new good asymptotic properties for estimation in the class of VLMCs. This remains true even when the underlying model increases in dimensionality. Furthermore, consistent estimation of minimal state spaces and mixing properties of fitted models are given. We also propose a new bootstrap scheme based on fitted VLMCs. We show its validity for quite general stationary categorical time series and for a broad range of statistical procedures.

369 citations


Journal ArticleDOI
TL;DR: In this paper, an adaptive wavelet estimator for nonparametric regression is proposed and the optimality of the procedure is investigated, based on an oracle inequality and motivated by the data compression and localization properties of wavelets.
Abstract: We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties of wavelets, an adaptive wavelet estimator for nonparametric regression is proposed and the optimality of the procedure is investigated. We show that the estimator achieves simultaneously three objectives: adaptivity, spatial adaptivity and computational efficiency. Specifically, it is proved that the estimator attains the exact optimal rates of convergence over a range of Besov classes and the estimator achieves adaptive local minimax rate for estimating functions at a point. The estimator is easy to implement, at the computational cost of $O(n)$. Simulation shows that the estimator has excellent numerical performance relative to more traditional wavelet estimators.

1. Introduction. Wavelet methods have demonstrated considerable success in nonparametric function estimation in terms of spatial adaptivity, computational efficiency and asymptotic optimality. In contrast to the traditional linear procedures, wavelet methods achieve (near) optimal convergence rates over large function classes such as Besov classes and enjoy excellent mean squared error properties when used to estimate functions that are spatially inhomogeneous. For example, as shown by Donoho and Johnstone (1998), wavelet methods can outperform optimal linear methods, even at the level of convergence rate, over certain Besov classes. Standard wavelet methods achieve adaptivity through term-by-term thresholding of the empirical wavelet coefficients. There, each individual empirical wavelet coefficient is compared with a predetermined threshold. A wavelet coefficient is retained if its magnitude is above the threshold level and is discarded otherwise. A well-known example of term-by-term thresholding is Donoho and Johnstone's VisuShrink (Donoho and Johnstone (1994)). VisuShrink is spatially adaptive and the estimator is within a logarithmic factor of the optimal convergence rate over a wide range of Besov classes. VisuShrink achieves a degree of tradeoff between variance and bias contributions to the mean squared error. However, the tradeoff is not optimal. VisuShrink reconstruction is often over-smoothed. Hall, Kerkyacharian and Picard (1999) considered block thresholding for wavelet function estimation which thresholds empirical wavelet coefficients in …
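A rough sketch of the block-thresholding idea using PyWavelets is given below: empirical wavelet coefficients are grouped into blocks of length about $\log n$ and each block is shrunk by a James-Stein-type rule. The wavelet, threshold constant, noise-level estimate and test signal are illustrative assumptions rather than the paper's exact estimator.

# Block thresholding sketch with PyWavelets: empirical wavelet coefficients are
# grouped into blocks of length ~ log n and each block is shrunk as a whole,
# instead of thresholding coefficients one by one.
import numpy as np
import pywt

rng = np.random.default_rng(5)
n = 1024
t = np.linspace(0, 1, n)
signal = np.sin(8 * np.pi * t) * (t > 0.3)           # spatially inhomogeneous test signal
y = signal + 0.3 * rng.normal(size=n)

coeffs = pywt.wavedec(y, 'db4', level=6)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise level from finest-scale coefficients
L = int(np.ceil(np.log(n)))                           # block length ~ log n
lam = 4.5                                             # illustrative threshold constant

denoised = [coeffs[0]]                                # keep the coarse approximation
for d in coeffs[1:]:
    d = d.copy()
    for start in range(0, len(d), L):
        block = d[start:start + L]
        s2 = np.sum(block ** 2)
        shrink = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / s2) if s2 > 0 else 0.0
        d[start:start + L] = shrink * block
    denoised.append(d)

y_hat = pywt.waverec(denoised, 'db4')[:n]
print(np.mean((y_hat - signal) ** 2))                 # mean squared error of the reconstruction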

366 citations


Journal ArticleDOI
TL;DR: In this paper, the authors take a detailed look at some of this evidence, examining the sources of the differences, and show that plug-in methods are heavily dependent on an arbitrary specification of pilot bandwidths and fail when this specification is wrong.
Abstract: Bandwidth selection for procedures such as kernel density estimation and local regression has been widely studied over the past decade. Substantial “evidence” has been collected to establish superior performance of modern plug-in methods in comparison to methods such as cross validation; this has ranged from detailed analysis of rates of convergence, to simulations, to superior performance on real datasets. In this work we take a detailed look at some of this evidence, looking into the sources of differences. Our findings challenge the claimed superiority of plug-in methods on several fronts. First, plug-in methods are heavily dependent on arbitrary specification of pilot bandwidths and fail when this specification is wrong. Second, the often-quoted variability and undersmoothing of cross validation simply reflects the uncertainty of bandwidth selection; plug-in methods reflect this uncertainty by oversmoothing and missing important features when given difficult problems. Third, we look at asymptotic theory. Plug-in methods use available curvature information in an inefficient manner, resulting in inefficient estimates. Previous comparisons with classical approaches penalized the classical approaches for this inefficiency. Asymptotically, the plug-in based estimates are beaten by their own pilot estimates.
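As a reference point for the comparison above, here is a minimal sketch of least-squares cross-validation for the bandwidth of a Gaussian kernel density estimate, one of the classical selectors under discussion; the data and bandwidth grid are assumptions.

# Least-squares cross-validation for the bandwidth of a Gaussian kernel density
# estimate: minimize an unbiased-risk-style criterion over a grid of bandwidths.
import numpy as np

def phi(u, s):
    """Gaussian density with standard deviation s."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))

def lscv(x, h):
    """LSCV(h) = int f_hat^2 - (2/n) * sum_i f_hat_{-i}(x_i) for a Gaussian kernel."""
    n = len(x)
    diff = x[:, None] - x[None, :]
    int_fhat2 = phi(diff, np.sqrt(2) * h).sum() / n ** 2      # closed form for the integral
    k = phi(diff, h)
    loo = (k.sum(axis=1) - phi(0.0, h)) / (n - 1)             # leave-one-out density at each x_i
    return int_fhat2 - 2.0 * loo.mean()

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 0.5, 150)])
grid = np.linspace(0.05, 1.0, 40)
scores = [lscv(x, h) for h in grid]
print(grid[int(np.argmin(scores))])     # the cross-validated bandwidth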

Journal ArticleDOI
TL;DR: In this article, the authors compare the asymptotic behavior of some common block bootstrap methods based on nonrandom as well as random block lengths, and show that using overlapping blocks is to be preferred over using nonoverlapping blocks and that using random block lengths typically leads to mean-squared errors larger than those for nonrandom block lengths.
Abstract: In this paper, we compare the asymptotic behavior of some common block bootstrap methods based on nonrandom as well as random block lengths. It is shown that, asymptotically, bootstrap estimators derived using any of the methods considered in the paper have the same amount of bias to the first order. However, the variances of these bootstrap estimators may be different even in the first order. Expansions for the bias, the variance and the mean-squared error of different block bootstrap variance estimators are obtained. It follows from these expansions that using overlapping blocks is to be preferred over nonoverlapping blocks and that using random block lengths typically leads to mean-squared errors larger than those for nonrandom block lengths.
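A minimal sketch contrasting the overlapping (moving) and nonoverlapping block bootstrap for the variance of a sample mean is given below; the AR(1) example, block length and number of resamples are assumptions.

# Overlapping (moving) versus nonoverlapping block bootstrap estimates of the
# variance of the sample mean of a dependent series (illustrative sketch).
import numpy as np

rng = np.random.default_rng(7)
n, block_len, n_boot = 500, 10, 2000
# AR(1) series as a simple dependent example
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()

def block_bootstrap_var(x, l, overlapping, n_boot):
    """Bootstrap estimate of Var(sqrt(N) * mean) using blocks of length l."""
    n = len(x)
    k = n // l                                    # number of blocks per resample
    if overlapping:
        starts_pool = np.arange(n - l + 1)        # all overlapping block starts
    else:
        starts_pool = np.arange(0, k * l, l)      # disjoint block starts
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.choice(starts_pool, size=k, replace=True)
        resample = np.concatenate([x[s:s + l] for s in starts])
        means[b] = resample.mean()
    return (k * l) * means.var()

print(block_bootstrap_var(x, block_len, True, n_boot),
      block_bootstrap_var(x, block_len, False, n_boot))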

Journal ArticleDOI
TL;DR: In this article, the authors derive the asymptotic distribution of a new backfitting procedure for estimating the closest additive approximation to a nonparametric regression function, which employs a recent projection interpretation of popular kernel estimators provided by Mammen, Marron, Turlach and Wand.
Abstract: We derive the asymptotic distribution of a new backfitting procedure for estimating the closest additive approximation to a nonparametric regression function. The procedure employs a recent projection interpretation of popular kernel estimators provided by Mammen, Marron, Turlach and Wand and the asymptotic theory of our estimators is derived using the theory of additive projections reviewed in Bickel, Klaassen, Ritov and Wellner. Our procedure achieves the same bias and variance as the oracle estimator based on knowing the other components, and in this sense improves on the method analyzed in Opsomer and Ruppert. We provide ‘‘high level’’ conditions independent of the sampling scheme. We then verify that these conditions are satisfied in a regression and a time series autoregression under weak conditions.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the partially linear model relating a response $Y$ to predictors ($X, T$) with mean function $X^{\top}\beta + g(T)$ when the $X$s are measured with additive error.
Abstract: We consider the partially linear model relating a response $Y$ to predictors ($X, T$) with mean function $X^{\top}\beta + g(T)$ when the $X$’s are measured with additive error. The semiparametric likelihood estimate of Severini and Staniswalis leads to biased estimates of both the parameter $\beta$ and the function $g(\cdot)$ when measurement error is ignored. We derive a simple modification of their estimator which is a semiparametric version of the usual parametric correction for attenuation. The resulting estimator of $\beta$ is shown to be consistent and its asymptotic distribution theory is derived. Consistent standard error estimates using sandwich-type ideas are also developed.

Journal ArticleDOI
TL;DR: In this paper, two semiparametric methods for accommodating departures from a Pareto model when estimating a tail exponent by fitting the model to extreme-value data are proposed.
Abstract: We suggest two semiparametric methods for accommodating departures from a Pareto model when estimating a tail exponent by fitting the model to extreme-value data. The methods are based on approximate likelihood and on least squares, respectively. The latter is somewhat simpler to use and more robust against departures from classical extreme-value approximations, but produces estimators with approximately 64% greater variance when conventional extreme-value approximations are appropriate. Relative to the conventional assumption that the sampling population has exactly a Pareto distribution beyond a threshold, our methods reduce bias by an order of magnitude without inflating the order of variance. They are motivated by data on extrema of community sizes and are illustrated by an application in that context.
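The sketch below illustrates the general task of fitting a Pareto tail to the largest observations, using the Hill estimator and a simple least-squares fit on the log-log plot; these are generic stand-ins chosen for illustration, not the approximate-likelihood and least-squares estimators developed in the paper.

# Two quick tail-exponent estimates from the k largest observations: the Hill
# estimator and a least-squares fit on the log-log (Zipf) plot.
import numpy as np

rng = np.random.default_rng(8)
alpha_true = 2.5
x = rng.pareto(alpha_true, size=5000) + 1.0      # Pareto(alpha) sample on [1, inf)

k = 200                                          # number of upper order statistics used
order = np.sort(x)[::-1][:k + 1]                 # the k+1 largest values, descending

# Hill estimator of 1/alpha from the k largest observations
hill_inv_alpha = np.mean(np.log(order[:k]) - np.log(order[k]))
hill_alpha = 1.0 / hill_inv_alpha

# Least-squares fit: log of the empirical tail probability against log value
log_rank = np.log(np.arange(1, k + 1) / len(x))  # log of (i/n), i = 1..k
log_val = np.log(order[:k])
slope, intercept = np.polyfit(log_val, log_rank, 1)
ls_alpha = -slope                                # tail P(X > t) ~ t^(-alpha)

print(hill_alpha, ls_alpha)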

Journal ArticleDOI
TL;DR: In this article, a class of tests useful for testing the goodness-of-fit of an autoregressive model is studied; the tests are based on empirical processes marked by certain residuals, and a martingale transformation of the underlying process makes the resulting tests asymptotically distribution free.
Abstract: This paper studies a class of tests useful for testing the goodness-of-fit of an autoregressive model. These tests are based on a class of empirical processes marked by certain residuals. The paper first gives their large sample behavior under null hypotheses. Then a martingale transformation of the underlying process is given that makes tests based on it asymptotically distribution free. Consistency of these tests is also discussed briefly.

Journal ArticleDOI
TL;DR: This paper addresses the problem of testing hypotheses using the likelihood ratio test statistic in nonidentifiable models, with application to model selection in situations where the parametrization for the larger model leads to nonidentifiability in the smaller model.
Abstract: In this paper, we address the problem of testing hypotheses using the likelihood ratio test statistic in nonidentifiable models, with application to model selection in situations where the parametrization for the larger model leads to nonidentifiability in the smaller model. We give two major applications: the case where the number of populations has to be tested in a mixture and the case of stationary ARMA$(p, q)$ processes where the order $(p, q)$ has to be tested. We give the asymptotic distribution for the likelihood ratio test statistic when testing the order of the model. In the case of order selection for ARMAs, the asymptotic distribution is invariant with respect to the parameters generating the process. A locally conic parametrization is a key tool in deriving the limiting distributions; it allows one to discover the deep similarity between the two problems.

Journal ArticleDOI
TL;DR: In this paper, the authors give examples showing that, even for the simplest infinite-dimensional models, the posterior distribution of the parameter vector around the posterior mean need not be close to the distribution of the maximum likelihood estimate around the truth, so Bayesian confidence sets need not have good frequentist coverage.
Abstract: If there are many independent, identically distributed observations governed by a smooth, finite-dimensional statistical model, the Bayes estimate and the maximum likelihood estimate will be close. Furthermore, the posterior distribution of the parameter vector around the posterior mean will be close to the distribution of the maximum likelihood estimate around truth. Thus, Bayesian confidence sets have good frequentist coverage properties, and conversely. However, even for the simplest infinite-dimensional models, such results do not hold. The object here is to give some examples.

Journal ArticleDOI
TL;DR: In this article, a relaxed variant of the generalized resolution and minimum aberration criterion is proposed and studied, which minimizes the contamination of nonnegligible interactions on the estimation of main effects in the order of importance given by the hierarchical assumption.
Abstract: Deng and Tang proposed generalized resolution and minimum aberration criteria for comparing and assessing nonregular fractional factorials, of which Plackett–Burman designs are special cases. A relaxed variant of generalized aberration is proposed and studied in this paper. We show that a best design according to this criterion minimizes the contamination of nonnegligible interactions on the estimation of main effects in the order of importance given by the hierarchical assumption. The new criterion is defined through a set of $B$ values, a generalization of word length pattern. We derive some theoretical results that relate the $B$ values of a nonregular fractional factorial and those of its complementary design. Application of this theory to the construction of the best designs according to the new aberration criterion is discussed. The results in this paper generalize those in Tang and Wu, which characterize a minimum aberration (regular) $2^{m-k}$ design through its complementary design.
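A small sketch of computing the $B$ values (the generalized word length pattern) of a two-level design from its $J$-characteristics follows; the 8-run design obtained from a Hadamard matrix is only a stand-in example, not one of the designs constructed in the paper.

# Compute J-characteristics and the generalized word length pattern (B values)
# of a two-level design: J(s) = |sum over runs of the product of the columns
# in s|, and B_k sums (J(s)/n)^2 over all k-subsets s of columns.
from itertools import combinations
import numpy as np
from scipy.linalg import hadamard

H = hadamard(8)             # 8-run example; drop the all-ones column to get 7 factors
D = H[:, 1:]

def B_values(D):
    n, m = D.shape
    B = np.zeros(m + 1)
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            J = abs(D[:, list(cols)].prod(axis=1).sum())
            B[k] += (J / n) ** 2
    return B[1:]            # B_1, ..., B_m

print(np.round(B_values(D), 3))
# A design with smaller (B_1, B_2, B_3, ...) in sequential order has less
# generalized aberration; the paper relates these values to those of the
# complementary design.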

Journal ArticleDOI
TL;DR: In this paper, an estimator of the memory parameter of a stationary long-memory time series originally proposed by Robinson was presented, which is based on estimating the short-memory component of the spectral density of the process over all the frequency range.
Abstract: This paper discusses the properties of an estimator of the memory parameter of a stationary long-memory time series originally proposed by Robinson. As opposed to “narrow-band” estimators of the memory parameter (such as the Geweke and Porter-Hudak or the Gaussian semiparametric estimators) which use only the periodogram ordinates belonging to an interval which degenerates to zero as the sample size $n$ increases, this estimator builds a model of the spectral density of the process over all the frequency range, hence the name, “broadband.” This is achieved by estimating the “short-memory” component of the spectral density, $f^*(x) = |1 - e^{ix}|^{2d}f(x)$, where $d \in (-1/2, 1/2)$ is the memory parameter and $f(x)$ is the spectral density, by means of a truncated Fourier series estimator of $\log f^*$. Assuming Gaussianity and additional conditions on the regularity of $f^*$ which seem mild, we obtain expressions for the asymptotic bias and variance of the long-memory parameter estimator as a function of the truncation order. Under additional assumptions, we show that this estimator is consistent and asymptotically normal. If the true spectral density is sufficiently smooth outside the origin, this broadband estimator outperforms existing semiparametric estimators, attaining an asymptotic mean-square error $O(\log(n)/n)$.

Journal ArticleDOI
TL;DR: In this paper, the authors characterize coherent design criteria which depend only on the dispersion matrix (assumed proper and nonsingular) of the state of nature, which may be a parameter vector or a set of future observables, and describe the associated decision problems.
Abstract: We characterize those coherent design criteria which depend only on the dispersion matrix (assumed proper and nonsingular) of the “state of nature,” which may be a parameter-vector or a set of future observables, and describe the associated decision problems. Connections are established with the classical approach to optimal design theory for the normal linear model, based on concave functions of the information matrix. Implications of the theory for more general models are also considered.

Journal ArticleDOI
TL;DR: In this paper, a consistent test is proposed which is based on the difference of the least squares variance estimator in the assumed regression model and a nonparametric variance estimator, and the corresponding test statistic can be shown to be asymptotically normal under the null hypothesis and under fixed alternatives, with different rates of convergence corresponding to the two cases.
Abstract: In this paper we study the problem of testing the functional form of a given regression model. A consistent test is proposed which is based on the difference of the least squares variance estimator in the assumed regression model and a nonparametric variance estimator. The corresponding test statistic can be shown to be asymptotically normal under the null hypothesis and under fixed alternatives with different rates of convergence corresponding to both cases. This provides a simple asymptotic test, where the asymptotic results can also be used for the calculation of the type II error of the procedure at any particular point of the alternative and for the construction of tests for precise hypotheses. Finally, the finite sample performance of the new test is investigated in a detailed simulation study, which also contains a comparison with the commonly used tests.
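A minimal sketch of the idea behind the test, comparing the model-based residual variance with a model-free variance estimate, is shown below; the linear null model, the sinusoidal alternative and the Rice-type difference-based variance estimator are illustrative assumptions, and the standardization needed for a formal asymptotic test is not reproduced.

# Sketch of the idea behind the test: compare the residual variance from the
# fitted parametric model with a difference-based variance estimate that does
# not assume the functional form; a large positive gap indicates lack of fit.
import numpy as np

rng = np.random.default_rng(9)
n = 300
x = np.sort(rng.uniform(0, 1, n))
y = 1.0 + 2.0 * x + 0.8 * np.sin(4 * np.pi * x) + 0.2 * rng.normal(size=n)  # truth is not linear

# Variance estimate under the assumed (linear) regression model
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta) ** 2)
var_model = rss / (n - X.shape[1])

# Nonparametric difference-based variance estimate (Rice-type), valid for any
# smooth regression function when the design points are ordered
var_np = np.sum(np.diff(y) ** 2) / (2 * (n - 1))

print(var_model, var_np, var_model - var_np)   # gap > 0 suggests the linear model is wrong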

Journal ArticleDOI
TL;DR: In this paper, it was shown that the multivariate two-sample test based on the number of edges in the minimal spanning tree is asymptotically distribution-free.
Abstract: For independent $d$-variate random variables $X_1,\dots,X_m$ with common density $f$ and $Y_1,\dots,Y_n$ with common density $g$, let $R_{m,n}$ be the number of edges in the minimal spanning tree with vertices $X_1,\dots,X_m$, $Y_1,\dots,Y_n$ that connect points from different samples. Friedman and Rafsky conjectured that a test of $H_0: f = g$ that rejects $H_0$ for small values of $R_{m,n}$ should have power against general alternatives. We prove that $R_{m,n}$ is asymptotically distribution-free under $H_0$ , and that the multivariate two-sample test based on $R_{m,n}$ is universally consistent.
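A minimal sketch of the statistic $R_{m,n}$, the number of cross-sample edges in the minimal spanning tree of the pooled sample, computed with SciPy; the simulated samples and dimensions are assumptions.

# Friedman-Rafsky-type statistic: build the minimal spanning tree of the pooled
# sample and count edges joining points from different samples; unusually few
# cross-sample edges are evidence against H0: f = g.
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(10)
m, n, d = 100, 100, 3
x = rng.normal(0.0, 1.0, size=(m, d))
y = rng.normal(0.5, 1.0, size=(n, d))          # shifted mean, so H0 is false here

pooled = np.vstack([x, y])
labels = np.array([0] * m + [1] * n)

mst = minimum_spanning_tree(distance_matrix(pooled, pooled))
rows, cols = mst.nonzero()                      # the m + n - 1 edges of the tree
R = int(np.sum(labels[rows] != labels[cols]))   # number of cross-sample edges

print(R)   # compare with its null distribution (e.g., by permuting the labels)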

Journal ArticleDOI
TL;DR: The partly linear additive Cox model is an extension of the (linear) Cox model and allows flexible modeling of covariate effects semiparametrically.
Abstract: The partly linear additive Cox model is an extension of the (linear) Cox model and allows flexible modeling of covariate effects semiparametrically. We study asymptotic properties of the maximum partial likelihood estimator of this model with right-censored data using polynomial splines. We show that, with a range of choices of the smoothing parameter (the number of spline basis functions) required for estimation of the nonparametric components, the estimator of the finite-dimensional regression parameter is root-$n$ consistent, asymptotically normal and achieves the semiparametric information bound. Rates of convergence for the estimators of the nonparametric components are obtained. They are comparable to the rates in nonparametric regression. Implementation of the estimation approach can be done easily and is illustrated by using a simulated example.

Journal ArticleDOI
TL;DR: In this article, the authors use the expected Euler characteristic (EC) of the excursion set of a random field above a threshold to control the null probability of detecting local shape changes, give an exact expression for the expected EC of a Hotelling’s $T^2$ field, and study the behavior of the field near local extrema.
Abstract: This paper is motivated by the problem of detecting local changes or differences in shape between two samples of objects via the nonlinear deformations required to map each object to an atlas standard. Local shape changes are then detected by high values of the random field of Hotelling’s $T^2$ statistics for detecting a change in mean of the vector deformations at each point in the object. To control the null probability of detecting a local shape change, we use the recent result of Adler that the probability that a random field crosses a high threshold is very accurately approximated by the expected Euler characteristic (EC) of the excursion set of the random field above the threshold. We give an exact expression for the expected EC of a Hotelling’s $T^2$ field, and we study the behavior of the field near local extrema. This extends previous results for Gaussian random fields by Adler and $\chi^2$, $t$ and $F$ fields by Worsley and Cao. For illustration, these results are applied to the detection of differences in brain shape between a sample of 29 males and 23 females.

Journal ArticleDOI
TL;DR: In this article, strong consistency for maximum quasi-likelihood estimators of regression parameters in generalized linear regression models is studied, and for adaptive designs a sufficient condition for strong consistency is that the ratio of the minimum eigenvalue of the design matrix $\Sigma x_i x^\prime_i$ to the logarithm of its maximum eigenvalue goes to infinity.
Abstract: Strong consistency for maximum quasi-likelihood estimators of regression parameters in generalized linear regression models is studied. Results parallel to the elegant work of Lai, Robbins and Wei and Lai and Wei on least squares estimation under both fixed and adaptive designs are obtained. Let $y_1,\dots, y_n$ and $x_1,\dots, x_n$ be the observed responses and their corresponding design points ($p \times 1$ vectors), respectively. For fixed designs, it is shown that if the minimum eigenvalue of $\Sigma x_i x^\prime_i$ goes to infinity, then the maximum quasi-likelihood estimator for the regression parameter vector is strongly consistent. For adaptive designs, it is shown that a sufficient condition for strong consistency to hold is that the ratio of the minimum eigenvalue of $\Sigma x_i x^\prime_i$ to the logarithm of the maximum eigenvalue goes to infinity. Use of the results for the adaptive design case in quantal response experiments is also discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors consider hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form $\psi(\alpha + \mathbf{x}^T \mathbf{\beta})$ are mixed, and show that the HME probability density functions can approximate the true density at a rate of $O(m^{-2/s})$ in Hellinger distance and at a rate of $O(m^{-4/s})$ in Kullback–Leibler divergence, where $m$ is the number of experts and $s$ is the dimension of the predictor.
Abstract: We consider hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form $\psi(\alpha + \mathbf{x}^T \mathbf{\beta})$ are mixed. Here $\psi(\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model with mean function belonging to a class of smooth functions of the form $\psi(h(\mathbf{x}))$ where $h(\cdot)\in W_{2; K_0}^{\infty}$ (a Sobolev class over $[0, 1]^s$). It is shown that the HME probability density functions can approximate the true density, at a rate of $O(m^{-2/s})$ in Hellinger distance and at a rate of $O(m^{-4/s})$ in Kullback–Leibler divergence, where $m$ is the number of experts, and $s$ is the dimension of the predictor $x$. We also provide conditions under which the mean-square error of the estimated mean response obtained from the maximum likelihood method converges to zero, as the sample size and the number of experts both increase.

Journal ArticleDOI
TL;DR: In this paper, the dimension reduction approach of Li (1991) is extended to settings which allow for censoring in the data; a key identity leading to the bias correction is derived, and the root-n consistency of the modified estimate is established.
Abstract: Without parametric assumptions, high-dimensional regression analysis is already complex. This is made even harder when data are subject to censoring. In this article, we seek ways of reducing the dimensionality of the regressor before applying nonparametric smoothing techniques. If the censoring time is independent of the lifetime, then the method of sliced inverse regression can be applied directly. Otherwise, modification is needed to adjust for the censoring bias. A key identity leading to the bias correction is derived and the root-n consistency of the modified estimate is established. Patterns of censoring can also be studied under a similar dimension reduction framework. Some simulation results and an application to a real data set are reported.

1. Introduction. Survival data are often subject to censoring. When this occurs, the incompleteness of the observed data may induce a substantial bias in the sample. Several approaches have been suggested to overcome the associated difficulties in regression, including the accelerated failure time model, censored linear regression, the Cox proportional hazard model and many others. Survival analysis becomes even more intricate when the dimension of the regressor increases. To apply any of the aforementioned methods, users are required to specify a functional form which relates the outcome variables to the input ones. However, in reality, knowledge needed for an appropriate model specification is often inadequate. As a matter of fact, the acquisition of such information may well turn out to be one of the primary goals of the study itself. Under such circumstances, it seems preferable to have exploratory tools that rely less on such model specification. This is the issue to be addressed in this article. The dimension reduction approach of Li (1991) will be extended to settings which allow for censoring in the data. We shall offer methods of finding low-dimensional projections of the data for visually examining the censoring pattern. We shall show how censored regression data can still be analyzed without assuming the functional form a priori.
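A minimal sketch of sliced inverse regression on uncensored data, the building block that the paper adapts to censored responses, is given below; the single-index data, number of slices and number of directions are assumptions.

# Sliced inverse regression (SIR) sketch for uncensored data: standardize X,
# slice the sample on the response, average the standardized X within slices,
# and take leading eigenvectors of the covariance of these slice means as the
# estimated effective dimension-reduction directions.
import numpy as np

rng = np.random.default_rng(11)
n, p = 1000, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = np.exp(X @ beta) + 0.5 * rng.normal(size=n)            # single-index truth

def sir_directions(X, y, n_slices=10, n_dir=1):
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(cov)
    cov_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T            # covariance^(-1/2)
    Z = (X - mu) @ cov_inv_sqrt
    M = np.zeros((X.shape[1], X.shape[1]))
    for chunk in np.array_split(np.argsort(y), n_slices):  # slices of roughly equal size
        zbar = Z[chunk].mean(axis=0)
        M += (len(chunk) / len(y)) * np.outer(zbar, zbar)
    evals, evecs = np.linalg.eigh(M)
    directions = cov_inv_sqrt @ evecs[:, -n_dir:]          # map back to the original scale
    return directions / np.linalg.norm(directions, axis=0)

print(np.round(sir_directions(X, y).ravel(), 2))           # should be close to +/- beta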

Journal ArticleDOI
TL;DR: This paper generalizes the results of Bickel, Ritov and Ryden to state space models, where the latent process is a continuous state Markov chain satisfying regularity conditions, which are fulfilled if the latent process takes values in a compact space.
Abstract: State space models are a very general class of time series models capable of modelling dependent observations in a natural and interpretable way. Inference in such models has been studied by Bickel, Ritov and Ryden, who consider hidden Markov models, which are special kinds of state space models, and prove that the maximum likelihood estimator is asymptotically normal under mild regularity conditions. In this paper we generalize the results of Bickel, Ritov and Ryden to state space models, where the latent process is a continuous state Markov chain satisfying regularity conditions, which are fulfilled if the latent process takes values in a compact space.