
Showing papers in "Annals of Statistics in 1993"


Journal ArticleDOI
TL;DR: In this paper, it was shown that the least squares estimator of a stationary ergodic threshold autoregressive model is strongly consistent, that the estimator of the threshold parameter is N-consistent with a limiting distribution related to a compound Poisson process, and the limiting distribution of the least squares estimator is derived.
Abstract: It is shown that, under some regularity conditions, the least squares estimator of a stationary ergodic threshold autoregressive model is strongly consistent. The limiting distribution of the least squares estimator is derived. It is shown that the estimator of the threshold parameter is N-consistent and its limiting distribution is related to a compound Poisson process.
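
As a concrete illustration of conditional least squares for a threshold autoregression (a minimal sketch, not the paper's own algorithm; the two-regime AR(1) form, the quantile grid for the threshold search and the simulated example are illustrative assumptions):

```python
import numpy as np

def fit_setar(x):
    """Least squares fit of a two-regime threshold AR(1):
    x_t = (a0 + a1 x_{t-1}) 1{x_{t-1} <= r} + (b0 + b1 x_{t-1}) 1{x_{t-1} > r} + e_t.
    The threshold r is chosen by grid search over sample quantiles,
    minimizing the pooled residual sum of squares."""
    y, z = x[1:], x[:-1]                                   # response and lagged regressor
    best = (np.inf, None, None, None)
    for r in np.quantile(z, np.linspace(0.15, 0.85, 71)):  # keep both regimes populated
        sse, coefs = 0.0, []
        for mask in (z <= r, z > r):
            X = np.column_stack([np.ones(mask.sum()), z[mask]])
            beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
            sse += np.sum((y[mask] - X @ beta) ** 2)
            coefs.append(beta)
        if sse < best[0]:
            best = (sse, r, coefs[0], coefs[1])
    return best                                            # (SSE, r_hat, lower coefs, upper coefs)

# toy check: simulate a threshold AR(1) and recover the threshold (true r = 0)
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = (0.5 if x[t-1] <= 0.0 else -0.5) * x[t-1] + rng.normal()
print(fit_setar(x)[1])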

1,332 citations


Journal ArticleDOI
TL;DR: In this paper, the wild bootstrap is proposed for the integrated squared difference between a parametric and a nonparametric curve estimate, after showing that the standard way of bootstrapping this statistic fails, and the method is applied to fitting Engel curves in expenditure data analysis.
Abstract: In general, there will be visible differences between a parametric and a nonparametric curve estimate. It is therefore quite natural to compare these in order to decide whether the parametric model could be justified. An asymptotic quantification is the distribution of the integrated squared difference between these curves. We show that the standard way of bootstrapping this statistic fails. We use and analyse a different form of bootstrapping for this task. We call this method the wild bootstrap and apply it to fitting Engel curves in expenditure data analysis.
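
A minimal sketch of the wild-bootstrap resampling step (the two-point weight distribution of Mammen is one common choice and an assumption here; in the comparison above, each bootstrap sample would be used to recompute the integrated squared difference between the two curve fits):

```python
import numpy as np

def wild_bootstrap_samples(y, fitted, n_boot=999, rng=None):
    """Wild bootstrap: y*_i = fitted_i + v_i * e_i, where e_i are residuals
    and the v_i are i.i.d. weights with mean 0 and variance 1, so each
    resampled error keeps the scale of its own residual (heteroscedasticity
    is preserved, unlike naive residual resampling)."""
    rng = rng or np.random.default_rng()
    e = y - fitted
    # Mammen's two-point weights: mean 0, variance 1 (and third moment 1)
    a, b = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
    p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))               # P(v = a)
    v = rng.choice([a, b], size=(n_boot, len(y)), p=[p, 1 - p])
    return fitted + v * e                                  # (n_boot, n) array
```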

1,229 citations


Journal ArticleDOI
TL;DR: In this article, a smooth version of local linear regression estimators is introduced and the MSE and MISE of the estimators are computed explicitly, and connections of the minimax risk with the modulus of continuity are made.
Abstract: In this paper we introduce a smooth version of local linear regression estimators and address their advantages. The MSE and MISE of the estimators are computed explicitly. It turns out that the local linear regression smoothers have nice sampling properties and high minimax efficiency: they are not only efficient in rates but also nearly efficient in constant factors. In the nonparametric regression context, the asymptotic minimax lower bound is developed via the heuristic of the "hardest one-dimensional subproblem" of Donoho and Liu. Connections of the minimax risk with the modulus of continuity are made. The lower bound is also applicable for estimating the conditional mean (regression) and conditional quantiles for both fixed and random design regression problems.
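
For readers who want the estimator itself rather than its risk theory, here is a minimal local linear smoother (a sketch under an assumed Gaussian kernel; the paper's smoothed variant and efficiency analysis are not reproduced):

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear regression at x0: weighted least squares fit of a line in
    a kernel neighbourhood of x0; the local intercept estimates m(x0)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)           # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])   # local design, centred at x0
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta[0]

# example: smooth a noisy sine curve on a grid
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 300))
y = np.sin(x) + 0.3 * rng.normal(size=300)
m_hat = [local_linear(x, y, x0, h=0.3) for x0 in np.linspace(0.5, 5.5, 11)]
```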

922 citations


Journal ArticleDOI
TL;DR: In this article, two bootstrap procedures are considered for the estimation of the distribution of linear contrasts and of F-test statistics in high dimensional linear models, where the dimension p of the model may increase for sample size $n\rightarrow\infty$.
Abstract: In this paper two bootstrap procedures are considered for the estimation of the distribution of linear contrasts and of F-test statistics in high dimensional linear models. An asymptotic approach will be chosen where the dimension p of the model may increase for sample size $n\rightarrow\infty$. The range of validity will be compared for the normal approximation and for the bootstrap procedures. Furthermore, it will be argued that the rates of convergence are different for the bootstrap procedures in this asymptotic framework. This is in contrast to the usual asymptotic approach where p is fixed.

865 citations


Journal ArticleDOI
TL;DR: In this article, a simple empirical rule for selecting the bandwidth appropriate to single-index models is proposed and studied in a small simulation study and an application to binary response models.
Abstract: Single-index models generalize linear regression. They have applications to a variety of fields, such as discrete choice analysis in econometrics and dose response models in biometrics, where high-dimensional regression models are often employed. Single-index models are similar to the first step of projection pursuit regression, a dimension-reduction method. In both cases the orientation vector can be estimated root-n consistently, even if the unknown univariate function (or nonparametric link function) is assumed to come from a large smoothness class. However, as we show in the present paper, the similarities end there. In particular, the amount of smoothing necessary for root-n consistent orientation estimation is very different in the two cases. We suggest a simple, empirical rule for selecting the bandwidth appropriate to single-index models. This rule is studied in a small simulation study and an application in binary response models.

700 citations


Journal ArticleDOI
TL;DR: In this article, a hyper Markov law is defined as a probability distribution over a set of probability measures on a multivariate space that is concentrated on the set of Markov probabilities over some decomposable graph, and satisfies certain conditional independence restrictions related to that graph.
Abstract: This paper introduces and investigates the notion of a hyper Markov law, which is a probability distribution over the set of probability measures on a multivariate space that (i) is concentrated on the set of Markov probabilities over some decomposable graph, and (ii) satisfies certain conditional independence restrictions related to that graph. A stronger version of this hyper Markov property is also studied. Our analysis starts by reconsidering the properties of Markov probabilities, using an abstract approach which thereafter proves equally applicable to the hyper Markov case. Next, it is shown constructively that hyper Markov laws exist, that they appear as sampling distributions of maximum likelihood estimators in decomposable graphical models, and also that they form natural conjugate prior distributions for a Bayesian analysis of these models. As examples we construct a range of specific hyper Markov laws, including the hyper multinomial, hyper Dirichlet and the hyper Wishart and inverse Wishart laws. These laws occur naturally in connection with the analysis of decomposable log-linear and covariance selection models.

548 citations


Journal ArticleDOI
TL;DR: Two notions of multifold cross validation (MCV and MCV*) are considered, and it turns out that MCV indeed reduces the chance of overfitting.
Abstract: A natural extension of the simple leave-one-out cross validation (CV) method is to allow the deletion of more than one observation. In this article, several notions of the multifold cross validation (MCV) method are discussed. In the context of variable selection under a linear regression model, we show that the delete-d MCV criterion is asymptotically equivalent to the well known FPE criterion. Two computationally more feasible methods, the r-fold cross validation and the repeated learning-testing criterion, are also studied. The performance of these criteria is compared with the simple leave-one-out cross validation method. Simulation results are obtained to gain some understanding of the small sample properties of these methods.
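
A compact sketch of the r-fold variant discussed above (illustrative only; the fold count, OLS fitting and the random split are assumptions, and delete-d MCV would instead average over many deleted subsets of size d):

```python
import numpy as np

def rfold_cv_score(X, y, r=5, rng=None):
    """r-fold cross validation for a linear model: split into r folds, fit
    OLS on the rest, accumulate squared prediction error on the held-out
    fold.  Comparing this score across candidate variable subsets gives a
    computationally feasible model-selection criterion."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(y))
    score = 0.0
    for fold in np.array_split(idx, r):
        train = np.setdiff1d(idx, fold)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        score += np.sum((y[fold] - X[fold] @ beta) ** 2)
    return score / len(y)

# variable selection: evaluate a submodel by selecting its columns, e.g.
# rfold_cv_score(X[:, [0, 2]], y) versus rfold_cv_score(X, y)
```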

531 citations


Journal ArticleDOI
TL;DR: In this article, the shape of low-dimensional projections from high-dimensional data has been studied and it has been shown that for most directions, even the most nonlinear regression is still nearly linear.
Abstract: This paper studies the shapes of low dimensional projections from high dimensional data. After standardization, let $\mathbf{x}$ be a $p$-dimensional random variable with mean zero and identity covariance. For a projection $\beta'\mathbf{x}, \|\beta\| = 1$, find another direction $b$ so that the regression curve of $b'\mathbf{x}$ against $\beta'\mathbf{x}$ is as nonlinear as possible. We show that when the dimension of $\mathbf{x}$ is large, for most directions $\beta$ even the most nonlinear regression is still nearly linear. Our method depends on the construction of a pair of $p$-dimensional random variables, $\mathbf{w}_1, \mathbf{w}_2$, called the rotational twin, and its density function with respect to the standard normal density. With this, we are able to obtain closed form expressions for measuring deviation from normality and deviation from linearity in a suitable sense of average. As an interesting by-product, from a given set of data we can find simple unbiased estimates of $E(f_{\beta'\mathbf{x}}(t)/\phi_1(t) - 1)^2$ and $E\lbrack (\|E(\mathbf{x} \mid \beta, \beta'\mathbf{x} = t)\|^2 - t^2)f^2_{\beta'\mathbf{x}}(t)/\phi^2_1(t)\rbrack$, where $\phi_1$ is the standard normal density, $f_{\beta'\mathbf{x}}$ is the density for $\beta'\mathbf{x}$ and the expectation $E$ is taken with respect to the uniformly distributed $\beta$. This is achieved without any smoothing and without resorting to any laborious projection procedures such as grand tours. Our result is related to the work of Diaconis and Freedman. The impact of our result on several fronts of data analysis is discussed. For example, it helps establish the validity of regression analysis when the link function of the regression model may be grossly wrong. A further generalization, which replaces $\beta'\mathbf{x}$ by $B'\mathbf{x}$ with $B = (\beta_1,\ldots, \beta_k)$ for $k$ randomly selected orthonormal vectors $(\beta_i, i = 1,\ldots, k)$, helps broaden the scope of application of sliced inverse regression (SIR).

394 citations


Journal ArticleDOI
TL;DR: In this article, the effect of errors in variables in nonparametric regression estimation is examined, and it is shown that the optimal local and global rates of convergence of these kernel estimators can be characterized by the tail behavior of the characteristic function of the error distribution.
Abstract: The effect of errors in variables in nonparametric regression estimation is examined. To account for errors in covariates, deconvolution is involved in the construction of a new class of kernel estimators. It is shown that optimal local and global rates of convergence of these kernel estimators can be characterized by the tail behavior of the characteristic function of the error distribution. In fact, there are two types of rates of convergence according to whether the error is ordinary smooth or super smooth. It is also shown that these results hold uniformly over a class of joint distributions of the response and the covariate, which is rich enough for many practical applications. Furthermore, to achieve optimality, we show that the convergence rates of all possible estimators have a lower bound possessed by the kernel estimators.
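
To make the deconvolution idea concrete, here is a sketch for one "ordinary smooth" case, Laplace measurement error, where the deconvoluting kernel has a closed form (the Gaussian base kernel, the Laplace error model and the derivation K*(z) = K(z) - (sigma/h)^2 K''(z) are assumptions of this illustration, not taken from the paper):

```python
import numpy as np

def deconv_kernel(z, h, sigma):
    """Deconvoluting kernel for Laplace(0, sigma) measurement error, whose
    characteristic function is 1/(1 + sigma^2 t^2).  Fourier inversion of
    phi_K(t) / phi_U(t/h) with a Gaussian base kernel K gives
    K*(z) = K(z) - (sigma/h)^2 K''(z),  with K''(z) = (z^2 - 1) K(z)."""
    K = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return K * (1.0 - (sigma / h) ** 2 * (z ** 2 - 1.0))

def deconv_regression(w, y, x0, h, sigma):
    """Kernel regression of y on the unobserved covariate x, given only
    w = x + u: ordinary kernel weights are replaced by deconvoluting-kernel
    weights that correct for the error in the covariate."""
    k = deconv_kernel((x0 - w) / h, h, sigma)
    return np.sum(k * y) / np.sum(k)
```

Note the corrected kernel can take negative values; that is expected of deconvoluting kernels. Super smooth errors such as Gaussian admit no such closed form and are the case with the slower of the two types of rates mentioned in the abstract.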

358 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that for any $F$-integrable function $\varphi$, the Kaplan-Meier integral $\int\varphi d\hat{F}_n$ converges almost surely and in the mean.
Abstract: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with d.f. $F$. We observe $Z_i = \min(X_i,Y_i)$ and $\delta_i = 1_{\{X_i \leq Y_i\}}$, where $Y_1, Y_2, \ldots$ is a sequence of i.i.d. censoring random variables. Denote by $\hat{F}_n$ the Kaplan-Meier estimator of $F$. We show that for any $F$-integrable function $\varphi, \int\varphi d\hat{F}_n$ converges almost surely and in the mean. The result may be applied to yield consistency of many estimators under random censorship.
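
A direct numerical rendering of this setup (a sketch; ties between observations are ignored for simplicity):

```python
import numpy as np

def kaplan_meier(z, delta):
    """Kaplan-Meier estimator from censored pairs (z_i, delta_i) with
    z_i = min(x_i, y_i) and delta_i = 1{x_i <= y_i}.  Returns the jump
    points and jump sizes of F_hat, so integrals against F_hat are sums."""
    order = np.argsort(z)
    z, delta = z[order], delta[order]
    n = len(z)
    at_risk = n - np.arange(n)                   # risk-set size at each z_(i)
    factors = np.where(delta == 1, 1.0 - 1.0 / at_risk, 1.0)
    S = np.cumprod(factors)                      # survival just after each z_(i)
    jumps = np.concatenate([[1.0], S[:-1]]) - S  # mass of F_hat at each z_(i)
    return z, jumps

def km_integral(phi, z, delta):
    """Plug-in estimate of E[phi(X)]: the integral of phi with respect to F_hat."""
    pts, jumps = kaplan_meier(z, delta)
    return np.sum(phi(pts) * jumps)

# e.g. a mean-lifetime estimate under censoring: km_integral(lambda t: t, z, delta)
```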

286 citations


Journal ArticleDOI
TL;DR: In this article, large sample approximations are developed to establish asymptotic linearity of the commonly used linear rank estimating functions, defined as stochastic integrals of counting processes over the whole line, for censored regression data.
Abstract: Large sample approximations are developed to establish asymptotic linearity of the commonly used linear rank estimating functions, defined as stochastic integrals of counting processes over the whole line, for censored regression data. These approximations lead to asymptotic normality of the resulting rank estimators defined as solutions of the linear rank estimating equations. A second kind of approximation is also developed to show that the estimating functions can be uniformly approximated by certain more manageable nonrandom functions, resulting in a simple condition that guarantees consistency of the rank estimators. This condition is verified for the two-sample problem, thereby extending earlier results by Louis and Wei and Gail, as well as in the case when the underlying error distribution has increasing failure rate, which includes most parametric regression models in survival analysis. Techniques to handle the delicate tail fluctuations are provided and discussed in detail.

Journal ArticleDOI
TL;DR: In this paper, the authors show that smoothed empirical likelihood confidence intervals for quantiles have coverage error of order $n^{-1}$ and may be Bartlett-corrected to produce intervals with an error of order only $n^{-2}$.
Abstract: Standard empirical likelihood confidence intervals for quantiles are identical to sign-test intervals. They have relatively large coverage error, of size $n^{-1/2}$, even though they are two-sided intervals. We show that smoothed empirical likelihood confidence intervals for quantiles have coverage error of order $n^{-1}$, and may be Bartlett-corrected to produce intervals with an error of order only $n^{-2}$. Necessary and sufficient conditions on the smoothing parameter, in order for these sizes of error to be attained, are derived. The effects of smoothing on the positions of endpoints of the intervals are analysed, and shown to be only of second order.

Journal ArticleDOI
TL;DR: In this article, the coefficients of phi and theta were estimated using the Whittle estimator, based on the sample periodogram of the X sequence, and it was shown that their estimators are consistent, obtain their asymptotic distributions, and converge to the true values faster than in the usual L2 case.
Abstract: : We consider a standard ARMA process of the form phi(B)Xt=Theta(B)Zt, where the innovations Zt belong to the domain of attraction of a stable law, so that neither the Zt nor the Xt have a finite variance. Our aim is to estimate the coefficients of phi and theta). Since maximum likelihood estimation is not a viable possibility (due to the unknown form of the marginal density of the innovation sequence) we adopt the so-called Whittle estimator, based on the sample periodogram of the X sequence. Despite the fact that the periodogram does not, a priori, seem like a logical object to study in this non-L' situation, we show that our estimators are consistent, obtain their asymptotic distributions, and show that they converge to the true values faster than in the usual L2 case.
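
A sketch of the Whittle idea in its simplest special case, an AR(1) (the AR(1) spectral form, the optimizer and the heavy-tailed toy simulation are assumptions of this illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def whittle_ar1(x):
    """Whittle estimate of an AR(1) coefficient: minimize the sum over
    Fourier frequencies of I(w_j) / g(w_j; phi), where I is the sample
    periodogram and g(w; phi) = 1 / |1 - phi e^{-iw}|^2 is the normalized
    spectral density.  No innovation density is needed."""
    n = len(x)
    j = np.arange(1, (n - 1) // 2 + 1)
    freqs = 2 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / n        # periodogram ordinates
    def obj(phi):
        g = 1.0 / np.abs(1.0 - phi * np.exp(-1j * freqs)) ** 2
        return np.sum(I / g)
    return minimize_scalar(obj, bounds=(-0.99, 0.99), method="bounded").x

# toy check with infinite-variance (t, df=1.5) innovations, true phi = 0.6
rng = np.random.default_rng(3)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.6 * x[t - 1] + rng.standard_t(1.5)
print(whittle_ar1(x))
```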

Journal ArticleDOI
TL;DR: In this article, convergence of the Hellinger distance between the maximum likelihood estimator and the true density was obtained under certain entropy conditions on the class of densities, with examples including interval censored observations, smooth densities, monotone densities and convolution models.
Abstract: Consider a class $\mathscr{P}=\{P_\theta:\theta\in\Theta\}$ of probability measures on a measurable space $(\mathscr{X},\mathscr{A})$, dominated by a $\sigma$-finite measure $\mu$. Let $f_\theta=dP_\theta/d\mu$, $\theta\in\Theta$, and let $\hat{\theta}_n$ be a maximum likelihood estimator based on $n$ independent observations from $P_{\theta_0}$, $\theta_0\in\Theta$. We use results from empirical process theory to obtain convergence for the Hellinger distance $h(f_{\hat{\theta}_n}, f_{\theta_0})$, under certain entropy conditions on the class of densities $\{f_\theta:\theta\in\Theta\}$. The examples we present are a model with interval censored observations, smooth densities, monotone densities and convolution models. In most examples, the convexity of the class of densities is of special importance.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate conditions under which dilation occurs and study some of its implications in robust Bayesian inference and in the theory of upper and lower probabilities, and characterize dilation immune neighborhoods of the uniform measure.
Abstract: Suppose that a probability measure $P$ is known to lie in a set of probability measures $M$. Upper and lower bounds on the probability of any event may then be computed. Sometimes, the bounds on the probability of an event $A$ conditional on an event $B$ may strictly contain the bounds on the unconditional probability of $A$. Surprisingly, this might happen for every $B$ in a partition $\mathscr{B}$. If so, we say that dilation has occurred. In addition to being an interesting statistical curiosity, this counterintuitive phenomenon has important implications in robust Bayesian inference and in the theory of upper and lower probabilities. We investigate conditions under which dilation occurs and we study some of its implications. We characterize dilation immune neighborhoods of the uniform measure.

Journal ArticleDOI
TL;DR: In this paper, consistency is shown for the minimum covariance determinant (MCD) estimators of multivariate location and scale, and asymptotic normality is shown for the former.
Abstract: Consistency is shown for the minimum covariance determinant (MCD) estimators of multivariate location and scale and asymptotic normality is shown for the former. The proofs are made possible by showing a separating ellipsoid property for the MCD subset of observations. An analogous property is shown for the MCD subset computed from the population distribution.
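
For intuition, here is a crude MCD search using the concentration ("C-step") device later popularized by Rousseeuw and Van Driessen's FAST-MCD; it is an illustrative approximation, not this paper's subject matter (the paper proves properties of the exact MCD functional):

```python
import numpy as np

def mcd_approx(X, h, n_starts=50, n_iter=20, rng=None):
    """Approximate minimum covariance determinant: from random size-h
    subsets, iterate C-steps (recompute mean/cov, keep the h points with
    smallest Mahalanobis distance); each step cannot increase det(cov).
    Returns the location/scatter of the best subset found."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=h, replace=False)
        for _ in range(n_iter):
            mu, S = X[idx].mean(axis=0), np.cov(X[idx].T)
            d = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
            idx = np.argsort(d)[:h]              # concentrate on closest h points
        det = np.linalg.det(np.cov(X[idx].T))
        if det < best_det:
            best_det, best_idx = det, idx
    return X[best_idx].mean(axis=0), np.cov(X[best_idx].T)

# typical choice: h about (n + p + 1) // 2 for maximal breakdown
```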

Journal ArticleDOI
TL;DR: In this paper, the Riemannian metric structure of the shape space $\Sigma^k_m$ is obtained for the general case ($m \geq 1, k \geq 2$) via the associated size-and-shape space, the warped product of the shape space and the half-line $\mathbb{R}_+$.
Abstract: The Riemannian metric structure of the shape space $\Sigma^k_m$ for $k$ labelled points in $\mathbb{R}^m$ was given by Kendall for the atypically simple situations in which $m = 1$ or 2 and $k \geq 2$. Here we deal with the general case $(m \geq 1, k \geq 2)$ by using the properties of Riemannian submersions and warped products as studied by O'Neill. The approach is via the associated size-and-shape space that is the warped product of the shape space and the half-line $\mathbb{R}_+$ (carrying size), the warping function being equal to the square of the size. When combined with parallel studies by Le of the corresponding global geodesic geometry, the results obtained here determine the environment in which shape-statistical calculations have to be acted out. Finally three different applications are discussed that illustrate the theory and its use in practice.

Journal ArticleDOI
TL;DR: The method of invariants as discussed by the authors is a technique in the field of molecular evolution for inferring phylogenetic relations among a number of species on the basis of nucleotide sequence data.
Abstract: The so-called method of invariants is a technique in the field of molecular evolution for inferring phylogenetic relations among a number of species on the basis of nucleotide sequence data. An invariant is a polynomial function of the probability distribution defined by a stochastic model for the observed nucleotide sequence. This function has the special property that it is identically zero for one possible phylogeny and typically nonzero for another possible phylogeny. Thus it is possible to discriminate statistically between two competing phylogenies using an estimate of the invariant. The advantage of this technique is that it enables such inferences to be made without the need for estimating nuisance parameters that are related to the specific mechanisms by which the molecular evolution occurs. For a wide class of models found in the literature, we present a simple algebraic formalism for recognising whether or not a function is an invariant and for generating all possible invariants. Our work is based on recognising an underlying group structure and using discrete Fourier analysis.

Journal ArticleDOI
TL;DR: In this paper, the authors compute the asymptotic distribution of the maximum likelihood ratio test for a change in the parameters of normal observations at an unknown point, and prove that the limit distribution is given by the largest deviation between a $d$-dimensional Ornstein-Uhlenbeck process and the origin.
Abstract: We compute the asymptotic distribution of the maximum likelihood ratio test when we want to check whether the parameters of normal observations have changed at an unknown point. The proof is based on the limit distribution of the largest deviation between a $d$-dimensional Ornstein-Uhlenbeck process and the origin.
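
The statistic whose limit is being derived can be written down in a few lines; the sketch below is the special case of a mean change with known variance (that restriction, and all names, are assumptions of the illustration):

```python
import numpy as np

def max_lr_mean_change(x, sigma=1.0):
    """Scan statistic for a change in the mean of independent N(mu, sigma^2)
    observations at an unknown time: at each split k the likelihood ratio
    statistic is (k(n-k)/n) (xbar_{1:k} - xbar_{k+1:n})^2 / sigma^2; the
    test rejects for large values of its maximum over k."""
    n = len(x)
    k = np.arange(1, n)
    csum = np.cumsum(x)
    mean1 = csum[:-1] / k                        # means of x[:k]
    mean2 = (csum[-1] - csum[:-1]) / (n - k)     # means of x[k:]
    lr = k * (n - k) / n * (mean1 - mean2) ** 2 / sigma ** 2
    return k[np.argmax(lr)], lr.max()            # (estimated change point, statistic)
```

The paper's contribution is the null distribution of exactly this kind of maximum, obtained through an Ornstein-Uhlenbeck boundary-crossing result.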

Journal ArticleDOI
TL;DR: An axiomatic basis is developed for the relationship between conditional independence and graphical models in statistical analysis and unconditional independence relative to normal models can be axiomatized with a finite set of axioms.
Abstract: This article develops an axiomatic basis for the relationship between conditional independence and graphical models in statistical analysis. In particular, the following relationships are established: (1) every axiom for conditional independence is an axiom for graph separation, (2) every graph represents a consistent set of independence and dependence constraints, (3) all binary factorizations of strictly positive probability models can be encoded and determined in polynomial time using their correspondence to graph separation, (4) binary factorizations of non-strictly positive probability models can also be derived in polynomial time albeit less efficiently and (5) unconditional independence relative to normal models can be axiomatized with a finite set of axioms.

Journal ArticleDOI
TL;DR: It is shown that the frequentist coverage probability of a variety of $(1 - \alpha)$ posterior probability regions tends to be larger than $1 - \alpha$, but will be infinitely often less than any $\epsilon > 0$ as $n \rightarrow \infty$ with prior probability 1.
Abstract: The observation model $Y_i = \beta(i/n) + \epsilon_i$, $1 \leq i \leq n$, is considered, where the $\epsilon_i$ are i.i.d. with mean zero and variance $\sigma^2$ and $\beta$ is an unknown smooth function. A Gaussian prior distribution is specified by assuming $\beta$ is the solution of a high order stochastic differential equation. The estimation error $\delta = \beta - \bar{\beta}$ is analyzed, where $\bar{\beta}$ is the posterior expectation of $\beta$. Asymptotic posterior and sampling distributional approximations are given for $\|\delta\|^2$ when $\|\cdot\|$ is one of a family of norms natural to the problem. It is shown that the frequentist coverage probability of a variety of $(1 - \alpha)$ posterior probability regions tends to be larger than $1 - \alpha$, but will be infinitely often less than any $\epsilon > 0$ as $n \rightarrow \infty$ with prior probability 1. A related continuous time signal estimation problem is also studied. Keywords: Bayesian inference; nonparametric regression; confidence regions; signal extraction; smoothing splines.

Journal ArticleDOI
TL;DR: In this paper, kernel-type estimators of the locations of jump points and the corresponding sizes of jump values of the regression function are proposed and analyzed with almost sure results and limiting distributions.
Abstract: In the fixed-design nonparametric regression model, kernel-type estimators of the locations of jump points and the corresponding sizes of jump values of the regression function are proposed. These kernel-type estimators are analyzed with almost sure results and limiting distributions. Using the limiting distributions, we are able to test the number of jump points and give asymptotic confidence intervals for the sizes of jump values of the regression function. Simulation studies demonstrate that the asymptotic results hold for reasonable sample sizes.
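
The simplest member of this family of estimators is a one-sided difference of local averages; the uniform (boxcar) kernel, bandwidth and toy data below are illustrative assumptions:

```python
import numpy as np

def jump_statistic(x, y, t, h):
    """Difference between a right-sided and a left-sided local average of y
    at t.  Maximizing |difference| over t estimates the jump location; the
    value there estimates the jump size."""
    right = (x > t) & (x <= t + h)
    left = (x >= t - h) & (x <= t)
    return y[right].mean() - y[left].mean()

# locate a jump of size 1 at x = 0.6 in noisy fixed-design data
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 500)
y = np.where(x > 0.6, 1.0, 0.0) + 0.2 * rng.normal(size=500)
grid = np.linspace(0.05, 0.95, 181)
stats = np.array([jump_statistic(x, y, t, h=0.05) for t in grid])
print(grid[np.argmax(np.abs(stats))])            # approximately 0.6
```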

Journal ArticleDOI
TL;DR: In this paper, the exponentially weighted moving average (EWMA) procedure proposed by Roberts is compared with the Shiryayev-Roberts and CUSUM procedures, and the results show that the EWMA procedure is less efficient than the other two procedures.
Abstract: Pollak and Siegmund compared the Shiryayev-Roberts procedure with the CUSUM procedure for detecting a change in the drift of a Brownian motion based on the conditional average delay time. In this paper, the exponentially weighted moving average (EWMA) procedure proposed by Roberts is compared with the Shiryayev-Roberts and CUSUM procedures. The comparison is based on the stationary average delay time as advocated by Shiryayev. The optimal design for the EWMA procedure and its asymptotic properties are studied when the average in-control run length is large. The results show that the EWMA procedure is less efficient than the other two procedures.
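
The two monitoring recursions being compared are one-liners each; this sketch shows their structure (the thresholds and reference values here are arbitrary, and in practice would be calibrated to a common in-control average run length):

```python
import numpy as np

def ewma_alarm(x, lam=0.1, threshold=0.5):
    """EWMA chart: s_t = (1 - lam) s_{t-1} + lam x_t; alarm at the first t
    with s_t > threshold.  Returns the alarm time, or None."""
    s = 0.0
    for t, xt in enumerate(x, 1):
        s = (1 - lam) * s + lam * xt
        if s > threshold:
            return t
    return None

def cusum_alarm(x, drift=0.5, threshold=5.0):
    """One-sided CUSUM chart: s_t = max(0, s_{t-1} + x_t - drift); alarm
    when s_t exceeds the threshold."""
    s = 0.0
    for t, xt in enumerate(x, 1):
        s = max(0.0, s + xt - drift)
        if s > threshold:
            return t
    return None
```

Simulating many runs of each rule, with and without a shift in the mean of x, reproduces the kind of delay-time comparison the abstract describes.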

Journal ArticleDOI
TL;DR: In this article, the authors generalize the empirical likelihood method to biased sample problems and show that it is possible to construct confidence intervals for the mean of the mean in full parametric models.
Abstract: It is well known that we can use the likelihood ratio statistic to test hypotheses and to construct confidence intervals in full parametric models. Recently, Owen introduced the empirical likelihood method in nonparametric models. In this paper, we generalize his results to biased sample problems. A Wilks theorem leading to a likelihood ratio confidence interval for the mean is given. Some extensions, discussion and simulations are presented.
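
For reference, here is the unbiased-sampling case that the paper generalizes: Owen's empirical log likelihood ratio for a mean, computed through its Lagrange dual (the bisection solver and tolerances are implementation choices of this sketch):

```python
import numpy as np

def el_logratio_mean(x, mu, tol=1e-10):
    """Empirical log likelihood ratio for the mean: maximize sum log(n w_i)
    over weights with sum w_i = 1 and sum w_i (x_i - mu) = 0.  By duality
    w_i = 1 / (n (1 + lam (x_i - mu))), with lam the root of a decreasing
    function, found by bisection.  -2 times the result is asymptotically
    chi-squared(1) by Owen's Wilks theorem."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return -np.inf                           # mu outside the convex hull
    lo = (-1 + 1e-12) / z.max()                  # keep all 1 + lam z_i > 0
    hi = (-1 + 1e-12) / z.min()
    g = lambda lam: np.sum(z / (1 + lam * z))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    lam = (lo + hi) / 2
    return -np.sum(np.log(1 + lam * z))          # equals sum log(n w_i)

# a confidence interval collects all mu with -2 * el_logratio_mean(x, mu)
# below the chi-squared(1) quantile
```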

Journal ArticleDOI
TL;DR: The authors propose local nonparametric dependence functions which measure the strength of association between a response variable and a covariate over different regions of values of the covariate.
Abstract: For experiments where the strength of association between a response variable $Y$ and a covariate $X$ is different over different regions of values for the covariate $X$, we propose local nonparametric dependence functions which measure the strength of association between $Y$ and $X$ as a function of $X = x$. Our dependence functions are extensions of Galton's idea of strength of co-relation from the bivariate normal case to the nonparametric case. In particular, a dependence function is obtained by expressing the usual Galton-Pearson correlation coefficient in terms of the regression line slope $\beta$ and the residual variance $\sigma^2$ and then replacing $\beta$ and $\sigma^2$ by a nonparametric regression slope $\beta(x)$ and a nonparametric residual variance $\sigma^2(x) = \operatorname{var}(Y \mid x)$, respectively. Our local dependence functions are standardized nonparametric regression curves which provide universal scale-free measures of the strength of the relationship between variables in nonlinear models. They share most of the properties of the correlation coefficient and they reduce to the usual correlation coefficient in the bivariate normal case. For this reason we call them correlation curves. We show that, in a certain sense, they quantify Lehmann's notion of regression dependence. Finally, the correlation curve concept is illustrated using data from a study of the relationship between cholesterol levels $x$ and triglyceride concentrations $y$ of heart patients.
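
The substitution described above translates directly into code; this sketch uses a local linear fit for the slope and a kernel-weighted residual variance (those two estimator choices, and the Gaussian kernel, are assumptions of the illustration):

```python
import numpy as np

def correlation_curve(x, y, x0, h):
    """Correlation curve rho(x0) = beta(x0) s1 / sqrt(beta(x0)^2 s1^2 + s2(x0)):
    the Galton-Pearson correlation with the regression slope replaced by a
    local slope beta(x0) and the residual variance by a local residual
    variance s2(x0); s1^2 is the (global) variance of X."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)               # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    coef = np.linalg.solve(X.T @ WX, WX.T @ y)           # local intercept and slope
    s2 = np.sum(w * (y - X @ coef) ** 2) / np.sum(w)     # local residual variance
    beta, s1sq = coef[1], np.var(x)
    return beta * np.sqrt(s1sq) / np.sqrt(beta ** 2 * s1sq + s2)
```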

Journal ArticleDOI
TL;DR: In this paper, an almost sure representation of the nonparametric MLE of the marginal d.f. of the left-truncation model was derived with improved error bounds under weaker distributional assumptions.
Abstract: In the left-truncation model, one observes data $(X_i,Y_i)$ only when $Y_i\leq X_i$. Let F denote the marginal d.f. of $X_i$ , the variable of interest. The nonparametric MLE $\hat{F}_n$ of F aims at reconstructing F from truncated data. In this paper an almost sure representation of $\hat{F}_n$ is derived with improved error bounds on the one hand and under weaker distributional assumptions on the other hand.

Journal ArticleDOI
TL;DR: In this paper, the influence function of the Hampel-Rousseeuw least median of squares estimator is derived, and it is shown that $S$-estimators satisfy an exact Hölder condition of order 1/2 at models with normal errors.
Abstract: Section 1 of the paper contains a general discussion of robustness. In Section 2 the influence function of the Hampel-Rousseeuw least median of squares estimator is derived. Linearly invariant weak metrics are constructed in Section 3. It is shown in Section 4 that $S$-estimators satisfy an exact Hölder condition of order 1/2 at models with normal errors. In Section 5 the breakdown points of the Hampel-Krasker dispersion and regression functionals are shown to be 0. The exact breakdown point of the Krasker-Welsch dispersion functional is obtained as well as bounds for the corresponding regression functional. Section 6 contains the construction of a linearly equivariant, high breakdown and locally Lipschitz dispersion functional for any design distribution. In Section 7 it is shown that there is no inherent contradiction between efficiency and a high breakdown point. Section 8 contains a linearly equivariant, high breakdown regression functional which is Lipschitz continuous at models with normal errors.
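
Since Section 2 concerns the least median of squares estimator, a sketch of how it is computed in practice may help; the random elemental-subset search below is a standard approximation (the exact LMS optimum is combinatorial), not a construction from the paper:

```python
import numpy as np

def lms_regression(X, y, n_trials=2000, rng=None):
    """Approximate least median of squares: fit exact-fit candidates through
    p randomly chosen points and keep the coefficient vector minimizing the
    median of squared residuals over the full sample."""
    rng = rng or np.random.default_rng()
    n, p = X.shape                               # X should include an intercept column
    best_crit, best_beta = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue                             # degenerate subset, skip
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_crit, best_beta = crit, beta
    return best_beta
```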

Journal ArticleDOI
TL;DR: In this paper, the manifest probabilities of a strictly unidimensional latent variable representation (one satisfying local independence and response curve monotonicity with respect to a unidimensional latent variable) for binary response variables, such as those arising from the dichotomous scoring of items on standardized achievement and aptitude tests, were investigated.
Abstract: We consider two recent approaches to characterizing the manifest probabilities of a strictly unidimensional latent variable representation (one satisfying local independence and response curve monotonicity with respect to a unidimensional latent variable) for binary response variables, such as those arising from the dichotomous scoring of items on standardized achievement and aptitude tests. Holland and Rosenbaum showed that conditional association is a necessary condition for strict unidimensionality; and Stout treated the class of essentially unidimensional models, in which the latent variable may be consistently estimated as the length of the response sequence grows using the proportion of positive responses. Of particular concern are strictly unidimensional representations that are minimally useful in the sense that: (1) the latent variable can be consistently estimated from the responses; (2) the regression of proportion of positive responses on the latent variable is monotone; and (3) the latent variable is not constant in the population. We introduce two new conditions, a negative association condition and a natural monotonicity condition on the empirical response curves, that help link strict unidimensionality with the conditional association and essential unidimensionality approaches. These conditions are illustrated with a partial characterization of useful, strictly unidimensional representations.

Journal ArticleDOI
TL;DR: In this article, the authors considered self-consistent estimators for survival functions based on doubly censored data and established strong uniform consistency and asymptotic normality of the estimators under mild conditions on the distributions of the censoring variables.
Abstract: This paper concerns self-consistent estimators for survival functions based on doubly censored data. We establish strong uniform consistency, asymptotic normality and asymptotic efficiency of the estimators under mild conditions on the distributions of the censoring variables.

Journal ArticleDOI
TL;DR: In this paper, a class of penalized likelihood probability density estimators is proposed and studied, where the true log density is assumed to be a member of a reproducing kernel Hilbert space on a finite domain, not necessarily univariate.
Abstract: In this article, a class of penalized likelihood probability density estimators is proposed and studied. The true log density is assumed to be a member of a reproducing kernel Hilbert space on a finite domain, not necessarily univariate, and the estimator is defined as the unique unconstrained minimizer of a penalized log likelihood functional in such a space. Under mild conditions, the existence of the estimator and the rate of convergence of the estimator in terms of the symmetrized Kullback-Leibler distance are established. To make the procedure applicable, a semiparametric approximation of the estimator is presented, which sits in an adaptive finite dimensional function space and hence can be computed in principle. The theory is developed in a generic setup and the proofs are largely elementary. Algorithms are yet to follow.