Author

J. Van Ryzin

Bio: J. Van Ryzin is an academic researcher. The author has contributed to research in topics: Prior probability & Bayes' theorem. The author has an h-index of 7 and has co-authored 7 publications receiving 310 citations.

Papers
Journal ArticleDOI
TL;DR: In this article, conditions are given under which kernel density estimates of the form (1) are strongly consistent, and uniformly strongly consistent, on the continuity set of the density, together with conditions for consistent estimation of its unique mode.
Abstract: Let $X_1, X_2, \cdots, X_n$ be a sample of $n$ independent observations of a random variable $X$ with distribution $F(x) = F(x_1, \cdots, x_m)$ on $R^m$ and Lebesgue density $f(x) = f(x_1, \cdots, x_m)$. To estimate the density $f(x)$ consider estimates of the form \begin{equation*} \tag{1} f_n(x) = n^{-1} \sum^n_{j=1} K_n(x, X_j),\quad K_n(x, X_j) = h_n^{-m}K(h_n^{-1}(x - X_j));\end{equation*} where $K(u) = K(u_1, \cdots, u_m)$ is a real-valued Borel-measurable function on $R^m$ such that \begin{equation*} \tag{2} K(u)\ \text{is a density on}\ R^m,\end{equation*} \begin{equation*} \tag{3} \sup_{u \in R^m} K(u) < \infty,\end{equation*} \begin{equation*} \tag{4} h_n > 0,\quad n = 1, 2, \cdots;\quad \lim_{n\rightarrow\infty} h_n = 0 \quad\text{and}\quad \lim_{n\rightarrow\infty} nh_n^m = \infty.\end{equation*} Such density estimates have been shown to be weakly consistent (that is, $f_n(x) \rightarrow f(x)$ in probability as $n \rightarrow \infty$) on the continuity set, $C(f)$, of the density $f(x)$ by Parzen [4] for $m = 1$ and by Cacoullos [1] for $m > 1$. In Theorem 1, we state conditions under which strong consistency (that is, $f_n(x) \rightarrow f(x)$ with probability one as $n \rightarrow \infty$) of such estimates obtains. Theorem 2 gives conditions under which uniform (in $x$) strong consistency of the estimates (1) is valid. In this respect, our results are very similar in the case $m = 1$ to those of Nadaraya [4], although the method of proof and conditions imposed are different. Theorem 3 concerns the estimation of the unique mode of the density $f(x)$ when it exists.
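
To make the estimator in (1) concrete, here is a minimal sketch assuming a product Gaussian kernel (a bounded density on $R^m$, so (2) and (3) hold) and the illustrative bandwidth $h_n = n^{-1/(m+4)}$; both choices are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def kde(x, sample, h):
    """Kernel density estimate f_n(x) of the form (1), with a product
    Gaussian kernel K.

    x      : (m,) evaluation point
    sample : (n, m) array of observations X_1, ..., X_n
    h      : bandwidth h_n
    """
    n, m = sample.shape
    u = (x - sample) / h                               # h_n^{-1}(x - X_j)
    K = np.exp(-0.5 * (u**2).sum(axis=1)) / (2 * np.pi) ** (m / 2)
    return K.sum() / (n * h**m)                        # n^{-1} sum_j h_n^{-m} K(.)

# A bandwidth with h_n -> 0 and n*h_n^m -> infinity, as (4) requires:
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
print(kde(np.zeros(2), X, h=1000 ** (-1 / 6)))         # approx f(0) = 1/(2*pi)
```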

104 citations

Journal ArticleDOI
TL;DR: In this article, a sequence of decision problems is considered where for each problem the observation has a probability density function of exponential type with parameter $\lambda$, where $\lambda$ is selected independently for each problem according to an unknown prior distribution $G(\lambda)$.
Abstract: A sequence of decision problems is considered where for each problem the observation has a probability density function of exponential type with parameter $\lambda$, where $\lambda$ is selected independently for each problem according to an unknown prior distribution $G(\lambda)$. It is supposed that in each of the problems, one of two possible actions (e.g., 'accept' or 'reject') must be taken. Under various assumptions, reasonably sharp upper bounds are found for the rate at which the risk of the $n$th problem approaches the smallest possible risk for certain refinements of the standard empirical Bayes procedures. For suitably chosen procedures, under situations likely to occur in practice, rates faster than $n^{-1+\epsilon}$ may be obtained for arbitrarily small $\epsilon > 0$. Arbitrarily slow rates can occur in pathological situations.
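
As a rough illustration of a two-action empirical Bayes procedure of this kind (not the paper's exact refinement), suppose the exponential-type family is $f(x \mid \lambda) = h(x)\beta(\lambda)e^{\lambda x}$ with $h$ taken to be the standard normal density, and the loss is linear so the Bayes rule accepts iff $E[\lambda \mid x] \le c$. Then $E[\lambda \mid x] = f_G'(x)/f_G(x) - h'(x)/h(x) = f_G'(x)/f_G(x) + x$, and the unknown marginal $f_G$ and its derivative can be replaced by kernel estimates built from earlier observations. The family, kernel, and bandwidth below are all assumptions for illustration.

```python
import numpy as np

def posterior_mean_estimate(x, past, h):
    """Plug-in estimate of E[lambda | x] for f(x|lam) = h(x) beta(lam) exp(lam*x),
    with h(x) the standard normal density (illustrative assumption), so that
    E[lambda | x] = f'(x)/f(x) + x, f being the marginal density of X."""
    u = (x - past) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
    f = K.mean() / h                                # kernel estimate of f(x)
    fp = -(u * K).mean() / h**2                     # kernel estimate of f'(x)
    return fp / f + x

def decide(x, past, h, c=0.0):
    """Empirical Bayes two-action rule: accept iff estimated E[lambda | x] <= c."""
    return "accept" if posterior_mean_estimate(x, past, h) <= c else "reject"
```

As a sanity check, if the prior $G$ is a point mass at $\lambda_0$, the marginal is a normal density centered near $\lambda_0$ and the estimate returns approximately $\lambda_0$.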

79 citations

Journal ArticleDOI
TL;DR: In this article, a sequence of decision problems is considered where for each problem the observation has a discrete probability function of the form $p(x) = h(x)\beta(\lambda)\lambda^x$, $x = 0, 1, 2, \cdots$, where $\lambda$ is selected independently for each problem according to an unknown prior distribution $G(\lambda)$.
Abstract: A sequence of decision problems is considered where for each problem the observation has a discrete probability function of the form $p(x) = h(x)\beta(\lambda)\lambda^x$, $x = 0, 1, 2, \cdots$, and where $\lambda$ is selected independently for each problem according to an unknown prior distribution $G(\lambda)$. It is supposed that for each problem one of two possible actions (e.g., 'accept' or 'reject') must be selected. Under various assumptions about $h(x)$ and $G(\lambda)$, the rate at which the risk of the $n$th problem approaches the smallest possible risk is determined for standard empirical Bayes procedures. It is shown that for most practical situations, the rate of convergence to 'optimality' will be at least as fast as $L(n)/n$, where $L(n)$ is a slowly varying function (e.g., $\log n$). The rate cannot be faster than $1/n$, and this exact rate is achieved in some cases. Arbitrarily slow rates will occur in certain pathological situations.
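
For the Poisson special case $h(x) = 1/x!$, $\beta(\lambda) = e^{-\lambda}$, a standard empirical Bayes two-action rule (sketched here under an assumed linear loss, not necessarily the paper's exact procedure) thresholds the plug-in posterior mean $E[\lambda \mid x] = (x+1)\,p_G(x+1)/p_G(x)$, with the unknown marginal $p_G$ replaced by empirical frequencies of the past observations:

```python
from collections import Counter

def eb_two_action(x, past, c=1.0):
    """Empirical Bayes two-action rule for the Poisson case h(x) = 1/x!.

    Under linear loss the Bayes rule accepts iff E[lambda | x] <= c, and
    for the Poisson family E[lambda | x] = (x + 1) p_G(x + 1) / p_G(x).
    The unknown marginal p_G is replaced by empirical counts of the past
    observations (the common factor 1/n cancels in the ratio).
    """
    freq = Counter(past)
    post_mean = (x + 1) * freq[x + 1] / max(freq[x], 1)  # plug-in posterior mean
    return "accept" if post_mean <= c else "reject"
```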

40 citations

Journal ArticleDOI
TL;DR: In this article, sequential compound decision procedures are constructed, without artificial randomization, whose risk exceeds that of the best "simple" procedure (Bayes against the empirical distribution on the component parameter space) by at most $O(N^{-\frac{1}{2}})$, uniformly in the parameter sequences.
Abstract: Consideration of a sequence of statistical decision problems having identical generic structure constitutes a sequential compound decision problem. The risk of a sequential compound decision problem is defined as the average risk of the component problems. In the case where the component decisions are between two fully specified distributions $P_1$ and $P_2$, $P_1 \neq P_2$, Samuel (Theorem 2 of [9]) gives a sequential decision function whose risk is bounded from above by the risk of a best "simple" procedure based on knowing the proportion of component problems in which $P_2$ is the governing distribution plus a sequence of positive numbers converging to zero uniformly in the space of parameter-valued sequences as the number of problems increases. Related results are abstracted by Hannan in [2] for the sequential compound decision problem where the parameter space in the component problem is finite. The decision procedures in both instances rely on the technique of "artificial randomization," which was introduced and effectively used by Hannan in [1] for sequential games in which player I's space is finite. In the game situation such randomization is necessary. However, in the compound decision problem such "artificial randomization" is not necessary as is shown in this paper. Specifically, we consider the case where each component problem consists of making one of $n$ decisions based on an observation from one of $m$ distributions. Theorems 4.1, 4.2, and 4.3 give upper bounds for the difference in the risks (the regret function) of certain sequential compound decision procedures and a best "simple" procedure which is Bayes against the empirical distribution on the component problem parameter space. None of the sequential procedures presented depend on "artificial randomization." The upper bounds in these three theorems are all of order $N^{-\frac{1}{2}}$ and are uniform in the parameter-valued sequences. All procedures depend at stage $k$ on substitution of estimates of the $k - 1$st (or $k$th) stage empirical distribution $p_{k-1}$ (or $p_k$) on the component parameter space into a Bayes solution of the component problem with respect to $p_{k-1}$ (or $p_k$). Theorem 4.1 (except in the case where the estimates are degenerate) and Theorem 4.3 when specialized to the compound testing case between $P_1$ and $P_2$ (Theorems 5.1 and 5.2) yield a threefold improvement of Samuel's results mentioned above by simultaneously eliminating the "artificial randomization," by improving the convergence rate of the upper bound of the regret function to $N^{-\frac{1}{2}}$, and by widening the class of estimates. Higher order uniform bounds on the regret function in the sequential compound testing problem are also given. The bounds in Theorems 5.3 and 5.4 (or Theorems 5.5 and 5.6) are respectively of $O((\log N)N^{-1})$ and $o(N^{-\frac{1}{2}})$ and are attained by imposing suitable continuity assumptions on the induced distribution of a certain function of the likelihood ratio of $P_1$ and $P_2$. Theorem 6.1 extends Theorems 4.1, 4.2, and 4.3 to the related "empirical Bayes" problem. Also lower bounds of equivalent or better order are given for all theorems. The next section introduces notation and preliminaries to be used in this paper and in the following paper [15].
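
The following is a rough sketch of the plug-in idea described above, for a component problem with $m$ distributions and a finite action set: at stage $k$ the rule substitutes an estimate of the empirical distribution $p_{k-1}$ on the parameter space into the component Bayes solution. The posterior-weight estimator of $p_{k-1}$ used here is only an illustrative stand-in; the paper's estimators, and the conditions on them, differ in detail.

```python
import numpy as np

def plug_in_compound_action(x_k, past_xs, densities, loss):
    """Component decision at stage k: Bayes rule against an estimate of the
    empirical distribution p_{k-1} on the component parameter space.

    densities : list of m callables, densities[i](x) = f_i(x)
    loss      : (m, n_actions) array, loss[i, a] = loss of action a when
                distribution i governs the component problem
    """
    m = len(densities)
    if past_xs:
        # Illustrative estimate of p_{k-1}: average posterior weights of the
        # m distributions over past observations (uniform prior).
        L = np.array([[f(x) for f in densities] for x in past_xs])  # (k-1, m)
        p_hat = (L / L.sum(axis=1, keepdims=True)).mean(axis=0)
    else:
        p_hat = np.full(m, 1.0 / m)
    post = p_hat * np.array([f(x_k) for f in densities])  # unnormalized posterior
    return int(np.argmin(post @ np.asarray(loss)))        # minimize posterior expected loss
```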

38 citations

Journal ArticleDOI
TL;DR: In this article, the authors consider compound decision problems in which the component decisions are between two distinct, completely specified distributions, and obtain a convergence order for the bound in Theorem 1, with higher-order bounds in Theorems 2 and 3 under continuity assumptions on the likelihood ratio.
Abstract: Simultaneous consideration of $n$ statistical decision problems having identical generic structure constitutes a compound decision problem. The risk of a compound decision problem is defined as the average risk of the component problems. When the component decisions are between two fully specified distributions $P_0$ and $P_1$, $P_0 \neq P_1$, Hannan and Robbins [2] give a decision function whose risk is uniformly close (for $n$ large) to the risk of the best "simple" procedure based on knowing the proportion of component problems in which $P_1$ is the governing distribution. This result was motivated by heuristic arguments and an example (component decisions between $N(-1, 1)$ and $N(1, 1)$) given by Robbins [4]. In both papers, the decision functions for the component problems depended on data from all $n$ problems. The present paper considers, as in Hannan and Robbins [2], compound decision problems in which the component decisions are between two distinct completely specified distributions. The decision functions considered are those of [2]. The improvement is in the sense that a convergence order of the bound is obtained in Theorem 1. Higher order bounds are attained in Theorems 2 and 3 under certain continuity assumptions on the induced distribution of a suitably chosen function of the likelihood ratio of the two distributions.
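
For the testing case in the abstract, here is a minimal sketch of a Hannan-Robbins-style rule on Robbins' example (component decisions between $N(-1, 1)$ and $N(1, 1)$): the proportion $p$ of problems governed by $P_1 = N(1, 1)$ is estimated from all $n$ observations via $E[X] = 2p - 1$, and each component decision is the Bayes rule against that estimate. The estimator and threshold below follow the textbook version of the example, not necessarily the exact decision functions of [2].

```python
import numpy as np
from scipy.stats import norm

def compound_test(xs):
    """Compound test between P0 = N(-1, 1) and P1 = N(1, 1).

    Under the mixture, E[X] = 2p - 1 where p is the proportion of component
    problems governed by P1, so p is estimated by (mean(X) + 1)/2 truncated
    to [0, 1]; each component decision is Bayes against p_hat.
    """
    xs = np.asarray(xs, dtype=float)
    p_hat = np.clip((xs.mean() + 1.0) / 2.0, 0.0, 1.0)
    f0 = norm.pdf(xs, loc=-1.0, scale=1.0)
    f1 = norm.pdf(xs, loc=1.0, scale=1.0)
    return np.where(p_hat * f1 >= (1.0 - p_hat) * f0, 1, 0)  # 1 means "decide P1"
```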

22 citations


Cited by
Book
01 Jan 2006
TL;DR: In this book, the authors provide a comprehensive treatment of the problem of predicting individual sequences using expert advice, a general framework within which many related problems can be cast and discussed, such as repeated game playing, adaptive data compression, sequential investment in the stock market, and sequential pattern analysis.
Abstract: This important text and reference for researchers and students in machine learning, game theory, statistics and information theory offers a comprehensive treatment of the problem of predicting individual sequences. Unlike standard statistical approaches to forecasting, prediction of individual sequences does not impose any probabilistic assumption on the data-generating mechanism. Yet, prediction algorithms can be constructed that work well for all possible sequences, in the sense that their performance is always nearly as good as the best forecasting strategy in a given reference class. The central theme is the model of prediction using expert advice, a general framework within which many related problems can be cast and discussed. Repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems are viewed as instances of the experts' framework and analyzed from a common nonstochastic standpoint that often reveals new and intriguing connections.
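
As one concrete instance of the experts framework, here is a minimal sketch of the exponentially weighted average forecaster under squared loss; the loss function and the learning rate $\eta = \sqrt{8\ln N / T}$ are standard textbook defaults assumed for illustration, and with them the cumulative regret against the best expert is $O(\sqrt{T \ln N})$.

```python
import numpy as np

def exponentially_weighted_forecaster(expert_preds, outcomes, eta=None):
    """Prediction with expert advice via exponentially weighted averaging.

    expert_preds : (T, N) array, expert i's forecast at round t (in [0, 1])
    outcomes     : (T,) array of realized outcomes in [0, 1]
    """
    T, N = expert_preds.shape
    if eta is None:
        eta = np.sqrt(8 * np.log(N) / T)
    cum_loss = np.zeros(N)
    preds = np.empty(T)
    for t in range(T):
        w = np.exp(-eta * (cum_loss - cum_loss.min()))    # stabilized weights
        preds[t] = w @ expert_preds[t] / w.sum()           # weighted average forecast
        cum_loss += (expert_preds[t] - outcomes[t]) ** 2   # update expert losses
    return preds
```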

3,615 citations

Journal ArticleDOI
TL;DR: Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.
Abstract: Nonparametric density gradient estimation using a generalized kernel approach is investigated. Conditions on the kernel functions are derived to guarantee asymptotic unbiasedness, consistency, and uniform consistency of the estimates. The results are generalized to obtain a simple mean-shift estimate that can be extended in a $k$-nearest-neighbor approach. Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.
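
A minimal sketch of the mean-shift idea: for a Gaussian-kernel density estimate, the gradient at $x$ is proportional to the difference between a kernel-weighted sample mean and $x$ itself, so iterating the weighted mean performs gradient ascent toward a local mode. The kernel and stopping rule here are illustrative assumptions.

```python
import numpy as np

def mean_shift(points, x0, h, steps=100, tol=1e-6):
    """Mean-shift iteration toward a local mode of a kernel density estimate.

    points : (n, m) sample, x0 : (m,) starting point, h : bandwidth
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        w = np.exp(-0.5 * (((points - x) / h) ** 2).sum(axis=1))  # kernel weights
        x_new = w @ points / w.sum()                               # weighted sample mean
        if np.linalg.norm(x_new - x) < tol:                        # converged to a mode
            break
        x = x_new
    return x
```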

3,125 citations

Book
16 Apr 2013
TL;DR: How to Construct Nonparametric Regression Estimates * Lower Bounds * Partitioning Estimates * Kernel Estimates * k-NN Estimates * Splitting the Sample * Cross Validation * Uniform Laws of Large Numbers
Abstract: Why is Nonparametric Regression Important? * How to Construct Nonparametric Regression Estimates * Lower Bounds * Partitioning Estimates * Kernel Estimates * k-NN Estimates * Splitting the Sample * Cross Validation * Uniform Laws of Large Numbers * Least Squares Estimates I: Consistency * Least Squares Estimates II: Rate of Convergence * Least Squares Estimates III: Complexity Regularization * Consistency of Data-Dependent Partitioning Estimates * Univariate Least Squares Spline Estimates * Multivariate Least Squares Spline Estimates * Neural Networks Estimates * Radial Basis Function Networks * Orthogonal Series Estimates * Advanced Techniques from Empirical Process Theory * Penalized Least Squares Estimates I: Consistency * Penalized Least Squares Estimates II: Rate of Convergence * Dimension Reduction Techniques * Strong Consistency of Local Averaging Estimates * Semi-Recursive Estimates * Recursive Estimates * Censored Observations * Dependent Observations
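
As a minimal example of one of the local averaging estimates listed above, the kernel (Nadaraya-Watson) regression estimate is a weighted average of the responses; the Gaussian kernel below is an assumption for illustration.

```python
import numpy as np

def kernel_regression(x, X, Y, h):
    """Nadaraya-Watson kernel regression estimate of E[Y | X = x].

    X : (n,) covariates, Y : (n,) responses, h : bandwidth
    """
    w = np.exp(-0.5 * ((x - X) / h) ** 2)   # Gaussian kernel weights
    return w @ Y / w.sum()                  # locally weighted average of Y
```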

1,931 citations

Journal ArticleDOI
TL;DR: Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
Abstract: This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
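
A small sketch of probability assignment under the self-information loss: the Krichevsky-Trofimov sequential estimator for binary sequences, whose cumulative log loss exceeds that of the best fixed Bernoulli predictor by roughly $\frac{1}{2}\log t$, the redundancy familiar from universal data compression. The choice of this particular assignment is an illustrative assumption, not the paper's.

```python
import numpy as np

def kt_sequential_log_loss(bits):
    """Cumulative self-information (log) loss of the Krichevsky-Trofimov
    sequential probability assignment on a binary sequence."""
    loss, ones = 0.0, 0
    for t, b in enumerate(bits):
        p1 = (ones + 0.5) / (t + 1)                 # KT estimate after t bits
        loss += -np.log(p1 if b == 1 else 1 - p1)   # log loss of the assignment
        ones += b
    return loss
```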

519 citations

Journal ArticleDOI
TL;DR: In this article, the estimation of a density and its derivatives by the kernel method is considered; uniform consistency properties over the whole real line are studied, and conditions on the density and on the behavior of the window width are found that are necessary and sufficient for weak and strong uniform consistency of the estimates of the density derivatives.
Abstract: The estimation of a density and its derivatives by the kernel method is considered. Uniform consistency properties over the whole real line are studied. For suitable kernels and uniformly continuous densities it is shown that the conditions $h \rightarrow 0$ and $(nh)^{-1} \log n \rightarrow 0$ are sufficient for strong uniform consistency of the density estimate, where $n$ is the sample size and $h$ is the "window width." Under certain conditions on the kernel, conditions are found on the density and on the behavior of the window width which are necessary and sufficient for weak and strong uniform consistency of the estimate of the density derivatives. Theorems on the rate of strong and weak consistency are also proved.
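
As a quick numerical illustration of the sufficient conditions $h \rightarrow 0$ and $(nh)^{-1} \log n \rightarrow 0$, any polynomial bandwidth $h_n = n^{-a}$ with $0 < a < 1$ satisfies both; the exponent $a = 1/5$ below is only a conventional example, not a choice made in the paper.

```python
import numpy as np

# Check h -> 0 and log(n)/(n*h) -> 0 for h_n = n**(-1/5):
for n in [10**3, 10**6, 10**9]:
    h = float(n) ** (-0.2)
    print(f"n={n:>10}  h_n={h:.4f}  log(n)/(n*h_n)={np.log(n) / (n * h):.2e}")
```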

362 citations