# Herman Chernoff

Other affiliations: Massachusetts Institute of Technology, University of California, Stanford University

Bio: Herman Chernoff is an academic researcher at Harvard University. He has contributed to research on topics including decision theory and the likelihood-ratio test, has an h-index of 36, and has co-authored 88 publications receiving 12,277 citations. His previous affiliations include the Massachusetts Institute of Technology and the University of California.

##### Papers

TL;DR: In this paper, it is shown that the likelihood ratio test for fixed sample size can be reduced to this form, and that for large samples, a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test.

Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.

3,527 citations
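
The index $\rho$ can be illustrated numerically. The sketch below is an illustration, not code from the paper: it bounds the lower tail of a Bernoulli sum by minimizing the moment generating function of $X - a$ over a crude grid, and compares the resulting bound $m^n$ with the exact binomial tail.

```python
import math

def chernoff_bound(p, a, n):
    """Bound on P(S_n <= n*a) for S_n a sum of n iid Bernoulli(p)
    variables, with a < p.  The bound is m**n where
    m = min_t E[exp(t*(X - a))], minimized over t < 0."""
    best = 1.0
    # crude grid search over t < 0 (a sketch; a 1-D solver would do better)
    for i in range(1, 2000):
        t = -i / 100.0
        mgf = math.exp(-t * a) * (1 - p + p * math.exp(t))
        best = min(best, mgf)
    return best ** n

def exact_lower_tail(p, a, n):
    """Exact P(S_n <= floor(n*a)) via the binomial distribution."""
    k = math.floor(n * a)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k + 1))

n = 100
bound = chernoff_bound(p=0.5, a=0.3, n=n)
exact = exact_lower_tail(p=0.5, a=0.3, n=n)
# The bound is a valid upper bound, and log(bound)/n tracks log(exact)/n.
print(bound >= exact, math.log(bound) / n, math.log(exact) / n)
```

The per-observation exponents `log(bound)/n` and `log(exact)/n` agree to leading order, which is exactly the sense in which $\rho$ governs large-sample error probabilities.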

TL;DR: Every multivariate observation is visualized as a computer-drawn face that makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data.

Abstract: A novel method of representing multivariate data is presented. Each point in $k$-dimensional space, $k \leq 18$, is represented by a cartoon of a face whose features, such as length of nose and curvature of mouth, correspond to components of the point. Thus every multivariate observation is visualized as a computer-drawn face. This presentation makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data. Other graphical representations are described briefly.

1,313 citations
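
The feature-mapping step behind these faces can be sketched in a few lines. The feature names and min-max normalization below are illustrative assumptions, not the paper's exact assignment of data components to facial features.

```python
# Hypothetical feature names; the paper maps up to 18 components to
# facial features such as nose length and mouth curvature.
FEATURES = ["face_width", "face_height", "nose_length", "mouth_curvature",
            "eye_size", "eye_spacing", "brow_slant", "pupil_size"]

def to_face(row, lo, hi):
    """Map one multivariate observation to face-feature parameters in
    [0, 1].  `lo` and `hi` are per-component data ranges used to
    normalize each component before it drives a facial feature."""
    params = {}
    for name, x, a, b in zip(FEATURES, row, lo, hi):
        params[name] = 0.5 if b == a else (x - a) / (b - a)
    return params

# two hypothetical observations in 8-dimensional space
data = [[5.1, 3.5, 1.4, 0.2, 4.9, 3.0, 1.3, 0.2],
        [7.0, 3.2, 4.7, 1.4, 6.4, 3.2, 4.5, 1.5]]
lo = [min(c) for c in zip(*data)]
hi = [max(c) for c in zip(*data)]
faces = [to_face(r, lo, hi) for r in data]
print(faces[0]["nose_length"], faces[1]["nose_length"])  # 0.0 1.0
```

A drawing routine would then render each parameter dictionary as an actual cartoon face; the point of the method is that the eye compares whole faces far faster than columns of numbers.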

TL;DR: In this paper, the asymptotic distribution of the likelihood ratio λ is examined when the value of the parameter is a boundary point of both the set of points corresponding to the hypothesis and the set corresponding to an alternative.

Abstract: A classical result due to Wilks [1] on the distribution of the likelihood ratio $\lambda$ is the following. Under suitable regularity conditions, if the hypothesis that a parameter $\theta$ lies on an $r$-dimensional hyperplane of $k$-dimensional space is true, the distribution of $-2 \log \lambda$ is asymptotically that of $\chi^2$ with $k - r$ degrees of freedom. In many important problems it is desired to test hypotheses which are not quite of the above type. For example, one may wish to test whether $\theta$ is on one side of a hyperplane, or to test whether $\theta$ is in the positive quadrant of a two-dimensional space. The asymptotic distribution of $-2 \log \lambda$ is examined when the value of the parameter is a boundary point of both the set of $\theta$ corresponding to the hypothesis and the set of $\theta$ corresponding to the alternative. First the case of a single observation from a multivariate normal distribution, with mean $\theta$ and known covariance matrix, is treated. The general case is then shown to reduce to this special case where the covariance matrix is replaced by the inverse of the information matrix. In particular, if one tests whether $\theta$ is on one side or the other of a smooth $(k - 1)$-dimensional surface in $k$-dimensional space and $\theta$ lies on the surface, the asymptotic distribution of $\lambda$ is that of a chance variable which is zero half the time and which behaves like $\chi^2$ with one degree of freedom the other half of the time.

703 citations
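
The half-zero, half-$\chi^2_1$ limit is easy to reproduce by simulation. The sketch below is illustrative, not from the paper: it tests $H_0: \theta \leq 0$ against $\theta > 0$ from a single $N(\theta, 1)$ observation with the true $\theta = 0$ on the boundary.

```python
import random

random.seed(0)

def neg2_log_lambda(x):
    """-2 log likelihood ratio for H0: theta <= 0 vs H1: theta > 0,
    from a single observation x ~ N(theta, 1).  The MLE under H0 is
    min(x, 0) and the unrestricted MLE is x, so the statistic reduces
    to max(x, 0)**2."""
    return max(x, 0.0) ** 2

stats = [neg2_log_lambda(random.gauss(0.0, 1.0)) for _ in range(100_000)]
frac_zero = sum(s == 0.0 for s in stats) / len(stats)
print(round(frac_zero, 2))  # close to 0.5: half the mass sits at zero
```

The nonzero half of the statistics is the square of a standard normal, i.e. $\chi^2$ with one degree of freedom, matching the mixture distribution described above.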

TL;DR: In this article, it was shown that locally optimal designs for large numbers of experiments can be approximated by selecting a certain set of randomized experiments and by repeating each of these randomized experiments in certain specified proportions.

Abstract: It is desired to estimate $s$ parameters $\theta_1, \theta_2, \cdots, \theta_s.$ There is available a set of experiments which may be performed. The probability distribution of the data obtained from any of these experiments may depend on $\theta_1, \theta_2, \cdots, \theta_k, k \geqq s.$ One is permitted to select a design consisting of $n$ of these experiments to be performed independently. The repetition of experiments is permitted in the design. We shall show that, under mild conditions, locally optimal designs for large $n$ may be approximated by selecting a certain set of $r \leqq k + (k - 1) + \cdots + (k - s + 1)$ of the experiments available and by repeating each of these $r$ experiments in certain specified proportions. Examples are given illustrating how this result simplifies considerably the problem of obtaining optimal designs. The criterion of optimality that is employed is one that involves the use of Fisher's information matrix. For the case where it is desired to estimate one of the $k$ parameters, this criterion corresponds to minimizing the variance of the asymptotic distribution of the maximum likelihood estimate of that parameter. The result of this paper constitutes a generalization of a result of Elfving [1]. As in Elfving's paper, the results extend to the case where the cost depends on the experiment and the amount of money to be allocated on experimentation is determined instead of the sample size.

597 citations
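
A minimal instance of this information-matrix criterion: estimating the slope in the simple linear model $E[y] = \theta_0 + \theta_1 x$ with $x \in [-1, 1]$, where the optimal design concentrates on $r = 2$ support points. The candidate designs below are illustrative, not taken from the paper.

```python
def slope_variance(design):
    """Asymptotic variance factor for the slope estimate under a
    design given as {x: weight} with weights summing to 1.  It equals
    (M^-1)[1][1] for the 2x2 information matrix
    M = sum_w [[1, x], [x, x^2]], which reduces to 1 / Var_w(x)."""
    m1 = sum(w * x for x, w in design.items())      # E_w[x]
    m2 = sum(w * x * x for x, w in design.items())  # E_w[x^2]
    return 1.0 / (m2 - m1 * m1)

two_point = {-1.0: 0.5, 1.0: 0.5}               # r = 2 support points
three_point = {-1.0: 1/3, 0.0: 1/3, 1.0: 1/3}   # a plausible alternative
print(slope_variance(two_point), slope_variance(three_point))  # 1.0 vs 1.5
```

Splitting the $n$ experiments equally between the two endpoints maximizes the design variance of $x$ and hence minimizes the variance of the slope estimate; spreading observations over a third point only wastes information, in line with the $r \leq k + (k-1) + \cdots$ bound on the number of experiments needed.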

TL;DR: In this article, it was shown that the test statistic does not have a limiting $\chi^2$ distribution, but is stochastically larger than would be expected under the $\chi^2$ theory.

Abstract: The usual test that a sample comes from a distribution of given form is performed by counting the number of observations falling into specified cells and applying the $\chi^2$ test to these frequencies. In estimating the parameters for this test, one may use the maximum likelihood (or equivalent) estimate based (1) on the cell frequencies, or (2) on the original observations. This paper shows that in (2), unlike the well known result for (1), the test statistic does not have a limiting $\chi^2$ distribution but is stochastically larger than would be expected under the $\chi^2$ theory. The limiting distribution is obtained and some examples are computed. These indicate that the error is not serious in the case of fitting a Poisson distribution, but may be so for the fitting of a normal distribution.

540 citations
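
The effect can be seen in a small simulation. This is an illustration under assumed cells and sample sizes, not the paper's computation: fit a Poisson mean by the raw-data MLE (the sample mean, case (2) above) and form Pearson's statistic over the cells $\{0\}, \{1\}, \{2\}, \{\geq 3\}$.

```python
import math
import random

random.seed(1)

def sample_poisson(lam):
    """Poisson draw by inversion (stdlib-only sketch)."""
    u, p, k = random.random(), math.exp(-lam), 0
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

def chisq_stat(sample, cells=(0, 1, 2)):
    """Pearson chi^2 with cells {0},{1},{2},{>=3}, the Poisson mean
    estimated by the RAW-DATA MLE (the sample mean), i.e. case (2)."""
    n = len(sample)
    lam = sum(sample) / n                     # raw-data MLE
    probs = [math.exp(-lam) * lam**j / math.factorial(j) for j in cells]
    probs.append(1.0 - sum(probs))            # tail cell {>= 3}
    obs = [sum(x == j for x in sample) for j in cells]
    obs.append(n - sum(obs))
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(obs, probs))

stats = [chisq_stat([sample_poisson(1.0) for _ in range(200)])
         for _ in range(500)]
mean = sum(stats) / len(stats)
# With 4 cells and 1 estimated parameter, the classical case-(1) theory
# gives chi^2 with 2 df (mean 2); the raw-data MLE pushes the mean
# above 2, toward 3, though for the Poisson the error is modest.
print(round(mean, 2))
```

The Chernoff-Lehmann limit is $\chi^2_{k-s-1}$ plus a weighted sum of extra squared normals with weights in $[0, 1]$, so the mean of the statistic lands between the case-(1) and the naive $k-1$ degrees of freedom.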

##### Cited by

More filters

TL;DR: The analysis of censored failure times is considered in this paper, where the hazard function is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time.

Abstract: The analysis of censored failure times is considered. It is assumed that for each individual there are available values of one or more explanatory variables. The hazard function (age-specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time. A conditional likelihood is obtained, leading to inferences about the unknown regression coefficients. Some generalizations are outlined.

28,225 citations
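
The conditional (partial) likelihood can be sketched for a single covariate. The tiny dataset below is hypothetical, ties are assumed absent, and the grid maximization is a stand-in for a proper solver.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate under the
    proportional-hazards model lambda(t | x) = lambda0(t) * exp(beta*x).
    events[i] is 1 for an observed failure, 0 for censoring; the
    baseline hazard lambda0 cancels out of every term."""
    ll = 0.0
    for i in range(len(times)):
        if events[i]:
            # risk set: everyone still under observation at time t_i
            risk = [j for j in range(len(times)) if times[j] >= times[i]]
            ll += beta * x[i] - math.log(
                sum(math.exp(beta * x[j]) for j in risk))
    return ll

# tiny hypothetical data, no tied failure times
times  = [2.0, 3.0, 5.0, 7.0]
events = [1, 1, 0, 1]
x      = [1.0, 0.0, 1.0, 0.0]

# crude grid search for the maximizing beta (a solver would do better)
betas = [i / 100.0 for i in range(-300, 301)]
best = max(betas,
           key=lambda b: cox_log_partial_likelihood(b, times, events, x))
print(best)
```

Because the arbitrary baseline hazard cancels from each ratio, inference about `beta` needs no assumption about the shape of the hazard over time, which is the point of the conditional likelihood.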

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

22,120 citations
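
The two kernels at the heart of t-SNE, a Gaussian in the high-dimensional space and a Student-t in the map, can be sketched directly. This is a simplified illustration: real t-SNE tunes a per-point bandwidth to a target perplexity, symmetrizes conditional probabilities, and minimizes the KL divergence by gradient descent.

```python
import math

def pairwise_sq_dists(points):
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def normalize(weights):
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def gaussian_p(points, sigma=1.0):
    """High-dimensional affinities p_ij ∝ exp(-d_ij^2 / (2 sigma^2)),
    zero on the diagonal (single shared sigma, a simplification)."""
    d2 = pairwise_sq_dists(points)
    n = len(points)
    w = [[0.0 if i == j else math.exp(-d2[i][j] / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    return normalize(w)

def student_q(points):
    """Map affinities q_ij ∝ (1 + d_ij^2)^-1: the heavy-tailed kernel
    that relieves the crowding of points in the center of the map."""
    d2 = pairwise_sq_dists(points)
    n = len(points)
    w = [[0.0 if i == j else 1.0 / (1.0 + d2[i][j])
          for j in range(n)] for i in range(n)]
    return normalize(w)

def kl(p, q):
    """KL divergence sum_ij p_ij log(p_ij / q_ij): t-SNE's cost."""
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(len(p)) for j in range(len(p))
               if p[i][j] > 0)

high = [[0, 0, 0], [0, 0, 1], [5, 5, 5]]   # hypothetical data
low  = [[0, 0], [0, 1], [5, 5]]            # a candidate 2-D embedding
P, Q = gaussian_p(high), student_q(low)
print(round(sum(map(sum, P)), 6), kl(P, Q))  # P sums to 1; KL >= 0
```

Gradient descent on `kl(P, Q)` with respect to the low-dimensional coordinates is what produces the final map; the mismatch between the Gaussian and Student-t tails is what pushes dissimilar clusters apart.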

21 Mar 2002

TL;DR: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. It covers both classical and Bayesian philosophies before advancing to the analysis of linear and generalized linear models. Topics include linear and logistic regression; simple and complex ANOVA models (for factorial, nested, block, split-plot, repeated measures and covariance designs); and log-linear models. Multivariate techniques, including classification and ordination, are then introduced.

Abstract: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.

9,098 citations