We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

/pdf/visualizing-data-using-t-sne-2w1n304iq0.pdf

Visualizing Data using t-SNE

The analysis of censored failure times is considered. It is assumed that on each individual are available values of one or more explanatory variables. The hazard function (age-specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time. A conditional likelihood is obtained, leading to inferences about the unknown regression coefficients. Some generalizations are outlined.

https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2517-6161.1972.tb00899.x

Regression Models and Life-Tables
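
As a rough illustration of the conditional likelihood mentioned in the abstract above, here is a minimal Python sketch for a single explanatory variable. It assumes the hazard takes the form h0(t) * exp(beta * x) and that no two failure times are tied; the exponential form, the function name, and the toy data are assumptions made for this example and are not taken from the abstract.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Partial log-likelihood for one covariate, assuming no tied event times.

    Assumes a hazard of the form h0(t) * exp(beta * x). The unknown
    function of time h0 cancels and never has to be specified.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        log_denom = math.log(sum(math.exp(beta * x[j]) for j in risk))
        total += beta * x[i] - log_denom
    return total

# Hypothetical toy data: 0 in `events` marks a censored time.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
x = [1.0, 0.0, 1.0, 0.0]
print(cox_partial_loglik(0.0, times, events, x))
print(cox_partial_loglik(1.0, times, events, x))
```

Maximizing this function over beta would give an estimate of the regression coefficient; the sketch only evaluates it.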

Generalized Additive Models.

An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.

/pdf/experimental-design-and-data-analysis-for-biologists-8ybva8d53x.pdf

Experimental Design and Data Analysis for Biologists

Wireless Communications

In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests, $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.

https://projecteuclid.org/download/pdf_1/euclid.aoms/1177729330

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations
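
The statement in the abstract above that $P(S_n \leqq na)$ behaves roughly like $m^n$ can be checked numerically in a simple case. The sketch below assumes $X$ is Bernoulli with success probability `p` and takes `a < p`; the choice of distribution, the ternary search used for the minimization, and all function names are assumptions of this example rather than anything specified in the abstract.

```python
import math

def mgf_shifted(t, p, a):
    """Moment generating function of X - a for X ~ Bernoulli(p)."""
    return math.exp(-t * a) * (1 - p + p * math.exp(t))

def min_mgf(p, a, lo=-20.0, hi=20.0, iters=200):
    """Minimize the convex function t -> mgf_shifted(t, p, a) by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if mgf_shifted(m1, p, a) < mgf_shifted(m2, p, a):
            hi = m2
        else:
            lo = m1
    return mgf_shifted((lo + hi) / 2, p, a)

def exact_lower_tail(n, p, a):
    """Exact P(S_n <= n*a) for S_n ~ Binomial(n, p)."""
    k_max = math.floor(n * a + 1e-9)  # small offset guards against rounding in n*a
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

def decay_rate(n, p, a):
    """Per-observation exponential decay rate, -(1/n) * log P(S_n <= n*a)."""
    return -math.log(exact_lower_tail(n, p, a)) / n

p, a = 0.5, 0.3
m = min_mgf(p, a)
for n in (50, 100, 400):
    # The exact rate should approach -log(m) as n grows.
    print(n, decay_rate(n, p, a), -math.log(m))
```

In this Bernoulli case the exact tail probability never exceeds $m^n$, and the printed rates move toward $-\log m$ as $n$ increases.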

A novel method of representing multivariate data is presented. Each point in k-dimensional space, k≤18, is represented by a cartoon of a face whose features, such as length of nose and curvature of mouth, correspond to components of the point. Thus every multivariate observation is visualized as a computer-drawn face. This presentation makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data. Other graphical representations are described briefly.

/pdf/the-use-of-faces-to-represent-points-in-k-dimensional-space-3dl3m2zs05.pdf

The Use of Faces to Represent Points in k-Dimensional Space Graphically

A classical result due to Wilks [1] on the distribution of the likelihood ratio $\lambda$ is the following. Under suitable regularity conditions, if the hypothesis that a parameter $\theta$ lies on an $r$-dimensional hyperplane of $k$-dimensional space is true, the distribution of $-2 \log \lambda$ is asymptotically that of $\chi^2$ with $k - r$ degrees of freedom. In many important problems it is desired to test hypotheses which are not quite of the above type. For example, one may wish to test whether $\theta$ is on one side of a hyperplane, or to test whether $\theta$ is in the positive quadrant of a two-dimensional space. The asymptotic distribution of $-2 \log \lambda$ is examined when the value of the parameter is a boundary point of both the set of $\theta$ corresponding to the hypothesis and the set of $\theta$ corresponding to the alternative. First the case of a single observation from a multivariate normal distribution, with mean $\theta$ and known covariance matrix, is treated. The general case is then shown to reduce to this special case where the covariance matrix is replaced by the inverse of the information matrix. In particular, if one tests whether $\theta$ is on one side or the other of a smooth $(k - 1)$-dimensional surface in $k$-dimensional space and $\theta$ lies on the surface, the asymptotic distribution of $-2 \log \lambda$ is that of a chance variable which is zero half the time and which behaves like $\chi^2$ with one degree of freedom the other half of the time.

/pdf/on-the-distribution-of-the-likelihood-ratio-1elq4xz2bg.pdf

On the Distribution of the Likelihood Ratio
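
The boundary result described in the abstract above can be illustrated by simulation in the simplest setting: a single observation $X$ from a normal distribution with mean $\theta$ and unit variance, testing $\theta \leqq 0$ against $\theta \geqq 0$ when the true value is $\theta = 0$. The one-dimensional setup, the sample size, the seed, and the function names below are assumptions chosen for this sketch.

```python
import random

def lr_statistic(x):
    """-2 log(lambda) for one draw X ~ N(theta, 1), testing theta <= 0 against theta >= 0.

    The constrained estimate under the hypothesis is min(x, 0) and the
    unconstrained estimate is x, so the statistic reduces to max(x, 0)**2.
    """
    return max(x, 0.0) ** 2

def simulate(n_draws=200_000, seed=0):
    """Draw X at the boundary value theta = 0 and summarize the statistic."""
    rng = random.Random(seed)
    stats = [lr_statistic(rng.gauss(0.0, 1.0)) for _ in range(n_draws)]
    frac_zero = sum(s == 0.0 for s in stats) / n_draws
    # 2.706 is approximately the upper 10% point of chi-square with 1 d.f.
    frac_above = sum(s > 2.706 for s in stats) / n_draws
    return frac_zero, frac_above

frac_zero, frac_above = simulate()
print(frac_zero, frac_above)
```

If the statistic is zero half the time and $\chi^2$ with one degree of freedom otherwise, the first fraction should be close to 0.5 and the second close to 0.05, half of the nominal 10% tail.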

It is desired to estimate $s$ parameters $\theta_1, \theta_2, \cdots, \theta_s.$ There is available a set of experiments which may be performed. The probability distribution of the data obtained from any of these experiments may depend on $\theta_1, \theta_2, \cdots, \theta_k, k \geqq s.$ One is permitted to select a design consisting of $n$ of these experiments to be performed independently. The repetition of experiments is permitted in the design. We shall show that, under mild conditions, locally optimal designs for large $n$ may be approximated by selecting a certain set of $r \leqq k + (k - 1) + \cdots + (k - s + 1)$ of the experiments available and by repeating each of these $r$ experiments in certain specified proportions. Examples are given illustrating how this result simplifies considerably the problem of obtaining optimal designs. The criterion of optimality that is employed is one that involves the use of Fisher's information matrix. For the case where it is desired to estimate one of the $k$ parameters, this criterion corresponds to minimizing the variance of the asymptotic distribution of the maximum likelihood estimate of that parameter. The result of this paper constitutes a generalization of a result of Elfving [1]. As in Elfving's paper, the results extend to the case where the cost depends on the experiment and the amount of money to be allocated on experimentation is determined instead of the sample size.

/pdf/locally-optimal-designs-for-estimating-parameters-3tbg9wzane.pdf

Locally Optimal Designs for Estimating Parameters

The usual test that a sample comes from a distribution of given form is performed by counting the number of observations falling into specified cells and applying the $\chi^2$ test to these frequencies. In estimating the parameters for this test, one may use the maximum likelihood (or equivalent) estimate based (1) on the cell frequencies, or (2) on the original observations. This paper shows that in (2), unlike the well known result for (1), the test statistic does not have a limiting $\chi^2$-distribution, but that it is stochastically larger than would be expected under the $\chi^2$ theory. The limiting distribution is obtained and some examples are computed. These indicate that the error is not serious in the case of fitting a Poisson distribution, but may be so for the fitting of a normal.

/pdf/the-use-of-maximum-likelihood-estimates-in-chi-2-tests-for-423eyq56gz.pdf

Herman Chernoff

Papers

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

The Use of Faces to Represent Points in k-Dimensional Space Graphically

On the Distribution of the Likelihood Ratio

Locally Optimal Designs for Estimating Parameters

The Use of Maximum Likelihood Estimates in $\chi^2$ Tests for Goodness of Fit