Author

Robert R. Sokal

Bio: Robert R. Sokal is an academic researcher from the State University of New York System. The author has contributed to research in topics: Population & Numerical taxonomy. The author has an h-index of 64 and has co-authored 190 publications receiving 80,115 citations. Previous affiliations of Robert R. Sokal include Stony Brook University & University of New Mexico.


Papers
01 Jan 1995
TL;DR: This book covers statistics for biological data, from descriptive statistics and probability distributions through hypothesis testing, analysis of variance (single-classification, nested, two-way and multiway), regression, correlation, analysis of frequencies, and meta-analysis.
Abstract: 1. Introduction 2. Data in Biology 3. Computers and Data Analysis 4. Descriptive Statistics 5. Introduction to Probability Distributions 6. The Normal Probability Distribution 7. Hypothesis Testing and Interval Estimation 8. Introduction to Analysis of Variance 9. Single-Classification Analysis of Variance 10. Nested Analysis of Variance 11. Two-Way and Multiway Analysis of Variance 12. Statistical Power and Sample Size in the Analysis of Variance 13. Assumptions of Analysis of Variance 14. Linear Regression 15. Correlation 16. Multiple and Curvilinear Regression 17. Analysis of Frequencies 18. Meta-Analysis and Miscellaneous Methods

23,447 citations

Book
01 Jan 1969
TL;DR: This book covers statistics for biological data, from descriptive statistics and probability distributions through hypothesis testing, analysis of variance (single-classification, nested, two-way and multiway), regression, correlation, analysis of frequencies, and meta-analysis.
Abstract: 1. Introduction 2. Data in Biology 3. Computers and Data Analysis 4. Descriptive Statistics 5. Introduction to Probability Distributions 6. The Normal Probability Distribution 7. Hypothesis Testing and Interval Estimation 8. Introduction to Analysis of Variance 9. Single-Classification Analysis of Variance 10. Nested Analysis of Variance 11. Two-Way and Multiway Analysis of Variance 12. Statistical Power and Sample Size in the Analysis of Variance 13. Assumptions of Analysis of Variance 14. Linear Regression 15. Correlation 16. Multiple and Curvilinear Regression 17. Analysis of Frequencies 18. Meta-Analysis and Miscellaneous Methods

21,276 citations

Book
01 Jan 1963
TL;DR: The authors continued the story of psychology with added research and enhanced content from the most dynamic areas of the field, such as cognition, gender and diversity studies, neuroscience and more, while at the same time using the most effective teaching approaches and learning tools.
Abstract: This new edition continues the story of psychology with added research and enhanced content from the most dynamic areas of the field--cognition, gender and diversity studies, neuroscience and more, while at the same time using the most effective teaching approaches and learning tools

3,332 citations


Cited by
Journal ArticleDOI
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.

57,055 citations
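
The clustering loop described in the neighbor-joining abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: it uses the widely cited Q-criterion form of the pair selection, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, which is an assumption here because the abstract only says that the chosen pair minimizes total branch length. The function name `neighbor_joining` and the internal labels `node0`, `node1`, ... are hypothetical.

```python
def neighbor_joining(names, dist):
    """Return (node, neighbor, branch_length) edges of an unrooted tree.

    names: list of OTU labels; dist: symmetric distance matrix (list of lists)
    with zeros on the diagonal. Illustrative sketch only.
    """
    # Dict-of-dicts so rows can be added and removed by label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        r = {a: sum(d[a].values()) for a in d}  # row sums of distances
        labels = list(d)
        best = None
        # Pick the pair with the smallest Q value (assumed criterion).
        for x in range(n):
            for y in range(x + 1, n):
                a, b = labels[x], labels[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        node = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the two joined neighbors to the new node.
        length_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        edges.append((a, node, length_a))
        edges.append((b, node, length_b))
        # Distances from the new node to every remaining label.
        new_row = {c: 0.5 * (d[a][c] + d[b][c] - d[a][b])
                   for c in d if c not in (a, b)}
        del d[a]
        del d[b]
        for c in d:
            del d[c][a]
            del d[c][b]
            d[c][node] = new_row[c]
        new_row[node] = 0.0
        d[node] = new_row
    # Two nodes remain; the distance between them is the last branch.
    a, b = list(d)
    edges.append((a, b, d[a][b]))
    return edges
```

Calling `neighbor_joining(["a", "b", "c", "d", "e"], matrix)` on a five-OTU distance matrix returns seven edges, which is the branch count of an unrooted binary tree on five tips. Ties in Q are broken by scan order, so equally good topologies may be reported differently by other implementations.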

Journal ArticleDOI
TL;DR: The recently developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies by resampling characters with replacement; a group is taken to be statistically significant if it appears in 95% or more of the bootstrap samples.
Abstract: The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.

40,349 citations
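
The resampling scheme in the bootstrap abstract above (keep all species, draw characters with replacement, express each sample as character weights, then count how often each group appears) can be sketched as follows. This is an illustrative sketch under stated assumptions: the tree-building step is left out entirely, all function names are hypothetical, and `prob_group_missed` is one common reading of the three-character remark (the chance that none of a group's k defining characters is drawn in n draws), which the abstract itself does not spell out.

```python
import random
from collections import Counter


def bootstrap_weights(n_characters, rng):
    """Draw n_characters character indices with replacement.

    Returns a list whose i-th entry is how many times character i was drawn,
    which is the weight the abstract suggests passing to an existing program.
    """
    draws = Counter(rng.randrange(n_characters) for _ in range(n_characters))
    return [draws[i] for i in range(n_characters)]


def group_support(replicate_groups):
    """Fraction of bootstrap replicates in which each group appears.

    replicate_groups: one collection of groups per replicate, each group a
    frozenset of species names taken from that replicate's estimated tree.
    """
    counts = Counter(g for groups in replicate_groups for g in set(groups))
    n = len(replicate_groups)
    return {g: c / n for g, c in counts.items()}


def majority_rule_groups(support):
    """Groups that occur in more than half of the replicates."""
    return {g for g, frac in support.items() if frac > 0.5}


def prob_group_missed(k, n_characters):
    """Assumed model: chance that none of k given characters is drawn
    in n_characters independent draws with replacement."""
    return (1 - k / n_characters) ** n_characters
```

In use, each weight vector from `bootstrap_weights` would be handed to whatever tree-estimation program is available, and the groups found in each resulting tree collected into `replicate_groups`. Under the assumed model, `prob_group_missed(3, 100)` falls below 0.05 while `prob_group_missed(2, 100)` does not, which is consistent with the three-character threshold quoted in the abstract.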

01 Jan 1967
TL;DR: This paper describes k-means, a process for partitioning an N-dimensional population into k sets on the basis of a sample; the process appears to give partitions that are reasonably efficient in the sense of within-class variance, and the k-means concept generalizes the ordinary sample mean.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if $p$ is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of $p$ over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends to be low for the partitions $S$ generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4.
The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special

24,320 citations
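
The k-means abstract above describes the procedure as easily programmed, and a short sketch shows why. The abstract does not give the update rule, so the version below is an assumption: a sequential variant in which the first k sample points seed the means and each later point is absorbed into its nearest mean as a running average. The function names are hypothetical, and `within_class_variance` is only a sample analogue of the within-class variance criterion, computed against nearest-mean assignment.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sequential_k_means(points, k):
    """Assumed sequential update: seed with the first k points, then move
    the nearest mean toward each new point by a running average."""
    means = [list(p) for p in points[:k]]
    counts = [1] * k
    for p in points[k:]:
        # Index of the mean closest to the incoming point.
        j = min(range(k), key=lambda i: sq_dist(p, means[i]))
        counts[j] += 1
        # Running-mean update: new = old + (x - old) / count.
        means[j] = [m + (x - m) / counts[j] for m, x in zip(means[j], p)]
    return means


def within_class_variance(points, means):
    """Average squared distance from each point to its nearest mean."""
    total = sum(min(sq_dist(p, m) for m in means) for p in points)
    return total / len(points)
```

Because each point is visited once and only one mean changes per point, the cost is linear in the sample size, which fits the abstract's remark about processing very large samples. The result depends on the order of the points, so shuffling the sample before calling `sequential_k_means` is a reasonable precaution.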

Journal ArticleDOI
TL;DR: Genalex is a user-friendly cross-platform package that runs within Microsoft Excel, enabling population genetic analyses of codominant, haploid and binary data.
Abstract: genalex is a user-friendly cross-platform package that runs within Microsoft Excel, enabling population genetic analyses of codominant, haploid and binary data. Allele frequency-based analyses include heterozygosity, F statistics, Nei's genetic distance, population assignment, probabilities of identity and pairwise relatedness. Distance-based calculations include amova, principal coordinates analysis (PCA), Mantel tests, multivariate and 2D spatial autocorrelation and twogener. More than 20 different graphs summarize data and aid exploration. Sequence and genotype data can be imported from automated sequencers, and exported to other software. Initially designed as a tool for teaching, genalex 6 now offers features for researchers as well. Documentation and the program are available at http://www.anu.edu.au/BoZo/GenAlEx/

15,786 citations

Journal ArticleDOI
TL;DR: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, such as the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.
Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types like DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available on http://cmpg.unibe.ch/software/arlequin3.

14,271 citations