Author

Tadeusz Caliński

Bio: Tadeusz Caliński is an academic researcher from the University of Life Sciences in Poznań. The author has contributed to research in topics: Block design & Block (telecommunications). The author has an h-index of 12 and has co-authored 38 publications receiving 5369 citations. Previous affiliations of Tadeusz Caliński include the United States Department of Agriculture & the Polish Academy of Sciences.

Papers
Journal ArticleDOI
TL;DR: A method for identifying clusters of points in a multidimensional Euclidean space is described and its application to taxonomy considered; an informal indicator of the "best number" of clusters is also suggested.
Abstract: A method for identifying clusters of points in a multidimensional Euclidean space is described and its application to taxonomy considered. It reconciles, in a sense, two different approaches to the investigation of the spatial relationships between the points, viz., the agglomerative and the divisive methods. A graph, the shortest dendrite of Florek et al. (1951a), is constructed on a nearest neighbour basis and then divided into clusters by applying the criterion of minimum within cluster sum of squares. This procedure ensures an effective reduction of the number of possible splits. The method may be applied to a dichotomous division, but is perfectly suitable also for a global division into any number of clusters. An informal indicator of the "best number" of clusters is suggested. It is a "variance ratio criterion" giving some insight into the structure of the points. The method is illustrated by three examples, one of which is original. The results obtained by the dendrite method are compared with those...
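
The abstract names a "variance ratio criterion" without giving its formula. The sketch below assumes the definition commonly associated with this paper: the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k), for n points in k clusters. The function name `variance_ratio_criterion` and the toy data are illustrative assumptions, not taken from the paper, and the sketch scores a given partition rather than reproducing the dendrite method itself.

```python
from typing import Dict, Hashable, List, Sequence


def variance_ratio_criterion(points: Sequence[Sequence[float]],
                             labels: Sequence[Hashable]) -> float:
    """Score a partition: larger values mean tighter, better separated clusters."""
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    clusters: Dict[Hashable, List[Sequence[float]]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    grand_mean = [sum(p[d] for p in points) / n for d in range(dim)]

    wgss = 0.0  # within-cluster sum of squares
    bgss = 0.0  # between-cluster sum of squares
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[d] for p in members) / m for d in range(dim)]
        wgss += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
        bgss += m * sum((centroid[d] - grand_mean[d]) ** 2 for d in range(dim))

    if wgss == 0.0:
        return float("inf")  # every point coincides with its cluster centroid
    return (bgss / (k - 1)) / (wgss / (n - k))


# Hypothetical toy data: two tight pairs of points far apart on the x-axis.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(variance_ratio_criterion(toy, [0, 0, 1, 1]))  # natural split
print(variance_ratio_criterion(toy, [0, 1, 0, 1]))  # poor split, much lower score
```

Used as an indicator of the "best number" of clusters, the score would be computed for each candidate number of clusters and the values compared; how the candidate partitions are produced is left to the clustering method.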

5,772 citations

Journal ArticleDOI
01 Dec 1985-Genetics
TL;DR: A general method for determining genetic distances is proposed using the T2 statistic, defined in terms of the vector of contrasts specifying the distance, which permits the testing of the significance of any distance between any pair of populations that may be of interest from a genetic point of view.
Abstract: Morphological data showing continuous distributions, polygenically controlled, may be particularly useful in intergroup classification below the species level; an appropriate distance analysis based on these traits is an important tool in evolutionary biology and in plant and animal breeding.--The interpretation of morphological distances in genetic terms is not easy because simple phenotypic data may lead to biased estimates of genetic distances. Convenient estimates can be obtained whenever it is possible to breed populations according to a suitable crossing design and to derive information from genetic parameters.--A general method for determining genetic distances is proposed. The procedure of multivariate analysis of variance is extended to estimate appropriate genetic parameters (genetic effects). Not only are optimal statistical estimates of parameters obtained but also the procedure allows the measurement of genetic distances between populations as linear functions of the estimated parameters, providing an appropriate distance matrix that can be defined in terms of these parameters. The use of the T2 statistic, defined in terms of the vector of contrasts specifying the distance, permits the testing of the significance of any distance between any pair of populations that may be of interest from a genetic point of view.--A numerical example from maize diallel data is reported in order to illustrate the procedure. In particular, heterosis effects are used as the basis for estimates of genetic divergence between populations.

70 citations

Journal ArticleDOI
01 Feb 1997-Heredity
TL;DR: The least squares interval mapping approach using multiple regression on marker data has been developed and has been shown to be more powerful in detecting QTLs and more precise in determining their map position.
Abstract: In order to detect the linkage disequilibrium existing between alleles at a marker locus and alleles of a linked quantitative trait locus (QTL), a least squares interval mapping approach using multiple regression on marker data has been developed. It allows inclusion in the model of the parameters describing the experimental and environmental situation, so that the QTL × environment effects can be tested. The method can also be applied using any general statistical package to data for which the usual normal distribution assumption does not hold, and where the use of weighted approaches is therefore required. A method to cope with the frequent problem in biological experiments of missing data was also used. The analysis was performed on data concerning two components of maize pollen competitive ability, obtained from an experiment over 2 years. The method, in comparison with the traditional single marker approach, has been shown to be more powerful in detecting QTLs and more precise in determining their map position. The analysis has identified QTLs expressed across years, putative QTLs with major effects and QTLs accounting for genotype × environment interaction.

58 citations


Cited by
Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
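TL;DR: In this article, a non-parametric method for multivariate analysis of variance, based on sums of squared distances and permutation tests, is proposed as an alternative to the traditional multivariate analogues of ANOVA, whose assumptions are too stringent for most ecological multivariate data sets.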
Abstract: Hypothesis-testing methods for multivariate data are needed to make rigorous probability statements about the effects of factors and their interactions in experiments. Analysis of variance is particularly powerful for the analysis of univariate data. The traditional multivariate analogues, however, are too stringent in their assumptions for most ecological multivariate data sets. Non-parametric methods, based on permutation tests, are preferable. This paper describes a new non-parametric method for multivariate analysis of variance, after McArdle and Anderson (in press). It is given here, with several applications in ecology, to provide an alternative and perhaps more intuitive formulation for ANOVA (based on sums of squared distances) to complement the description provided by McArdle and Anderson (in press) for the analysis of any linear model. It is an improvement on previous non-parametric methods because it allows a direct additive partitioning of variation for complex models. It does this while maintaining the flexibility and lack of formal assumptions of other non-parametric methods. The test-statistic is a multivariate analogue to Fisher's F-ratio and is calculated directly from any symmetric distance or dissimilarity matrix. P-values are then obtained using permutations. Some examples of the method are given for tests involving several factors, including factorial and hierarchical (nested) designs and tests of interactions.
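
The abstract describes an F-ratio analogue computed from a symmetric distance matrix, with P-values from permutations. The sketch below covers only the simplest one-factor case and assumes the usual partition: the total sum of squares is the sum of squared pairwise distances divided by N, and the within-group sum of squares applies the same rule inside each group. The names `pseudo_f` and `permutation_p_value`, the number of permutations, and the toy data are illustrative assumptions; the factorial and nested designs mentioned in the abstract are not handled.

```python
import random
from itertools import combinations
from typing import Hashable, List, Sequence


def pseudo_f(dist: Sequence[Sequence[float]], groups: Sequence[Hashable]) -> float:
    """One-factor pseudo F-ratio computed directly from a distance matrix."""
    n = len(groups)
    levels = sorted(set(groups))
    a = len(levels)

    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    # Within-group sum of squares: the same rule applied inside each group.
    ss_within = 0.0
    for level in levels:
        idx = [i for i in range(n) if groups[i] == level]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)

    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_p_value(dist: Sequence[Sequence[float]],
                        groups: Sequence[Hashable],
                        n_perm: int = 999,
                        seed: int = 0) -> float:
    """Share of label shufflings whose pseudo-F is at least the observed one."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled: List[Hashable] = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: six observations on a line, two groups of three.
values = [0, 1, 2, 10, 11, 12]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(x - y) for y in values] for x in values]
print(pseudo_f(dist, labels))
print(permutation_p_value(dist, labels))
```

With Euclidean distances on a single variable, as in the toy data, the statistic coincides with the ordinary univariate F-ratio; the point of the distance-based form is that any symmetric dissimilarity matrix can be used in its place.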

12,328 citations

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method called the "gap statistic" for estimating the number of clusters (groups) in a set of data, which uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution.
Abstract: We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.
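
The comparison described in the abstract can be sketched as follows, under stated assumptions: within-cluster dispersion is taken as the pooled sum of squared distances to cluster centroids, the reference null distribution is uniform over the bounding box of the data, and the clustering algorithm is a bare-bones k-means written for this example. The names `kmeans_labels`, `within_dispersion` and `gap_statistic` are hypothetical, and the rule based on standard errors that the full method uses to pick the number of clusters is omitted.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points: Sequence[Point], k: int,
                  rng: random.Random, iters: int = 50) -> List[int]:
    """Minimal Lloyd-style k-means, started from k randomly chosen data points."""
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def within_dispersion(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Pooled within-cluster sum of squared distances to the cluster centroids."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def gap_statistic(points: Sequence[Point], k: int,
                  n_refs: int = 10, seed: int = 0) -> float:
    """Mean log-dispersion of reference data sets minus log-dispersion of the data."""
    rng = random.Random(seed)
    log_w = math.log(within_dispersion(points, kmeans_labels(points, k, rng)))

    # Reference data: uniform draws over the bounding box of the observed points.
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    ref_logs = []
    for _ in range(n_refs):
        ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
        ref_logs.append(math.log(within_dispersion(ref, kmeans_labels(ref, k, rng))))
    return sum(ref_logs) / n_refs - log_w


# Hypothetical toy data: two well separated blobs in the plane.
data_rng = random.Random(1)
blobs = [[data_rng.gauss(0.0, 0.3), data_rng.gauss(0.0, 0.3)] for _ in range(20)]
blobs += [[data_rng.gauss(5.0, 0.3), data_rng.gauss(5.0, 0.3)] for _ in range(20)]
for k in (1, 2, 3):
    print(k, gap_statistic(blobs, k))
```

Because the abstract states that the output of any clustering algorithm can be used, `kmeans_labels` could be swapped for a hierarchical method without changing the rest of the sketch.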

4,283 citations

Book
01 Jan 2000
TL;DR: The gap statistic is proposed for estimating the number of clusters (groups) in a set of data by comparing the change in within‐cluster dispersion with that expected under an appropriate reference null distribution.
Abstract: We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.

3,860 citations

Journal ArticleDOI
TL;DR: A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets containing 2, 3, 4, or 5 distinct nonoverlapping clusters; the stopping rules varied widely in their ability to determine the correct number of clusters.
Abstract: A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.

3,551 citations