Journal ArticleDOI

Non-parametric multivariate analyses of changes in community structure

01 Mar 1993-Austral Ecology (Wiley/Blackwell (10.1111))-Vol. 18, Iss: 1, pp 117-143
TL;DR: The paper identifies which elements of an often-quoted strategy for graphical representation of multivariate (multi-species) abundance data have proved most useful in practical assessment of community change resulting from pollution impact.
Abstract: In the early 1980s, a strategy for graphical representation of multivariate (multi-species) abundance data was introduced into marine ecology by, among others, Field, et al. (1982). A decade on, it is instructive to: (i) identify which elements of this often-quoted strategy have proved most useful in practical assessment of community change resulting from pollution impact; and (ii) ask to what extent evolution of techniques in the intervening years has added self-consistency and comprehensiveness to the approach. The pivotal concept has proved to be that of a biologically-relevant definition of similarity of two samples, and its utilization mainly in simple rank form, for example ‘sample A is more similar to sample B than it is to sample C’. Statistical assumptions about the data are thus minimized and the resulting non-parametric techniques will be of very general applicability. From such a starting point, a unified framework needs to encompass: (i) the display of community patterns through clustering and ordination of samples; (ii) identification of species principally responsible for determining sample groupings; (iii) statistical tests for differences in space and time (multivariate analogues of analysis of variance, based on rank similarities); and (iv) the linking of community differences to patterns in the physical and chemical environment (the latter also dictated by rank similarities between samples). Techniques are described that bring such a framework into place, and areas in which problems remain are identified. Accumulated practical experience with these methods is discussed, in particular applications to marine benthos, and it is concluded that they have much to offer practitioners of environmental impact studies on communities.
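The rank-form idea in the abstract ("sample A is more similar to sample B than it is to sample C") can be illustrated with a minimal sketch. The abundance counts below are hypothetical, and Bray-Curtis is used only as one common choice of biologically relevant dissimilarity; the abstract itself does not prescribe a coefficient.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    0 means identical composition, 1 means no species in common.
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


# Hypothetical abundance counts for four species at three samples.
samples = {
    "A": [10, 0, 3, 1],
    "B": [8, 1, 4, 0],
    "C": [0, 12, 0, 7],
}

d_ab = bray_curtis(samples["A"], samples["B"])
d_ac = bray_curtis(samples["A"], samples["C"])

# Rank form: only the ordering of the dissimilarities is carried forward,
# not their numerical values.
a_closer_to_b = d_ab < d_ac
print(round(d_ab, 3), round(d_ac, 3), a_closer_to_b)
```

Because later steps use only the ordering, any monotonic rescaling of the dissimilarities would give the same conclusions.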
Citations
Journal ArticleDOI
TL;DR: This article proposes a non-parametric method for multivariate analysis of variance, based on sums of squared distances, as an alternative to traditional multivariate analogues whose assumptions are too stringent for most ecological multivariate data sets.
Abstract: Hypothesis-testing methods for multivariate data are needed to make rigorous probability statements about the effects of factors and their interactions in experiments. Analysis of variance is particularly powerful for the analysis of univariate data. The traditional multivariate analogues, however, are too stringent in their assumptions for most ecological multivariate data sets. Non-parametric methods, based on permutation tests, are preferable. This paper describes a new non-parametric method for multivariate analysis of variance, after McArdle and Anderson (in press). It is given here, with several applications in ecology, to provide an alternative and perhaps more intuitive formulation for ANOVA (based on sums of squared distances) to complement the description provided by McArdle and Anderson (in press) for the analysis of any linear model. It is an improvement on previous non-parametric methods because it allows a direct additive partitioning of variation for complex models. It does this while maintaining the flexibility and lack of formal assumptions of other non-parametric methods. The test-statistic is a multivariate analogue to Fisher's F-ratio and is calculated directly from any symmetric distance or dissimilarity matrix. P-values are then obtained using permutations. Some examples of the method are given for tests involving several factors, including factorial and hierarchical (nested) designs and tests of interactions.
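As a rough illustration of a test statistic "calculated directly from any symmetric distance or dissimilarity matrix", the sketch below computes a one-way pseudo-F from sums of squared distances and a permutation P-value. It covers only the single-factor case; the function names, the toy data and the choice of 999 permutations are assumptions made for this example, not details taken from the paper.

```python
import random


def pseudo_f(dist, groups):
    """One-way pseudo-F from a symmetric distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity inside each group,
    # divided by that group's size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pair_sum = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_test(dist, groups, n_perm=999, seed=0):
    """Permute group labels and count pseudo-F values at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    count = 1  # the observed arrangement is counted as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            count += 1
    return observed, count / (n_perm + 1)


# Toy data: six points on a line, two well-separated groups (hypothetical).
coords = [0, 1, 2, 10, 11, 12]
labels = ["x", "x", "x", "y", "y", "y"]
dist = [[abs(p - q) for q in coords] for p in coords]

f_obs, p_value = permutation_test(dist, labels)
print(f_obs, p_value)
```

With Euclidean distances on a single variable, as here, the pseudo-F coincides with the usual univariate ANOVA F-ratio, which gives a quick sanity check on the arithmetic.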

12,328 citations

Book
21 Mar 2002
TL;DR: A textbook for students and researchers in biology who need to design experiments, sample programs or analyse the resulting data. It revises estimation and hypothesis testing under both classical and Bayesian philosophies, covers linear and generalized linear models (linear and logistic regression, simple and complex ANOVA models, and log-linear models), and then introduces multivariate techniques, including classification and ordination.
Abstract: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.

9,509 citations


Cites background or methods from "Non-parametric multivariate analyse..."

  • ...Clarke & Warwick (1994) described a procedure for ecological data termed SIMPER (similarity percentages) for determining which species (variables) are contributing most to the dissimilarity between groups of object (sampling units)....

    [...]

  • ...Clarke & Warwick (1994) argued that fourth-root transformations should always be used for species abundance data before calculating dissimilarities to reduce the influence of very abundant species....

    [...]

  • ...Clarke & Ainsworth (1993) proposed a procedure for ecological data that basically measures the correlation between dissimilarities between sampling units based on species composition and the dissimilarities between sampling units based on environmental variables. They provided an algorithm called BIO-ENV that first calculates a dissimilarity matrix (e.g. Bray–Curtis) between sampling units based on species abundances and a separate dissimilarity matrix (e.g. Euclidean distance) between sampling units based on environmental variables. It then measures any correlation between the rank-orders of these two matrices using the Spearman rank correlation coefficient. Each pair of observations for the correlation will be the rank of the Bray–Curtis dissimilarity (from species abundances) between objects h and i and the rank of the Euclidean distance (from environmental variables) between objects h and i. Legendre & Legendre (1998) pointed out that the BIO-ENV procedure basically calculates the same correlation as a Mantel test (Chapter 15 and Section 18.1.3), except the former is based on rank transformed data. The Mantel test could be used for the global test of no correlation between the two matrices, or even between the dissimilarities based on species composition and differences between sampling units for each environmental variable separately. It can also be extended to compare more than two matrices (Diniz-Filho & Bini 1996). Clarke & Ainsworth (1993) and Clarke & Warwick (1994) incorporated a stepwise routine into their BIO-ENV procedure, to find the combinations of environmental variables that produce dissimilarities between sampling units with the highest correlations with dissimilarities between sampling units based on species composition....

    [...]
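The BIO-ENV idea described in the excerpts above (Spearman rank correlation between a species-based and an environment-based dissimilarity matrix, searched over combinations of environmental variables) can be sketched as follows. This is a minimal illustration with hypothetical data. It searches all subsets exhaustively instead of using the stepwise routine mentioned in the excerpt, and it uses raw environmental values, whereas in practice they would normally be standardized first.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(rows, metric):
    """Dissimilarities for all pairs h < i, in a fixed order."""
    n = len(rows)
    return [metric(rows[h], rows[i]) for h in range(n) for i in range(h + 1, n)]


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def bio_env(species, env, names):
    """Return (best rho, variable names) over all non-empty subsets of env variables."""
    bio = pairwise(species, bray_curtis)
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), size):
            sub = [[row[k] for k in subset] for row in env]
            rho = spearman(bio, pairwise(sub, euclidean))
            if best is None or rho > best[0]:
                best = (rho, [names[k] for k in subset])
    return best


# Hypothetical data: four sampling units, two species, two environmental variables.
species = [[10, 0], [7, 3], [3, 7], [0, 10]]
env = [[1, 5], [2, 1], [3, 4], [4, 2]]
best_rho, best_vars = bio_env(species, env, ["depth", "noise"])
print(round(best_rho, 3), best_vars)
```

In this toy case the variable that tracks the species gradient is selected on its own, and adding the unrelated variable lowers the rank correlation.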

Journal ArticleDOI
TL;DR: It is found that in direct contrast to the highly differentiated communities of their mothers, neonates harbored bacterial communities that were undifferentiated across multiple body habitats, regardless of delivery mode.
Abstract: Upon delivery, the neonate is exposed for the first time to a wide array of microbes from a variety of sources, including maternal bacteria. Although prior studies have suggested that delivery mode shapes the microbiota's establishment and, subsequently, its role in child health, most researchers have focused on specific bacterial taxa or on a single body habitat, the gut. Thus, the initiation stage of human microbiome development remains obscure. The goal of the present study was to obtain a community-wide perspective on the influence of delivery mode and body habitat on the neonate's first microbiota. We used multiplexed 16S rRNA gene pyrosequencing to characterize bacterial communities from mothers and their newborn babies, four born vaginally and six born via Cesarean section. Mothers' skin, oral mucosa, and vagina were sampled 1 h before delivery, and neonates' skin, oral mucosa, and nasopharyngeal aspirate were sampled <5 min, and meconium <24 h, after delivery. We found that in direct contrast to the highly differentiated communities of their mothers, neonates harbored bacterial communities that were undifferentiated across multiple body habitats, regardless of delivery mode. Our results also show that vaginally delivered infants acquired bacterial communities resembling their own mother's vaginal microbiota, dominated by Lactobacillus, Prevotella, or Sneathia spp., and C-section infants harbored bacterial communities similar to those found on the skin surface, dominated by Staphylococcus, Corynebacterium, and Propionibacterium spp. These findings establish an important baseline for studies tracking the human microbiome's successional development in different body habitats following different delivery modes, and their associated effects on infant health.

3,640 citations


Cites methods from "Non-parametric multivariate analyse..."

  • ...ANOSIM is a permutation-based test of the null hypothesis that within-group distances are not significantly smaller than between-group distances....

    [...]

  • ...We used the analysis of similarities (ANOSIM) (44) function in the program PRIMER (45) to test for differences in community composition among various sample groups....

    [...]
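For readers unfamiliar with ANOSIM, the sketch below shows one way the statistic and its permutation test can be computed from a dissimilarity matrix. It assumes the commonly quoted form R = (mean between-group rank − mean within-group rank) / (M/2), where M is the number of sample pairs. The data and function names are hypothetical, and this is not the PRIMER implementation.

```python
import random


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R statistic from a symmetric dissimilarity matrix and group labels."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, n_perm=999, seed=0):
    """Permutation P-value: share of label shuffles with R at least as large as observed."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    shuffled = list(groups)
    count = 1  # the observed arrangement is counted as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed - 1e-12:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical example: two clearly separated groups of three samples each.
coords = [0, 1, 2, 10, 11, 12]
labels = ["skin", "skin", "skin", "oral", "oral", "oral"]
dist = [[abs(p - q) for q in coords] for p in coords]

r_obs, p_value = anosim(dist, labels)
print(r_obs, p_value)
```

R near 1 indicates that all between-group dissimilarities rank above all within-group ones; R near 0 indicates no separation.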

Journal ArticleDOI
01 Jan 2001-Ecology
TL;DR: This paper shows that variability in multivariate ecological data can be partitioned according to a complex design directly from the distance matrix, with no eigenanalysis and no correction for negative eigenvalues of the kind used in distance-based redundancy analysis (db-RDA), even when the distance measure is semimetric.
Abstract: Nonparametric multivariate analysis of ecological data using permutation tests has two main challenges: (1) to partition the variability in the data according to a complex design or model, as is often required in ecological experiments, and (2) to base the analysis on a multivariate distance measure (such as the semimetric Bray-Curtis measure) that is reasonable for ecological data sets. Previous nonparametric methods have succeeded in one or other of these areas, but not in both. A recent contribution to Ecological Monographs by Legendre and Anderson, called distance-based redundancy analysis (db-RDA), does achieve both. It does this by calculating principal coordinates and subsequently correcting for negative eigenvalues, if they are present, by adding a constant to squared distances. We show here that such a correction is not necessary. Partitioning can be achieved directly from the distance matrix itself, with no corrections and no eigenanalysis, even if the distance measure used is semimetric. An ecological example is given to show the differences in these statistical methods. Empirical simulations, based on parameters estimated from real ecological species abundance data, showed that db-RDA done on multifactorial designs (using the correction) does not have type 1 error consistent with the significance level chosen for the analysis (i.e., does not provide an exact test), whereas the direct method described and advocated here does.
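One way to see why partitioning can work "directly from the distance matrix itself" is the standard identity linking squared deviations from a centroid to squared inter-point distances. The derivation below holds for Euclidean distance; the notation (y_i for observations, d_ij for distances, N for the number of observations) is introduced here for illustration and is not taken from the paper. For a semimetric measure such as Bray-Curtis, the right-hand side is simply taken as the definition of the sum of squares.

```latex
\begin{align}
\sum_{i<j} d_{ij}^{2}
  &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
     \bigl\lVert (y_i-\bar{y}) - (y_j-\bar{y}) \bigr\rVert^{2} \\
  &= \frac{1}{2}\Bigl[\, N\sum_{i=1}^{N}\lVert y_i-\bar{y}\rVert^{2}
     + N\sum_{j=1}^{N}\lVert y_j-\bar{y}\rVert^{2}
     - 2\Bigl(\sum_{i=1}^{N}(y_i-\bar{y})\Bigr)^{\!\top}
        \Bigl(\sum_{j=1}^{N}(y_j-\bar{y})\Bigr) \Bigr] \\
  &= N\sum_{i=1}^{N}\lVert y_i-\bar{y}\rVert^{2},
\end{align}
\qquad\text{so}\qquad
\sum_{i=1}^{N}\lVert y_i-\bar{y}\rVert^{2}
  = \frac{1}{N}\sum_{i<j} d_{ij}^{2}.
```

The cross term vanishes because deviations from the centroid sum to zero, so the centroid never has to be computed explicitly.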

3,468 citations


Cites background or methods from "Non-parametric multivariate analyse..."

  • ...It is generally agreed that the Euclidean distance measure is not appropriate for use with ecological data of species abundances (e.g., Faith et al. 1987, Clarke 1993, Legendre and Legendre 1998)....

    [...]

  • ...Several nonparametric multivariate methods for use in biology, ecology, and the social sciences have been proposed (Mantel 1967, Mantel and Valand 1970, Hubert and Schultz 1976, Mielke et al. 1976, Smith et al. 1990, McArdle 1991, Clarke 1993, Pillar and Orlóci 1996)....

    [...]

  • ...…seems to provide the most meaningful intuitive measure of dissimilarity in ecological community structure (Odum 1950, Hajdu 1981, Faith et al. 1987, Clarke 1993, Legendre and Legendre 1998), the mathematically complex portion of the information inherent in the measure has generally been ignored....

    [...]

  • ...First, there are those that can be based on any distance measure of choice, including semimetric measures such as the Bray-Curtis measure (Mantel 1967, Hubert and Schultz 1976, Smith et al. 1990, Clarke 1993)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed, relying on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes.
Abstract: The traditional likelihood-based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds the number of observations. Many biological and ecological data sets have many variables, are highly skewed, and are zero-inflated. The traditional test and even some more robust alternatives are also unreasonable in many contexts where measures of dispersion based on a non-Euclidean dissimilarity would be more appropriate. Distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed here. They rely on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes. The tests are straightforward multivariate extensions of Levene's test, with P-values obtained either using the traditional F-distribution or using permutation of either least-squares or LAD residuals. Examples illustrate the utility of the approach, including the analysis of stabilizing selection in sparrows, biodiversity of New Zealand fish assemblages, and the response of Indonesian reef corals to an El Niño. Monte Carlo simulations from the real data sets show that the distance-based tests are robust and powerful for relevant alternative hypotheses of real differences in spread.
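The "multivariate extension of Levene's test" can be sketched in a simplified setting. The example below assumes plain Euclidean distance, so each group centroid can be computed directly and no principal coordinate axes are needed. It computes only the F-ratio on the distances to centroids and omits the P-value step; the data and function names are hypothetical.

```python
from math import sqrt


def distances_to_centroid(points, groups):
    """Euclidean distance from each point to the centroid of its own group."""
    z = [0.0] * len(points)
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        dim = len(points[idx[0]])
        centroid = [sum(points[i][k] for i in idx) / len(idx) for k in range(dim)]
        for i in idx:
            z[i] = sqrt(sum((points[i][k] - centroid[k]) ** 2 for k in range(dim)))
    return z


def anova_f(values, groups):
    """Ordinary one-way ANOVA F-ratio for a single variable."""
    labels = sorted(set(groups))
    n, a = len(values), len(labels)
    grand = sum(values) / n
    ss_among = ss_within = 0.0
    for g in labels:
        vals = [v for v, label in zip(values, groups) if label == g]
        mean = sum(vals) / len(vals)
        ss_among += len(vals) * (mean - grand) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


# Hypothetical data: group "b" has the same shape as group "a" but three times the spread.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1),
          (0, 0), (6, 0), (0, 6), (6, 6), (3, 3)]
groups = ["a"] * 5 + ["b"] * 5

z = distances_to_centroid(points, groups)
f_dispersion = anova_f(z, groups)
print(round(f_dispersion, 3))
```

A large F here reflects a difference in spread, not in location: shifting every point in one group by the same vector would leave the distances to centroids, and hence F, unchanged.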

2,255 citations


Cites background or methods from "Non-parametric multivariate analyse..."

  • ...This led Warwick and Clarke (1993) to propose that increased multivariate dispersion, as measured by the Bray–Curtis dissimilarity, may in general be a sign of increased environmental stress....

    [...]

  • ...Euclidean distance is inappropriate here for these reasons, and also because it is not generally considered useful for measuring the ecological dissimilarity among species assemblages (Faith et al., 1987; Clarke, 1993)....

    [...]

  • ...For measures such as Euclidean distance or Bray–Curtis, some form of standardization or transformation of variables may also be done before calculating distances, so that variables have equal weight or are placed on similar scales (Faith et al., 1987; Clarke, 1993)....

    [...]

  • ...Several robust dissimilarity-based tests for equality of multivariate locations have been described (Mielke, Berry, and Johnson, 1976; Smith, Pontasch, and Cairns, 1990; Clarke, 1993; Pillar and Orlóci, 1996; Gower and Krzanowski, 1999; Legendre and Anderson, 1999; Anderson, 2001)....

    [...]

References
Book
B. J. Winer
01 Jan 1962
TL;DR: This book covers the design and analysis of experiments, from principles of estimation and inference for means and variances through single-factor and factorial designs, designs with repeated measures on the same element, confounded factorial experiments, Latin squares and the analysis of covariance.
Abstract: Chapter 1: Introduction to Design; Chapter 2: Principles of Estimation and Inference: Means and Variance; Chapter 3: Design and Analysis of Single-Factor Experiments: Completely Randomized Design; Chapter 4: Single-Factor Experiments Having Repeated Measures on the Same Element; Chapter 5: Design and Analysis of Factorial Experiments: Completely-Randomized Design; Chapter 6: Factorial Experiments: Computational Procedures and Numerical Example; Chapter 7: Multifactor Experiments Having Repeated Measures on the Same Element; Chapter 8: Factorial Experiments in which Some of the Interactions are Confounded; Chapter 9: Latin Squares and Related Designs; Chapter 10: Analysis of Covariance

25,607 citations

Journal ArticleDOI
TL;DR: This work discusses the design and analysis of single-factor experiments (completely randomized design) and of factorial experiments in which some of the interactions are confounded.

24,665 citations

Journal ArticleDOI
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.
Abstract: We discuss the following problem: given a random sample \(X = (X_1, X_2, \ldots, X_n)\) from an unknown probability distribution F, estimate the sampling distribution of some prespecified random variable R(X, F), on the basis of the observed data x. (Standard jackknife theory gives an approximate mean and variance in the case \(R(X, F) = \theta(\hat{F}) - \theta(F)\), θ some parameter of interest.) A general method, called the “bootstrap”, is introduced, and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.
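The first example named in the abstract, the variance of the sample median, makes a compact illustration of the bootstrap idea: resample the observed data with replacement, recompute the statistic each time, and use the spread of the recomputed values as the estimate. The sketch below uses hypothetical data, and the function name, seed and number of resamples are arbitrary choices for this example.

```python
import random
from statistics import median, variance


def bootstrap_replicates(data, statistic, n_boot=2000, seed=0):
    """Recompute `statistic` on n_boot resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)]


# Hypothetical observed sample.
observed = [2, 4, 4, 5, 7, 9, 10, 12, 15]

replicates = bootstrap_replicates(observed, median)
median_variance = variance(replicates)  # bootstrap estimate of Var(sample median)
print(median(observed), round(median_variance, 3))
```

The same function works for any statistic that accepts a list, which is what makes the approach general.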

14,483 citations

Journal Article
TL;DR: The technic to be given below for imparting statistical validity to the procedures already in vogue can be viewed as a generalized form of regression with possible useful application to problems arising in quite different contexts.
Abstract: The problem of identifying subtle time-space clustering of disease, as may be occurring in leukemia, is described and reviewed. Published approaches, generally associated with studies of leukemia, not dependent on knowledge of the underlying population for their validity, are directed towards identifying clustering by establishing a relationship between the temporal and the spatial separations for the n(n - 1)/2 possible pairs which can be formed from the n observed cases of disease. Here it is proposed that statistical power can be improved by applying a reciprocal transform to these separations. While a permutational approach can give valid probability levels for any observed association, for reasons of practicability, it is suggested that the observed association be tested relative to its permutational variance. Formulas and computational procedures for doing so are given. While the distance measures between points represent symmetric relationships subject to mathematical and geometric regularities, the variance formula developed is appropriate for arbitrary relationships. Simplified procedures are given for the case of symmetric and skew-symmetric relationships. The general procedure is indicated as being potentially useful in other situations as, for example, the study of interpersonal relationships. Viewing the procedure as a regression approach, the possibility for extending it to nonlinear and multivariate situations is suggested. Other aspects of the problem and of the procedure developed are discussed. Similarly, pure temporal clustering can be identified by a study of incidence rates in periods of widespread epidemics. In point of fact, many epidemics of communicable diseases are somewhat local in nature and so these do actually constitute temporal-spatial clusters.
For leukemia and similar diseases in which cases seem to arise substantially at random rather than as clear-cut epidemics, it is necessary to devise sensitive and efficient procedures for detecting any nonrandom component of disease occurrence. Various ingenious procedures which statisticians have developed for the detection of disease clustering are reviewed here. These procedures can be generalized so as to increase their statistical validity and efficiency. The technic to be given below for imparting statistical validity to the procedures already in vogue can be viewed as a generalized form of regression with possible useful application to problems arising in quite different contexts.
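The pairwise-separation idea in this abstract can be sketched with a cross-product statistic over the n(n - 1)/2 pairs and a direct permutation test. This is a simplified illustration: it uses the full permutational approach, not the permutational-variance formula the abstract recommends for practicality, and it omits the reciprocal transform. The case locations and times are hypothetical.

```python
import random


def cross_product(x, y, order=None):
    """Sum over pairs i < j of x[i][j] * y[order[i]][order[j]]."""
    n = len(x)
    if order is None:
        order = list(range(n))
    return sum(x[i][j] * y[order[i]][order[j]]
               for i in range(n) for j in range(i + 1, n))


def matrix_permutation_test(x, y, n_perm=999, seed=0):
    """Relabel the cases of one matrix at random; count statistics >= the observed one."""
    rng = random.Random(seed)
    observed = cross_product(x, y)
    order = list(range(len(x)))
    count = 1  # the observed arrangement is counted as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)  # rows and columns of y are permuted together
        if cross_product(x, y, order) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical cases: location along a line and time of onset.
location = [0, 1, 2, 10, 11, 12]
onset = [0, 2, 1, 20, 22, 21]
space = [[abs(p - q) for q in location] for p in location]
time = [[abs(p - q) for q in onset] for p in onset]

z_obs, p_value = matrix_permutation_test(space, time)
print(z_obs, p_value)
```

Rows and columns must be permuted together because the entries of a distance matrix are not independent: each case contributes to n - 1 of the pairs.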

11,408 citations

Journal ArticleDOI
TL;DR: It is shown that nature of unit variation is a major problem in systematics, and that whether this variation is discrete, continuous, or in some other form, there is a need for application of quantitative and statistical methods.
Abstract: INTRODUCTION A renewed interest in objective and quantitative approaches to the classification of plant communities has led, within the past decade, to an extensive examination of systematic theory and technique. This examination, including the work of Sorenson (1948), Motyka et al. (1950), Curtis & McIntosh (1951), Brown & Curtis (1952), Ramensky (1952), Whittaker (1954, 1956), Goodall (1953a, 1954b), deVries (1953), Guinochet (1954, 1955), Webb (1954), Hughes (1954) and Poore (1956) has accompanied theoretic studies in taxonomy [Fisher (1936), Womble (1951), Clifford & Binet (1954), Gregg (1954)] and in statistics (Isaacson 1954). It is a conclusion of many of these studies that nature of unit variation is a major problem in systematics, and that whether this variation is discrete, continuous, or in some other form, there is a need for application of quantitative and statistical methods. In ecologic classification, an increased use of ordinate systems, which has been stimulated by the development of more efficient sampling techniques and the collection of stand data on a large scale, has prompted the proposal of the term "ordination" (Goodall 1953b). Goodall (1954a) has defined ordination as "an arrangement of units in a uni- or multi-dimensional order" as synonymous with "Ordnung," (Ramensky

9,549 citations