Journal ISSN: 1045-2249

Behavioral Ecology 

Oxford University Press
About: Behavioral Ecology is an academic journal published by Oxford University Press. The journal publishes mainly in the areas of population biology and sexual selection. It has the ISSN identifier 1045-2249. Over its lifetime, 4,532 publications have been published, receiving 209,539 citations. The journal is also known as: Behavioral ecology - online services.


Papers
Journal ArticleDOI
TL;DR: The meta-analysis on statistical power by Jennions and Møller (2003) revealed that, in the field of behavioral ecology and animal behavior, studies had statistical power of less than 20% to detect a small effect and less than 50% to detect a medium effect.
Abstract: Recently, Jennions and Møller (2003) carried out a meta-analysis on statistical power in the field of behavioral ecology and animal behavior, reviewing 10 leading journals including Behavioral Ecology. Their results showed dismayingly low average statistical power (note that a meta-analytic review of statistical power is different from post hoc power analysis as criticized in Hoenig and Heisey, 2001). The statistical power of a null hypothesis (Ho) significance test is the probability that the test will reject Ho when a research hypothesis (Ha) is true. Knowledge of effect size is particularly important for statistical power analysis (for statistical power analysis, see Cohen, 1988; Nakagawa and Foster, in press). There are many kinds of effect size measures available (e.g., Pearson's r, Cohen's d, Hedges's g), but most of these fall into one of two major types, namely the r family and the d family (Rosenthal, 1994). The r family shows the strength of relationship between two variables while the d family shows the size of difference between two variables. As a benchmark for research planning and evaluation, Cohen (1988) proposed 'conventional' values for small, medium, and large effects: r = .10, .30, and .50 and d = .20, .50, and .80, respectively (in the way that p values of .05, .01, and .001 are conventional points, although these conventional values of effect size have been criticized; e.g., Rosenthal et al., 2000). The meta-analysis on statistical power by Jennions and Møller (2003) revealed that, in the field of behavioral ecology and animal behavior, studies had statistical power of less than 20% to detect a small effect and less than 50% to detect a medium effect.
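The two effect-size families mentioned above can be made concrete in a few lines of Python (an illustrative sketch; the function names are my own, and the d-to-r conversion shown is one common variant for two-group designs, following Rosenthal-style formulas):

```python
import numpy as np

def cohens_d(x, y):
    """d family: standardized difference between two group means (pooled SD)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def d_to_r(d, nx, ny):
    """r family: convert d to a point-biserial r for a two-group design."""
    a = (nx + ny) ** 2 / (nx * ny)  # equals 4 when group sizes are equal
    return d / np.sqrt(d ** 2 + a)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
d = cohens_d(x, y)          # about -1.26 (a large effect by Cohen's benchmarks)
r = d_to_r(d, len(x), len(y))
```

Note that Cohen's conventional cutoffs for the two families are benchmarks, not exact conversions of one another.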
This means, for example, that the average behavioral scientist performing a statistical test has a greater probability of making a Type II error (or β) (i.e., not rejecting Ho when Ho is false; note that statistical power equals 1 − β) than if they had flipped a coin, when an experimental effect is of medium size (i.e., r = .30, d = .50). Here, I highlight and discuss an implication of this low statistical power for one of the most widely used statistical procedures, Bonferroni correction (Cabin and Mitchell, 2000). Bonferroni corrections are employed to reduce Type I errors (i.e., rejecting Ho when Ho is true) when multiple tests or comparisons are conducted. Two kinds of Bonferroni procedures are commonly used. One is the standard Bonferroni procedure, where a modified significance criterion (α/k, where k is the number of statistical tests conducted on the given data) is used. The other is the sequential Bonferroni procedure, which was introduced by Holm (1979) and popularized in the field of ecology and evolution by Rice (1989) (see these papers for the procedure). For example, in a recent volume of Behavioral Ecology (vol. 13, 2002), nearly one-fifth of papers (23 out of 117) included Bonferroni corrections. Twelve articles employed the standard procedure while 11 articles employed the sequential procedure (10 citing Rice, 1989, and one citing Holm, 1979). A serious problem associated with the standard Bonferroni procedure is a substantial reduction in the statistical power of rejecting an incorrect Ho in each test (e.g., Holm, 1979; Perneger, 1998; Rice, 1989). The sequential Bonferroni procedure also incurs a reduction in power, but to a lesser extent (which is the reason the sequential procedure is preferred by some researchers; Moran, 2003). Thus, both procedures exacerbate the existing problem of low power identified by Jennions and Møller (2003).
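The two correction procedures described above can be sketched in a few lines (a minimal illustration, not code from the paper). The standard procedure compares every p value to α/k; the sequential (Holm) procedure compares the i-th smallest p value to α/(k − i), stopping at the first failure, which is why it rejects at least as often:

```python
def bonferroni(pvals, alpha=0.05):
    # Standard Bonferroni: compare every p value to alpha/k.
    k = len(pvals)
    return [p <= alpha / k for p in pvals]

def holm(pvals, alpha=0.05):
    # Sequential Bonferroni (Holm 1979): test p values in ascending order
    # against alpha/k, alpha/(k-1), ...; stop rejecting at the first failure.
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    reject = [False] * k
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break
    return reject

pvals = [0.01, 0.015, 0.02, 0.2]
print(bonferroni(pvals))  # only the smallest p survives alpha/4 = .0125
print(holm(pvals))        # the sequential procedure keeps three of the four
```

The example shows why some researchers prefer the sequential procedure: for the same set of p values it sacrifices less power, though, as the abstract notes, it still reduces power relative to uncorrected tests.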
For example, suppose an experiment where both the experimental group and the control group consist of 30 subjects. After the experimental period, we measure five different variables and conduct a series of t tests on each variable. Even prior to applying Bonferroni corrections, the statistical power of each test to detect a medium effect is 61% (α = .05), which is less than the recommended acceptable 80% level (Cohen, 1988). In the field of behavioral ecology and animal behavior, it is usually difficult to use large sample sizes (in many cases, n < 30) for practical and ethical reasons (see Still, 1992). When standard Bonferroni corrections are applied, the statistical power of each t test drops to as low as 33% (to detect a medium effect at α/5 = .01). Although sequential Bonferroni corrections do not reduce the power of the tests to the same extent, on average (33–61% per t test), the probability of making a Type II error for some of the tests (β = 1 − power, so 39–67%) remains unacceptably high. Furthermore, statistical power would be even lower if we measured more than five variables or if we were interested in detecting a small effect. Bonferroni procedures also raise another set of problems. There is no formal consensus on when Bonferroni procedures should be used, even among statisticians (Perneger, 1998). It seems, in some cases, that Bonferroni corrections are applied only when their results remain significant. Some researchers may think that their results are 'more significant' if the results pass the rigor of Bonferroni corrections, although this is logically incorrect (Cohen, 1990, 1994; Yoccoz, 1991). Many researchers are already reluctant to report nonsignificant results (Jennions and Møller, 2002a,b).
The wide use of Bonferroni procedures may be aggravating the tendency of researchers not to present nonsignificant results, because presenting more tests with nonsignificant results may make previously 'significant' results 'nonsignificant' under Bonferroni procedures. The more detailed research (i.e., research measuring more variables) researchers do, the lower their probability of finding significant results. Moran (2003) recently named this paradox a hyper-Red Queen phenomenon (see the paper for more discussion of problems with the sequential method). Imagine that we conduct a study where we measure as many relevant variables as possible, 10 variables, for example. We find only two variables statistically significant. Then, what should we do? We could decide to write a paper highlighting these two variables (and not reporting the other eight at all) as if we had had hypotheses about the two significant variables in the first place. Subsequently, our paper would be published. Alternatively, we could write a paper including all 10 variables. When the paper is reviewed, referees might tell us that there were no significant results if we had 'appropriately' employed Bonferroni corrections, so that our study would not be advisable for publication. However, the latter paper is …
Behavioral Ecology Vol. 15 No. 6: 1044–1045, doi:10.1093/beheco/arh107. Advance Access publication on June 30, 2004.
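The power figures in the worked example above (61% at α = .05 dropping to 33% at α/5 = .01, for d = .50 and 30 subjects per group) can be reproduced with a noncentral-t power calculation. The sketch below assumes one-tailed tests, which is the assumption that recovers the quoted numbers; it is my reconstruction, not code from the paper:

```python
import numpy as np
from scipy import stats

def t_test_power(d, n_per_group, alpha):
    """One-tailed power of a two-sample t test for Cohen's d."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha, df)  # one-tailed critical value
    return 1 - stats.nct.cdf(t_crit, df, ncp)

# Medium effect (d = .50), 30 subjects in each group:
print(t_test_power(0.5, 30, 0.05))  # ~0.61 before correction
print(t_test_power(0.5, 30, 0.01))  # ~0.33 after Bonferroni (alpha/5 = .01)
```

Running the same function with a small effect (d = .20) or with more than five tests shows the further power loss the abstract warns about.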

1,996 citations

Journal ArticleDOI
TL;DR: The aim of this forum article is to argue for greater use of the last of these tests, the t-test for unequal variances, which is not commonly used.
Abstract: Often in the study of behavioral ecology, and more widely in science, we need to statistically test whether the central tendencies (mean or median) of 2 groups differ from each other on the basis of samples of the 2 groups. In surveying recent issues of Behavioral Ecology (Volume 16, issues 1–5), I found that, of the 130 papers, 33 (25%) used at least one statistical comparison of this sort. Three different tests were used to make this comparison: Student's t-test (67 occasions; 26 papers), Mann–Whitney U test (43 occasions; 21 papers), and the t-test for unequal variances (9 occasions; 4 papers). My aim in this forum article is to argue for greater use of the last of these tests. The numbers just related suggest that this test is not commonly used. In my survey, I was able to identify tests described simply as "t-tests" with confidence as either a Student's t-test or an unequal variance t-test, because the calculation of degrees of freedom from the 2 sample sizes differs between the 2 tests (see below). Hence, the neglect of the unequal variance t-test illustrated above is a real phenomenon and can be explained in several (nonexclusive) ways: 1. Authors are unaware that Student's t-test is unreliable …
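The degrees-of-freedom difference the abstract uses to distinguish the two tests can be shown directly: Student's t-test pools the variances and uses n1 + n2 − 2, while the unequal variance (Welch's) test uses the Welch–Satterthwaite approximation, which gives a non-integer, smaller df. A minimal sketch with made-up data (not from the paper's survey):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=20)  # smaller variance, larger sample
b = rng.normal(0.8, 3.0, size=10)  # larger variance, smaller sample

# Student's t-test pools the two variances: df = 20 + 10 - 2 = 28
t_student, p_student = stats.ttest_ind(a, b, equal_var=True)

# Welch's test does not pool; scipy computes Welch-Satterthwaite df internally
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

def welch_satterthwaite_df(x, y):
    """Approximate df for the unequal variance t-test."""
    vx = x.var(ddof=1) / len(x)
    vy = y.var(ddof=1) / len(y)
    return (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))

print(welch_satterthwaite_df(a, b))  # non-integer, well below 28
```

When the group with the larger variance also has the smaller sample, as here, Student's t-test is at its least reliable, which is the situation the article's argument targets.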

1,561 citations

Journal ArticleDOI
TL;DR: A formal meta-analysis of published studies reporting fitness consequences of single personality dimensions was conducted to identify general trends across species; bolder individuals had increased reproductive success but incurred a survival cost, supporting the hypothesis that variation in boldness is maintained by a "trade-off" in fitness consequences across contexts.
Abstract: The study of nonhuman personality capitalizes on the fact that individuals of many species behave in predictable, variable, and quantifiable ways. Although a few empirical studies have examined the ultimate consequences of personality differences, there has been no synthesis of results. We conducted a formal meta-analysis of published studies reporting fitness consequences of single personality dimensions to identify general trends across species. We found bolder individuals had increased reproductive success, particularly in males, but incurred a survival cost, thus supporting the hypothesis that variation in boldness was maintained due to a "trade-off" in fitness consequences across contexts. Potential mechanisms maintaining variation in exploration and aggression are not as clear. Exploration had a positive effect only on survival, whereas aggression had a positive effect on both reproductive success and, not significantly, on survival. Such results would suggest that selection is driving populations to become more explorative and aggressive. However, limitations in meta-analytic techniques preclude us from testing for the effects of fluctuating environmental conditions or other forms of selection on these dimensions. Results do, however, provide evidence for general relationships between personality and fitness, and we provide a framework for future studies to follow in the hopes of spurring more in-depth, long-term research into the evolutionary mechanisms maintaining variation in personality dimensions and overall behavioral syndromes. We conclude with a discussion on how understanding and managing personality traits may play a key role in the captive breeding and recovery programs of endangered species. Key words: behavioral syndrome, fitness, personality, reproductive success, survival. [Behav Ecol]

1,273 citations

Journal ArticleDOI
TL;DR: It is argued that future work needs to examine the fitness effects of variation in immunocompetence and suggest that artificial selection experiments offer a potentially important tool for addressing this issue.
Abstract: There has been considerable recent interest in the effects of life-history decisions on immunocompetence in birds. If immunocompetence is limited by available resources, then trade-offs between investment in life-history components and investment in immunocompetence could be important in determining optimal life-history traits. For this to be true: (1) immunocompetence must be limited by resources, (2) investment in life-history components must be negatively correlated with immunocompetence, and (3) immunocompetence must be positively correlated with fitness. To gather such empirical data, ecologists need to be able to measure immunocompetence. We review techniques used to measure immunocompetence and how they are applied by ecologists. We also consider the components of the immune system that constitute immunocompetence and evaluate the possible consequences of measuring immunocompetence in different ways. We then review the empirical evidence for life-history tradeoffs involving immune defense. We conclude that there is some evidence suggesting that immunocompetence is limited by resources and that investment in certain life-history components reduces immunocompetence. However, the evidence that immunocompetence is related to fitness is circumstantial at present, although consistent with the hypothesis that immunocompetence and fitness are positively correlated. We argue that future work needs to examine the fitness effects of variation in immunocompetence and suggest that artificial selection experiments offer a potentially important tool for addressing this issue.

869 citations

Journal ArticleDOI
TL;DR: It is shown that random slope models have the potential to reduce residual variance by accounting for between-individual variation in slopes, which makes it easier to detect treatment effects that are applied between individuals, hence reducing type II errors as well.
Abstract: Mixed-effect models are frequently used to control for the nonindependence of data points, for example, when repeated measures from the same individuals are available. The aim of these models is often to estimate fixed effects and to test their significance. This is usually done by including random intercepts, that is, intercepts that are allowed to vary between individuals. The widespread belief is that this controls for all types of pseudoreplication within individuals. Here we show that this is not the case, if the aim is to estimate effects that vary within individuals and individuals differ in their response to these effects. In these cases, random intercept models give overconfident estimates leading to conclusions that are not supported by the data. By allowing individuals to differ in the slopes of their responses, it is possible to account for the nonindependence of data points that pseudoreplicate slope information. Such random slope models give appropriate standard errors and are easily implemented in standard statistical software. Because random slope models are not always used where they are essential, we suspect that many published findings have too narrow confidence intervals and a substantially inflated type I error rate. Besides reducing type I errors, random slope models have the potential to reduce residual variance by accounting for between-individual variation in slopes, which makes it easier to detect treatment effects that are applied between individuals, hence reducing type II errors as well.
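The abstract's point can be illustrated with simulated repeated measures. The sketch below (my own illustration using statsmodels' MixedLM, not the authors' code) generates individuals whose slopes genuinely differ, then fits a random-intercept model and a random-slope model; the random-intercept model reports a misleadingly small standard error for the fixed slope:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_id, n_obs = 30, 10
ids = np.repeat(np.arange(n_id), n_obs)
x = np.tile(np.linspace(0, 1, n_obs), n_id)
slopes = 1.0 + rng.normal(0, 0.8, n_id)   # individuals differ in slope
intercepts = rng.normal(0, 0.5, n_id)
y = intercepts[ids] + slopes[ids] * x + rng.normal(0, 0.3, ids.size)
data = pd.DataFrame({"y": y, "x": x, "id": ids})

# Random-intercept model: assumes one common slope within individuals
m_int = smf.mixedlm("y ~ x", data, groups=data["id"]).fit()

# Random-slope model: lets the slope of x vary between individuals
m_slope = smf.mixedlm("y ~ x", data, groups=data["id"], re_formula="~x").fit()

# The slope SE is larger (more honest) once between-individual
# slope variation is modeled:
print(m_int.bse["x"], m_slope.bse["x"])
```

With only random intercepts, the 300 observations are treated as if they carried 300 independent pieces of slope information, when only 30 individuals contribute independent slopes; the random-slope model's wider standard error reflects that.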

744 citations

Performance Metrics

No. of papers from the journal in previous years:

Year  Papers
2023    71
2022   132
2021   178
2020   141
2019   242
2018   199