
Showing papers in "Psychological Methods in 2003"


Journal ArticleDOI
TL;DR: In this article, the authors provide formulas for computing generalized eta and omega squared statistics, which yield estimates of effect size that are comparable across a variety of research designs, unlike current procedures, which often do not consider the effect that design features of the study have on the size of these statistics.
Abstract: The editorial policies of several prominent educational and psychological journals require that researchers report some measure of effect size along with tests for statistical significance. In analysis of variance contexts, this requirement might be met by using eta squared or omega squared statistics. Current procedures for computing these measures of effect often do not consider the effect that design features of the study have on the size of these statistics. Because research-design features can have a large effect on the estimated proportion of explained variance, the use of partial eta or omega squared can be misleading. The present article provides formulas for computing generalized eta and omega squared statistics, which provide estimates of effect size that are comparable across a variety of research designs. It is often argued that researchers can enhance the presentation of their research findings by including an effect-size measure along with a test of statistical significance. An effect-size measure is a standardized index and estimates a parameter that is independent of sample size and quantifies the magnitude of the difference between populations or the relationship between explanatory and response variables. Two broad
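The generalized statistic can be sketched in code. Below is a rough Python illustration of the logic described above, assuming an ANOVA decomposition supplied as a dictionary of sums of squares; the function name, example sources, and numbers are hypothetical, and the exact formula should be checked against the article before use.

def generalized_eta_squared(ss, effect, manipulated_only):
    """Sketch of a generalized eta squared: sources involving measured factors
    (including subjects/error terms) stay in the denominator; purely manipulated
    sources do not, apart from the effect itself."""
    delta = 1 if effect in manipulated_only else 0
    ss_measured = sum(v for source, v in ss.items() if source not in manipulated_only)
    return ss[effect] / (delta * ss[effect] + ss_measured)

# hypothetical one-way between-subjects design with a manipulated factor A
ss = {"A": 30.0, "error": 120.0}
print(generalized_eta_squared(ss, "A", manipulated_only={"A"}))  # 30 / 150 = 0.20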

1,281 citations


Journal ArticleDOI
TL;DR: It is demonstrated that multiple trajectory classes can be estimated and appear optimal for nonnormal data even when only 1 group exists in the population.
Abstract: Growth mixture models are often used to determine if subgroups exist within the population that follow qualitatively distinct developmental trajectories. However, statistical theory developed for finite normal mixture models suggests that latent trajectory classes can be estimated even in the absence of population heterogeneity if the distribution of the repeated measures is nonnormal. By drawing on this theory, this article demonstrates that multiple trajectory classes can be estimated and appear optimal for nonnormal data even when only 1 group exists in the population. Further, the within-class parameter estimates obtained from these models are largely uninterpretable. Significant predictive relationships may be obscured or spurious relationships identified. The implications of these results for applied research are highlighted, and future directions for quantitative developments are suggested.
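The phenomenon is easy to reproduce in miniature. The sketch below uses scikit-learn's GaussianMixture as a stand-in for a full growth mixture model: it simulates skewed repeated measures from a single homogeneous population and shows that an information criterion will often prefer more than one class. All data-generating values are invented for illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n, occasions = 500, 4
time = np.arange(occasions)

# one homogeneous population, but skewed (chi-square) intercepts and residuals
intercept = rng.chisquare(df=3, size=n)
slope = 0.5 + 0.1 * rng.standard_normal(n)
Y = intercept[:, None] + slope[:, None] * time + rng.chisquare(df=2, size=(n, occasions))

for k in (1, 2, 3):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(Y)
    print(k, "classes, BIC =", round(gmm.bic(Y), 1))
# BIC typically favors more than 1 class here even though only one group was simulated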

922 citations


Journal ArticleDOI
TL;DR: The bootstrap is used to assess the stability of dominance results across repeated sampling, and it is shown that these methods provide the researcher with more insights into the pattern of importance in a set of predictors than were previously available.
Abstract: A general method is presented for comparing the relative importance of predictors in multiple regression. Dominance analysis (D. V. Budescu, 1993), a procedure that is based on an examination of the R2 values for all possible subset models, is refined and extended by introducing several quantitative measures of dominance that differ in the strictness of the dominance definition. These are shown to be intuitive, meaningful, and informative measures that can address a variety of research questions pertaining to predictor importance. The bootstrap is used to assess the stability of dominance results across repeated sampling, and it is shown that these methods provide the researcher with more insights into the pattern of importance in a set of predictors than were previously available.
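A minimal all-subsets sketch of the general dominance idea is shown below, assuming a predictor matrix X and outcome y in NumPy; the function names are hypothetical, and the bootstrap step described in the article would simply wrap this computation in resampling.

import itertools
import numpy as np

def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def general_dominance(X, y):
    """Average increment in R^2 each predictor adds, averaged within and then
    across subset sizes (one common way to summarize dominance)."""
    p = X.shape[1]
    means_by_size = {j: [] for j in range(p)}
    for size in range(p):
        incr = {j: [] for j in range(p)}
        for subset in itertools.combinations(range(p), size):
            base = r_squared(X[:, subset], y) if subset else 0.0
            for j in range(p):
                if j not in subset:
                    incr[j].append(r_squared(X[:, subset + (j,)], y) - base)
        for j in range(p):
            means_by_size[j].append(np.mean(incr[j]))
    return {j: float(np.mean(v)) for j, v in means_by_size.items()}

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([0.5, 0.3, 0.0]) + rng.standard_normal(200)
print(general_dominance(X, y))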

730 citations


Journal ArticleDOI
TL;DR: A model in which there is mediation at the lower level and the mediational links vary randomly across upper level units is discussed, and an ad hoc method that is illustrated with real and simulated data is developed.
Abstract: Multilevel models are increasingly used to estimate models for hierarchical and repeated measures data. The authors discuss a model in which there is mediation at the lower level and the mediational links vary randomly across upper level units. One repeated measures example is a case in which a person's daily stressors affect his or her coping efforts, which affect his or her mood, and both links vary randomly across persons. Where there is mediation at the lower level and the mediational links vary randomly across upper level units, the formulas for the indirect effect and its standard error must be modified to include the covariance between the random effects. Because no standard method can estimate such a model, the authors developed an ad hoc method that is illustrated with real and simulated data. Limitations of this method and characteristics of an ideal method are discussed.
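The key modification mentioned in the abstract can be written compactly: with random lower-level paths a_j (stressor to coping) and b_j (coping to mood), the average indirect effect picks up their covariance. A hypothetical NumPy sketch, assuming person-specific slope estimates are already in hand:

import numpy as np

rng = np.random.default_rng(3)
a_j = 0.4 + 0.2 * rng.standard_normal(60)                        # hypothetical person-specific X -> M slopes
b_j = 0.3 + 0.15 * rng.standard_normal(60) + 0.3 * (a_j - 0.4)   # correlated M -> Y slopes

indirect = a_j.mean() * b_j.mean() + np.cov(a_j, b_j)[0, 1]      # includes the random-effect covariance
naive = a_j.mean() * b_j.mean()                                  # omits it
print(round(indirect, 3), round(naive, 3))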

621 citations


Journal ArticleDOI
TL;DR: This commentary discusses the D. J. Bauer and P. J. Curran (2003) investigation of growth mixture modeling and compares single-class modeling of nonnormal outcomes with modeling with multiple latent trajectory classes.
Abstract: This commentary discusses the D. J. Bauer and P. J. Curran (2003) investigation of growth mixture modeling. Single-class modeling of nonnormal outcomes is compared with modeling with multiple latent trajectory classes. New statistical tests of multiple-class models are discussed. Principles for substantive investigation of growth mixture model results are presented and illustrated by an example of high school dropout predicted by low mathematics achievement development in Grades 7-10.

608 citations


Journal ArticleDOI
TL;DR: The performance in terms of bias and sampling variance of 7 different effect-size indices for estimating the population standardized mean difference from a 2 x 2 table is examined by Monte Carlo simulation, assuming normal and nonnormal distributions.
Abstract: It is very common to find meta-analyses in which some of the studies compare 2 groups on continuous dependent variables and others compare groups on dichotomized variables. Integrating all of them in a meta-analysis requires an effect-size index in the same metric that can be applied to both types of outcomes. In this article, the performance in terms of bias and sampling variance of 7 different effect-size indices for estimating the population standardized mean difference from a 2 x 2 table is examined by Monte Carlo simulation, assuming normal and nonnormal distributions. The results show good performance for 2 indices, one based on the probit transformation and the other based on the logistic distribution.
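Two conversions of the kind examined here are easy to state. The sketch below shows a probit-based and a logistic-based estimate of d computed from two group proportions; the proportions are invented, and these are generic textbook versions rather than the exact estimators compared in the article.

import numpy as np
from scipy.stats import norm

def d_probit(p1, p2):
    # difference of normal quantiles of the two success proportions
    return norm.ppf(p1) - norm.ppf(p2)

def d_logistic(p1, p2):
    # log odds ratio rescaled by sqrt(3)/pi, the SD of the logistic distribution
    log_or = np.log((p1 / (1 - p1)) / (p2 / (1 - p2)))
    return log_or * np.sqrt(3) / np.pi

print(d_probit(0.70, 0.50), d_logistic(0.70, 0.50))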

480 citations


Journal ArticleDOI
TL;DR: The authors review some of the more fundamental problems with conventional methods based on means, indicate why recent advances based on robust measures of location have practical value, and describe why modern investigations dealing with nonnormality find practical problems when comparing means.
Abstract: Various statistical methods, developed after 1970, offer the opportunity to substantially improve upon the power and accuracy of the conventional t test and analysis of variance methods for a wide range of commonly occurring situations. The authors briefly review some of the more fundamental problems with conventional methods based on means; provide some indication of why recent advances, based on robust measures of location (or central tendency), have practical value; and describe why modern investigations dealing with nonnormality find practical problems when comparing means, in contrast to earlier studies. Some suggestions are made about how to proceed when using modern methods.
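As a small illustration of the robust measures of location the authors describe, the sketch below compares the ordinary mean with a 20% trimmed mean on skewed data. It uses SciPy's trim_mean; the trimmed two-sample test via ttest_ind(..., trim=...) is available only in newer SciPy releases.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.lognormal(mean=0.0, sigma=1.0, size=40)   # skewed, heavy right tail
b = rng.lognormal(mean=0.2, sigma=1.0, size=40)

print(a.mean(), stats.trim_mean(a, 0.2))          # outliers pull the mean, not the trimmed mean
print(stats.ttest_ind(a, b))                      # conventional t test on means
print(stats.ttest_ind(a, b, trim=0.2))            # Yuen-type trimmed test (newer SciPy only)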

436 citations


Journal ArticleDOI
TL;DR: This article demonstrates that fixed-effects meta-analysis increases statistical power by reducing the standard error of the weighted average effect size (T̄) and, in so doing, shrinks the confidence interval around T̄.
Abstract: One of the most frequently cited reasons for conducting a meta-analysis is the increase in statistical power that it affords a reviewer. This article demonstrates that fixed-effects meta-analysis increases statistical power by reducing the standard error of the weighted average effect size (T̄) and, in so doing, shrinks the confidence interval around T̄. Small confidence intervals make it more likely for reviewers to detect nonzero population effects, thereby increasing statistical power. Smaller confidence intervals also represent increased precision of the estimated population effect size. Computational examples are provided for 3 effect-size indices: d (standardized mean difference), Pearson's r, and odds ratios. Random-effects meta-analyses also may show increased statistical power and a smaller standard error of the weighted average effect size. However, the authors demonstrate that increasing the number of studies in a random-effects meta-analysis does not always increase statistical power.
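The mechanism is visible in a few lines: with inverse-variance weights, the standard error of the weighted average shrinks as studies accumulate, narrowing the confidence interval. The effect sizes and variances below are hypothetical.

import numpy as np

d = np.array([0.30, 0.10, 0.45, 0.25])   # hypothetical study effect sizes
v = np.array([0.04, 0.05, 0.06, 0.03])   # their sampling variances
w = 1 / v

T_bar = np.sum(w * d) / np.sum(w)        # fixed-effects weighted average effect size
se = np.sqrt(1 / np.sum(w))              # decreases as more studies are added
ci = (T_bar - 1.96 * se, T_bar + 1.96 * se)
print(T_bar, se, ci)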

422 citations


Journal ArticleDOI
TL;DR: The authors explain how to formulate an acceptable, modified null model, predict changes in fit index values accompanying its use, provide examples illustrating effects on fit index values when using such a model, and discuss implications for theory and practice of structural equation modeling.
Abstract: In structural equation modeling, incremental fit indices are based on the comparison of the fit of a substantive model to that of a null model. The standard null model yields unconstrained estimates of the variance (and mean, if included) of each manifest variable. For many models, however, the standard null model is an improper comparison model. In these cases, incremental fit index values reported automatically by structural modeling software have no interpretation and should be disregarded. The authors explain how to formulate an acceptable, modified null model, predict changes in fit index values accompanying its use, provide examples illustrating effects on fit index values when using such a model, and discuss implications for theory and practice of structural equation modeling.
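To see why the choice of null model matters, consider the usual form of an incremental index such as the CFI, sketched below. The chi-square and df values are invented; the point is only that substituting a modified null model changes the denominator and hence the reported fit.

def cfi(chisq_model, df_model, chisq_null, df_null):
    # comparative fit index computed from target- and null-model chi-squares
    num = max(chisq_model - df_model, 0)
    den = max(chisq_model - df_model, chisq_null - df_null, 0)
    return 1 - num / den

print(cfi(85.2, 40, 920.4, 55))   # against the standard (unconstrained-variance) null
print(cfi(85.2, 40, 410.7, 48))   # against a hypothetical modified null model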

345 citations


Journal ArticleDOI
TL;DR: The authors propose a simple effect size estimate (obtained from the sample size, N, and a p value) that can be used in meta-analytic research where only sample sizes and p values have been reported by the original investigator, or where no generally accepted effect size estimate exists.
Abstract: The purpose of this article is to propose a simple effect size estimate (obtained from the sample size, N, and a p value) that can be used (a) in meta-analytic research where only sample sizes and p values have been reported by the original investigator, (b) where no generally accepted effect size estimate exists, or (c) where directly computed effect size estimates are likely to be misleading. This effect size estimate is called r(equivalent) because it equals the sample point-biserial correlation between the treatment indicator and an exactly normally distributed outcome in a two-treatment experiment with N/2 units in each group and the obtained p value. As part of placing r(equivalent) into a broader context, the authors also address limitations of r(equivalent).
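A sketch of the computation, assuming the reported p value is one-tailed: recover the t value implied by p with N - 2 degrees of freedom, then convert it to a point-biserial-style correlation.

import numpy as np
from scipy.stats import t as t_dist

def r_equivalent(p_one_tailed, N):
    t = t_dist.ppf(1 - p_one_tailed, df=N - 2)   # t implied by the p value
    return np.sqrt(t**2 / (t**2 + N - 2))        # r from t and its degrees of freedom

print(r_equivalent(0.05, 36))   # hypothetical study with N = 36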

338 citations


Journal ArticleDOI
TL;DR: The authors conclude that transient error exists in all 3 trait domains and is especially large in the domain of affective traits.
Abstract: On the basis of an empirical study of measures of constructs from the cognitive domain, the personality domain, and the domain of affective traits, the authors of this study examine the implications of transient measurement error for the measurement of frequently studied individual differences variables. The authors clarify relevant reliability concepts as they relate to transient error and present a procedure for estimating the coefficient of equivalence and stability (L. J. Cronbach, 1947), the only classical reliability coefficient that assesses all 3 major sources of measurement error (random response, transient, and specific factor errors). The authors conclude that transient error exists in all 3 trait domains and is especially large in the domain of affective traits. Their findings indicate that the nearly universal use of the coefficient of equivalence (Cronbach's alpha; L. J. Cronbach, 1951), which fails to assess transient error, leads to overestimates of reliability and undercorrections for biases due to measurement error.

Journal ArticleDOI
TL;DR: An approach to sample size planning for multiple regression is presented that emphasizes accuracy in parameter estimation (AIPE) by providing necessary sample sizes in order for the likely widths of confidence intervals to be sufficiently narrow.
Abstract: An approach to sample size planning for multiple regression is presented that emphasizes accuracy in parameter estimation (AIPE). The AIPE approach yields precise estimates of population parameters by providing necessary sample sizes in order for the likely widths of confidence intervals to be sufficiently narrow. One AIPE method yields a sample size such that the expected width of the confidence interval around the standardized population regression coefficient is equal to the width specified. An enhanced formulation ensures, with some stipulated probability, that the width of the confidence interval will be no larger than the width specified. Issues involving standardized regression coefficients and random predictors are discussed, as are the philosophical differences between AIPE and the power analytic approaches to sample size planning.
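The flavor of the AIPE approach can be conveyed with a brute-force search: increase N until the expected confidence interval for a standardized slope is narrower than a target width. The standard error below is the usual large-sample approximation, and all inputs (target width, R-squared values, number of predictors) are hypothetical; the article's own procedures are more refined.

import numpy as np
from scipy.stats import t as t_dist

def n_for_ci_width(width, r2_full, r2_xj, p, alpha=0.05):
    """Smallest N whose expected CI for one standardized coefficient is no wider
    than `width`; r2_xj is that predictor's R^2 on the other predictors."""
    N = p + 3
    while True:
        se = np.sqrt((1 - r2_full) / ((1 - r2_xj) * (N - p - 1)))
        half = t_dist.ppf(1 - alpha / 2, df=N - p - 1) * se
        if 2 * half <= width:
            return N
        N += 1

print(n_for_ci_width(width=0.30, r2_full=0.40, r2_xj=0.20, p=5))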

Journal ArticleDOI
TL;DR: An overview of several models of confirmatory factor analysis for analyzing multitrait-multimethod (MTMM) data and a discussion of their advantages and limitations are provided.
Abstract: An overview of several models of confirmatory factor analysis for analyzing multitrait-multimethod (MTMM) data and a discussion of their advantages and limitations are provided. A new class of multi-indicator MTMM models combines several strengths and avoids a number of serious shortcomings inherent in previously developed MTMM models. The new models enable researchers to specify and to test trait-specific-method effects. The trait and method concepts composing these models are explained in detail and are contrasted with those of previously developed MTMM models for multiple indicators. The definitions of the models are explained step by step, and a practical empirical application of the models to the measurement of 3 traits x 3 methods is used to demonstrate their advantages and limitations.

Journal ArticleDOI
TL;DR: The 2-step approach using EM consistently yielded the most accurate reliability estimates and produced coverage rates close to the advertised 95% rate.
Abstract: A 2-step approach for obtaining internal consistency reliability estimates with item-level missing data is outlined. In the 1st step, a covariance matrix and mean vector are obtained using the expectation maximization (EM) algorithm. In the 2nd step, reliability analyses are carried out in the usual fashion using the EM covariance matrix as input. A Monte Carlo simulation examined the impact of 6 variables (scale length, response categories, item correlations, sample size, missing data, and missing data technique) on 3 different outcomes: estimation bias, mean errors, and confidence interval coverage. The 2-step approach using EM consistently yielded the most accurate reliability estimates and produced coverage rates close to the advertised 95% rate. An easy method of implementing the procedure is outlined.
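Step 2 of the procedure is just the ordinary alpha formula applied to a covariance matrix, as sketched below; the EM step itself is assumed to have been done by whatever missing-data routine is available, and the example matrix is made up.

import numpy as np

def alpha_from_cov(S):
    # Cronbach's alpha from a k x k item covariance matrix
    k = S.shape[0]
    return (k / (k - 1)) * (1 - np.trace(S) / np.sum(S))

# S_em stands in for an EM-estimated item covariance matrix (hypothetical values)
S_em = np.array([[1.00, 0.50, 0.40],
                 [0.50, 1.00, 0.45],
                 [0.40, 0.45, 1.00]])
print(alpha_from_cov(S_em))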

Journal ArticleDOI
TL;DR: The results suggest the need for some strategy to study the local optima problem for a specific data set or to identify methods for finding "good" starting values that might lead to the best solutions possible.
Abstract: The popular K-means clustering method, as implemented in 3 commercial software packages (SPSS, SYSTAT, and SAS), generally provides solutions that are only locally optimal for a given set of data. Because none of these commercial implementations offer a reasonable mechanism to begin the K-means method at alternative starting points, separate routines were written within the MATLAB (Math-Works, 1999) environment that can be initialized randomly (these routines are provided at the end of the online version of this article in the PsycARTICLES database). Through the analysis of 2 empirical data sets and 810 simulated data sets, it is shown that the results provided by commercial packages are most likely locally optimal. These results suggest the need for some strategy to study the local optima problem for a specific data set or to identify methods for finding "good" starting values that might lead to the best solutions possible.
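The local-optima issue is easy to demonstrate with any implementation that exposes its starting values. The scikit-learn sketch below (not the authors' MATLAB routines) runs K-means from many random initializations and inspects the spread of the resulting criterion values.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4))   # hypothetical data with weak cluster structure

# one random start per fit, repeated 50 times
inertias = [KMeans(n_clusters=3, n_init=1, init="random", random_state=s).fit(X).inertia_
            for s in range(50)]
print(min(inertias), max(inertias))  # a wide spread signals dependence on starting values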

Journal ArticleDOI
TL;DR: It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models, and the unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended.
Abstract: Mixed models take the dependency between observations based on the same cluster into account by introducing 1 or more random effects. Common item response theory (IRT) models introduce latent person variables to model the dependence between responses of the same participant. Assuming a distribution for the latent variables, these IRT models are formally equivalent with nonlinear mixed models. It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models. The unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended. The approach is illustrated with a self-report study on anger.
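The simplest case of the equivalence is the Rasch model, where the response probability is a logistic function of a random person effect minus a fixed item effect; a toy sketch with invented values:

import numpy as np

def p_correct(theta, b):
    # Rasch model: a logistic mixed model with a random person effect theta
    # (person ability) and fixed item effects b (item difficulties)
    return 1 / (1 + np.exp(-(theta - b)))

theta = 0.4                           # one person's latent value
b = np.array([-1.0, 0.0, 1.5])        # three item difficulties
print(p_correct(theta, b))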

Journal ArticleDOI
TL;DR: The authors detail the formal similarity between uni- and bidimensional regression, provide computational methods and a new index of spatial distortion, outline the advantages of bid dimensional regression over other techniques, and provide guidelines for its use.
Abstract: Bidimensional regression is a method for comparing the degree of resemblance between 2 planar configurations of points and, more generally, for assessing the nature of the geometry (Euclidean and non-Euclidean) between 2-dimensional independent and dependent variables. For example, it can assess the similarity between location estimates from different tasks or participant groups, measure the fidelity between cognitive maps and actual locations, and provide parameters for psychological process models. The authors detail the formal similarity between uni- and bidimensional regression, provide computational methods and a new index of spatial distortion, outline the advantages of bidimensional regression over other techniques, and provide guidelines for its use. The authors conclude by describing substantive areas in psychology for which the method would be appropriate and uniquely illuminating.
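For the Euclidean case, the fit reduces to ordinary least squares on a stacked linear system (translation, rotation, and scaling), as in the rough sketch below; the coordinates are invented, and the fit index shown is a simple pooled r-squared rather than the article's distortion index.

import numpy as np

def euclidean_bidimensional_regression(xy, uv):
    """Fit u = a1 + b1*x - b2*y and v = a2 + b2*x + b1*y by least squares."""
    x, y = xy[:, 0], xy[:, 1]
    u, v = uv[:, 0], uv[:, 1]
    ones, zeros = np.ones((len(x), 1)), np.zeros((len(x), 1))
    A = np.vstack([
        np.hstack([ones, zeros, x[:, None], -y[:, None]]),   # rows predicting u
        np.hstack([zeros, ones, y[:, None],  x[:, None]]),   # rows predicting v
    ])
    target = np.concatenate([u, v])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    pred = A @ coef
    r2 = 1 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
    return coef, r2

xy = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])                   # e.g., true locations
uv = np.array([[0.1, 0.2], [1.2, 0.1], [1.3, 1.1], [0.0, 1.2], [0.6, 0.7]])   # estimated locations
print(euclidean_bidimensional_regression(xy, uv))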

Journal ArticleDOI
TL;DR: In this reply, the authors argue that even if mixtures of normal distributions cannot be distinguished from homogeneous nonnormal distributions, a growth mixture model may still provide useful insights for description and prediction, but that accepting a model that fundamentally misrepresents the underlying data structure may be less useful in pursuit of the goal of explanation.
Abstract: The comments on D. J. Bauer and P. J. Curran (2003) share 2 common themes. The 1st theme is that model-checking procedures may be capable of distinguishing between mixtures of normal and homogeneous nonnormal distributions. Although useful for assessing model quality, it is argued here that currently available procedures may not always help discern between these 2 possibilities. The 2nd theme is that even if these 2 possibilities cannot be distinguished, a growth mixture model may still provide useful insights into the data. It is argued here that whereas this may be true for the scientific goals of description and prediction, the acceptance of a model that fundamentally misrepresents the underlying data structure may be less useful in pursuit of the goal of explanation.

Journal ArticleDOI
TL;DR: The findings suggest that the direction of scoring can critically affect an item response theory analysis; the authors argue that the large lower asymptote parameters are attributable to item-content ambiguity, possibly caused by item-level multidimensionality.
Abstract: The authors compared the fit of the 2- and 3-parameter logistic models (2PLM; 3PLM) on 15 unidimensional factor scales derived from the Minnesota Multiphasic Personality Inventory--Adolescent item pool. Log-likelihood chi-square deviance tests indicated that a 3PLM provided an improved fit. However, residual statistics indicated that the difference in fit between the 2 models was negligible. An unexpected finding was that from 10% to 30% of the items had substantial lower asymptote parameters (c > or = .10) when the scales were scored in the pathology or nonpathology directions. The authors argue that the large lower asymptote parameters are attributable to item-content ambiguity possibly caused by item-level multidimensionality. These findings suggest that the direction of scoring can critically affect an item response theory analysis.
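The two models differ only in the lower asymptote c, as the short sketch below makes explicit; the parameter values are arbitrary illustrations, not estimates from the MMPI-A scales.

import numpy as np

def irt_prob(theta, a, b, c=0.0):
    # 3PL item response function; c = 0 reduces it to the 2PL
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(irt_prob(theta, a=1.2, b=0.0))            # 2PL
print(irt_prob(theta, a=1.2, b=0.0, c=0.15))    # 3PL with a nontrivial lower asymptote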

Journal ArticleDOI
TL;DR: This article presents a rationale for cumulating psychological effects in a raw metric and compares raw mean differences to standardized mean differences and statistical techniques for raw meta-analysis are described.
Abstract: This article discusses the meta-analysis of raw mean differences. It presents a rationale for cumulating psychological effects in a raw metric and compares raw mean differences to standardized mean differences. Some limitations of standardization are noted, and statistical techniques for raw meta-analysis are described. These include a graphical device for decomposing effect sizes. Several illustrative data sets are analyzed.

Journal ArticleDOI
TL;DR: D. J. Bauer and P. J. Curran (2003) cautioned that results obtained from growth mixture models may sometimes be inaccurate, and showed that this can occur when the variables in the population have a nonnormal distribution.
Abstract: D. J. Bauer and P. J. Curran (2003) cautioned that results obtained from growth mixture models may sometimes be inaccurate. The problem they addressed occurs when a growth mixture model is applied to a single, general population of individuals but findings incorrectly support the conclusion that there are 2 subpopulations. In an artificial sampling experiment, they showed that this can occur when the variables in the population have a nonnormal distribution. A realistic perspective is that although a healthy skepticism to complex statistical results is appropriate, there are no true models to discover. Consequently, the issue of model misspecification is irrelevant in practical terms. The purpose of a mathematical model is to summarize data, to formalize the dynamics of a behavioral process, and to make predictions. All of this is scientifically valuable and can be accomplished with a carefully developed model, even though the model is false.

Journal ArticleDOI
TL;DR: Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit, and parametric person-fit statistics had more power than nonparametric person-fit statistics.
Abstract: Person-fit statistics have been proposed to investigate the fit of an item score pattern to an item response theory (IRT) model. The author investigated how these statistics can be used to detect different types of misfit. Intelligence test data were analyzed using person-fit statistics in the context of the G. Rasch (1960) model and R. J. Mokken’s (1971, 1997) IRT models. The effect of the choice of an IRT model to detect misfitting item score patterns and the usefulness of person-fit statistics for diagnosis of misfit are discussed. Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit. Parametric person-fit statistics had more power than nonparametric person-fit statistics.
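One widely used parametric person-fit statistic of the kind studied here is the standardized log-likelihood l_z, sketched below for a single respondent; the response pattern and model-implied probabilities are invented.

import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic for a 0/1 response
    vector u and model-implied success probabilities p."""
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expected) / np.sqrt(variance)

u = np.array([1, 1, 1, 0, 0, 1, 0, 0])
p = np.array([0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2])
print(lz(u, p))   # large negative values flag aberrant response patterns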

Journal ArticleDOI
TL;DR: Using an experimental method in which the cognitive sets of raters were manipulated as dimensional versus categorical, it is demonstrated that pseudotaxonicity can be created readily with rating scale measures, suggesting that researchers avoid an exclusive reliance on rating scales when conducting taxometrics investigations.
Abstract: Taxometric procedures such as mean above minus below a cut and maximum covariance can determine whether a trait is distributed as a discrete latent class. These methods have been used to infer taxonic structure in several personality and psychopathology constructs, often from analyses of rating scale data. This is problematic given (a) well established biases in ratings, (b) the human tendency to think categorically, and (c) implicit typological models of personality and psychopathology among expert raters. Using an experimental method in which the cognitive sets of raters were manipulated as dimensional versus categorical, it is demonstrated that pseudotaxonicity can be created readily with rating scale measures. This suggests that researchers avoid an exclusive reliance on rating scales when conducting taxometrics investigations.

Journal ArticleDOI
TL;DR: A true-score model is presented that incorporates transient errors for test-retest data, and a reliability estimate is derived that is less than coefficient alpha if transient error is present and is less susceptible to effects due to item recall than a test- retest correlation.
Abstract: Transient errors are caused by variations in feelings, moods, and mental states over time. If these errors are present, coefficient alpha is an inflated estimate of reliability. A true-score model is presented that incorporates transient errors for test-retest data, and a reliability estimate is derived. This estimate, referred to as the test-retest alpha, is less than coefficient alpha if transient error is present and is less susceptible to effects due to item recall than a test-retest correlation. An assumption underlying the test-retest alpha is essential tau equivalency of items. A test-retest split-half coefficient is presented as an alternative to the test-retest alpha when this assumption is violated. The test-retest alpha is the mean of all possible test-retest split-half coefficients.

Journal ArticleDOI
TL;DR: Many useful lessons can be learned from the inquiry into mixture models of growth curves about what a mixture distribution looks like, the meaning of the term homogeneous distribution, the importance of model checking, and advantages and disadvantages of using mixtures and similar procedures to approximate complicated distributions.
Abstract: D. J. Bauer and P. J. Curran (2003) raised some interesting issues with respect to mixture models of growth curves. Many useful lessons can be learned from their work, and more can be learned by extending the inquiry in related directions. These lessons involve the following issues: (a) what a mixture distribution looks like, (b) the meaning of the term homogeneous distribution, (c) the importance of model checking, (d) advantages and disadvantages of using mixtures and similar procedures to approximate complicated distributions, and (e) intrinsic versus nonintrinsic transformability.

Journal ArticleDOI
TL;DR: If providers are considered fixed, conclusions about the treatment must be conditioned on the specific providers in the study, and it is shown that in this case generalizing beyond these providers incurs inflated Type I error rates.
Abstract: In their criticism of B. E. Wampold and R. C. Serlin's (2000) analysis of treatment effects in nested designs, M. Siemer and J. Joormann (2003) argued that providers of services should be considered a fixed factor because typically providers are neither randomly selected from a population of providers nor randomly assigned to treatments, and statistical power to detect treatment effects is greater in the fixed than in the mixed model. The authors of the present article argue that if providers are considered fixed, conclusions about the treatment must be conditioned on the specific providers in the study, and they show that in this case generalizing beyond these providers incurs inflated Type I error rates. "The modern age has been characterized by a Promethean spirit, a restless energy that preys on speed records and shortcuts, unmindful of the past, uncaring of the future, existing only for the moment and the quick fix" (Rifkin, 1987, p. 12).

Journal ArticleDOI
TL;DR: A contrast analysis framework that integrates analysis of 1-pattern and multiple-pattern hypotheses and accommodates 1 group or multiple groups of participants is presented.
Abstract: Contrast analysis of repeated-measures data generally focuses on hypotheses when only 1 pattern of results is of theoretical interest. This article articulates a framework for contrast analysis in repeated-measures contexts in which researchers have hypotheses relevant to 1 potential pattern or multiple potential patterns of results. For example, a researcher might ask whether participants exhibit a pattern of (a) immediate symptom reduction or (b) delayed symptom reduction. Alternatively, the researcher might ask whether 2 or more groups exhibit 2 or more patterns to differing degrees. Building on the familiar logic and computational procedures for 1-pattern hypotheses, the authors present a contrast analysis framework that integrates analysis of 1-pattern and multiple-pattern hypotheses and accommodates 1 group or multiple groups of participants.
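The multiple-pattern idea can be sketched by scoring each hypothesized pattern as its own contrast: compute one contrast score per participant and test each against zero (or compare them across groups). The weights and data below are invented.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(size=(40, 4)) + np.array([4.0, 3.0, 2.9, 2.8])   # hypothetical repeated measures

immediate = np.array([3, -1, -1, -1])   # symptom drop right after treatment
delayed = np.array([1, 1, -1, -1])      # symptom drop only at later occasions

for label, w in (("immediate", immediate), ("delayed", delayed)):
    L = y @ w                           # one contrast score per participant
    print(label, stats.ttest_1samp(L, 0.0))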

Journal ArticleDOI
TL;DR: Guidelines in terms of the ratio of standard deviations are proposed for choosing among Spearman-Brown, alpha, and Angoff-Feldt coefficients.
Abstract: When the reliability of test scores must be estimated by an internal consistency method, partition of the test into just 2 parts may be the only way to maintain content equivalence of the parts. If the parts are classically parallel, the Spearman-Brown formula may be validly used to estimate the reliability of total scores. If the parts differ in their standard deviations but are tau equivalent, Cronbach's alpha is appropriate. However, if the 2 parts are congeneric, that is, they are unequal in functional length or they comprise heterogeneous item types, a less well-known estimate, the Angoff-Feldt coefficient, is appropriate. Guidelines in terms of the ratio of standard deviations are proposed for choosing among Spearman-Brown, alpha, and Angoff-Feldt coefficients.
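The three coefficients can be computed side by side from the two part scores, as in the sketch below; the Angoff-Feldt expression is written from memory and should be checked against the article before use.

import numpy as np

def two_part_reliabilities(x1, x2):
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    c12 = np.cov(x1, x2)[0, 1]
    sx = s1 + s2 + 2 * c12                                # variance of the total score
    r12 = c12 / np.sqrt(s1 * s2)
    spearman_brown = 2 * r12 / (1 + r12)                  # classically parallel parts
    alpha = 2 * (1 - (s1 + s2) / sx)                      # tau-equivalent parts
    angoff_feldt = 4 * c12 / (sx - (s1 - s2) ** 2 / sx)   # congeneric parts
    return spearman_brown, alpha, angoff_feldt

rng = np.random.default_rng(5)
true = rng.standard_normal(200)
x1 = 1.0 * true + rng.standard_normal(200)                # halves of unequal functional length
x2 = 0.6 * true + rng.standard_normal(200)
print(two_part_reliabilities(x1, x2))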

Journal ArticleDOI
TL;DR: The authors discuss circumstances under which the treatment of nested provider effects as fixed as opposed to random is appropriate and present 2 formulas for the correct estimation of effect sizes when nested factors are fixed.
Abstract: Ignoring a nested factor can influence the validity of statistical decisions about treatment effectiveness. Previous discussions have centered on consequences of ignoring nested factors versus treating them as random factors on Type I errors and measures of effect size (B. E. Wampold & R. C. Serlin, 2000). The authors (a) discuss circumstances under which the treatment of nested provider effects as fixed as opposed to random is appropriate; (b) present 2 formulas for the correct estimation of effect sizes when nested factors are fixed; (c) present the results of Monte Carlo simulations of the consequences of treating providers as fixed versus random on effect size estimates, Type I error rates, and power; and (d) discuss implications of mistaken considerations of provider effects for the study of differential treatment effects in psychotherapy research.

Journal ArticleDOI
TL;DR: The authors disagree with M. Siemer and J. Joormann's assertion that therapist should be a fixed effect in psychotherapy treatment outcome studies and suggest that, if treatment is properly standardized, therapist effects can be examined in preliminary tests and the therapist term deleted from analyses if such differences approach zero.
Abstract: The authors disagree with M. Siemer and J. Joormann's assertion that therapist should be a fixed effect in psychotherapy treatment outcome studies. If treatment is properly standardized, therapist effects can be examined in preliminary tests and the therapist term deleted from analyses if such differences approach zero. If therapist effects are anticipated and either cannot be minimized through standardization or are specifically of interest because of the nature of the research question, the study has to be planned with adequate statistical power for including therapist as a random term. Simulation studies conducted by Siemer and Joormann confounded bias due to small sample size and inconsistent estimates.