
Showing papers in "Psychological Methods in 2001"


Journal ArticleDOI
TL;DR: A simulation assessed the potential costs and benefits of a restrictive strategy, which makes minimal use of auxiliary variables, versus an inclusive strategy, which makes liberal use of them; the results show that the inclusive strategy is to be greatly preferred.
Abstract: Two classes of modern missing data procedures, maximum likelihood (ML) and multiple imputation (MI), tend to yield similar results when implemented in comparable ways. In either approach, it is possible to include auxiliary variables solely for the purpose of improving the missing data procedure. A simulation was presented to assess the potential costs and benefits of a restrictive strategy, which makes minimal use of auxiliary variables, versus an inclusive strategy, which makes liberal use of such variables. The simulation showed that the inclusive strategy is to be greatly preferred. With an inclusive strategy not only is there a reduced chance of inadvertently omitting an important cause of missingness, there is also the possibility of noticeable gains in terms of increased efficiency and reduced bias, with only minor costs. As implemented in currently available software, the ML approach tends to encourage the use of a restrictive strategy, whereas the MI approach makes it relatively simple to use an inclusive strategy.

2,153 citations
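
A minimal sketch of the inclusive idea, not the authors' simulation code: auxiliary variables (here aux1, aux2, both hypothetical) are fed to the imputation model even though the analysis model regresses y on x alone. The use of scikit-learn's IterativeImputer as a stand-in multiple-imputation engine is an assumption for illustration only.

    # Illustrative sketch only (not the article's code); names are hypothetical.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(0)
    n = 200
    aux1 = rng.normal(size=n)                    # auxiliary correlate of missingness
    aux2 = rng.normal(size=n)
    x = 0.5 * aux1 + rng.normal(size=n)
    y = 0.4 * x + 0.3 * aux2 + rng.normal(size=n)
    y[aux1 > 1.0] = np.nan                       # missingness depends on aux1 (MAR)

    data = np.column_stack([y, x, aux1, aux2])   # auxiliaries included for imputation
    m = 20
    slopes = []
    for i in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=i)
        completed = imp.fit_transform(data)
        yc, xc = completed[:, 0], completed[:, 1]
        slopes.append(np.polyfit(xc, yc, 1)[0])  # analysis model: regress y on x only
    print("pooled slope estimate:", np.mean(slopes))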


Journal ArticleDOI
TL;DR: The authors present an analytic approach to mediation and moderation issues using ordinary least squares estimation in the case in which the treatment varies within participants.
Abstract: Analyses designed to detect mediation and moderation of treatment effects are increasingly prevalent in research in psychology. The mediation question concerns the processes that produce a treatment effect. The moderation question concerns factors that affect the magnitude of that effect. Although analytic procedures have been reasonably well worked out in the case in which the treatment varies between participants, no systematic procedures for examining mediation and moderation have been developed in the case in which the treatment varies within participants. The authors present an analytic approach to these issues using ordinary least squares estimation.

800 citations
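
A hedged sketch of the general idea, not necessarily the authors' exact specification: with a treatment that varies within participants, ordinary least squares can be applied to difference scores. Here the condition difference in the outcome is regressed on the condition difference in a proposed mediator and on the mediator's centered person-level average; all data and names are invented.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 150
    m1 = rng.normal(size=n)                          # mediator in condition 1
    m2 = m1 + 0.6 + rng.normal(scale=0.5, size=n)    # treatment raises the mediator
    y1 = 0.5 * m1 + rng.normal(size=n)
    y2 = 0.5 * m2 + 0.2 + rng.normal(size=n)         # outcome in condition 2

    y_diff = y2 - y1
    m_diff = m2 - m1
    m_avg_c = (m1 + m2) / 2 - np.mean((m1 + m2) / 2)  # centered person-level mediator

    X = sm.add_constant(np.column_stack([m_diff, m_avg_c]))
    fit = sm.OLS(y_diff, X).fit()
    print(fit.params)   # intercept: residual treatment effect; m_diff: mediation path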


Journal ArticleDOI
TL;DR: Two Monte Carlo simulations are presented that compare the efficacy of the Hedges and colleagues, Rosenthal-Rubin, and Hunter-Schmidt methods for combining correlation coefficients for cases in which population effect sizes were both fixed and variable.
Abstract: The efficacy of the Hedges and colleagues, Rosenthal-Rubin, and Hunter-Schmidt methods for combining correlation coefficients was tested for cases in which population effect sizes were both fixed and variable. After a brief tutorial on these meta-analytic methods, the author presents two Monte Carlo simulations that compare these methods for cases in which the number of studies in the meta-analysis and the average sample size of studies were varied. In the fixed case the methods produced comparable estimates of the average effect size; however, the Hunter-Schmidt method failed to control the Type I error rate for the associated significance tests. In the variable case, for both the Hedges and colleagues and Hunter-Schmidt methods, Type I error rates were not controlled for meta-analyses including 15 or fewer studies and the probability of detecting small effects was less than .3. Some practical recommendations are made about the use of meta-analysis.

677 citations
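
A minimal sketch, not the article's code, contrasting two standard ways of combining correlations: an inverse-variance average of Fisher-z transforms (in the spirit of the Hedges and colleagues and Rosenthal-Rubin approaches) and the sample-size-weighted mean of raw correlations used by Hunter-Schmidt. The study values are invented.

    import numpy as np

    r = np.array([0.20, 0.35, 0.15, 0.28])   # study correlations (hypothetical)
    n = np.array([50, 120, 80, 60])          # study sample sizes (hypothetical)

    # Fisher-z approach: transform, weight by n - 3, back-transform
    z = np.arctanh(r)
    w = n - 3
    z_bar = np.sum(w * z) / np.sum(w)
    r_fisher = np.tanh(z_bar)
    se_z = np.sqrt(1 / np.sum(w))            # SE of the mean effect in z units

    # Hunter-Schmidt approach: sample-size-weighted mean of the raw correlations
    r_hs = np.sum(n * r) / np.sum(n)

    print(f"Fisher-z pooled r = {r_fisher:.3f} (z SE = {se_z:.3f})")
    print(f"Hunter-Schmidt pooled r = {r_hs:.3f}")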


Journal ArticleDOI
TL;DR: A group-based method to jointly estimate developmental trajectories of 2 distinct but theoretically related measurement series that will aid the analysis of comorbidity and heterotypic continuity is presented.
Abstract: This article presents a group-based method to jointly estimate developmental trajectories of 2 distinct but theoretically related measurement series. The method will aid the analysis of comorbidity and heterotypic continuity. Three key outputs of the model are (a) for both measurement series, the form of the trajectory of distinctive subpopulations; (b) the probability of membership in each such trajectory group; and (c) the joint probability of membership in trajectory groups across behaviors. This final output offers 2 novel features. First, the joint probabilities can characterize the linkage in the developmental course of distinct but related behaviors. Second, the joint probabilities can measure differences within the population in the magnitude of this linkage. Two examples are presented to illustrate the application of the method.

670 citations


Journal ArticleDOI
TL;DR: This paper describes how to compute the power of fixed- and random-effects tests of the mean effect size, tests for heterogeneity (or variation) of effect size parameters across studies, and tests for contrasts among effect sizes of different studies.
Abstract: Calculations of the power of statistical tests are important in planning research studies (including meta-analyses) and in interpreting situations in which a result has not proven to be statistically significant. The authors describe procedures to compute statistical power of fixed- and random-effects tests of the mean effect size, tests for heterogeneity (or variation) of effect size parameters across studies, and tests for contrasts among effect sizes of different studies. Examples are given using 2 published meta-analyses. The examples illustrate that statistical power is not always high in meta-analysis.

653 citations
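
A hedged illustration (not the authors' code) of the simplest of these calculations: the approximate power of the fixed-effects z-test that the mean standardized mean difference is zero, assuming k studies each with equal group sizes.

    from scipy.stats import norm
    import numpy as np

    def fixed_effect_power(delta, k, n_per_group, alpha=0.05):
        """Approximate two-tailed power of the z-test of the mean effect size."""
        v_i = 2 / n_per_group + delta**2 / (4 * n_per_group)  # variance of one d
        se_mean = np.sqrt(v_i / k)                            # SE of the weighted mean
        lam = delta / se_mean                                 # noncentrality
        z_crit = norm.ppf(1 - alpha / 2)
        return norm.sf(z_crit - lam) + norm.cdf(-z_crit - lam)

    print(fixed_effect_power(delta=0.2, k=10, n_per_group=25))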


Journal ArticleDOI
TL;DR: Simulation results suggest that recently developed correctives for missing data can mitigate problems that stem from nonnormal data.
Abstract: A Monte Carlo simulation examined full information maximum-likelihood estimation (FIML) in structural equation models with nonnormal indicator variables. The impacts of 4 independent variables were examined (missing data algorithm, missing data rate, sample size, and distribution shape) on 4 outcome measures (parameter estimate bias, parameter estimate efficiency, standard error coverage, and model rejection rates). Across missing completely at random and missing at random patterns, FIML parameter estimates involved less bias and were generally more efficient than those of ad hoc missing data techniques. However, similar to complete-data maximum-likelihood estimation in structural equation modeling, standard errors were negatively biased and model rejection rates were inflated. Simulation results suggest that recently developed correctives for missing data (e.g., rescaled statistics and the bootstrap) can mitigate problems that stem from nonnormal data.

632 citations


Journal ArticleDOI
TL;DR: This paper reviews the history and nature of factor score indeterminacy; computer programs for assessing the degree of indeterminacy in a given analysis, as well as for computing and evaluating different types of factor scores, are presented and demonstrated using data from the Wechsler Intelligence Scale for Children-Third Edition.
Abstract: A variety of methods for computing factor scores can be found in the psychological literature. These methods grew out of a historic debate regarding the indeterminate nature of the common factor model. Unfortunately, most researchers are unaware of the indeterminacy issue and the problems associated with a number of the factor scoring procedures. This article reviews the history and nature of factor score indeterminacy. Novel computer programs for assessing the degree of indeterminacy in a given analysis, as well as for computing and evaluating different types of factor scores, are then presented and demonstrated using data from the Wechsler Intelligence Scale for Children-Third Edition. It is argued that factor score indeterminacy should be routinely assessed and reported as part of any exploratory factor analysis and that factor scores should be thoroughly evaluated before they are reported or used in subsequent statistical analyses.

528 citations
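
A minimal sketch (assumed single-factor case; not the article's programs) of one common determinacy index: the multiple correlation between the factor and the observed variables, and the corresponding minimum correlation between alternative, equally valid sets of factor scores. Loadings are hypothetical.

    import numpy as np

    loadings = np.array([0.8, 0.7, 0.6, 0.5])        # hypothetical pattern loadings
    R = np.outer(loadings, loadings)
    np.fill_diagonal(R, 1.0)                         # implied correlation matrix
    rho = np.sqrt(loadings @ np.linalg.solve(R, loadings))
    print("determinacy (rho):", rho)
    print("minimum correlation between alternative score sets:", 2 * rho**2 - 1)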


Journal ArticleDOI
TL;DR: The article introduces the idea behind MI, discusses its advantages over existing techniques for addressing missing data, describes how to do MI for real problems, reviews the software available to implement MI, and reports a simulation study of how assumptions about the imputation model affect the parameter estimates provided by MI.
Abstract: This article provides a comprehensive review of multiple imputation (MI), a technique for analyzing data sets with missing values. Formally, MI is the process of replacing each missing data point with a set of m > 1 plausible values to generate m complete data sets. These complete data sets are then analyzed by standard statistical software, and the results combined, to give parameter estimates and standard errors that take into account the uncertainty due to the missing data values. This article introduces the idea behind MI, discusses the advantages of MI over existing techniques for addressing missing data, describes how to do MI for real problems, reviews the software available to implement MI, and discusses the results of a simulation study aimed at finding out how assumptions regarding the imputation model affect the parameter estimates provided by MI.

504 citations
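
A minimal sketch, not from the article, of the combining step described above (often called Rubin's rules): the m per-imputation estimates and their variances are pooled into one estimate and a standard error that reflects the missing-data uncertainty. Example numbers are invented.

    import numpy as np

    def pool_rubin(estimates, variances):
        """estimates, variances: length-m arrays of per-imputation results."""
        estimates = np.asarray(estimates, dtype=float)
        variances = np.asarray(variances, dtype=float)
        m = len(estimates)
        q_bar = estimates.mean()                 # pooled point estimate
        w = variances.mean()                     # within-imputation variance
        b = estimates.var(ddof=1)                # between-imputation variance
        t = w + (1 + 1 / m) * b                  # total variance
        return q_bar, np.sqrt(t)

    est, se = pool_rubin([0.42, 0.38, 0.45, 0.40, 0.44],
                         [0.010, 0.011, 0.009, 0.010, 0.012])
    print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")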


Journal ArticleDOI
TL;DR: The authors consider how choices in the duration of the study, frequency of observation, and number of participants affect statistical power and show that power depends on a standardized effect size, the sample size, and a person-specific reliability coefficient.
Abstract: Consider a study in which 2 groups are followed over time to assess group differences in the average rate of change, rate of acceleration, or higher degree polynomial effect. In designing such a study, one must decide on the duration of the study, frequency of observation, and number of participants. The authors consider how these choices affect statistical power and show that power depends on a standardized effect size, the sample size, and a person-specific reliability coefficient. This reliability, in turn, depends on study duration and frequency. These relations enable researchers to weigh alternative designs with respect to feasibility and power. The authors illustrate the approach using data from published studies of antisocial thinking during adolescence and vocabulary growth during infancy.

438 citations
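
An illustrative sketch under stated assumptions (a linear growth model with equally spaced waves), not the authors' formulas verbatim: the person-specific reliability of an estimated rate of change as a function of study duration, number of observations, residual variance, and the variance of true slopes. It shows why longer or more frequent follow-up raises reliability, and hence power.

    import numpy as np

    def slope_reliability(duration, n_waves, resid_var, slope_var):
        times = np.linspace(0, duration, n_waves)
        sst = np.sum((times - times.mean())**2)   # spread of measurement times
        error_var = resid_var / sst               # sampling variance of one slope
        return slope_var / (slope_var + error_var)

    print(slope_reliability(duration=2, n_waves=3, resid_var=1.0, slope_var=0.05))
    print(slope_reliability(duration=4, n_waves=5, resid_var=1.0, slope_var=0.05))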


Journal ArticleDOI
TL;DR: This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard.
Abstract: Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats.

370 citations


Journal ArticleDOI
TL;DR: In this article, a specific method associated with each form of phenomenological inquiry was used to analyze an interview transcript of a woman's experience of work-family role conflict, and a considerable degree of similarity was found in the resulting descriptions.
Abstract: Empirical phenomenology and hermeneutic phenomenology, the 2 most common approaches to phenomenological research in psychology, are described, and their similarities and differences examined. A specific method associated with each form of phenomenological inquiry was used to analyze an interview transcript of a woman's experience of work-family role conflict. A considerable degree of similarity was found in the resulting descriptions. It is argued that such convergence in analyses is due to the human capacities of reflection and intuition and the presence of intersubjective meanings. The similarity in the analyses is also encouraging about researchers' ability to reveal meaning despite the use of different methods and the difficulties associated with interpreting meaning.

Journal ArticleDOI
TL;DR: A synthesis of 319 meta-analyses of psychological, behavioral, and educational treatment research was conducted to assess the influence of study method on observed effect sizes relative to that of substantive features of the interventions, highlighting the difficulty of detecting treatment outcomes.
Abstract: A synthesis of 319 meta-analyses of psychological, behavioral, and educational treatment research was conducted to assess the influence of study method on observed effect sizes relative to that of substantive features of the interventions. An index was used to estimate the proportion of effect size variance associated with various study features. Study methods accounted for nearly as much variability in study outcomes as characteristics of the interventions. Type of research design and operationalization of the dependent variable were the method features associated with the largest proportion of variance. The variance as a result of sampling error was about as large as that associated with the features of the interventions studied. These results underscore the difficulty of detecting treatment outcomes, the importance of cautiously interpreting findings from a single study, and the importance of meta-analysis in summarizing results across studies.

Journal ArticleDOI
TL;DR: The authors' simulations show that a positive bias is caused by the truncation, but for large population values of rWG(J) it is negligible, and it is shown how the bootstrap method can be used for comparing the indices of 2 groups.
Abstract: L. R. James, R. G. Demaree, and G. Wolf (1984) introduced rWG(J) to estimate interrater agreement for a group. This index is calculated by comparing an observed group variance with an expected random variance. As researchers have gained experience using this index, several questions have arisen. What are the consequences of replacing values beyond the unit interval by 0? What is the dependence of rWG(J) on the group size? The authors' simulations show that a positive bias is caused by the truncation, but for large population values of rWG(J) it is negligible. Also, in this case, the group size has no effect on the expected value of rWG(J). For inference on rWG(J), researchers can exploit the availability of computers to simulate data from the hypothesized distribution and then compare the simulation results for rWG(J) with the actual values. In addition, it is shown how the bootstrap method can be used for comparing the indices of 2 groups.
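
A hedged sketch of how rWG(J) is typically computed for one group: the mean observed within-group item variance is compared with the variance expected under a uniform "random responding" null distribution, and out-of-range values are replaced by 0 as discussed in the article. This is not the authors' simulation code; the ratings are invented.

    import numpy as np

    def rwg_j(ratings, n_options):
        """ratings: (raters x items) array for one group on a 1..n_options scale."""
        ratings = np.asarray(ratings, dtype=float)
        j = ratings.shape[1]
        s2_mean = ratings.var(axis=0, ddof=1).mean()   # mean observed item variance
        sigma2_e = (n_options**2 - 1) / 12.0           # expected random (uniform) variance
        ratio = s2_mean / sigma2_e
        rwg = (j * (1 - ratio)) / (j * (1 - ratio) + ratio)
        return rwg if 0.0 <= rwg <= 1.0 else 0.0       # truncation discussed above

    group = np.array([[4, 5, 4], [5, 5, 4], [4, 4, 5], [5, 4, 4]])  # 4 raters, 3 items
    print(rwg_j(group, n_options=5))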

Journal ArticleDOI
TL;DR: In this article, 3-mode principal components analysis is described at an elementary level, and guidance is given concerning the choices to be made in each step of the process of analyzing 3-way data by this technique.
Abstract: Three-way component analysis techniques are designed for descriptive analysis of 3-way data, for example, when data are collected on individuals, in different settings, and on different measures. Such techniques summarize all information in a 3-way data set by summarizing, for each way of the 3-way data set, the associated entities through a few components and describing the relations between these components. In this article, 3-mode principal components analysis is described at an elementary level. Guidance is given concerning the choices to be made in each step of the process of analyzing 3-way data by this technique. The complete process is illustrated with a detailed description of the analysis of an empirical 3-way data set.

Journal ArticleDOI
TL;DR: In this paper, the authors compared two multivariate statistical methods (logistic regression and signal detection) and evaluated their ability to identify subgroups at risk and found that signal detection may be more useful than logistic regression for designing distinct tailored interventions for subgroups of high-risk individuals.
Abstract: Identifying subgroups of high-risk individuals can lead to the development of tailored interventions for those subgroups. This study compared two multivariate statistical methods (logistic regression and signal detection) and evaluated their ability to identify subgroups at risk. The methods identified similar risk predictors and had similar predictive accuracy in exploratory and validation samples. However, the 2 methods did not classify individuals into the same subgroups. Within subgroups, logistic regression identified individuals that were homogeneous in outcome but heterogeneous in risk predictors. In contrast, signal detection identified individuals that were homogeneous in both outcome and risk predictors. Because of the ability to identify homogeneous subgroups, signal detection may be more useful than logistic regression for designing distinct tailored interventions for subgroups of high-risk individuals.

Journal ArticleDOI
TL;DR: In this paper, the authors suggest that parameters be tested for statistical significance through the likelihood ratio test, which is invariant to the identification choice, even though the identifications produce the same overall model fit.
Abstract: A problem with standard errors estimated by many structural equation modeling programs is described. In such programs, a parameter's standard error is sensitive to how the model is identified (i.e., how scale is set). Alternative but equivalent ways to identify a model may yield different standard errors, and hence different Z tests for a parameter, even though the identifications produce the same overall model fit. This lack of invariance due to model identification creates the possibility that different analysts may reach different conclusions about a parameter's significance level even though they test equivalent models on the same data. The authors suggest that parameters be tested for statistical significance through the likelihood ratio test, which is invariant to the identification choice.
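
A minimal sketch, not tied to any particular SEM package, of the likelihood-ratio test the authors recommend: fit the model with and without the parameter fixed to zero and compare the log-likelihoods; the resulting statistic does not depend on how the model's scale is set. The log-likelihood values here are invented placeholders.

    from scipy.stats import chi2

    def lr_test(ll_full, ll_restricted, df=1):
        stat = 2 * (ll_full - ll_restricted)     # invariant to the identification choice
        return stat, chi2.sf(stat, df)

    stat, p = lr_test(ll_full=-1230.4, ll_restricted=-1233.9, df=1)
    print(f"LR = {stat:.2f}, p = {p:.4f}")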

Journal ArticleDOI
TL;DR: Estimation of the effect size parameter, D, the standardized difference between population means, is sensitive to heterogeneity of variance (heteroscedasticity), which seems to abound in psychological data, and various proposed solutions are reviewed, including measures that do not make these assumptions.
Abstract: Estimation of the effect size parameter, D, the standardized difference between population means, is sensitive to heterogeneity of variance (heteroscedasticity), which seems to abound in psychological data. Pooling s²s assumes homoscedasticity, as do methods for constructing a confidence interval for D, estimating D from t or analysis of variance results, formulas that adjust estimates for inflation by main effects or covariates, and the Q statistic. The common language effect size statistic as an estimate of Pr(X1 > X2), the probability that a randomly sampled member of Population 1 will outscore a randomly sampled member of Population 2, also assumes normality and homoscedasticity. Various proposed solutions are reviewed, including measures that do not make these assumptions, such as the probability of superiority estimate of Pr(X1 > X2). Ways to reconceptualize effect size when treatments may affect moments such as the variance are also discussed.
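
A hedged illustration of the two kinds of estimate discussed: a pooled-variance standardized difference (which assumes homoscedasticity) and the nonparametric probability-of-superiority estimate of Pr(X1 > X2), which does not. The sample data are invented and deliberately heteroscedastic.

    import numpy as np

    def d_pooled(x1, x2):
        n1, n2 = len(x1), len(x2)
        sp2 = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
        return (np.mean(x1) - np.mean(x2)) / np.sqrt(sp2)

    def prob_superiority(x1, x2):
        """Proportion of (x1, x2) pairs with x1 > x2, counting ties as 1/2."""
        diff = np.subtract.outer(x1, x2)
        return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

    rng = np.random.default_rng(1)
    grp1 = rng.normal(0.5, 2.0, 40)   # note the unequal variances
    grp2 = rng.normal(0.0, 1.0, 40)
    print(d_pooled(grp1, grp2), prob_superiority(grp1, grp2))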

Journal ArticleDOI
TL;DR: Six different methods of computing factor scores were investigated in a simulation study, and results indicated that a simplified, unit-weighting procedure based on the factor score coefficients was generally superior to several unit-weighting procedures based on the pattern or structure coefficients.
Abstract: Six different methods of computing factor scores were investigated in a simulation study. Population scores created from oblique factor patterns selected from the psychological literature served as the bases for the simulations, and the stability of the different methods was assessed through cross-validation in a subject-sampling model. Results from 5 evaluative criteria indicated that a simplified, unit-weighting procedure based on the factor score coefficients was generally superior to several unit-weighting procedures based on the pattern or structure coefficients. This simplified method of computing factor scores also compared favorably with an exact-weighting scheme based on the full factor score coefficient matrix. Results are discussed with regard to their potential impact on current practice, and several recommendations are offered.

Journal ArticleDOI
TL;DR: In this article, a parametric model for item interactions is introduced, and it is shown that ignoring a positive interaction results in an overestimation of the discrimination parameter in the two-parameter logistic model (2PLM), whereas ignoring a negative interaction leads to an underestimate of the parameter.
Abstract: Most item response theory models assume conditional independence, and it is known that interactions between items affect the estimated item discrimination. In this article, this effect is further investigated from a theoretical perspective and by means of simulation studies. To this end, a parametric model for item interactions is introduced. Next, it is shown that ignoring a positive interaction results in an overestimation of the discrimination parameter in the two-parameter logistic model (2PLM), whereas ignoring a negative interaction leads to an underestimation of the parameter. Furthermore, it is demonstrated that in some cases the item characteristic curves of the 2PLM and of an item involved in an interaction are quite similar, indicating that the 2PLM can provide a good fit to data with interactions.
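
For readers unfamiliar with the 2PLM referred to above, here is a small sketch of its item characteristic curve; the authors' interaction model itself is not reproduced here. Parameter values are hypothetical.

    import numpy as np

    def icc_2pl(theta, a, b):
        """P(correct | theta) under the 2PLM with discrimination a and difficulty b."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    theta = np.linspace(-3, 3, 7)
    print(icc_2pl(theta, a=1.2, b=0.0))   # steeper curves correspond to larger a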

Journal ArticleDOI
TL;DR: A weighted least squares (WLS) approach is recommended for 2-group studies, which is statistically accurate, is readily executed through popular software packages, and allows follow-up tests.
Abstract: Moderated multiple regression (MMR) arguably is the most popular statistical technique for investigating regression slope differences (interactions) across groups (e.g., aptitude-treatment interactions in training and differential test score-job performance prediction in selection testing). However, heterogeneous error variances can greatly bias the typical MMR analysis, and the conditions that cause heterogeneity are not uncommon. Statistical corrections that have been developed require special calculations and are not conducive to follow-up analyses that describe an interaction effect in depth. A weighted least squares (WLS) approach is recommended for 2-group studies. For 2-group studies, WLS is statistically accurate, is readily executed through popular software packages (e.g., SAS Institute, 1999; SPSS, 1999), and allows follow-up tests.
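
A hedged sketch of the weighted least squares idea for a 2-group moderated regression: estimate each group's residual variance from its own regression, then weight observations by the inverse of that variance when fitting the pooled interaction model. This is not the article's code (which references SAS and SPSS); the statsmodels implementation and the data are assumptions for illustration.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 100
    group = np.repeat([0, 1], n)
    x = rng.normal(size=2 * n)
    y = 0.5 * x + 0.4 * group * x + rng.normal(scale=np.where(group == 0, 1.0, 2.0))

    X = sm.add_constant(np.column_stack([x, group, x * group]))

    # Residual variance within each group, from separate OLS fits
    weights = np.empty(2 * n)
    for g in (0, 1):
        idx = group == g
        res = sm.OLS(y[idx], sm.add_constant(x[idx])).fit()
        weights[idx] = 1.0 / res.mse_resid

    wls = sm.WLS(y, X, weights=weights).fit()
    print(wls.params)   # last coefficient is the group-by-x interaction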

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the link between the individual and group-level judgments by extending R. D. Luce's (1959) model, which was originally developed for individual choice behavior, to a mixed-effects paired comparison model.
Abstract: The method of paired comparisons belongs to a small group of techniques that provide explicit information about the consistency of individual and aggregated choices. This article investigates the link between the individual- and group-level judgments by extending R. D. Luce's (1959) model, which was originally developed for individual choice behavior, to a mixed-effects paired comparison model. It is shown that standard multilevel software for binary data can be used to estimate the model. The interpretation of the paired comparison parameters and statistical model tests are discussed in detail. An extensive analysis of an experimental study illustrates the usefulness of a hierarchical approach in modeling multiple pairwise judgments.


Journal ArticleDOI
TL;DR: The focus is on modeling the nonlinear relationship between treatment effects and baseline often observed in prevention programs designed for at-risk populations.
Abstract: Methods for assessing treatment effects of longitudinal randomized intervention are considered. The focus is on modeling the nonlinear relationship between treatment effects and baseline often observed in prevention programs designed for at-risk populations. Piecewise linear growth modeling was used to study treatment effects during the different periods of development. A multistep multiple-group analysis procedure is proposed for assessing treatment effects in the presence of nonlinear treatment-baseline interactions. Standard errors of the estimates from this multistep procedure were obtained using a bootstrap approach. The methods are illustrated using data from the Johns Hopkins Prevention Research Center involving an intervention aimed at improving classroom behavior.

Journal ArticleDOI
TL;DR: 3 articles in the special section of Psychological Methods are introduced, which consider multiple imputation and maximum-likelihood methods, new approaches to missing data that can often yield improved results.
Abstract: Traditional approaches to missing data (e.g., listwise deletion) can lead to less than optimal results in terms of bias, statistical power, or both. This article introduces the 3 articles in the special section of Psychological Methods, which consider multiple imputation and maximum-likelihood methods, new approaches to missing data that can often yield improved results. Computer software is now available to implement these new methods.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated Type I error and power rates for a number of simultaneous and stepwise multiple comparison procedures using SAS (1999) PROC MIXED in unbalanced designs when normality and covariance homogeneity assumptions did not hold.
Abstract: One approach to the analysis of repeated measures data allows researchers to model the covariance structure of the data rather than presume a certain structure, as is the case with conventional univariate and multivariate test statistics. This mixed-model approach was evaluated for testing all possible pairwise differences among repeated measures marginal means in a Between-Subjects x Within-Subjects design. Specifically, the authors investigated Type I error and power rates for a number of simultaneous and stepwise multiple comparison procedures using SAS (1999) PROC MIXED in unbalanced designs when normality and covariance homogeneity assumptions did not hold. J. P. Shaffer's (1986) sequentially rejective step-down and Y. Hochberg's (1988) sequentially acceptive step-up Bonferroni procedures, based on an unstructured covariance structure, had superior Type I error control and power to detect true pairwise differences across the investigated conditions.
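
A small sketch (not the authors' SAS/PROC MIXED code) of Y. Hochberg's (1988) sequentially acceptive step-up Bonferroni procedure applied to a set of pairwise-comparison p values; the p values shown are invented.

    import numpy as np

    def hochberg(pvals, alpha=0.05):
        """Return a boolean rejection decision for each p value."""
        pvals = np.asarray(pvals, dtype=float)
        m = len(pvals)
        order = np.argsort(pvals)                 # ascending
        reject = np.zeros(m, dtype=bool)
        for rank in range(m, 0, -1):              # step up from the largest p
            i = order[rank - 1]
            if pvals[i] <= alpha / (m - rank + 1):
                reject[order[:rank]] = True       # reject this and all smaller p's
                break
        return reject

    print(hochberg([0.001, 0.012, 0.021, 0.040, 0.300]))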

Journal ArticleDOI
TL;DR: This article develops a less conservative approach to local population inference, one based on the logic of B. Efron's (1979) nonparametric bootstrap, and compares it with an established approach based on randomization or permutation tests.
Abstract: A frequently used experimental design in psychological research randomly divides a set of available cases, a local population, between 2 treatments and then applies an independent-samples t test to either test a hypothesis about or estimate a confidence interval (CI) for the population mean difference in treatment response. C. S. Reichardt and H. F. Gollob (1999) established that the t test can be conservative for this design-yielding hypothesis test P values that are too large or CIs that are too wide for the relevant local population. This article develops a less conservative approach to local population inference, one based on the logic of B. Efron's (1979) nonparametric bootstrap. The resulting randomization bootstrap is then compared with an established approach to local population inference, that based on randomization or permutation tests. Finally, the importance of local population inference is established by reference to the distinction between statistical and scientific inference.
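
An illustrative sketch of the established randomization (permutation) test to which the authors compare their randomization bootstrap; the bootstrap variant itself is not reproduced here, and the data are invented.

    import numpy as np

    def permutation_test(x1, x2, n_perm=10000, seed=None):
        rng = np.random.default_rng(seed)
        pooled = np.concatenate([x1, x2])
        observed = np.mean(x1) - np.mean(x2)
        count = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            diff = pooled[:len(x1)].mean() - pooled[len(x1):].mean()
            if abs(diff) >= abs(observed):
                count += 1
        return observed, count / n_perm           # two-sided p value

    x1 = np.array([5.1, 6.3, 4.8, 7.0, 5.9])
    x2 = np.array([4.2, 5.0, 4.7, 4.4, 5.2])
    print(permutation_test(x1, x2, seed=0))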

Journal ArticleDOI
TL;DR: The converse inequality argument, as examined in this paper, states that we want P(H/D) (where H and D represent hypothesis and data, respectively), we get P(D/H), and the 2 do not equal one another.
Abstract: Critics have put forth several arguments against the use of tests of statistical significance (TOSSes). Among these, the converse inequality argument stands out but remains sketchy, as does criticism of it. The argument states that we want P(H/D) (where H and D represent hypothesis and data, respectively), we get P(D/H), and the 2 do not equal one another. Each of the terms in 'P(D/H) not equal to P(H/D)' requires clarification. Furthermore, the argument as a whole allows for multiple interpretations. If the argument questions the logic of TOSSes, then defenses of TOSSes fall into 2 distinct types. Clarification and analysis of the argument suggest more moderate conclusions than previously offered by friends and critics of TOSSes. Furthermore, the general method of clarification through formalization may offer a way out of the current impasse.
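
A tiny numeric illustration of the inequality at issue: with hypothetical prior and likelihood values, Bayes' theorem shows that P(D/H) and P(H/D) can differ substantially.

    p_h = 0.10                  # prior probability of the hypothesis (hypothetical)
    p_d_given_h = 0.05          # the p-value-like quantity, P(D | H)
    p_d_given_not_h = 0.60      # probability of the data if H is false (hypothetical)
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
    p_h_given_d = p_d_given_h * p_h / p_d        # Bayes' theorem
    print(p_d_given_h, round(p_h_given_d, 4))    # 0.05 versus about 0.0092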


Journal ArticleDOI
TL;DR: In this article, three approaches to the problem were considered: a test of homogeneity of variance, a comparison of control and experimental quartile frequencies by means of the chi-square test of association, and the cumulative percentage difference (CPD) curve.
Abstract: Bidirectional experimental effects cannot be demonstrated by comparing measures of central tendency. Three approaches to the problem were considered: a test of homogeneity of variance, a comparison of control and experimental quartile frequencies by means of the chi-square test of association, and the cumulative-percentage-difference (CPD) curve--a graphic tool for demonstrating bidirectional effects. Two tests of significance were developed based on the CPD curve's maximum and minimum values. All 3 tests were first put to use analyzing simulated data, which incorporated 6 different patterns of experimental effect. The results of 100 repetitions of the simulations are summarized. Next, the results of a lexical-decision experiment with partial-word preview were analyzed using the procedures considered. The report ends with a presentation of guidelines for the use of the bidirectional tests.