
Showing papers in "Psychological Methods in 2002"


Journal ArticleDOI
TL;DR: 2 general approaches that come highly recommended, maximum likelihood (ML) and Bayesian multiple imputation (MI), are presented, along with newer developments that may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.
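The pooling logic behind multiple imputation (Rubin's rules) can be sketched in a few lines. The example below is deliberately crude and purely illustrative: the data are invented, and the imputation step draws from a normal fitted to the observed cases without propagating uncertainty in the fitted parameters, which a proper MI procedure would do.

```python
import random
import statistics

random.seed(42)

# Invented example: a normal variable with roughly 30% of values missing (None).
complete = [random.gauss(50, 10) for _ in range(200)]
data = [x if random.random() > 0.3 else None for x in complete]
observed = [x for x in data if x is not None]

M = 20  # number of imputed data sets
estimates, variances = [], []
for _ in range(M):
    # Crude imputation step: draw each missing value from a normal
    # distribution fitted to the observed cases. (A proper MI step
    # would also propagate uncertainty in mu and sd themselves.)
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    imputed = [x if x is not None else random.gauss(mu, sd) for x in data]
    estimates.append(statistics.mean(imputed))
    variances.append(statistics.variance(imputed) / len(imputed))  # squared SE

# Rubin's rules for pooling the M analyses:
pooled_est = statistics.mean(estimates)
W = statistics.mean(variances)        # within-imputation variance
B = statistics.variance(estimates)    # between-imputation variance
T = W + (1 + 1 / M) * B               # total variance of pooled_est
```

The total variance T exceeds the within-imputation variance W by a between-imputation term that reflects the extra uncertainty due to missing data.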

10,568 citations


Journal ArticleDOI
TL;DR: The authors recommend that bootstrap methods (B. Efron & R. Tibshirani, 1993) be used to assess mediation with small to moderate samples, because the sampling distribution of the mediated effect is skewed away from 0, and argue that R. M. Baron and D. A. Kenny's (1986) recommendation of first testing the X --> Y association for statistical significance should not be a requirement when there is a priori belief that the effect size is small or suppression is a possibility.
Abstract: Mediation is said to occur when a causal effect of some variable X on an outcome Y is explained by some intervening variable M. The authors recommend that with small to moderate samples, bootstrap methods (B. Efron & R. Tibshirani, 1993) be used to assess mediation. Bootstrap tests are powerful because they detect that the sampling distribution of the mediated effect is skewed away from 0. They argue that R. M. Baron and D. A. Kenny's (1986) recommendation of first testing the X --> Y association for statistical significance should not be a requirement when there is a priori belief that the effect size is small or suppression is a possibility. Empirical examples and computer setups for bootstrap analyses are provided.
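The percentile bootstrap test of the indirect effect a*b can be sketched as follows. This is an illustrative sketch only, not the authors' setup: the simulated data, sample size, and true effect (0.5 * 0.5 = 0.25) are invented.

```python
import random
import statistics

random.seed(1)

def cov(u, v):
    mu, mv = statistics.mean(u), statistics.mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    # a-path: slope of M on X; b-path: partial slope of Y on M given X.
    a = cov(x, m) / cov(x, x)
    b = (cov(m, y) * cov(x, x) - cov(x, m) * cov(x, y)) / (
        cov(m, m) * cov(x, x) - cov(x, m) ** 2)
    return a * b

# Invented data with a true indirect effect of 0.5 * 0.5 = 0.25.
n = 100
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.5 * mi + random.gauss(0, 1) for mi in m]

# Percentile bootstrap: resample cases with replacement, recompute a*b.
boot = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(indirect_effect([x[i] for i in idx],
                                [m[i] for i in idx],
                                [y[i] for i in idx]))
boot.sort()
ci_low, ci_high = boot[50], boot[1949]  # ~2.5th and ~97.5th percentiles
```

Mediation is declared when the percentile interval (ci_low, ci_high) excludes 0; no symmetry or normality of the sampling distribution of a*b is assumed.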

8,940 citations


Journal ArticleDOI
TL;DR: A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect and found that 2 methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power.
Abstract: A Monte Carlo study compared 14 methods to test the statistical significance of the intervening variable effect. An intervening variable (mediator) transmits the effect of an independent variable to a dependent variable. The commonly used R. M. Baron and D. A. Kenny (1986) approach has low statistical power. Two methods based on the distribution of the product and 2 difference-in-coefficients methods have the most accurate Type I error rates and greatest statistical power except in 1 important case in which Type I error rates are too high. The best balance of Type I error and statistical power across all cases is the test of the joint significance of the two effects comprising the intervening variable effect.

8,629 citations


Journal ArticleDOI
TL;DR: Principles for reporting analyses using structural equation modeling are reviewed, and it is recommended that every report give a detailed justification of the model used, along with plausible alternatives and an account of identifiability.
Abstract: Principles for reporting analyses using structural equation modeling are reviewed, with the goal of supplying readers with complete and accurate information. It is recommended that every report give a detailed justification of the model used, along with plausible alternatives and an account of identifiability. Nonnormality and missing data problems should also be addressed. A complete set of parameters and their standard errors is desirable, and it will often be convenient to supply the correlation matrix and discrepancies, as well as goodness-of-fit indices, so that readers can exercise independent critical judgment. A survey of fairly representative studies compares recent practice with the principles of reporting recommended here. Structural equation modeling (SEM), also known as path analysis with latent variables, is now a regularly used method for representing dependency (arguably "causal") relations in multivariate data in the behavioral and social sciences.

3,834 citations


Journal ArticleDOI
TL;DR: The authors present the case that dichotomization is rarely defensible and often will yield misleading results.
Abstract: The authors examine the practice of dichotomization of quantitative measures, wherein relationships among variables are examined after 1 or more variables have been converted to dichotomous variables by splitting the sample at some point on the scale(s) of measurement. A common form of dichotomization is the median split, where the independent variable is split at the median to form high and low groups, which are then compared with respect to their means on the dependent variable. The consequences of dichotomization for measurement and statistical analyses are illustrated and discussed. The use of dichotomization in practice is described, and justifications that are offered for such usage are examined. The authors present the case that dichotomization is rarely defensible and often will yield misleading results. We consider here some simple statistical procedures for studying relationships of one or more independent variables to one dependent variable, where all variables are quantitative in nature and are measured on meaningful numerical scales. Such measures are often referred to as individual-differences measures, meaning that observed values of such measures are interpretable as reflecting individual differences on the attribute of interest. It is of course straightforward to analyze such data using correlational methods. In the case of a single independent variable, one can use simple linear regression and/or obtain a simple correlation coefficient. In the case of multiple independent variables, one can use multiple regression, possibly including interaction terms. Such methods are routinely used in practice. However, another approach to analysis of such data is also rather widely used.
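The attenuation the authors warn about is easy to demonstrate by simulation. The sketch below (invented data, not the authors' code) median-splits one of two correlated variables and compares the resulting point-biserial correlation with the original correlation; under normality the split costs roughly 20% of the correlation (the sqrt(2/pi) ≈ 0.80 factor).

```python
import random
import statistics

random.seed(7)

def corr(u, v):
    mu, mv = statistics.mean(u), statistics.mean(v)
    su, sv = statistics.stdev(u), statistics.stdev(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / ((len(u) - 1) * su * sv)

# Invented bivariate data with a true correlation of 0.5.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 0.75 ** 0.5) for xi in x]

r_full = corr(x, y)

# Median split: replace x with a high/low group indicator.
med = statistics.median(x)
x_split = [1.0 if xi > med else 0.0 for xi in x]
r_split = corr(x_split, y)
# Under normality, r_split is roughly sqrt(2/pi) = 0.80 times r_full.
```

With these settings r_full comes out near .50 and r_split near .40, illustrating the information loss without any change in the underlying relationship.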

2,949 citations


Journal ArticleDOI
TL;DR: A method for combining results across independent-groups and repeated measures designs is described, the conditions under which such an analysis is appropriate are discussed, and meta-analysis procedures using design-specific estimates of sampling variance are presented.
Abstract: When a meta-analysis on results from experimental studies is conducted, differences in the study design must be taken into consideration. A method for combining results across independent-groups and repeated measures designs is described, and the conditions under which such an analysis is appropriate are discussed. Combining results across designs requires that (a) all effect sizes be transformed into a common metric, (b) effect sizes from each design estimate the same treatment effect, and (c) meta-analysis procedures use design-specific estimates of sampling variance to reflect the precision of the effect size estimates.
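One common transformation to a common metric, in the spirit of requirement (a) above, relates repeated-measures and independent-groups standardized mean differences through the pretest-posttest correlation r. The helpers below are an illustrative sketch under the assumption SD_diff = SD_raw * sqrt(2(1 - r)), not a reproduction of the article's formulas.

```python
import math

def rm_to_ig(d_rm, r):
    """Convert a repeated-measures effect size (difference-score metric)
    to the independent-groups (raw-score) metric, given the
    pretest-posttest correlation r. Uses SD_diff = SD_raw * sqrt(2(1 - r))."""
    return d_rm * math.sqrt(2 * (1 - r))

def ig_to_rm(d_ig, r):
    """Inverse conversion: raw-score metric to difference-score metric."""
    return d_ig / math.sqrt(2 * (1 - r))
```

Note that when r = .5 the two metrics coincide, and as r grows an effect size computed from difference scores increasingly overstates the raw-score effect, which is why mixing the two metrics in one meta-analysis is hazardous.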

1,949 citations


Journal ArticleDOI
TL;DR: The authors propose a method to analyze the association between 2 variables when the assumption of stationarity may not be warranted, which results in estimates of both the strength of peak association and the time lag when the peak association occurred.
Abstract: Cross-correlation and most other longitudinal analyses assume that the association between 2 variables is stationary. Thus, a sample of occasions of measurement is expected to be representative of the association between variables regardless of the time of onset or number of occasions in the sample. The authors propose a method to analyze the association between 2 variables when the assumption of stationarity may not be warranted. The method results in estimates of both the strength of peak association and the time lag when the peak association occurred for a range of starting values of elapsed time from the beginning of an experiment.
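The inner step of such an analysis, estimating the lag of peak association for a given stretch of data, can be sketched as below. This is a generic illustration on invented series, not the authors' method: their procedure repeats this kind of computation over a range of starting values to relax the stationarity assumption.

```python
import math
import statistics

def corr(u, v):
    mu, mv = statistics.mean(u), statistics.mean(v)
    su, sv = statistics.stdev(u), statistics.stdev(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / ((len(u) - 1) * su * sv)

def peak_lagged_corr(x, y, max_lag):
    """Return (r, lag): the shift of y relative to x, within +/- max_lag,
    that maximizes the absolute cross-correlation."""
    best = (0.0, 0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        r = corr(xs, ys)
        if abs(r) > abs(best[0]):
            best = (r, lag)
    return best

# Invented series in which x leads y by 3 time steps.
base = [math.sin(i / 3) for i in range(103)]
x, y = base[3:], base[:-3]
peak_r, peak_lag = peak_lagged_corr(x, y, max_lag=5)
```

Here the procedure recovers both the strength of peak association (r near 1) and the time lag (3) at which it occurs.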

312 citations


Journal ArticleDOI
TL;DR: A theoretical explanation is provided, using relationships between unique variances and eigenvalues of the fitted correlation matrix, for why very small correlation residuals can be accompanied by indications of bad fit from chi-square-based fit indices when unique variances are small.
Abstract: Standard chi-square-based fit indices for factor analysis and related models have a little known property: They are more sensitive to misfit when unique variances are small than when they are large. Consequently, very small correlation residuals indicating excellent fit can be accompanied by indications of bad fit by the fit indices when unique variances are small. An empirical example of this incompatibility between residuals and fit indices is provided. For illustrative purposes, an artificial example is provided that yields exactly the same correlation residuals as the empirical example but has larger unique variances. For this example, the fit indices indicate excellent fit. A theoretical explanation for this phenomenon is provided using relationships between unique variances and eigenvalues of the fitted correlation matrix.

223 citations


Journal ArticleDOI
TL;DR: The correlated trait-correlated method (CT-CM) and correlated uniqueness (CU) confirmatory factor analysis models for multitrait-multimethod data are critiqued and the authors recommend that the CT-CM model be regarded as the generally preferred model and that the CU model be invoked only when the CT-CM model fails.
Abstract: The correlated trait-correlated method (CT-CM) and correlated uniqueness (CU) confirmatory factor analysis models for multitrait-multimethod data are critiqued. Although the CU model often returns convergent and admissible factor solutions when the CT-CM model does not, the CU model is shown to have theoretical and substantive shortcomings. On the basis of this critique, the authors recommend that the CT-CM model be regarded as the generally preferred model and that the CU model be invoked only when the CT-CM model fails.

198 citations


Journal ArticleDOI
TL;DR: A confidence interval for a general linear function of population medians, which can be used to test 2-sided directional hypotheses and finite interval hypotheses, is proposed and sample size formulas are given.
Abstract: When the distribution of the response variable is skewed, the population median may be a more meaningful measure of centrality than the population mean, and when the population distribution of the response variable has heavy tails, the sample median may be a more efficient estimator of centrality than the sample mean. The authors propose a confidence interval for a general linear function of population medians. Linear functions have many important special cases including pairwise comparisons, main effects, interaction effects, simple main effects, curvature, and slope. The confidence interval can be used to test 2-sided directional hypotheses and finite interval hypotheses. Sample size formulas are given for both interval estimation and hypothesis testing problems.

189 citations


Journal ArticleDOI
TL;DR: The results indicate substantial interest in the potential contribution of qualitative methods in major psychological journals, although this interest is not ubiquitous, well defined, or communicated.
Abstract: The acceptance of qualitative research in 15 journals published and distributed by the American Psychological Association (APA) was investigated. This investigation included a PsycINFO search using the keyword qualitative, an analysis of 15 APA journals for frequency of qualitative publication, a content analysis of the journal descriptions, and the results of qualitative interviews with 10 of the chief editors of those journals. The results indicate that there exists a substantial amount of interest in the potential contribution of qualitative methods in major psychological journals, although this interest is not ubiquitous, well defined, or communicated. These findings highlight the need for APA to state its position regarding the applicability of qualitative methods in the study of psychology.

Journal ArticleDOI
TL;DR: In this paper, a mixed-effects model for a response that displays identifiable regimes is reviewed, and two examples are discussed in detail, both of which can be estimated with software that is widely available.
Abstract: Behavior that develops in phases may exhibit distinctively different rates of change in one time period than in others. In this article, a mixed-effects model for a response that displays identifiable regimes is reviewed. An interesting component of the model is the change point. In substantive terms, the change point is the time when development switches from one phase to another. In a mixed-effects model, the change point can be a random coefficient. This possibility allows individuals to make the transition from one phase to another at different ages or after different lengths of time in treatment. Two examples are reviewed in detail, both of which can be estimated with software that is widely available.

Journal ArticleDOI
TL;DR: In this paper, the authors examined various factors that affect statistical power in randomized intervention studies with noncompliance and showed how statistical power changes depending on compliance rate, study design, outcome distributions, and covariate information.
Abstract: This study examined various factors that affect statistical power in randomized intervention studies with noncompliance. On the basis of Monte Carlo simulations, this study demonstrates how statistical power changes depending on compliance rate, study design, outcome distributions, and covariate information. It also examines how these factors influence power in different methods of estimating intervention effects. Intent-to-treat analysis and complier average causal effect estimation are compared as 2 alternative ways of estimating intervention effects under noncompliance. The results of this investigation provide practical implications in designing and evaluating intervention studies taking into account noncompliance.

Journal ArticleDOI
TL;DR: This paper reviews some important lessons in design, analysis, and theory emerging from the 20th century's field experiments, including the importance of ensuring that selection into experiments and assignment to conditions occurs properly, the need to attend to power and effect size, how to measure and take partial treatment implementation into account in analyses, modern analyses of quasi-experimental and multilevel data, Rubin's model, and the role of internal and external validity.
Abstract: Field experiments in the social sciences were increasingly used in the 20th century. This article briefly reviews some important lessons in design, analysis, and theory of field experiments emerging from that experience. Topics include the importance of ensuring that selection into experiments and assignment to conditions occurs properly, how to prevent and analyze attrition, the need to attend to power and effect size, how to measure and take partial treatment implementation into account in analyses, modern analyses of quasi-experimental and multilevel data, Rubin's model, and the role of internal and external validity. The article ends with observations on the computer revolution in methodology and statistics, convergences in theory and methods across disciplines, the need for an empirical program of methodological research, the key problem of selection bias, and the inevitability of increased specialization in field experimentation in the years to come.

Journal ArticleDOI
TL;DR: In the present Monte Carlo study, the classification efficiencies of MAXCOV and the k-means algorithm were compared across ranges of sample size, effect size, indicator number, taxon base rate, and within-groups covariance and it was found that when the impact of these parameters was minimized, k-Means classified more data points correctly thanMAXCOV.
Abstract: Maximum covariance (MAXCOV) is a method for determining whether a group of 3 or more indicators marks 1 continuous or 2 discrete latent distributions of individuals. Although the circumstances under which MAXCOV is effective in detecting latent taxa have been specified, its efficiency in classifying cases into groups has not been assessed, and few studies have compared its performance with that of cluster analysis. In the present Monte Carlo study, the classification efficiencies of MAXCOV and the k-means algorithm were compared across ranges of sample size, effect size, indicator number, taxon base rate, and within-groups covariance. When the impact of these parameters was minimized, k-means classified more data points correctly than MAXCOV. However, when the effects of all parameters were increased concurrently, MAXCOV outperformed k-means.
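The k-means side of the comparison is simple enough to sketch in one dimension. Below is Lloyd's algorithm applied to invented two-group ("taxonic") data; this is a generic k-means, not the study's simulation code.

```python
import random

random.seed(3)

def kmeans_1d(data, k=2, iters=50):
    """Lloyd's algorithm in 1 dimension: assign each point to the
    nearest center, then move each center to its cluster's mean."""
    centers = random.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # Empty clusters (rare with separated groups) keep their old center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# Invented taxonic data: two latent groups centered at 0 and 5.
data = ([random.gauss(0, 1) for _ in range(150)] +
        [random.gauss(5, 1) for _ in range(150)])
centers = kmeans_1d(data, k=2)
```

With well-separated groups the recovered centers land near the true group means, and each case's cluster assignment serves as its classification.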

Journal ArticleDOI
TL;DR: A new approach for using path analysis to appraise the verisimilitude of theories is described, which corroborates a class of path diagrams by determining how well they predict intradata relations in comparison with other diagrams.
Abstract: A new approach for using path analysis to appraise the verisimilitude of theories is described. Rather than trying to test a model's truth (correctness), this method corroborates a class of path diagrams by determining how well they predict intradata relations in comparison with other diagrams. The observed correlation matrix is partitioned into disjoint sets. One set is used to estimate the model parameters, and a nonoverlapping set is used to assess the model's verisimilitude. Computer code was written to generate competing models and to test the conjectured model's superiority (relative to the generated set) using diagram combinatorics and is available on the Web (http://www.vanderbilt.edu/quantmetheval/downloads.htm).

Journal ArticleDOI
TL;DR: In a comparison of 2 treatments, if outcome scores are denoted by X in one condition and by Y in the other, stochastic equality is defined as P(X > Y) = P(X < Y); various robust tests of stochastic equality are evaluated in a Monte Carlo study.
Abstract: In a comparison of 2 treatments, if outcome scores are denoted by X in 1 condition and by Y in the other, stochastic equality is defined as P(X > Y) = P(X < Y). Tests of stochastic equality can be affected by characteristics of the distributions being compared, such as heterogeneity of variance. Thus, various robust tests of stochastic equality have been proposed and are evaluated here using a Monte Carlo study with sample sizes ranging from 10 to 30. Three robust tests are identified that perform well in Type I error rates and power except when extremely skewed data co-occur with very small n. When tests of stochastic equality might be preferred to tests of means is also considered.
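A natural sample quantity behind such tests is the probability of superiority, A = P(X > Y) + .5 P(X = Y), with stochastic equality corresponding to A = .5. The helper below is a generic illustration of that estimand, not one of the specific robust tests evaluated in the article.

```python
def prob_superiority(x, y):
    """Estimate A = P(X > Y) + 0.5 * P(X = Y): the probability that a
    randomly drawn X score beats a randomly drawn Y score, ties split.
    Stochastic equality corresponds to A = 0.5."""
    gt = sum(1 for xi in x for yi in y if xi > yi)
    eq = sum(1 for xi in x for yi in y if xi == yi)
    return (gt + 0.5 * eq) / (len(x) * len(y))
```

For identical samples the estimate is exactly .5 by symmetry; values far from .5 in either direction indicate that one treatment's scores tend to dominate the other's.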

Journal ArticleDOI
TL;DR: It is pointed out that in multiple testing with familywise error rate controlled at alpha, the directional error rate is greater than alpha/2 and can be arbitrarily close to alpha.
Abstract: L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. This article points out that in multiple testing with familywise error rate controlled at alpha, the directional error rate (assuming all null hypotheses are false) is greater than alpha/2 and can be arbitrarily close to alpha. Single-step, step-down, and step-up procedures are analyzed, and other error rates, including the false discovery rate, are discussed. Implications for confidence interval estimation and hypothesis testing practices are considered.
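As one concrete example of a step-down procedure with familywise error rate controlled at alpha, here is a sketch of Holm's method. This is a generic illustration with invented p-values; the article's analysis covers a broader family of single-step, step-down, and step-up procedures and their directional error rates.

```python
def holm_stepdown(pvalues, alpha=0.05):
    """Holm's step-down procedure: compare the i-th smallest p-value
    (i = 1..m) with alpha / (m - i + 1) and stop at the first failure.
    Returns rejection decisions in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: all remaining hypotheses are retained
    return reject
```

For example, holm_stepdown([0.01, 0.04, 0.03, 0.005]) rejects the first and last hypotheses: the thresholds for the sorted p-values are .0125, .0167, .025, .05, and testing stops when .03 exceeds .025.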

Journal ArticleDOI
TL;DR: In this article, a multiple-regression-based pattern recognition procedure is described to identify a pattern of predictor scores associated with high scores on a criterion variable. A second regression procedure is then used to estimate the proportion of variation attributable to the criterion pattern.
Abstract: Along with examples involving vocational interests and mathematics achievement, the authors describe a multiple-regression-based pattern recognition procedure that can be used to identify a pattern of predictor scores associated with high scores on a criterion variable. This pattern is called the criterion pattern. After the criterion pattern has been identified, a second regression procedure can be used to estimate the proportion of variation attributable to the criterion pattern. Cross-validation can then be used to estimate the variation attributable to a criterion pattern derived from regression weights estimated in another sample. Finally, issues of criterion pattern invariance and interpretation are discussed.

Journal ArticleDOI
TL;DR: The philosophy behind constraints on latent variable variances is examined and it is shown how their appropriate use is neither as straightforward nor as noncontroversial as portrayed in textbooks and computer manuals.
Abstract: In traditional approaches to structural equations modeling, variances of latent endogenous variables cannot be specified or constrained directly and, consequently, are not identified, unless certain precautions are taken. The usual method for achieving identification has been to fix one factor loading for each endogenous latent variable at unity. An alternative approach is to fix variances using newer constrained estimation algorithms. This article examines the philosophy behind such constraints and shows how their appropriate use is neither as straightforward nor as noncontroversial as portrayed in textbooks and computer manuals. The constraints on latent variable variances can interact with other model constraints to interfere with the testing of certain kinds of hypotheses and can yield incorrect standardized solutions with some popular software.

Journal ArticleDOI
TL;DR: In this paper, a series of hierarchical models are presented to test the stationarity of behavioral sequences, the homogeneity of sequences across a sample of episodes, and whether covariates can account for variation in sequences across the sample.
Abstract: The authors review the common methods for measuring strength of contingency between 2 behaviors in a behavioral sequence, the binomial z score and the adjusted cell residual, and point out a number of limitations of these approaches. They present a new approach using log odds ratios and empirical Bayes estimation in the context of hierarchical modeling, an approach not constrained by these limitations. A series of hierarchical models is presented to test the stationarity of behavioral sequences, the homogeneity of sequences across a sample of episodes, and whether covariates can account for variation in sequences across the sample. These models are applied to observational data taken from a study of the behavioral interactions of 254 couples to illustrate their use.
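The basic ingredient of the proposed approach, a log odds ratio for a 2 x 2 contingency table of sequential behavior counts, can be computed as below. This is a generic sketch with an optional continuity correction; the article's empirical Bayes and hierarchical-modeling machinery is not reproduced here.

```python
import math

def log_odds_ratio(a, b, c, d, correction=0.5):
    """Log odds ratio for a 2 x 2 table [[a, b], [c, d]] of, e.g.,
    counts of behavior B occurring (or not) given that behavior A
    just occurred (or not). Adds `correction` to every cell, a common
    continuity correction that guards against zero cells."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return math.log((a * d) / (b * c))
```

A log odds ratio of 0 indicates no contingency between the two behaviors; positive values indicate that behavior B is more likely following behavior A than otherwise.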

Journal ArticleDOI
TL;DR: P. E. Meehl and N. G. Waller's (2002) innovative method for theory appraisal, the delete one-add one (D1-A1) method, assesses a relatively narrow range of causal implications, allows nonrecursive models, and is only norm referenced; it is compared with a just-identified, recursive model (JIRM) approach.
Abstract: Theories often place constraints on causal relationships, and such constraints are often assessed with causal models. Causal models should be recursive and just identified because cause is recursive and is more likely to be just identified than overidentified. A just-identified, recursive model (JIRM) is specified that satisfies both requirements and that can be used to assess a wide range of causal implications in either a norm-referenced or criterion-referenced manner. P. E. Meehl and N. G. Waller (2002) proposed an innovative method for theory appraisal called the delete one-add one (D1-A1) method, which assesses a relatively narrow range of causal implications, allows nonrecursive models, and is only norm referenced. The JIRM and D1-A1 methods are compared.

Journal ArticleDOI
TL;DR: In this article, the authors identify the statistically most powerful method of data analysis in the 3-wave intensive design for straight-line growth models, and show that adding a pretest as a covariate to a randomized posttest-only design increases statistical power.
Abstract: Adding a pretest as a covariate to a randomized posttest-only design increases statistical power, as does the addition of intermediate time points to a randomized pretest-posttest design. Although typically 5 waves of data are required in this instance to produce meaningful gains in power, a 3-wave intensive design allows the evaluation of the straight-line growth model and may reduce the effect of missing data. The authors identify the statistically most powerful method of data analysis in the 3-wave intensive design. If straight-line growth is assumed, the pretest-posttest slope must assume fairly extreme values for the intermediate time point to increase power beyond the standard analysis of covariance on the posttest with the pretest as covariate, ignoring the intermediate time point.

Journal ArticleDOI
TL;DR: A latent-class model of rater agreement is presented for which 1 of the model parameters can be interpreted as the proportion of systematic agreement.
Abstract: A latent-class model of rater agreement is presented for which 1 of the model parameters can be interpreted as the proportion of systematic agreement. The latent classes of the model emerge from the factorial combination of the "true" category in which a target belongs and the ease with which raters are able to classify targets into the true category. Several constrained cases of the model are described, and the relations to other well-known agreement models and kappa-type summary coefficients are explained. The differential quality of the rating categories can be assessed on the basis of the model fit. The model is illustrated using data from diagnoses of psychiatric disorders and classifications of individuals in a persuasive communication study.
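For comparison with the kappa-type summary coefficients that the latent-class model is related to, here is a generic computation of Cohen's kappa from a square rater-agreement table. This is illustrative only, with an invented example table; it is not the latent-class model itself.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table where table[i][j]
    counts targets placed in category i by rater 1 and j by rater 2.
    Kappa corrects observed agreement for chance agreement."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n   # observed agreement
    row = [sum(table[i]) / n for i in range(k)]      # rater 1 marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row[i] * col[i] for i in range(k))   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```

Perfect agreement yields kappa = 1, chance-level agreement yields kappa = 0; e.g., cohens_kappa([[20, 5], [10, 15]]) gives .4 (observed agreement .7 against chance agreement .5).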

Journal ArticleDOI
TL;DR: A Monte Carlo simulation was conducted to compare 9 pairwise multiple comparison procedures and found that the greatest all-pairs power was usually provided by 1 of 2 partition-based versions of E. Peritz's (1970) procedure.
Abstract: A Monte Carlo simulation was conducted to compare 9 pairwise multiple comparison procedures. Procedures were evaluated on the basis of any-pair power and all-pairs power. No procedure was found to be uniformly most powerful. A modification due to A. J. Hayter (1986) of Fisher's least significant difference was found to provide the best combination of ease of use and moderately high any-pair power in most cases. Pilot or exploratory studies can expect good power results with this relatively simple procedure. The greatest all-pairs power was usually provided by 1 of 2 partition-based versions of E. Peritz's (1970) procedure. Confirmatory studies will require such complex methods but may also need larger sample sizes than have been customary in psychological research.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the importance of subjecting causal models to severe (risky) tests and reemphasize the need to test the verisimilitude of causal models.
Abstract: P. E. Meehl and N. G. Waller (2002) described a novel approach for appraising the verisimilitude of path analysis models. This approach uses limited information parameter estimates when assessing model fit and a nonparametric badness-of-fit measure that relies on path diagram combinatorics. R. C. MacCallum, M. W. Browne, and K. J. Preacher (2002); C. S. Reichardt (2002); and S. Mulaik (2002) have provided comments on this work. In this article the authors respond to the commentators and reemphasize the importance of subjecting causal models to severe (risky) tests.

Journal ArticleDOI
TL;DR: In this article, Meehl and Waller proposed a method for assessing path analysis models wherein they subjected a given model, along with a set of alternatives, to risky tests using selected elements of a sample correlation matrix.
Abstract: P. E. Meehl and N. G. Waller (2002) proposed an innovative method for assessing path analysis models wherein they subjected a given model, along with a set of alternatives, to risky tests using selected elements of a sample correlation matrix. Although the authors find much common ground with the perspective underlying the Meehl-Waller approach, they suggest that there are aspects of the proposed procedure that require close examination and further development. These include the selection of only one subset of correlations to estimate parameters when multiple solutions are generally available, the fact that the risky tests may test only a subset of parameters rather than the full model of interest, and the potential for different results to be obtained from analysis of equivalent models.

Journal ArticleDOI
TL;DR: The author extends the graphical latent variable models for nominal and/or ordinal data proposed by C. J. Anderson and J. K. Vermunt to situations in which dependencies between observed variables are not fully accounted for by the latent variables.
Abstract: Models used to analyze cross-classifications of counts from psychological experiments must represent associations between multiple discrete variables and take into account attributes of stimuli, experimental conditions, or characteristics of subjects. The models must also lend themselves to psychological interpretations about underlying structures mediating the relationship between stimuli and responses. To meet these needs, the author extends the graphical latent variable models for nominal and/or ordinal data proposed by C. J. Anderson and J. K. Vermunt (2000) to situations in which dependencies between observed variables are not fully accounted for by the latent variables. The graphical models provide a unified framework for studying multivariate associations that include log-linear models and log-multiplicative association models as special cases.

Journal ArticleDOI
TL;DR: This paper argues from a naturalistic-cognitive philosophy of science that science seeks objective knowledge, that hypothesis testing is central to achieving that goal, and that P. E. Meehl and N. G. Waller's (2002) proposal blurs the distinction between hypothesis testing and explorations of the data seeking an optimal model to serve as a prospective inductive generalization.
Abstract: P. E. Meehl and N. G. Waller's (2002) proposed method may not yield unique solutions for model parameters nor unique solutions for model lack of fit. The author argues from a naturalistic-cognitive philosophy of science that science seeks objective knowledge and that hypothesis testing is central to achieving that goal. It is also argued that P. E. Meehl and N. G. Waller's proposal blurs the distinction between hypothesis testing and explorations of the data seeking an optimal model to serve as a prospective inductive generalization. But it is noted that inductive generalizations are never unique and must be tested to eliminate those that reflect subjective aspects of the researcher's methods and points of view.