
Showing papers in "Psychological Methods in 2009"


Journal ArticleDOI
TL;DR: The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high-dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application.
Abstract: Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured because of time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications and to provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high-dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated with freely available implementations in the R system for statistical computing.
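
As a concrete illustration of the kind of analysis described above, the sketch below fits a random forest to simulated "few subjects, many predictors" data and ranks predictors by permutation importance. The article's own examples use freely available R implementations; this Python analogue, with simulated data and arbitrary settings, is only meant to show the workflow, not to reproduce the authors' analyses.

```python
# Illustrative sketch only: random-forest variable-importance screening in Python.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Simulated "few subjects, many variables" data: 80 subjects, 200 predictors,
# of which only the first three actually relate to the binary outcome.
n_subjects, n_predictors = 80, 200
X = rng.normal(size=(n_subjects, n_predictors))
y = (X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=n_subjects) > 0).astype(int)

# A forest of many trees copes with p >> n and with interactions among predictors.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X, y)

# Permutation importance is closer in spirit to the importance measures discussed
# in the article than impurity-based importance is.
importance = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
top = np.argsort(importance.importances_mean)[::-1][:10]
for j in top:
    print(f"predictor {j:3d}: mean importance = {importance.importances_mean[j]:.4f}")
```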

2,001 citations


Journal ArticleDOI
TL;DR: Results indicate some positive findings with respect to reporting practices including proposing multiple models a priori and near universal reporting of the chi-square significance test, but many deficiencies were found such as lack of information regarding missing data and assessment of normality.
Abstract: Reporting practices in 194 confirmatory factor analysis studies (1,409 factor models) published in American Psychological Association journals from 1998 to 2006 were reviewed and compared with established reporting guidelines. Three research questions were addressed: (a) how do actual reporting practices compare with published guidelines? (b) how do researchers report model fit in light of divergent perspectives on the use of ancillary fit indices (e.g., L.-T. Hu & P. M. Bentler, 1999; H. W. Marsh, K.-T. Hau, & Z. Wen, 2004)? and (c) are fit measures that support hypothesized models reported more often than fit measures that are less favorable? Results indicate some positive findings with respect to reporting practices, including proposing multiple models a priori and near-universal reporting of the chi-square significance test. However, many deficiencies were found such as lack of information regarding missing data and assessment of normality. Additionally, the authors found increases in reported values of some incremental fit statistics and no statistically significant evidence that researchers selectively report measures of fit that support their preferred model. Recommendations for reporting are summarized and a checklist is provided to help editors, reviewers, and authors improve reporting practices.

1,662 citations


Journal ArticleDOI
TL;DR: The article shows that the correct effect size for treatment efficacy in GMA--the difference between the estimated means of the 2 groups at end of study (determined from the coefficient for the slope difference and length of study) divided by the baseline standard deviation--is rarely reported in clinical trials.
Abstract: The use of growth-modeling analysis (GMA)--including hierarchical linear models, latent growth models, and general estimating equations--to evaluate interventions in psychology, psychiatry, and prevention science has grown rapidly over the last decade. However, an effect size associated with the difference between the trajectories of the intervention and control groups that captures the treatment effect is rarely reported. This article first reviews 2 classes of formulas for effect sizes associated with classical repeated-measures designs that use the standard deviation of either change scores or raw scores for the denominator. It then broadens the scope to subsume GMA and demonstrates that the independent groups, within-subjects, pretest-posttest control-group, and GMA designs all estimate the same effect size when the standard deviation of raw scores is uniformly used. Finally, the article shows that the correct effect size for treatment efficacy in GMA--the difference between the estimated means of the 2 groups at end of study (determined from the coefficient for the slope difference and length of study) divided by the baseline standard deviation--is not reported in clinical trials.
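
In symbols, the effect size described in the last sentence can be sketched as follows (our notation; the numerator is the model-implied end-of-study mean difference between groups, built from the estimated group difference in slopes and the length of the study):

```latex
% GMA effect size as described in the abstract above (our notation).
d_{\mathrm{GMA}}
  \;=\;
  \frac{\hat{\beta}_{\text{slope difference}} \,\times\, \text{study duration}}
       {SD_{\text{baseline}}}
```

That is, the estimated between-group difference at the end of the study is scaled by the baseline (pretest) raw-score standard deviation.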

670 citations


Journal ArticleDOI
TL;DR: An overview of IDA as it may be applied within the psychological sciences is presented, the relative advantages and disadvantages of IDA are discussed, analytic strategies for analyzing pooled individual data are described, and recommendations for the use of IDA in practice are offered.
Abstract: There are both quantitative and methodological techniques that foster the development and maintenance of a cumulative knowledge base within the psychological sciences. Most noteworthy of these techniques is meta-analysis, which allows for the synthesis of summary statistics drawn from multiple studies when the original data are not available. However, when the original data can be obtained from multiple studies, many advantages stem from the statistical analysis of the pooled data. The authors define integrative data analysis (IDA) as the analysis of multiple data sets that have been pooled into one. Although variants of IDA have been incorporated into other scientific disciplines, the use of these techniques is much less evident in psychology. In this article the authors present an overview of IDA as it may be applied within the psychological sciences, discuss the relative advantages and disadvantages of IDA, describe analytic strategies for analyzing pooled individual data, and offer recommendations for the use of IDA in practice.

601 citations


Journal ArticleDOI
TL;DR: This article proposes Bayesian analysis of mediation effects, which allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates, and is conceptually simpler for multilevel mediation analysis.
Abstract: In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian mediation analysis, inference is straightforward and exact, which makes it appealing for studies with small samples. Third, the Bayesian approach is conceptually simpler for multilevel mediation analysis. Simulation studies and analysis of 2 data sets are used to illustrate the proposed methods.
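
A minimal sketch of what such a model can look like is given below, written in PyMC with diffuse normal priors on the paths and the indirect (mediated) effect defined as the product of the two path coefficients. The priors, variable names, and simulated data are illustrative assumptions, not the authors' specification; prior information would simply replace the wide normals on the a and b paths with informative distributions.

```python
# Minimal sketch of a Bayesian single-level mediation model (X -> M -> Y) in PyMC.
# Priors, names, and data are illustrative only, not the article's setup.
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)                      # mediator
y = 0.4 * m + 0.2 * x + rng.normal(size=n)            # outcome

with pm.Model() as mediation:
    a = pm.Normal("a", mu=0.0, sigma=10.0)            # X -> M path
    b = pm.Normal("b", mu=0.0, sigma=10.0)            # M -> Y path
    c_prime = pm.Normal("c_prime", mu=0.0, sigma=10.0)  # direct effect of X on Y
    i_m = pm.Normal("i_m", mu=0.0, sigma=10.0)
    i_y = pm.Normal("i_y", mu=0.0, sigma=10.0)
    sigma_m = pm.HalfNormal("sigma_m", sigma=5.0)
    sigma_y = pm.HalfNormal("sigma_y", sigma=5.0)

    pm.Normal("m_obs", mu=i_m + a * x, sigma=sigma_m, observed=m)
    pm.Normal("y_obs", mu=i_y + c_prime * x + b * m, sigma=sigma_y, observed=y)

    # The indirect effect a*b gets a full posterior, so exact interval estimates
    # are available even in small samples.
    pm.Deterministic("indirect", a * b)

    idata = pm.sample(2000, tune=1000, random_seed=1)
```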

441 citations


Journal ArticleDOI
TL;DR: The authors contacted 66 researchers who had published articles using dichotomized variables, obtained their justifications for dichotomization, and explored those justifications both logically and with Monte Carlo simulations to determine whether any of them are valid.
Abstract: Despite many articles reporting the problems of dichotomizing continuous measures, researchers still commonly use this practice. The authors' purpose in this article was to understand the reasons that people still dichotomize and to determine whether any of these reasons are valid. They contacted 66 researchers who had published articles using dichotomized variables and obtained their justifications for dichotomization. They also contacted 53 authors of articles published in Psychological Methods and asked them to identify any situations in which they believed dichotomized indicators could perform better. Justifications provided by these two groups fell into three broad categories, which the authors explored both logically and with Monte Carlo simulations. Continuous indicators were superior in the majority of circumstances and never performed substantially worse than the dichotomized indicators, but the simulations did reveal specific situations in which dichotomized indicators performed as well as or better than the original continuous indicators. The authors also considered several justifications for dichotomization that did not lend themselves to simulation, but in each case they found compelling arguments to address these situations using techniques other than dichotomization.
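
The toy simulation below conveys the flavor of such Monte Carlo comparisons (it does not reproduce the authors' conditions): the same bivariate relation is estimated once from the continuous predictor and once after a median split.

```python
# Toy Monte Carlo in the spirit of the comparisons described above (not the
# article's actual design): correlation recovered by a continuous predictor
# versus the same predictor dichotomized at the median.
import numpy as np

rng = np.random.default_rng(2)
true_r, n, reps = 0.30, 100, 5000
r_cont, r_dich = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    r_cont.append(np.corrcoef(x, y)[0, 1])
    x_split = (x > np.median(x)).astype(float)   # median split
    r_dich.append(np.corrcoef(x_split, y)[0, 1])

print(f"mean r, continuous indicator  : {np.mean(r_cont):.3f}")
print(f"mean r, dichotomized indicator: {np.mean(r_dich):.3f}")  # attenuated
```

With a median split of a normal predictor the expected correlation is attenuated by a factor of roughly .8, which is the typical pattern behind the conclusion that continuous indicators rarely perform worse.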

378 citations


Journal ArticleDOI
TL;DR: A resource management perspective on making a complete or reduced factorial design decision is advocated, in which the investigator seeks a strategic balance between service to scientific objectives and economy.
Abstract: An investigator who plans to conduct an experiment with multiple independent variables must decide whether to use a complete or reduced factorial design. This article advocates a resource management perspective on making this decision, in which the investigator seeks a strategic balance between service to scientific objectives and economy. Considerations in making design decisions include whether research questions are framed as main effects or simple effects; whether and which effects are aliased (confounded) in a particular design; the number of experimental conditions that must be implemented in a particular design and the number of experimental subjects the design requires to maintain the desired level of statistical power; and the costs associated with implementing experimental conditions and obtaining experimental subjects. In this article 4 design options are compared: complete factorial, individual experiments, single factor, and fractional factorial. Complete and fractional factorial designs and single-factor designs are generally more economical than conducting individual experiments on each factor. Although relatively unfamiliar to behavioral scientists, fractional factorial designs merit serious consideration because of their economy and versatility.
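
For instance, a 2^(4-1) fractional factorial on four two-level factors needs only 8 of the 16 possible conditions. The sketch below (with hypothetical factors A-D) builds such a design from the generator D = ABC and notes the aliasing this choice induces.

```python
# Sketch of a 2^(4-1) fractional factorial with generator D = ABC (factor names
# are hypothetical). Eight conditions instead of sixteen; the price is that the
# main effect of D is aliased with the A x B x C interaction (I = ABCD).
from itertools import product

runs = []
for a, b, c in product((-1, 1), repeat=3):
    d = a * b * c                 # defining relation: D = ABC
    runs.append((a, b, c, d))

print(" A  B  C  D")
for run in runs:
    print(" ".join(f"{v:+d}" for v in run))
print(f"{len(runs)} conditions (vs. {2**4} for the complete factorial)")
```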

319 citations


Journal ArticleDOI
TL;DR: The authors describe the relative benefits of conducting meta-analyses with (a) individual participant data (IPD) gathered from the constituent studies and (b) aggregated data (AD), or the group-level statistics that appear in reports of a study's results.
Abstract: The authors describe the relative benefits of conducting meta-analyses with (a) individual participant data (IPD) gathered from the constituent studies and (b) aggregated data (AD), or the group-level statistics (in particular, effect sizes) that appear in reports of a study's results. Given that both IPD and AD are equally available, meta-analysis of IPD is superior to meta-analysis of AD. IPD meta-analysis permits synthesists to perform subgroup analyses not conducted by the original collectors of the data, to check the data and analyses in the original studies, to add new information to the data sets, and to use different statistical methods. However, the cost of IPD meta-analysis and the lack of available IPD data sets suggest that the best strategy currently available is to use both approaches in a complementary fashion such that the first step in conducting an IPD meta-analysis would be to conduct an AD meta-analysis. Regardless of whether a meta-analysis is conducted with IPD or AD, synthesists must remain vigilant in how they interpret their results. They must avoid ecological fallacies, Simpson's paradox, and interpretation of synthesis-generated evidence as supporting causal inferences.

284 citations


Journal ArticleDOI
TL;DR: This article presents a bootstrapping procedure that allows one to determine the statistical significance of a relative weight and illustrates this approach here by applying the procedure to published data.
Abstract: Relative weight analysis is a procedure for estimating the relative importance of correlated predictors in a regression equation. Because the sampling distribution of relative weights is unknown, researchers using relative weight analysis are unable to make judgments regarding the statistical significance of the relative weights. J. W. Johnson (2004) presented a bootstrapping methodology to compute standard errors for relative weights, but this procedure cannot be used to determine whether a relative weight is significantly different from zero. This article presents a bootstrapping procedure that allows one to determine the statistical significance of a relative weight. The authors conducted a Monte Carlo study to explore the Type I error, power, and bias associated with their proposed technique. They illustrate this approach here by applying the procedure to published data.
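
A rough sketch of this kind of procedure is given below, based on our reading of the approach rather than the authors' code: Johnson-style relative weights are computed with an extra, randomly generated predictor included, and a percentile bootstrap on the difference between each real predictor's weight and the random variable's weight is used to judge significance. The sample size, number of resamples, and simulated data are all illustrative assumptions.

```python
# Illustrative sketch: bootstrapped significance for relative weights, using a
# randomly generated "cannot matter" predictor as the comparison point.
import numpy as np

def relative_weights(X, y):
    """Johnson-style relative weights for standardized X and y (sum to R^2)."""
    Xs = (X - X.mean(0)) / X.std(0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    R = np.corrcoef(Xs, rowvar=False)
    rxy = Xs.T @ ys / (len(ys) - 1)
    evals, V = np.linalg.eigh(R)
    Rhalf = V @ np.diag(np.sqrt(evals)) @ V.T        # symmetric square root of R
    beta = np.linalg.solve(Rhalf, rxy)               # regression on orthogonalized X
    return (Rhalf**2) @ (beta**2)

rng = np.random.default_rng(3)
n, p = 300, 4
X = rng.normal(size=(n, p))
y = 0.4 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

Xaug = np.column_stack([X, rng.normal(size=n)])      # append an irrelevant random variable
boot_diff = []
for _ in range(2000):
    idx = rng.integers(0, n, n)                      # resample rows with replacement
    w = relative_weights(Xaug[idx], y[idx])
    boot_diff.append(w[:p] - w[p])                   # real weights minus random-variable weight
boot_diff = np.array(boot_diff)

lo, hi = np.percentile(boot_diff, [2.5, 97.5], axis=0)
for j in range(p):
    verdict = "significant" if lo[j] > 0 else "not significant"
    print(f"predictor {j}: 95% CI for difference [{lo[j]:.3f}, {hi[j]:.3f}] -> {verdict}")
```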

238 citations


Journal ArticleDOI
TL;DR: The authors articulate some of the challenges of meta-analytic and pooled-data approaches and introduce a coordinated analysis approach as an important avenue for maximizing the comparability, replication, and extension of results from longitudinal studies.
Abstract: Replication of research findings across independent longitudinal studies is essential for a cumulative and innovative developmental science. Meta-analysis of longitudinal studies is often limited by the amount of published information on particular research questions, the complexity of longitudinal designs and sophistication of analyses, and practical limits on full reporting of results. In many cases, cross-study differences in sample composition and measurements impede or lessen the utility of pooled data analysis. A collaborative, coordinated analysis approach can provide a broad foundation for cumulating scientific knowledge by facilitating efficient analysis of multiple studies in ways that maximize comparability of results and permit evaluation of study differences. The goal of such an approach is to maximize opportunities for replication and extension of findings across longitudinal studies through open access to analysis scripts and output for published results, permitting modification, evaluation, and extension of alternative statistical models, and application to additional data sets. Drawing on the cognitive aging literature as an example, we articulate some of the challenges of meta-analytic and pooled-data approaches and introduce a coordinated analysis approach as an important avenue for maximizing the comparability, replication, and extension of results from longitudinal studies.

208 citations


Journal ArticleDOI
TL;DR: Full information maximum likelihood (FIML) was compared with a 3-stage estimator for categorical item factor analysis (CIFA) when the unweighted least squares method was used in CIFA's third stage, and both methods failed in a number of conditions.
Abstract: The performance of parameter estimates and standard errors in estimating F. Samejima's graded response model was examined across 324 conditions. Full information maximum likelihood (FIML) was compared with a 3-stage estimator for categorical item factor analysis (CIFA) when the unweighted least squares method was used in CIFA's third stage. CIFA is much faster in estimating multidimensional models, particularly with correlated dimensions. Overall, CIFA yields slightly more accurate parameter estimates, and FIML yields slightly more accurate standard errors. Yet, across most conditions, differences between methods are negligible. FIML is the best choice for small samples (200 observations). CIFA is the best choice for larger samples (on computational grounds). Both methods failed in a number of conditions, most of which involved 200 observations, few indicators per dimension, highly skewed items, or low factor loadings. These conditions are to be avoided in applications.

Journal ArticleDOI
TL;DR: The authors discuss the challenges presented by these 3 issues in the calculation and interpretation of SEM- and MLM-based fit indices for growth curve models and conclude by identifying some lines for future research.
Abstract: Evaluating overall model fit for growth curve models involves 3 challenging issues. (a) Three types of longitudinal data with different implications for model fit may be distinguished: balanced on time with complete data, balanced on time with data missing at random, and unbalanced on time. (b) Traditional work on fit from the structural equation modeling (SEM) perspective has focused only on the covariance structure, but growth curve models have four potential sources of misspecification: within-individual covariance matrix, between-individuals covariance matrix, marginal mean structure, and conditional mean structure. (c) Growth curve models can be estimated in both the SEM and multilevel modeling (MLM) frameworks; these have different emphases for the evaluation of model fit. In this article, the authors discuss the challenges presented by these 3 issues in the calculation and interpretation of SEM- and MLM-based fit indices for growth curve models and conclude by identifying some lines for future research.

Journal ArticleDOI
TL;DR: A newly proposed moderated nonlinear factor analysis model generalizes models and procedures, allowing for items of different scale types (continuous or discrete) and differential item functioning across levels of categorical and/or continuous variables.
Abstract: When conducting an integrative analysis of data obtained from multiple independent studies, a fundamental problem is to establish commensurate measures for the constructs of interest. Fortunately, procedures for evaluating and establishing measurement equivalence across samples are well developed for the linear factor model and commonly used item response theory models. A newly proposed moderated nonlinear factor analysis model generalizes these models and procedures, allowing for items of different scale types (continuous or discrete) and differential item functioning across levels of categorical and/or continuous variables. The potential of this new model to resolve the problem of measurement in integrative data analysis is shown via an empirical example examining changes in alcohol involvement from ages 10 to 22 years across 2 longitudinal studies.
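
Schematically (our notation, not the authors'), moderation enters the factor model by letting item intercepts, loadings, and the factor distribution all depend on covariates x (e.g., study membership, age, gender), with a link function suited to each item's scale type:

```latex
% Schematic moderated nonlinear factor model (our notation), item j, person i:
\begin{aligned}
  \mu_{ij}       &= \nu_j(x_i) + \lambda_j(x_i)\,\eta_i
                    && \text{(through a link appropriate to the item's scale type)} \\
  \nu_j(x_i)     &= \nu_{0j} + \boldsymbol{\nu}_{1j}' x_i ,
  \qquad
  \lambda_j(x_i)  = \lambda_{0j} + \boldsymbol{\lambda}_{1j}' x_i \\
  \eta_i         &\sim N\!\bigl(\alpha(x_i),\ \psi(x_i)\bigr)
\end{aligned}
```

Differential item functioning then corresponds to nonzero moderation terms in the intercepts or loadings, while differences in the construct itself correspond to moderation of the factor mean and variance.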

Journal ArticleDOI
TL;DR: The authors use multiple-sample longitudinal data from different test batteries, drawn from classic studies on intellectual abilities, to examine propositions about changes in constructs over the life span, leading to a few new methodological suggestions for dealing with repeated constructs based on changing measurements in developmental studies.
Abstract: The authors use multiple-sample longitudinal data from different test batteries to examine propositions about changes in constructs over the life span. The data come from 3 classic studies on intellectual abilities in which, in combination, 441 persons were repeatedly measured as many as 16 times over 70 years. They measured cognitive constructs of vocabulary and memory using 8 age-appropriate intelligence test batteries and explored possible linkage of these scales using item response theory (IRT). They simultaneously estimated the parameters of both IRT and latent curve models based on a joint model likelihood approach (i.e., NLMIXED and WINBUGS). They included group differences in the model to examine potential interindividual differences in levels and change. The resulting longitudinal invariant Rasch test analyses lead to a few new methodological suggestions for dealing with repeated constructs based on changing measurements in developmental studies.

Journal ArticleDOI
TL;DR: New FE meta-analytic confidence intervals are proposed that are easy to compute and perform properly under effect-size heterogeneity and nonrandomly selected studies, and they may be used to combine unstandardized or standardized mean differences from studies having either independent samples or dependent samples.
Abstract: The fixed-effects (FE) meta-analytic confidence intervals for unstandardized and standardized mean differences are based on an unrealistic assumption of effect-size homogeneity and perform poorly when this assumption is violated. The random-effects (RE) meta-analytic confidence intervals are based on an unrealistic assumption that the selected studies represent a random sample from a large superpopulation of studies. The RE approach cannot be justified in typical meta-analysis applications in which studies are nonrandomly selected. New FE meta-analytic confidence intervals for unstandardized and standardized mean differences are proposed that are easy to compute and perform properly under effect-size heterogeneity and nonrandomly selected studies. The proposed meta-analytic confidence intervals may be used to combine unstandardized or standardized mean differences from studies having either independent samples or dependent samples and may also be used to integrate results from previous studies into a new study. An alternative approach to assessing effect-size heterogeneity is presented.

Journal ArticleDOI
TL;DR: A new model is developed that allows the simultaneous analysis of accuracy scores and response times of cognitive tests with a rule-based design and is capable of simultaneously estimating ability and speed on the person side as well as difficulty and time intensity on the task side, thus dissociating information that is often confounded in current analysis procedures.
Abstract: In current psychological research, the analysis of data from computer-based assessments or experiments is often confined to accuracy scores. Response times, although being an important source of additional information, are either neglected or analyzed separately. In this article, a new model is developed that allows the simultaneous analysis of accuracy scores and response times of cognitive tests with a rule-based design. The model is capable of simultaneously estimating ability and speed on the person side as well as difficulty and time intensity on the task side, thus dissociating information that is often confounded in current analysis procedures. Further, by integrating design matrices on the task side, it becomes possible to assess the effects of design parameters (e.g., cognitive processes) on both task difficulty and time intensity, offering deeper insights into the task structure. A Bayesian approach, using Markov Chain Monte Carlo methods, has been developed to estimate the model. An application of the model in the context of educational assessment is illustrated using a large-scale investigation of figural reasoning ability.
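
One way to write down a model of this kind (a schematic in our notation, not necessarily the authors' exact parameterization) combines an item response model for accuracy with a lognormal model for response times and uses a design-matrix row q_i to decompose the item parameters into effects of the design features:

```latex
% Schematic joint model for accuracy and response time (our notation):
% person p, item i, with q_i the design-matrix row for item i's design features.
\begin{aligned}
  P(Y_{pi}=1 \mid \theta_p) &= \operatorname{logit}^{-1}(\theta_p - b_i), &
  \log T_{pi} &\sim N\bigl(\lambda_i - \tau_p,\ \sigma_i^2\bigr), \\
  b_i &= \mathbf{q}_i'\boldsymbol{\beta}, &
  \lambda_i &= \mathbf{q}_i'\boldsymbol{\gamma}, \\
  (\theta_p,\ \tau_p)' &\sim N_2(\mathbf{0},\ \boldsymbol{\Sigma}_P). &&
\end{aligned}
```

Here θ_p and τ_p are the person's ability and speed, b_i and λ_i the item's difficulty and time intensity, and β and γ the effects of the design features on each.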

Journal ArticleDOI
TL;DR: The authors present several recommendations for improving graphs including the following: bar charts of means should be supplanted by graphs containing distributional information, and good design should be used to allow more information to be included in a graph without obscuring trends in the data.
Abstract: Statistical graphs are commonly used in scientific publications. Unfortunately, graphs in psychology journals rarely portray distributional information beyond central tendency, and few graphs portray inferential statistics. Moreover, those that do portray inferential information generally do not portray it in a way that is useful for interpreting the data. The authors present several recommendations for improving graphs including the following: (a) bar charts of means with or without standard errors should be supplanted by graphs containing distributional information, (b) good design should be used to allow more information to be included in a graph without obscuring trends in the data, and (c) figures should include both graphic images and inferential statistics presented in words and numbers.
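
A minimal sketch of recommendation (a) is shown below: the left panel is the discouraged bar-chart-of-means display, and the right panel overlays individual observations on boxplots so the distributions remain visible. The data and styling are arbitrary.

```python
# Sketch: replace a bar chart of means with a display that shows the distributions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
groups = {"Control": rng.normal(50, 10, 40), "Treatment": rng.normal(56, 14, 40)}
data = list(groups.values())

fig, (ax_bar, ax_dist) = plt.subplots(1, 2, figsize=(8, 3.5))

# Discouraged display: bar chart of means with standard-error bars only.
means = [g.mean() for g in data]
sems = [g.std(ddof=1) / np.sqrt(len(g)) for g in data]
ax_bar.bar([0, 1], means, yerr=sems)
ax_bar.set_xticks([0, 1])
ax_bar.set_xticklabels(list(groups))
ax_bar.set_title("Means +/- SE only")

# Preferred display: distributional information plus the raw observations.
ax_dist.boxplot(data, positions=[0, 1], showfliers=False)
for i, g in enumerate(data):
    jitter = rng.uniform(-0.08, 0.08, len(g))
    ax_dist.plot(np.full(len(g), float(i)) + jitter, g, "o", alpha=0.4, markersize=3)
ax_dist.set_xticks([0, 1])
ax_dist.set_xticklabels(list(groups))
ax_dist.set_title("Distributions and raw data")

plt.tight_layout()
plt.show()
```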

Journal ArticleDOI
TL;DR: The present attempt to replicate Field's simulations included comparisons with analytic values as well as results for efficiency and confidence-interval coverage, and practical guidance is offered regarding simulation evidence and choices among methods.
Abstract: In 2 Monte Carlo studies of fixed- and random-effects meta-analysis for correlations, A. P. Field (2001) ostensibly evaluated Hedges-Olkin-Vevea Fisher-z and Schmidt-Hunter Pearson-r estimators and tests in 120 conditions. Some authors have cited those results as evidence not to meta-analyze Fisher-z correlations, especially with heterogeneous correlation parameters. The present attempt to replicate Field's simulations included comparisons with analytic values as well as results for efficiency and confidence-interval coverage. Field's results under homogeneity were mostly replicable, but those under heterogeneity were not: The latter exhibited up to over .17 more bias than ours and, for tests of the mean correlation and homogeneity, respectively, nonnull rejection rates up to .60 lower and .65 higher. Changes to Field's observations and conclusions are recommended, and practical guidance is offered regarding simulation evidence and choices among methods. Most cautions about poor performance of Fisher-z methods are largely unfounded, especially with a more appropriate z-to-r transformation. The Appendix gives a computer program for obtaining Pearson-r moments from a normal Fisher-z distribution, which is used to demonstrate distortion due to direct z-to-r transformation of a mean Fisher-z correlation.
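
The distortion mentioned in the last sentence is easy to demonstrate numerically. The sketch below is not the authors' program: it simply assumes the Fisher-z correlations are normally distributed with mean mu and standard deviation tau and compares the mean implied Pearson r, E[tanh(z)], with the naive back-transformation tanh(mu).

```python
# Numerical sketch (not the authors' program) of the z-to-r distortion: when
# Fisher-z values are heterogeneous, tanh(mean z) differs from the mean Pearson r.
import numpy as np
from scipy import integrate, stats

def mean_pearson_r(mu, tau):
    """E[tanh(Z)] for Z ~ Normal(mu, tau^2), by numerical integration."""
    integrand = lambda z: np.tanh(z) * stats.norm.pdf(z, loc=mu, scale=tau)
    value, _ = integrate.quad(integrand, mu - 10 * tau, mu + 10 * tau)
    return value

mu, tau = 0.55, 0.30   # hypothetical mean and SD of the Fisher-z parameters
print(f"tanh(mean z)             = {np.tanh(mu):.4f}")
print(f"mean of implied r values = {mean_pearson_r(mu, tau):.4f}")
```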

Journal ArticleDOI
TL;DR: It is shown that the study of intraindividual variability can be made more productive by examining variability of interest at specific time scales, rather than considering the variability of entire time series.
Abstract: The study of intraindividual variability is central to the study of individuals in psychology. Previous research has related the variance observed in repeated measurements (time series) of individuals to trait-like measures that are logically related. Intraindividual measures, such as intraindividual standard deviation or the coefficient of variation, are likely to be incomplete representations of intraindividual variability. This article shows that the study of intraindividual variability can be made more productive by examining variability of interest at specific time scales, rather than considering the variability of entire time series. Furthermore, examination of variance in observed scores may not be sufficient, because these neglect the time scale dependent relationships between observations. The current article outlines a method of using estimated derivatives to examine intraindividual variability through estimates of the variance and other distributional properties at multiple time scales. In doing so, this article encourages more nuanced discussion about intraindividual variability and highlights that variability and variance are not equivalent. An example with simulated data and an example relating variability in daily measures of negative affect to neuroticism are provided.


Journal ArticleDOI
TL;DR: The author proposes that the methods and techniques described in this set of articles can significantly propel researchers forward in their ongoing quest to build a cumulative psychological science.
Abstract: The goal of any empirical science is to pursue the construction of a cumulative base of knowledge upon which the future of the science may be built. However, there is mixed evidence that the science of psychology can accurately be characterized by such a cumulative progression. Indeed, some argue that the development of a truly cumulative psychological science is not possible with the current paradigms of hypothesis testing in single-study designs. The author explores this controversy as a framework to introduce the 6 articles that make up this special issue on the integration of data and empirical findings across multiple studies. The author proposes that the methods and techniques described in this set of articles can significantly propel researchers forward in their ongoing quest to build a cumulative psychological science.

Journal ArticleDOI
TL;DR: Using data from a large-scale panel survey, an alternative estimator of reliability is investigated that relaxes the assumptions of both Cronbach's alpha and the simplex estimator and thus generalizes both estimators.
Abstract: Scale score (also known as composite score) measures (SSMs) are very common in psychological and social science research. As an example, the Child Behavior Checklist (CBCL) is a common SSM for measuring behavior problems in children (see Achenbach, 1991, for the version of the CBCL used in this paper). It consists of 118 items on behavior problems, each scored on a 3-point scale (1 = not true, 2 = sometimes true, and 3 = often true). The CBCL Total Behavior Problem Score is an empirical measure of child behavior computed as a sum of the responses to the 118 items. The usefulness of any SSM in data analysis depends in large part on its reliability. An SSM with poor reliability is infected with random errors that obscure the underlying true score values. SSMs with good reliability are relatively free from random error, which increases the statistical power of the variable for analysis. As an example, Biemer and Trewin (1977) show that as reliability (ρ) decreases, the standard errors of estimates of means, totals, and proportions increase by the factor ρ⁻¹. In the same paper, the authors show that, for simple linear regression, the estimator of the slope coefficient, β̂, estimates ρβ rather than the true parameter β; i.e., β̂ is biased toward 0 if the explanatory variable is not reliable. Estimates of quantiles, goodness-of-fit tests, and measures of association in categorical data analysis are also biased. Thus, assessing scale score reliability is typically an integral and critical step in the use of SSMs in data analysis.

A common method for assessing scale score reliability is Cronbach's α (Hogan, Benjamin, & Brezinski, 2000), which is based upon the internal consistency of the items comprising the SSM. It can be shown that, under certain assumptions (specified below), the reliability of an SSM is proportional to the item consistency. Many authors in numerous disciplines have used α to assess the reliability of scale scores (see, for example, Burney & Kromrey, 2001; Sapin et al., 2005; Yoshizumi, Murakami, & Takai, 2006). For example, Hogan, Benjamin, and Brezinski (2000) found that α was used in about 75% of reported reliability estimates in publications by the American Psychological Association. One reason for its ubiquity is that data analysis software packages (for example, SAS, SPSS, and STATA) provide subroutines for computing α with relative ease. In addition, few alternatives exist for assessing reliability in cross-sectional studies. Yet Cronbach's α and other so-called internal consistency indicators of ρ have been criticized in the literature due to the rather strong assumptions underlying their development (see, for example, Bollen, 1989, p. 217; Cortina, 1993; Green & Hershberger, 2000; Lucke, 2005; Raykov, 2001; Shevlin, Miles, Davies, & Walker, 2000; Zimmerman & Zumbo, 1993).

For longitudinal data, an alternative to α is the (quasi-)simplex estimator that operates on the repeated measurements of the same SSM over multiple waves of a panel survey. While the simplex estimator relaxes some of α's assumptions, it imposes others that can be overly restrictive in some situations. A more general estimator extends the simplex model by incorporating equivalent forms of the SSMs using the method of split halves (see, for example, Bollen, 1989, p. 213). This method, referred to as the generalized simplex (GS) method, relaxes many of the parameter constraints imposed by the traditional simplex method. The GS model also provides a framework based upon formal tests of significance for identifying the most parsimonious model for estimating reliability. By imposing parameter constraints on the GS model, estimators that are equivalent to α, the simplex estimator, and several other related estimators can be compared for a particular set of data. As an example, in situations where its assumptions hold, α may be preferred over the more complex longitudinal estimators that typically have larger standard errors. However, for large sample sizes, bias may be the determining factor, and researchers may prefer to compute the estimators of reliability from the unrestricted GS model. Even in these situations, it is instructive to identify situations where the assumptions underlying α and the traditional simplex model do not hold to inform future uses of the simpler models.

The next section briefly reviews the concept of reliability, particularly scale score reliability, and introduces the notation and models needed for describing the methods. We examine the assumptions underlying Cronbach's α and consider the biases that result when assumptions are violated, as often occurs in survey work. Section 3 considers some alternatives to Cronbach's α for longitudinal data such as the simplex approach and a generalization of that approach that relaxes a critical and restrictive assumption of the simplex model. This section also develops the methodology for testing the assumptions underlying several alternative estimates of reliability. In Section 4, we apply this methodology to a number of scale score measures from the National Survey of Child and Adolescent Well-being (NSCAW) to illustrate the concepts and the performance of the estimators. Finally, Section 5 summarizes the findings and provides conclusions and recommendations.
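
For reference, the internal-consistency estimator discussed above and the attenuation result cited from Biemer and Trewin can be written, in the usual notation for k items with variances σ²_i and total-score variance σ²_X, as:

```latex
% Cronbach's alpha for a k-item scale score, and attenuation of a simple-regression
% slope when the explanatory scale score has reliability rho.
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
E\bigl(\hat{\beta}\bigr) \;\approx\; \rho\,\beta
```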

Journal ArticleDOI
TL;DR: The author concludes by recommending critical examination of model-based inferences from IDA through sensitivity analyses and by noting that IDA can promote collaboration and networks that yield data that are more amenable to integrated analyses in the future.
Abstract: The author comments on the potential of integrative data analysis (IDA) as a new methodological activity and on some of the topics that were discussed in the 5 articles in this special issue. One topic is the extent to which IDA will be used to provide conclusive summaries regarding the strength of evidence for well-specified questions versus to provide new information that goes beyond the simple sum of individual studies. Another is the meaning of variances of effects that are observed over studies and sample strata. A 3rd is the potential to enhance understanding of construct validity by fitting measurement models described in the special issue. The author concludes by recommending critical examination of model-based inferences from IDA through sensitivity analyses and by noting that IDA can promote collaboration and networks that yield data that are more amenable to integrated analyses in the future.

Journal ArticleDOI
TL;DR: In this article, the authors derive sample size formulas for general standardized linear contrasts of k ≥ 2 means for both between-subjects and within-subjects designs, and special sample size formulas are also derived for the standardizer proposed by G. V. Glass.
Abstract: L. Wilkinson and the Task Force on Statistical Inference (1999) recommended reporting confidence intervals for measures of effect sizes. If the sample size is too small, the confidence interval may be too wide to provide meaningful information. Recently, K. Kelley and J. R. Rausch (2006) used an iterative approach to computer-generate tables of sample size requirements for a standardized difference between 2 means in between-subjects designs. Sample size formulas are derived here for general standardized linear contrasts of k ≥ 2 means for both between-subjects designs and within-subjects designs. Special sample size formulas also are derived for the standardizer proposed by G. V. Glass (1976).
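
As background for the kind of derivation involved (this is the familiar known-variance planning identity for an unstandardized between-subjects contrast, not the article's refined formulas for standardized contrasts or within-subjects designs): a contrast ψ = Σ c_j μ_j estimated with n participants per group has a 100(1 − α)% confidence interval of width 2 z_{α/2} σ √(Σ c_j²⁄n), so a target width w requires roughly

```latex
% Known-variance planning identity for an unstandardized between-subjects contrast.
n \;\ge\; \frac{4\, z_{\alpha/2}^{2}\,\sigma^{2} \sum_{j} c_j^{2}}{w^{2}}
```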

Journal ArticleDOI
TL;DR: A unifying theory of subject-centered scalability is offered that is grounded in structural true score modeling, is conceptually distinct from internal consistency and homogeneity as determined by item correlations, and is empirically confirmable.
Abstract: A unifying theory of subject-centered scalability is offered that is grounded in structural true score modeling, is conceptually distinct from internal consistency and homogeneity as determined by item correlations, and is empirically confirmable. Scalability holds when item true scores are perfectly correlated but differ in their individual scale metric. The condition of scalability imposes constraints that allow individual item reliability to be estimated independently of scalability. Scalability is shown to imply unit rank and to be testable by a single-factor confirmatory factor analysis reinterpreted as a test of unit rank. High item correlations are shown, contrary to intuition, to be an insufficient condition for scalability. Conversely, low item correlations do not necessarily imply lack of scalability. A stepped decision-oriented procedure is offered as a guideline in summated rating scale construction.