
Showing papers in "Psychological Methods in 2016"


Journal ArticleDOI
TL;DR: A review of the particularly valuable statistical indices one can derive from bifactor models is provided; these include omega reliability coefficients, factor determinacy, construct reliability, explained common variance, and percentage of uncontaminated correlations.
Abstract: Bifactor measurement models are increasingly being applied to personality and psychopathology measures (Reise, 2012). In this work, authors generally have emphasized model fit, and their typical conclusion is that a bifactor model provides a superior fit relative to alternative subordinate models. Often unexplored, however, are important statistical indices that can substantially improve the psychometric analysis of a measure. We provide a review of the particularly valuable statistical indices one can derive from bifactor models. They include omega reliability coefficients, factor determinacy, construct reliability, explained common variance, and percentage of uncontaminated correlations. We describe how these indices can be calculated and used to inform: (a) the quality of unit-weighted total and subscale score composites, as well as factor score estimates, and (b) the specification and quality of a measurement model in structural equation modeling. (PsycINFO Database Record

848 citations
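
The indices named in this abstract can be computed directly from a standardized bifactor solution. The sketch below, using illustrative loadings rather than anything from the paper, shows one way to obtain omega, omega-hierarchical, explained common variance (ECV), and the percentage of uncontaminated correlations (PUC) in Python.

```python
import numpy as np

# Hypothetical standardized bifactor solution: 9 items, 1 general factor,
# 3 group factors with 3 items each (loadings are illustrative, not from the paper).
gen = np.array([.60, .55, .50, .65, .70, .60, .55, .50, .45])   # general-factor loadings
grp = np.zeros((9, 3))
grp[0:3, 0] = [.40, .35, .30]
grp[3:6, 1] = [.45, .40, .35]
grp[6:9, 2] = [.30, .25, .35]
uniq = 1 - gen**2 - (grp**2).sum(axis=1)                        # unique variances

# Omega (total) and omega-hierarchical for the unit-weighted total score
num_general = gen.sum()**2
num_groups = sum(grp[:, s].sum()**2 for s in range(3))
denom = num_general + num_groups + uniq.sum()
omega_total = (num_general + num_groups) / denom
omega_h = num_general / denom

# Explained common variance (ECV): share of common variance due to the general factor
ecv = (gen**2).sum() / ((gen**2).sum() + (grp**2).sum())

# Percentage of uncontaminated correlations (PUC): proportion of item pairs whose
# correlation reflects the general factor only (items loading on different group factors)
n_items = 9
total_pairs = n_items * (n_items - 1) / 2
within_pairs = 3 * (3 * 2 / 2)          # pairs within each 3-item group factor
puc = (total_pairs - within_pairs) / total_pairs

print(f"omega={omega_total:.3f}, omegaH={omega_h:.3f}, ECV={ecv:.3f}, PUC={puc:.3f}")
```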


Journal ArticleDOI
TL;DR: The Pearson product-moment correlation coefficient (rp) and the Spearman rank correlation coefficient (rs) are widely used in psychological research; for normally distributed variables they have similar expected values, but rs is more variable than rp, especially when the correlation is strong.
Abstract: The Pearson product-moment correlation coefficient (rp) and the Spearman rank correlation coefficient (rs) are widely used in psychological research. We compare rp and rs on 3 criteria: variability, bias with respect to the population value, and robustness to an outlier. Using simulations across low (N = 5) to high (N = 1,000) sample sizes we show that, for normally distributed variables, rp and rs have similar expected values but rs is more variable, especially when the correlation is strong. However, when the variables have high kurtosis, rp is more variable than rs. Next, we conducted a sampling study of a psychometric dataset featuring symmetrically distributed data with light tails, and of 2 Likert-type survey datasets, 1 with light-tailed and the other with heavy-tailed distributions. Consistent with the simulations, rp had lower variability than rs in the psychometric dataset. In the survey datasets with heavy-tailed variables in particular, rs had lower variability than rp, and often corresponded more accurately to the population Pearson correlation coefficient (Rp) than rp did. The simulations and the sampling studies showed that variability in terms of standard deviations can be reduced by about 20% by choosing rs instead of rp. In comparison, increasing the sample size by a factor of 2 results in a 41% reduction of the standard deviations of rs and rp. In conclusion, rp is suitable for light-tailed distributions, whereas rs is preferable when variables feature heavy-tailed distributions or when outliers are present, as is often the case in psychological research.

428 citations
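
A small simulation in the spirit of the one described above is easy to reproduce. The sketch below uses arbitrary settings, not the paper's design, and compares the sampling variability of rp and rs under light-tailed (normal) and heavy-tailed data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_sd(n=50, rho=0.6, reps=2000, heavy_tails=False):
    """Compare the sampling variability (SD) of Pearson's rp and Spearman's rs."""
    rp, rs = np.empty(reps), np.empty(reps)
    for i in range(reps):
        # t(3) errors give heavy tails; the target correlation is then only approximate
        x = rng.standard_t(df=3, size=n) if heavy_tails else rng.standard_normal(n)
        e = rng.standard_t(df=3, size=n) if heavy_tails else rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * e
        rp[i] = stats.pearsonr(x, y)[0]
        rs[i] = stats.spearmanr(x, y)[0]
    return rp.std(), rs.std()

print("normal       SD(rp), SD(rs):", simulate_sd(heavy_tails=False))
print("heavy-tailed SD(rp), SD(rs):", simulate_sd(heavy_tails=True))
```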


Journal ArticleDOI
TL;DR: This work introduces a multilevel structural equation modeling (MSEM) logic that clarifies the nature of the problems with existing practices and remedies them with latent variable interactions, using random coefficients and/or latent moderated structural equations (LMS) for unbiased tests of multilevel moderation.
Abstract: Social scientists are increasingly interested in multilevel hypotheses, data, and statistical models as well as moderation or interactions among predictors. The result is a focus on hypotheses and tests of multilevel moderation within and across levels of analysis. Unfortunately, existing approaches to multilevel moderation have a variety of shortcomings, including conflated effects across levels of analysis and bias due to using observed cluster averages instead of latent variables (i.e., "random intercepts") to represent higher-level constructs. To overcome these problems and elucidate the nature of multilevel moderation effects, we introduce a multilevel structural equation modeling (MSEM) logic that clarifies the nature of the problems with existing practices and remedies them with latent variable interactions. This remedy uses random coefficients and/or latent moderated structural equations (LMS) for unbiased tests of multilevel moderation. We describe our approach and provide an example using the publicly available High School and Beyond data with Mplus syntax in Appendix. Our MSEM method eliminates problems of conflated multilevel effects and reduces bias in parameter estimates while offering a coherent framework for conceptualizing and testing multilevel moderation effects. (PsycINFO Database Record

278 citations


Journal ArticleDOI
TL;DR: A Monte Carlo simulation study was carried out to compare the performance of ML, DWLS, and ULS in estimating model parameters, and their robust corrections to standard errors, and chi-square statistics in a structural equation model with ordinal observed variables.
Abstract: Three estimation methods with robust corrections-maximum likelihood (ML) using the sample covariance matrix, unweighted least squares (ULS) using a polychoric correlation matrix, and diagonally weighted least squares (DWLS) using a polychoric correlation matrix-have been proposed in the literature, and are considered to be superior to normal theory-based maximum likelihood when observed variables in latent variable models are ordinal. A Monte Carlo simulation study was carried out to compare the performance of ML, DWLS, and ULS in estimating model parameters, and their robust corrections to standard errors, and chi-square statistics in a structural equation model with ordinal observed variables. Eighty-four conditions, characterized by different ordinal observed distribution shapes, numbers of response categories, and sample sizes were investigated. Results reveal that (a) DWLS and ULS yield more accurate factor loading estimates than ML across all conditions; (b) DWLS and ULS produce more accurate interfactor correlation estimates than ML in almost every condition; (c) structural coefficient estimates from DWLS and ULS outperform ML estimates in nearly all asymmetric data conditions; (d) robust standard errors of parameter estimates obtained with robust ML are more accurate than those produced by DWLS and ULS across most conditions; and (e) regarding robust chi-square statistics, robust ML is inferior to DWLS and ULS in controlling for Type I error in almost every condition, unless a large sample is used (N = 1,000). Finally, implications of the findings are discussed, as are the limitations of this study as well as potential directions for future research. (PsycINFO Database Record

248 citations


Journal ArticleDOI
TL;DR: It is argued that in order to make a meaningful comparison of the strength of the cross-lagged associations, the coefficients should be standardized within persons, and disregarding individual differences in dynamics can prove misleading.
Abstract: By modeling variables over time it is possible to investigate the Granger-causal cross-lagged associations between variables. By comparing the standardized cross-lagged coefficients, the relative strength of these associations can be evaluated in order to determine important driving forces in the dynamic system. The aim of this study was twofold: first, to illustrate the added value of a multilevel multivariate autoregressive modeling approach for investigating these associations over more traditional techniques; and second, to discuss how the coefficients of the multilevel autoregressive model should be standardized for comparing the strength of the cross-lagged associations. The hierarchical structure of multilevel multivariate autoregressive models complicates standardization, because subject-based statistics or group-based statistics can be used to standardize the coefficients, and each method may result in different conclusions. We argue that in order to make a meaningful comparison of the strength of the cross-lagged associations, the coefficients should be standardized within persons. We further illustrate the bivariate multilevel autoregressive model and the standardization of the coefficients, and we show that disregarding individual differences in dynamics can prove misleading, by means of an empirical example on experienced competence and exhaustion in persons diagnosed with burnout. (PsycINFO Database Record

164 citations
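
The standardization issue can be illustrated with a toy calculation: given person-specific cross-lagged coefficients and person-specific standard deviations (all hypothetical here), within-person standardization scales each coefficient by that person's SDs rather than by pooled SDs.

```python
import numpy as np

# Hypothetical person-specific quantities from a fitted bivariate multilevel AR(1) model:
# phi_xy[i] is the unstandardized cross-lagged effect of x(t-1) on y(t) for person i,
# and sd_x[i], sd_y[i] are that person's within-person standard deviations.
rng = np.random.default_rng(0)
n_persons = 100
phi_xy = rng.normal(0.20, 0.10, n_persons)
sd_x = rng.uniform(0.5, 2.0, n_persons)
sd_y = rng.uniform(0.5, 2.0, n_persons)

# Within-person standardization: scale each person's coefficient by that person's SDs,
# then summarize across persons.
phi_within = phi_xy * sd_x / sd_y
print("mean within-person standardized effect:", round(float(phi_within.mean()), 3))

# Grand standardization uses pooled SDs for everyone and can give a different,
# potentially misleading, answer when dynamics differ across persons.
phi_grand = phi_xy * sd_x.mean() / sd_y.mean()
print("mean grand-standardized effect:        ", round(float(phi_grand.mean()), 3))
```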


Journal ArticleDOI
TL;DR: These confidence interval methods for 4 reliability coefficients are evaluated under a variety of conditions with 3 large-scale Monte Carlo simulation studies, and the findings lead to a general recommendation of bootstrap confidence intervals for hierarchical omega with continuous items and categorical omega with categorical items.
Abstract: A composite score is the sum of a set of components. For example, a total test score can be defined as the sum of the individual items. The reliability of composite scores is of interest in a wide variety of contexts due to their widespread use and applicability to many disciplines. The psychometric literature has devoted considerable time to discussing how to best estimate the population reliability value. However, all point estimates of a reliability coefficient fail to convey the uncertainty associated with the estimate as it estimates the population value. Correspondingly, a confidence interval is recommended to convey the uncertainty with which the population value of the reliability coefficient has been estimated. However, many confidence interval methods for bracketing the population reliability coefficient exist and it is not clear which method is most appropriate in general or in a variety of specific circumstances. We evaluate these confidence interval methods for 4 reliability coefficients (coefficient alpha, coefficient omega, hierarchical omega, and categorical omega) under a variety of conditions with 3 large-scale Monte Carlo simulation studies. Our findings lead us to generally recommend bootstrap confidence intervals for hierarchical omega for continuous items and categorical omega for categorical items. All of the methods we discuss are implemented in the freely available R language and environment via the MBESS package.

161 citations
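
The recommended intervals are implemented in the MBESS R package. As a language-agnostic illustration of the bootstrap logic only, the Python sketch below computes a percentile bootstrap interval, using coefficient alpha as a simple stand-in reliability statistic on simulated item scores.

```python
import numpy as np

def coefficient_alpha(items):
    """Cronbach's alpha for an n_persons x n_items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def bootstrap_ci(items, stat=coefficient_alpha, n_boot=2000, alpha_level=0.05, seed=1):
    """Percentile bootstrap confidence interval for a composite-reliability statistic."""
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    boot = np.array([stat(items[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return np.quantile(boot, [alpha_level / 2, 1 - alpha_level / 2])

# Hypothetical 6-item scale for 300 respondents
rng = np.random.default_rng(42)
true_score = rng.standard_normal((300, 1))
items = 0.7 * true_score + 0.7 * rng.standard_normal((300, 6))
print("alpha:", round(coefficient_alpha(items), 3))
print("95% bootstrap CI:", bootstrap_ci(items).round(3))
```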


Journal ArticleDOI
TL;DR: This work introduces psychologists to social media language research, identifying descriptive and predictive analyses that language data allow, and describes how raw language data can be accessed and quantified for inclusion in subsequent analyses, exploring personality as expressed on Facebook to illustrate.
Abstract: Language data available through social media provide opportunities to study people at an unprecedented scale. However, little guidance is available to psychologists who want to enter this area of research. Drawing on tools and techniques developed in natural language processing, we first introduce psychologists to social media language research, identifying descriptive and predictive analyses that language data allow. Second, we describe how raw language data can be accessed and quantified for inclusion in subsequent analyses, exploring personality as expressed on Facebook to illustrate. Third, we highlight challenges and issues to be considered, including accessing and processing the data, interpreting effects, and ethical issues. Social media has become a valuable part of social life, and there is much we can learn by bringing together the tools of computer science with the theories and insights of psychology. (PsycINFO Database Record

154 citations
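
One basic way to quantify raw language data, roughly in the spirit of the closed-vocabulary analyses the article describes, is to convert posts into relative word frequencies. The sketch below uses scikit-learn on made-up status updates.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical status updates (stand-ins for downloaded social media posts)
posts = [
    "had a great day with friends at the beach",
    "so stressed about this exam tomorrow",
    "excited to start my new job next week",
    "feeling tired and anxious tonight",
]

# Quantify raw language as relative word frequencies (a simple closed-vocabulary feature set)
vectorizer = CountVectorizer(lowercase=True)
counts = vectorizer.fit_transform(posts).toarray().astype(float)
rel_freq = counts / counts.sum(axis=1, keepdims=True)

# rel_freq is now an n_posts x n_words matrix that can enter ordinary analyses,
# e.g., correlating word use with self-reported personality scores.
print(vectorizer.get_feature_names_out()[:8])
print(np.round(rel_freq[0, :8], 3))
```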


Journal ArticleDOI
TL;DR: This paper describes multilevel imputation strategies and evaluates their performance in a variety of common analysis models, deriving 4 major conclusions: joint modeling and chained equations imputation are appropriate for random intercept analyses; the joint model is superior for analyses that posit different within- and between-cluster associations; chained equations imputation provides a dramatic improvement over joint modeling in random slope analyses; and a latent variable formulation for categorical variables is quite effective.
Abstract: Although missing data methods have advanced in recent years, methodologists have devoted less attention to multilevel data structures where observations at level-1 are nested within higher-order organizational units at level-2 (e.g., individuals within neighborhoods; repeated measures nested within individuals; students nested within classrooms). Joint modeling and chained equations imputation are the principal imputation frameworks for single-level data, and both have multilevel counterparts. These approaches differ algorithmically and in their functionality; both are appropriate for simple random intercept analyses with normally distributed data, but they differ beyond that. The purpose of this paper is to describe multilevel imputation strategies and evaluate their performance in a variety of common analysis models. Using multiple imputation theory and computer simulations, we derive 4 major conclusions: (a) joint modeling and chained equations imputation are appropriate for random intercept analyses; (b) the joint model is superior for analyses that posit different within- and between-cluster associations (e.g., a multilevel regression model that includes a level-1 predictor and its cluster means, a multilevel structural equation model with different path values at level-1 and level-2); (c) chained equations imputation provides a dramatic improvement over joint modeling in random slope analyses; and (d) a latent variable formulation for categorical variables is quite effective. We use a real data analysis to demonstrate multilevel imputation, and we suggest a number of avenues for future research. (PsycINFO Database Record

127 citations


Journal ArticleDOI
TL;DR: This article introduces two methods that are often employed to extract patterns and reduce the dimensionality of large data sets: singular value decomposition and latent Dirichlet allocation. It then demonstrates how to use dimensions or clusters extracted from data to build predictive models in a cross-validated way.
Abstract: This article aims to introduce the reader to essential tools that can be used to obtain insights and build predictive models using large data sets. Recent user proliferation in the digital environment has led to the emergence of large samples containing a wealth of traces of human behaviors, communication, and social interactions. Such samples offer the opportunity to greatly improve our understanding of individuals, groups, and societies, but their analysis presents unique methodological challenges. In this tutorial, we discuss potential sources of such data and explain how to efficiently store them. Then, we introduce two methods that are often employed to extract patterns and reduce the dimensionality of large data sets: singular value decomposition and latent Dirichlet allocation. Finally, we demonstrate how to use dimensions or clusters extracted from data to build predictive models in a cross-validated way. The text is accompanied by examples of R code and a sample data set, allowing the reader to practice the methods discussed here. A companion website (http://dataminingtutorial.com) provides additional learning resources. (PsycINFO Database Record

126 citations
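
The article's examples are in R; a rough Python analog of the same workflow (dimension reduction with SVD and latent Dirichlet allocation, then cross-validated prediction from the extracted dimensions) might look like the sketch below, using simulated count data rather than the article's sample data set.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical user-by-feature count matrix (e.g., users x liked pages or users x words)
X = rng.poisson(1.0, size=(500, 200)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(0, 1, 500)   # toy outcome tied to a few features

# Dimensionality reduction: SVD component scores and LDA topic proportions
svd_scores = TruncatedSVD(n_components=10, random_state=0).fit_transform(X)
lda_scores = LatentDirichletAllocation(n_components=10, random_state=0).fit_transform(X)

# Cross-validated prediction from the reduced dimensions
for name, Z in [("SVD", svd_scores), ("LDA", lda_scores)]:
    r2 = cross_val_score(LinearRegression(), Z, y, cv=5, scoring="r2")
    print(name, "mean cross-validated R^2:", round(r2.mean(), 3))
```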


Journal ArticleDOI
TL;DR: This article introduces an approach called theory-driven web scraping in which the choice to use web-based big data must follow substantive theory, and introduces data source theories, a term used to describe the assumptions a researcher must make about a prospective big data source in order to meaningfully scrape data from it.
Abstract: The term big data encompasses a wide range of approaches of collecting and analyzing data in ways that were not possible before the era of modern personal computing. One approach to big data of great potential to psychologists is web scraping, which involves the automated collection of information from webpages. Although web scraping can create massive big datasets with tens of thousands of variables, it can also be used to create modestly sized, more manageable datasets with tens of variables but hundreds of thousands of cases, well within the skillset of most psychologists to analyze, in a matter of hours. In this article, we demystify web scraping methods as currently used to examine research questions of interest to psychologists. First, we introduce an approach called theory-driven web scraping in which the choice to use web-based big data must follow substantive theory. Second, we introduce data source theories, a term used to describe the assumptions a researcher must make about a prospective big data source in order to meaningfully scrape data from it. Critically, researchers must derive specific hypotheses to be tested based upon their data source theory, and if these hypotheses are not empirically supported, plans to use that data source should be changed or eliminated. Third, we provide a case study and sample code in Python demonstrating how web scraping can be conducted to collect big data along with links to a web tutorial designed for psychologists. Fourth, we describe a 4-step process to be followed in web scraping projects. Fifth and finally, we discuss legal, practical and ethical concerns faced when conducting web scraping projects. (PsycINFO Database Record

114 citations
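
The article provides its own Python case study and tutorial; the fragment below is only a generic, hypothetical illustration of the scraping step itself (the URL and CSS class are placeholders), using the requests and BeautifulSoup libraries.

```python
# A minimal, hypothetical scraping sketch; always check a site's terms of service
# and robots.txt before collecting data.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/job-postings"          # placeholder page, not from the article
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Extract the text of every element assumed to hold one record of interest
records = [element.get_text(strip=True)
           for element in soup.find_all("div", class_="posting")]
print(len(records), "records scraped")
```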


Journal ArticleDOI
TL;DR: A variety of additional "replication goals" are developed that will allow researchers to build a more nuanced understanding of replication, one flexible enough to address the various questions that researchers might seek to answer.
Abstract: As the field of psychology struggles to trust published findings, replication research has begun to become more of a priority to both scientists and journals. With this increasing emphasis placed on reproducibility, it is essential that replication studies be capable of advancing the field. However, we argue that many researchers have been only narrowly interpreting the meaning of replication, with studies being designed with a simple statistically significant or nonsignificant results framework in mind. Although this interpretation may be desirable in some cases, we develop a variety of additional "replication goals" that researchers could consider when planning studies. Even if researchers are aware of these goals, we show that they are rarely used in practice-as results are typically analyzed in a manner only appropriate to a simple significance test. We discuss each goal conceptually, explain appropriate analysis procedures, and provide 1 or more examples to illustrate these analyses in practice. We hope that these various goals will allow researchers to develop a more nuanced understanding of replication that can be flexible enough to answer the various questions that researchers might seek to understand.

Journal ArticleDOI
TL;DR: The results indicate that the CFI and TLI provide nearly identical estimations and are the most accurate fit indices, followed at a step below by the RMSEA, and then by the SRMR, which gives notably poor dimensionality estimates.
Abstract: An early step in the process of construct validation consists of establishing the fit of an unrestricted "exploratory" factorial model for a prespecified number of common factors. For this initial unrestricted model, researchers have often recommended and used fit indices to estimate the number of factors to retain. Despite the logical appeal of this approach, little is known about the actual accuracy of fit indices in the estimation of data dimensionality. The present study aimed to reduce this gap by systematically evaluating the performance of 4 commonly used fit indices-the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR)-in the estimation of the number of factors with categorical variables, and comparing it with what is arguably the current golden rule, Horn's (1965) parallel analysis. The results indicate that the CFI and TLI provide nearly identical estimations and are the most accurate fit indices, followed at a step below by the RMSEA, and then by the SRMR, which gives notably poor dimensionality estimates. Difficulties in establishing optimal cutoff values for the fit indices and the general superiority of parallel analysis, however, suggest that applied researchers are better served by complementing their theoretical considerations regarding dimensionality with the estimates provided by the latter method.
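
For continuous data, Horn's parallel analysis is straightforward to implement from scratch; the sketch below (simulated data and Pearson correlations, rather than the categorical-variable setup studied in the paper) retains factors whose observed eigenvalues exceed those obtained from random data of the same size.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, quantile=0.95, seed=0):
    """Horn's parallel analysis: retain factors whose observed eigenvalues exceed
    the chosen quantile of eigenvalues from random data of the same dimensions."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eig = np.empty((n_sims, p))
    for i in range(n_sims):
        rand = rng.standard_normal((n, p))
        rand_eig[i] = np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    threshold = np.quantile(rand_eig, quantile, axis=0)
    return int(np.sum(obs_eig > threshold))

# Hypothetical data with 2 underlying factors measured by 8 variables
rng = np.random.default_rng(1)
f = rng.standard_normal((400, 2))
loadings = np.array([[.7, 0], [.6, 0], [.7, 0], [.6, 0],
                     [0, .7], [0, .6], [0, .7], [0, .6]])
data = f @ loadings.T + 0.6 * rng.standard_normal((400, 8))
print("factors retained by parallel analysis:", parallel_analysis(data))
```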

Journal ArticleDOI
TL;DR: This article is a practical guide to conducting big data research, covering data management, acquisition, processing, and analytics (including key supervised and unsupervised learning data mining methods), accompanied by walkthrough tutorials on data acquisition, text analysis with latent Dirichlet allocation topic modeling, and classification with support vector machines.
Abstract: The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering data management, acquisition, processing, and analytics (including key supervised and unsupervised learning data mining methods). It is accompanied by walkthrough tutorials on data acquisition, text analysis with latent Dirichlet allocation topic modeling, and classification with support vector machines. Big data practitioners in academia, industry, and the community have built a comprehensive base of tools and knowledge that makes big data research accessible to researchers in a broad range of fields. However, big data research does require knowledge of software programming and a different analytical mindset. For those willing to acquire the requisite skills, innovative analyses of unexpected or previously untapped data sources can offer fresh ways to develop, test, and extend theories. When conducted with care and respect, big data research can become an essential complement to traditional research. (PsycINFO Database Record
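
As a minimal illustration of the classification workflow mentioned above, the sketch below builds a TF-IDF plus linear support vector machine pipeline on a handful of made-up, hand-labeled snippets and evaluates it with cross-validation; it is a toy analog of the article's tutorials, not their code.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Hypothetical labeled text snippets (e.g., posts coded for a construct of interest)
texts = [
    "love my team, great meeting today", "worst commute ever, so frustrating",
    "proud of what we accomplished", "completely exhausted and annoyed",
    "excited about the new project", "nothing went right today",
    "grateful for supportive colleagues", "angry about the constant delays",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative (toy coding)

# TF-IDF features fed to a linear SVM, evaluated with cross-validation
model = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(model, texts, labels, cv=4)
print("cross-validated accuracy:", scores.mean())
```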

Journal ArticleDOI
TL;DR: It is proposed that null hypothesis testing in multigroup SEM be replaced by equivalence testing, which allows researchers to effectively control the size of misspecification before moving on to testing a more restricted model.
Abstract: Multigroup structural equation modeling (SEM) plays a key role in studying measurement invariance and in group comparison. When population covariance matrices are deemed not equal across groups, the next step to substantiate measurement invariance is to see whether the sample covariance matrices in all the groups can be adequately fitted by the same factor model, called configural invariance. After configural invariance is established, cross-group equalities of factor loadings, error variances, and factor variances-covariances are then examined in sequence. With mean structures, cross-group equalities of intercepts and factor means are also examined. The established rule is that if the statistic at the current model is not significant at the level of .05, one then moves on to testing the next more restricted model using a chi-square-difference statistic. This article argues that such an established rule is unable to control either Type I or Type II errors. Analysis, an example, and Monte Carlo results show why and how chi-square-difference tests are easily misused. The fundamental issue is that chi-square-difference tests are developed under the assumption that the base model is sufficiently close to the population, and a nonsignificant chi-square statistic tells little about how good the model is. To overcome this issue, this article further proposes that null hypothesis testing in multigroup SEM be replaced by equivalence testing, which allows researchers to effectively control the size of misspecification before moving on to testing a more restricted model. R code is also provided to facilitate the applications of equivalence testing for multigroup SEM. (PsycINFO Database Record
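
This is not the authors' procedure, but the core logic of an RMSEA-based equivalence test can be sketched as follows: treat a tolerable amount of misspecification epsilon0 as the null boundary and conclude "fit is acceptably close" only when the chi-square statistic falls below the alpha quantile of the corresponding noncentral chi-square distribution.

```python
from scipy import stats

def equivalence_test_rmsea(chi2_stat, df, n, epsilon0=0.05, alpha=0.05):
    """Generic RMSEA-based equivalence (close-fit) test sketch: reject the null
    hypothesis that misspecification is at least epsilon0 when the chi-square
    statistic falls below the alpha quantile of a noncentral chi-square with
    noncentrality (n - 1) * df * epsilon0**2."""
    noncentrality = (n - 1) * df * epsilon0**2
    critical = stats.ncx2.ppf(alpha, df, noncentrality)
    return chi2_stat < critical, critical

close_enough, cutoff = equivalence_test_rmsea(chi2_stat=85.0, df=60, n=400)
print("misfit smaller than 0.05?", close_enough, "| cutoff:", round(cutoff, 1))
```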

Journal ArticleDOI
TL;DR: Vuong's likelihood ratio tests are applied to the comparison of nonnested structural equation models (SEMs) and offer researchers a useful tool for nonnested SEM comparison, with barriers to test implementation now removed.
Abstract: In this article, we apply Vuong's (1989) likelihood ratio tests of nonnested models to the comparison of nonnested structural equation models (SEMs). Similar tests have been previously applied in SEM contexts (especially to mixture models), though the nonstandard output required to conduct the tests has limited their use and study. We review the theory underlying the tests and show how they can be used to construct interval estimates for differences in nonnested information criteria. Through both simulation and application, we then study the tests' performance in nonmixture SEMs and describe their general implementation via free R packages. The tests offer researchers a useful tool for nonnested SEM comparison, with barriers to test implementation now removed. (PsycINFO Database Record
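
The uncorrected core of Vuong's (1989) nonnested test is a z statistic built from casewise log-likelihood differences. The sketch below uses simulated log-likelihoods and omits the variance pretest and the adjustments discussed in the article; it only shows that central calculation.

```python
import numpy as np
from scipy import stats

def vuong_z(loglik1, loglik2):
    """Vuong's nonnested likelihood ratio z statistic from casewise log-likelihoods
    of two competing models (positive values favor model 1)."""
    diff = np.asarray(loglik1) - np.asarray(loglik2)
    n = diff.size
    z = np.sqrt(n) * diff.mean() / diff.std(ddof=1)
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# Hypothetical casewise log-likelihoods for two nonnested models fit to the same data
rng = np.random.default_rng(0)
ll_model1 = rng.normal(-1.4, 0.5, 300)
ll_model2 = ll_model1 + rng.normal(-0.05, 0.3, 300)
z, p = vuong_z(ll_model1, ll_model2)
print(f"z = {z:.2f}, p = {p:.3f}")
```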

Journal ArticleDOI
TL;DR: The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications relevant to big data research.
Abstract: The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications. Four common themes that emerge in the articles with respect to psychological research conducted in the area of big data are mentioned, including: (a) The benefits of collaboration across disciplines, such as those in the social sciences, applied statistics, and computer science. Doing so assists in grounding big data research in sound theory and practice, as well as in affording effective data retrieval and analysis. (b) Availability of large data sets on Facebook, Twitter, and other social media sites that provide a psychological window into the attitudes and behaviors of a broad spectrum of the population. (c) Identifying, addressing, and being sensitive to ethical considerations when analyzing large data sets gained from public or private sources. (d) The unavoidable necessity of validating predictive models in big data by applying a model developed on 1 dataset to a separate set of data or hold-out sample. Translational abstracts that summarize the articles in very clear and understandable terms are included in Appendix A, and a glossary of terms relevant to big data research discussed in the articles is presented in Appendix B. (PsycINFO Database Record

Journal ArticleDOI
TL;DR: In this article, the authors present a flexible full-information approach to modeling multiple user-defined response styles across multiple constructs of interest; the model is based on a novel parameterization of the multidimensional nominal response model that separates estimation of overall item slopes from the scoring functions for each item and latent trait.
Abstract: We present a flexible full-information approach to modeling multiple user-defined response styles across multiple constructs of interest. The model is based on a novel parameterization of the multidimensional nominal response model that separates estimation of overall item slopes from the scoring functions (indicating the order of categories) for each item and latent trait. This feature allows the definition of response styles to vary across items as well as overall item slopes that vary across items for both substantive and response style dimensions. We compared the model with similar approaches using examples from the smoking initiative of the Patient-Reported Outcomes Measurement Information System. A small set of simulations showed that the estimation approach is able to recover model parameters, factor scores, and reasonable estimates of standard errors. Furthermore, these simulations suggest that failing to include response style factors (when present in the data generating model) has adverse consequences for substantive trait factor score recovery. (PsycINFO Database Record
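
A simplified sketch of the category-probability structure being described is given below for a single item, with one substantive trait and one extreme-response-style trait and illustrative parameter values; the scoring matrix plays the role of the user-defined category ordering for each trait. This is a rough illustration of the general idea, not the authors' estimation approach.

```python
import numpy as np

def mnrm_probs(theta, slopes, scoring, intercepts):
    """Category probabilities for one item under a multidimensional nominal-style model:
    theta (n_traits,), slopes (n_traits,), scoring (n_cats, n_traits), intercepts (n_cats,).
    Each column of `scoring` orders the categories for one trait, e.g., an increasing
    pattern for the substantive trait and a U-shaped pattern for an extreme-response style."""
    z = scoring @ (slopes * theta) + intercepts
    ez = np.exp(z - z.max())          # softmax over categories
    return ez / ez.sum()

# 5-point item measured by a substantive trait and an extreme-response-style trait
slopes = np.array([1.2, 0.8])
scoring = np.array([[0, 1], [1, 0], [2, 0], [3, 0], [4, 1]], dtype=float)
intercepts = np.zeros(5)
print(np.round(mnrm_probs(np.array([0.5, 1.0]), slopes, scoring, intercepts), 3))
```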

Journal ArticleDOI
TL;DR: An overview of SEM forests is provided and their utility is illustrated in the context of cross-sectional factor models of intelligence and episodic memory; the approach identifies variables that predict individual differences in specific theoretical models, such as differences in latent factor profiles or developmental trajectories.
Abstract: Structural equation model (SEM) trees, a combination of SEMs and decision trees, have been proposed as a data-analytic tool for theory-guided exploration of empirical data. With respect to a hypothesized model of multivariate outcomes, such trees recursively find subgroups with similar patterns of observed data. SEM trees allow for the automatic selection of variables that predict differences across individuals in specific theoretical models, for instance, differences in latent factor profiles or developmental trajectories. However, SEM trees are unstable when small variations in the data can result in different trees. As a remedy, SEM forests, which are ensembles of SEM trees based on resamplings of the original dataset, provide increased stability. Because large forests are less suitable for visual inspection and interpretation, aggregate measures provide researchers with hints on how to improve their models: (a) variable importance is based on random permutations of the out-of-bag (OOB) samples of the individual trees and quantifies, for each variable, the average reduction of uncertainty about the model-predicted distribution; and (b) case proximity enables researchers to perform clustering and outlier detection. We provide an overview of SEM forests and illustrate their utility in the context of cross-sectional factor models of intelligence and episodic memory. We discuss benefits and limitations, and provide advice on how and when to use SEM trees and forests in future research. (PsycINFO Database Record

Journal ArticleDOI
TL;DR: The shifted Wald (SW) distribution is proposed and demonstrated as both a useful measurement tool and an intraindividual process model for psychological response time (RT) data, and the approach is shown to generalize easily to a variety of experimental domains.
Abstract: We propose and demonstrate the shifted Wald (SW) distribution as both a useful measurement tool and intraindividual process model for psychological response time (RT) data. Furthermore, we develop a methodology and fitting approach that readers can easily access. As a measurement tool, the SW provides a detailed quantification of the RT data that is more sophisticated than mean and SD comparisons. As an intraindividual process model, the SW provides a cognitive model for the response process in terms of signal accumulation and the threshold needed to respond. The details and importance of both of these features are developed, and we show how the approach can be easily generalized to a variety of experimental domains. The versatility and usefulness of the approach is demonstrated on 3 published data sets, each with a different canonical mode of responding: manual, vocal, and oculomotor modes. In addition, model-fitting code is included with the article.
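
Assuming the shifted Wald is parameterized by a threshold, a drift rate, and a shift, it corresponds to a location-shifted inverse Gaussian, so a quick maximum-likelihood fit can be sketched with scipy. This is an illustration of that correspondence, not the fitting code supplied with the article, and generic MLE estimation of the shift can be unstable in small samples.

```python
import numpy as np
from scipy import stats

# Simulate hypothetical response times from a shifted Wald with threshold alpha,
# drift gamma, and shift theta (parameter values are illustrative).
alpha_true, gamma_true, theta_true = 1.5, 3.0, 0.25
mu_ig, lam_ig = alpha_true / gamma_true, alpha_true**2     # inverse-Gaussian mean and shape
rt = stats.invgauss.rvs(mu_ig / lam_ig, loc=theta_true, scale=lam_ig,
                        size=1000, random_state=1)

# Fit the shifted Wald by maximum likelihood via scipy's (shifted) inverse Gaussian
mu_hat, loc_hat, scale_hat = stats.invgauss.fit(rt)

# Map scipy's (mu, loc, scale) back to threshold, drift, and shift
alpha_hat = np.sqrt(scale_hat)
gamma_hat = 1.0 / (alpha_hat * mu_hat)
theta_hat = loc_hat
print(f"alpha={alpha_hat:.2f}, gamma={gamma_hat:.2f}, theta={theta_hat:.2f}")
```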

Journal ArticleDOI
TL;DR: This article used longitudinal Twitter data across three case studies to examine the impact of violence near or on college campuses in the communities of Isla Vista, CA, Flagstaff, AZ, and Roseburg, OR, compared with control communities, between 2014 and 2015.
Abstract: Studying communities impacted by traumatic events is often costly, requires swift action to enter the field when disaster strikes, and may be invasive for some traumatized respondents. Typically, individuals are studied after the traumatic event with no baseline data against which to compare their postdisaster responses. Given these challenges, we used longitudinal Twitter data across 3 case studies to examine the impact of violence near or on college campuses in the communities of Isla Vista, CA, Flagstaff, AZ, and Roseburg, OR, compared with control communities, between 2014 and 2015. To identify users likely to live in each community, we sought Twitter accounts local to those communities and downloaded tweets of their respective followers. Tweets were then coded for the presence of event-related negative emotion words using a computerized text analysis method (Linguistic Inquiry and Word Count, LIWC). In Case Study 1, we observed an increase in postevent negative emotion expression among sampled followers after mass violence, and show how patterns of response appear differently based on the timeframe under scrutiny. In Case Study 2, we replicate the pattern of results among users in the control group from Case Study 1 after a campus shooting in that community killed 1 student. In Case Study 3, we replicate this pattern in another group of Twitter users likely to live in a community affected by a mass shooting. We discuss conducting trauma-related research using Twitter data and provide guidance to researchers interested in using Twitter to answer their own research questions in this domain. (PsycINFO Database Record
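
LIWC is proprietary, but the basic scoring step, the proportion of a tweet's words that match a negative-emotion lexicon, can be sketched with a tiny placeholder word list (the words below are illustrative, not the LIWC dictionary).

```python
import re

# A tiny illustrative lexicon; placeholders, not the LIWC negative-emotion category
negative_emotion_words = {"sad", "afraid", "scared", "angry", "hurt", "terrible", "crying"}

def negative_emotion_rate(tweet):
    """Proportion of a tweet's words that appear in the negative-emotion lexicon."""
    words = re.findall(r"[a-z']+", tweet.lower())
    if not words:
        return 0.0
    return sum(word in negative_emotion_words for word in words) / len(words)

tweets = [
    "so scared and sad for everyone on campus tonight",
    "great coffee and sunshine this morning",
]
for tweet in tweets:
    print(round(negative_emotion_rate(tweet), 2), "-", tweet)
```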

Journal ArticleDOI
TL;DR: The variance of the adjusted Rand index is provided and it is shown that a normal approximation is appropriate across a wide range of sample sizes and varying numbers of clusters and that confidence intervals based on the normal distribution have desirable levels of coverage and accuracy.
Abstract: For 30 years, the adjusted Rand index has been the preferred method for comparing 2 partitions (e.g., clusterings) of a set of observations. Although the index is widely used, little is known about its variability. Herein, the variance of the adjusted Rand index (Hubert & Arabie, 1985) is provided and its properties are explored. It is shown that a normal approximation is appropriate across a wide range of sample sizes and varying numbers of clusters. Further, it is shown that confidence intervals based on the normal distribution have desirable levels of coverage and accuracy. Finally, the first power analysis evaluating the ability to detect differences between 2 different adjusted Rand indices is provided.
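
The index itself is easy to compute from the contingency table of the two partitions (sklearn.metrics.adjusted_rand_score gives the same value); the variance, confidence interval, and power results are the paper's contribution and are not reproduced in this sketch.

```python
import numpy as np
from scipy.special import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert & Arabie, 1985) from the contingency table
    of two partitions of the same observations."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a_vals.size, b_vals.size))
    np.add.at(table, (a_idx, b_idx), 1)

    sum_cells = comb(table, 2).sum()
    sum_rows = comb(table.sum(axis=1), 2).sum()
    sum_cols = comb(table.sum(axis=0), 2).sum()
    n_pairs = comb(table.sum(), 2)

    expected = sum_rows * sum_cols / n_pairs
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

clustering_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clustering_2 = [0, 0, 1, 1, 1, 1, 2, 2, 2]
print(round(adjusted_rand_index(clustering_1, clustering_2), 3))
```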

Journal ArticleDOI
TL;DR: This study evaluates the application of Meng and Rubin's (1992) pooling procedure for the likelihood ratio statistic to the SEM test of model fit and explores the possibility of using this test statistic to define imputation-based versions of common fit indices such as the TLI, CFI, and RMSEA; the resulting indices were well calibrated with those of full information maximum likelihood estimation.
Abstract: Multiple imputation has enjoyed widespread use in social science applications, yet the application of imputation-based inference to structural equation modeling has received virtually no attention in the literature. Thus, this study has 2 overarching goals: evaluate the application of Meng and Rubin's (1992) pooling procedure for likelihood ratio statistic to the SEM test of model fit, and explore the possibility of using this test statistic to define imputation-based versions of common fit indices such as the TLI, CFI, and RMSEA. Computer simulation results suggested that, when applied to a correctly specified model, the pooled likelihood ratio statistic performed well as a global test of model fit and was closely calibrated to the corresponding full information maximum likelihood (FIML) test statistic. However, when applied to misspecified models with high rates of missingness (30%-40%), the imputation-based test statistic generally exhibited lower power than that of FIML. Using the pooled test statistic to construct imputation-based versions of the TLI, CFI, and RMSEA worked well and produced indices that were well-calibrated with those of full information maximum likelihood estimation. This article gives Mplus and R code to implement the pooled test statistic, and it offers a number of recommendations for future research. (PsycINFO Database Record

Journal ArticleDOI
TL;DR: This article reviews and discusses an important practical issue in propensity score analysis: the case in which the baseline covariates (potential confounders) and the outcome have missing values (are incompletely observed).
Abstract: Propensity score analysis is a method that equates treatment and control groups on a comprehensive set of measured confounders in observational (nonrandomized) studies. A successful propensity score analysis reduces bias in the estimate of the average treatment effect in a nonrandomized study, making the estimate more comparable with that obtained from a randomized experiment. This article reviews and discusses an important practical issue in propensity analysis, in which the baseline covariates (potential confounders) and the outcome have missing values (incompletely observed). We review the statistical theory of propensity score analysis and estimation methods for propensity scores with incompletely observed covariates. Traditional logistic regression and modern machine learning methods (e.g., random forests, generalized boosted modeling) as estimation methods for incompletely observed covariates are reviewed. Balance diagnostics and equating methods for incompletely observed covariates are briefly described. Using an empirical example, the propensity score estimation methods for incompletely observed covariates are illustrated and compared. (PsycINFO Database Record
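
As a baseline illustration of the propensity score workflow the article builds on (complete data, logistic regression, inverse-probability weighting; the missing-data machinery reviewed in the paper is not shown), consider the following sketch with simulated observational data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical observational data: 2 confounders, a binary treatment, an outcome
n = 2000
X = rng.standard_normal((n, 2))
p_treat = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
treatment = rng.binomial(1, p_treat)
outcome = 1.0 * treatment + X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)

# Estimate propensity scores with logistic regression on the (fully observed) covariates
ps = LogisticRegression().fit(X, treatment).predict_proba(X)[:, 1]

# Inverse-probability-weighted estimate of the average treatment effect
w = treatment / ps + (1 - treatment) / (1 - ps)
ate = (np.average(outcome[treatment == 1], weights=w[treatment == 1])
       - np.average(outcome[treatment == 0], weights=w[treatment == 0]))
naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print("naive difference:", round(naive, 2))
print("IPW estimate of the treatment effect:", round(ate, 2))
```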

Journal ArticleDOI
TL;DR: In this article, a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001) is introduced, which is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause 2 or more outcome variables to covary.
Abstract: Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause 2 or more outcome variables to covary. We provide the R package "mvtboost" to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package "gbm" (Ridgeway, 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. (PsycINFO Database Record

Journal ArticleDOI
TL;DR: An examination of the population performance of items, parcels, and scales under a range of model misspecifications (in terms of structural path coefficient accuracy, power, and population fit indices) revealed that, under measurement model misspecification, any parceling scheme typically results in more accurate structural parameters but less power to detect the misspecification.
Abstract: Previous research has suggested that the use of item parcels in structural equation modeling can lead to biased structural coefficient estimates and low power to detect model misspecification. The present article describes the population performance of items, parcels, and scales under a range of model misspecifications, examining structural path coefficient accuracy, power, and population fit indices. Results revealed that, under measurement model misspecification, any parceling scheme typically results in more accurate structural parameters, but less power to detect the misspecification. When the structural model is misspecified, parcels do not affect parameter accuracy, but they do substantially elevate power to detect the misspecification. Under particular, known measurement model misspecifications, a parceling scheme can be chosen to produce the most accurate estimates. The root mean square error of approximation and the standardized root mean square residual are more sensitive to measurement model misspecification in parceled models than the likelihood ratio test statistic. (PsycINFO Database Record
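
A common parceling scheme is random assignment of items to parcels followed by averaging; a minimal sketch with hypothetical item scores is below (the paper's comparisons concern how such parcels behave under misspecification, which is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.standard_normal((300, 12))           # hypothetical 12-item factor

# Random-assignment parceling: average items within each of 3 parcels of 4 items
order = rng.permutation(12)
parcels = np.column_stack([items[:, order[i::3]].mean(axis=1) for i in range(3)])
print(parcels.shape)   # (300, 3) parcel scores used as indicators in place of the items
```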

Journal ArticleDOI
TL;DR: It is concluded that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression.
Abstract: Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record
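
A compact example of the expected-prediction-error logic for criterion-keyed scale construction is an L1-penalized (lasso) logistic regression with the penalty chosen by cross-validation. The sketch below uses simulated items and a simulated binary criterion, not the personality and mortality data analyzed in the paper, and regularization is only one of the three algorithms the authors illustrate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)

# Hypothetical item pool: 100 items, of which only a handful truly relate to the criterion
n, n_items = 1000, 100
items = rng.standard_normal((n, n_items))
logit = items[:, :5] @ np.array([0.6, 0.5, 0.4, 0.4, 0.3])
criterion = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Lasso logistic regression with the penalty strength chosen by cross-validation:
# minimizing expected prediction error rather than within-sample fit
model = LogisticRegressionCV(Cs=10, cv=5, penalty="l1",
                             solver="liblinear").fit(items, criterion)
selected = np.flatnonzero(model.coef_[0] != 0)
print("items retained for the criterion-keyed scale:", selected)
```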

Journal ArticleDOI
TL;DR: A power-calibrated effect size (PCES) approach to sample size planning is proposed that accounts for the uncertainty associated with an effect size estimate in a properly calibrated manner: sample sizes determined on the basis of the PCES are neither too small nor too large and thus provide the desired level of power.
Abstract: Statistical power and thus the sample size required to achieve some desired level of power depend on the size of the effect of interest. However, effect sizes are seldom known exactly in psychological research. Instead, researchers often possess an estimate of an effect size as well as a measure of its uncertainty (e.g., a standard error or confidence interval). Previous proposals for planning sample sizes either ignore this uncertainty thereby resulting in sample sizes that are too small and thus power that is lower than the desired level or overstate the impact of this uncertainty thereby resulting in sample sizes that are too large and thus power that is higher than the desired level. We propose a power-calibrated effect size (PCES) approach to sample size planning that accounts for the uncertainty associated with an effect size estimate in a properly calibrated manner: sample sizes determined on the basis of the PCES are neither too small nor too large and thus provide the desired level of power. We derive the PCES for comparisons of independent and dependent means, comparisons of independent and dependent proportions, and tests of correlation coefficients. We also provide a tutorial on setting sample sizes for a replication study using data from prior studies and discuss an easy-to-use website and code that implement our PCES approach to sample size planning.

Journal ArticleDOI
TL;DR: Results of the simulation show that estimates of the fixed effect of the time-varying predictor are as accurate for these cases as for cases with positive variance estimates, and that treating the time-varying predictor as random and allowing negative variance estimates performs well, whether the time-varying predictor is fixed or random in reality.
Abstract: Time-varying predictors in multilevel models are a useful tool for longitudinal research, whether they are the research variable of interest or they are controlling for variance to allow greater power for other variables. However, standard recommendations to fix the effect of time-varying predictors may make an assumption that is unlikely to hold in reality and may influence results. A simulation study illustrates that treating the time-varying predictor as fixed may allow analyses to converge, but the analyses have poor coverage of the true fixed effect when the time-varying predictor has a random effect in reality. A second simulation study shows that treating the time-varying predictor as random may have poor convergence, except when allowing negative variance estimates. Although negative variance estimates are uninterpretable, results of the simulation show that estimates of the fixed effect of the time-varying predictor are as accurate for these cases as for cases with positive variance estimates, and that treating the time-varying predictor as random and allowing negative variance estimates performs well whether the time-varying predictor is fixed or random in reality. Because of the difficulty of interpreting negative variance estimates, 2 procedures are suggested for selection between fixed-effect and random-effect models: comparing between fixed-effect and constrained random-effect models with a likelihood ratio test or fitting a fixed-effect model when an unconstrained random-effect model produces negative variance estimates. The performance of these 2 procedures is compared. (PsycINFO Database Record
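
The fixed-versus-random choice can be illustrated with statsmodels: fit the model with and without a random slope for the time-varying predictor and compare them with a likelihood ratio test. This is a rough sketch on simulated data; the boundary issue and the negative-variance option discussed in the paper are not handled here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Hypothetical longitudinal data: 100 persons x 10 occasions, time-varying predictor x
rng = np.random.default_rng(0)
n_persons, n_obs = 100, 10
person = np.repeat(np.arange(n_persons), n_obs)
x = rng.standard_normal(n_persons * n_obs)
slope = 0.5 + 0.3 * rng.standard_normal(n_persons)   # person-specific effect of x
y = slope[person] * x + rng.standard_normal(n_persons * n_obs)
data = pd.DataFrame({"person": person, "x": x, "y": y})

# Fixed-effect-only model vs. model with a random slope for the time-varying predictor
fixed = smf.mixedlm("y ~ x", data, groups=data["person"]).fit(reml=False)
random_slope = smf.mixedlm("y ~ x", data, groups=data["person"],
                           re_formula="~x").fit(reml=False)

# Likelihood ratio comparison (df is approximate because the variance lies on a boundary)
lr = 2 * (random_slope.llf - fixed.llf)
print("LR =", round(lr, 2), " p approx.", round(stats.chi2.sf(lr, df=2), 4))
```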

Journal ArticleDOI
TL;DR: Dynamical correlation is a functional data analysis technique developed to measure the similarity of 2 curves; it is a nonparametric approach that does not require a prespecified functional form and places no assumption on the homogeneity of the sample.
Abstract: In this article, we introduce dynamical correlation, a new method for quantifying synchrony between 2 variables with intensive longitudinal data. Dynamical correlation is a functional data analysis technique developed to measure the similarity of 2 curves. It has advantages over existing methods for studying synchrony, such as multilevel modeling. In particular, it is a nonparametric approach that does not require a prespecified functional form, and it places no assumption on homogeneity of the sample. Dynamical correlation can be easily estimated with irregularly spaced observations and tested to draw population-level inferences. We illustrate this flexible statistical technique with a simulation example and empirical data from an experiment examining interpersonal physiological synchrony between romantic partners. We discuss the advantages and limitations of the method, and how it can be extended and applied in psychological research. We also provide a set of R code for other researchers to estimate and test for dynamical correlation. (PsycINFO Database Record
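
A simplified version of the calculation, for curves observed on a common, evenly spaced grid, centers each person's curve at its own time average, rescales it to unit norm, and averages the integrated cross-products across persons. The sketch below uses simulated series; the authors' R code additionally handles irregularly spaced observations and inference.

```python
import numpy as np

def dynamical_correlation(x_curves, y_curves):
    """Simplified dynamical correlation for n_persons x n_timepoints arrays observed
    on a common, evenly spaced grid: center each person's curve at its own time
    average, scale to unit norm, and average the integrated cross-product."""
    x_c = x_curves - x_curves.mean(axis=1, keepdims=True)
    y_c = y_curves - y_curves.mean(axis=1, keepdims=True)
    x_s = x_c / np.linalg.norm(x_c, axis=1, keepdims=True)
    y_s = y_c / np.linalg.norm(y_c, axis=1, keepdims=True)
    return np.mean(np.sum(x_s * y_s, axis=1))

# Hypothetical physiological series for 40 persons over 200 time points
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 200)
shared = np.sin(2 * np.pi * t) * rng.uniform(0.5, 1.5, (40, 1))
x = shared + 0.5 * rng.standard_normal((40, 200))
y = shared + 0.5 * rng.standard_normal((40, 200))
print("dynamical correlation:", round(dynamical_correlation(x, y), 2))
```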

Journal ArticleDOI
TL;DR: Researchers should refrain from testing the residual variance when conducting planned contrasts; single contrasts, Bayes factors, and likelihood ratios provide reasonable alternatives that are less problematic.
Abstract: It is current practice that researchers testing specific, theory-driven predictions do not only use a planned contrast to model and test their hypotheses but also test the residual variance (the C+R approach). This analysis strategy relies on work by Abelson and Prentice (1997) who suggested that the result of a planned contrast needs to be interpreted in the light of the variance that is left after the variance explained by the contrast has been subtracted from the variance explained by the factors of the statistical model. Unfortunately, the C+R approach leads to six fundamental problems. In particular, the C+R approach (1) relies on the interpretation of a non-significant result as evidence for no effect, (2) neglects the impact of sample size, (3) creates problems for a priori power analyses, (4) may lead to significant effects that lack a meaningful interpretation, (5) may give rise to misinterpretations, and (6) is inconsistent with the interpretation of other statistical analyses. Given these flaws, researchers should refrain from testing the residual variance when conducting planned contrasts. Single contrasts, Bayes factors, and likelihood ratios provide reasonable alternatives that are less problematic.
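
A single planned contrast of the kind recommended here reduces to a t test on a weighted combination of group means; the sketch below uses made-up summary statistics for a 4-group design with a linear contrast.

```python
import numpy as np
from scipy import stats

# Hypothetical 4-group experiment with a theory-driven linear contrast
group_means = np.array([2.1, 2.9, 3.8, 4.6])
group_ns = np.array([30, 30, 30, 30])
mse, df_error = 1.4, group_ns.sum() - 4          # mean square error from the omnibus ANOVA
weights = np.array([-3, -1, 1, 3])               # contrast weights (sum to zero)

# Single planned contrast: t test on the weighted combination of means
contrast_value = weights @ group_means
se = np.sqrt(mse * np.sum(weights**2 / group_ns))
t = contrast_value / se
p = 2 * stats.t.sf(abs(t), df_error)
print(f"contrast = {contrast_value:.2f}, t({df_error}) = {t:.2f}, p = {p:.4f}")
```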