Journal ArticleDOI

Guidelines, Criteria, and Rules of Thumb for Evaluating Normed and Standardized Assessment Instruments in Psychology.

01 Jan 1994-Psychological Assessment (American Psychological Association)-Vol. 6, Iss: 4, pp 284-290
TL;DR: In this paper, the author provides criteria, guidelines, and simple rules of thumb to assist the clinician faced with the challenge of choosing an appropriate test instrument for a given psychological assessment.
Abstract: In the context of the development of prototypic assessment instruments in the areas of cognition, personality, and adaptive functioning, the issues of standardization, norming procedures, and the important psychometrics of test reliability and validity are evaluated critically. Criteria, guidelines, and simple rules of thumb are provided to assist the clinician faced with the challenge of choosing an appropriate test instrument for a given psychological assessment. Clinicians are often faced with the critical challenge of choosing the most appropriate available test instrument for a given psychological assessment of a child, adolescent, or adult of a particular age, gender, and class of disability. It is the purpose of this report to provide some criteria, guidelines, or simple rules of thumb to aid in this complex scientific decision. As such, it draws upon my experience with issues of test development, standardization, norming procedures, and important psychometrics, namely, test reliability and validity. As I and my colleagues noted in an earlier publication, the major areas of psychological functioning, in the normal development of infants, children, adolescents, adults, and elderly people, include cognitive, academic, personality, and adaptive behaviors (Sparrow, Fletcher, & Cicchetti, 1985). As such, the major examples or applications discussed in this article derive primarily, although not exclusively, from these several areas of human functioning.


Citations
Journal ArticleDOI
01 Jan 2012
TL;DR: This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics.
Abstract: Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR.

3,046 citations
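
As a rough illustration of the kind of IRR computation this abstract refers to, the following is a minimal Python sketch of Cohen's kappa for two coders, computed as (observed agreement - chance agreement) / (1 - chance agreement). The paper's own examples use SPSS and R; the function name `cohens_kappa` and the toy ratings here are hypothetical and not taken from that paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who assign nominal codes to the same items.

    Hypothetical helper for illustration; inputs are equal-length sequences
    of category labels, one label per rated item.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: proportion of items given identical codes.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_exp = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Toy data (assumed): two coders rate four items as "yes" or "no".
    coder_1 = ["yes", "yes", "no", "no"]
    coder_2 = ["yes", "no", "no", "no"]
    print(cohens_kappa(coder_1, coder_2))
```

In the toy data the coders agree on three of four items (0.75), chance agreement from the marginals is 0.50, and kappa is therefore 0.5. This sketch covers only the unweighted two-rater case.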

Journal ArticleDOI
TL;DR: Results of this study suggested that it is a specific group of externalized behaviours that are the most strongly associated with both parent and teacher stress.
Abstract: Background: The purpose of this study was to examine the correlates of caregiver stress in a large sample of young people with autism spectrum disorders (ASDs). Two main objectives were to: (1) disentangle the effects of behaviour problems and level of functioning on caregiver stress; and (2) measure the stability of behaviour problems and caregiver stress. Methods: Parents or teachers of  young people with ASDs completed measures of stress, behaviour problems and social competence. Parents also completed an adaptive behaviour scale. Eighty-one young people were rated twice at a  -year interval. Results: Parents and teachers did not perfectly agree on the nature and severity of behaviour problems. However, both sets of ratings indicated that behaviour problems were strongly associated with stress. Conduct problems in particular were significant predictors of stress. Adaptive skills were not significantly associated with caregiver stress. Parental reports of behaviour problems and stress were quite stable over the  -year interval, much more so than teacher reports. Parent ratings suggested that behaviour problems and stress exacerbated each other over time. This transactional model did not fit the teacher data. Conclusion: Results of this study suggested that it is a specific group of externalized behaviours that are the most strongly associated with both parent and teacher stress. Results were discussed from methodological and conceptual perspectives.

1,063 citations

Journal ArticleDOI
16 May 2018-PLOS ONE
TL;DR: The RAVDESS is a validated multimodal database of emotional speech and song consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent, which shows high levels of emotional validity and test-retest intrarater reliability.
Abstract: The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.

1,036 citations

Journal ArticleDOI
TL;DR: Clinician self-reports of MI skillfulness were unrelated to proficiency levels in observed practice, and coaching and/or feedback also increased posttraining proficiency.
Abstract: The Evaluating Methods for Motivational Enhancement Education trial evaluated methods for learning motivational interviewing (MI). Licensed substance abuse professionals (N = 140) were randomized to 5 training conditions: (a) clinical workshop only; (b) workshop plus practice feedback; (c) workshop plus individual coaching sessions; (d) workshop, feedback, and coaching; or (e) a waiting list control group of self-guided training. Audiotaped practice samples were analyzed at baseline, posttraining, and 4, 8, and 12 months later. Relative to controls, the 4 trained groups showed larger gains in proficiency. Coaching and/or feedback also increased posttraining proficiency. After delayed training, the waiting list group showed modest gains in proficiency. Posttraining proficiency was generally well maintained throughout follow-up. Clinician self-reports of MI skillfulness were unrelated to proficiency levels in observed practice.

1,004 citations


Cites methods from "Guidelines, Criteria, and Rules of ..."

  • ...Cicchetti (1994) has proposed categories to evaluate the usefulness of ICCs in clinical instruments: below .40 poor, .40 to .59 fair, .60 to .74 good, and .75 to 1.00 excellent....

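
The cutoffs quoted in the excerpt above can be written as a small lookup. This is a minimal Python sketch, not code from either paper: the function name `cicchetti_category` is hypothetical, and it assumes the two-decimal ranges are meant as continuous boundaries at .40, .60, and .75 (so a value such as .595 falls in "fair").

```python
def cicchetti_category(icc):
    """Map a reliability coefficient to the labels quoted from Cicchetti (1994).

    Assumption: the quoted ranges (.40 to .59, .60 to .74, .75 to 1.00) are
    treated as half-open intervals with boundaries at 0.40, 0.60 and 0.75.
    Negative estimates, which can occur in practice, fall under "poor".
    """
    if not (icc <= 1.0):  # also rejects NaN, since NaN comparisons are False
        raise ValueError("coefficient must be a number no greater than 1.0")
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    for value in (0.35, 0.40, 0.59, 0.60, 0.74, 0.75, 1.00):
        print(f"{value:.2f} -> {cicchetti_category(value)}")
```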

Journal ArticleDOI
TL;DR: The BDI-II is a relevant psychometric instrument, showing high reliability, capacity to discriminate between depressed and non-depressed subjects, and improved concurrent, content, and structural validity.

906 citations

References
Journal ArticleDOI
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.

64,109 citations

Journal ArticleDOI
TL;DR: The difficulties inherent in obtaining consistent and adequate diagnoses for the purposes of research and therapy have been pointed out and a wide variety of psychiatric rating scales have been developed.
Abstract: The difficulties inherent in obtaining consistent and adequate diagnoses for the purposes of research and therapy have been pointed out by a number of authors. Pasamanick¹² in a recent article viewed the low interclinician agreement on diagnosis as an indictment of the present state of psychiatry and called for "the development of objective, measurable and verifiable criteria of classification based not on personal or parochial considerations, but on behavioral and other objectively measurable manifestations." Attempts by other investigators to subject clinical observations and judgments to objective measurement have resulted in a wide variety of psychiatric rating scales.⁴,¹⁵ These have been well summarized in a review article by Lorr¹¹ on "Rating Scales and Check Lists for the Evaluation of Psychopathology." In the area of psychological testing, a variety of paper-and-pencil tests have been devised for the purpose of measuring specific

35,176 citations

Journal ArticleDOI
TL;DR: In this article, the authors present guidelines for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges, and the confidence intervals for each of the forms are reviewed.
Abstract: Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for the reliability and the application to be made of the reliability results. Confidence intervals for each of the forms are reviewed.

21,185 citations
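
To make the "n targets rated by k judges" setup concrete, here is a minimal Python sketch that computes three single-rating intraclass correlations from ANOVA mean squares. The labels ICC(1,1), ICC(2,1), and ICC(3,1) follow the notation commonly associated with this article; the function name `icc_single_rating` and the ratings table are illustrative assumptions. The three average-rating forms and the confidence intervals mentioned in the abstract are not covered.

```python
def icc_single_rating(ratings):
    """Single-rating ICCs for a complete n-targets x k-judges table.

    `ratings` is a list of n rows (targets), each holding k numeric scores
    (one per judge). Hypothetical helper for illustration only.
    """
    if not ratings:
        raise ValueError("ratings table is empty")
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with n >= 2 targets and k >= 2 judges")

    grand = sum(sum(row) for row in ratings) / (n * k)
    target_means = [sum(row) / k for row in ratings]
    judge_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in target_means)
    ss_judges = n * sum((m - grand) ** 2 for m in judge_means)
    ss_within = ss_total - ss_targets   # within-target variation
    ss_error = ss_within - ss_judges    # residual after removing judge effects

    # Mean squares.
    bms = ss_targets / (n - 1)              # between targets
    wms = ss_within / (n * (k - 1))         # within targets
    jms = ss_judges / (k - 1)               # between judges
    ems = ss_error / ((n - 1) * (k - 1))    # residual

    return {
        # One-way random effects: each target rated by a different set of judges.
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        # Two-way random effects: judges are a random sample, absolute agreement.
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        # Two-way mixed effects: these judges are the only ones of interest.
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


if __name__ == "__main__":
    # Illustrative table (assumed): 6 targets rated by 4 judges.
    table = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    for name, value in icc_single_rating(table).items():
        print(f"{name} = {value:.3f}")
```

For this table the three coefficients differ substantially because the judges use very different mean levels, which is one reason the choice of statistical model matters.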

Book
01 Jan 1981
TL;DR: This book covers statistical methods for rates and proportions, from inference for a single proportion and sample sizes for detecting a difference between two proportions to logistic and Poisson regression, matched samples, and the measurement of interrater agreement.
Abstract: Preface. Preface to the Second Edition. Preface to the First Edition. 1. An Introduction to Applied Probability. 2. Statistical Inference for a Single Proportion. 3. Assessing Significance in a Fourfold Table. 4. Determining Sample Sizes Needed to Detect a Difference Between Two Proportions. 5. How to Randomize. 6. Comparative Studies: Cross-Sectional, Naturalistic, or Multinomial Sampling. 7. Comparative Studies: Prospective and Retrospective Sampling. 8. Randomized Controlled Trials. 9. The Comparison of Proportions from Several Independent Samples. 10. Combining Evidence from Fourfold Tables. 11. Logistic Regression. 12. Poisson Regression. 13. Analysis of Data from Matched Samples. 14. Regression Models for Matched Samples. 15. Analysis of Correlated Binary Data. 16. Missing Data. 17. Misclassification Errors: Effects, Control, and Adjustment. 18. The Measurement of Interrater Agreement. 19. The Standardization of Rates. Appendix A. Numerical Tables. Appendix B. The Basic Theory of Maximum Likelihood Estimation. Appendix C. Answers to Selected Problems. Author Index. Subject Index.

16,435 citations