Journal Article

On the Validity of Student Evaluation of Teaching: The State of the Art

01 Dec 2013 - Review of Educational Research (SAGE Publications) - Vol. 83, Iss. 4, pp. 598-642
TL;DR: The authors provided an extensive overview of the recent literature on student evaluation of teaching (SET) in higher education, based on the SET meta-validation model, drawing upon research reports published in peer-reviewed journals since 2000.
Abstract: This article provides an extensive overview of the recent literature on student evaluation of teaching (SET) in higher education. The review is based on the SET meta-validation model, drawing upon research reports published in peer-reviewed journals since 2000. Through the lens of validity, we consider both the more traditional research themes in the field of SET (i.e., the dimensionality debate, the ‘bias’ question, and questionnaire design) and some recent trends in SET research, such as online SET and bias investigations into additional teacher personal characteristics. The review provides a clear idea of the state of the art with regard to research on SET, thus allowing researchers to formulate suggestions for future research. It is argued that SET remains a current yet delicate topic in higher education, as well as in education research. Many stakeholders are not convinced of the usefulness and validity of SET for both formative and summative purposes. Research on SET has thus far failed to provide c...
Citations
Journal Article
TL;DR: This article conducted a meta-analysis of all multisection studies and found no significant correlations between student evaluation of teaching (SET) ratings and learning, and suggested that institutions focused on student learning and career success may want to abandon SET ratings as a measure of faculty's teaching effectiveness.

352 citations

Journal Article
TL;DR: This article used data from a French university to analyze gender biases in student evaluations of teaching (SETs) and found that male students express a bias in favor of male professors, despite the fact that students appear to learn as much from women as from men.

296 citations

Journal Article
TL;DR: This paper found that teachers' conceptions of and approaches to teaching with technology are central for the successful implementation of educational technologies in higher education.
Abstract: Research indicates that teachers’ conceptions of and approaches to teaching with technology are central for the successful implementation of educational technologies in higher education. This study ...

256 citations

Journal Article
TL;DR: This paper noted that, if media reports are to be believed, Australian universities are facing a significant and growing problem of students outsourcing their assessment to third parties, a behaviour commonly known as ‘c...
Abstract: If media reports are to be believed, Australian universities are facing a significant and growing problem of students outsourcing their assessment to third parties, a behaviour commonly known as ‘c...

226 citations

Journal Article
TL;DR: A review of over 100 years of research on grading considers five types of studies: early studies of the reliability of grades, quantitative studies of the composition of K–12 report card grades, survey and interview studies of teachers' perceptions of grades, studies of standards-based grading, and grading in higher education.
Abstract: Grading refers to the symbols assigned to individual pieces of student work or to composite measures of student performance on report cards. This review of over 100 years of research on grading considers five types of studies: (a) early studies of the reliability of grades, (b) quantitative studies of the composition of K–12 report card grades, (c) survey and interview studies of teachers’ perceptions of grades, (d) studies of standards-based grading, and (e) grading in higher education. Early 20th-century studies generally condemned teachers’ grades as unreliable. More recent studies of the relationships of grades to tested achievement and survey studies of teachers’ grading practices and beliefs suggest that grades assess a multidimensional construct containing both cognitive and noncognitive factors reflecting what teachers value in student work. Implications for future research and for grading practices are discussed.

182 citations

References
Journal Article
TL;DR: It seems clear that the items in the Edwards Social Desirability Scale would, of necessity, have extreme social desirability scale positions or, in other words, be statistically deviant.
Abstract: It has long been recognized that personality test scores are influenced by non-test-relevant response determinants. Wiggins and Rumrill (1959) distinguish three approaches to this problem. Briefly, interest in the problem of response distortion has been concerned with attempts at statistical correction for "faking good" or "faking bad" (Meehl & Hathaway, 1946), the analysis of response sets (Cronbach, 1946, 1950), and ratings of the social desirability of personality test items (Edwards, 1957). A further distinction can be made, however, which results in a somewhat different division of approaches to the question of response distortion. Common to both the Meehl and Hathaway corrections for faking good and faking bad and Cronbach's notion of response sets is an interest in the test behavior of the subject (S). By social desirability, on the other hand, Edwards primarily means the "scale value for any personality statement such that the scale value indicates the position of the statement on the social desirability continuum . . ." (1957, p. 3). Social desirability, thus, has been used to refer to a characteristic of test items, i.e., their scale position on a social desirability scale. Whether the test behavior of Ss or the social desirability properties of items are the focus of interest, however, it now seems clear that underlying both these approaches is the concept of statistical deviance. In the construction of the MMPI K scale, for example, items were selected which differentiated between clinically normal persons producing abnormal test profiles and clinically abnormal individuals with abnormal test profiles, and between clinically abnormal persons with normal test profiles and abnormal Ss whose test records were abnormal. Keyed responses to the K scale items tend to be statistically deviant in the parent populations. Similarly, the development of the Edwards Social Desirability Scale (SDS) illustrates this procedure.
Items were drawn from various MMPI scales (F, L, K, and the Manifest Anxiety Scale [Taylor, 1953]) and submitted to judges who categorized them as either socially desirable or socially undesirable. Only items on which there was unanimous agreement among the 10 judges were included in the SDS. It seems clear that the items in Edwards SDS would, of necessity, have extreme social desirability scale positions or, in other words, be statistically deviant. Some unfortunate consequences follow from the strict use of the statistical deviance model in the development of social desirability scales. With items drawn from the MMPI, it is apparent that in addition to their scalability for social desirability the items may also be characterized by their content which, in a general sense, has pathological implications. When a social desirability scale constructed according to this procedure is then applied to a college student population, the meaning of high social desirability scores is not at all clear. When Ss given the Edwards SDS deny, for example, that their sleep is fitful and disturbed (Item 6) or that they worry quite a bit over possible misfortunes (Item 35), it cannot be determined whether these responses are attributable to social desirability or to a genuine absence of such symptoms. The probability of occurrence of the symptoms represented in MMPI items (and incorporated in the SDS)

8,478 citations


"On the Validity of Student Evaluati..." refers background in this paper

  • ...In this regard, future SET research could also explore the simultaneous administration of SET and such measures as the Marlowe-Crowne Social Desirability Bias Index (Crowne & Marlowe, 1960)....

    [...]

Journal Article
TL;DR: In this paper, the authors propose a unified concept of construct validity, which integrates considerations of content, criteria, and consequences into a construct framework for the empirical testing of rational hypotheses about score meaning and theoretically relevant relationships.
Abstract: The traditional conception of validity divides it into three separate and substitutable types—namely, content, criterion, and construct validities. This view is fragmented and incomplete, especially because it fails to take into account both evidence of the value implications of score meaning as a basis for action and the social consequences of score use. The new unified concept of validity interrelates these issues as fundamental aspects of a more comprehensive theory of construct validity that addresses both score meaning and social values in test interpretation and test use. That is, unified validity integrates considerations of content, criteria, and consequences into a construct framework for the empirical testing of rational hypotheses about score meaning and theoretically relevant relationships, including those of an applied and a scientific nature. Six distinguishable aspects of construct validity are highlighted as a means of addressing central issues implicit in the notion of validity as a unified concept. These are content, substantive, structural, generalizability, external, and consequential aspects of construct validity. In effect, these six aspects function as general validity criteria or standards for all educational and psychological measurement, including performance assessments, which are discussed in some detail because of their increasing emphasis in educational and employment settings.

3,141 citations

01 Jan 1989

3,037 citations

Journal Article
TL;DR: Suggestions for improving the effectiveness of evaluation strategy are to seek to obtain the highest response rates possible to all surveys; to take account of probable effects of survey design and methods on the feedback obtained when interpreting that feedback; and to enhance this action by making use of data derived from multiple methods of gathering feedback.
Abstract: This article is about differences between, and the adequacy of, response rates to online and paper‐based course and teaching evaluation surveys. Its aim is to provide practical guidance on these matters. The first part of the article gives an overview of online surveying in general, a review of data relating to survey response rates and practical advice to help boost response rates. The second part of the article discusses when a response rate may be considered large enough for the survey data to provide adequate evidence for accountability and improvement purposes. The article ends with suggestions for improving the effectiveness of evaluation strategy. These suggestions are: to seek to obtain the highest response rates possible to all surveys; to take account of probable effects of survey design and methods on the feedback obtained when interpreting that feedback; and to enhance this action by making use of data derived from multiple methods of gathering feedback.

2,413 citations


"On the Validity of Student Evaluati..." refers background in this paper

  • ...In recent years, however, electronic evaluation appears to have replaced the classic paper-and-pencil questionnaire as the most common means of gathering SET in institutions throughout the world (Arnold, 2009; Nulty, 2008)....

    [...]

Journal Article
TL;DR: In this article, six distinguishable aspects of construct validity are highlighted as a means of addressing central issues implicit in the notion of validity as a unified concept, namely, content, substantive, structural, generalizability, external, and consequential aspects.
Abstract: The traditional conception of validity divides it into three separate and substitutable types – namely, content, criterion, and construct validities. This view is fragmented and incomplete, especially in failing to take into account evidence of the value implications of score meaning as a basis for action and of the social consequences of score use. The new unified concept of validity interrelates these issues as fundamental aspects of a more comprehensive theory of construct validity addressing both score meaning and social values in both test interpretation and test use. That is, unified validity integrates considerations of content, criteria, and consequences into a construct framework for empirically testing rational hypotheses about score meaning and theoretically relevant relationships, including those of both an applied and a scientific nature. Six distinguishable aspects of construct validity are highlighted as a means of addressing central issues implicit in the notion of validity as a unified concept. These are content, substantive, structural, generalizability, external, and consequential aspects of construct validity. In effect, these six aspects function as general validity criteria or standards for all educational and psychological measurement, including performance assessments, which are discussed in some detail because of their increasing emphasis in educational and employment settings.

1,699 citations
