TL;DR: In this article, the authors show that valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class and that measures of the test items' instructional...
Abstract: Valid inferences on teaching drawn from students’ test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items’ instructional ...
TL;DR: In this replication and extension study of over 200 fourth- and fifth-grade US teachers, factor analyses suggested a single dimension for teacher knowledge, and value-added models showed that teacher knowledge positively predicts student achievement gains.
Abstract: During the last three decades, scholars have proposed several conceptual structures to represent teacher knowledge. A common denominator in this work is the assumption that disciplinary knowledge and the knowledge needed for teaching are distinct. However, empirical findings on the distinguishability of these two knowledge components, and their relationship with student outcomes, are mixed. In this replication and extension study, we explore these issues, drawing on evidence from a multi-year study of over 200 fourth- and fifth-grade US teachers. Exploratory and confirmatory factor analyses of these data suggested a single dimension for teacher knowledge. Value-added models predicting student test outcomes on both state tests and a test with cognitively challenging tasks revealed that teacher knowledge positively predicts student achievement gains. We consider the implications of these findings for teacher selection and education.
TL;DR: Active participation in classroom discourse is considered an important building block of school learning and an indicator of education-related participation.
Abstract: Active participation in classroom discourse is considered an important building block of school learning and an indicator of education-related participation. In the present study ...
TL;DR: In this article, the authors investigated test and item sensitivity to teaching quality, reanalyzing data from a quasi-experimental intervention study in primary school science education (1026 students, 53 classes, M_age = 8.79 years, SD_age = 0.49, 50% female).
TL;DR: In this article, conditions and consequences of teacher popularity in primary schools were investigated, and teacher popularity was embedded in a theoretical framework that describes relationships between teacher competence, teaching quality, and student outcomes.
Abstract: In this study, we investigated conditions and consequences of teacher popularity in primary schools. Teacher popularity is embedded in a theoretical framework that describes relationships between teacher competence, teaching quality, and student outcomes. In the empirical analyses, we used multilevel modeling to distinguish between individual students’ liking of the teacher and a teacher’s popularity as rated by the whole class (N = 1070 students, 54 teachers). The classroom level composite of the extent to which students liked their teacher was a reliable indicator of teacher popularity. Teacher popularity was associated with teacher self-reports of self-efficacy and teaching enthusiasm and with external observers’ ratings of teaching quality. The grades students received were not related to the popularity ratings. In a longitudinal study, teacher popularity predicted students’ learning gains and interest development over and above the effects of teaching quality. These results suggest that teacher popularity can be a useful and informative indicator in research on students’ academic development and teacher effectiveness.
TL;DR: In this article, the authors note that the instructional sensitivity of tests is rarely examined empirically, so it often remains unclear whether a test was not instructionally sensitive or the instruction was not effective.
Abstract: Students' test results regularly serve as a central criterion for judging the effectiveness of schools and instruction. Valid inferences about schools and instruction presuppose that the test instruments used can capture possible effects of instruction, that is, that they are instructionally sensitive. However, this prerequisite is rarely examined empirically. It therefore sometimes remains unclear whether a test was not instructionally sensitive or the instruction was not effective. Clarifying this question requires an empirical investigation of the instructional sensitivity of the tests and items used. While instructional sensitivity has long been discussed in the USA, the concept has so far received little attention in the German-language discourse. Our work therefore aims to embed the concept of instructional sensitivity in the German-language discourse on school achievement measurement. To this end, three topics are addressed: (a) the theoretical background of the concept of instructional sensitivity, (b) the measurement of instructional sensitivity, and (c) the identification of further research needs.
"Absolute and relative measures of i..." refers background in this paper
...To test the hypothesis of whether an item is instructionally sensitive, various measures have been proposed (see Haladyna & Roid, 1981; Polikoff, 2010)....
[...]
...In consistence with the predominant statistical notion of item sensitivity (see Haladyna & Roid, 1981; Haladyna, 2004; Polikoff, 2010), test sensitivity may be defined as the overall (i.e., unconditional) variation of test scores across either time points, groups, or both (cf. Naumann et al., 2016)....
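The definition quoted above (test sensitivity as overall variation of test scores across time points, groups, or both) can be illustrated with a purely descriptive sketch. This is not the cited authors' statistical model; the function name `sensitivity_components`, the data layout, and the toy scores are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def sensitivity_components(scores):
    """Descriptive variation of mean test scores across time points, groups, and both.

    scores: dict mapping (group, time) -> list of test scores (hypothetical layout).
    """
    groups = sorted({g for g, _ in scores})
    times = sorted({t for _, t in scores})
    # Mean score per time point, pooled over groups.
    time_means = [
        mean(x for (g, t), xs in scores.items() if t == tp for x in xs)
        for tp in times
    ]
    # Mean score per group, pooled over time points.
    group_means = [
        mean(x for (g, t), xs in scores.items() if g == gp for x in xs)
        for gp in groups
    ]
    # Mean score per group-by-time cell.
    cell_means = [mean(xs) for xs in scores.values()]
    return {
        "across_time": pvariance(time_means),
        "across_groups": pvariance(group_means),
        "across_both": pvariance(cell_means),
    }


# Toy data (invented): two classes tested before and after instruction.
toy_scores = {
    ("A", "pre"): [10, 12],
    ("A", "post"): [16, 18],
    ("B", "pre"): [11, 11],
    ("B", "post"): [12, 12],
}
result = sensitivity_components(toy_scores)
print(result)
```

Larger values indicate that mean scores differ more across the respective dimension; a model-based analysis would additionally separate this variation from measurement error.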
TL;DR: Bayesian tests (Bayes factor, deviance information criterion) are proposed which enable multiple marginal invariance hypotheses to be tested simultaneously and show that background information can be used to explain cross-national variation in item functioning.
Abstract: Random item effects models provide a natural framework for the exploration of violations of measurement invariance without the need for anchor items. Within the random item effects modelling framework, Bayesian tests (Bayes factor, deviance information criterion) are proposed which enable multiple marginal invariance hypotheses to be tested simultaneously. The performance of the tests is evaluated with a simulation study which shows that the tests have high power and low Type I error rate. Data from the European Social Survey are used to test for measurement invariance of attitude towards immigrant items and to show that background information can be used to explain cross-national variation in item functioning.
49 citations
"Absolute and relative measures of i..." refers methods in this paper
...We checked items’ absolute and relative differential sensitivity, that is, the variance components, following a procedure by Verhagen and colleagues (Verhagen & Fox, 2013; Verhagen, Levy, Millsap, & Fox, 2015)....
TL;DR: In this article, the authors developed a method for capturing the alignment between how teachers bring standards to life in their classrooms and how the standards are defined on a test, and found that emphasis on the standards alone did not account for achievement differences among classrooms.
Abstract: The accuracy of achievement test score inferences largely depends on the sensitivity of scores to instruction focused on tested objectives. Sensitivity requirements are particularly challenging for standards-based assessments because a variety of plausible instructional differences across classrooms must be detected. For this study, we developed a new method for capturing the alignment between how teachers bring standards to life in their classrooms and how the standards are defined on a test. Teachers were asked to report the degree to which they emphasized the state's academic standards, and to describe how they taught certain objectives from the standards. Two curriculum experts judged the alignment between how teachers brought the objectives to life in their classrooms and how the objectives were operationalized on the state test. Emphasis alone did not account for achievement differences among classrooms. The best predictors of classroom achievement were the match between how the standards w...
TL;DR: In this article, the authors consider the inclusion of person-by-item predictors into the model and distinguish between static and dynamic interaction models, focusing on models for differential item functioning (DIF) and local item dependencies.
Abstract: In this chapter we consider the inclusion of person-by-item predictors into the model. Unlike person predictors or item predictors, person-by-item predictors vary both within and between persons. The inclusion of person-by-item predictors besides person predictors or item predictors is relevant for modeling various phenomena such as differential item functioning (DIF) and local item dependencies (LID) (see Zwinderman, 1997). To describe models with person-by-item predictors we will distinguish between static and dynamic interaction models. We concentrate here on models for DIF and LID, but the interaction concept is of course more general.
45 citations
"Absolute and relative measures of i..." refers background in this paper
...DIF approaches from the groups perspective focus on cross-sectional data and may become computationally rather demanding when accounting for multilevel structures (multilevel DIF; Meulders & Xie, 2004)....
TL;DR: In this paper, the authors examined the potential to improve matching by conditioning simultaneously on test score and a categorical variable representing the educational background of the examinees using a logistic regression procedure.
Abstract: When tests are designed to measure dimensionally complex material, DIF analysis with matching based on the total test score may be inappropriate. Previous research has demonstrated that matching can be improved by using multiple internal or both internal and external measures to more completely account for the latent ability space. The present article extends this line of research by examining the potential to improve matching by conditioning simultaneously on test score and a categorical variable representing the educational background of the examinees. The responses of male and female examinees from a test of medical competence were analyzed using a logistic regression procedure. Results show a substantial reduction in the number of items identified as displaying significant DIF when conditioning is based on total test score and a variable representing educational background as opposed to total test score only.
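The idea of conditioning on an additional background variable in a logistic regression DIF analysis can be sketched as follows. This is a minimal illustration, not the article's procedure or data: the counts are invented and constructed so that item responses depend only on educational background, the total score is deliberately uninformative to keep the toy example exactly solvable, and `fit_logistic` is a hypothetical hand-rolled gradient-ascent fitter rather than a library routine.

```python
import math


def fit_logistic(rows, labels, steps=5000, lr=0.5):
    """Fit logistic regression by plain gradient ascent on the mean log-likelihood."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xi in enumerate(x):
                grad[j] += (y - p) * xi
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights


def build_toy_data():
    """Invented counts: the item depends on background only, and background
    is unevenly distributed across the two examinee groups."""
    # (group, background, n_correct, n_incorrect)
    cells = [(0, 0, 3, 9), (0, 1, 3, 1), (1, 0, 1, 3), (1, 1, 9, 3)]
    records = []
    for score in (0, 1):  # same cell counts at both (uninformative) score levels
        for group, background, n_correct, n_incorrect in cells:
            records += [(score, group, background, 1)] * n_correct
            records += [(score, group, background, 0)] * n_incorrect
    return records


records = build_toy_data()
labels = [y for _, _, _, y in records]

# Matching on total score only: columns are intercept, score, group.
score_only = [[1.0, s, g] for s, g, _, _ in records]
# Matching on total score and background: intercept, score, group, background.
score_and_background = [[1.0, s, g, b] for s, g, b, _ in records]

group_effect_score_only = fit_logistic(score_only, labels)[2]
group_effect_with_background = fit_logistic(score_and_background, labels)[2]

print("group coefficient, score only:", round(group_effect_score_only, 3))
print("group coefficient, score + background:", round(group_effect_with_background, 3))
```

In this constructed example, the group coefficient is clearly non-zero when only the total score is used for matching and shrinks to roughly zero once background is added, which mirrors the kind of reduction in flagged items that the abstract reports.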