Journal ArticleDOI

Accountability, incentives and behavior: the impact of high-stakes testing in the Chicago Public Schools

01 Jun 2005-Journal of Public Economics (North-Holland)-Vol. 89, Iss: 5, pp 761-796
TL;DR: The authors examined the impact of an accountability policy implemented in the Chicago Public Schools in 1996-1997, using a panel of student-level, administrative data, and found that math and reading achievement increased sharply following the introduction of the accountability policy, in comparison to both prior achievement trends in the district and to changes experienced by other large, urban districts.
About: This article is published in Journal of Public Economics. The article was published on 2005-06-01 and is currently open access. It has received 554 citations to date. The article focuses on the topics: Accountability & Special education.

Summary (6 min read)

1. Introduction

  • If HST increased the general skill level, observed achievement gains should be reflected in other measures of student outcomes.
  • By placing low-performing students in special education programs, teachers are able to exempt them from most …
  • Achievement gains may also be due to increases in cheating on the part of students, teachers or administrators.
  • This paper addresses these questions in the context of a test-based accountability policy that was implemented in Chicago Public Schools in 1996-97.
  • On the one hand, they provide strong empirical support for general incentive theories, including the multi-task theories of Holmstrom and Milgrom (1991).

2. Background

  • The evidence on school-based accountability programs and student performance is decidedly mixed.
  • Several studies note that Texas students have made substantial achievement gains since the implementation of that state’s accountability program (Grissmer and Flanagan 1998, Grissmer et al. 2000, Haney 2000, Klein et al. 2000, Toenjes et al. 2000, Deere and Strayer 2001).
  • Koretz and Barron (1998) find survey evidence that elementary teachers in Kentucky shifted the amount of time devoted to math and science across grades to correspond with the subjects tested in each grade.
  • Various studies suggest that test preparation associated with high-stakes testing may artificially inflate achievement, producing gains that are not generalizable to other exams (Linn and Graue 1990, Shepard 1990, Koretz et al. 1991, Koretz and Barron 1998, Stecher and Barron 1998, Klein et al. 2000).

2.2 High-Stakes Testing in Chicago

  • In 1996 the ChiPS introduced a comprehensive accountability policy designed to raise academic achievement.
  • The first component of the policy focused on holding students accountable for learning, by ending a practice commonly known as “social promotion” whereby students are advanced to the next grade regardless of ability or achievement level.
  • Students who again fail to meet the standard are required to repeat the grade, with the exception of 15-year-olds who attend newly created “transition” centers.
  • The same whether one considers the eighth grade policy to have been implemented in 1996 or 1997.

3. Empirical strategy

  • Because Chicago instituted its accountability policy district-wide in 1996-97, it is difficult to identify the causal impact of the program with certainty.
  • Similarly, improvements in the economy or other time-varying factors coincident with the policy would bias their estimates.
  • Finally, one might be worried about other policies or programs in Chicago whose impact was felt at the same time as HST, so that Cov(HighStakes, φ) ≠ 0.
  • This is essentially a difference-in-difference estimator where the first difference is a within student change over time and the second difference is a district-wide change from pre-policy to post-policy.
  • One might be particularly concerned about unobservable changes on the state or national level affecting student performance (e.g., implementation of state or federal school reform legislation).
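
The difference-in-difference logic described in this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's regression specification: the record layout, the function name `did_estimate`, and the toy scores are all hypothetical, and the sketch omits the covariates and pre-existing trend controls that an actual analysis would need.

```python
from statistics import mean


def did_estimate(records):
    """Mean within-student gain after the policy minus the mean gain before it.

    records: iterable of (cohort, prior_score, current_score) tuples, where
    cohort is "pre" or "post". This layout is hypothetical, for illustration.
    """
    gains = {"pre": [], "post": []}
    for cohort, prior, current in records:
        # First difference: each student's own change over time.
        gains[cohort].append(current - prior)
    # Second difference: district-wide change from pre-policy to post-policy.
    return mean(gains["post"]) - mean(gains["pre"])


# Toy data with invented numbers (not taken from the paper).
toy = [
    ("pre", 3.0, 3.5),
    ("pre", 4.0, 4.5),
    ("post", 3.0, 3.75),
    ("post", 4.0, 4.75),
]
estimate = did_estimate(toy)
print(estimate)
```

Because the second difference compares cohorts over time within one district, anything else that changed alongside the policy is absorbed into the estimate, which is the concern the bullets above raise.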

4. Data

  • This study utilizes detailed administrative data from the ChiPS.
  • Student records include information on a student’s school, home address, demographic and family background characteristics, special education and bilingual placement, free lunch status, standardized test scores, grade retention and summer school attendance.
  • On the other hand, there was some increase in initial student achievement—e.g., prior reading achievement increased from an average of 0.89 grade equivalents below norms to 0.71 grade equivalents below norms.

5.2 The Heterogeneity of Effects Across Student and School Risk Level

  • If the improvements in student achievement were caused by the accountability policy, one might expect them to vary across students and schools.
  • Model 1 provides the average effect for all students in all of the post-policy cohorts, providing a baseline from which to compare the other results.
  • First, students in low-performing schools seem to have fared considerably better under the policy than comparable peers in higher-performing schools.
  • Moreover, the effect for marginal students appears somewhat stronger in reading than math, suggesting that there may be more intentional targeting of individual students in reading than in math, or that there is greater divisibility in the production of reading achievement.

5.3 Student-Focused versus School-Focused Accountability

  • Unlike most previous accountability systems, high-stakes testing in Chicago provided direct incentives for students as well as teachers.
  • Table 5 presents the policy effects for grades three, six and eight (i.e., promotional gate grades) versus grades four, five and seven (i.e., nongate grades).
  • Finally, it is possible that the first year effects were somewhat anomalous, perhaps because students and teachers were still adjusting to the policy or because the form change that year may have affected grades differentially.
  • Tables available from the author upon request.
  • The 1998 accountability effects are at least twice as large in grades three, six and eight compared with grade five (for example, 0.144 versus 0.067 s.d. gain in math), suggesting that the student accountability provisions may have played a large role in the overall policy in later years.

6. What factors are driving the improvements in performance in Chicago?

  • Even if a positive causal relationship between HST and student achievement can be established, it is important to understand what factors are driving the improvements in performance.
  • Critics of test-based accountability often argue that the primary impact of HST is to increase the time spent on test-specific preparation activities, which could improve test-specific skills at the expense of more general skills.
  • Others argue that test score gains reflect student motivation on the day of the exam.
  • Unfortunately, because such things as effort and test preparation are not directly observable, it is difficult to disentangle the factors underlying the achievement gains in Chicago.
  • This section attempts to shed some light on the factors driving the achievement gains in Chicago, first by comparing student performance across exams and then by examining the ITBS improvements in greater detail.

6.1 The Role of General Skills

  • Even the most comprehensive achievement exam can only cover a fraction of the possible skills and topics within a particular domain.
  • Differences in student effort across exams (or rather changes in student effort) also complicate the comparison of performance trends from one test to another.
  • The data for this analysis is drawn from school “report cards” compiled by the Illinois State Board of Education (ISBE) which provide average IGAP scores by grade and subject as well as background information on schools and districts.
  • To identify the comparison districts, I first identify districts in the top decile in terms of the percent of students receiving free or reduced price lunch, percent minority students, and total enrollment and in the bottom decile in terms of average student achievement (averaged over third, sixth and eighth grade reading and math scores) based on 1990 data.
  • The point estimates indicate that once the authors take into account district-specific pre-existing trends and demographics, HST appears to have a slight negative effect on IGAP achievement in Chicago.
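
The comparison-district screen described above can be illustrated with a short Python sketch. It is not the author's procedure: it assumes that a district must satisfy all four criteria at once, ranks districts without any special handling of ties, and uses hypothetical field names (`pct_free_lunch`, `pct_minority`, `enrollment`, `achievement`) and invented toy data.

```python
def decile_members(districts, key, top=True):
    """Names of districts in the top (or bottom) tenth when ranked by key."""
    ranked = sorted(districts, key=lambda d: d[key], reverse=top)
    count = max(1, len(ranked) // 10)  # size of one decile, at least one district
    return {d["name"] for d in ranked[:count]}


def select_comparison_districts(districts):
    """Districts in the bottom achievement decile and the top decile of each need measure."""
    selected = decile_members(districts, "achievement", top=False)
    for key in ("pct_free_lunch", "pct_minority", "enrollment"):
        # Assumption: criteria are combined by intersection.
        selected &= decile_members(districts, key, top=True)
    return sorted(selected)


# Toy data: 20 invented districts, so each decile holds two of them.
toy = [
    {"name": f"D{i}", "pct_free_lunch": i, "pct_minority": i,
     "enrollment": 1000 * i, "achievement": 100 - i}
    for i in range(20)
]
comparison = select_comparison_districts(toy)
print(comparison)
```

Requiring every criterion simultaneously yields a small, closely matched comparison group; a looser rule (for example, meeting any three of the four) would trade similarity for sample size.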

6.2 The Role of Specific Skills

  • Based on analysis of teacher survey data, Tepper (2002) concluded that ITBS-specific test preparation and curriculum alignment increased following the introduction of the accountability policy.
  • Column 1 classifies questions into two groups: those testing basic skills such as math computation and number concepts and those testing more complex skills such as estimation, data interpretation and problem-solving (i.e., word problems).
  • Column 2 separates items into five categories—computation, number concept, data interpretation, estimation and problem-solving— and shows the same pattern.
  • The item difficulty measures are the percentage of students correctly answering the item in a nationally representative sample used by the test publisher to norm the exam.
  • This analysis suggests that test preparation may have played a large role in the math gains, but was perhaps less important in reading improvement.

6.3 The Role of Effort

  • Student effort is another likely candidate for explaining the large ITBS gains.
  • Test completion is one indicator of effort.
  • This pattern is true even among the lowest achieving students who left the greatest number of items blank prior to the accountability policy.
  • While increased guessing cannot explain a significant portion of the ITBS gains, other forms of effort may play a larger role.
  • Comparing the gain across item position groups, the authors see that 1998 students improved nearly 6.7 percentage points on the final 20 percent of items.
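
The item-position comparison described above can be sketched as follows. This is an illustrative Python sketch, not the paper's method: the response encoding (1 = correct, 0 = wrong, `None` = left blank), the function names, and the toy cohorts are assumptions, and blanks are simply counted as not correct.

```python
def pct_correct_by_position(responses, groups=5):
    """Percent correct within each item-position group (default: fifths of the exam).

    responses: list of per-student answer lists of equal length, with
    1 = correct, 0 = wrong, None = blank. Assumes at least `groups` items.
    """
    n_items = len(responses[0])
    correct = [0] * groups
    seen = [0] * groups
    for student in responses:
        for position, answer in enumerate(student):
            group = position * groups // n_items  # 0 = first items, groups - 1 = last
            correct[group] += 1 if answer == 1 else 0
            seen[group] += 1
    return [100.0 * c / s for c, s in zip(correct, seen)]


def gain_by_position(before, after, groups=5):
    """Percentage-point change in percent correct for each position group."""
    pre = pct_correct_by_position(before, groups)
    post = pct_correct_by_position(after, groups)
    return [b - a for a, b in zip(pre, post)]


# Toy cohorts with invented answers on a 10-item exam (two students each).
before = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, None],
    [1, 1, 1, 1, 1, 1, 0, 0, None, None],
]
after = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, None],
]
gains = gain_by_position(before, after)
print(gains)
```

A gain concentrated in the last group, as in this toy example, is the pattern that would point toward effort or stamina rather than broad skill improvement.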

6.4. Summary

  • The improvement in math achievement in Chicago appears to be driven largely by gains in specific skill areas such as math computation that make up a large portion of the ITBS, but are emphasized less on the IGAP.
  • This suggests that teachers aligned their math curriculum to more closely match the content of the high-stakes exam.
  • In reading, ITBS gains were equally distributed across item types, but were considerably larger among questions at the end of the exam.
  • This suggests that student effort or “stamina” played a larger role than test preparation in the observed reading improvements.
  • The fact that IGAP trends did not jump sharply following the introduction of the accountability policy confirms that the ITBS gains were not driven entirely by improvements in general skills.

7. Did educators respond strategically to high-stakes testing?

  • In evaluating the effectiveness of HST, it is important to understand whether teachers and administrators respond strategically to the incentives provided by the accountability policy.
  • Critics of test-based accountability worry about educator responses along a number of dimensions, ranging from changes in the rate of special education placements to substitution away from low-stakes subjects.
  • This section examines several of these issues.

7.1 Low-stakes versus high-stakes subjects

  • Given the consequences attached to test performance in certain subjects, one might expect teachers and students to shift resources and attention toward subjects included in the accountability program.
  • The authors can test this theory by comparing trends in math and reading achievement after the introduction of HST with test score trends in social studies and science, subjects that are not included in the Chicago accountability policy.
  • Unfortunately science and social studies exams are not given in every grade, and the grades in which these exams are given has changed over time.
  • The distribution of effects is also somewhat different for low versus high-stakes subjects.
  • As the authors noted earlier, in math and reading, students in low-achieving schools experienced greater gains. However, conditional on school achievement, low-ability students appeared to make only slightly larger gains than their peers.

7.2 Special education placements

  • While the accountability policies in Chicago are designed to increase student achievement, they also create incentives for teachers and administrators to alter the pool of test-takers.
  • The sample only includes third, sixth and eighth grade students from 1994 to 2000 because some special education and reporting data is not available for the 1993 cohort.
  • Figures available from the author upon request.
  • Beginning in 1997, ChiPS excluded the ITBS scores of students who had been enrolled in bilingual programs for three or fewer years to encourage teachers to test these students for …
  • It appears that the trend became steeper beginning in 1997, suggesting that the accountability policy may have influenced teacher and administrator behavior.
  • The lowest performing schools increased special education placements for high-risk sixth graders by 50 percent following the introduction of the accountability policy, compared with an increase of roughly 32 percent among moderate-achieving schools and no increase among the highest performing schools.

7.3 Grade retention

  • Another way for teachers to shield low-achieving students from the accountability mandates is to preemptively retain them—that is, hold them back before they enter grade three, six or eight.
  • Roderick et al. (2000) found that retention rates in kindergarten, first and second grades started to rise in 1996 and jumped sharply in 1997 among first and second graders.
  • Grade, 2.5 percent in second grade and a little over 1 percent in grades four, five and seven.
  • Retention rates began to increase in 1996, possibly in anticipation of the new standards the students would face in 1997.
  • The bottom panel controls for current achievement, age and special education status as well as demographic variables, thereby accounting for prior retention and giving a better sense of the marginal effect of the policy on the propensity to retain students.

7.4 Sensitivity analysis

  • To test the sensitivity of the findings presented in the previous sections, Table 13 presents comparable estimates for a variety of different specifications and samples.
  • The next three rows show that the results are not sensitive to including students who either were in that grade for the second time (e.g., retained students) or whose test scores were not included for official reporting purposes because of a special education or bilingual classification.
  • This should control for any changes in form difficulty that may confound the results.

8. Conclusions

  • When the federal legislation No Child Left Behind became law earlier this year, high-stakes testing took on a heightened level of importance for students, teachers and parents across the country.
  • If the authors make the conservative assumption that special education rates increased by two percentage points in all grades (mirroring the increases they saw in grades three, six and eight), this would translate to an additional expenditure of $40 per pupil.
  • “Comparing State and District Results to National Norms: The Validity of the Claim that 'Everyone is Above Average'.” Educational Measurement: Issues and Practice 9(3): 5-14.


Citations
BookDOI
TL;DR: The role of education in promoting economic well-being, focusing on the role of educational quality, has become controversial because expansion of school attainment has not guaranteed improved economic conditions as mentioned in this paper.
Abstract: The role of improved schooling, a central part of most development strategies, has become controversial because expansion of school attainment has not guaranteed improved economic conditions. This paper reviews the role of education in promoting economic well-being, focusing on the role of educational quality. It concludes that there is strong evidence that the cognitive skills of the population-rather than mere school attainment-are powerfully related to individual earnings, to the distribution of income, and to economic growth. New empirical results show the importance of both minimal and high-level skills, the complementarity of skills and the quality of economic institutions, and the robustness of the relationship between skills and growth. International comparisons incorporating expanded data on cognitive skills reveal much larger skill deficits in developing countries than generally derived from just school enrollment and attainment. The magnitude of change needed makes it clear that closing the economic gap with industrial countries will require major structural changes in schooling institutions.

808 citations

Journal ArticleDOI
TL;DR: This paper asks whether teachers' impacts on students' test scores (value-added) are a good measure of their quality, a question that has sparked debate partly because of a lack of evidence on whether high value-ad...
Abstract: Are teachers' impacts on students' test scores (value-added) a good measure of their quality? This question has sparked debate partly because of a lack of evidence on whether high value-ad...

693 citations

Journal ArticleDOI
TL;DR: Jacob and Levitt investigated the prevalence and predictors of teacher cheating.
Abstract: NBER WORKING PAPER SERIES ROTTEN APPLES: AN INVESTIGATION OF THE PREVALENCE AND PREDICTORS OF TEACHER CHEATING Brian A. Jacob Steven D. Levitt Working Paper 9413 http://www.nber.org/papers/w9413 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 December 2002 We would like to thank Suzanne Cooper, Mark Duggan, Sue Dynarski, Arne Duncan, Michael Greenstone, James Heckman, Lars Lefgren, and seminar participants too numerous to mention for helpful comments and discussions. We also thank Arne Duncan, Phil Hansen, Carol Perlman, and Jessie Qualles of the Chicago Public Schools for their help and cooperation on the project. Financial support was provided by the National Science Foundation and the Sloan Foundation. All remaining errors are our own. The views expressed herein are those of the authors and not necessarily those of the National Bureau of Economic Research. © 2002 by Brian A. Jacob and Steven D. Levitt. All rights reserved. Short sections of text not to exceed two paragraphs, may be quoted without explicit permission provided that full credit including, © notice, is given to the source.

660 citations

Journal ArticleDOI
TL;DR: A review of the theoretical and empirical literature on quality disclosure and certification can be found in this paper, with a particular focus on healthcare, education, and finance, and the empirical review covers quality measurement, the effect of third-party disclosure on consumer choice and seller behavior as well as the economics of certifiers.
Abstract: This essay reviews the theoretical and empirical literature on quality disclosure and certification. After comparing quality disclosure with other quality assurance mechanisms and describing a brief history of quality disclosure, we address two sets of theoretical issues. First, why don't sellers voluntarily disclose through a process of "unraveling" and, given the lack of unraveling, is it desirable to mandate seller disclosure? Second, when we rely on certifiers to act as the intermediary of quality disclosure, do certifiers necessarily report unbiased and accurate information? We further review empirical evidence on these issues, with a particular focus on healthcare, education, and finance. The empirical review covers quality measurement, the effect of third-party disclosure on consumer choice and seller behavior, as well as the economics of certifiers. (JEL D18, K32, L15, M31)

604 citations

Journal ArticleDOI
TL;DR: This article provided the first empirical test of the causal impact of HCZ charters on educational outcomes, finding that the effects of attending an HCZ middle school are enough to close the black-white achievement gap in mathematics.
Abstract: Harlem Children's Zone (HCZ), an ambitious social experiment, combines community programs with charter schools. We provide the first empirical test of the causal impact of HCZ charters on educational outcomes. Both lottery and instrumental variable identification strategies suggest that the effects of attending an HCZ middle school are enough to close the black-white achievement gap in mathematics. The effects in elementary school are large enough to close the racial achievement gap in both mathematics and ELA. We conclude with evidence that suggests high-quality schools are enough to significantly increase academic achievement among the poor. Community programs appear neither necessary nor sufficient. (JEL H75, I21, I28, J13, R23)

556 citations

References
01 Jan 1998
TL;DR: This paper examines the claim that curriculum-based external exit exam systems (CBEEES), based on world class content standards, will improve teaching and learning of core subjects.
Abstract: Two presidents, the National Governors Association and numerous blue ribbon panels have called for the development of state content standards for core subjects and examinations that assess the achievement of these standards. The Competitiveness Policy Council, for example, advocates that "external assessments be given to individual students at the secondary level and that the results should be a major but not exclusive factor qualifying for college and better jobs at better wages (1993, p. 30)." The American Federation of Teachers advocates a system in which: Students are periodically tested on whether they're reaching the standards, and if they are not, the system responds with appropriate assistance and intervention. Until they meet the standards, they won't be able to graduate from high school or enter college (AFT 1995 p. 1-2). It is claimed that curriculum-based external exit exam systems (CBEEES), based on world class content standards will improve teaching and learning of core subjects. What evidence is there for this claim? New York's Regents Exams are an example of such a system. Do New York students outperform students with similar socio-economic backgrounds from other states? Outside the United States such systems are the rule, not the exception. What impacts have such systems had on school policies, teaching and student learning?

55 citations

Journal ArticleDOI
TL;DR: This paper investigated the relationship between school-level minimum competency testing (MCT) programs and student reading proficiency as measured by the 1983-1984 National Assessment of Education (NAE).
Abstract: This study investigates the relationship between school-level minimum competency testing (MCT) programs and student reading proficiency as measured by the 1983–1984 National Assessment of Education...

43 citations

01 Mar 1994
TL;DR: The influence of state-mandated MCT on teaching and learning as reflected in the National Assessment of Educational Progress (NAEP) was investigated by comparing the performance of participants in the 1978 mathematics assessment with performance of students in the 1986 assessment; the same set of items was used on both occasions.
Abstract: Past research on the effects of Minimum Competency Tests (MCT) on teaching and learning is reviewed, and the large database of the National Assessment of Educational Progress (NAEP) is used to shed more light on these effects. There seems to be little doubt that MCT, together with associated changes in instructional methods, has produced some substantial changes in student performance. The influence of state-mandated MCT on the quality of teaching and learning as reflected in the NAEP was investigated by comparing the performance of participants in the 1978 mathematics assessment with performance of participants in the 1986 assessment; the same set of items was used on both occasions. The 1978 assessment occurred before MCT were in general use. The better performance of students in 1986 was probably due to the efforts of teachers who made use of MCTs and high-stakes tests. The younger students in 1986 apparently profit...d more from the MCTs and high-stakes tests than the older students did. It seems reasonable to conclude that the use of MCTs can have desirable influences on performance of young students as measured by the NAEP. Nine tables, including some in an appendix titled "Some Broader NEP Methods and Interests," present study findings. (Contains 16 references.) (SLD)

42 citations

Journal ArticleDOI
TL;DR: In the early 1990s, the Charlotte-Mecklenburg school system (CMS) began a sweeping program of school reform that involved revamped standards, a comprehensive system of benchmark goals, increased au...
Abstract: In the early 1990s, the Charlotte-Mecklenburg school system (CMS) began a sweeping program of school reform that involved revamped standards, a comprehensive system of benchmark goals, increased au...

36 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "Nber working paper series accountability, incentives and behavior: the impact of high-stakes testing in the chicago public schools" ?

This study examines the impact of an accountability policy implemented in the Chicago Public Schools in 1996-97.