
Journal ArticleDOI

Accountability, incentives and behavior: the impact of high-stakes testing in the Chicago Public Schools

01 Jun 2005-Journal of Public Economics (North-Holland)-Vol. 89, Iss: 5, pp 761-796

Abstract: The recent federal education bill, No Child Left Behind, requires states to test students in grades 3 to 8 each year and to judge school performance on the basis of these test scores. While intended to maximize student learning, there is little empirical evidence about the effectiveness of such policies. This study examines the impact of an accountability policy implemented in the Chicago Public Schools in 1996–1997. Using a panel of student-level, administrative data, I find that math and reading achievement increased sharply following the introduction of the accountability policy, in comparison to both prior achievement trends in the district and to changes experienced by other large, urban districts in the mid-west. However, for younger students, the policy did not increase performance on a state-administered, low-stakes exam. An item-level analysis suggests that the observed achievement gains were driven by increases in test-specific skills and student effort. I also find that teachers responded strategically to the incentives along a variety of dimensions: by increasing special education placements, preemptively retaining students and substituting away from low-stakes subjects like science and social studies.

Topics: Accountability (54%), Special education (54%), Test (assessment) (52%), Incentive (51%)

Summary (6 min read)

1. Introduction

  • If HST increased the general skill level, observed achievement gains should be reflected in other measures of student outcomes.
  • By placing low-performing students in special education programs, teachers are able to exempt them from most ...
  • Achievement gains may also be due to increases in cheating on the part of students, teachers or administrators.
  • This paper addresses these questions in the context of a test-based accountability policy that was implemented in Chicago Public Schools in 1996-97.
  • On the one hand, they provide strong empirical support for general incentive theories, including the multi-task theories of Holmstrom and Milgrom (1991).

2. Background

  • The evidence on school-based accountability programs and student performance is decidedly mixed.
  • Several studies note that Texas students have made substantial achievement gains since the implementation of that state’s accountability program (Grissmer and Flanagan 1998, Grissmer et al. 2000, Haney 2000, Klein et al. 2000, Toenjes et al. 2000, Deere and Strayer 2001).
  • Koretz and Barron (1998) find survey evidence that elementary teachers in Kentucky shifted the amount of time devoted to math and science across grades to correspond with the subjects tested in each grade.
  • Various studies suggest that test preparation associated with high-stakes testing may artificially inflate achievement, producing gains that are not generalizable to other exams (Linn and Graue 1990, Shepard 1990, Koretz et al. 1991, Koretz and Barron 1998, Stecher and Barron 1998, Klein et al. 2000).

2.2 High-Stakes Testing in Chicago

  • In 1996 the ChiPS introduced a comprehensive accountability policy designed to raise academic achievement.
  • The first component of the policy focused on holding students accountable for learning, by ending a practice commonly known as “social promotion” whereby students are advanced to the next grade regardless of ability or achievement level.
  • Students who again fail to meet the standard are required to repeat the grade, with the exception of 15-year-olds who attend newly created “transition” centers.
  • The same whether one considers the eighth grade policy to have been implemented in 1996 or 1997.

3. Empirical strategy

  • Because Chicago instituted its accountability policy district-wide in 1996-97, it is difficult to identify the causal impact of the program with certainty.
  • Similarly, improvements in the economy or other time-varying factors coincident with the policy would bias their estimates.
  • Finally, one might be worried about other policies or programs in Chicago whose impact was felt at the same time as HST, so that $\mathrm{Cov}(\mathit{HighStakes}, \phi) \neq 0$.
  • This is essentially a difference-in-difference estimator where the first difference is a within student change over time and the second difference is a district-wide change from pre-policy to post-policy.
  • One might be particularly concerned about unobservable changes on the state or national level affecting student performance (e.g., implementation of state or federal school reform legislation).
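
The difference-in-difference logic described in this section can be illustrated with a minimal sketch. The records, the `POLICY_START` cutoff and the function `did_estimate` are hypothetical and do not come from the paper, whose actual estimates come from regressions with additional controls. The sketch only shows the two differences: a within-student gain, followed by a comparison of average gains before and after the policy.

```python
from statistics import mean

# Hypothetical records: (test_year, prior_score, current_score).
# Scores are in arbitrary standardized units; all values are invented.
records = [
    (1994, 0.10, 0.30),
    (1995, -0.20, 0.05),
    (1996, 0.00, 0.20),
    (1997, 0.10, 0.45),
    (1998, -0.10, 0.30),
]

POLICY_START = 1997  # assumption: tests taken in 1997 or later are post-policy


def did_estimate(rows, policy_start=POLICY_START):
    """Mean within-student gain after the policy minus the mean gain before it."""
    # First difference: each student's change from prior to current score.
    pre = [cur - prior for year, prior, cur in rows if year < policy_start]
    post = [cur - prior for year, prior, cur in rows if year >= policy_start]
    if not pre or not post:
        raise ValueError("need at least one pre-policy and one post-policy record")
    # Second difference: post-policy average gain minus pre-policy average gain.
    return mean(post) - mean(pre)


estimate = did_estimate(records)
print(round(estimate, 3))
```

A simple comparison like this would still be biased by anything else that changed at the same time as the policy, which is the concern the bullets above raise.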

4. Data

  • This study utilizes detailed administrative data from the ChiPS.
  • Student records include information on a student’s school, home address, demographic and family background characteristics, special education and bilingual placement, free lunch status, standardized test scores, grade retention and summer school attendance.
  • On the other hand, there was some increase in initial student achievement—e.g., prior reading achievement increased from an average of 0.89 grade equivalents below norms to 0.71 grade equivalents below norms.

5.2 The Heterogeneity of Effects Across Student and School Risk Level

  • If the improvements in student achievement were caused by the accountability policy, one might expect them to vary across students and schools.
  • Model 1 provides the average effect for all students in all of the post-policy cohorts, providing a baseline from which to compare the other results.
  • First, students in low-performing schools seem to have fared considerably better under the policy than comparable peers in higher-performing schools.
  • Moreover, the effect for marginal students appears somewhat stronger in reading than math, suggesting that there may be more intentional targeting of individual students in reading than in math, or that there is greater divisibility in the production of reading achievement.

5.3 Student-Focused versus School-Focused Accountability

  • Unlike most previous accountability systems, high-stakes testing in Chicago provided direct incentives for students as well as teachers.
  • Table 5 presents the policy effects for grades three, six and eight (i.e., promotional gate grades) versus grades four, five and seven (i.e., nongate grades).
  • Finally, it is possible that the first year effects were somewhat anomalous, perhaps because students and teachers were still adjusting to the policy or because the form change that year may have affected grades differentially.
  • Tables available from the author upon request.
  • The 1998 accountability effects are at least twice as large in grades three, six and eight compared with grade five (for example, 0.144 versus 0.067 s.d. gain in math), suggesting that the student accountability provisions may have played a large role in the overall policy in later years.

6. What factors are driving the improvements in performance in Chicago?

  • Even if a positive causal relationship between HST and student achievement can be established, it is important to understand what factors are driving the improvements in performance.
  • Critics of test-based accountability often argue that the primary impact of HST is to increase the time spent on test-specific preparation activities, which could improve test-specific skills at the expense of more general skills.
  • Others argue that test score gains reflect student motivation on the day of the exam.
  • Unfortunately, because such things as effort and test preparation are not directly observable, it is difficult to disentangle the factors underlying the achievement gains in Chicago.
  • This section attempts to shed some light on the factors driving the achievement gains in Chicago, first by comparing student performance across exams and then by examining the ITBS improvements in greater detail.

6.1 The Role of General Skills

  • Even the most comprehensive achievement exam can only cover a fraction of the possible skills and topics within a particular domain.
  • Differences in student effort across exams (or rather changes in student effort) also complicate the comparison of performance trends from one test to another.
  • The data for this analysis is drawn from school “report cards” compiled by the Illinois State Board of Education (ISBE) which provide average IGAP scores by grade and subject as well as background information on schools and districts.
  • To identify the comparison districts, I first identify districts in the top decile in terms of the percent of students receiving free or reduced price lunch, percent minority students, and total enrollment and in the bottom decile in terms of average student achievement (averaged over third, sixth and eighth grade reading and math scores) based on 1990 data.
  • The point estimates indicate that once the authors take into account district-specific pre-existing trends and demographics, HST appears to have a slight negative effect on IGAP achievement in Chicago.
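
The first selection step for comparison districts described above can be sketched as follows. The district records, field names and helper functions are hypothetical, and the decile cutoffs use one of several possible quantile conventions; the summary does not state which definition the paper used, and any later selection steps are not shown.

```python
from statistics import quantiles

# Hypothetical 1990 district records; all names and values are invented.
districts = [
    {
        "name": f"D{i}",
        "pct_lunch": 10 + 4 * i,
        "pct_minority": 5 + 4 * i,
        "enrollment": 1000 + 500 * i,
        "achievement": 300 - 5 * i,
    }
    for i in range(20)
]


def decile_cutoffs(rows, key):
    """Return (bottom-decile cutoff, top-decile cutoff) for one field."""
    cuts = quantiles([row[key] for row in rows], n=10, method="inclusive")
    return cuts[0], cuts[-1]


def comparison_districts(rows):
    """Keep districts in the top decile on all three need measures
    and in the bottom decile on average achievement."""
    selected = rows
    for key in ("pct_lunch", "pct_minority", "enrollment"):
        _, top = decile_cutoffs(rows, key)
        selected = [row for row in selected if row[key] >= top]
    bottom, _ = decile_cutoffs(rows, "achievement")
    return [row["name"] for row in selected if row["achievement"] <= bottom]


names = comparison_districts(districts)
print(names)
```

Cutoffs are computed on the full set of districts rather than on the shrinking selection, so each criterion is applied independently of the others.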

6.2 The Role of Specific Skills

  • Based on analysis of teacher survey data, Tepper (2002) concluded that ITBS-specific test preparation and curriculum alignment increased following the introduction of the accountability policy.
  • Column 1 classifies questions into two groups: those testing basic skills such as math computation and number concepts and those testing more complex skills such as estimation, data interpretation and problem-solving (i.e., word problems).
  • Column 2 separates items into five categories—computation, number concept, data interpretation, estimation and problem-solving— and shows the same pattern.
  • The item difficulty measures are the percentage of students correctly answering the item in a nationally representative sample used by the test publisher to norm the exam.
  • This analysis suggests that test preparation may have played a large role in the math gains, but was perhaps less important in reading improvement.

6.3 The Role of Effort

  • Student effort is another likely candidate for explaining the large ITBS gains.
  • Test completion is one indicator of effort.
  • This pattern is true even among the lowest achieving students who left the greatest number of items blank prior to the accountability policy.
  • While increased guessing cannot explain a significant portion of the ITBS gains, other forms of effort may play a larger role.
  • Comparing the gain across item position groups, the authors see that 1998 students improved nearly 6.7 percentage points on the final 20 percent of items.
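
The item-position comparison mentioned above can be sketched as below. The percent-correct values, the choice of five equal groups and the function `gains_by_position` are assumptions for illustration; the numbers are invented and are not the paper's data.

```python
from statistics import mean

# Hypothetical percent-correct values by item, in test order (10 items).
# Neither the numbers nor the baseline year come from the paper.
pct_correct_before = [80, 78, 75, 73, 70, 66, 62, 58, 50, 45]
pct_correct_after = [81, 79, 77, 75, 73, 69, 66, 62, 56, 52]


def gains_by_position(before, after, groups=5):
    """Average gain (percentage points) within equal-sized item-position groups."""
    if len(before) != len(after) or len(before) % groups:
        raise ValueError("item lists must match and divide evenly into groups")
    size = len(before) // groups
    gains = [a - b for a, b in zip(after, before)]
    return [mean(gains[i:i + size]) for i in range(0, len(gains), size)]


position_gains = gains_by_position(pct_correct_before, pct_correct_after)
print(position_gains)
```

With five groups, the last entry corresponds to the final 20 percent of items. A larger gain there than in earlier groups is the pattern the summary reads as a sign of increased effort near the end of the exam.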

6.4. Summary

  • The improvement in math achievement in Chicago appears to be driven largely by gains in specific skill areas such as math computation that make up a large portion of the ITBS, but are emphasized less on the IGAP.
  • This suggests that teachers aligned their math curriculum to more closely match the content of the high-stakes exam.
  • In reading, ITBS gains were equally distributed across item types, but were considerably larger among questions at the end of the exam.
  • This suggests that student effort or “stamina” played a larger role than test preparation in the observed reading improvements.
  • The fact that IGAP trends did not jump sharply following the introduction of the accountability policy confirms that the ITBS gains were not driven entirely by improvements in general skills.

7. Did educators respond strategically to high-stakes testing?

  • In evaluating the effectiveness of HST, it is important to understand whether teachers and administrators respond strategically to the incentives provided by the accountability policy.
  • Critics of test-based accountability worry about educator responses along a number of dimensions, ranging from changes in the rate of special education placements to substitution away from low-stakes subjects.
  • This section examines several of these issues.

7.1 Low-stakes versus high-stakes subjects

  • Given the consequences attached to test performance in certain subjects, one might expect teachers and students to shift resources and attention toward subjects included in the accountability program.
  • The authors can test this theory by comparing trends in math and reading achievement after the introduction of HST with test score trends in social studies and science, subjects that are not included in the Chicago accountability policy.
  • Unfortunately, science and social studies exams are not given in every grade, and the grades in which these exams are given have changed over time.
  • The distribution of effects is also somewhat different for low versus high-stakes subjects.
  • As the authors noted earlier, in math and reading, students in low-achieving schools experienced greater gains. However, conditional on school achievement, low-ability students appeared to make only slightly larger gains than their peers.

7.2 Special education placements

  • While the accountability policies in Chicago are designed to increase student achievement, they also create incentives for teachers and administrators to alter the pool of test-takers.
  • The sample only includes third, sixth and eighth grade students from 1994 to 2000 because some special education and reporting data is not available for the 1993 cohort.
  • Figures available from the author upon request.
  • Beginning in 1997, ChiPS began excluding the ITBS scores of students who had been enrolled in bilingual programs for three or fewer years to encourage teachers to test these students for ... It appears that the trend became steeper beginning in 1997, suggesting that the accountability policy may have influenced teacher and administrator behavior.
  • The lowest performing schools increased special education placements for high-risk sixth graders by 50 percent following the introduction of the accountability policy, compared with an increase of roughly 32 percent among moderate-achieving schools and no increase among the highest performing schools.

7.3 Grade retention

  • Another way for teachers to shield low-achieving students from the accountability mandates is to preemptively retain them—that is, hold them back before they enter grade three, six or eight.
  • Roderick et al. (2000) found that retention rates in kindergarten, first and second grades started to rise in 1996 and jumped sharply in 1997 among first and second graders.
  • Grade, 2.5 percent in second grade and a little over 1 percent in grades four, five and seven.
  • Retention rates began to increase in 1996, possibly in anticipation of the new standards the students would face in 1997.
  • The bottom panel controls for current achievement, age and special education status as well as demographic variables, thereby accounting for prior retention and giving a better sense of the marginal effect of the policy on the propensity to retain students.

7.4 Sensitivity analysis

  • To test the sensitivity of the findings presented in the previous sections, Table 13 presents comparable estimates for a variety of different specifications and samples.
  • The next three rows show that the results are not sensitive to including students who either were in that grade for the second time (e.g., retained students) or whose test scores were not included for official reporting purposes because of a special education or bilingual classification.
  • This should control for any changes in form difficulty that may confound the results.

8. Conclusions

  • When the federal legislation No Child Left Behind became law earlier this year, high-stakes testing took on a heightened level of importance for students, teachers and parents across the country.
  • If the authors make the conservative assumption that special education rates increased by two percentage points in all grades (mirroring the increases they saw in grades three, six and eight), this would translate to an additional expenditure of $40 per pupil.
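
The special education cost figure in the conclusions can be unpacked with a short calculation. The two inputs are the figures quoted in the summary; the per-placement cost is only what those two figures jointly imply and is not a number stated in this summary. The function name `extra_cost` is hypothetical.

```python
# Figures quoted in the summary above.
rate_increase = 0.02          # two percentage points, as a fraction of all pupils
extra_cost_per_pupil = 40.0   # dollars per enrolled pupil

# Implied incremental cost of one additional placement (derived, not quoted).
implied_cost_per_placement = extra_cost_per_pupil / rate_increase


def extra_cost(rate_change, cost_per_placement):
    """Added spending per enrolled pupil for a given change in placement rate."""
    return rate_change * cost_per_placement


print(implied_cost_per_placement)
print(extra_cost(0.01, implied_cost_per_placement))
```

Under this reading, the per-pupil cost scales linearly with the change in the placement rate, so a smaller assumed increase would lower the estimate proportionally.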


NBER WORKING PAPER SERIES
ACCOUNTABILITY, INCENTIVES AND BEHAVIOR:
THE IMPACT OF HIGH-STAKES TESTING IN THE CHICAGO PUBLIC SCHOOLS
Brian A. Jacob
Working Paper 8968
http://www.nber.org/papers/w8968
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2002
I would like to thank the Chicago Public Schools, the Illinois State Board of Education and the Consortium
on Chicago School Research for providing the data used in this study. I am grateful to Peter Arcidiacono,
Anthony Bryk, Susan Dynarski, Carolyn Hill, Robert LaLonde, Lars Lefgren, Steven Levitt, Helen Levy,
Susan Mayer, Melissa Roderick, Robin Tepper and seminar participants at various institutions for helpful
comments and suggestions. Jenny Huang provided excellent research assistance. Funding for this research
was provided by the Spencer Foundation. All remaining errors are my own. The views expressed herein are
those of the author and not necessarily those of the National Bureau of Economic Research.
© 2002 by Brian A. Jacob. All rights reserved. Short sections of text, not to exceed two paragraphs, may
be quoted without explicit permission provided that full credit, including © notice, is given to the source.

JEL No. I20, I28, J24
ABSTRACT
The recent federal education bill, No Child Left Behind, requires states to test students in grades
three to eight each year, and to judge school performance on the basis of these test scores. While intended
to maximize student learning, there is little empirical evidence about the effectiveness of such policies.
This study examines the impact of an accountability policy implemented in the Chicago Public Schools
in 1996-97. Using a panel of student-level, administrative data, I find that math and reading achievement
increased sharply following the introduction of the accountability policy, in comparison to both prior
achievement trends in the district and to changes experienced by other large, urban districts in the
mid-west. I demonstrate that these gains were driven largely by increases in test-specific skills and
student effort, and did not lead to comparable gains on a state-administered, low-stakes exam. I also find
that teachers responded strategically to the incentives along a variety of dimensions—by increasing
special education placements, preemptively retaining students and substituting away from low-stakes
subjects like science and social studies.
Brian A. Jacob
John F. Kennedy School of Government
Harvard University
79 JFK Street
Cambridge, MA 02138
and NBER
Brian_Jacob@harvard.edu

1. Introduction
In January 2002, President Bush signed the No Child Left Behind Act of 2001 (NCLB),
ushering in a new era of educational accountability. The new federal legislation requires states to
test students in grades three through eight and to use these exam results to judge the performance
of schools. If a school fails to make adequate progress for several consecutive years, the district
must allow children to attend another public school in the district and provide students with
supplemental education services such as private tutoring. Persistently low-performing schools
may be closed or reconstituted with new staff and curriculum (Robelen 2002).
NCLB strengthens a movement toward accountability in education that has been
gathering momentum for nearly a decade. Statutes in 25 states now explicitly link student
promotion or graduation to performance on state or district assessments. At the same time, 18
states reward teachers and administrators on the basis of exemplary student performance and 20
states sanction school staff on the basis of poor student performance (Quality Counts 2002).
These accountability policies dwarf all other education reforms in scope. Consider, for
example, one of the most popular school reform initiatives in recent years—school choice. Of
the nearly 53 million children attending elementary and secondary schools in the country, only
60,000 used vouchers to attend a private school and 580,000 others attended a charter school
percent of all schoolchildren (Howell and Peterson 2002, CER 2002). Of the roughly 47 million
students in public schools, only four million participated in any type of public school choice
program, which includes inter-district choice, magnet schools and other types of intra-district
choice (NCES 1997). On the other hand, the accountability program in Texas alone impacts
approximately 3.6 million students while the policies in Chicago and New York City affect an
additional 1.5 million students. As the mandates of NCLB are implemented, all of the 33.4
million elementary students in the nation will be attending schools subject to test-based
accountability.[1]
While the primary intent of such accountability policies is to provide incentives to
maximize student learning, poorly designed incentives can have perverse consequences. For
example, Holmstrom and Milgrom (1991) show that high-powered incentives will lead agents to
focus on the most easily observable aspects of a multi-dimensional task. Based on similar logic,
testing critics have argued that current accountability policies will cause teachers to shift
resources away from low-stakes subjects, neglect infra-marginal students and ignore critical
aspects of learning that are not explicitly tested.
Despite its increasing popularity within education, there is little empirical evidence on
test-based accountability (also referred to as high-stakes testing, abbreviated hereafter as HST).
The majority of existing research focuses on mandatory high school graduation exams, which
provide incentives for secondary students but have little direct impact on teachers or
administrators. Recent evidence on school-based accountability programs is mixed, with some
studies showing modest achievement gains but others showing little change in student
performance. Moreover, most studies of school-based accountability do not utilize individual
student data and thus cannot examine many outcomes of interest or investigate how effects vary
across students.
Test-based accountability raises three fundamental questions about the ways in which
students and teachers respond to performance incentives. The most fundamental question about
HST is whether it increases student achievement. Insofar as test-based accountability raises
student motivation, increases parent involvement and/or improves curriculum or pedagogy, one
would expect HST to improve student performance. Unfortunately, accountability policies are
often implemented in conjunction with a variety of other reforms, frequently without any
pre-existing data on student performance, making it difficult to attribute the achievement changes to
the accountability policy.
[1] All national enrollment figures are taken from the 2001 Digest of Education Statistics (Digest 2001).
Even if a positive causal relationship between HST and student achievement can be
established, it is important to understand what factors are driving the improvements in
performance. Critics of test-based accountability often argue that its primary impact is to
increase the time spent on test-preparation activities, thus improving test-specific skills at the
expense of more general skills. Others argue that test score gains reflect student motivation on
the day of the exam. Thus, one might want to examine whether test score gains reflect increases
in general skills, test-specific skills, transitory student effort or some combination thereof.[2] If
HST increased the general skill level, observed achievement gains should be reflected in other
measures of student outcomes. On the other hand, to the extent the improvements are due to
transitory student effort or increases in test-specific skills, one might not expect the test results to
generalize.
Finally, in evaluating the effectiveness of HST, it is important to understand whether
teachers and administrators respond strategically to the incentives provided by the accountability
policy. Critics have worried about educator responses along a number of dimensions. For
example, since low-ability students bring down the performance level of a school, the policy
provides an incentive for teachers to find ways to exclude students from testing. By placing low
performing students in special education programs, teachers are able to exempt them from most
[2] Achievement gains may also be due to increases in cheating on the part of students, teachers or administrators.
While Jacob and Levitt (2002) found that instances of classroom cheating increased substantially following the

Citations

BookDOI
Abstract: The role of improved schooling, a central part of most development strategies, has become controversial because expansion of school attainment has not guaranteed improved economic conditions. This paper reviews the role of education in promoting economic well-being, focusing on the role of educational quality. It concludes that there is strong evidence that the cognitive skills of the population-rather than mere school attainment-are powerfully related to individual earnings, to the distribution of income, and to economic growth. New empirical results show the importance of both minimal and high-level skills, the complementarity of skills and the quality of economic institutions, and the robustness of the relationship between skills and growth. International comparisons incorporating expanded data on cognitive skills reveal much larger skill deficits in developing countries than generally derived from just school enrollment and attainment. The magnitude of change needed makes it clear that closing the economic gap with industrial countries will require major structural changes in schooling institutions.

757 citations


Journal ArticleDOI
Abstract: NBER WORKING PAPER SERIES ROTTEN APPLES: AN INVESTIGATION OF THE PREVALENCE AND PREDICTORS OF TEACHER CHEATING Brian A. Jacob Steven D. Levitt Working Paper 9413 http://www.nber.org/papers/w9413 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 December 2002 We would like to thank Suzanne Cooper, Mark Duggan, Sue Dynarski, Arne Duncan, Michael Greenstone, James Heckman, Lars Lefgren, and seminar participants too numerous to mention for helpful comments and discussions. We also thank Arne Duncan, Phil Hansen, Carol Perlman, and Jessie Qualles of the Chicago Public Schools for their help and cooperation on the project. Financial support was provided by the National Science Foundation and the Sloan Foundation. All remaining errors are our own. The views expressed herein are those of the authors and not necessarily those of the National Bureau of Economic Research. © 2002 by Brian A. Jacob and Steven D. Levitt. All rights reserved. Short sections of text not to exceed two paragraphs, may be quoted without explicit permission provided that full credit including, © notice, is given to the source.

616 citations


ReportDOI
Abstract: Are teachers' impacts on students' test scores ("value-added") a good measure of their quality? This question has sparked debate largely because of disagreement about (1) whether value-added (VA) provides unbiased estimates of teachers' impacts on student achievement and (2) whether high-VA teachers improve students' long-term outcomes. We address these two issues by analyzing school district data from grades 3-8 for 2.5 million children linked to tax records on parent characteristics and adult outcomes. We find no evidence of bias in VA estimates using previously unobserved parent characteristics and a quasi-experimental research design based on changes in teaching staff. Students assigned to high-VA teachers are more likely to attend college, attend higher-ranked colleges, earn higher salaries, live in higher SES neighborhoods, and save more for retirement. They are also less likely to have children as teenagers. Teachers have large impacts in all grades from 4 to 8. On average, a one standard deviation improvement in teacher VA in a single grade raises earnings by about 1% at age 28. Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase the present value of students' lifetime income by more than $250,000 for the average classroom in our sample. We conclude that good teachers create substantial economic value and that test score impacts are helpful in identifying such teachers.

532 citations


Journal ArticleDOI
Abstract: This essay reviews the theoretical and empirical literature on quality disclosure and certification. After comparing quality disclosure with other quality assurance mechanisms and describing a brief history of quality disclosure, we address two sets of theoretical issues. First, why don't sellers voluntarily disclose through a process of "unraveling" and, given the lack of unraveling, is it desirable to mandate seller disclosure? Second, when we rely on certifiers to act as the intermediary of quality disclosure, do certifiers necessarily report unbiased and accurate information? We further review empirical evidence on these issues, with a particular focus on healthcare, education, and finance. The empirical review covers quality measurement, the effect of third-party disclosure on consumer choice and seller behavior, as well as the economics of certifiers. (JEL D18, K32, L15, M31)

525 citations


Journal ArticleDOI
Abstract: Are teachers' impacts on students' test scores (value-added) a good measure of their quality? This question has sparked debate partly because of a lack of evidence on whether high value-ad...

518 citations


References

Journal ArticleDOI
Abstract: Introduction In the standard economic treatment of the principal–agent problem, compensation systems serve the dual function of allocating risks and rewarding productive work. A tension between these two functions arises when the agent is risk averse, for providing the agent with effective work incentives often forces him to bear unwanted risk. Existing formal models that have analyzed this tension, however, have produced only limited results. It remains a puzzle for this theory that employment contracts so often specify fixed wages and more generally that incentives within firms appear to be so muted, especially compared to those of the market. Also, the models have remained too intractable to effectively address broader organizational issues such as asset ownership, job design, and allocation of authority. In this article, we will analyze a principal–agent model that (i) can account for paying fixed wages even when good, objective output measures are available and agents are highly responsive to incentive pay; (ii) can make recommendations and predictions about ownership patterns even when contracts can take full account of all observable variables and court enforcement is perfect; (iii) can explain why employment is sometimes superior to independent contracting even when there are no productive advantages to specific physical or human capital and no financial market imperfections to limit the agent's borrowings; (iv) can explain bureaucratic constraints; and (v) can shed light on how tasks get allocated to different jobs.

5,412 citations


Journal ArticleDOI
Abstract: Governmental post-schooling training programs have become a permanent fixture of the U.S. economy in the last decade. These programs are typically advocated for diverse reasons: (1) to reduce inflation by the provision of more skilled workers to alleviate shortages, (2) to reduce unemployment of certain groups, and (3) to reduce poverty by increasing the skills of certain groups. All of these objectives require that training programs increase the earnings of trainees above what they otherwise would be. For example, alleviating shortages by training more highly skilled workers should increase the earnings of these workers. Likewise, the concern for unemployed workers is derived from a concern for the decreased earnings of these workers; and if trainees subsequently suffer less unemployment, their earnings should be higher. Finally, training programs are intended to reduce poverty by increasing the earnings of low income workers. Evaluating the success of training programs is thus inherently a quantitative assessment of the effect of training on trainee earnings.[1] It is an important process both because it helps to inform discussions of public policy by shedding light on the past value of these programs as investments and because it can provide a means of testing our ability to augment the human capital of certain workers.
Although there have been many studies of the effect of post-school classroom training on earnings, it is by now rather widely agreed that very little is reliably known about the actual effects of these programs.[2] Three main problems account for this state of affairs: (1) the large sample sizes required to detect relatively small anticipated program effects in a variable with such high variance as earnings, (2) the considerable expense required to keep track of trainees over a long enough period of time to measure the full inter-temporal impact of training, and (3) the extreme difficulty of implementing an adequate experimental design so as to obtain a group against which to reliably compare trainees.[3] The purpose of this paper is to report on efforts to cope with this third problem using a data collection system that comes some way towards resolving the first two. The basic idea of this data system is to match the program record on each trainee with the trainee's Social Security earnings history. The Social Security Administration maintains a summary year-by-year earnings history for each Social Security account over the period since 1950 that may be used, under the appropriate confidentiality restrictions, for this purpose.[4] In this paper I have concentrated on an analysis of all classroom trainees who started training under the Manpower Development and Training Act (MDTA) in the first 3 months of 1964 so as to ensure their having completed training in that year. In choosing to analyze trainees from so early a cohort something is clearly lost. On the one hand, the nature of the participants in these early years was considerably different than in the later years. In particular, programs geared
Received for publication February 9, 1977. Revision accepted for publication August 1, 1977.
* Princeton University. This research was supported by ASPER, U.S. Department of Labor, but does not represent an official position of the Department of Labor, its agencies, or staff. I would like to thank Gregory Chow, Ronald Ehrenberg, Roger Gordon, Zvi Griliches, George E. Johnson, Nicholas Kiefer, Richard Quandt, and Sherwin Rosen for helpful comments. I also owe a heavy debt to D. Alton Smith for computational and other assistance.
[1] See Reid (1976), for example, for a clear analysis of how knowledge of these effects is required in order to establish the impact of government training on the black/white wage differential.
[2] Surveys of many of these studies may be found in Stromsdorfer (1972) and O'Neill (1973).
[3] For further discussion of these points see Ashenfelter (1975).
[4] The idea for using these data to analyze the effectiveness of government training programs is apparently quite an old one, having been suggested by the National Manpower Advisory Committee (U.S. Department of Labor, 1972) to the Secretary of Labor at its first meeting in a letter dated October 10, 1962, the year of passage of the Manpower Development and Training Act. Actual efforts along these lines were ultimately reported by Borus (1967), Commins (1970), Farber (1970), and Prescott and Cooley (1972).

1,330 citations


Journal ArticleDOI
Abstract: NBER Working Paper Series. Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating. Brian A. Jacob and Steven D. Levitt. Working Paper 9413, http://www.nber.org/papers/w9413. National Bureau of Economic Research, 1050 Massachusetts Avenue, Cambridge, MA 02138. December 2002. We would like to thank Suzanne Cooper, Mark Duggan, Sue Dynarski, Arne Duncan, Michael Greenstone, James Heckman, Lars Lefgren, and seminar participants too numerous to mention for helpful comments and discussions. We also thank Arne Duncan, Phil Hansen, Carol Perlman, and Jessie Qualles of the Chicago Public Schools for their help and cooperation on the project. Financial support was provided by the National Science Foundation and the Sloan Foundation. All remaining errors are our own. The views expressed herein are those of the authors and not necessarily those of the National Bureau of Economic Research. © 2002 by Brian A. Jacob and Steven D. Levitt. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

616 citations


Journal ArticleDOI
Abstract: This paper examines evidence on the effect of class size on student achievement. First, it is shown that results of quantitative summaries of the literature, such as Hanushek (1997), depend critically on whether studies are accorded equal weight. When studies are given equal weight, resources are systematically related to student achievement. When weights are in proportion to their number of estimates, resources and achievements are not systematically related. Second, a cost-benefit analysis of class size reduction is performed. Results of the Tennessee STAR class-size experiment suggest that the internal rate of return from reducing class size from 22 to 15 students is around 6%.

586 citations


Journal ArticleDOI
Walt Haney
Abstract: I summarize the recent history of education reform and statewide testing in Texas, which led to introduction of the Texas Assessment of Academic Skills (TAAS) in 1990-91. A variety of evidence in the late 1990s led a number of observers to conclude that the state of Texas had made near miraculous progress in reducing dropouts and increasing achievement. The passing scores on TAAS tests were arbitrary and discriminatory. Analyses comparing TAAS reading, writing and math scores with one another and with relevant high school grades raise doubts about the reliability and validity of TAAS scores. I discuss problems of missing students and other mirages in Texas enrollment statistics that profoundly affect both reported dropout statistics and test scores. Only 50% of minority students in Texas have been progressing from grade 9 to high school graduation since the initiation of the TAAS testing program. Since about 1982, the rates at which Black and Hispanic students are required to repeat grade 9 have climbed steadily, such that by the late 1990s, nearly 30% of Black and Hispanic students were "failing" grade 9. Cumulative rates of grade retention in Texas are almost twice as high for Black and Hispanic students as for White students. Some portion of the gains in grade 10 TAAS pass rates are illusory. The numbers of students taking the grade 10 tests who were classified as "in special education" and hence not counted in schools' accountability ratings nearly doubled between 1994 and 1998. A substantial portion of the apparent increases in TAAS pass rates in the 1990s are due to such exclusions. In the opinion of educators in Texas, schools are devoting a huge amount of time and energy preparing students specifically for TAAS, and emphasis on TAAS is hurting more than helping teaching and learning in Texas schools, particularly with at-risk students, and TAAS contributes to retention in grade and dropping out. 
Five different sources of evidence about rates of high school completion in Texas are compared and contrasted. The review of GED statistics indicated that there was a sharp upturn in numbers of young people taking the GED tests in Texas in the mid-1990s to avoid TAAS. A convergence of evidence indicates that during the 1990s, slightly less than 70% of students in Texas actually graduated from high school. Between 1994 and 1997, TAAS results showed a 20% increase in the percentage of students passing all three exit level TAAS tests (reading, writing and math), but TASP (a college readiness test) results showed a sharp decrease (from 65.2% to 43.3%) in the percentage of students passing all three parts (reading, math, and writing). As measured by performance on the SAT, the academic learning of secondary school students in Texas has not improved since the early 1990s, compared with SAT takers nationally. SAT-Math scores have deteriorated relative to students nationally. The gains on NAEP for Texas fail to confirm the dramatic gains apparent on TAAS. The gains on TAAS and the unbelievable decreases in dropouts during the 1990s are more illusory than real. The Texas "miracle" is more hat than cattle.

543 citations


Frequently Asked Questions (1)
Q1. What are the contributions in "Accountability, incentives and behavior: the impact of high-stakes testing in the Chicago Public Schools"?

This study examines the impact of an accountability policy implemented in the Chicago Public Schools in 1996-97.