JAH Textbooks & Teaching 2004
Role of Testing in Teaching American History

Document-Based Question: What Is the Historical Significance of the Advanced Placement Test?

Timothy A. Hacsi

Testing has played important roles in education in the United States over the past century, and, if anything, the importance of tests has been increasing since the late 1980s. In particular, the use of standardized, statewide tests has grown dramatically. Tests shape the classes children are placed in, determine whether they can enter and graduate from high school, and influence what kind of college they can hope to attend. After students graduate from college, tests continue to act as barriers or gateways if they seek postgraduate degrees. Yet not all tests are the same. Some are designed to sort students, while others are intended to influence schools or school districts. Some tests, such as Advanced Placement (AP) tests, offer extensive rewards for students if they do well; others give students little that they would not have if the test did not exist but can punish students who do poorly. Some such tests are meant to rank students hierarchically based on their performance, much as IQ tests do.¹ Other kinds of tests are meant to change how classrooms and schools function by shaping curricula or putting pressure on schools to improve (as judged, not coincidentally, by scores on standardized tests). Those varying goals for tests are not mutually exclusive. Some tests are designed to do one thing, while others are multipurpose.

AP tests, more than most, provide a carrot for students: Do well on this test and be rewarded; do poorly and there is relatively little negative consequence. Exploring the differing roles that various kinds of tests have played may help put AP tests in their proper context; it may also say a great deal about how (or if) testing has changed education and how it has served to enhance, rather than reform, the major functions of public schooling in the United States.

The focus here is not on how the AP United States history test has changed over time, but instead on how it, and the AP program in general, fit into the broader history of testing in the United States over the past century. I will briefly describe several tests that have had an important impact, either by shaping what occurred in schools or by affecting individual students' opportunities. I will also examine how testing has functioned in a society and an educational system that use schooling for the dual functions of passing along class status to children of middle- and upper-class families (thereby reinforcing inequality) and allowing some individual upward mobility.

AP tests were not the first exams designed to determine whether high school students were expert in specific subject matters in order to establish their future status in college. In 1901, half a century before the AP exams were developed, the College Entrance Examination Board (CEEB) began offering subject tests. The CEEB was created in 1900 largely by Harvard University's Charles William Eliot and Columbia University's Nicholas Murray Butler. Its driving belief was that the use of standardized entrance examinations by universities and colleges across the nation would be a positive advancement for both colleges and students. The first CEEB tests, soon to be known as the College Boards, were given to just under one thousand high school students in 1901. The developers of the test hoped that colleges everywhere would recognize the results of those tests; surely, they believed, such uniformity would be better than the chaotic approach then used, in which colleges had idiosyncratic admission requirements for prospective students. The CEEB also hoped that its exams would help elevate the standards of secondary schools, encouraging them to provide colleges with better-trained students. The original College Boards thus aimed to provide two important standardizing functions: one, to make secondary schools more alike in what they taught while simultaneously raising their standards and, two, to make colleges more alike in how they made admissions decisions. The central goal of the tests was to connect high schools and colleges. (History was one of the original nine subjects tested, with a committee from the American Historical Association defining the subject.)²

The original exams were reasonably well received but were also criticized by many secondary school officials. So in 1915 the CEEB developed a new set of subject exams, which were based on recent practices at Harvard University and were more comprehensive than the original College Boards. During the 1920s the new College Board tests were widely recognized as having increasing influence over what secondary schools chose to teach, although observers differed over whether or not this was a good thing; the CEEB's influence troubled many school administrators. The CEEB's development of the Scholastic Aptitude Test (SAT), first offered in 1926, increased that organization's influence on college admissions but, ironically, weakened the influence of the College Board subject tests. By the mid-1930s some colleges required only the SAT, and the number of students taking it was growing even as the number of students taking the College Boards was declining.³ Somewhat unintentionally, the CEEB had replaced a group of subject matter tests with one test that assessed more general aptitude and education. The College Boards had influenced how subjects were taught in many secondary schools; the SAT did not do so to the same extent. Instead, it sorted students hierarchically, in practice rewarding some and harming others.

Over the past six decades, no other test has had as powerful an effect on students across the nation as the SAT. Carl C. Brigham originally developed the SAT for the CEEB in the early 1920s. The SAT was expected to test scholastic aptitude, by which Brigham and the CEEB meant how well students would do in college courses; high SAT scores were expected to predict later success in undergraduate work, while low scores would be a red flag warning of possible failure. It took its modern form in 1941, and its influence over college admissions grew for decades thereafter. For many years, it was called the Scholastic Aptitude Test. Recognition that there was no clear distinction between aptitude and achievement eventually led it to be renamed the Scholastic Assessment Test. It was seen, rightly or wrongly and despite considerable criticism, as a relatively objective measure of a student's knowledge and capabilities. There is a powerful carrot-and-stick quality to the SAT, but that should not be overstated, since the quality of the school one attends has considerable influence over one's likely SAT scores. Whether or not a student can take advantage of the reward aspect of the SAT depends heavily on things other than innate ability and effort. For example, no matter how much the Educational Testing Service (ETS) has tried to avoid racial bias in the test, evidence surfaces on a regular basis that the test is biased.⁴ Whatever bias there is reduces mobility rather than enhancing it.

At the beginning of the twenty-first century, there are signs that the SAT is becoming less significant in shaping college admissions. The nation's most prominent state university system, the University of California (UC), recently changed its admission requirements to accept the top 4 percent of every California high school's graduating class, without regard to SAT scores. As Peter Schrag, a critic of current educational policy, wrote, SAT scores "had always been a major barrier to poor and minority students." Now the UC system is trying to weaken that barrier.⁵

AP tests, however, are now playing an increasingly prominent role in college admission decisions. While both AP and SAT tests serve as gateways between high school and college, they have relatively little in common as tests other than a common institutional home at the ETS. (In 1947, after several years of discussion, the CEEB, the Carnegie Foundation, and the American Council on Education joined to create a nonprofit corporation, the ETS. The CEEB gave the ETS all its testing contracts that were not related to its own admissions test programs but retained considerable influence on how the ETS functioned. In 1951 the Ford Foundation created the Fund for the Advancement of Education, with the hardly modest goal of improving education in the United States. The Fund, as it was known during its sixteen-year existence, became involved in many aspects of education. In collaboration with several prominent universities, the Fund created what would become the AP program, which it then handed over to the CEEB and the ETS in 1956.)⁶

In the immediate post-World War II era, opinions were divided over whether the College Board exams were, or should be, influencing secondary schools. ETS officials did not want them to influence curricula and refused to believe that they did so, despite research to the contrary. ETS officials believed the central function of their admission tests--the SAT and the College Boards--was to provide schools with a fair and accurate measure of what students could achieve in college. For this and other reasons, ETS officials refused to believe that the tests themselves might change what or how high school teachers taught.⁷ AP tests, in contrast, were developed hand in hand with curricula to prepare students to take them. They were, from the start, somewhat like College Board exams on steroids. Although they were designed to judge a student's knowledge of a specific subject, they were not simply tests given to students about to graduate. Instead, they came as part of a larger package that included classes for secondary schools. AP courses would teach the material that the AP tests would examine. They represented, if reluctantly on the part of some of their designers, a very specific intrusion into the way secondary schools taught. Like the College Boards and the SAT, AP tests also helped colleges decide which students to accept.

But for students who did well on the tests, there was, and is, an added bonus. AP courses were intended to be college-level courses for high school students, and so doing well on AP tests could provide students with college credit (although many colleges were very slow to award credit in the early years of the AP program). The willingness of colleges to accept AP courses as equivalents to their own implies that there is a specific body of knowledge that experts can agree is crucial for students in the high school senior and college freshman nexus to know--a claim that, in practice, historians have doubted. The history committee that was initially supposed to design the history test refused to do so, believing colleges should simply trust secondary schools to teach what mattered and to do so well. It would be two years before American and European history tests were developed. But they were in place by the time the ETS took over the AP program in 1956.⁸

The CEEB claimed it was not interested in influencing curricula, but it was obviously the nature of AP tests to do just that. As John A. Valentine, the author of The College Board and the School Curriculum, the best single study of the CEEB's history, put it: "Although it was centered on placement after admission rather than on admission, it was strikingly reminiscent of the Board's original entrance examination program: the AP Program encouraged schools to teach certain courses and offered teachers guidance in teaching these courses."⁹

In the late 1950s, ETS officials began active efforts to increase the number of schools with AP classes, the number of students taking the AP tests, and the number of colleges and universities willing to give credit for passing grades on those tests. The number of students taking AP tests in 1960 was five times what it had been in 1956. The number of schools offering AP courses has increased gradually since then, although in 2003 fewer than 60 percent of schools offered AP courses.¹⁰

Robert Blackey, a professor of history at California State University, San Bernardino, recently examined the AP European history exam in some detail, looking at 540 different essay questions that have been used in the test over the past four decades. As with other AP tests, the essay questions have been the result of the long and grueling work of experienced teachers. Blackey wrote that they "are meticulously crafted--typically over a period of two-to-three years--by the dedicated teams of secondary school and university historians who have comprised the test development committees." While some questions do not work as well as others, Blackey found them to be innovative and generally of very high quality.¹¹

Right from the start, the goal of the European history essay questions was to require students to interpret historical issues and make thoughtful judgments about them; just memorizing facts would not do. One important shift came in the mid-1970s when the AP United States and European history tests first began using document-based questions (DBQ). Students had to answer the DBQ, but they could choose the other essay question(s) they wanted to answer from a variety that were offered.¹²

Blackey found that the European history test questions followed trends within the historical field more generally. Between 1956 and 1972, most questions were political or diplomatic in nature. After an influential conference in 1972, the questions shifted from a topical and chronological to a thematic structure, and social, cultural, economic, and intellectual history questions became much more important. There is no reason to think that the AP United States history exam has been any more resistant to changes in the historical field.¹³

AP tests benefit students who take them; conversely, students who attend schools that offer little or nothing in the way of AP courses are at a distinct disadvantage in the college admissions process. The University of California system is hardly alone in its practice of treating grades earned in an AP class as something superior (a 4.0 A can become a 4.5 or even 5.0 A). AP courses are intended to be small and intensive, and they are therefore expensive. AP students also need to be at a certain educational level if they are to have a reasonable opportunity to succeed, which means earlier years of schooling must have been productive. In short, AP classes require a school system with considerable resources. They are, as a result, much more likely to be found at schools in communities of some wealth than in those of working-class or disadvantaged communities. In California, several minority students tried to sue on the basis that their schools offered very few AP courses, thereby putting them at a disadvantage compared to students attending schools with more resources and more AP courses.

California governor Gray Davis responded by announcing that he would address the issue by making sure that every public high school had enough money to provide at least four AP courses. If Davis had kept his promise, which he did not, that would hardly have been an adequate solution, but for some schools in working-class or impoverished communities, it would have been a step forward.¹⁴

The lawsuit did not go to trial, but the issue it raised was very real; the availability of AP courses and hence the possibility of passing one or more AP tests often reflect the gap between rich and poor. AP courses are a carrot that is far more likely to be available to students in middle-class and wealthy communities. Even so, they can also serve as levers to improve the quality of a school. Some schools in disadvantaged communities have moved to offer AP courses as a way of simultaneously raising their standards and increasing their students' achievement levels and likelihood of attending college. In Seattle, Washington, James A. Garfield High School was seen as failing before it became a magnet school that offered, among other things, "a huge array of Advanced Placement" courses. Garfield has been a success as a magnet school, attracting a diverse student body and dramatically improving student achievement. The availability of AP courses has undoubtedly played a role in these changes. Now the school "produces each year more National Merit semifinalists than any other school in the state of Washington." (Conversely, some schools with strong reputations, such as Regis High School, a highly respected Catholic school for boys in New York City, do not need AP courses to elicit respect from college admissions boards. Regis does not offer AP courses because its teachers believe they are better judges of what their students should learn than standardized test makers are.)¹⁵

In the 1990s, standardized testing began to move to the forefront of educational reform as the latest in a series of educational reforms stimulated by the publication of A Nation at Risk in 1983. Since the 1990s, the focus of reform has been on making schools accountable and raising the standards of individual students. In turn, the pursuit of accountability and standards has relied on testing as the way to judge the achievements of states, districts, individual schools, and individual students. Many states now use high-stakes standardized tests to determine whether or not students can graduate from eighth grade and attend high school and whether or not high school seniors can receive a diploma. Reading and math are the most commonly tested subjects, with history and science not far behind. The No Child Left Behind Act, a 2001 renewal and revision of the Elementary and Secondary Education Act passed in 1965, relies heavily on tests, but it is following, not setting, the nationwide trend in doing so.¹⁶

The main thrust of standards and testing has been to try to force meaningful school reform through threats: threats of state take-over, threats of large numbers of failing students, and threats of reduced funding. Barring the direst of sanctions--state take-over--it is the students who will suffer when schools are not doing a good job, not the schools or the people running them. The quality of reform in response to the push for standards and the use of tests varies tremendously. The basic assumption is that holding schools accountable will force principals and teachers to do a better job and students to try harder. But some states have failed to provide either funding or guidance on how local districts might improve, while some have provided guidance of questionable quality. Still others have provided a reasonable increase in funding along with increased oversight. Nowadays tests are not simply used to judge the success of reform efforts; they are the reform efforts.

The only other reform that is nearly as widespread and potentially powerful is the movement for more equitable school funding, which has had many ups and downs since its start in California in the late 1960s. The Serrano v. Priest decision of 1971 was the first case in which a state supreme court found the funding of schools by a state system to be inequitable because it did not provide sufficient resources to schools in disadvantaged communities. But school funding movements have sometimes failed in the courts, and, even when they succeed, are often opposed by legislatures and the large segments of the population who do not want to pay higher taxes or see taxes taken away from their own schools and given to schools in other communities. Testing, conversely, has been extremely popular, as it is seen as a way to make sure that students are actually learning, that their diplomas mean something, and that standards are indeed raised. Even so, some states are already retreating from testing as its downside becomes more visible; it is highly expensive to do properly and, if standards are kept high, leads to high failure rates, especially among groups that have traditionally been treated poorly in many school systems, such as children of color and children with learning disabilities.¹⁷

State accountability systems that rely largely (or solely) on standardized tests have also invigorated advocates seeking higher state funding levels from state courts. Peter Schrag has made the argument that by 2002 there was "a growing understanding among moderate liberals that the best weapon they had for bringing better teachers and schools to low-income children and--in general--for upgrading the schools was those tests."¹⁸ In the early 1900s, the CEEB hoped to use its tests to improve secondary schools so that they would provide colleges with better-trained students; current legal advocates for higher levels of school funding hope to use existing state tests to improve secondary schools so they will do a better job of educating their students. We have hardly come full circle, but there is an interesting connection between the two strategies.

Despite the qualms of the ETS, AP tests are a prime example of testing affecting curricula. But they are becoming more than that; they are increasingly seen as a mark of a school's quality. AP testing is so prevalent in "better" high schools that its very existence, or lack thereof, can be seen as proof of the school's quality. In the summer of 2003, Newsweek published a cover story on the nation's best schools. Using the number of AP classes offered by individual schools as the means for ranking them, the author, Jay Matthews, listed the nation's "Top 100" public high schools. Not surprisingly, many of the schools listed were in upper-middle-class or wealthy communities. AP courses, as Matthews recognizes, are a very narrow measure of a school's quality. It is difficult, however, to come up with a better single measure of a school's quality, if by quality we mean setting high standards and preparing students for college. There is no good single measure; the prevalence of AP courses is only one of a number of flawed measures from which to choose. That Matthews and Newsweek made that choice is certainly a sign of the extent to which AP courses have become permanently embedded in the more affluent public schools and in our assumptions about what a good school provides.¹⁹

High-stakes testing seeks to improve schools, but in the early years of the twenty-first century, it has done relatively little to improve troubled schools. In a similar vein, states can push their schools to provide AP courses in order to make them look better to outsiders. Whether those AP courses will actually improve the schools is another matter, as is the question of what improvement entails. After AP courses had been used to improve James A. Garfield High School in Seattle, William G. Ouchi wrote, "Teachers had formed two conflicting camps, one which taught the neighborhood black students, and the other which taught the AP courses to the whites and Asians."²⁰ Testing, in the case of AP tests and in general, is more likely to reinforce class structure than to make it more fluid.

But that does not mean testing cannot facilitate reform. There are ways to use tests other than to sort, to push curricula in a particular direction, or to scare schools into improving. In the 1930s and 1940s, Ralph Tyler invented the field of educational evaluation almost single-handedly. Tyler had a different use in mind for tests: to judge the effectiveness of programs, including, but not limited to, curricula and teaching methods, and then to use the knowledge gained to improve schools.²¹ Evaluators have used testing in this way ever since, not to rate students, but to rate programs and policies. The goal is not to praise or punish a student (according to what he or she deserves, of course!), but to learn how well or poorly schools are doing and then to use the evidence to do something about it. That goal is not reflected in the use of most high-stakes testing, which is often used by states to punish schools that do poorly by withholding money or other aid when the schools seem to need help while rewarding schools that do well. Tyler's conception also has little in common with AP tests and SATs, neither of which is directly concerned with the quality of schooling being offered to American students as a whole.

Timothy A. Hacsi teaches history at the University of Massachusetts, Boston. He would like to thank Gary Kornblith and Carol Lasser for extremely helpful feedback on earlier versions of this article.

Readers may contact Hacsi at <tim.hacsi@umb.edu>.

¹One of the most useful arguments about the sorting function of IQ tests in the early twentieth century appears in Paul Davis Chapman, Schools as Sorter: Lewis M. Terman, Applied Psychology, and the Intelligence Testing Movement, 1890-1930 (New York, 1988).

²John A. Valentine, The College Board and the School Curriculum: A History of the College Board's Influence on the Substance and Standards of American Education, 1900-1980 (New York, 1987), 1-6, 13.

³Ibid., 25-40.

⁴Christopher Jencks, "Racial Bias in Testing," in The Black-White Test Score Gap, ed. Christopher Jencks and Meredith Phillips (Washington, 1998), 64-74; Jay Matthews, "The Bias Question," Atlantic Monthly, 292 (Nov. 2003), 130-40.

⁵Peter Schrag, Final Test: The Battle for Adequacy in America's Schools (New York, 2003), 100-101.

⁶Valentine, College Board and the School Curriculum, 55-59, 79-86; Robert Orrill, "Grades 11-14: The Heartland or Wasteland of American Education?," in A Faithful Mirror: Reflections on the College Board and Education in America, ed. Michael C. Johanek (New York, 2001), 82-83.

⁷Valentine, College Board and the School Curriculum, 69-72.

⁸Ibid., 79-85.

⁹Ibid., 74.

¹⁰Ibid., 87-90; AP Central <http://apcentral.collegeboard.com/program> (Nov. 14, 2003).

¹¹Robert Blackey, "Advanced Placement European History: An Anatomy of the Essay Examination, 1956-2000," History Teacher, 35 (May 2002), 311-42, esp. 312.

¹²Ibid., 312-24.

¹³Ibid., 320-24.

¹⁴Schrag, Final Test, 98-99.

¹⁵William G. Ouchi with Lydia G. Segal, Making Schools Work: A Revolutionary Plan to Get Your Children the Education They Need (New York, 2003), 38-39, 176-77.

¹⁶National Commission on Excellence in Education, A Nation at Risk: The Imperative for Educational Reform: A Report to the Nation and the Secretary of Education, United States Department of Education (Washington, 1983). On the 1965 law and its effects, see ESEA: The Office of Education Administers a Law (Syracuse, 1968). For the No Child Left Behind Act, Pub. L. No. 107-110, 115 Stat. 1425 (2001), see <http://www.ed.gov/nclb/overview/intro/presidentplan/index.html> (Dec. 5, 2003).

¹⁷Serrano v. Priest, 487 P.2d 1241 (1971); Timothy A. Hacsi, Children as Pawns: The Politics of Educational Reform (Cambridge, Mass., 2002), 182-203.

¹⁸Schrag, Final Test, 111.

¹⁹Jay Matthews, "The 100 Best High Schools in America," Newsweek, June 2, 2003, pp. 48-54.

²⁰Ouchi with Segal, Making Schools Work, 38-39.

²¹Hacsi, Children as Pawns, 5-6.