State rethinks how to report test scores

The Smarter Balanced Assessment Consortium has created a cross-grade scale that will enable tracking of a student's or school's growth in achievement from grades 3 through 11. The lines show the cut points dividing the four achievement levels in math.
Source: Smarter Balanced Assessment Consortium

California policymakers say they intend to create a different system for reporting results of the upcoming tests on the Common Core standards than parents and schools have become used to in the era of the No Child Left Behind Act.

At this point, they can’t say what it will look like. The reporting system is one of several moving parts that include recalibrating the Academic Performance Index, the current measure of school improvement, of which the results on the Common Core standards would be a big piece. But state leaders can say what the new system won’t be: anything resembling the federal system for measuring schools, which led to most being judged failures.

“States can report however we want and can include anything that we want,” said Michael Kirst, president of the State Board of Education, which is immersed in creating a new accountability system for districts and schools.

California education officials spent several days last month with their counterparts from the 22 member states in the Smarter Balanced Assessment Consortium deciding how to place students’ scores into bands of achievement on the new Common Core tests they will be giving next year. But state policymakers are distancing themselves from what they agreed to and say they plan to create a better way to present scores to parents and to judge schools’ and districts’ progress.

The Smarter Balanced representatives approved “cut scores” for the new tests. The scores are single points that divide the spectrum of test scores into four achievement levels, from the lowest, Level 1, to the highest, Level 4, indicating degrees of mastery of Common Core content. That approach is similar to what California and other states have done in the past and conforms with the requirements of No Child Left Behind, which has mandated that all states report annually how many students’ standardized test results in grades 3 to 8 and grade 11 fall within each level. The federal government has penalized schools whose students failed to score at least at Level 3 – defined in past tests as proficiency – in English language arts and math. Though much criticized and all but discredited, NCLB remains the law of the land.
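To make the mechanics concrete, here is a minimal sketch of how three cut points divide a score scale into four levels. The cut values below are hypothetical placeholders, not the scores Smarter Balanced approved, and the rule that a score exactly at a cut point falls in the higher level is an assumption made for illustration.

```python
from bisect import bisect_right

# Hypothetical cut scores for one grade and subject.
# These are illustrative placeholders, NOT actual Smarter Balanced values.
HYPOTHETICAL_CUTS = [2400, 2500, 2600]


def achievement_level(scale_score, cuts=HYPOTHETICAL_CUTS):
    """Return an achievement level from 1 to 4 for a scale score.

    Assumption: a score exactly at a cut point belongs to the higher level.
    """
    # bisect_right counts how many cut points are at or below the score.
    return bisect_right(cuts, scale_score) + 1


if __name__ == "__main__":
    # Two students one point apart can land in different levels.
    for score in (2499, 2500):
        print(score, "-> Level", achievement_level(score))
```

In this sketch a student at 2499 and a student at 2500 are reported in different levels even though their scores differ by a single point.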

The thresholds for achievement levels, known as cut scores, on the Smarter Balanced tests were based on the judgments of teams of educators at each grade, then aligned for inter-grade consistency and adjusted to reflect how well students did on the field tests last spring. Smarter Balanced estimates that fewer than half of students – closer to a third in some grades – will score at Level 3 next spring, the first administration of the official tests. This graph shows the achievement levels for English language arts.

Source: Smarter Balanced Assessment Consortium

Some Smarter Balanced states chafed at continuing the NCLB model. Vermont and New Hampshire abstained from the consensus vote to approve the cut scores.

Vermont’s secretary of education, Rebecca Holcombe, wrote a three-page letter that said the use of cut scores “misrepresents the underlying complexity of achievement and contributes to simplistic policies that make it difficult to achieve our public purposes.”

Kirst, who wasn’t at the meeting but participated in discussions with the state’s delegation*, said California leaders share Holcombe’s concern, but since they represent the largest state and are a driving force in the consortium, they didn’t want to abstain. Instead, California pressed member states to adopt a memorandum that was released with the cut scores. It reiterated some of Holcombe’s arguments with the admonition that cut scores and achievement categories “should serve only as a starting point for discussion” about student achievement and “should not be interpreted as infallible predictors of students’ futures.” Californians, including Stanford School of Education professor Linda Darling-Hammond, a technical adviser to the Smarter Balanced consortium, had a major hand in drafting the memo.

Creating categories to designate proficient and basic levels of performance, as NCLB requires, “has a public appeal” because it enables clear comparisons and can highlight achievement gaps among student groups, Holcombe acknowledged in her letter. But using a single point on a standardized test to set the levels, Darling-Hammond argues, is “a crude measure” and arbitrary. “A cut score has had a magic quality, yet is not as precise as numbers would lead you to believe,” she said.

Under NCLB, cut scores “led to dysfunctional behavior” in which school districts focused their attention on “bubble kids” – raising scores of those right below the cut score while ignoring those further below or above the line, she said. Schools would not get credit when students showed growth within achievement levels from year to year or when scores grew significantly but fell short of proficient.

The Common Core standards, most agree, are not only different from but also more challenging than those in most states, including California’s previous standards. As a result, state officials have repeatedly warned against comparing scores and achievement levels on Smarter Balanced tests with those on states’ past tests. To discourage comparisons, the Smarter Balanced consortium has avoided labeling Levels 1 to 4 as below basic, basic, proficient and advanced.

Kirst also said that, technically, Smarter Balanced Level 3 does not indicate that a student is performing at grade level. For 11th grade, Level 3 will signal much higher achievement: that a student is on track to be ready, by the end of high school, for coursework at a four-year college without the need for brush-up courses or remediation. California had lobbied for that rigorous definition of Level 3 as a potential replacement for the Early Assessment Program, which the California State University created a decade ago as a supplement to the 11th-grade state standardized tests.**

Since Smarter Balanced’s Level 3 would set a standard higher than proficiency, Kirst said, scoring below it doesn’t mean a student is doing subpar high school work or is not prepared for a two-year associate’s degree or entry into a training program to become, say, an electrician or dental hygienist. California may set separate cut scores for the high school exit exam or readiness for various post-secondary career paths, he said. The Advisory Committee for the Public School Accountability Act, which reports to the state board, will discuss various possibilities, he said.

Other options for reporting scores

Students who take the Smarter Balanced tests next spring will receive a four-digit score on a scale of 2,000 to 3,000, a point system intentionally chosen to differentiate it from what other states have used. Unlike the old California Standards Tests, the Smarter Balanced tests are designed to be scored on a continuum from grade to grade. This vertical alignment from 3rd- to 11th-grade tests is critical, because it will make it possible to track the progress of individual students and subgroups of students in mastering the Common Core over time.

Holcombe’s letter encourages states to report students’ scale scores, with attention to gains from year to year, instead of focusing on percentages of students who score within the different achievement levels. Kirst and Darling-Hammond say they favor that as well.

And instead of using single cut scores, they advocate reporting scores in what are called confidence bands, score ranges that recognize that test scores are approximations of student knowledge. Confidence bands are like the margins of error that pollsters use when reporting voter surveys.
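As a rough illustration of the idea, the sketch below builds a band around a reported score and checks whether the band crosses a cut point. The measurement error of 25 points, the cut point of 2500 and the function names are all hypothetical; the article does not say how Smarter Balanced or California would compute such bands.

```python
def confidence_band(scale_score, standard_error, z=1.0):
    """Return (low, high) bounds around a reported scale score.

    Assumption: the band is the score plus or minus z standard errors,
    the same form as a pollster's margin of error.
    """
    margin = z * standard_error
    return (scale_score - margin, scale_score + margin)


def band_straddles_cut(band, cut):
    """True when the cut point falls inside the band, so the student's
    achievement level cannot be stated with confidence."""
    low, high = band
    return low < cut <= high


if __name__ == "__main__":
    hypothetical_cut = 2500    # placeholder, not an approved cut score
    hypothetical_error = 25    # placeholder standard error of measurement
    band = confidence_band(2490, hypothetical_error)
    print(band, band_straddles_cut(band, hypothetical_cut))
```

Under these assumed numbers, a student reported at 2490 has a band that extends above the 2500 cut point, so the test alone cannot settle which side of the line the student belongs on.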

Doug McRae, a retired educational testing company executive who was in charge of designing and developing K-12 standardized tests, agrees with Kirst and Darling-Hammond. McRae has been an outspoken critic of the process that Smarter Balanced used in creating achievement levels (see his EdSource commentary) and the lack of openness in explaining how exactly the cut scores were determined, then modified based on the results of the Smarter Balanced field tests last spring.

“If California’s goal is to go to scale scores and report scores in ranges rather than achievement categories, from a purist perspective, I agree,” he said. “It is better in terms of reducing the overinterpretation of scores. This has also been the goal of K-12 test publishers for the last 40 years, but has never materialized.”

Kirst said that the state board will spend the next 10 months revising the state’s school accountability system. The key date is Oct. 15, 2015, when California law requires the state board to have adopted a set of “rubrics” or rules for measuring state and local goals in district Local Control Accountability Plans. Measures of academic progress are only one of eight requirements in the plans, and state standardized tests are just one indicator within that requirement.

Before then, however, the state board must decide how to present next spring’s Smarter Balanced test results to parents and schools. That message will likely be: Consider the scores a useful first-year indicator but not the preeminent measure of achievement on the new standards or readiness for college.

* Members of the state delegation included Rich Zeiger, chief deputy state superintendent; Karen Stapf Walters, executive director of the State Board of Education; and Diane Hernandez, director of the California Department of Education Assessment Development and Administration Division. Deb Sigman, a former state deputy superintendent who is now a deputy superintendent in Rocklin Unified School District, and Keric Ashley, interim Deputy Superintendent of Public Instruction of the state Department of Education, didn’t attend the achievement-level setting meeting but have represented the state previously. Beverly L. Young, assistant vice chancellor for academic affairs, California State University, is a member of Smarter Balanced’s executive committee.

** An earlier version of the article incorrectly stated that the University of California campuses recognize results of the Early Assessment Program for placement in non-remedial courses. Only CSU and many community colleges do. In the same paragraph, the definition of Level 3 in 11th grade has been clarified.
