State officials push alternative ways to report test scores

Top education leaders in California are dissatisfied with how the federal government requires calculating school performance using standardized test scores. But they say the California Department of Education lacks the time and money to do analyses that may better measure achievement of low-performing subgroups.

Students in California and 15 other states who take the Smarter Balanced assessments in math and in English receive a score on a scale between 2,000 and 3,000. The issue is how to interpret that score, and whether the states should focus on how students improve from year to year, not just whether they meet a set proficiency standard.

“If we are unable to do the analysis that helps us figure out if the gaps are closing, then it is faith-based education reform rather than data-driven education reform,” board member Patricia Rucker said at the State Board of Education meeting last month.

Deputy State Superintendent Keric Ashley put out an appeal for free help at the meeting. “We need to partner with some good-hearted university that might be willing to look at this data, because we don’t have money to give them,” he told the board.

Ashley said last week that the Learning Policy Institute, a nonprofit research organization in Palo Alto founded by Stanford University Professor of Education Emeritus Linda Darling-Hammond, had offered its help. In a recent EdSource commentary, Darling-Hammond and researcher Leib Sutcher presented a high-level look at data the state board says it needs for a better sense of how all students are performing from year to year.

Connecticut is among the Smarter Balanced states that will be releasing some of the research that California says it can’t afford, such as following the performance of a group of the same students from year to year. An analysis of growth in scale scores can provide a more refined look at the lowest-scoring student subgroups, such as students with disabilities and newly identified English learners, few of whom have yet scored at a proficient level.

As it did under the No Child Left Behind law, the federal government continues to require states to report the percentage of students, overall and by race, ethnicity, income and other characteristics, who were proficient in English language arts and math. Smarter Balanced defines this as having reached the point on the scale that for each grade designates “Standard Met,” the third of four scoring levels. The other levels are Standard Not Met (Level 1), Standard Nearly Met (Level 2) and Standard Exceeded (Level 4).

This year, in California, 49 percent of students in grades 3–8 and 11 were proficient in English language arts, up 5 percentage points from 2015, the test’s first year. In math, 37 percent of students met or exceeded standards, up 4 percentage points from the year before.

The results revealed great disparities among student groups: While 72 percent of Asian students (up 3 percentage points from last year) and 53 percent of white students (up 4 percentage points) achieved Standard Met, the definition of proficiency, only 18 percent of African-American students (up 2 percentage points) and 24 percent of Hispanic students (up 3 percentage points) did.

More fine-grained alternatives

Proficiency rates are a useful measure of achievement. In setting the Standard Met point on the scale, groups of educators and officials from the Smarter Balanced states agreed that students at that level would be on track to be ready for college-level work. But that measure also has limitations, which more than 40 academics outlined in a letter to U.S. Secretary of Education John King Jr. in July. They said that it creates incentives for schools to focus attention and resources on students near the point of proficiency instead of students further below or above, who may need as much or more help. It doesn’t encourage schools to advance students to high levels of achievement, because they don’t get credit for it, the letter said, and it penalizes schools with larger proportions of low-achieving students, since they aren’t given credit for improvements in performance that don’t reach the point of proficiency.

The initial analysis by the Learning Policy Institute, for example, showed that Latino students gained slightly more points on the scale than white students this year, even though they remained well below the proficiency standard and the gap between the groups remained large.

Darling-Hammond was among the signers of the letter, which was written by Morgan Polikoff, an associate professor of education at the USC Rossier School of Education.

State Board of Education President Michael Kirst and State Superintendent of Public Instruction Tom Torlakson reiterated many of the points in their own letter to the federal government, objecting to using proficiency percentages as the primary way to identify the lowest-performing schools.

“We are concerned that this definition is not only too narrow and resembles California’s old paradigm of accountability, but more importantly, forces all states to only emphasize a particular point of achievement,” they wrote.

Ed Haertel, an emeritus professor at the Stanford Graduate School of Education, agreed in an email. He cautioned that using any arbitrary point on a scale to compare gaps in scores between groups over time will create statistical distortions.

And yet, Ashley acknowledged, the state Department of Education’s three-year, $240 million contract with ETS, the company that tabulates the Smarter Balanced results, did not ask for any form of score analysis beyond what the federal government required.

Kirst said looking at alternative ways to analyze the test scores, such as how much individual students have improved, can also help the board set different benchmarks that would highlight and potentially expedite the improvement of low-performing groups of students.

Doug McRae, a retired standardized testing publisher from Monterey, said test results have been reported using scale scores for decades. The chief disadvantage is that scale scores are harder to communicate than proficiency percentages. The only scale score that has taken hold with the public is the SAT, which is reported on a scale of 400 to 1600.

There are tradeoffs with different ways of analyzing scores, he said, which is why it’s useful to look at more than one.
