Credit: Photo by Štefan Štefančík on Unsplash
The story was updated on Nov. 9 with new graphics and additional information.

Citing methodology flaws, the State Board of Education on Wednesday revised the criteria for rating standardized test performance on the new color-coded California School Dashboard. The unanimous decision will reduce the number of districts and schools rated red, the lowest performing of the five color categories, but board members and state administrators insisted that was not the motivating factor (see previous story).

“It would be worse to do nothing; that would undermine credibility (of the system) and create more volatility” in test score ratings, said board member Ilene Straus, who monitors school improvement and accountability issues for the board.

The fix — which was proposed by a technical group of advisers to the California Department of Education — was in response to what would otherwise be a big fluctuation in school and district color designations on the “academic” indicator, which includes test scores, on the school dashboard.

The next version is due out at the end of this month. Districts’ and student subgroups’ performance on the dashboard will determine which districts will receive assistance this year from their local county offices of education.

The technical problem is based on the department’s assumption, in creating the dashboard criteria, that students would do better than they did on the most recent Smarter Balanced tests in math and English language arts, which they took in the spring. But scores on average were flat statewide, and in many schools and districts, they declined. Without changing the criteria for setting dashboard colors, the number of school districts and schools designated as red would double, as would the number in orange, the second lowest of the five performance levels.

The impact is unclear, because districts and student subgroups would have to perform in the red on at least one indicator other than test scores — suspension rates, graduation rates and the performance of English learners — to be designated as needing help. State officials said they haven’t calculated how many districts would qualify either overall or for one or more of their low-performing subgroups, such as students with disabilities.

The changes in parameters that the board unanimously approved will reduce the number of districts and schools rated red, while significantly increasing the proportion rated orange. The combined percentages of red and orange districts actually would be greater than under the current system — proof, said board member Sue Burr, that the intention “was not to game the system.”

Board member Feliza Ortiz-Licon agreed that volatility had to be fixed and that there was no intent to “fudge the data,” but said the state’s memo to the board “sounded like” there was an effort to “mitigate the red numbers.” And, she said, there is no data laying out what the effect on districts and minority students would be.

Districts and subgroups may be eligible for assistance if results in both math and English language arts are rated red, or if one subject is red and the other is orange. There is a good chance, however, that reducing the number of districts in the red performance level by nearly two-thirds under the new criteria will cut the number of districts and subgroups assigned help.

Other changes recommended by the Technical Design Group will designate slightly more districts and schools “blue,” the highest performance level. More schools will also be “green,” the second-highest performance category, which is the state’s goal, but far fewer districts will fall in the middle “yellow” level, where schools are slightly to moderately below grade level with no improvement from the previous year.

In a letter to the board, 14 civil rights and student advocacy groups acknowledged the need for some changes in criteria, but objected to moving from red to orange low-performing districts whose scores had dropped very significantly.

And they objected to the hurried process and lack of public access to meetings of the technical advisory group, which meets privately and doesn’t make its minutes public.

“I feel rushed to make a big decision,” Ortiz-Licon said. “We may come back next year to the same conversation.”

Chances are board members will.

“This may not be the last time we make adjustments. We are going to get there but we are not there yet,” Straus said.

Cindy Kazanis, director of the department’s Assessment Development & Administration Division, said state officials made it clear when they set the initial academic indicator criteria that they would likely have to tweak it. It may take several years of results to produce consistent, predictable criteria for growth in test scores, she said.

On a related issue, Deputy State Superintendent Keric Ashley defended the reliability of the Smarter Balanced test itself. Some experts and advocacy groups suggested the lower test scores, particularly in English language arts, might point to a problem with the spring 2017 test itself, not California students’ performance on it. Scores in all 14 states that give Smarter Balanced tests either declined or, like California, were flat, which is unusual.

But Ashley told the board that the executive committee of Smarter Balanced, which he serves on, turned to experts to look into that possibility. An initial study by the American Institutes for Research found no difference in the quality of the test questions or in how the test was conducted, compared with the year before, he said. That study has not been released, but the final report, due in February, will be public, he said.

“There is nothing wrong with raising questions, but it is not responsible to speculate” on the reasons why the scores were what they were, he said.

Source: California Department of Education

This chart shows the distribution of school districts by performance color for the academic indicator under the original criteria that the board adopted. It covers the 2016 and 2017 test results in English language arts and math. The number of districts in red would have doubled from 81 to 169 and nearly doubled in orange, had the board not reset the criteria.

Source: California Department of Education

This chart shows the distribution of districts under the new criteria for the academic indicator of the dashboard for English language arts and math, retroactively for 2016 and for the 2017 testing last spring. There will be fewer red and yellow districts, and a lot more orange and green districts than under the current criteria, shown in the chart above.

 


Comments (8)


  1. Erika Dyquisto 7 days ago

    So what, exactly, are the methodology flaws? Those of us who understand statistics would like to know….

    Replies

    • John Fensterwald 7 days ago

      Erika, since you have an understanding of statistics, I encourage you to read pages 6-19 of the board agenda Item 3 that details the issues. The item explains in depth the challenge in predicting rates of change in scores, using only two years of data, that underlie color ratings.

  2. Brian Ausland 1 week ago

    To begin, yes Doug McRae nailed it. Having the organization (AIR), that actually built and has financial interest in the success of this platform, assess its validity seems like a pretty poor decision that does not help ensure the process integrity. Here is the actual documentation for the $14 million line item for the “platform” courtesy of the Internet Archive (https://web.archive.org/web/20121014113840/http://www.k12.wa.us/SMARTER/Jobs-Contracts.aspx).
    With that said, having been involved in the initial research that focused on this assessment, we were tasked with interviewing 5 different states that had been using AIR’s testing platform (that would serve as the basis for the SBAC system) for 5 years on average. We collected information to assess the pros and cons for schools, districts, and states in transitioning from paper/pencil to digital assessments. One of the handful of persistent findings was a general drop in student achievement results which was largely attributed to the increased accuracy of the assessment results. Essentially, they were getting a clearer picture of student performance results. Subsequently, all those states cited a series of re-calibrations that were necessary over the first few years of implementation to determine pass scores and proficiency levels.
    Regardless, more transparency is always a good decision when tinkering with metrics that define how our students and thus our schools are doing.

  3. Carrie Hahnel 1 week ago

    We are told that this “fix” was in response to what would otherwise be big fluctuations in color designations on the academic indicator. But staff and board members said nothing of the fluctuations we are likely to see on other indicators. Our own internal analysis suggests that districts will also see big fluctuations on the suspension indicator. With a quick run of the numbers, we found that more than 25% of unified districts will see their color designation change by two or more levels (for example, from Blue to Yellow or from Orange to Green). That’s more than the 15% we are told would see changes on the English language arts indicator, absent this fix. Should we conclude that the State Board is going to tinker with the colored rating system for any volatile indicator, or just any indicator that is both volatile and flatlining?

  4. Don 1 week ago

    Like Keric Ashley said, I won’t speculate. Instead, I have all the information or lack thereof to decide to opt out my son from testing. Until transparency is given more than lip service and test validity is confirmed, I suggest others do the same. After all, why subject your children to these tests and their questionable utility when we have so many unanswered questions which, according to Ashley, it is improper to speculate upon. No, sir, I will not speculate. I will opt out.

  5. Victoria Howard 2 weeks ago

    The real deficiency is the poor academic performance of California schools. Manipulating the reporting system rather than correcting the deficiency adds an element of fraudulence to an already reprehensible situation.

    Replies

    • JudiAU 1 week ago

      Agreed. The dashboard was constructed to obscure achievement data. Johnny can’t read but we didn’t suspend him!

  6. Doug McRae 2 weeks ago

    Contrary to Ashley’s statement that it is “not responsible to speculate,” given Smarter Balanced lack of transparency for the technical data needed to generate a more informed opinion on what went wrong with 2017 Smarter Balanced gain scores across the entire consortium (14 states), the responsible thing to do is to voice informed speculation to narrow the potential causes for the problematic gain scores (for example, to prioritize ELA scores for scrutiny) so that folks analyzing their 2017 SB data are informed and can act appropriately.

    In addition, I’d mention that AIR is a major subcontractor to SBAC, hardly an unbiased source for any kind of independent review of work done directly by SBAC staff or another subcontractor. And, a Feb 2018 report is not timely for use of 2017 Smarter Balanced scores for spring 2018 LCAP deliberations for all California school districts.