Photo: Students in California preparing for the Smarter Balanced tests. (Credit: Berkeley Unified School District)

Stagnant test results on the third year of the Smarter Balanced tests are posing an unexpected challenge for the State Board of Education: Hundreds of school districts may require county assistance because they failed to raise their scores in English language arts and math on the tests they took last spring.

As a result, at its meeting Wednesday, the state board will consider substantially reworking the criteria for rating schools’ and school districts’ scores. Doing so would reduce the number of poorly performing districts that could trigger outside help this year.

Without rejiggering the criteria for determining high and low performers, the number of districts in the “red” zone, the lowest performance level on the California School Dashboard, in English language arts would more than double, from 81 districts identified last spring in the trial run of the dashboard covering 2015-16 test results to 169 for the 2016-17 tests. That would be 11 percent of school districts.

In math, districts rated as red would nearly double, from 119 identified for the 2015-16 tests to 231 for the latest tests — 14 percent of districts. Which districts those are won’t be known until the state releases more data later this month.

These numbers apply to average scores of all students in a district. They don’t include poor performance by low-performing student groups, including students with disabilities. The same dashboard criteria would apply to them, too. County offices of education could have to provide assistance to many additional school districts to help them raise those subgroups’ achievement.

Faced with that prospect, the California Department of Education is recommending that the state board reduce the number of districts, student subgroups and schools falling in the lowest performing level by changing the dashboard’s color coding and making other changes for test scores. The education department memo does not mention the capacity of 52 county offices of education to provide help. But in previous discussions of the dashboard, state board members said they were concerned that hundreds of districts needing help could overwhelm counties’ ability to provide it, particularly in the first year of a new accountability system.

The department is making its recommendation on advice from the Technical Design Group, a committee of technical advisers who have not been identified and who meet behind closed doors. In a Nov. 3 letter to the state board, 14 student advocacy and civil rights organizations in the Local Control Funding Formula Equity Coalition criticized the board’s haste in deciding what to do. Later this month, the state is scheduled to formally launch its long-awaited school and district accountability system, with the release of the second-year results of the dashboard.

“All this has come at the last minute, with very little notice to stakeholders and the public, a rushed closed-door meeting with the Technical Design Group, and no time to run analyses on how these proposed changes affect student subgroup performance. We should not be changing expectations for schools and districts without more data and understanding,” the letter said.

“The mode of operation employed at this point (by the Technical Design Group) is deeply troubling,” wrote Samantha Tran, senior management director at the Oakland-based children’s advocacy nonprofit Children Now, in a letter to the board. Children Now asked for more openness, with the state listing design group members and publicly posting its agendas and allowing the public to comment.

How the dashboard works

The dashboard uses color codes to rate the performance of schools, districts and a dozen student subgroups on a range of statewide indicators, including suspension rates, graduation rates, performance of English learners and Smarter Balanced test scores. Indicators for chronic absenteeism and students’ readiness for college and careers will be added in the future.

The five color performance bands — red, orange, yellow, green and blue, from lowest to highest — will determine consequences. Generally, districts that rank red in two or more indicators, such as suspension rates and test scores, must receive technical help from their county office of education. Districts also must highlight red or orange indicators in their Local Control and Accountability Plans and detail plans for improvement.

Low-performing schools won’t receive assistance from county offices until next fall, under criteria that the state board will choose early in 2018 under the federal Every Student Succeeds Act.

A district’s dashboard color for math and English language arts performance gives equal weight to two factors: the latest average student scores and the amount the scores grew or fell from the previous year.
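
In rough terms, that scheme can be sketched as a small status-plus-change lookup. To be clear, this is an illustrative toy, not the dashboard’s actual formula: the cut points and the simple averaging rule below are hypothetical (the real five-by-five color grid and cut scores are published by the California Department of Education), but it shows how a low average score combined with a significant decline can land a district in red, while the same low score with flat growth would not.

```python
# Illustrative sketch of a status-plus-change color rating.
# All cut points here are hypothetical, NOT actual Dashboard values.

COLORS = ["red", "orange", "yellow", "green", "blue"]

def level(value, cuts):
    """Return a 0-4 band index for `value` given four ascending cut points."""
    return sum(value >= c for c in cuts)

def dashboard_color(status, change,
                    status_cuts=(-70, -25, 0, 45),   # avg. distance from standard
                    change_cuts=(-15, -3, 3, 15)):   # year-over-year change
    s = level(status, status_cuts)    # "status": latest average scores
    c = level(change, change_cuts)    # "change": growth or decline
    # Equal weight to both factors: average the two band indexes.
    return COLORS[(s + c) // 2]

# A low-scoring district whose scores also dropped significantly:
print(dashboard_color(-80, -20))  # red
# The same low score with flat year-over-year change:
print(dashboard_color(-80, 0))    # orange
```

A change to the cut points, or to how the two factors are combined, shifts districts between colors without any change in the underlying scores — which is essentially what the proposed reworking of the criteria would do.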

In the 2015-16 Smarter Balanced tests, California’s students showed strong growth in the percentage of students who were proficient in English language arts and math from the initial year of testing. In designing the dashboard criteria based on that year’s results, the state built growth into the dashboard assumptions. But in 2016-17, state scores on average were essentially flat. And more low-performing schools showed significant declines in scores than the year before and fell from orange to red on the dashboard.

Administrators from the Department of Education are recommending countering that by changing the parameters to increase the number of districts that would be green and orange while narrowing the criteria for red. Low-scoring districts whose scores declined significantly would become orange, instead of red as they would under the current criteria.

In making its case, the department argued that doubling the number of red districts under the current methodology “does not meet the intended purpose of the accountability system, which is to establish goals that are ambitious but also attainable by all schools throughout the state.” The memo suggested that the state may have to adjust the dashboard criteria for the testing indicator again in the next few years to further smooth out variations.

In its letter, the Equity Coalition acknowledged that it’s not good to have accountability measures that fluctuate annually. “However, we do not think the answer is to establish the significant negative precedent of altering the rubrics to produce the result the State finds more palatable.” Children Now has proposed technical fixes that the advising technical experts did not consider.

Because Smarter Balanced scores were flat or fell last year in English language arts in all 14 states that gave the test, some experts in assessment, including Edward Haertel, professor emeritus at the Stanford Graduate School of Education (see end of article), and Doug McRae, a retired assessment executive for textbook publishers, have said the problem could lie with the design and content of the test itself, not with the lackluster performance of California students on it.

“Before modifying the accountability system so significantly, the State should have the Technical Design Group and its experts examine the (Smarter Balanced) test more closely,” the Equity Coalition wrote.

To address a lack of growth in scores in the Smarter Balanced test, “we have to address the deficiencies” in the test, “not tinker with” the deficiencies by changing the dashboard parameters and color schemes, McRae wrote to the state board.

Administrators for the testing consortium so far have dismissed criticisms of the test and defended its reliability.

Comments (9)

  1. Kathi Reece 7 days ago

    Until people understand that an 8-year-old child is not developmentally able to type a multiple-paragraph response to lit questions, the results of the language arts test will continue to flatten out. Perhaps it should be labeled a keyboarding skills assessment. But heck, what do I know? I am just a lowly 3rd grade teacher.

  2. Bruce William Smith 1 week ago

    That the rapidly growing number of Latino pupils in a Smarter Balanced state like California is increasingly struggling to achieve the Common Core’s English standards on the state tests should not be surprising, if that data should be forthcoming in the near future; nor is a possible, logical reaction, the growing appeal of opting out of California’s state schooling altogether, as the English-speaking community takes refuge from the state’s continuously lowered standards so as to enable their children to compete in an increasingly competitive global economy.

  3. SD Parent 2 weeks ago

    As a parent whose children have been on this roller coaster ride of Common Core and SBAC tests while the state and districts try to figure things out, I am deeply disturbed. I want to point out that changing the goal posts, as it were, to make the dashboard results less red and to alleviate the overloaded county offices from providing support for districts whose students are not performing well by the measure we are using to assess them is in reality failing to support the students, who have only one shot at their K-12 education.

    I have watched in dismay as accountability in the LCFF (via the fictitiously named LCAP) and Common Core learning (via the SBAC/CAASPP) are all swept under the carpet by even the State Board of Education. The state and members of the SBE, not unlike the San Diego Unified District, purposely promote a narrative that obfuscates the truth in favor of presenting a more positive outcome to the public, while ultimately failing the very students they profess to be trying to help.

    As to what caused the lack of improvement, the fact that numerous states experienced similar ELA results would strongly suggest that the test – being the one common variable – is the most likely culprit. That does not explain the math results, but my experience at the school site level is that teachers are still trying to figure out how to teach Common Core math and do not have the tools (both in professional development and texts) to do it well.

    I would suggest that, rather than moving the goal posts, much more effort be expended to find pockets where students are genuinely improving and to share best practices.

  4. M. Fetler 2 weeks ago

    It is always tempting to kill the messenger. Before casting doubt on the test, a deep dive into California’s demographics, especially poverty-related items, may well be most productive in planning remedial strategies. Nine times out of ten, changes in economic conditions provide better explanations for stagnant or declining test results.

    Replies

    • Doug McRae 2 weeks ago

      Mark: My reason for casting doubt on the test pretty much eliminates the effect of any California demographic changes from 2016 to 2017. My reason is based on consortium-wide gain results from 2016 to 2017 [20 states + DC] accounting for just over 30 percent of the entire country’s K-12 enrollment, and one-year demographic changes for that many students can’t be a whole lot.

      In addition, my reason is based on change-score decreases from ’16 to ’17 that were substantially larger for Smarter Balanced than for PARCC, with much larger change-score losses for ELA than for math for Smarter Balanced compared to PARCC. In other words, the pattern of consortium-wide data fingered Smarter Balanced ELA scores as the likely culprit.

      It’s really hard to make either a demographic or a curriculum / instruction rationale for that kind of loss pattern from ’16 to ’17. We don’t have any consortium-wide data to check out whether the losses were differential by demographic group, and it’s logical to look at such data, but it’s not likely to be the main reason that Smarter Balanced had starkly different losses for ELA vs math and starkly different ELA losses compared to PARCC ELA gains for the same time period.

  5. el 2 weeks ago

    I would be very interested in any pointers to better understand how scores are calculated given the adaptive nature of the test, where students who are getting questions correct are given harder questions and students getting them wrong are getting easier questions. For example, if an 11th grader is proficient to the Algebra II level and gets most of those questions correct, and they get questions that rely on trig and statistics knowledge that the student maybe hasn’t even been exposed to, what kind of score would that create? It might be helpful to have some scenarios of what kind of hypothetical knowledge and ability results in each score range.

    I also wonder if anyone has the data to compare the 11th grade scores to SAT data. Obviously not every student takes the SAT, but enough do that you should be able to compare them to the results by say the first quarter of the senior year. College Board and the universities have picked some numbers for those scores that they consider “college ready;” I wonder how often CAASPP/Smarter Balanced and SAT agree or disagree, and if we can learn anything from that.

    Replies

    • John Fensterwald 2 weeks ago

      Great questions, el. I hope one of our experts responds to the scoring of adaptive tests — particularly as it relates to the 11th grade test, which is intended to capture students who may currently be taking geometry, Algebra II or even calculus and may not remember what they did in geometry two years earlier.

      Long Beach, which wants to switch from Smarter Balanced to SAT for the 11th grade test, reports there is a strong correlation between its students’ scores on SAT and Smarter Balanced. So does SAT, for obvious business reasons. The feds require that whatever test a state uses be aligned with its standards — Common Core in the case of most states. A number of states have swapped their tests for ACT or SAT; I don’t know what studies they had to provide for federal approval.

      A reminder that the 11th grade Smarter Balanced results are not included in the dashboard for the academic indicator. That covers only grades 3-8. The state board decided 11th grade test results should be included as one element of the college and career readiness indicator, which is still under development. Many of the advocacy groups cited in the piece today disagree with that decision and argue that, as the only math and ELA standardized test that students take in high school, it deserves more prominence and weight than the state board is giving it.

    • Doug McRae 2 weeks ago

      EL: Re how adaptive tests are scored, the short answer is that items are weighted so that higher difficulty items measuring more advanced material get higher weights than easier items measuring less advanced material. Sometimes, the weights can even involve placement in the test administration sequence. The long answer can be extremely complicated.

      Re grade 11, I’ve seen some data from another state that the old SAT is decently correlated with Smarter Balanced grade 11 scores, not highly correlated but possibly decent enough for aggregate data use, if not for individual students. For high school, I’m intrigued with the ESSA option of using a “menu” of tests, with each student taking a test that conforms to his or her instructional pathway [e.g., academic tests for the college-bound; perhaps a career instrument for the non-college-bound], with some sort of calibration process across the entire menu in order to meet the ESSA requirement that scores be aggregated for all students. Other states are pursuing this option, but I haven’t heard much interest here in California, so far only a bit of interest from Long Beach.

      • el 2 weeks ago

        Thanks, Doug, I really appreciate that you weigh in here. Someday I’d love to hear the long answer too. 🙂 As someone who works with algorithms a lot I’ve started to view a lot more situations with Cathy O’Neil’s Weapons of Math Destruction sensibilities for unexpected and unintended results that have no opportunity to be corrected with outside feedback.

        This is a complicated coding problem and I always enjoy hearing how such problems are solved.