Credit: Alison Yin / EdSource (2014)
This post was updated on July 10 to cite a report by researcher Paul Warren.

A group of education professors and dozens of student advocacy groups are urging California education officials to switch to a method that most states use to rate student progress on standardized tests. They say it will more accurately measure and compare schools’ performance than what they see as the flawed system the state uses now.

However, State Superintendent of Public Instruction Tom Torlakson is recommending that the State Board of Education stick for now with the current method, which compares the most recent scores to the scores of students who took the test the previous year. Torlakson and staff of the California Department of Education say they want to be sure they’re choosing the right alternative to a two-year-old school accountability system that parents and teachers are just getting used to. The State Board of Education may decide what to do at its meeting on Wednesday. (See Item 1, attachment 1, of the board’s agenda for the staff recommendation and links to related documents.)

California is required under state and federal laws to inform the public how students are performing on state standardized tests in reading and math each year, and whether they’re showing improvement. At issue is the best way to describe a school’s impact on student improvement for state accountability purposes, to show how much progress its students are making compared with students in other schools and, for parents in particular, to show whether their kids are on track to be proficient in math and reading.

To meet this requirement, California has adopted a formula that dozens of leading academics say may actually paint an inaccurate picture of how a school is serving its students.

“California has chosen basically the worst growth measure you can possibly choose,” said Morgan Polikoff, a professor of education at the University of Southern California. He wrote a letter to the State Board of Education urging California to use a method that better demonstrates a school’s impact on a student. It was signed by more than a dozen education scholars, including those from UC Davis, Harvard University and the University of Washington.

The current measure takes last year’s average scores by grade and subtracts them from this year’s average scores by grade to derive a schoolwide average. The difference becomes a key factor on the California School Dashboard, the system that creates color-coded ratings for schools, districts and student ethnic, racial and demographic groups.
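As a rough illustration of the arithmetic described above, the calculation can be sketched in Python. The function name, the sample scores and the unweighted schoolwide average are assumptions made for this example, not the state’s actual formula.

```python
from statistics import mean

def cohort_change(last_year, this_year):
    """Per-grade change in average score, plus a simple schoolwide average.

    Both arguments map a grade to the list of scores recorded for that grade.
    The unweighted average across grades is an assumption for illustration.
    """
    per_grade = {
        grade: mean(this_year[grade]) - mean(last_year[grade])
        for grade in this_year
        if grade in last_year
    }
    schoolwide = mean(list(per_grade.values()))
    return per_grade, schoolwide

# Hypothetical scores: the students in each grade differ between the two years.
last_year = {3: [2400, 2420], 4: [2450, 2470]}
this_year = {3: [2410, 2430], 4: [2440, 2460]}

per_grade, schoolwide = cohort_change(last_year, this_year)
print(per_grade, schoolwide)
```

Note that nothing in this calculation checks whether the same children were tested in both years.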

“That doesn’t work for a lot of reasons,” Polikoff said. One reason is that this model fails to consider student mobility. Particularly in high-poverty neighborhoods, large numbers of students move, so a cohort of students by grade is often different from the year before, undermining the validity of the comparison. Paul Warren, research associate for the Public Policy Institute of California, reached a similar conclusion in a report released late last month.

Instead, more than 40 states track the growth in individual students’ scores from year to year on standardized tests like Smarter Balanced, which students take in grades 3 to 8 and again in grade 11.
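To see why the two approaches can disagree when students move, here is a minimal Python sketch using made-up student IDs and scores. It is an illustration only, not the method any state uses; real growth models are statistically more involved.

```python
from statistics import mean

def cohort_difference(prior, current):
    """Change in the average score, ignoring who was actually tested."""
    return mean(current.values()) - mean(prior.values())

def matched_growth(prior, current):
    """Average gain among students who were tested in both years."""
    matched = prior.keys() & current.keys()
    if not matched:
        return None
    return mean(current[s] - prior[s] for s in matched)

# Hypothetical school: student "b" moved away, student "c" moved in.
prior = {"a": 2400, "b": 2500}
current = {"a": 2430, "c": 2350}

print(cohort_difference(prior, current))  # average fell because the roster changed
print(matched_growth(prior, current))     # the one returning student gained
```

In this invented case the average drops by 60 points while the only student present in both years gained 30, which is the kind of mismatch the critics describe.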

“Growth measures, on the other hand, tell us how much each individual student is learning and how much of that learning can be attributed to the school. Many researchers agree that growth is a superior measure for purposes of measuring how well schools are doing,” wrote three dozen student equity and advocacy organizations, including Oakland-based Education Trust-West and Teach Plus California, a nonprofit in Los Angeles.

In fact, the ability to support a growth model was one of the criteria that the state board used in choosing Smarter Balanced as its math and English language arts test, the nonprofit Children Now noted in its letter to the board.

State board shows interest

Many states adopted a growth model to gauge student improvement when they received a waiver from sanctions imposed by the federal No Child Left Behind law during the final years of the Obama administration. They had to agree to use the results of student test scores as a factor in evaluating teacher performance, although some states either never fully implemented that aspect or subsequently abandoned the practice. Teachers unions opposed the idea, and many academics criticized using student test results as a statistically unreliable basis for evaluating teacher effectiveness. Gov. Jerry Brown and the state board opposed the idea and never sought a waiver.

Two years ago, the state board indicated it was interested in switching to some version of a growth model for the state’s new accountability system and asked ETS, a company that has done assessment work for the state, and state education department staff to look into some of the variations. They did so and took the charge seriously, said Cindy Kazanis, the director of the Analysis, Measurement and Accountability Division of the California Department of Education.

A growth model can be used to predict an average school’s yearly growth, using average scores of different student groups: low-income students, English learners, and ethnic and racial groups. The model can then be used to rank schools’ performance, on a scale of 1 to 100, based on how far below or above average their individual students actually scored. A half-dozen California school districts known as the CORE districts, including Los Angeles, San Francisco, Oakland and Long Beach, have been using this method for five years to informally guide school improvement, though not for accountability purposes with consequences.
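The general idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the CORE districts’ model: the “expected” gain here is just the average gain of a student’s group, the 1-to-100 rank is a plain percentile, and all names and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def school_percentiles(records):
    """records: iterable of (school, group, prior_score, current_score)."""
    records = list(records)

    # Step 1: the average gain of each student group is the expected gain.
    gains_by_group = defaultdict(list)
    for _, group, prior, current in records:
        gains_by_group[group].append(current - prior)
    expected_gain = {g: mean(v) for g, v in gains_by_group.items()}

    # Step 2: how far each student's gain falls above or below expectation,
    # averaged per school.
    residuals = defaultdict(list)
    for school, group, prior, current in records:
        residuals[school].append((current - prior) - expected_gain[group])
    school_score = {s: mean(v) for s, v in residuals.items()}

    # Step 3: rank on a 1-100 scale by the share of schools at or below.
    n = len(school_score)
    ranks = {}
    for school, score in school_score.items():
        at_or_below = sum(1 for other in school_score.values() if other <= score)
        ranks[school] = max(1, round(100 * at_or_below / n))
    return ranks

# Hypothetical data for three schools and two student groups.
records = [
    ("A", "low_income", 2400, 2450), ("A", "other", 2500, 2540),
    ("B", "low_income", 2400, 2420), ("B", "other", 2500, 2530),
    ("C", "low_income", 2400, 2435), ("C", "other", 2500, 2535),
]
print(school_percentiles(records))
```

Because each student is compared with the average for similar students, a school is ranked on how much its students gained rather than on where they started.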

Staff of the education department studied a slightly different growth method, which proponents say reveals how much each individual student has learned in a year and how much of that learning can be attributed to the school. “Parents and educators alike want the most accurate information on student learning, they want to know the truth about how their schools are doing, and they want to be held accountable based on fair measures,” the advocacy groups said in their letter.

But after applying two years of Smarter Balanced test data to one particular growth model, the department concluded for technical and policy reasons the state should hold off moving ahead this year — and perhaps consider looking at other forms of growth models. The state should examine one or more additional years of data to see if ranking schools based on a growth model is valid and reliable, the department said. Edward Haertel, an emeritus professor of education at Stanford University and a longtime member of a technical advisory group for the department, confirmed that he and others unanimously agreed with the department’s conclusion.

And state staff said that a system that simply compares schools’ performance omits what the current dashboard provides: It doesn’t tell parents and teachers how much a school’s score increased toward the goal of reaching proficiency in math or reading. And comparing schools’ growth also doesn’t tell you if or how quickly the achievement gap among student groups is closing, said David Sapp, deputy policy director and assistant legal counsel of the state board. That, too, is what teachers and principals want to know, he said.

Samantha Tran, senior managing director for education for Children Now, said growth models can provide that information, too; it’s not either-or. The state board and staff must be clearer on what they’re looking for, she said.

What is clear, Polikoff wrote, is that the state’s current system of rating school performance based on comparing test scores by different cohorts of students is “unacceptable.”

Comments (9)

  1. Phil Rixstine 4 years ago

    Of course tracking by student is better than tracking by school because tracking by school inflates the scores and rankings of the best recruiting schools at the expense of those that do not recruit.

    But the California Board of Education will never track by student because, after a few years, it will be obvious who the good and bad teachers are. Which is not something the Board or Superintendent, who are both owned by the teachers union, will allow.

    What they haven’t figured out yet is the charter schools are able to recruit the better students (like mine who aced the CAASPP) and remove them from public school, which undermines the public schools.

    Look, holding teachers accountable is here to stay. More importantly, it will do even more to help public education than showing the only real skill charter schools have is for cherry picking.

  2. Bill Conrad 5 years ago

    Funny that the Colleges of Education are leading the charge on the critique of the State System for monitoring achievement. The state test is a summative test that is more about adult practice in supporting student achievement at this point. It is valid to look at performance of students across years even though the children change from year to year as school systems should be able to show remarkable performance for all students all of the time. I produced 3 years worth of achievement data in ELA and Math for school districts in Silicon Valley at http://sipbigpicture.com. When you look at a school district like San Jose Unified, you can easily see a pattern of poor performance across years overall and by subgroups. Over 3 years, slightly more than 1/3 of eleventh graders meet or exceed math standards and amazingly only 3% of English Learners each year for 3 years meet or exceed math standards. You don’t need a growth metric to see a pattern in this abysmal performance.

    That is not to say that a Growth Metric would not be helpful. I would be more than happy to post growth performance on my web site but I would need to have access to anonymized individual student data. I requested the data from SJUSD but was rejected as they hid behind the skirts of Mother State and said by law that they could not release the data which is not true as I asked for anonymized data.

    It is ironic that Colleges of Education rebel at the display of student achievement data and are looking for Growth metrics. I am not surprised as they are one of the root causes of the poor performance that we see in non-cohort improvement data. They crank out teachers who are content illiterate, unable to perform teaching practices at high levels, and unable to effectively use assessment information to improve student learning. You can see my blog post at http://sipbigpicture.com/blog.

  3. Bill Conrad 5 years ago

    If you look at the data visualizations that I have produced for the school districts in Santa Clara County, you can clearly see value in comparing student academic performance across years and across grades. The underperformance of Hispanics, Blacks, English Learners, and Students with Disability is stark when compared to White and Asian student performances. It has value. It is clear and specific. While there is great value in growth visualizations, there is also value in non-cohort improvement visualization. Check out my displays at http://sipbigpicture.com

  4. Richard Illig 5 years ago

    I am a retired San Francisco Unified School District 3rd grade teacher. Obviously 3rd graders generate the baseline of data for future comparisons. During my final three years of teaching, I administered the SBAC and each cohort of students had distinct results. But what was consistent from year to year was the confusion and alienation of many students as they attempted to decipher obtuse multi-part directions and complete confusing activities. Even with extensive keyboarding preparation, some students lacked the manual dexterity to move back and forth between sections of the test and often struggled with editing their written work. For at least 20% of the third graders, the SBAC appeared developmentally inappropriate. Has any research been done examining the behavior and attitudes of 3rd graders during the SBAC testing window?

    Replies

    • Floyd Thursby 5 years ago

      I have an idea, why don’t we skip the tests and just ask kids to draw a smiley face and tell us if they are happy. We will not rate either, for who are we to judge smiley faces and what really is happiness? Let’s all just feel good and know that people are good and try their best. No borders, no profits, no prisons. Everyone is good. Except Trump. He’s bad.

  5. Jonathan Raymond 5 years ago

    Let’s look to the work of CORE for inspiration and aspiration? As one of the original Superintendents in CORE, I can say the data collection, multiple measure, and continuous improvement work is their greatest contribution toward teaching and learning in California. Why wouldn’t the State adopt this growth model that is working? Why wouldn’t they do what’s best for kids?

  6. Bill Conrad 5 years ago

    Adding a growth component to the current accountability system would certainly be a great additional improvement.

    However, it is still possible to make the case that comparing one year of performance to previous years of performance for academic subjects also has validity. The key assumption here is that the professionals within a school or school district should have the ability to move cohorts of students across the proficiency line from year to year even if the cohort of students is not the same. We should be able to see improvement in student outcomes from year to year even though the cohorts are different because it is the adults in the system who are making the difference in achievement for students through the improvement of their professional practices.

    Both an improvement (different cohorts) and growth (same cohorts) have value in an accountability system. It just depends on your underlying assumptions and purposes.

  7. el 5 years ago

    All we can do at best is proxy and guess with this data, and I think we do a terrible job at communicating that out.

    Staff of the education department studied a slightly different growth method, which proponents say reveals how much each individual student has learned in a year and how much of that learning can be attributed to the school.

    Let us be honest here: there is no way any exam can tell you how much an individual student’s learning is due to the school, due to the individual student’s own independent exploration, due to actions by the parents, or even due to external tutoring or schooling.

    In large numbers, we can make some broad assumptions about school program effectiveness across a range of students. If all the students test well, we can feel comfortable needs are being met. If all the students test poorly, those students need some different help that they’re not getting (but you don’t know what that is). IME when an individual student does well, you know they are doing fine, but if an individual student does poorly, you don’t necessarily know why or even if they might do well on the same material on another test day.

    We have to make assumptions and testing the kids itself is an educational act. Assuming that a kid who scored Standard Exceeded in 5th grade but Standard Met in 6th shows regression ignores the learning required to get that 6th grade score.

    I agree that comparing the scores of last year’s 4th graders to this year’s is fundamentally mistaken in terms of evaluating the input of the school and individual teachers.

    All of these numbers are clues, the beginning of questions, not the answers, and not the prescription for doing better.

  8. Wayne Bishop 5 years ago

    The worst problem is that the SBAC exams have never been “smarter” or “balanced.” At least in mathematics, they represent the testing philosophy of Phil Daro who has been pushing the ideas since the days of the CLAS fiasco of the mid-90s. Bring back the CSTs and easy online comparison that they provided. California would be shocked if it realized how much educational competence has been deliberately squandered in the name of progress.