Pioneered in California, publishing teacher "effectiveness" rankings draws more criticism
Mar 28, 2012 | By Louis Freedberg
The release last month of “value-added” rankings of New York City teachers based on student test scores, a practice pioneered by the Los Angeles Times in the summer of 2010, has once again raised pointed questions about whether the rankings of individual teachers should be published by the media.
The practice threatens to increase resistance to the notion of linking evaluations to student performance being promoted by the Obama administration, the Bill & Melinda Gates Foundation and others. Publication of teachers’ rankings by name is a separate question from the validity of the “value-added” methodology, over which major questions continue to be raised, including yesterday in Chicago by nearly 100 education researchers in an open letter to Mayor Rahm Emanuel. But the two issues are related: what occurred in New York suggests that should school districts actually do the evaluations, the odds are that media organizations will request them, and districts will be forced to release them, regardless of their validity.
Whether or not to publish rankings will inevitably become more of an issue in states that receive a waiver from some of the most onerous requirements of the federal No Child Left Behind law. As a condition for receiving a waiver from the Obama administration, states will be required to establish teacher evaluation systems that take into account “student growth” in every school and district. California is still considering applying for such a waiver. States that received funds from the multibillion-dollar Race to the Top program are also struggling to set up such evaluation systems, as required by rules established by the administration.
So far, the Los Angeles Times rankings, which the newspaper devised on its own with the help of a Rand Corporation researcher, are the only ones to have been published in California.
On February 24th, following requests from multiple media organizations under New York’s Freedom of Information Law and a failed lawsuit filed by unions to block the release, New York City schools released “Teacher Data Reports” for 18,000 4th through 8th grade teachers. Included were so-called “performance rankings” for each teacher.
The release marked a 180-degree reversal of written assurances senior school officials gave in 2008, when teachers agreed to participate in the “value-added” ranking system, that the rankings would not be released.
But even some vigorous proponents of linking teacher evaluations to test scores are unhappy about making rankings public, especially those based only on test scores.
Last month, Bill Gates came out strongly against the practice.
“Publicly ranking teachers by name will not help them get better at their jobs or improve student learning,” Gates wrote in an op-ed article in the New York Times. “On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.”
“Developing a systematic way to help teachers get better is the most powerful idea in education today. The surest way to weaken it is to twist it into a capricious exercise in public shaming. Let’s focus on creating a personnel system that truly helps teachers improve.”
His foundation has invested tens of millions of dollars in its Measures of Effective Teaching project to come up with comprehensive evaluation systems, which in addition to test scores are supposed to include other factors, such as videotapes of teachers in the classroom, surveys of students to get feedback on their teachers, and surveys of teachers about working conditions in their schools. None of the latter were present in the New York City evaluations.
Gates wrote the New York Times article days before the release of the rankings in New York. In an op-ed piece in the New York Daily News on the eve of their release, Schools Chancellor Dennis Walcott also attempted to head off the misuse of the rankings by the media.
“Teacher Data Reports were created primarily as a tool to help teachers improve, and not to be used in isolation.
“I’m deeply concerned that some of our hardworking teachers might be denigrated in the media based on this information. That would be inexcusable. Ultimately, each news organization will make its own choices about how to proceed, and this may result in teacher names appearing in the paper or on media websites.
“Although we can’t control how reporters use this information, we will work hard to make sure parents and the public understand how to interpret the Teacher Data Reports.
“I hope news organizations will report on the data responsibly and treat our teachers with respect.”
He went on to say that:
“The most important thing you should know is that these reports don’t tell the full story about a teacher’s performance. They provide one important perspective on how well teachers were doing their most important job, helping students learn, using a method called ‘value-added’ that has been found to predict a teacher’s future success better than any other technique.
“But the reports were never intended to judge a teacher’s overall success in the classroom. No single measure can do that, whether it’s value-added data, the results of a classroom observation or anything else.”
Most but not all of the media heeded Walcott’s entreaties. Within 24 hours the New York Post had singled out the “worst teachers” in New York, focusing specifically on 37-year-old Pascale Mauclair, a 7th grade English as a Second Language teacher in Queens. It was precisely the kind of public shaming that Gates had warned against.
To write their story, reporters raced to Mauclair’s parents’ home to try to find her. They then reached Mauclair at her home in what one online post described as a “private housing development,” and rang her doorbell repeatedly. She declined to talk with the reporters and, according to some online reports, had to call the police twice to have the reporters removed.
An article appeared the next day with the headline “They’re doing zero, zilch, zippo, for students,” focusing on the 16 teachers with the lowest value-added rankings on the list. Referring to Mauclair, the first sentence read, “When it comes to teaching math, she’s a zero.” This was followed by a Sunday story that described Mauclair as “the city’s worst teacher,” along with a photograph taken from a school yearbook.
The newspaper’s characterization, based on her low value-added ranking, was diametrically opposed to how her principal viewed her actual performance. “I would put my own children in her class,” the principal said.
As Stanford education professor Linda Darling-Hammond, recently appointed to the California Teacher Credentialing Commission by Gov. Brown, explained in an article in Education Week:
“Mauclair is an experienced and much-admired English-as-a-second-language teacher. She works with new immigrant students who do not yet speak English at one of the city’s strongest elementary schools. Her school, PS 11, received an A from the city’s rating system and is led by one of the city’s most respected principals, Anna Efkarpides, who declares Mauclair an excellent teacher.”
Discrepancies like these, between what the rankings purportedly show and how parents and colleagues experience a particular teacher, bear on whether the rankings are reliable measures of teachers’ performance. That reliability, in turn, should be considered when deciding whether they should be published in the first place.
The rankings were designed “to show how much progress individual teachers helped students make in reading and math over the course of a year,” explained New York school administrators on the district’s website. After taking into account a range of characteristics about the student and the school, the value-added methodology is supposed to be able to predict how a particular student should do the following year.
The New York rankings did take into account a number of student characteristics, such as their racial or ethnic background, English learner status, and whether they had attended summer school, been retained, or were in special education. The methodology also took into account a number of “classroom characteristics” such as the percentage of low-income students, the number of absences, the number of students retained a grade, and the percentage of students new to the school.
But a paper published in late February and written by several of California’s leading education researchers, including Stanford’s Darling-Hammond and her colleague Ed Haertel, added to the already substantial critiques of the methodology:
“Current research suggests that value-added ratings are not sufficiently reliable or valid to support high-stakes, individual-level decisions about teachers. Other tools for teacher evaluation have shown greater success in measuring and improving teaching, especially those that examine teachers’ practices in relation to professional standards.”
As Darling-Hammond and her fellow researchers noted in the Phi Delta Kappan article, a teacher’s effectiveness is determined by numerous school and non-school factors that a “value-added” analysis typically doesn’t, or can’t, take into account. These might include variables such as students’ prior teachers and schools, their peer culture, differences in summer learning loss, access to tutors, and even the kinds of tests used to measure achievement.
Some of California’s leading supporters of linking how students do in the classroom to teacher evaluations expressed doubts about media publication of rankings like those in New York and Los Angeles. Last November, EdVoice, a Sacramento-based advocacy organization, along with several others, filed suit against the Los Angeles Unified School District to force it to implement teacher evaluations tied to measures of student academic “growth.” The lawsuit alleges the district is violating state law (the little-known 1972 Stull Act) by not tying teacher evaluations to “student progress.”
But Bill Lucia, EdVoice’s executive director, said just publishing how teachers ranked based on the test scores of their students represents “an incomplete measure” of a teacher’s abilities. “It is certainly not clear what the purpose would be to provide one data element in an evaluation,” he said.
So far, the message from the Obama administration as to whether this practice is appropriate or helpful to the cause of improving teacher effectiveness remains fuzzy.
When the Los Angeles Times released its rankings, also using value-added methodology, Secretary of Education Arne Duncan applauded the move, saying of teachers “What’s there to hide?”
“The truth is always hard to swallow, but it can only make us better, stronger and smarter,” he asserted in a speech in Little Rock, Ark., shortly after the Los Angeles release. “That’s what accountability is all about: facing the truth and taking responsibility.”
He has since softened his position, saying that teacher evaluations should be based on “multiple measures” but, nonetheless, how the data was released was a “local decision.” He has yet to take a clear position against the publication of value-added rankings.
At its core, said Marshall Smith, former senior counselor to Duncan and undersecretary of education during the Clinton administration, the purpose of evaluations should be to improve teacher effectiveness. “They should be aimed at getting teachers to improve, not to embarrass people,” he said. “It should be to give them support. If you have a teacher who is a real disaster in a classroom, and the principal hasn’t noticed, then that principal needs to be fired.”
Smith said the rankings should be part of a teacher’s formal evaluation. By so doing, districts would not be allowed to release them, because they would be in teachers’ personnel files.
That is the case in the District of Columbia, where teacher evaluations based on multiple measures, including student test scores, were introduced under Michelle Rhee, the former chancellor of the Washington, D.C. public schools.
Rhee has made new teacher evaluations a major part of the reform agenda of her Sacramento-based StudentsFirst organization. She believes that 50 percent of a teacher’s evaluation should be based on student test scores, using “value-added” methodology.
In San Jose last week, she argued for having report cards for teachers “so parents can understand the strengths and possible weaknesses of their child’s teachers.”
But she disagreed with making public just one piece of the evaluation, as was done in New York. Like Smith, she said the goal should be to help teachers to improve.
“A teacher should be evaluated based on multiple measures, and value-added gains by their children are one important aspect to a teacher’s evaluation, but it doesn’t necessarily make sense to make public just one piece of the puzzle without giving parents and the public a broader context for the performance of that teacher.
“Accountability is important, and teachers should want to know how effective they are in getting gains in student performance. That information is important to have and to use. If we frame how it is going to be used punitively, it distracts us from the conversation that we really need to be having, that it is very valuable information for people to have to become better professionals.”
Meanwhile, in New York the teacher ratings system continues to stir controversy, along with complex responses.
“These ratings should never have been made public,” said Diana Agosta, who teaches English as a Second Language at the Pablo Neruda Academy, a small public high school in the Bronx. She did not get a rating herself, because only 4th through 8th grade teachers participated in what was billed as a pilot study. Agosta, who also has a Ph.D. in anthropology, said in an interview with EdSource that the rankings were “very, very flawed.”
She has some empathy for Pascale Mauclair, the teacher singled out by the New York Post, who was also an ESL teacher.
She noted that immigrant students are supposed to take state tests within a year of coming to the United States, even though some may not have been literate in their home country. Another complication is that many students can take tests in Spanish, but students from other language backgrounds have to take them in English, using a glossary of words in their own language. She said New York’s ranking system doesn’t take into account these and numerous other factors.
But she also said that if a way were devised to come up with “reliable” teacher ratings, they should be made public. “As a parent, I want to know as much about my child’s school and teachers,” she said. But “it can’t just be something administrators have come up with just to have a number” to rank teachers.