Stanford professor finds Michelle Rhee's teacher evaluation system was effective

Michelle Rhee. Credit: Neighborhood Centers

Score one for Michelle Rhee and performance pay.

A study released Wednesday of the controversial teacher evaluation system that Rhee initiated when she was chancellor of the District of Columbia Public Schools has found that both its threats of dismissal and big pay incentives worked as intended. Within its first three years, the system led to increases in the retention and the performance of effective teachers while encouraging ineffective teachers either to quit or improve.

The research, co-authored by Professors Thomas Dee of the Stanford Graduate School of Education and James Wyckoff of the University of Virginia, is one of the first studies to show a positive impact of offering more money to teachers who perform better. As they acknowledge, most of the research “raises considerable doubt about the promise of teachers’ compensation-based incentives as a lever for driving improvements in teacher performance.” Especially when the pay incentives were linked to increasing test scores alone, “it may be that teachers generally lack the willingness (or, possibly, the capacity) to respond to incentives that are linked narrowly and exclusively to test scores,” Dee and Wyckoff wrote.

Washington, D.C.’s IMPACT system also gave heavy weight – 50 percent of a teacher’s evaluation score – to value-added test scores of students who took the district’s standardized tests. But only 17 percent of the district’s teachers taught subjects that were tested. The evaluations of the other 83 percent of teachers were based on multiple factors, including five observations by principals and master teachers of classroom management and instruction, a teacher’s impact on the school community and other measures of student achievement not involving standardized tests. Three other elements distinguished IMPACT from other pay-for-performance programs, the researchers said:

  • Strong incentives of dismissals after two straight “minimally effective” reviews and substantial monetary rewards, including first-year bonuses of up to $27,000 and permanent raises of as much as $25,000 for two consecutive “highly effective” ratings;
  • Instructional coaches to assist teachers in improving performance;
  • Recognition that the system would be neither small-scale nor temporary.

“IMPACT was not cash for test scores,” Dee said Wednesday. “It was based on multiple measurements and powerful incentives – a jump of five years on the salary schedule for those twice rated highly effective – that dwarf other programs.”

The research concentrated on those teachers who would be most likely to be motivated by IMPACT: those teachers facing the prospect of dismissal for a second consecutive “minimally effective” rating and borderline “effective” teachers motivated to become “highly effective.” The study attributed the higher rates of attrition of minimally effective teachers and higher rates of retention of highly effective teachers to IMPACT. A significant percentage of minimally effective teachers who didn’t quit saw improved evaluation scores.

“We think there should be new and different ways of assessing teachers,” Dee said. “IMPACT may not necessarily be the best way and the final word, but it does provide early important evidence.”

Dee acknowledged that the study didn’t focus on the vast majority of teachers rated effective whose jobs were not in jeopardy and may not have been motivated by IMPACT’s incentives. “For teachers square in the middle, not close to minimally or highly effective, reform might have washed over them,” he said.

Rhee resigned as chancellor of D.C. Public Schools in the fall of 2009, the first year that IMPACT went into effect, to head StudentsFirst, a Sacramento-based education nonprofit. Since 2010, 500 teachers with “ineffective” ratings on evaluations have been dismissed. There were investigations into allegations that teachers in some schools, motivated by the possibility of big bonuses or fear of dismissal, changed students’ test scores.

Dee said that most of the cheating charges surfaced a year before IMPACT appeared and that the small number of teachers identified with the potential violations were excluded from the study.

Dee and Wyckoff published “Incentives, Selection and Teacher Performance: Evidence from IMPACT” as a working paper for the National Bureau of Economic Research.


Filed under: Evaluations, Pay and Tenure, Teaching

29 Responses to “Stanford professor finds Michelle Rhee's teacher evaluation system was effective”


  1. skabetti on Nov 21, 2013 at 6:10 pm

    But how many teachers that succeeded falsified scores?

  2. Mike Funn on Oct 19, 2013 at 5:58 pm

    Questions: Why didn’t Rhee stay a teacher? Why didn’t Rhee stay a Superintendent? Where is Rhee now? In California. Where was the study done? California.
    Salem Witch Hunt- walk around the colony with a flame, burning down houses and its citizens in the name of good.
    Merit pay=Cheater Pay
    Successful business-Tomorrow’s failure

    No art, no music, no dance. Just testing.

  3. Ken Curtis on Oct 18, 2013 at 8:16 am

    The research is flawed. Why do we assume that the evaluators are masters in their fields? Why not assume that teachers are effective?
    Let’s remember if there are incompetents in teaching, administrators put them on the staff. Merit pay is divisive. It pits teachers against teachers, parents against parents, and perceptive students against each other, wondering why they didn’t have the high performance teacher. If I had a son or daughter in a school that adopted merit pay with highly effective, satisfactory, or lacking in effectiveness I would be in the Principal’s office at the start of the school year wanting to know: “Does my child get the merit teacher, the demerit teacher, or just plain teacher?”
    The most important evaluation is the one that takes place at the time of decision to hire. Teachers are, for the most part, excluded from this decision-making. It may surprise some that teachers do not want incompetents as colleagues. Look at Finland, a country we admire for its educational system. The National Board of Education and the teachers union collaborate on the hiring and retention of teachers. Here, folks like MR are more interested in attacking unions. She is still a phony reformer in my book.


    • el on Oct 18, 2013 at 5:28 pm

      In our district, teachers are part of every interview panel for a new teacher. I can’t imagine not doing that.

  4. TheMorrigan on Oct 18, 2013 at 7:47 am

    Rhee modeled her performance pay system off the failed IBM model of stack-ranking. What you have is a cannibalistic system that does not cultivate an environment of innovation, collaboration, improvement, and retention. These are all items that are integral to schools. At first, the Rhee system appears to work quite effectively; simply put, stack-ranking is effective for removing bloat. However, given enough time, the system begins to eat away the good parts of itself.

    There are reasons why DC’s IMPACT has been modified since Rhee’s tenure there. There are reasons why fields of cheating bloomed a year after IMPACT’s implementation. And there are reasons why the DC performance pay system no longer offers big bonuses any more.

    I suppose if we lived in some awkward parallel universe where we could poorly evaluate parts of the whole and we ignored all of the changes since Rhee it is “Score one for Michelle Rhee and performance pay.”

  5. el on Oct 17, 2013 at 10:16 pm

    Bruce Baker wrote on this today also:

    This brings us to the present study on the DC Impact teacher evaluation system. Here, the researchers identified teachers who were really no different from one another statistically on their DC Impact ratings, but some were just a few fractions of a point low enough to be labeled as Ineffective and face threat of dismissal, and others just high enough to be out of the woods for now. That is, there really aren’t any substantive observed quality differences between these two groups. Note that the researchers studied this at the high end of the ratings distribution as well, but didn’t really find as much going on there.

    Put simply, what this study says is that if we take a group of otherwise similar teachers, and randomly label some as “ok” and tell others they suck and their jobs are on the line, the latter group is more likely to seek employment elsewhere. No big revelation there and certainly no evidence that DC Impact “works.”

  6. Michael on Oct 17, 2013 at 9:18 pm

    You’ve all been had. Here’s how the “study” begins- on the first page!

    “NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.”

    For EdSource to pass this off as some kind of legitimate report is ludicrous.

  7. Carol on Oct 17, 2013 at 9:08 pm

    The NYT ran an article about this and got blasted for it. It was for a peer-reviewed publication but had yet to be peer reviewed.


    • John Fensterwald on Oct 18, 2013 at 10:55 am

      Thanks, Carol. I should have been explicit that the study had not yet been peer-reviewed.

  8. StudentsLast on Oct 17, 2013 at 9:04 pm

    So teachers who were threatened with dismissal saw their test scores go up? And then magically they went up. There’s a word for that: cheating.

  9. Paul on Oct 17, 2013 at 7:27 pm

    Phenomena ascribed to IMPACT almost certainly existed before. We have no way of knowing, because the study coincided with the launch of IMPACT and the researchers made no attempt to study prior conditions and draw comparisons.

    For example, the report mentions again and again that teachers resigned (i.e., left voluntarily) after receiving a “minimally effective” rating. Isn’t it natural to resign when you haven’t been successful in your job? There was a baseline level of teacher attrition before IMPACT, and presumably, some of those “leavers” were low performers.

    The report points out that the system average teacher effectiveness score increased during the three-year study. This result is only meaningful if we assume that teacher effectiveness was static or declining before IMPACT. Isn’t it natural for individuals, groups of individuals, and systems to strive for improvement? Were DC teachers stupid, lazy and ineffective until Michelle Rhee changed the evaluation system?

    My last criticism is that the study presumes that IMPACT teacher effectiveness scores are a valid measure of teacher effectiveness. I’ll address only one part of that big debate.

    As is fashionable, the authors point out that old teacher evaluation systems were binary, and that most teachers received “satisfactory” ratings. The authors give the percentages of “minimally effective”, “effective” and “highly effective” teachers under IMPACT. They also point out that observations play an important part of a teacher’s score.

    As it was politically unacceptable to continue rating almost all teachers “satisfactory”, what alternative guidance was handed down to principals and peer evaluators under IMPACT? IMPACT was designed to classify more teachers as “minimally effective”. Presumably, Michelle Rhee punished evaluators who granted too many “effective” ratings. A motive to classify more teachers as ineffective does not make more teachers ineffective in fact.

    The authors don’t seem to wonder whether “satisfactory” might have meant just that. Is there any profession in which a sizable percentage of the practitioners are prima facie “unsatisfactory”?

    Regarding the comments about medicine, doctors comprise a self-regulated professional group. Doctors elect their own leaders to licensing and discipline bodies. Teachers have never had control of their licensing or discipline bodies. Moreover, with the exception of doctors in the Veterans Administration, in the former state mental health hospitals, and in municipal public hospitals and public health departments, the phenomenon of working for a corporation was unheard of before World War II and remained uncommon until the rise of HMOs in the 1980s. Direct self-supervision, i.e., standalone private practice, is still possible for doctors, where it has never been possible for public school teachers.

  10. Christopher on Oct 17, 2013 at 4:21 pm

    Kaiser. Doctors have admin evaluating them. The state doesn’t do it (although there are mechanisms there for really bad actors) because the hospitals aren’t public institutions. Schools are public and therefore governed by state/local district board oversight. Does this really need to be outlined?

    Point is, name me a profession — especially one as important as teachers — where employees are not evaluated and to some degree held accountable to results.

    If teachers do not have this kind of oversight in CA. The law and local unions preclude it. Why do you think we are having this debate?


    • Gary Ravani on Oct 17, 2013 at 5:52 pm

      “If teachers do not have this kind of oversight in CA. The law and local unions preclude it. Why do you think we are having this debate?”

      Really? So what is the Stull Bill about? Every LEA in CA has a contract with evaluation likely as a part of it. In many, if not most, cases evaluation is based on the CA Standards for the Teaching Profession. In what way is evaluation “precluded” by anyone?

  11. Christopher on Oct 17, 2013 at 4:03 pm

    My doctor gets evaluated on performance/results. Not sure where you get healthcare. 😀


    • el on Oct 17, 2013 at 4:10 pm

      Um, the United States! Who in your universe does performance evaluations of doctors in private practice?

      Teachers always have administrative supervisors who have oversight. Doctors frequently are their own oversight. The state does not come in and insist that a set of numbers be met for doctors.

    • Ken Curtis on Oct 18, 2013 at 8:26 am

      Remember the old line?: Mrs. Cohen, your check came back.
      Mrs. Cohen: So did my arthritis.

  12. Christopher on Oct 17, 2013 at 3:46 pm

    Hannah K.: What if your doctor told you “We are the medical experts, so we need no performance evaluations nor accountability.” Would that make you feel comfortable next time you showed up for a check-up?

    And get the “meddling parents” out of the classroom? Seriously? Teachers work for the parents. They have a right to know what their child is learning and how well it is being taught.

    All professions have evaluations — teachers are humans too and should not be immune from accountability.


    • el on Oct 17, 2013 at 4:00 pm

      Um… last time I looked, that is how American medicine works. :-)

    • Sonia on Nov 18, 2013 at 6:25 pm

      Agree with Christopher. I cannot believe that Hannah K is an educator. People who think that way about “their” job should be dismissed.

      Hannah K: Teachers are paid with taxpayer money, and the taxpayers are 9 out of 10 times the PARENTS! Children do not need ignorant educators. Also, teaching is a profession, such as Medicine and Law. If the person chooses to be an educator, that person made a conscious choice about their profession, pay rate and expectations.

  13. Gary Ravani on Oct 17, 2013 at 3:24 pm

    Allow me to begin with a quote, from this site, from a Louis Freedberg piece on John Merrow’s inquiries into the cheating scandal in the DC schools:

    ‘What matters much more is what she (Rhee) failed to accomplish in Washington. She espoused a certain approach to reforming failing schools, a path that she and her successor have followed for six years, and that approach has not worked. That’s the central point: Rhee’s ‘scorched earth’ approach of fear, intimidation and reliance on standardized tests scores to judge (and fire) teachers and principals does not lead to improved schools, educational opportunities, graduation rates or any of the other goals that she presumably embraces.’

    The authors of the study dismiss these critical cheating issues in a pretty cavalier way. I think the Merrow quote puts the study into an appropriate context. The study is still based on tests we know to be narrow and superficial, and uses VAM that every legitimate educational researcher has thoroughly debunked. The concept of “merit pay,” for despite the assertions of the study’s authors that is what this is, has repeatedly been tried and abandoned and, again, debunked.


    • el on Oct 17, 2013 at 3:59 pm

      To some extent the assertion is a tautology, and obvious. “We identify these teachers as effective. We pay them enormous bonuses. We find that the people we pay these bonuses to are more likely to stay than those passed over for bonuses.”

      There is one rather enormous flaw though… nowhere is it actually validated that the “highly effective” teachers were accurately identified… nor does it validate that there was a newly positive effect for students.

      So it kind of misses the point of why we do all this.

      (and that’s aside from the elephant of scalability.)

      At the end of the day, kind of glossed over in the article is that Rhee resigned because her boss got trounced in an election, an election where her governance of the schools was a key issue. The community rejected her and wanted her gone. I think anyone looking to hire her, to enlist her organization, or to take her advice should consider that extremely carefully before proceeding.

  14. RelDog on Oct 17, 2013 at 1:01 pm

    There are no objective evaluators for these issues.

    Stanford is not a valid objective source. They are a conservative think tank that gives degrees. The same could be said for others like Cal which is a liberal think tank that gives degrees.

    We need a verifiably objective non-partisan source to make these evaluations. Good luck with that!

  15. el on Oct 17, 2013 at 10:37 am

    That said, though, I’ll highlight this commentary on the current status of Value-Added analysis and its reliability in measuring teacher effectiveness:

    “I looked at New York City value-added findings when the teacher data were released a few years back. I would argue that the New York City model is probably better than most I’ve seen thus far and its technical documentation reveals more thorough attempts to resolve common concerns about bias. Yet, the model, by my cursory analysis still fails to produce sufficiently high quality information for confidently judging teacher effectiveness.

    Among other things, I found that only in the most recent year, were the year over year correlations even modest, and the numbers of teachers in the top 20% for multiple years running astoundingly low.”

  16. el on Oct 17, 2013 at 10:32 am

    If the State will fund us for an additional $25,000 a year for every teacher that we honestly evaluate as highly effective, with the goal that every teacher should be honestly so rated, sign us up.

    I’ll note in our district this would be a 50% raise for our median teacher.

  17. Hannah Katz on Oct 17, 2013 at 10:25 am

    I find the Rhee evaluation offensive to professional educators. I heard someone say that accountability is something no one enjoys, but which everyone needs. I disagree. We are the education experts, so we need no evaluations nor accountability. All the education profession needs is more generous funding. Lots more. Then let each teacher do what she feels is best and LEAVE US ALONE! Get the meddling parents and other non-professionals out of my classroom!


    • CCB on Oct 18, 2013 at 12:18 pm

      Forgive me, but are you aware of how bizarre this request is? I have huge respect for teachers and the work they do–my parents were teachers; I grew up surrounded by the teacher community and substitute taught myself. I will be the first to say that they are underpaid and under-appreciated in many contexts. But there is no profession in the world in which one isn’t subject to some form of either accountability or else market economics. If your product sucks, people will know, and no one will buy it. Or if you are a lousy employee and fail to achieve *reasonable* goals, you will get fired. I don’t think it’s crazy to think that teachers, who despite lousy pay still have enormous flexibility within the classroom AND comparative job security, should get measured on some level and held to some standard of accountability. We can debate what those measures and standards are, to be sure. But to ask for complete freedom from any oversight or consequences, when you’re teaching the next generation of citizens? Give me a break.

      • Jimmy B. on Nov 27, 2013 at 5:31 pm

        Agreed. Teachers should be evaluated. The question is how. You said there is no profession in the world in which one isn’t subject to some form of accountability right? So let’s evaluate a doctor on how well his patient sticks to his diet. Ok? Or let’s base a lawyer’s performance on whether his client gets thrown back in jail. I suggest that evaluating a teacher on how well his (or her) student performs on standardized tests is very similar when you consider the socio-economic forces in play here. Simply put, if a teacher’s students come from modest means and go to school hungry and/or aren’t confident in their living arrangements? Then to suggest that their teacher has a fighting chance of teaching them something that they’ll remember and be able to use during a standardized test? That’s as ludicrous as my suggestions for evaluating doctors and lawyers.

        • Norma K on Dec 3, 2013 at 10:18 am

          AMEN! My daughter is a teacher with a Masters Degree. This is her 23rd yr. of teaching. She has taught every elementary grade…K-5, English as a special language, has been called in on many evaluations of students and also has had many student teachers. She is a single mom. She has one biological son and has adopted 2 more little boys that were foster children. Her own boys are in parochial school, with a lot of financial help, I might add. They are there because she knows the time a GOOD public school teacher has to devote to them. This yr. she has 28 kindergartners, a new principal that had been in the classroom for only 2 yrs. Believe me, she is not there for the accolades (she has plenty of them). She is not there for the money! She is there to be treated as a professional, respected by parents, and the love of children. If we told a dentist…Your career will be based on how many children return with no cavities. Do you honestly believe he can control that? Until we get the parents to care and be responsible no amount of money, testing etc. will help. We need to restore the family first. Believe me, in parochial school the parents are involved! They have some money invested and they don’t always have the BEST of the Best teachers but those kids succeed! How many years did Ms. Rhee spend in the public school classroom?

  18. navigio on Oct 17, 2013 at 7:39 am

    Regardless of the problem of looking at only a subset of teachers in the study, it should be noted that a $25k pay increase is over 35% of the average teacher salary in CA. And of course the goal is for every teacher to be effective. If memory serves, I think MR even had to secure private funding to fund this program.
