Credit: Alison Yin for EdSource Today
Fifth graders take a reading test.

When President Barack Obama declared that “unnecessary testing” is “consuming too much instructional time” and creating “undue stress for educators and students,” it was another sign that the dominant strategy over the past 15 years to use standardized tests to hold children and schools “accountable” in education reform may have reached a tipping point.

California is on course to have a major impact on reshaping the national discourse – and practice – on this issue. The state is in the middle of devising a new accountability system, a massive and complex undertaking in a state as large and diverse as California, that is intended to go far beyond a narrow preoccupation with test scores.

President Obama’s recent anti-testing pronouncements are especially significant because using test scores as the dominant measure of school and student progress has been central to his K-12 education reform agenda.

Arne Duncan, Obama’s departing secretary of education, acknowledged the administration’s contribution to the problem. “It’s important that we’re all honest with ourselves,” he said. “At the federal, state and local level, we have all supported policies that have contributed to the problem in implementation. We can and will work with states, districts and educators to help solve it.”

By contrast, Gov. Jerry Brown has been consistent in challenging the role of testing – and has clashed repeatedly with the Obama administration on this issue, even before he returned to the governorship in 2011.

Brown likes to recount what was apparently a seminal experience while he was a student at St. Ignatius College Prep in San Francisco, when the only question on an exam asked students to give their impressions of a green leaf.

“Still, as I walk by trees, I keep saying, ‘How’s my impression coming? Can I feel anything? Am I dead inside?’ So, this was a very powerful question that has haunted me for 50 years.”

The point, Brown says, is that “you can’t put that on a standardized test. There are important educational encounters that can’t be captured by tests.”

State education leaders have echoed Brown’s deep skepticism about the excessive use of standardized tests.

“We must always be mindful that time spent testing generally comes at the expense of time our students would otherwise have spent gaining the very knowledge and skills that are the goal of education,” State Superintendent of Public Instruction Tom Torlakson declared three years ago in a report to the state Legislature on Transitioning California to a Future Assessment System.

Torlakson noted that many countries that “lead the world in achievement place little or no emphasis on standardized testing.” When they do test, he said, “they use more open-minded measures, sparingly and strategically, and often sample students rather than testing every child.” He suggested that if the federal government weren’t requiring it, California would do even less testing than it is currently doing.

Other prominent California education leaders have also been at the forefront of questioning how tests have been used in the national education reform agenda. Most significantly, they include Linda Darling-Hammond, the president of the Learning Policy Institute, who is also Brown’s appointee as chair of the California Commission on Teacher Credentialing. Two-and-a-half years ago, Darling-Hammond took aim at what she calls the “test and punish” approach to accountability. “Without major changes, we will, indeed, be testing our nation to death,” she wrote.

But California has done more than talk about the issue.

The state has suspended – and is considering permanently abolishing – the Academic Performance Index, which for 15 years ranked schools based almost entirely on the test score results of students.

This past summer the Legislature suspended the California High School Exit Exam, at least for the next three years – and has even told districts to award diplomas retroactively to students who did not pass the exam and were denied a diploma because of it during the decade the exit exam was in place.

Also gone, for now, are standardized tests in 2nd-, 9th- and 10th-grade math and English language arts; end-of-course math tests in Algebra I, Algebra II, geometry, general math and integrated math; all history tests; and end-of-course tests in high school in biology, chemistry, physics and integrated science.

One unresolved question is whether California will permanently eliminate these end-of-course standardized tests or replace them with ones that are aligned with the Common Core standards.

For now at least, the only standardized tests left that are administered by the state are the Smarter Balanced tests in math and English language arts, which all students in 3rd through 8th grade and 11th grade are expected to take. Students still take a science test in 5th, 8th and 10th grade because they are required to do so under the No Child Left Behind law. (Students with special needs take a variety of tests designed to take into account their specific disabilities.)

What makes what is happening in California especially interesting is that the state is not reflexively against tests in general. In fact, California is a leading backer of the Smarter Balanced assessments aligned with the Common Core – the very same tests that have fueled vehement anti-testing sentiments in some other states, most notably in New York.

That’s because strong backers like Darling-Hammond have argued that the assessments are significantly improved compared to the old multiple-choice tests, measure deeper learning skills, and have the potential to actually drive classroom instruction, not just be used to measure how well or how badly schools or students are doing. California has also prevented Smarter Balanced from becoming a lightning rod for opposition by resisting pressures from the Obama administration to use test scores to evaluate teacher effectiveness.

So rather than being against all tests, the state is moving toward establishing a much broader accountability system, of which tests – improved ones, according to proponents – will comprise just one part. In California, the new accountability system will be based on “multiple measures” rooted in eight “priority areas” established by the state in the 2013 Local Control Funding Formula law championed by Brown.

In addition to scores on the Smarter Balanced tests, these could include measures of middle and high school dropout rates, attendance rates, absenteeism and graduation rates, parent engagement, and “school climate,” as revealed in suspension and expulsion rates and student surveys.

Furthest along in developing a new “multiple measure” accountability system are the six CORE districts, which are developing a School Quality Improvement Index that could inform what will happen in the state and nationally on this hugely complex task.

By March 2016, Torlakson must present his recommendations for a comprehensive assessment system to the State Board of Education, so the next few months will be crucial in shaping where California as a whole will end up on this issue.

Torlakson is being advised by an “Accountability and Continuous Improvement Task Force,” which is mandated by the state and is co-chaired by Eric Heins, the president of the California Teachers Association, and Wes Smith, executive director of the Association of California School Administrators. The 29-member task force includes many of the state’s most prominent education leaders.

All this is taking place as Congress, after years of gridlock on the issue, appears to be moving to replace the No Child Left Behind law with one that will move the nation distinctly in the direction California is already going. As task force member David Plank, executive director for Policy Analysis for California Education, said, “There is general agreement that California is in a position to lead, and to set a new course not only for the state but for accountability in general.”

Comments

  1. Henry E. Fourcade 11 months ago

    Like the governor, I was in the SI class of 1955, and am very much in agreement with his views. The best tests I took through my education from grammar school through licensing as an MD and Specialty Board exams were the Blue Book essay tests and the oral exams, now seemingly abolished at all levels. In contrast, the standardized multiple choice tests discourage and remove the pleasure of learning, organizing one’s knowledge, and exploring how deep that knowledge really goes. Instead, “good” multiple choice tests are crafted to produce Bell curves and to fail a pre-set percent of students. A good education is not produced by memorizing such cook-book questions where a certain % always must rank as failures. The old way was more subjective, and required a lot of teacher time to produce a valid score, but it gave much better results in my experience. Good going, Jerry!

  2. Todd Maddison 1 year ago

    Agree with this completely. Why would we want to know how well our educational systems are working to teach our children, until they reach college?

    After all, SAT scores have been declining for decades while at the same time the cost of our educational system has been increasing exponentially – what interest do we all have in finding out whether we’re actually getting any value for our investment?

    Why would we want to know where that system is failing with enough lead time to make changes in the systems (or personnel) before our children are out in the “real world”, where performance tests happen every day? Why not just let everyone do what feels good to them, and assume that’s working?

    Better to focus on the impressions our kids have of green leaves, instead of the math, science, and English skills that the real world considers keys to success.

    After all, if we find out that investing more money into the system is not producing better results, we might hold the people we hire to spend that money accountable, and we most certainly can’t have that happening.

    I mean, it’s not like the scientific method – testing the results of every experiment with the goal of identifying success or failure and then ensuring repeatability – has ever resulted in any significant gains in human knowledge, has it?

    In the immortal words of Albert Einstein, “Don’t bother checking my math, just trust me – I know my theory of relativity is true….”

    Replies

    • SD Parent 1 year ago

      Todd said it.

      Why is it that the education power brokers seem to forget that students who want more than a high school diploma will have to engage in a number of high stakes tests in high school to attend college? Have they not heard of the SAT, ACT, and APs? I was no fan of elementary school students taking CSTs annually–in fact, standardized testing in elementary school in general should be minimized–but once students reach middle school, we need to start preparing them for the world after high school, which does involve high stakes testing.

      The only way to effectively know whether students are learning and whether there are achievement gaps–the entire premise behind LCFF and the LCAP–is to have meaningful measures of their learning. Suspension and expulsion rates are NOT measures of learning; they are essentially measures of students’ respect for societal norms and the patience of the schools in dealing with students’ negative actions. Absenteeism, attendance, drop out, and graduation rates are NOT measures of learning; they are measures of student engagement. While all these measures are obliquely related to student learning in that students don’t learn what they are taught in school when they aren’t physically at school, do we really believe these measures hold schools accountable for student learning when the students are present?

      • Manuel 1 year ago

        The problem is that your position, SD Parent, implies that these tests are “evidence of learning” or, as put by others, “evidence of being on grade level.”

        As Doug stated elsewhere, neither the state nor the test designers have ever made such a claim.

        To that, I would add that when the distribution of test scores yields a Bell Curve, either by design or accident, that such a test is effectively irrelevant for proving academic achievement. All it shows is that the test takers can be “ranked and stacked.” Sure, you can do comparisons, but you better not believe that it is “evidence of being on grade level.”

        • Doug McRae 1 year ago

          Manuel —

          You do not reflect what I have said elsewhere accurately. To clarify, I have said neither the state nor the test maker claims the scores reflect “being on grade level.” Grade level is a mythical concept inaccurately used by some when interpreting test scores, but scores on large scale tests are not designed to address that concept. But I have said that well constructed and administered large scale tests do serve as good measures of achievement (or, as “evidence of learning”), both current status and trends over time.

          Re your contention that when distributions of test scores reflect the Bell Curve (or, the Normal Distribution statistical model to be more precise), such a test is irrelevant for measuring academic achievement — that contention is inaccurate as has been explained many times via EdSource comment exchanges over the years. Standards-Based tests are designed to measure academic achievement, and do so as well as the state-of-the-art permits if well constructed and administered, and the circumstance that results for large aggregates of students yield normal distributions (like many many other measures of human behavior) does not affect the validity of their measurement of academic achievement.

          • Manuel 1 year ago

            Doug stated: “You do not reflect what I have said elsewhere accurately. To clarify, I have said neither the state nor the test maker claims the scores reflect “being on grade level.” Grade level is a mythical concept inaccurately used by some when interpreting test scores, but scores on large scale tests are not designed to address that concept. But I have said that well constructed and administered large scale tests do serve as good measures of achievement (or, as “evidence of learning”), both current status and trends over time.”

            That’s a very carefully crafted statement which I believe is contradictory: If a test cannot tell whether or not a student has learned what they are supposed to learn (“being on grade level”) then it cannot provide “evidence of learning.”

            OTOH, you maintain that if the test is “well constructed and administered,” then it can provide “evidence of learning.” Since both the state and the test maker refuse to make that claim, then the tests are either not well constructed or not well administered. Or both.

            BTW, back in the good old days, if a student was “on grade level” as shown by her/his classroom marks, then s/he was passed on to the next grade. That doesn’t sound to me like a “mythical concept.” Just because no test maker wants to assert that their tests can measure this does not make it fiction.

            Doug said: “Re your contention that when distributions of test scores reflect the Bell Curve (or, the Normal Distribution statistical model to be more precise), such a test is irrelevant for measuring academic achievement — that contention is inaccurate as has been explained many times via EdSource comment exchanges over the years.”

            It is your professional and expert opinion that my contention is inaccurate. I can live with that because, after all, even if it is published in a peer reviewed journal does not mean it is true. But if I am incorrect, then why is it that the distribution of scores was the same regardless of the students’ classroom accomplishment as was demonstrated by an internal LAUSD “study” of the 2009 administration (I’d be happy to share my copy with you)? This unpublished analysis demonstrates that there is no connection between what the student learned as measured through traditional measures (i.e., classroom mark) and what the test maker claims the student has learned as determined from the student’s test score.

            Doug continued: “Standards-Based tests are designed to measure academic achievement, and do so as well as the state-of-the-art permits if well constructed and administered,”

            But that is exactly the problem: there is no guarantee that they are well administered under the same conditions across the state. And there is certainly no guarantee that they are well constructed since nobody (not even you these days) can peek behind the curtain. Up until my getting involved in this, I had never bothered to query my own children on their CST experiences. Once I did, I was horrified by what they told me.

            Doug finished with: “and the circumstance that results for large aggregates of students yield normal distributions (like many many other measures of human behavior) does not affect the validity of their measurement of academic achievement.”

            Indeed, there is a tendency for natural large systems to develop Gaussian distributions (best example: gases in thermal equilibrium). But that distribution in testing does not come about because the test is administered to a large number of students. Instead, it is part and parcel of the test design. Why else would the vendors who developed the STS tests have explicitly told the SBoE to fully expect more than half the test takers to be below “proficient” before the tests were even fully administered (this astounding fact is part of the SBoE records discussing the STS on May 5-7, 2010)? This makes it clear that they are tailoring the tests to get that distribution.

            Yet, this entire thread is mathematically moot since, for instance, the CST test scores were scaled according to a formula that distorts the raw scores into a curve that resembles a Gaussian but, unfortunately, leaves wide blank areas between the samples to the right of the proficient cutoff point. It might look like a Gaussian but the “area under the curve” is certainly not given by a continuous integral, as it should if the math is done right. Since you don’t believe me, do it yourself with the CST raw and scaled scores as reported in either the 2012 or the 2013 Technical reports. I would post the graphs here if I were able, but I can’t.

            Anyway, you continue to hold the belief that if the right pixie dust is sprinkled, everything will be alright. Yes, I get it, that’s what you spent your life working on and therefore makes you an expert. However, examining the scores without having such preconceived bias raises many questions on their validity. For example, the test scores show an achievement gap that hasn’t budged for as long as those tests have been given; hence, it is reasonable to conclude that they are flawed. Doesn’t the historical stability of the achievement gap give you pause? It does to me because the alternative is too horrible to contemplate and has, in fact, been discredited in countries where the population is ethnically homogeneous.

    • CarolineSF 1 year ago

      The claim that SAT scores have been declining for decades needs to be examined.

      The SAT is a voluntary test intended for college applicants. In past times, the culture (to give a simplified but accurate version) was that only the elite attended college — children of privilege, white males. That meant only a rarefied, privileged elite took the SAT, and given the fact that test scores correlate closely with income, those were the students likely to score well. In recent decades, the culture has changed, first to encourage academically inclined non-wealthy students to aim for college, then to include demographics other than white males; more recently to promote the view that ALL students should/must be college-bound.

      Along the way, the pool of students taking the SAT has grown enormously. That means the overall scores are inherently going to slide — this demonstrates a principle called Simpson’s Paradox.

      There was a journalistic kerfuffle some years ago. A Washington Post columnist not versed in education did a slapdash column about how today’s students are stupider than their predecessors, based on the slide in overall SAT scores. An academic named Gerald Bracey wrote a rebuttal explaining Simpson’s Paradox that the Post published, clarifying the columnist’s erroneous reasoning. After that, the late Bracey became an influential writer in the field of “how to lie with statistics,” a guru to many who challenge the predominant storyline about education “reform.”

      Regarding the SAT and the days when only the privileged were expected to attend college: Ironically, the SAT was developed to change that practice. It was intended to promote meritocracy, though the original image of the person who would be lifted by it was a smart Midwestern farm boy — still male, still white. Nicholas Lemann’s “The Big Test,” giving the detailed history, is an interesting read.

  3. Chris Reed 1 year ago

    Kind of amazing that a piece this long could not mention once that the hostility to testing in California is fueled to a significant degree by teachers unions’ opposition. The powerful CTA and CFT know that if testing is minimized, it will be more difficult to identify and fire poor teachers.

    Replies

    • Gary Ravani 1 year ago

      There is an amazing amount of information available on why student test scores do not, and cannot, be used to evaluate teachers’ performance. You could start with the National Research Council and then move on to the American Statistical Association.

      Yes, teachers’ unions are labor organizations that represent teachers in collective bargaining where teachers’ working conditions and students’ learning conditions are negotiated. However, they are also professional organizations providing professional development for their membership, and the unions keep in close contact with policy makers and researchers about what is, and is not, best practice. As a 35-year classroom educator I always advised my students to do their homework prior to stating opinions. Words to live by.

    • Louis Freedberg 1 year ago

      With respect, this comment makes no reference to the analyses by the National Research Council and other distinguished statisticians from UC Berkeley, Stanford and elsewhere that point out the flaws in value-added methodology, and the hazards of linking test scores to teacher evaluations. Arguing that this is mainly a plot by teachers unions to avoid getting teachers fired vastly oversimplifies this issue.

      • Tom 1 year ago

        There clearly has been documented opposition to standardized testing by the CTA, or do you guys want to argue that point? Is the public supposed to believe that this opposition is meant to protect the kids from the evils of testing? I find that really hard to believe.

        • Manuel 1 year ago

          So what if there is opposition by CTA et al?

          Given that the use of standardized test scores is not appropriate to determine the effectiveness of teachers, for the reasons given by Mr. Freedberg, CTA would be derelict in its duty if it did not oppose this use.

          It is my opinion that people who support such use are simply looking for a way to undermine the position of teachers and eventually reduce them to at-will laborers. Thus, this is an adult-centered political position, not one “for the children.”

        • Gary Ravani 1 year ago

          Tom:

          You should read the entire post. As I stated, teachers’ unions have a dual role as bargaining representative and as professional organization. As such, the unions keep in close contact with real researchers, academics, and policy experts. As Mr. Freedberg and I both noted, the most legitimate national experts and researchers assert, with considerable evidence to back them, that using student test scores to evaluate teachers is wildly unreliable (statistically) and counterproductive in terms of learning. Of course, CTA, acting as a professional group would take a position in opposition to debunked practices that undermine learning. And, as a bargaining representative, it only makes sense that they don’t want members evaluated based on wildly inaccurate data. What else makes sense? Facts can’t always be looked at as having a liberal bias.

  4. Monty Neill 1 year ago

    Good steps for CA. Unfortunately, the feds keep insisting on testing in every grade 3-8 instead of just once in elementary and middle school, so CA has to do something. However, new federal law will probably allow states to design new assessment systems. Even if one thinks SBAC is a better standardized test, Linda Darling-Hammond is among those who know it is a very small step forward – a 2 on a 1-5 scale is how she described it. So CA should start down the path of designing a new system, which can start with pilot districts. Such a system could include SBAC in three grades, with the rest being a mix of some state and most local, school and classroom-based performance tasks designed and controlled by teachers. NH has won a waiver from the feds to start building a system like that and it is now in 8 districts. The best example is the NY Performance Standards Consortium, in which students work with teachers to design their HS graduation tasks in 4 subjects, which are scored using guides designed by teachers across the 38 public high schools that participate (http://performanceassessment.org). Portfolios are also a valuable option in the mix.

    Replies

    • FloydThursby1941 1 year ago

      If you only test once per school, you won’t be able to find out if some teachers are significantly more effective than others, study the effects of how many personal days they take, etc. You won’t have a transformative impact on the profession. The union will see this as going back to how it was before, grade inflation, social promotion, blame poverty, no teacher is different from any other they claim, etc.