
Bill would prevent double testing and double frustration for students, teachers


Randolph Ward

Of all the bills sitting on the desk of Gov. Jerry Brown, perhaps none is more important to the future of education in California than Assembly Bill 484. Sponsored by State Superintendent of Public Instruction Tom Torlakson and authored by Assemblywoman Susan Bonilla, D-Concord, AB 484 would end the standardized tests that have been in place since 1999 and move California forward in implementing tests based on the new Common Core State Standards.

The new standards represent the first major academic overhaul since 1997, and along with clear new goals for student learning come new assessments. Students will take the new tests using computers (or iPads or similar devices), which means our local school districts need to figure out how to get those devices and the supporting technology, such as Internet access, into the hands of San Diego County’s 500,000 students. That’s why districts in the county have been hard at work preparing for the new standards and tests. Many San Diego County districts participated in earlier “pilots” of the new tests.

Before AB 484 removed the need for our students to endure double testing, the vision had been that the state would roll out field tests on the new standards in the spring of 2014 while maintaining the old paper-and-pencil examinations, called the Standardized Testing and Reporting (STAR) tests. Thankfully, the Legislature overwhelmingly supported AB 484, allowing educators to focus on implementing the new standards and students on learning skills they need for career and college.

So what is the controversy? Some, including United States Secretary of Education Arne Duncan, are opposed to AB 484. Critics say suspending STAR tests will diminish schools’ accountability to parents and the community. The truth is, our current accountability system under the federal No Child Left Behind law results in the majority of schools being labeled “failing,” regardless of how well they are doing in preparing students for college, career and beyond.

Is a multiple-choice test really holding anyone accountable, and if it is, accountable for what? How many of us use those skills in our daily work? In switching to the new assessments, we will finally have data that is connected to what students need to be successful in the real world. Field tests will offer a very strong “reality check” to our school districts, showing where the gaps are in instruction. AB 484 allows a year to transition, to do what it takes in our classroom and beyond.

Let’s be clear about what will happen with that transition. The Common Core standards will change teaching and learning, because they change the expectations of what skills and knowledge students should be able to demonstrate and apply. In English language arts, for example, students will be expected to read as much nonfiction as fiction and to increase academic vocabulary. In math, students will learn more about fewer, key topics so they can think fast and solve real-world problems. In all subjects, students will increase critical and creative thinking, communication and collaboration – skills the old tests simply do not measure.

Under the new state budget, California schools have received $1.25 billion to support implementation of Common Core. These funds could easily be tapped by school districts to expand pilot tests to include students at all levels, thus ensuring these “tests of the test” really work. That’s why AB 484 enjoys support from groups as diverse as the Los Angeles Chamber of Commerce, the California Teachers Association and the California County Superintendents Educational Services Association.

I believe that maintaining two testing systems, each of which requires very different instructional techniques and looks at very different skills, will result in a schizophrenic environment in our schools that isn’t in anyone’s best interest. Suspending the STAR tests and expanding field testing of new assessments allows time for educational leaders to prepare students for a completely different manner of demonstrating knowledge while ensuring districts have the means to provide accountability to the public.

Our students deserve our efforts to be fully focused in one direction, and that direction must be Common Core. AB 484 will allow us to prepare our students and staff for success rather than set them up for frustration.

•••

Dr. Randolph Ward has led the San Diego County Office of Education since 2006, focusing on technology for everyone and world languages. Before that, he was the state-appointed administrator of the Oakland Unified School District from 2003 to 2006 and Compton Unified School District from 1996 to 2003. He began his education career in 1979 as a preschool teacher.

Filed under: Commentary, Common Core, Curriculum, Federal Education Policy, Legislation, State Education Policy, Testing and Accountability


40 Responses to “Bill would prevent double testing and double frustration for students, teachers”

  1. Doug McRae said

    on September 23, 2013 at 9:20 am

    That’s a very enthusiastic support position for AB 484 and against “double testing.” Unfortunately, it’s long on enthusiasm and very short on facts.

    The commentary skips over the technology challenge of taking statewide tests on computer — aside from simply installing hardware and bandwidth in our schools, we also have to train teachers to use that hardware so they can familiarize students with the particulars of academic computing that will be necessary for kids to take computerized tests and get valid/reliable scores not confounded by lack of computing facility. A reasonable look at current CA technology readiness indicates less than half of CA schools will have that readiness in 2014, and it will likely be 2018 before a substantial majority of schools have technology readiness to support statewide testing.

    The commentary also justifies AB 484 by saying “districts in SD Co have been hard at work preparing for new standards and tests.” That may be true in SD Co and many other CA schools, especially since summer 2012 [we were too mired in fiscal woes prior to that to concentrate on common core challenges, for the most part], but on a state level the instructional supports for common core instruction at the local level [that is, curriculum frameworks and approved instructional materials] won’t be finalized by the IQC and SBE until spring 2014 for math and fall 2015 for E/LA. Local schools will then have to adopt aligned textbooks and train their teachers on the new curriculum frameworks and the specific instructional materials before they can claim they have instruction in place for the common core. Common Core tests in 2014 and 2015 have no chance of being instructionally valid/reliable given the statewide instructional timetable. By the way, the commentary says many SD Co districts participated in “pilots” last year; I wonder what percentage of SD Co K-12 students actually participated in those pilots . . . . I’d guess quite a bit less than 10 percent.

    The commentary assumes that a full common core test will be ready for administration by spring 2015, which AB 484 also assumes. A nasty little largely unknown fact is that the consortium to which CA belongs for common core test development is running at least a year behind schedule and likely won’t have a full computer-adaptive test available for operational use in 2015 — it appears that spring 2016 will be the earliest available date for full operational consortium common core tests. Maybe that’s why AB 484 has loopholes for the state board to delay initiation of common core tests indefinitely.

    Finally, the commentary says the two testing systems (the old STAR and the new proposed Smarter Balanced computer-adaptive tests) look at two very different sets of skills. The facts are that CA’s old 1997 standards and the new common core standards overlap quite heavily, and most experts say CA’s 1997 standards were roughly as rigorous as the new common core. Tests reflect standards, and I’d suggest that 60-80 percent of the items on STAR also reflect new common core standards. Why can’t CA develop a shortened, common-core-aligned version of the current STAR tests, as Massachusetts has done with its “old” MCAS statewide testing system, and use it for the necessary four or so years needed to transition to new common core computerized tests? This idea has been floated, but rejected by advocates for Smarter Balanced tests by 2015 even though CA won’t be ready for new common core computerized tests by 2015.

    The result of unchecked enthusiasm for AB 484 will likely be a train wreck for CA’s statewide assessment system. And, in the meantime, we won’t have any data to report to the public on statewide academic achievement status or progress, and any judgments on CA’s progress in implementing the common core will be left to unfounded opinion and rhetoric. I guess that’s politics as usual.

    • navigio replied

      on September 23, 2013 at 10:27 am

      I think districts still have to report demographic information. We can just use that to measure the ‘quality’ of schools in the meantime..

    • Manuel replied

      on September 23, 2013 at 11:10 am

      FWIW, I agree with Doug with one exception: I don’t believe we need testing conducted by the state for accountability.

      This “accountability” is part of a Grand Kabuki play that the general public does not pay attention to, if one is to believe a recent poll that got a lot of press recently. Here are the results without the spin:

      http://mfour.com/wp-content/uploads/2013/09/2013050517-PACEUSCRossierSchoolofEducationPoll.2013.FINAL_.TOPLINE.pdf

      The poll was given the requisite spin (“A decade after ‘No Child Left Behind,’ Californians remain strongly supportive of standardized testing”), but what really matters is that 42.3% of respondents judge schools by the quality of their teachers and staff (question 47) and only 15.3% believe the API and the AYP are school quality indicators!! Adding the number who believe in school reputation and the school’s emphasis on core programs, the total percentage of respondents who don’t believe in standardized testing goes to 66.6%. Of course, this is not useful to advocates of testing, so they fall back on saying that the public supports testing. Of course they support testing in the school; they just don’t believe in the state’s tests as indicators of school quality.

      BTW, 20.6% believe that student performance on standardized tests should define a teacher as bad or good (question 48). Isn’t that something?

      • navigio replied

        on September 23, 2013 at 12:39 pm

        But among parents with school children, that 15% jumps to 23%. And ironically, the percentage of parents with school children who felt the quality of teachers and staff was most important was lower than that of the general public. I also think some of the questions (eg 48) were designed to intentionally lower the response rates for specific factors. :-)

        • Doug McRae replied

          on September 23, 2013 at 1:52 pm

          The support for accountability testing in our schools is rooted far beyond the public-at-large support from the recent PACE public opinion poll here in CA; over time, public support typically runs from 2/3 to over 4/5 in favor of annual tests to measure achievement status and progress in long-standing polls such as the Phi Delta Kappan polls. It also includes specific support from the business community and high-level policymakers over time. For example, the Title I testing requirement that AB 484 challenges goes back to the mid-1960s, when Title I was first initiated as an LBJ Great Society program — it’s always been that if a state or district or school wants Title I dollars, in return they test their kids and submit accountability reports to the feds.

          But I’d also challenge the message that the headline for this commentary communicates — “double testing, double frustration” suggests 100 percent more testing than usual. When one looks at the facts, the Smarter Balanced scientific sample asks for a 20 percent sample of schools, each testing one of two content areas, or a 10 percent amount of testing over and above the 100 percent if one assumes a full STAR testing schedule. Ten percent additional isn’t double testing . . . it’s 110 percent, not 200 percent. And if SB asks for only one random grade level from each participating school rather than all kids in eligible grades, as they did for the pilot testing last spring, then we are down to around 3 percent more testing.

          Then look at the potential for reducing the past STAR testing schedule by getting rid of the gross overtesting at the high school level (unneeded duplicative CAHSEE test administrations are a primary culprit here) as well as cutting down STAR tests to only items that are aligned to the common core, and we can generate a 20 to 40 percent reduction of testing time as well as significant dollar savings.

          My point is, with a few rational design decisions, CA can conduct a logical phase-in of common core assessments over a reasonable period of time, with reduced testing time and test administration dollar savings each year, yet continue to generate data for federal Title I requirements and for continuity of CA’s API calculations. The time and money savings from a slimmed-down STAR program could be used to support additional (i.e., beyond SB test development sampling requirements) local district experiences with the new common core computerized tests — experiences that are very valuable for ramping up both computerized test administration and common core instructional efforts, and that I would hope the state would support out of the savings from a rational phased-in approach to a new common core testing program.

        • Manuel replied

          on September 23, 2013 at 3:15 pm

          Even with that 8% increase, 77 out of 100 parents don’t think the tests mean much. Shouldn’t that give our politicians pause? Does this mean that all this Rheeformer/Status Quo fight has created its own reality? Or is this taking place in a parallel universe?

          What is even funnier is that this poll was run by people that seem to have an ax to grind. Just read their statements in this article:

          http://universityofsoutherncalifornia.createsend1.com/t/ViewEmail/j/D26AE92173E11962

          Heck, just read the headline! Nothing in there says anything about what I am pointing out.

          Spin, spin, spin…

      • Chris Stampolis replied

        on September 23, 2013 at 4:13 pm

        Manuel, your conclusions are not academically factual since you did not go deeply enough into the data. The question you cited explicitly requests what is “‘most’ influential in helping you decide if a school is ‘good’ or ‘bad.'” The question does not state that “only 15.3% believe API and AYP are school quality indicators.” It cites that 15.3% believe those criteria are the most influential. (Additionally, a lot more school-age parents cite API and AYP as their top indicator.)

        Two other questions from this poll show significant support for non-teacher oversight – including by teacher respondents:

        “40. In order to measure student achievement in K‐12 education, policy makers must set standards that help decide if a student has met expectations for a particular subject or grade level. Who do you think should be most responsible for setting these education standards?” The top answer is Local School Boards at 34.0%, followed by state government at 25.0%, then by individual classroom teachers at 23.2%. This means 76.8% of the voting population want decision makers outside the classroom to set the standards.

        Further in question 49, a solid majority of voters surveyed believe elected representatives should make decisions about the school’s future.

        “49. Policy makers must decide whether a school is doing a good job or a bad job educating students. If the school is doing a bad job; they need to decide how to improve the school’s performance, or possibly close the school. Who do you think should be most responsible for deciding whether a school is doing a good job or a bad job educating its students?” Local school board: 39.6% – Teachers: 11.1%

        And, Manuel, question 54 shows that only 13.4% believe that “Standardized testing should not be used at all.” And, in the crosstabs less than 25% of teachers themselves chose that option. (BTW, the crosstabs provide some fascinating insight since the answers of teacher respondents are disaggregated for comparison.)

        No one should worship standardized test results. But they have value. And, the crosstabs data point that shows half of teachers look to parents as primary influencers of academic success merits discussion.

        – Chris Stampolis

        • navigio replied

          on September 23, 2013 at 7:46 pm

          Test results clearly show us that academic achievement is directly correlated with parent education. In addition, children spend about 5 years with their parents (their first teachers) before they ever set foot in school. Research also tells us the achievement gap exists before starting school. In another question, teachers chose increased parental involvement over all other options as the single thing that would improve schools the most.
          That all seems consistent. Why do you think the response merits discussion?

        • Manuel replied

          on September 25, 2013 at 10:14 pm

          “Academically factual?” What in the world is that, Chris?

          Anyway, question 47 offers the responder many choices. It is plain that only 15.3% of the total think API/AYP is the most influential. You can put lipstick on this pig all you want, but it is still clear that more members of the public, as represented by this sample, think that the quality of teachers and staff is most influential. Since the quality of said teachers and staff is still not measured by API/AYP, the public is more confident in that nebulous quality (I think it is like porno, they know it when they see it) than in a number calculated by some spreadsheet based on the scores of a test that is meaningless to students.

          It is not surprising that more members of the public think that it is the responsibility of the local school board to set standards for schools. That’s what they elected them for. And that is why some board members are asked to resign if the public perceives them as not doing their job.

          This is of course reflected also in question 49. The public elects people to the Boards. The public expects them to supervise the Superintendent, and if the Superintendent can’t get schools to perform at their best, then the Board is responsible for finding a Superintendent who can. And if the Board can’t do that minimal job, then the Board will receive a no confidence vote from the voters and possibly be replaced at the next election. I hope that this will happen at LAUSD given the recent “iPad for all” debacle. I would not be surprised if that happens elsewhere as we move forward in implementing Common Core.

          As for question 54, if parents knew that the standardized test is a zero sum game, that is, some of their precious children will never be above average in these tests regardless of their classroom mark, they would have a cow. If anything, the responses show they have no idea what is in a standardized test. Do you, Chris? Have you ever looked at how the scores shake down across the state? Did you do what I suggested you should do with the Algebra scores in your district? Why don’t you? You might have to break a few eggs to make that omelet, but I guarantee you it will be an interesting culinary experience.

  2. el said

    on September 23, 2013 at 11:22 am

    Speaking of teaching kids to use the computers… do we have anywhere in the standard curriculum – common core or not – a place where we teach touch-typing? When it comes to essay responses or other short answers, the difference between hunt-and-peck and touch typing is going to dramatically change the effort involved in answering them well and error-free.

    • Manuel replied

      on September 23, 2013 at 3:20 pm

      Touch-typing? That used to be taught back in the day to all the girls in secretarial courses while the boys were sent to shop.

      “Put your hands on the keyboard so that your index fingers are at the ‘home’ position! That’s right, ladies, your left index on the ‘F’ and your right index on the ‘J’! Let’s begin: ‘The quick brown fox…’”

      That and learning Pitman or Gregg shorthand have gone into the heap of history…

      • navigio replied

        on September 23, 2013 at 3:23 pm

        I can’t tell you how many times I wish I knew shorthand..

        I took typing and shop..

        • Manuel replied

          on September 23, 2013 at 3:29 pm

          How come you did not take Home Ec?

          • navigio replied

            on September 23, 2013 at 3:44 pm

            that too.. :-)

  3. Chris Stampolis said

    on September 23, 2013 at 3:13 pm

    Some of Dr. Ward’s comments are under-researched, especially with regard to his description of the content of the SBAC tests that will replace the STAR tests. Dr. Ward’s own employing County Board of Education members should ask him during public session which sample SBAC exams he has reviewed online at the CDE website to ensure that he is speaking from factual knowledge. Every member of the SDCBOE and Dr. Ward’s subordinate staff members also should be aware that those sample SBAC exams now uploaded primarily present multiple choice questions – the same type of testing Dr. Ward criticizes.

    Ward asks “Is a multiple-choice test really holding anyone accountable, and if it is, accountable for what?” He then writes “In switching to the new assessments, we will finally have data that is connected to what students need to be successful in the real world.”

    SAT exams, GRE exams, MCAT exams, LSAT exams all are primarily multiple choice, as are the thousands of University-professor-administered Scantron quizzes and exams that degree-seeking students face each year to earn points towards grades in their credit-bearing courses.

    I suggest three analyses immediately should be requested and agendized by San Diego County Board of Education members:

    1) demand a full report regarding how many multiple choice tests in the past two years were administered as classroom assessments by County-office-employed teachers who are within Ward’s chain of command;

    2) ask Dr. Ward to explain in open session what “data” he believes will be surfaced by the “new assessments” he references, especially in the context of what STAR evaluated and what SBAC theoretically will evaluate;

    3) hold Dr. Ward accountable to explain in public during an agendized Board item why he believes the SBAC computerized set of multiple choice and fill-in-the-blank questions is a “completely different manner of demonstrating knowledge” than the current system. “Completely different” is a broad descriptor chosen by Dr. Ward that merits public accountability.

    Ward suggests “In all subjects, students will increase critical and creative thinking, communication and collaboration – skills the old tests simply do not measure.” Let’s hope Dr. Ward’s employing Board members will ask data-rooted questions and hold Dr. Ward accountable to define what he believes has not been measured by the STAR tests and what, specifically, he expects will be measured by each individual student’s SBAC tests.

    The new SBAC tests may be fine – even wonderful. But California does not benefit by pausing the current STAR evaluation tools during the transition to SBAC. Dr. Ward’s arguments in favor of suspending California testing needlessly disparage STAR results. No testing for several years will make it more difficult for education leaders to track the achievement gap on many levels.

    – Chris Stampolis
    Governing Board Member, Santa Clara Unified School District
    Member, Democratic National Committee
    408-771-6858 / stampolis@aol.com

  4. David B. Cohen said

    on September 25, 2013 at 12:32 am

    Thank you, Dr. Ward, for writing this. For those who want to see the STAR/CST tests administered, I would keep coming back to the key issue: those scores won’t be valid. They won’t be valid. You may want them, but they will not be valid. It’s not productive to argue “they’re kind of valid” or “they’re probably pretty close.” See, the original idea of testing to get some potentially useful data about schools and systems wasn’t too bad, but once we start attaching high stakes to those single tests, we’ve already corrupted the measure. If you want to use tests for school rankings, funding, governance decisions, parent trigger eligibility, NCLB sanctions, and even teacher evaluations, AND you want teachers to teach with one set of standards in the classroom and put their jobs and schools on the line with a test based on other standards, then you simply do not understand the stress that puts on the classroom and school as a result. It’s like telling doctors we’re going to change the diagnostic criteria and the treatment protocols for a certain condition, but not at the same time, so for now, use the new treatments but the old diagnoses. Yes, it would kind of probably work, but you couldn’t, during that phase, draw any clear conclusions about either the old or the new.

    • navigio replied

      on September 25, 2013 at 7:37 am

      But Superintendent Deasy said that the very reason that they had such huge jumps in 6th and 9th grade CST results this year was specifically because they’ve chosen to move forward with the standards there (while clearly using the existing tests). In fact, ETS said the same thing at a state level. Are you saying he and ETS are wrong?

    • Doug McRae replied

      on September 25, 2013 at 8:41 am

      David: I’d dispute your claim on the key issue — I say the STAR/CST scores will be valid. Repeat for emphasis: they WILL be valid. Why? Because our 1997 content standards (for which the STAR CSTs were built) and the new common core content standards heavily overlap, so tests built for the old standards will be substantially valid/reliable for the new standards, depending on the degree of overlap for each grade level and content area. If CA follows the Massachusetts strategy of constructing short-form, common-core-only versions of the STAR CSTs, then those tests would be not only entirely valid measures of the common core but also reliable (though less so than the full-form STAR versions, due to their reduced length). I do not suggest that short-form common core versions of the STAR CSTs should be a long-term solution for new tests to measure the common core, but rather that they can be useful for a judicious transition period — to allow for continued statewide and local district/school data on achievement, and for a ramp-up of both common core instruction and computerized test administration practices during a reasonable multi-year transition period.

      I’d also dispute your claim that the original idea of STAR tests “wasn’t too bad, but once we start attaching high stakes to those single tests, we’ve already corrupted the measure.” High stakes do not corrupt the measure; rather, it is unfortunate teaching-to-the-test instructional strategies that corrupt both good instruction and good assessment. And despite the promises of open-ended and performance-task items in the new common core tests, those items will also be susceptible to the evils of teaching-to-the-test — in all my years of designing and developing tests, I’ve never seen an item format that cannot be compromised by enterprising students (and/or teachers). In fact, as I’ve noted before, AB 484 even has a section that enables teaching-to-the-test behavior by schools — Sec 60642.6 mandates statewide purchase of consortium interim tests, and the SBAC interim testing package includes practice tests that mirror SBAC’s secure summative tests. At the same time, AB 484 prohibits use of practice tests for a teaching-to-the-test program by local schools. Talk about mixed messages about good instruction and good assessment practices . . . that’s the ultimate in inconsistency within a single bill.

      • navigio replied

        on September 25, 2013 at 9:45 am

        Test results are not learning, yet we still use them as a proxy for that. Similarly, high stakes may not corrupt directly, but they provide incentive for corrupting behavior. I don’t see how we can realistically disregard the impact of that incentive on behavior.

      • el replied

        on September 25, 2013 at 10:03 am

        Part of the problem is that they are becoming high stakes for teachers and schools, but they are zero stakes for the students. There are some straightforward solutions for this. Some high schools in my county elected to give a grade bump for any students scoring proficient or advanced – ie, if you scored proficient in Algebra, and you were getting a B based on homework and class tests, your grade would be bumped to A. However, there was no penalty for doing poorly. This gave students some incentive to actually study for the exams and to have some interest and even excitement about taking them, rather than viewing them as a waste of their time and attention.

        The tests do not measure, and never have measured, what is taught. They measure how students choose to respond.

        If I were to evaluate a plumbing job on a 1 mile length of pipe, I can measure water in and water out. If all the water comes out at the end, I know that everyone who worked on that pipe did a good job. If I test a year later and all the water doesn’t come out, I don’t know who failed or what the failure was, or even if it was the fault of any plumber: it could be that someone pierced the pipe with the shovel after the pipeline was installed. Using this data to fire the plumber who installed the last pipe junction would be unfair and inappropriate.

        • Doug McRae replied

          on September 25, 2013 at 10:45 am

          Navigio: I agree that high stakes tests provide incentive for corrupting behavior . . . but I chalk that up to the human condition where lots of things in life provide incentives for corrupting behavior [paying taxes, speed limits, you name it]. We do a pretty good job not disregarding the impact of those incentives for some corrupting behaviors, such as outright cheating, but do less well with more subtle corrupting behaviors such as shortcuts for good quality instructional practices, like for instance teaching-to-the-test behaviors.

          EL: Again, I agree. We’ve done a poor job finding ways to motivate students to take statewide tests seriously. One very good possibility for motivation would be to use STAR tests as an “early qualification” for the CAHSEE high school graduation requirement. When a kid scores proficient on the Algebra end-of-course test in middle school or grade 9, why do we insist on wasting time and money giving that kid the CAHSEE math test? That student has already demonstrated the achievement required for HS graduation. Yet, unneeded duplicative CAHSEE test administrations have clogged our HS statewide assessment schedules for years, with no attention from CDE or SBE leaders. We can certainly use more common sense in the overall design of our statewide testing system, and working on the student motivation factor should be one of the common sense factors to address. I also agree that tests cannot simply measure what is taught, but only what is learned. And trying to attribute HOW it was learned is way beyond any test’s pay grade. But, I will say one of the characteristics of a good teacher is finding ways to connect with students and encourage students to learn . . . certainly the best teachers I’ve interacted with, both when I was in school in the dark ages and when my kids were in school, have been teachers who connect with kids and motivate students to be enthusiastic learners.

          • navigio replied

            on September 25, 2013 at 12:30 pm

            I’m with you on the human condition thing, but that doesn’t mean we ignore the human condition when setting policy. Furthermore, it doesn’t mean we intentionally create proxies that can be so separated from their referent. There are probably situations where a teacher can decide whether to act in such a way that improves test scores, or alternatively to act in such a way that reaches the most students in the most meaningful way possible, but at the expense of test scores. To the extent stakes are placed on test scores, the former will tend to be chosen, especially in our current environment of under-resourced, large classrooms and schools ‘in competition’ with each other and charter policy.

            Also, regarding CAHSEE, one reason I asked about the correlation with CST results for that test was to hint that maybe using CST results as a replacement for CAHSEE would be appropriate. I haven’t yet looked at the report you cited, but if the correlation is strong, it seems odd not to do that. I expect the original argument for the CAHSEE addressed that issue, though.

          • el replied

            on September 25, 2013 at 2:13 pm

            That difference of skin in the game may explain why kids who pass the CAHSEE effortlessly are getting low scores on the STAR exams, rather than, as we were speculating in a different article, that the CAHSEE is easier than expected/intended.

          • Doug McRae replied

            on September 25, 2013 at 4:06 pm

            Yup, it’s hard to overstate the importance of student motivation for any large scale testing program. It’s also hard to overstate the power of a choice between an easier test and a harder test within a testing system — for instance, the easier CMAs vs the harder CSTs for Spec Educ students in recent years, the easier grade 8 common core vs a harder full Algebra I for middle school math students in the future (provided we have full Algebra I tests at all in the future), easier paper/pencil vs harder computer-adaptive tests if and when that choice is put before schools and districts. It doesn’t take an experienced psychometrician to predict that folks will choose the easier tests unless motivation and reward conditions are adjusted to spur taking the harder tests. But these factors (important as they are) are small in the overall context of AB 484, where we are choosing between having no reportable data at all for at least two years, if not very possibly longer, vs reportable but imperfect data for a reasonable transition period for initiating common core computerized tests. That’s the basic choice 484 presents. My choice is reportable but imperfect data for a logical multi-year transition period, with, of course, adequate attention paid to student motivation factors as well as the inevitable easy/hard situations.

          • el replied

            on September 25, 2013 at 6:00 pm

            Let me ask this, Doug:

            What actions or policies would you have contingent on 2014 test results?

            That is, if the data exists, how will or should that change what curriculum is chosen, where money is spent, what policies are followed, what should be done?

            If you saw a school go down, what would you say to that school?
            If you saw a school go up, what would you say to that school?
            If you see a school that’s low and staying the same, what would you say?

            Regardless of what numbers are reported, given that we’re already mid-transition, I can’t see any reason they’re useful. We’ll be half changed and we’ll be continuing forward with that change.

            If the numbers tank, no one is going to say, “Oh, Common Core is a total failure. Let’s go back to the old stuff.”

          • Manuel replied

            on September 25, 2013 at 10:23 pm

            el, I think that the numbers tanked in New York. Badly.

            I haven’t heard that New York is going back to the old tests. But it might be too early to tell. Maybe if they tank again next year the pitchforks and torches might be brought out…

          • navigio replied

            on September 26, 2013 at 8:09 am

            If there was anything ‘positive’ about the NY numbers, it was that charter schools did about as badly as (and in many cases worse than) traditional schools, with some exceptions. To the extent people believe common core and the associated tests are somehow ‘more honest,’ I’d like to see a re-evaluation of charter policy based on those results. Not that we care about kids or anything..

  5. Doug McRae said

    on September 26, 2013 at 9:13 am

    Responding to EL @ 6 pm 9/25 [we've run out of reply buttons on this strand]: I’m more concerned with 484’s longer-term issues (2015 to 2020) than with 2014 concerns, so I’ll talk about those first and then try to answer your Qs re 2014. Longer term, my view is that CA won’t be able to generate valid/reliable scores from the proposed SBAC computerized tests until 2018. That breaks down to 2016 for instructionally valid scores for math and 2017 for instructionally valid scores for E/LA, following the statutory timelines for common core curriculum frameworks and instructional materials, the work of Honig’s IQC, and approvals by SBE. In addition, my reading of the available technology-readiness information is that less than half of CA schools will be technology-ready for SBAC testing this coming spring, maybe about half by spring 2015, and it will be 2018 before the substantial majority are technology-ready, including the human capacity to use the necessary hardware and bandwidth. Finally, SBAC itself likely won’t have a fully operational computer-adaptive test with interpretable scores by spring 2015, so 2016 is the earliest available date from that perspective. As a result, my concern is what we do for 2015 through 2017 to phase in computerized testing and get our schools ready for common core computerized tests by 2018. That’s why I’ve argued for a slimmed-down set of STAR paper/pencil CSTs targeted to measure only common-core-aligned standards, along with other efficiencies, especially for the HS grades, by using selected STAR end-of-course CSTs (slimmed-down versions) for both CAHSEE HS graduation purposes and federal reporting requirements during the transition period, while simultaneously increasing our exposure to the coming common core computerized testing protocol, which would be scheduled for full initiation in spring 2018.
That plan would allow us to retain sufficiently valid/reliable test data and continue with API accountability data during the 2015-17 transition period to provide statewide, district/school, and subgroup data for all of the same uses we have had over the past 10 years, thus fostering continuity of student academic achievement status and progress data during the transition period.

    For 2014, it would have been nice if this sort of transition plan had been discussed over the past six months during the legislative vetting of AB 484. But the facts are it wasn’t: it was raised, but the sponsor and author of 484 refused to consider it. Instead, we had an end-of-session power play that resulted in extensive amendments to 484, first verbally described Sept 4, made available in print on Sept 5, and then approved by the legislature on essentially straight party-line votes within a week, without policy or fiscal committee vetting. So, apparently, the choice now on the Gov’s desk is either to veto 484 (the result would be a 2014 STAR program the same as the one administered in 2013) and ask the legislature to return with another plan next year for 2015 and future years, or to sign 484 with its suspension of almost all STAR testing in 2014 and its rather blind anticipation that initiation of SBAC tests in 2015 will make everything hunky-dory for CA’s statewide assessment system in the future. But when one reads the fine print in 484, one sees there are loopholes that permit delays in initiation of common core computerized tests not only for 2015 but also indefinitely for 2016 thru 2020. If the latter is what happens, then CA will have an indefinite delay for statewide assessment information (and, in addition, for API accountability data) — with, of course, the issue of federal requirements also hanging over CA’s assessment and accountability systems. Options are limited for 2014 now, due to the time crunch. I’d like to say that reduced STAR CSTs measuring only common-core-aligned standards are possible for 2014, but the window of opportunity for that is closing fast — normally the vendor (ETS) has to have final test forms in October to meet the logistical demands of distributing test booklets by February for early-testing districts and schools (the year-round campuses).
So, this option is fast becoming unrealistic from a pure logistics point of view, not a policy point of view.

    On your final comment, that folks won’t be able to evaluate common core implementation via data from 2014 or 2015: my view is that we have adopted the common core as the academic content standards for K-12 schools in CA and we are going to have those standards for the next 10 years at least. None of this discussion will impact that reality. Rather, what this discussion is about is how best to implement a statewide assessment system to measure the common core content standards. I’m not against the common core; I’m not against computerized testing at all, either; rather, I’ve actively pushed for it ’cause it’s the right way to go. But I want a reasonable transition plan over several years for a common core computerized statewide assessment system that won’t crash and burn on take-off.

    • el replied

      on September 26, 2013 at 9:44 am

      Thanks for your comments, Doug. I appreciate the time you take to write here.

      And yes, we need better commenting technology! In particular, I wish we could see more than the last 5 recent comments, or some other way to make it possible to find all the ‘new’ comments. There are many excellent ones here and I hate to miss any. I feel bad about adding comments and popping better ones off that stack of 5!

  6. Doug McRae said

    on September 26, 2013 at 9:27 am

    Responding to comments above from Manuel and Navigio on NY scores: NY chose to implement “early” common core tests last spring, and it chose not to invest in comparability studies between its old tests and its new CC tests. Scores went down drastically when the media and public saw the numbers from the old to the new, and there was much consternation. By way of contrast, KY also chose to implement an “early” common core assessment in spring 2012 [per Time magazine, KY has been one of the really early implementers of the CC; they began in 2009, before the CC was even finalized]. But KY did conduct old-to-new comparability studies. When they released their 2012 scores, they had data to explain why the scores went down, and there wasn’t the media consternation that happened in NY. In fact, the decline in scores was relatively the same for KY and NY, but KY was prepared with information to explain the change in scores. MA executed the slimmed-down old-test strategy, modifying its long-standing MCAS tests to include only common-core-aligned standards for its spring 2013 testing. When MA released its 2013 scores a week or so ago, they showed no decline from the old tests to the modified tests. Why? The explanation was that MA’s old standards were relatively equal in rigor to the new CC standards. I think CA’s old 1997 standards are also relatively equal in rigor to the new CC standards — if CA uses the MA strategy for a transition to the new CC, I think CA will have about the same result as MA just had: no decline in scores for modified STAR CSTs that measure only CC-aligned content standards.

    • navigio replied

      on September 26, 2013 at 11:39 am

      I tend to agree with your last sentence, though I can’t tell you how disappointing it is that so many superintendents this year have blamed their drop in scores on the transition to common core, even as (I keep mentioning) our largest district is blaming its increase on common core.

      After this year, I will never again be able to believe what a district leader says about their district’s performance results. Similarly, ETS itself made a claim about the results this year which it has yet to back up, and because of the (relatively) extreme variability in this year’s results, as well as their refusal to provide additional data for previous years, I can no longer even believe ‘the experts’.

      I too appreciate Doug’s comments, even if I don’t always agree with them. Virtually all other statements, whether from our testing companies, our state and district education leaders, or especially our legislators, are very clearly either flat-out political lies or a tacit admission that they simply have no understanding of what our test scores truly mean or what causes them to do what they do. As a result, the fact that we want to base accountability policy on them seems to be the height of misguidedness.

      I used to be fairly accountability-agnostic (recognizing both potential positives and negatives), but as it becomes more clear that we are not being truthful, I think the negatives far outweigh the positives. I don’t necessarily mind if we go on testing, because it’s fairly easy for me to discredit any test results with a few well-chosen anecdotes (we have hundreds in our district alone). And I will continue to do so until our leaders do an about-face on the honesty front.

      – Švejk

      • el replied

        on September 26, 2013 at 12:23 pm

        I’ve watched scores in my local school for 7 years now, and the classes are small enough that I know all of the teachers and a significant percentage of the kids, especially in my daughter’s class. I look at them and with a few very obvious outlier exceptions, I can’t make any explanation that holds true over a long period. We have used it to ask questions of ourselves that have sometimes produced answers that had everyone agreeing that it was wise to change the program, and some others where we felt the scores did not indicate a problem with the program. This last year, we saw an uptick in the SED population, so we wondered if that meant that kids that had been in our system had had a change in fortune or if it meant that SED kids had moved in. We made a chart with incoming students and their score ranges and outgoing students with their score ranges. We saw that a roughly equal number of high achieving students had moved in and out, but that quite a few more low performing students had moved in than had moved out. To me this says that we obviously need to go looking out for those kids, and figure out how to meet their needs, but it also suggests that the problem isn’t that our program deteriorated nor is it that kids in our system for years are suddenly unable to handle a new grade.

        But the Feds can’t (or won’t) look at data with that kind of nuance. Instead, NCLB (and RTTT) prescribes particular interventions without even a decent diagnosis of the issue.

    • TheMorrigan replied

      on September 27, 2013 at 5:41 pm

      While I agree that our standards are similar to the CCSS, Doug, there are hills of differences we should not ignore. KY, NY, and MA belong to PARCC; California is Smarter Balanced. The MCAS has always had a short-answer response section; the CA CSTs have not. While not a huge factor, test familiarity/similarity will have an effect. MA modified its test to fit some CCSS; KY and NY did not. If CA modified the CSTs to include short answer, the drop would certainly be dramatic. If CA modifies some multiple-choice answers to better fit the CCSS, then I would agree with your point.

      I do not think the drop will be as huge as it was in NY, but it will be a drop.

      • Doug McRae replied

        on September 27, 2013 at 8:26 pm

        TheMorrigan: Re PARCC vs Smarter Balanced, the two summative tests are being designed to yield results on the same scale of measurement so that performance standards (basic, proficient, advanced) will be comparable, so from a results perspective which consortium test is being used shouldn’t make a difference. Yes, MCAS has some short-answer response sections, and so when MA modified MCAS to measure only common-core-aligned standards it was appropriate to include a similar amount of short-answer response in their modified tests. If CA were to pursue a similar strategy for transition to a full common core test, it would be appropriate for CA to use the same formats for a modified test as it used for STAR, which would be all MC except for writing samples. My understanding is KY and NY contracted to have entirely new tests measuring the common core, rather than modifying their old tests. From a test-designer perspective, for a transition test I would not recommend CA include short-answer responses — that would throw in a confounding variable that isn’t needed. So, essentially, I think we agree that a modified STAR measuring only common-core-aligned content should not generate a decrease in scores; of course, a small study to confirm that would be the only way to empirically check it out. I’d agree such a test would not be a full measure of the common core; for that we’d have to await the initiation of the consortia computerized test (in 2018 from my perspective). But in line with the purpose of a transition, we need to actively support voluntary experiences with the new common core computerized tests, with their short-answer item types, their computer-enhanced item types, and their performance tasks, to build statewide familiarity with those item types during the transition period.
        Also, during the transition period, I believe it will be possible to develop old-to-new comparability data (perhaps on a step-by-step basis) so that when a changeover is executed in 2018, we have the data to explain any drops in scores, similar to the data KY had. Getting that comparability data would be both good assessment policy and good assessment-system practice.

  7. David B. Cohen said

    on October 1, 2013 at 12:58 am

    Quite a discussion! I’m coming in late here, but want to respond to Doug. First of all, I appreciate the lengthy and detailed replies.

    You wrote: “High stakes do not corrupt the measure, rather it is unfortunate teaching-to-the-test instructional strategies that corrupt both good instruction and good assessment.” I agree with you, though a minor change of wording allows for that agreement and still maintains my point, which has been amply proven through the NCLB years. So I should have said “High stakes inevitably lead to many poor instructional strategies and school policies – which will corrupt the measure.”

    Your faith in CST tests is also something worth exploring. As a general overview of a school or system, I think standardized tests have their place. We could easily accomplish that oversight and accountability through sampling – greatly reducing the number of tests purchased and administered, saving time and money for better purposes. A review of standardized tests by J. Cizik (2007, as cited by Bob Marzano in his book on grading and assessment) found that the subtests are instructionally useless for individual students. For a given standard they might offer just a few questions, and you can’t reach valid conclusions when there are so few questions. Furthermore, many of the skills cannot be measured by multiple choice, including many which are supposedly measured that way. For example, multiple-choice questions about predictions don’t measure a student’s ability to make good predictions, but rather their test-taking skills, or ability to infer what the test designers want to hear. Questions about figuring out the meanings of words in context do not control for prior knowledge of the word, or of the words in the context, or of the contextual information. Questions about bibliography formats are just stupid, as I would never teach a student to memorize the format, but rather teach them how to use guides or online auto-fill tools; of course, those questions can be answered by using a certain degree of logic, but students don’t look at the questions that way. They think, “Damn, I didn’t memorize this. Whatever. B.”

    Did you read the Atlanta newspaper articles about the low quality of tests, especially as the industry experiences consolidation? What did you think of them? Some of what you’re advocating makes sense in the abstract, but from the school-practitioner standpoint, I don’t think you fully appreciate the level of stress people experience when we have mediocre tests, and too much testing, and overlapping standards, and inappropriate uses of the test results hanging over our heads, possibly including school sanctions and teachers getting undeservedly poor evaluations.

    So, the bill is flawed? Don’t expect any sympathy from teachers. Almost everything about this system and the current laws and practices is more than flawed, but I’ll stick to polite language here. And if this bill makes it easier for us to do a better job with our students, and saves us money, and clarifies the work we need to get done and how that will be measured, that’s a good thing overall.

    • Doug McRae replied

      on October 1, 2013 at 10:41 am

      David: I’ll try to respond paragraph by paragraph —

      Re “high stakes inevitably lead to poor instructional strategies,” I just don’t agree with the inevitable part. There are many examples where schools and teachers ignore the high stakes and simply pursue high quality instructional practices rather than teaching-to-the-test. I don’t agree that high stakes inevitably lead to cheating on tests either . . . . by way of analogy, folks may not like the stress of paying taxes either, but that doesn’t justify refusal to pay taxes. What is needed is good leadership to point out the significant downsides associated with trying to shortcut good instructional practices, rather than leadership that pressures high scores regardless of method used to achieve increases in scores.

      Re increased use of sampling, the problem is that with sampling (particularly matrix sampling) we lose comparability of scores for schools and districts except for large schools and districts. CA explored this option extensively in the late 90’s, and rejected it due to issues of comparability across schools. I don’t think any state has used an extensive sampling design for statewide testing since the 1990’s — the only time you see sampling used is for test development activities, not for live tests focused on results.

      Re statewide tests providing “instructionally useless” information, you have hit on probably the major tension in the design of testing programs — you can design a test for instructional purposes OR you can design a test to measure the results of instruction, but no single test can accomplish both purposes simultaneously. Tests for instructional purposes should be administered during instruction, while tests to measure the results of instruction have to be administered after instruction is completed (or as close to that as possible, given other constraints). Also, tests for instructional purposes should be available for all teachers to access and use, while tests to measure the results of instruction need to be secure in order to preserve comparability of scores within schools and across schools. These two easily understood design tensions lead to a conclusion that a statewide test has to be designed for one or the other, but cannot do both. As a result, for folks against tests to measure the results of instruction (i.e., against accountability tests), the policy argument in favor of instructional tests is an argument that promotes their anti-accountability position. This tension has been present for the entire 40+ years I’ve been in the test making business, but it’s been much higher profile for the last 20 years.

      Re the shortcomings of multiple-choice items, I fundamentally agree with you . . . . it would be great to avoid those shortcomings as much as possible and have vastly increased use of open-ended response items, including those sexy (oops, attractive) performance tasks. The problem is — those item types force increased test administration time (which takes away from available instruction time) and cost a great deal more $$. That’s the statewide assessment design tension on this element.

      Re the Atlanta media material, I’ve only read synopses, but in effect if one defines “good” tests in terms of instructional usefulness, then statewide tests designed to measure the results of instruction (i.e., accountability tests) will be criticized as mediocre tests and draw pushback re inappropriate use of test results. So, I don’t expect much sympathy from teachers or other school folk when I talk about what is needed for good accountability tests — the support for the data generated by good accountability tests comes from outside K-12 education insiders. But, I’d observe, tests have never been very popular with students . . . I wasn’t all that pleased with the various “important” tests I had to take when I was a student, my kids weren’t either, and now that tests are being used to track status and progress of academic achievement over time, it’s not a big surprise to me that their popularity among teachers is low. Data for school accountability is valuable in the larger picture, but I wholeheartedly agree the design of statewide assessments needs to be restrained in terms of test administration time [my advice has been no more than one percent of instruction time, or roughly 9 hours per year]. In order to test a decent amount of what we expect to be learned within that limit, the assessment design train is driven toward the more efficient (in time and cost) multiple-choice items and away from the more instructionally useful open-ended items — this dynamic is present even for the proposed new consortium tests, which will be 70-75 percent multiple-choice.

      The design tensions for a good statewide assessment system are many . . . . good systems come from an extensive amount of discussion. I’m afraid the end-of-session power play with AB 484 doesn’t meet this good statewide assessment system design criteria.

      • el replied

        on October 1, 2013 at 1:51 pm

        Doug:

        Re “high stakes inevitably lead to poor instructional strategies,” I just don’t agree with the inevitable part. There are many examples where schools and teachers ignore the high stakes and simply pursue high quality instructional practices rather than teaching-to-the-test. I don’t agree that high stakes inevitably lead to cheating on tests either . . . . by way of analogy, folks may not like the stress of paying taxes either, but that doesn’t justify refusal to pay taxes. What is needed is good leadership to point out the significant downsides associated with trying to shortcut good instructional practices, rather than leadership that pressures high scores regardless of method used to achieve increases in scores.

        When you’re in a school that is just on the bubble, where a 5 point difference in scores is going to change whether or not you are subject to NCLB sanction and potentially loss of control of your program, there is a ton of pressure to do whatever can be done to get those 5 points. No one I have ever met thinks that kids will be better off if a school is moved into Program Improvement. That is our problem! Much of this isn’t the fault of the test or even the fault of the ridiculous supposition that test scores (like housing prices and the stock market) can only go up, but the fault of horribly, terribly designed sanctions that will damage or even kill off quality programs based on what are actually tiny differences in results.

  8. TheMorrigan said

    on October 1, 2013 at 3:54 pm

    Doug,

    I do not agree with the “inevitable” part either. Assessment, in and of itself, does not lead to “poor instructional strategies.” Assessments do not lead to cheating either. However, poor policies (teacher merit pay, possible termination based on VAM, meeting AYP, and tyrannical leadership, for instance) that peripherally surround the outcomes of the assessments do lead to “poor instructional strategies.” DC and the Atlanta scandals are perfect examples of a poor mix of policies practiced in the extreme. I agree that a balance must be maintained, but as el said above, the flawed policies of NCLB/RttT have tilted the scales in a way that rather soils your point here.

  9. David B. Cohen said

    on October 2, 2013 at 8:44 am

    Doug, et al.,

    Thanks for keeping the dialogue going. Just one clarification: when I said “inevitable,” I meant that we know it’s going to happen (probably a lot), but I didn’t mean it’s inevitable that everyone is going to do it. NCLB didn’t require schools to narrow their curriculum, but there was plenty of reason to think it would happen, and it did. (I know there are some studies suggesting curriculum didn’t narrow b/c of NCLB, but other studies suggest it did, and I’ve heard it over and over and over at state and national conferences.) The same can be said of cheating. Raising the stakes on tests doesn’t mean that everyone will cheat, but history (and Campbell’s Law) suggests that cheating is inevitable.

    • Doug McRae replied

      on October 2, 2013 at 9:22 am

      Well, OK: not inevitable, but not surprising, that we’ll see narrowing of curriculum, teaching-to-the-test, and some cheating or fringe-of-cheating behavior, per Campbell’s Law. And how does that change with the replacement of STAR tests by Smarter Balanced tests? It doesn’t . . . adding 30 percent or so open-ended / performance-task items won’t change any of the above. What it does is use the mandatory statewide assessment system to leverage influence for a different type of instruction, the type of instruction advocated by the common core, even though the content targets for the common core are substantially the same as the content targets for the CA 1997 content standards upon which the STAR tests were built. Using a mandatory statewide assessment to leverage instructional methods is a violation of Brown’s principle of subsidiarity . . . . instruction should be locally controlled, at as low a local level as possible [i.e., teacher level, then building level, then district level]. With state-controlled summative assessments leveraging local choices via high-stakes (i.e., accountability) uses, and state-supplied interim/formative instructional assessments [rather than locally chosen instructional assessments that maximize coordination with locally chosen instructional materials and locally delivered teacher professional development], what happens to local control of instruction? That’s right, it is compromised . . . . Will state-controlled pacing charts be far behind? The longer-term implication of AB 484 is more state-level control of instruction, using the statewide assessment system as the mechanism for the increase of state control. And with a closed system for instruction and assessment, the assessment system results will be less credible.
I think we are far better off having a statewide assessment system as independent as possible from the instructional system, to monitor student achievement and progress, built to measure the content of the state-adopted content standards, rather than oriented to a particular type of instruction. That way, local districts can control their own curriculum and instruction strategies, and the statewide assessment data can fairly reflect the relative successes of those local decisions across the state.
